Data Science Spotlight: Advancing Mortality Data through Verbal Autopsy and Machine Learning

June 9, 2025

Determining cause of death is not always straightforward, especially in parts of the world where medical resources are scarce. In fact, fewer than one-third of global deaths receive an official cause, creating a growing need for automation and innovation in methods like verbal autopsy, a structured interview with a caregiver of the deceased that is designed to estimate the cause of death.

TDAI core faculty member Dr. Samuel Clark began working on this issue over 15 years ago. He now leads a multidisciplinary team including data scientists, statisticians, and experts in machine learning and artificial intelligence who are dedicated to advancing verbal autopsy research. 

“A lot of places don’t have autopsies and things of that nature readily available,” Clark explains. “So the idea of verbal autopsy is to do something a lot simpler.”

To address the lack of medical resources in regions such as Africa, Asia, Latin America, and even parts of Europe, Clark and his research team developed OpenVA, a software platform that automates the coding of verbal autopsies using advanced algorithms. This was followed by the development of InSilicoVA, a probabilistic machine learning (ML) algorithm that processes verbal autopsy (VA) data and produces an estimated cause of death.
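To give a rough sense of what "probabilistic" means in this setting, the short Python sketch below applies Bayes' rule to a handful of yes/no interview answers. It is a toy illustration only and is not the InSilicoVA algorithm: the cause names, the symptoms, every probability, and the function name cause_posterior are invented for the example, and it assumes symptoms are independent given the cause.

```python
# Toy illustration only: NOT the InSilicoVA algorithm.
# All causes, symptoms, and probabilities below are invented for this example.

# Assumed prior probability of each (hypothetical) cause of death.
PRIOR = {"cause_a": 0.5, "cause_b": 0.3, "cause_c": 0.2}

# Assumed probability that a symptom is reported, given each cause.
P_SYMPTOM = {
    "cause_a": {"fever": 0.9, "cough": 0.2, "chest_pain": 0.1},
    "cause_b": {"fever": 0.3, "cough": 0.8, "chest_pain": 0.2},
    "cause_c": {"fever": 0.1, "cough": 0.1, "chest_pain": 0.9},
}


def cause_posterior(answers):
    """Return P(cause | answers) using Bayes' rule.

    `answers` maps symptom name -> True/False as reported in the interview.
    Symptoms are treated as conditionally independent given the cause,
    which is a simplifying assumption of this sketch.
    """
    scores = {}
    for cause, prior in PRIOR.items():
        likelihood = 1.0
        for symptom, present in answers.items():
            p = P_SYMPTOM[cause][symptom]
            likelihood *= p if present else (1.0 - p)
        scores[cause] = prior * likelihood
    total = sum(scores.values())
    # Normalize so the probabilities across causes sum to 1.
    return {cause: score / total for cause, score in scores.items()}


# One hypothetical interview: fever reported, no cough, no chest pain.
interview = {"fever": True, "cough": False, "chest_pain": False}
posterior = cause_posterior(interview)
most_likely = max(posterior, key=posterior.get)

for cause, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cause}: {prob:.3f}")
```

The output is a probability for each candidate cause rather than a single hard label, which is the general idea behind producing a cause-of-death estimate with a stated degree of uncertainty.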

As the research evolves, so does the need for a comprehensive training dataset. The Reference Death Archive is being developed to serve as a foundational resource for current and future machine learning and AI tools, helping to enhance the accuracy of cause-of-death determination.

Looking ahead, there are several promising paths to enhance the efficiency and effectiveness of this research. Dr. Clark outlined three key projects he plans to integrate into the ongoing work: first, incorporating additional components into verbal autopsy, such as data from minimally invasive autopsies, to enrich the information available; second, leveraging machine learning to further streamline the process of determining cause of death; and third, developing an algorithm capable of identifying not only the underlying cause of death but also contributing and immediate factors.

In the coming years, the team also plans to begin work on the Harmonized Cognitive Assessment Protocol (HCAP), a similar interview-based method used to understand the progression of cognitive decline. Clark’s team hopes to develop algorithms that will allow trained professionals to focus more on patient care while still gaining critical insights from these assessments.

“There is a lot to do,” said Clark. “The pace with which things move is [also] complicated. There are a lot of moving pieces, and coordinating all of those things and people…is very challenging.”

As Dr. Clark and his team of researchers continue working toward their goal of a seamless, reliable way to ascertain cause of death, it is important to understand what drove them to do this research in the first place. Clark emphasizes that at the core of this research is a deeply human goal: to better understand what harms people’s health, so those in positions of power can respond with effective resources and solutions.

“It’s about improving health…,” Clark said. “Which I believe is foundational to all of us being happy and productive individuals. If we can understand what is killing people, we can understand a lot about what’s making them sick or what direct harms they’re facing.”

 

Principal Investigator Samuel J. Clark is a demographer, epidemiologist, and data scientist who develops new methods and conducts research in these areas. Clark is a professor in the Department of Sociology at The Ohio State University and a core faculty member of the Translational Data Analytics Institute. His work on improving verbal autopsy as a tool, on models and estimates of human and child mortality, and on the development of OpenVA aims to help refine global health measurement systems.

This research would not be possible without support from organizations and universities across the globe. OpenVA is, in part, an output of the Bloomberg Philanthropies Data for Health Initiative. In addition to the Data for Health Initiative, OpenVA is supported by an R01 grant from NICHD, the ALPHA Network, Vital Strategies, the CDC Foundation in support of the World Health Organization (WHO) Verbal Autopsy Reference Group, and the Institute for Population Research at The Ohio State University.

The Reference Death Archive project receives funding from the Bill and Melinda Gates Foundation, in addition to support from the WHO.

 

Learn more about the work of Dr. Clark and his research team here: https://samclark.net/site/pages/projects.shtml