How we go from data to discovery to decisions.

HBGDki created a central repository of existing and new study data related to human growth and development. HBGDki scientists explore and analyze the data to answer targeted questions, using state-of-the-art analysis methods and tools, in a data science rally – a 2-week sprint based on Agile software development principles.


Our methodology

Researchers have been studying birth, growth, and development for decades, but most of the data they have collected has been stored on hard drives or in file cabinets, inaccessible to anyone else. HBGDki is collecting study data from around the world and creating a central, secure repository to store it.

The reason? Many questions must be answered before we can solve devastating problems in global health, and answering them requires leveraging as much data as possible to make new discoveries and support policy decisions. The Healthy Birth, Growth, and Development (HBGD) program addresses the dual problem of growth stunting and poor neurocognitive development, including contributing factors such as fetal growth restriction and preterm birth. The HBGD program will create one unified strategy for integrated interventions, addressing complex questions about (1) the life cycle, (2) pathophysiology, (3) interventions, and (4) scaling intervention delivery.

The Healthy Birth, Growth, and Development knowledge integration initiative (HBGDki) aims to answer 5 prioritized HBGD questions by analyzing the large body of existing data.

HBGDki visualizes and analyzes the data using state-of-the-art analysis methods and tools in data science rallies based on Agile software development principles. New insights, tools, and methods generated from these explorations help us answer key questions about human growth and development.

The high-level vision and guide to our methodology can be summed up in 3 phases: data, discovery, and decisions.

5 Key Questions
  1. To what extent is growth faltering explained by pre- vs postnatal insults?
  2. What kind of recovery can we expect in infants born small for gestational age (SGA)?
  3. Can we quantitatively characterize the relation and interaction between preterm birth, physical growth, and brain development?
  4. Are there disproportionately large contributions to growth faltering from specific pathways, and can we rank-order risk factors?
  5. Are there specific pathways directly impacting linear growth faltering that coincide with increased risk of noncommunicable diseases such as cardiovascular disease, obesity, and diabetes?


Researchers from around the world have provided HBGDki with data from more than 120 observational studies and randomized controlled clinical trials, covering almost 10 million children in 25 countries and almost 1900 variables. The knowledge base also includes census data and population surveys.

The HBGDki database is expanding rapidly and includes longitudinal anthropometric, clinical, and pathophysiologic covariates; genomic and proteomic data; and cross-sectional surveys. By curating aggregated data sets into the HBGDki knowledge base, researchers can ask bigger and broader questions and apply complementary modeling approaches: hypothetico-deductive, data-driven inductive, or hypothesis-free.

We have established collaborations with principal investigators that enable access to many measured clinical variables related to growth and development, including gut function capacity, a child’s history of enteric infection, and cognitive function.


modeling tools

HBGDki is using modern modeling and visualization methods to understand and analyze data in the knowledge base. We use 4 approaches to modeling.

  • Causal models describe cause and effect. A causal model can help determine how an intervention, or a combination of interventions, might affect physical growth or neurocognitive development.
  • Population models describe populations at large. For example, a population model could help us understand how the burden of disease is changing.
  • Empirical models help us make sense of study data by fitting a curve to the observations to identify key trends. Because they are based on observed data, empirical models are distinct from theoretical models, which assume specific conditions.
  • Mechanistic models address an underlying biological mechanism that may be relevant to the outcome, such as the relation between gut dysfunction and stunting. For example, HBGDki data analysts are building a model of the human gut that uses mathematical equations to describe how the gut processes nutrients and how gut function is related to physical growth and neurocognitive development.
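The empirical approach in the list above can be illustrated with a minimal sketch: an ordinary least-squares straight-line fit, one of the simplest ways to fit a curve to observed data and read off a trend. This is not HBGDki's actual model. The measurements are hypothetical and invented for illustration, and the choice of mean height-for-age z-score (HAZ) as the outcome, the straight-line form, and the names `fit_line` and `predict_haz` are all assumptions made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope). Requires at least two distinct x values.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# HYPOTHETICAL data, invented for illustration only (not from the HBGDki
# knowledge base): mean height-for-age z-score (HAZ) of one cohort by age.
ages_months = [0, 3, 6, 9, 12, 18, 24]
mean_haz = [-0.5, -0.7, -0.9, -1.1, -1.3, -1.6, -1.9]

intercept, slope = fit_line(ages_months, mean_haz)


def predict_haz(age_months):
    """Evaluate the fitted trend line at a given age (in months)."""
    return intercept + slope * age_months


if __name__ == "__main__":
    # A negative slope summarizes the key trend: mean HAZ falls with age.
    print(f"HAZ change per month: {slope:.3f}")
    print(f"Fitted HAZ at 24 months: {predict_haz(24):.2f}")
```

On these made-up numbers the fitted slope is about -0.058 HAZ per month, a single figure that summarizes the faltering trend. A straight line is used here only for simplicity; real growth trajectories are usually nonlinear, so a practical empirical model would likely use a more flexible curve.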


collaborative data science rallies

A key part of the discovery phase of HBGDki is the data science rally, an efficient, sprint-like effort adapted from the Agile software development method. This approach to data science is producing actionable answers to specific questions about child growth and development. HBGDki models are used during rally sprints, which usually run for 2 weeks and address a specific question, hypothesis, and deliverable. The key to a rally's success is pairing data scientists with domain experts who stay in continual communication throughout the rally.

The overarching process involves using new modeling and data visualization methods to understand and analyze data from the knowledge base and to provide insights about child health assessment and interventions. Rally members begin each sprint with a planning session to set sprint objectives. Throughout the rally, team members review the analysis and modeling methods and results, and report on progress and impediments.

When a rally is complete, team members report the outcomes and review the experience to help improve future rallies and other projects. They then submit rally reports and recommend next steps.


Rallies are a cyclical process and start with a question




The data and discovery processes are geared toward informing decisions about how the Foundation and the larger global health community can use limited resources to have the greatest impact on children’s lives.

The strategy discussed by Foundation CEO Sue Desmond-Hellmann is Precision Public Health – replacing one-size-fits-all solutions with customized solutions that deliver the right interventions to the right child at the right time, place, and cost.

By gathering sufficient data and analyzing them at a high enough resolution, we can start to organize children into clinically coherent groups, predict the risks each group faces, and design interventions to limit those risks. This new approach to global public health relies on detailed insights that HBGDki rallies are designed to generate.

The HBGDki team includes more than 150 people who help inform decisions, including principal investigators, data scientists, and Gates Foundation staff.

  • Principal investigators contribute data to the knowledge base, advise the analysts about how to interpret the data, and collaborate with data scientists and Foundation staff in data science rallies and other data analysis projects.
  • Data analysts, modelers, and visualization experts work with the knowledge base and collaborate with principal investigators and Foundation staff.
  • Gates Foundation staff provide global health expertise and collaborate with the analysts and principal investigators to create strategies for answering key questions.
  • HBGDki also includes multiple teams that tackle varied projects for the initiative.
