The aggregation of research data across multiple studies, multiple methods, and long time spans has been an ongoing problem for researchers for decades. The ability of scientists, data analysts, academic research institutions, and others to build on prior work is necessary to move research forward. But such prior research often uses completely different models and methods, limiting its inclusion in new models. What has been missing is a single metamodel that allows a broader meta-analysis encompassing any set of factors.

A new method, Generalized Model Aggregation (GMA), co-created by MIT Sloan School of Management Prof. Hazhir Rahmandad, MIT Sloan research scientist Mohammad Jalali, and Prof. Kamran Paynabar of Georgia Tech, addresses this long-standing issue with a model that allows the combination of prior studies with diverse variables and designs.

GMA allows researchers across industries to use one metamodel for their data analysis. According to Dr. Rahmandad in a recent press release, “With a growing volume of research globally, we need methods to combine, contrast, and build on others’ work. This is reflected in the exponential growth of meta-analysis papers over the last decade.” He noted that, “The value of a broader method for quantitative aggregation of prior research can be immense across many disciplines.”

Dr. Rahmandad’s current research revolves around basal metabolic rate (BMR) “as a function of different body measures like fat, lean mass, age, and height.” The application of such a metamodel could ultimately span any industry seeking to combine data from a range of studies:

“The ability to combine these into a single equation would benefit research and practice. In the energy sector, multiple methods exist to estimate diffuse solar energy in a location using data from distant sensors, yet there is no method for a model that aggregates these methods into a single estimating equation.”

DATAVERSITY® recently interviewed Dr. Rahmandad about his new research and the GMA metamodel. Their paper can be found in PLOS ONE.

DATAVERSITY (DV): What is your background and what led your work into meta-analysis and metamodels and the Generalized Model Aggregation?

Hazhir Rahmandad: I did my undergrad in industrial engineering and then my PhD in management with a concentration in system dynamics, which is modeling socio-technical systems. And then I was on the faculty of industrial engineering at Virginia Tech for a few years and recently joined the MIT Sloan faculty.

I have a research portfolio that is largely about building dynamic models of socio-technical systems, broadly defined. For example, some of my work is in understanding why product development projects may go over budget and over time, or lose their quality. Some of my other work is about building good jobs in low-cost retail.

And then a set of my projects has been related to healthcare domains; obesity and depression were two of the areas I focused on. The genesis of this meta-analysis project comes from my work in obesity research. I was building a model of how human body weight changes over time, and how body composition, in terms of fat mass versus fat-free mass, changes. If you feed that analytical model the energy intake, which is how much we eat, and physical activity in terms of exercise and so on, it produces the best estimate for the weight trajectory and body composition trajectory. It also covered growth in height during the childhood years, which is another component of overall human growth dynamics.

That is where it started. I looked around and didn’t find any metamodel that really solved the problem in a clean and consistent way, so I ended up essentially working on building a new metamodel to do that. The GMA came out of that effort, and I have been collaborating with a couple of partners on this project for a few years to put it together.

DV: In terms of creating a model like this, based on the research you’ve seen, is this a new approach?

Hazhir Rahmandad: Yes, as far as I can tell there is really nothing that does this with the level of flexibility that GMA currently does. This falls into the general category of meta-analysis research which basically is, how can we bring together the results of prior empirical studies to get a more precise estimate for some causal effect or some empirical effect?

Take the impact of cholesterol on heart disease risk: there may be a dozen studies that have looked at this question. Can we pool those prior studies’ outcomes into a single effect size that is more powerful and potentially has a smaller confidence interval, so that we can be more confident about the outcome? That is the basic idea of meta-analysis. Now, there are very different methods for doing meta-analysis under different assumptions.

Many of them, not all, but the majority of them, essentially assume that the prior studies are fairly similar in their design. You are looking at essentially the impact of x on y, x and y being two different construct variables for which we have data, and the studies have used relatively similar methods for measuring x and y and the relationship between the two. It might be a simple regression or some other estimation technique for looking at the impact of x on y.

If those assumptions are in place and reasonable for all the prior studies, then traditional meta-analysis does the job. But if the studies are a little bit different in their designs or measurement methods, then things get pretty complicated very quickly: either we cannot combine them, or we can only combine them under very specific conditions in terms of those designs.
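The traditional pooling Dr. Rahmandad describes can be illustrated with a minimal sketch of the classic fixed-effect meta-analysis estimator, which weights each study's effect size by the inverse of its variance. The effect sizes and standard errors below are made-up numbers for illustration, not from any study in the paper:

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights
    (the classic fixed-effect meta-analysis estimator)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)  # standard error of the pooled estimate
    return pooled, pooled_se

# Three hypothetical studies estimating the same effect of x on y
effects = [0.42, 0.55, 0.48]
std_errors = [0.10, 0.15, 0.08]

pooled, se = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {se:.3f}")
```

Note that the pooled standard error is smaller than any single study's, which is the "smaller confidence interval" benefit mentioned above; the key assumption is that all studies estimate the same underlying effect with comparable designs.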

What GMA does is essentially allow us to combine these prior studies when, for example, the x and y we are looking at are measured using different instruments, or different transformations have been applied to the original data; maybe one study estimated the impact of a transformed x on y, and another used the raw x and y.

DV: In terms of your metamodel, and to aid our readers in understanding this new approach, what is a metamodel and how are you using one?

Hazhir Rahmandad: In the case of basal metabolic rate, we may look at the relationship between explanatory variables and the outcome. In that case, the explanatory variables are things like your weight, your height, your age, your gender, your fat mass, your fat-free mass, and others. There is a relationship that leads from these explanatory variables to our dependent variable of basal metabolic rate. One can stipulate a linear relationship or a non-linear one, or a relationship that is linear in transformed terms, such as age to the power of 2 or the natural logarithm of your weight, to add some kind of non-linearity to the relationship between the explanatory variables and basal metabolic rate.

You can stipulate a structure for that model, and by structure I just mean whether it is a linear relationship or some non-linear relationship. Then the question is: how do you estimate the parameters of that metamodel so that it is consistent with all the prior empirical results, which are reported as regression coefficients in the prior studies?

So in our case we had 40-some equations coming from these prior studies. Some of these equations only had age and height as the explanatory variables for explaining basal metabolic rate, some others had weight and height, some had fat mass, fat-free mass, and age, and there were other combinations of the explanatory variables. Each of them gave us an equation, which is the input into GMA.

For the metamodel that we defined, we used a few different structures. You can essentially have different theories for how the explanatory variables generate our response variable, or dependent variable, and you can estimate multiple metamodels, compare them to each other, and see which of the theories underlying the metamodels is better supported by the data. We did that, and in this case we found that a model that had all the linear terms, plus the natural logarithm of fat mass and fat-free mass, was the best explanatory model for basal metabolic rate.

DV: Let’s talk about other use cases or applications now. In terms of your research you can bring many different studies together that may not line up and gain a better understanding of them all, is that correct?

Hazhir Rahmandad: There are really a few different use cases. One is exactly what you mention: you have a bunch of prior studies, and they don’t quite look consistent with each other. One of them may show that weight has a strong effect on basal metabolic rate, another doesn’t show as big an effect for weight, and you don’t know why. It might be that the prior studies measured things differently; it might be that some variables that are missing from study 1 but present in study 2 explain the difference between these outcomes; or there might be some other explanation.

By building a metamodel that captures all of those relationships at the same time and estimates them using the results of prior studies, you can find out whether the metamodel actually does a good job of reconciling the differences between the prior studies. If it does, you don’t have much of a conflict. Or you may find out that there is something else we have not captured in our metamodel, and therefore we need to go collect further data and find some other variable that explains the difference between those conflicting results, resolving it through additional empirical work. That is one important use case for GMA.

Besides resolving potential confusion in prior work, it also gives us a way to build more general models that look at a phenomenon broadly and, rather than sticking to one theory or another, combine multiple theories together. Building theory using prior empirical studies is the other type of benefit it gives us.

DV: The press release stated that, “When we compared our equation’s predictive power against other equations as well as state of the art equations from groups like the World Health Organization, we found that our equation outperforms all other equations available.”

Hazhir Rahmandad: To estimate our metamodel, our equation, we didn’t have any raw data; we had no individual measurements that we were using in our estimation. All that we had was the published results of 47 equations from various prior studies. Since we didn’t have the raw data, we couldn’t run a regression or any direct estimation using their data; all we had was their equations.

We used GMA to combine those equations into a metamodel. Once we had estimated our metamodel, we could go and find a data set that had not been used in estimating any of those prior studies, so it was independent of them; neither those studies nor our metamodel was informed by that data set. We could then test the predictions of our model against the actual basal metabolic rate data in that specific study, and compare its predictive power with that of all 47 equations, as well as the state-of-the-art equations that the World Health Organization and the Institute of Medicine currently use and have on their websites for people interested in estimating basal metabolic rate.
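The out-of-sample comparison described here amounts to scoring each candidate equation against the same held-out measurements. A minimal sketch, with invented BMR values and predictions (none of these numbers come from the study):

```python
import numpy as np

def rmse(pred, actual):
    """Root-mean-square error of predictions against held-out measurements."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Hypothetical held-out BMR measurements (kcal/day) and two equations' predictions
actual    = np.array([1500, 1620, 1710, 1450, 1580])
prior_eq  = np.array([1480, 1650, 1690, 1500, 1560])  # one prior-study equation
metamodel = np.array([1495, 1630, 1705, 1460, 1575])  # the aggregated equation

print("prior equation RMSE:", round(rmse(prior_eq, actual), 1))
print("metamodel RMSE:", round(rmse(metamodel, actual), 1))
```

In the actual study this comparison was run over all 47 prior equations plus the WHO and Institute of Medicine equations; the sketch just shows the scoring step on two candidates.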

We reached out to health researchers working in this space and asked whether they had data sets they could share that had not been used or published in these prior studies. We found one data set with a couple of hundred individual measurements, and we did the testing. The results showed that our metamodel actually had better predictive power than all the other equations in the literature. So that gave us a little additional confidence that this method can work in the real world as well.

About the author

Charles Roe is the Digital Content Manager at DATAVERSITY. He is responsible for both the article and blogging programs, as well as the DATAVERSITY Training Center. He has been a professional freelance writer and copy editor for more than 15 years, and has been writing for the Data Management industry since 2009. Charles has written on a range of industry topics in numerous articles, white papers, and research reports including Data Governance, Big Data, NoSQL technologies, Data Science, Cognitive Computing, Business Intelligence & Analytics, Information Architecture, Data Modeling, Executive Management, Metadata Management, and a host of others.
Charles is backed with advanced degrees in English, History, and a Cambridge degree in Language Instruction. He worked for almost 20 years as an instructor of English, History, Culture, and Writing at the college level in the USA, Europe, and Turkey. He writes creatively in his spare time.