GenePattern and Dataverse Network

Lead investigators

Gary King (Harvard Faculty of Arts and Sciences, Institute for Quantitative Social Science (IQSS)) and Jill Mesirov (Broad Institute)

Description

Sophisticated mathematical methods are increasingly applied in both biomedical and social science research. Computer algorithms implementing these methods are frequently assembled into “pipeline” or “workflow” structures to process very large data sets. Ad hoc scripting of pipeline analyses can be error-prone, difficult to reproduce, and labor-intensive. The challenge is even greater for integrative approaches, in which many data sources and methods are combined to analyze a single problem. Finally, capturing the provenance of both the data and the analytic methods, including software versions, order of application, and parameter settings, is key to replicating studies and their results.

This project’s long-range goal is to provide tools and data repository support for a wide range of interdisciplinary research in the physical, life, and social sciences. The initial work funded here will create the specific capacity for transparent interoperability between the IQSS’s Dataverse Network framework and the Broad Institute’s GenePattern framework, which were originally developed for entirely unrelated social science and genomics applications, respectively. In so doing, it will make the data archiving infrastructure, the statistical methods of both fields, and the wide range of services each project developed for its own users available to both user communities.