Synthesis center for cell biology: Difference between revisions

{{Project
|Description=Enabling the grassroots generation of conceptual and quantitative models in cell biology through the creation of a Synthesis Center for Cell and Molecular Biology
|Affilitated With=Allen Institute, University of Washington, University of Connecticut
|Contributors=Matthew Akamatsu, Eran Egmon, Michael Gartner
}}
We are putting together a proposal to create a Synthesis Center for the field of cell and molecular biology. Its goal would be to synthesize the vast quantities of available cell and molecular data (protein types, locations, abundances, interactions) into both conceptual and quantitative models that allow us to explain and predict the remarkable transition from nonliving molecules to living cells. If our proposal is selected, the center would be funded by the National Science Foundation for 5-10 years and housed at the Allen Institute for Cell Science in Seattle. During this workshop, I'd love to bounce around ideas for the synthesis center and to identify points of intersection between this proposed center and your favorite tool or area.


[[File:Conceptual and quantitative models.png|center|500x500px]]


While most of the "big data" cell biology community is focused on creating new datasets, we are proposing to synthesize existing data into quantitative and conceptual models. For one set of quantitative models, we are using large [https://www.proteinatlas.org/humanproteome/subcellular datasets] of the locations and interactions of cellular components to train generative models of cells (à la DALL-E 2). The goal is for these synthetic cells to behave realistically in novel environments. For those models to be predictive, we need to constrain them with 1) quantitative parameter values from the literature and 2) mechanistic and biophysical information about the underlying processes. We need some help with #1 (an NLP challenge). For #2, we are building a platform for making mechanistic biophysical models of cellular processes that are interoperable, modular, and accessible. But how do we as a field synthesize existing cell biology data into higher-level concepts, models, and theories?


[[File:Quantitative model generation.png|center|350x350px]]


To make conceptual models, we would like to use the power and modularity of the [https://network-goods.notion.site/The-Discourse-Graph-starter-pack-312374c813b24ec6b4d53a054371ee5a discourse graph] schema (Questions, Claims, and Evidence) to structure the state of knowledge for our favorite research question(s). Furthermore, we'll extend the discourse graph schema to guide our ''ongoing'' research contributions toward these questions. We call these [https://youtu.be/P0KUt2yrUkw results graphs]. Our lab has begun to create discourse and results graphs to track our understanding of and contributions to our current research questions. Using Roam Research and Joel Chan's discourse graph extension, we classify a given research Question, collect Evidence from the literature and our lab notebooks, and use that Evidence to support Conclusions, which claim to address the research Question. It is early days, but this modular schema appears to help students structure their thinking, track their progress, and, most importantly, frame their work less as an individual endeavor and more as a contribution to a collective project (i.e., we are all trying to uncover the answer together).
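
To make the schema concrete, here is a minimal Python sketch of how Questions, Claims, and Evidence could be linked and queried. It is only an illustration under our own assumptions and is not the data model of Roam Research or of the discourse graph extension: the class names (<code>Kind</code>, <code>Node</code>, <code>DiscourseGraph</code>), the relation labels ("addresses", "supports"), and the placeholder node text are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """Node types from the discourse graph schema."""
    QUESTION = "Question"
    CLAIM = "Claim"
    EVIDENCE = "Evidence"


@dataclass(frozen=True)
class Node:
    kind: Kind
    text: str


@dataclass
class DiscourseGraph:
    # Each edge is (source, relation, target).
    # Relation labels are illustrative assumptions, not a fixed vocabulary.
    edges: list[tuple[Node, str, Node]] = field(default_factory=list)

    def link(self, source: Node, relation: str, target: Node) -> None:
        self.edges.append((source, relation, target))

    def sources(self, target: Node, relation: str) -> list[Node]:
        """All nodes pointing at `target` through `relation`."""
        return [s for s, r, t in self.edges if t == target and r == relation]

    def evidence_for_question(self, question: Node) -> list[Node]:
        """Evidence supporting any claim that addresses the question."""
        found = []
        for claim in self.sources(question, "addresses"):
            found.extend(self.sources(claim, "supports"))
        return found


# Placeholder content only; these are not real findings.
question = Node(Kind.QUESTION, "Placeholder research question")
claim = Node(Kind.CLAIM, "Placeholder claim")
from_paper = Node(Kind.EVIDENCE, "Placeholder result from the literature")
from_notebook = Node(Kind.EVIDENCE, "Placeholder result from a lab notebook")

graph = DiscourseGraph()
graph.link(claim, "addresses", question)
graph.link(from_paper, "supports", claim)
graph.link(from_notebook, "supports", claim)

for node in graph.evidence_for_question(question):
    print(node.kind.value, "->", node.text)
```

Storing edges as plain triples keeps the sketch modular: a results graph could reuse the same structure by adding new relation labels for ongoing work, without changing the node types.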


[[File:Purpose and users of cell biology discourse graphs.png|center|350x350px]]


With the current tooling, paired with some ease-of-use improvements and a 'captive audience' of initial users who will also be beneficiaries of the synthesis center, we think that discourse and results graphs in cell biology will allow for ''grassroots'' contributions from students, scientists, and community researchers to build overarching concepts, models, and theories in cell biology.


[[File:Progress to theories.png|center|800x800px]]
* Can we use NLP to help convert between a discourse graph and a draft narrative (e.g., a paper or proposal)? Is the structure of a manuscript sufficiently formulaic to pull this off (I think yes)?
* Can we use NLP to help extract all instances of a given parameter value from the literature?
* Would you want to be part of this synthesis center? As a tool builder, synthesizer, other?