Interdisciplinary Models: Difference between revisions

From Synthesis Infrastructures
Latest revision as of 16:25, 13 November 2022

Interdisciplinary Models
Description: How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?
Related Topics: Interoperability
Discord Channel: #interdisciplinary-models
Facilitator: Wayne Lutters
Members: Paul Itoi, Elianna DeSota, Leo Ware, Konrad Hinsen, James Howison, Wayne Lutters, Peter Murray-Rust

What

How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?

Concrete problem expressed by Peter Murray-Rust here: https://discord.com/channels/1029514961782849607/1040214388554084372/1040299259930611833 : "The idea of Hypothesis testing is common in some disciplines, unknown in others. For example chemical synthesis or materials science is "can we make X?" and many sciences are exploratory - what can we see with a new telescope, plants in Antarctica, etc. You have to design your project but I suspect Hypothesis doesn't come into it."


And to a certain extent, the issue of representing/discussing the discourse of computational research (e.g., model parameters), discussed by Konrad Hinsen here: https://discord.com/channels/1029514961782849607/1038988750677606432/1039576903838859326

This also connects with Peter Murray-Rust's work on Semantic Climate (semantifying the IPCC report).

It also connects to emerging discussions around interoperability and surfacing/managing/resolving disagreements in ontologies/terms/federation.

Initial discussion

Matthew, Peter, Wayne, James, Ellie, Leo

Projects discussed:

- Scraping literature in geosciences to spatially map out relevant variables and contributions across disciplines http://globe.umbc.edu/

- Materials Genome Initiative mentioned: infrastructure well-supported but still siloed

- OPTIMADE: common API format between existing materials databases

"grassroot tech assemblage can work at scale. "Shoddy" now works." Shoestring budgets driving open source innovation.

"need to flourish long enough to be seen by other disciplines" The idea is that each of these initiatives has a typical academic funding life of 3-7 years and is then sunset. Do they twinkle in the sky long enough to be seen by other disciplines? There are core sustainability issues not just for the tools / platforms but for the motivating ideas behind them.

"So: how do we work in a way that others can learn from in future" -- without being discouraged from starting new things, encourage high-risk, high-reward innovations.

Reaching a plateau of open data --- metrics on who is using it and what they are using it for

Similar challenges in enterprise: what data do we have within an org, and who is using it? https://data.world/ vs more public initiatives like https://coleridgeinitiative.org/

Insight around longevity -- is it the infrastructure that lives on? The vision? Or the relationships? Unique value of EC framework initiatives (e.g., Horizon 2020) that are as much political projects as they are scientific ones. Those connections between people and labs persist.

Discovering and forming communities of practice around datasets -- how does one person's use leave traces that others can discover? How do we align the challenges across time (i.e., my experience when I was grappling with a specific column in a dataset, aligned with someone doing just that a year later)?

Academic model of competition rubs against open science --- both in sunk time and possessiveness of data

Disciplines have different reductionist traditions: what is the well-defined focus of study? Is this the substrate that enables cross-disciplinary data engagement?

Where do people gather to have these conversations? What are the communities of practice and publication venues for sharing knowledge about working across disciplines? Where do these happen within disciplines, and where is the meta-science narrative developing? Historically, in e-science-like funding, especially at international laboratories.

Open notebook science https://en.wikipedia.org/wiki/Open-notebook_science -- show the world as you are doing it, make connections on the day of publication. Very well-defined strategy with templates. http://opensourcemalaria.org/

Bold vision of what is possible, e.g. automated recombination & discovery: https://materialsproject.org, https://materialsproject.github.io/fireworks/ - would also like to plug OPTIMADE here, which is unifying datasets between several endeavors in this field

Domain differences between contributing individual data points vs entire datasets

Can grassroots efforts emulate the giant centralisation found within industrial monoliths?

Analogy between web frameworks/OSS: emerging from many hands working towards similar problems

Gift economy of software applied to data? Frictionless data as an example https://frictionlessdata.io/
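As a rough illustration of what "gifting" a dataset in a reusable form might involve, the sketch below builds a minimal descriptor in the style of a Frictionless Data Package and flags columns that have no description. The dataset name, file path, columns and the two helper functions are hypothetical and chosen only for illustration; only the general descriptor layout (`name`, `licenses`, `resources`, `schema.fields`) follows the Data Package convention.

```python
import json


def make_descriptor(name, title, csv_path, columns):
    """Build a minimal Data Package-style descriptor for one CSV table.

    `columns` is a list of (name, type, description) tuples.
    """
    return {
        "name": name,
        "title": title,
        "licenses": [
            {
                "name": "CC-BY-4.0",
                "path": "https://creativecommons.org/licenses/by/4.0/",
            }
        ],
        "resources": [
            {
                "name": name + "-table",
                "path": csv_path,
                "schema": {
                    "fields": [
                        {"name": n, "type": t, "description": d}
                        for n, t, d in columns
                    ]
                },
            }
        ],
    }


def undocumented_columns(descriptor):
    """Return the names of columns whose description is empty or missing."""
    missing = []
    for resource in descriptor["resources"]:
        for field in resource["schema"]["fields"]:
            if not field.get("description"):
                missing.append(field["name"])
    return missing


# Hypothetical dataset: one documented column, one left undocumented.
descriptor = make_descriptor(
    "plant-survey",
    "Hypothetical plant survey",
    "data/survey.csv",
    [
        ("site_id", "string", "Identifier of the sampling site"),
        ("canopy_cover", "number", ""),
    ],
)

print(json.dumps(descriptor, indent=2))
print("Columns still needing a description:", undocumented_columns(descriptor))
```

A check like `undocumented_columns` is one small way that the person who first grappled with a column could leave a trace for someone arriving a year later.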

Collectivization as a model -- being able to push upstream to graphs at different scales

Incentivizing collectivization

Ellie + Leo thoughts during break -

Seems like there is a threefold problem:

- easy to share data/info

- easy to use data/info

- easy to cite data/info

Right now, none of this is free. It takes so much time to actually find all the weight of evidence and to connect all the data. Also, the financialization model, which attempts to make this 'incentivized' at a greater scale, seems relatively meh. Unless the returns are pretty big, I feel like I am much more likely to be lazy than to care about a few extra ETH. In addition, reputation networks feel like they would need to be more connected to the actual community in which you have established systems of practice.


Infrastructure -> Make an Overleaf that automatically brings in citations? How do we sync GPT into this?

What is an equivalent of GPT for data? When you want to look for specific DATA, you aren't initially concerned with the original questions that were asked of the data.


SUPER cool project we should talk about -> DeSci Labs.

Identifying key questions:

- Which solutions have worked in other domains?

- What are the differences between (ontological, socio-political, economical) domains that lend themselves to different solutions?

- Extending the concept of "discipline" to e.g., cataloguing human infrastructure (cities, roads etc), "Discipline as a search across a reasonably well defined search space". Possible with a small number of disciplines (e.g. medicinal plant chemistry needs three domains - all with good ontologies)

- Alignment of primitives --- example of plants expressed in different locales and the effect on local climate

- Aligning communities of practice with a wider goal?

- Innovation in seeing interdisciplinarity "downstack" (i.e., not in the front-line science/research, but in the tools that groups use). See that collaboration as also interdisciplinary.



Possible outcomes of this group

- Compendium of practices in different fields

--- Collection of venues: where are the discussions happening now at the discipline and meta level

--- Collection of case studies around primitives in different disciplines

- Collecting ideas from other attendees from disciplines within the workshop in a survey: how would you/your field do things differently were all these things in place? Different life cycles and capturing nascent knowledge


Key themes for reporting back:

- Longevity and sustainability

- Lowering initial costs

- Designing work such that it can contribute upstream to a "practice"

- Differences in solutions by scientific disciplines, mechanisms of production, budgets, motivations and governance

- Alignment of primitives within a discipline: do you contribute a data point or a dataset? Different approaches required

- Synergies with other groups: interfaces, graphs, social systems

- Didn't really mention papers once!

>>> DAY #2 RUNNING

1.) STRUCTURE & USE
- IPCC -> claims in Obsidian
- beyond info architecture
- holy scripture available for the masses
- find new connections
- index
- venues
- women for open climate
- NAS reports

- oil exploration & production
- well logs

2.) MOTIVATION
- interdisciplinary: doing primary science together
- infrastructure vs. primary science
- front line science vs. backstage
- FAIR, need joint infrastructure
- pushing upstream
- data accessibility
- desci / halogen
- community structure, customizable, constrained
- paper replication

- synthesis study <<< STILL CAN STUDY THIS
- Karen Baker: not just PIs, get right level on team
- edge researchers do the actual work core business doesn't care about
- funding on topics PIs don't care about
- MIT look-it, psychology

3.) INFRASTRUCTURE
- research external to academia
- crowdsourced science, what happened to that?
- Infrastructure
- research librarian / career path / dev
- infrastructure -- machine shop, glassblowers, supercomputer centers
- Pooled resources
- The SW people
- eSci UK - roles & SW sustainability institutes

"when infrastructure becomes a research project itself vs. a resource for community often drifts off"
- The GRID - Physics
- Libraries - pandisciplinarity
- Not aware of "embedded librarian" as a thing.
- Domain scientist expertise =/= metadata
- Platform emergence - core & tipping
- Gawer, A., & Cusumano, M. A. (2008). How companies become platform leaders. MIT Sloan Management Review, 49(2), 28-35.
- Too wide a view
- Need one person in each department who understands and can translate that discipline, and something networking them together

4.) MOTIVATION - FUTURE SELVES
- Motivation for FLOSS: reduce the maintenance cost now for anticipated future work
- What is it for open science?
- Mark up our grant proposals, future reuse

Semantic bibliography
This was the first half of that project - https://abstract-poetry.fly.dev/bibliography - it related various references to each other so you could see the global narrative. We never built out the ability to annotate each of the papers and how they related to the paper, or necessarily to add papers. (We have a more exploratory/searchy version of this at abstract-poetry.fly.dev/search, which can better document a search process.)

Tremendous value vision, sustainable overhead?
Finding the right structure


Matthew's answers to our draft questions:

- Research Data Alliance (RDA), Materials Research Data Alliance (MaRDA), workshops

- Taking interdisciplinary in a very narrow sense, an example of bridging gaps between existing materials databases is OPTIMADE. We spent 5 years devising a common API format between 15 or so existing crystal structure databases, primarily of competitors, and with varying uptake. It should now be possible for anyone to create a database and join the federation, e.g., a new project just tripled the number of structures available. Each database has a different slant even if the quanta of data shared are common. It is unclear whether this is socially sustainable. It hinged on developing a lightweight federation with "shoddy" infrastructure on top of a relatively API format.

- Specific example: as this data is machine-actionable, new software projects can now query all known materials, potentially significantly accelerating materials discovery. New databases can trivially join the federation, and previously "dead" data can be brought back to life by adding an API on top of it. Automated labs can exchange information.
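To make the federation idea concrete, here is a minimal sketch of how a client might send the same OPTIMADE-style query to several databases and merge what comes back. The provider names, base URLs and sample responses are hypothetical placeholders, and no network request is made; the `/v1/structures` endpoint, the `filter` and `page_limit` query parameters, and the `chemical_formula_reduced` attribute follow the OPTIMADE specification as we understand it.

```python
from urllib.parse import quote, urlencode


def structures_url(base_url, elements, page_limit=10):
    """Build a structures query URL asking for entries containing all `elements`."""
    quoted = ",".join(f'"{e}"' for e in elements)
    query = urlencode(
        {"filter": f"elements HAS ALL {quoted}", "page_limit": page_limit},
        quote_via=quote,  # percent-encode spaces as %20 rather than '+'
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def merge_responses(responses):
    """Flatten per-provider JSON responses into one list of labelled records."""
    merged = []
    for provider, payload in responses.items():
        for entry in payload.get("data", []):
            merged.append(
                {
                    "provider": provider,
                    "id": entry["id"],
                    "formula": entry["attributes"].get("chemical_formula_reduced"),
                }
            )
    return merged


# Hypothetical federation members; real base URLs would come from a provider list.
providers = {
    "db_a": "https://db-a.example.org/optimade",
    "db_b": "https://db-b.example.org/optimade/",
}
urls = {name: structures_url(base, ["Si", "O"]) for name, base in providers.items()}

# Hand-written stand-ins for the JSON each provider might return.
sample_responses = {
    "db_a": {"data": [{"id": "a-1", "type": "structures",
                       "attributes": {"chemical_formula_reduced": "O2Si"}}]},
    "db_b": {"data": [{"id": "b-7", "type": "structures",
                       "attributes": {"chemical_formula_reduced": "O2Si"}},
                      {"id": "b-9", "type": "structures", "attributes": {}}]},
}
records = merge_responses(sample_responses)

for name, url in urls.items():
    print(name, url)
for record in records:
    print(record)
```

Because every member exposes the same endpoint and response shape, adding a new database to this sketch is just one more entry in `providers`, which is the sense in which joining the federation can be lightweight.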