Synthesis Infrastructures - User contributions [en]

Interdisciplinary Models

2022-11-12T17:20:24Z

Petermr:

{{Group
|Decription=How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?
|Topics=Interoperability
|Discord Channel Name=#interdisciplinary-models
|Discord Channel URL=https://discord.com/channels/1029514961782849607/1040385502026682408
|Facilitator=Wayne Lutters
|Members=Elianna DeSota, James Howison, Konrad Hinsen, Leo Ware, Paul Itoi, Peter Murray-Rust, Wayne Lutters
}}
== What ==

How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?

Concrete problem expressed by [[Peter Murray-Rust]] here: https://discord.com/channels/1029514961782849607/1040214388554084372/1040299259930611833: "The idea of Hypothesis testing is common in some disciplines, unknown in others. For example chemical synthesis or materials science is 'can we make X?' and many sciences are exploratory - what can we see with a new telescope, plants in Antarctica, etc. You have to design your project but I suspect Hypothesis doesn't come into it."

And to a certain extent, the issue of representing/discussing the discourse of computational research (e.g., model parameters), discussed by [[Konrad Hinsen]] here: https://discord.com/channels/1029514961782849607/1038988750677606432/1039576903838859326

This also connects with [[Peter Murray-Rust]]'s work on [[Semantic Climate]] (semantifying the IPCC report).

* see discussion here: https://discord.com/channels/1029514961782849607/1040057721044598788/1040060670907002973
* and here: https://discord.com/channels/1029514961782849607/1033091746139230238/1040226423346040853

It also connects to emerging discussions around interoperability and Surfacing/managing/resolving disagreements in ontologies/terms/federation.

=== Initial discussion ===

Matthew, Peter, Wayne, James, Ellie, Leo

Projects discussed:

- Scraping literature in geosciences to spatially map out relevant variables and contributions across disciplines http://globe.umbc.edu/
- Materials Genome Initiative mentioned: infrastructure well-supported but still siloed
- OPTIMADE: common API format between existing materials databases
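
To make the value of a common API format concrete, here is a minimal sketch, assuming the OPTIMADE v1 URL layout (a `/v1/structures` endpoint with `filter` and `page_limit` query parameters). The provider base URLs are hypothetical placeholders, not real endpoints, and `build_query` is an illustrative helper name. The point is that one filter string can be reused unchanged across databases.

```python
from urllib.parse import urlencode, quote

# Hypothetical provider base URLs, used only for illustration.
PROVIDERS = {
    "provider_a": "https://optimade.provider-a.example",
    "provider_b": "https://optimade.provider-b.example",
}


def build_query(base_url, filter_expr, page_limit=5):
    """Build a structures query URL following the OPTIMADE v1 layout."""
    params = urlencode(
        {"filter": filter_expr, "page_limit": page_limit},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{base_url.rstrip('/')}/v1/structures?{params}"


# One filter expression, written once, sent to every provider.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'

urls = {name: build_query(base, FILTER) for name, base in PROVIDERS.items()}
for name, url in urls.items():
    print(name, url)
```

The sketch only constructs URLs and makes no network request; fetching and merging the responses is left out.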

"grassroot tech assemblage can work at scale. "Shoddy" now works." Shoestring budgets driving open source innovation.

"need to flourish long enough to be seen by other disciplines" Idea that each of these initiatives has a typical academic funding life of 3-7 years and is then sunset. Do they twinkle in the sky long enough to be seen by other disciplines? Core sustainability issues not just of the tools / platforms but of the motivating ideas beyond them.

"So: how do we work in a way that others can learn from in future" -- without being discouraged from starting new things, encourage high-risk, high-reward innovations.

Reaching a plateau of open data --- need metrics on who is using it and what they are using it for

Similar challenges in enterprise: what data do we have within an org, and who is using it? https://data.world/ vs more public initiatives like https://coleridgeinitiative.org/

Insight around longevity -- is it the infrastructure that lives on? The vision? Or the relationships? Unique value of EC framework initiatives (e.g., Horizon 2020) that are as much political projects as they are scientific ones. Those connections between people and labs persist.

Discovering and forming communities of practice around datasets -- how does one person's use leave traces that others can discover? How do we align the challenges across time (i.e., my experience when I was grappling with a specific column in a dataset, aligned with someone doing just that a year later)?

Academic model of competition rubs against open science --- both in sunk time and possessiveness of data

Disciplines have different reductionist traditions: what is the well-defined focus of study? Is this the substrate that enables cross-disciplinary data engagement?

Where do people gather to have these conversations? What are the communities of practice, publication venues to share knowledge about working across the disciplines? Where do these happen within disciplines and where is the meta-science narrative developing? Historically in e-science-like funding; especially international laboratories

Open notebook science https://en.wikipedia.org/wiki/Open-notebook_science -- show the world as you are doing it, make connections on the day of publication. Very well defined strategy with templates. http://opensourcemalaria.org/

Bold vision of what is possible, e.g. automated recombination & discovery: https://materialsproject.org, https://materialsproject.github.io/fireworks/ - would also like to plug OPTIMADE here, which is then unifying datasets between several endeavors in this field

Domain differences between contributing individual data points vs entire datasets

Can grassroots emulate giant centralisation within industrial monoliths?

Analogy between web frameworks/OSS: emerging from many hands working towards similar problems

Gift economy of software applied to data? Frictionless data as an example https://frictionlessdata.io/
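
As one picture of what lightweight, shareable data can look like, the sketch below builds a minimal Data Package style descriptor (the content of a `datapackage.json`, with `name`, `licenses`, `resources`, and a table `schema`). The dataset name, file path, and column names are invented for illustration; only the descriptor layout follows the Frictionless Data Package convention, and `field_names` is a hypothetical helper.

```python
import json

# Hypothetical dataset: name, path, and columns are illustrative only.
descriptor = {
    "name": "example-field-observations",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                "fields": [
                    {"name": "site_id", "type": "string"},
                    {"name": "observed_on", "type": "date"},
                    {"name": "temperature_c", "type": "number"},
                ]
            },
        }
    ],
}


def field_names(package, resource_name):
    """Return the column names declared for one resource in the descriptor."""
    for resource in package["resources"]:
        if resource["name"] == resource_name:
            return [field["name"] for field in resource["schema"]["fields"]]
    raise KeyError(resource_name)


# Serialise the descriptor as it would be written to datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
print(field_names(descriptor, "observations"))
```

A descriptor like this travels alongside the CSV file, so a later user can see the columns, types, and license without contacting the original author.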

Collectivization as a model -- being able to push upstream to graphs at different scales

Incentivizing collectivization

'''Ellie + Leo thoughts during break -'''

Seems like there is a three-fold problem:

- easy to share data/info

- easy to use data/info

- easy to cite data/info

Right now, none of this is free. It takes so much time to actually find all the weight of evidence, and to connect all the data. Also, the financialization model which attempts to make this 'incentivized' at a greater scale seems relatively meh. Unless the returns are pretty big, I feel like I am much more likely to be lazy than to care about a few extra ETH. In addition, reputation networks feel like they would need to be more connected to the actual community in which you have established systems of practice.

Infrastructure -> Make an overleaf that automatically brings in citations? How do we sync GPT into this?

What is an equivalent of GPT for data? Where you want to look for specific DATA - you aren't concerned initially with the original questions asked to the pieces of data.

SUPER cool project we should talk about -> DeSci Labs.

'''Identifying key questions:'''

- Which solutions have worked in other domains?

- What are the differences between (ontological, socio-political, economical) domains that lend themselves to different solutions?

- Extending the concept of "discipline" to e.g., cataloguing human infrastructure (cities, roads etc), "Discipline as a search across a reasonably well defined search space". Possible with a small number of disciplines (e.g. medicinal plant chemistry needs three domains - all with good ontologies)

- Alignment of primitives --- for example, plants expressed in different locales and the effect on local climate

- Aligning communities of practice with a wider goal?

'''Possible outcomes of this group'''

- Compendium of practices in different fields

--- Collection of venues: where are the discussions happening now at the discipline and meta level

--- Collection of case studies around primitives in different disciplines

- Collecting ideas from other attendees from disciplines within the workshop in a survey: how would you/your field do things differently were all these things in place? Different life cycles and capturing nascent knowledge

'''Key themes for reporting back:'''

- Longevity and sustainability

- Lowering initial costs

- Designing work such that it can contribute upstream to a "practice"

- Differences in solutions by scientific disciplines, mechanisms of production, budgets, motivations and governance

- Alignment of primitives within a discipline: do you contribute a data point or a dataset? Different approaches required

- Synergies with other groups: interfaces, graphs, social systems

- Didn't really mention papers once!

Peter Murray-Rust

2022-11-05T23:26:02Z

Petermr: Semantic tools and content for climate science and policy

{{Participant
|Timezone=Europe/London (GMT+00:00/GMT+01:00)
|Affiliation=University of Cambridge
|Projects=https://semanticclimate.github.io/p/en/
|Interests=ontologies textMining Wikidata
|Discord Handle=handle#1941
|ORCID=0000-0003-3386-3972
|Twitter=petermurrayrust
|Github=petermr
}}
{{Workshop Submission
|Interest=Have been running Open Free Collaborative systems for 25 years (BBK-PPS, BlueObelisk, ContentMine, and now #semanticClimate). This is a global Open community, centered in India, where we are making climate documents (such as the IPCC's recent 10,000 pages) semantic.
|Frame=Tool-builder, Researcher
|Materials=HTML versions of the IPCC report.
A suite of Python tools for searching, processing, cleaning and analysing,
Semantic climate dictionaries linked to Wikidata and hence into the Linked Open Data cloud
|Organizer Topics=Documents, Scientific Publishing
}}
Was invited to join (2022-11-04).
I enjoy sparking off Open community action in creating interoperable systems for science. Worked with W3C in the creation of XML, which provides a framework for modelling the world. Models are driven by discourse (i.e. what people write) rather than god-given classifications. Henry Rzepa and I created Chemical Markup Language (CML), which is capable of modelling much of published chemistry and interoperates with MathML, HTML and SVG. I have always believed in "Rough consensus and running code", so I have written code (Java) that supports the design. The problems are now not conceptual but sociopolitical (scientists are conservative and prefer PDF, which is not machine processable).

The advent of Wikidata (2012-) now means we can map much of our discourse into the Linked Open Data cloud.
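
A minimal sketch of the first step of such a mapping, assuming the public Wikidata `wbsearchentities` API action: it only builds the lookup URL for each term and does not perform the request. The helper name `wikidata_search_url` and the example terms are illustrative and are not taken from the semanticClimate tools.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_search_url(term, language="en", limit=5):
    """Build a wbsearchentities URL that looks up candidate items for a term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


# Illustrative dictionary terms; a real dictionary would come from the documents.
terms = ["greenhouse gas", "sea level rise"]
lookup_urls = {term: wikidata_search_url(term) for term in terms}
for term, url in lookup_urls.items():
    print(term, "->", url)
```

Each response would list candidate Wikidata items for the term; choosing the right candidate still needs human or contextual judgment.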

Because we model discourse, there is a major requirement for text-mining, so colleagues and I have developed a suite of tools. pygetpapers (Ayush Garg) is a rapid search and download tool for Open Access science, docanalysis applies NLP tools, and AMI is a framework for analysing and recombining scientific publications. We have recently concentrated on applying this to climate literature and making it semantic.

Peter Murray-Rust

2022-11-05T22:52:55Z

Petermr: Created page with "{{Participant |Timezone=Europe/London (GMT+00:00/GMT+01:00) |Affiliation=University of Cambridge |Projects=https://semanticclimate.github.io/p/en/ |Interests=ontologies textMining Wikidata |Discord Handle=handle#1941 }} {{Workshop Submission}}"