Interdisciplinary Models

Description	How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?
Related Topics	Interoperability
Discord Channel	#interdisciplinary-models
Facilitator	Wayne Lutters
Members	Paul Itoi, Elianna DeSota, Leo Ware, Konrad Hinsen, James Howison, Wayne Lutters, Peter Murray-Rust

What

How do we define minimal information models tuned for synthesis that can interoperate across various disciplines?

A concrete problem was expressed by Peter Murray-Rust here: https://discord.com/channels/1029514961782849607/1040214388554084372/1040299259930611833: "The idea of Hypothesis testing is common in some disciplines, unknown in others. For example chemical synthesis or materials science is "can we make X?" and many sciences are exploratory - what can we see with a new telescope, plants in Antarctica, etc. You have to design your project but I suspect Hypothesis doesn't come into it."

To a certain extent, this also covers the issue of representing and discussing the discourse of computational research (e.g., model parameters), raised by Konrad Hinsen here: https://discord.com/channels/1029514961782849607/1038988750677606432/1039576903838859326

This also connects with Peter Murray-Rust's work on Semantic Climate (semantifying the IPCC report).

See discussion here: https://discord.com/channels/1029514961782849607/1040057721044598788/1040060670907002973
and here: https://discord.com/channels/1029514961782849607/1033091746139230238/1040226423346040853

It also connects to emerging discussions around interoperability and around surfacing, managing, and resolving disagreements in ontologies, terms, and federation.

Initial discussion

Matthew, Peter, Wayne, James, Ellie, Leo

Projects discussed:

- Scraping literature in geosciences to spatially map out relevant variables and contributions across disciplines: http://globe.umbc.edu/
- Materials Genome Initiative mentioned: infrastructure well-supported but still siloed
- OPTIMADE: common API format between existing materials databases

"Grassroots tech assemblages can work at scale. 'Shoddy' now works." Shoestring budgets are driving open-source innovation.

"Need to flourish long enough to be seen by other disciplines." Each of these initiatives typically has an academic funding life of 3-7 years and is then sunset. Do they twinkle in the sky long enough to be seen by other disciplines? This raises core sustainability issues, not just for the tools and platforms but for the motivating ideas behind them.

"So: how do we work in a way that others can learn from in future?" The aim is to do this without discouraging people from starting new things, and to keep encouraging high-risk, high-reward innovation.

We are reaching a plateau of open data; we need metrics on who is using it and what they are using it for.

Similar challenges in enterprise: what data do we have within an org, and who is using it? https://data.world/ vs more public initiatives like https://coleridgeinitiative.org/

Insight around longevity: is it the infrastructure that lives on, the vision, or the relationships? EC framework initiatives (e.g., Horizon 2020) have unique value because they are as much political projects as scientific ones; the connections they create between people and labs persist.

Discovering and forming communities of practice around datasets: how does one person's use leave traces that others can discover? How do we align the challenges across time (i.e., connect my experience of grappling with a specific column in a dataset with someone doing just that a year later)?

The academic model of competition rubs against open science, both in sunk time and in possessiveness over data.

Disciplines have different reductionist traditions, i.e., different views of what the well-defined focus of study is. Is this the substrate that enables cross-disciplinary data engagement?

Where do people gather to have these conversations? What are the communities of practice and publication venues for sharing knowledge about working across disciplines? Where do these conversations happen within disciplines, and where is the meta-science narrative developing? Historically, they happened in e-science-like funding programmes, especially at international laboratories.

Open notebook science (https://en.wikipedia.org/wiki/Open-notebook_science): show the world what you are doing as you do it, and make connections on the day of publication. It is a very well-defined strategy with templates, e.g., http://opensourcemalaria.org/

Bold vision of what is possible, e.g., automated recombination and discovery: https://materialsproject.org, https://materialsproject.github.io/fireworks/. OPTIMADE also deserves a plug here, since it unifies datasets across several endeavors in this field.
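OPTIMADE's unifying role comes from a shared URL layout and filter grammar: the same query can be sent unchanged to any compliant database. Below is a minimal sketch that only builds the query URLs (no network calls), assuming the v1 `/structures` endpoint and the `elements HAS ALL` filter from the OPTIMADE specification; the provider base URLs and the helper name `structures_query_url` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical provider base URLs; real OPTIMADE providers publish their own.
PROVIDERS = [
    "https://provider-a.example/optimade",
    "https://provider-b.example/optimade/",
]


def structures_query_url(base_url, elements):
    """Build an OPTIMADE v1 /structures URL filtering on required elements."""
    # OPTIMADE filter grammar, e.g.: elements HAS ALL "Si","O"
    quoted = ",".join(f'"{el}"' for el in elements)
    query = urlencode({"filter": f"elements HAS ALL {quoted}"})
    # Strip any trailing slash so both base URL styles produce the same path
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# The same filter string is reused verbatim against every provider.
for provider in PROVIDERS:
    print(structures_query_url(provider, ["Si", "O"]))
```

The design point is that only the base URL varies per database; the query itself is part of the shared standard.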

Domains differ in whether they contribute individual data points or entire datasets.

Can grassroots efforts emulate the giant centralisation within industrial monoliths?

Analogy with web frameworks and OSS, which emerged from many hands working on similar problems.

Can the gift economy of software be applied to data? Frictionless Data is an example: https://frictionlessdata.io/
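To illustrate what a minimal, shareable data description can look like, here is a sketch that builds a Frictionless-style Data Package descriptor (the contents of a `datapackage.json`) using only the standard library. The dataset name, file path, column names, and the helper `make_data_package` are hypothetical; the descriptor keys (`name`, `resources`, `path`, `schema.fields`) follow the Data Package and Table Schema specifications as we understand them, and a real package should be validated with the Frictionless tooling.

```python
import json
from pathlib import Path


def make_data_package(name, csv_path, fields):
    """Return a minimal Data Package descriptor for a single CSV resource."""
    return {
        "name": name,
        "resources": [
            {
                "name": Path(csv_path).stem,  # resource name derived from the file name
                "path": csv_path,
                "format": "csv",
                # Table Schema: each column is described by a name and a type
                "schema": {"fields": [{"name": n, "type": t} for n, t in fields]},
            }
        ],
    }


# Hypothetical dataset, for illustration only
descriptor = make_data_package(
    "antarctic-plant-survey",
    "data/plant_sightings.csv",
    [("site_id", "string"), ("observed_on", "date"),
     ("latitude", "number"), ("count", "integer")],
)

# This JSON is what would be saved as datapackage.json next to the data
print(json.dumps(descriptor, indent=2))
```

The appeal for a "gift economy" is that the descriptor is small and plain JSON, so publishing it costs the contributor little while making the data self-describing for the next user.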

Collectivization as a model -- being able to push upstream to graphs at different scales

Incentivizing collectivization

Ellie and Leo's thoughts during the break:

It seems like there is a threefold problem. It should be:

- easy to share data/info

- easy to use data/info

- easy to cite data/info

Right now, none of this is free. It takes so much time to actually find all the weight of evidence and to connect all the data. Also, the financialization model, which attempts to make this 'incentivized' at a greater scale, seems underwhelming. Unless the returns are pretty big, I feel like I am much more likely to be lazy than to care about a few extra ETH. In addition, reputation networks feel like they would need to be more connected to your actual community, where you have established systems of practice.

Infrastructure idea: an Overleaf that automatically brings in citations? How do we integrate GPT into this?

What would a GPT equivalent for data be? You want to look for specific DATA, without initially being concerned with the original questions the data was collected to answer.

SUPER cool project we should talk about -> DeSci Labs.

Identifying key questions:

- Which solutions have worked in other domains?

- What are the differences between (ontological, socio-political, economical) domains that lend themselves to different solutions?

- Extending the concept of "discipline" to, e.g., cataloguing human infrastructure (cities, roads, etc.): "discipline as a search across a reasonably well-defined search space". This is feasible with a small number of disciplines (e.g., medicinal plant chemistry needs three domains, all with good ontologies).

- Alignment of primitives: e.g., plants expressed in different locales and their effect on local climate

- Aligning communities of practice with a wider goal?

Possible outcomes of this group

- Compendium of practices in different fields

--- Collection of venues: where are the discussions happening now, at the discipline and meta levels?

--- Collection of case studies around primitives in different disciplines

- Surveying other workshop attendees about their disciplines: how would you or your field do things differently if all these things were in place? This would capture different life cycles and nascent knowledge.

Key themes for reporting back:

- Longevity and sustainability

- Lowering initial costs

- Designing work such that it can contribute upstream to a "practice"

- Differences in solutions by scientific disciplines, mechanisms of production, budgets, motivations and governance

- Alignment of primitives within a discipline: do you contribute a data point or a dataset? Different approaches required

- Synergies with other groups: interfaces, graphs, social systems

- Didn't really mention papers once!

>>> DAY #2 RUNNING

1.) STRUCTURE & USE IPCC -> claims Obsidian beyond info architecture holy scripture find new connections index venues women for open climate NAS reports

oil exploration & production well logs

2.) MOTIVATION interdisciplinary doing primary science together infrastructure vs. primary science front line science vs. backstage FAIR, need joint infrastructure pushing upstream data accessibility DeSci / halogen community structure, customizable, constrained paper replication

synthesis study <<< STILL CAN STUDY THIS Karen Baker not just PIs, get right level on team edge researchers do the actual work core business doesn't care about funding on topics PIs don't care about MIT Lookit, psychology

3.) INFRASTRUCTURE Research external to academia: crowdsourced science, what happened to that? Infrastructure research librarian / career path / dev infrastructure -- machine shop, glassblowers, supercomputer centers. Pooled resources. The SW people: eSci UK - roles, $, SW sustainability institutes

"When infrastructure becomes a research project itself vs. a resource for the community, it often drifts off." The Grid - physics libraries - pandisciplinarity. Not aware of "embedded librarian" as a thing. Domain scientist expertise =/= metadata. Platform emergence - coring & tipping: Gawer, A., & Cusumano, M. A. (2008). How companies become platform leaders. MIT Sloan Management Review, 49(2), 28-35. Too wide a view. Need one person in each department who understands and can translate that discipline, and something networking them together.

4.) MOTIVATION - FUTURE SELVES Motivation for FLOSS: reduce the maintenance cost now for anticipated future work. What is it for open science? Mark up our grant proposals for future reuse. Semantic bibliography: this was the first half of that project - https://abstract-poetry.fly.dev/bibliography. It related various references to each other so you could see the global narrative. We never built out the ability to annotate each of the papers and how they related to the paper, or necessarily to add papers. (We have a more exploratory/searchy version of this at abstract-poetry.fly.dev/search, which can better document a search process.)

Tremendous value vision, sustainable overhead? Finding the right structure Maintaining semantic bibliography for a grant is a great form of legitimate peripheral participation We plan to try this out for preprints in climate. Sensemaking structure to 15,000 refs
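The semantic bibliography idea (relating references to each other and annotating how each relates to a paper) maps naturally onto a small typed graph. The sketch below is hypothetical: it is not how abstract-poetry.fly.dev is implemented, and the class name, the reference keys, and the relation vocabulary (`supports`, `extends`, `contradicts`, `uses_data_from`) are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

# Assumed relation vocabulary; a real project would agree on its own terms.
RELATIONS = {"supports", "extends", "contradicts", "uses_data_from"}


class SemanticBibliography:
    """References as nodes; typed, annotated relations as directed edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # source ref -> [(relation, target ref, note)]

    def relate(self, source, relation, target, note=""):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[source].append((relation, target, note))

    def neighbours(self, ref):
        """References directly linked to `ref`, in either direction."""
        outgoing = {t for _, t, _ in self.edges.get(ref, [])}
        incoming = {s for s, links in self.edges.items() for _, t, _ in links if t == ref}
        return outgoing | incoming

    def narrative(self, start):
        """Breadth-first walk from `start`: every reference connected to it."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbours(queue.popleft()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


# Hypothetical reference keys, for illustration only
bib = SemanticBibliography()
bib.relate("smith2020", "extends", "lee2018", note="same model, new region")
bib.relate("ng2021", "contradicts", "smith2020")
bib.relate("ortiz2019", "uses_data_from", "lee2018")
print(sorted(bib.narrative("ng2021")))  # → ['lee2018', 'ortiz2019', 'smith2020']
```

Keeping the per-edge `note` is what would allow the annotation step described above ("how they related to the paper") to be added incrementally, e.g. as legitimate peripheral participation on a grant.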

DAY 2

question - how can we use AI processing to discover non-explicit connections?

- claim - the simplest job to get here is to create an index.
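As a first step before any AI processing, the "create an index" claim can be illustrated with a plain inverted index: map each term to the documents containing it, then flag document pairs that share several terms as candidate connections. This is a deliberately simple keyword technique swapped in for illustration, not the AI approach under discussion; the stopword list, the `min_shared` threshold, the function names, and the example snippets are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

STOPWORDS = {"a", "and", "for", "from", "in", "of", "on", "the", "to"}  # assumed, minimal


def build_index(docs):
    """Inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in STOPWORDS:
                index[term].add(doc_id)
    return index


def candidate_links(index, min_shared=2):
    """Document pairs sharing at least `min_shared` indexed terms."""
    shared = defaultdict(set)
    for term, doc_ids in index.items():
        for pair in combinations(sorted(doc_ids), 2):
            shared[pair].add(term)
    return {pair: terms for pair, terms in shared.items() if len(terms) >= min_shared}


# Hypothetical snippets standing in for claims from different disciplines
docs = {
    "ipcc_claim": "Permafrost thaw increases methane release",
    "geo_survey": "Methane release measured at permafrost sites in Siberia",
    "chem_note": "Catalyst yields for ammonia synthesis",
}
links = candidate_links(build_index(docs))
print(links)
```

Here only the IPCC-style claim and the field survey are linked (via "methane", "permafrost", and "release"); a richer approach would replace the keyword overlap with ontology terms or learned similarity while keeping the same index-then-link structure.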

question - what are the venues where we can use open/interdisciplinary science?

- evidence - women for open climate already exists and would support this

claim - we conceptualize inter-disciplinarity as multiple groups doing their primary research together.

- evidence - (counter to the claim) Django as an example. This is collaboration between website builders making the infrastructure. Or scikit-learn, etc. None of these are collaborations on their primary activity.

- question - how might we ensure that these groups can collaborate on projects that aren't their direct work?

claim - there is a disconnect between the team level incentives and the global level need for collaborative infrastructure

- evidence - synthesis centers couldn't resolve this algorithmically, and so simply needed to create a space where everyone came in to work out the connections.

Centralized pool for resources - universities used to have machine shops or computing centers, but they cut them back because they are too costly.

- Documenting these stories.

claim - the teaching mission of the university is missing; I think universities should engage in teaching while setting up these resources.

- evidence - supercomputing centers used to do this, but then it got lost.

Research Software Engineering groups -

- They get attached to grants and become the software engineering groups.

- evidence - the e-Science programme in the UK moved into the Software Sustainability Institute.

- - There was an atmosphere that this was a super interesting research topic rather than a dedicated national service.

- evidence - the Grid came out of particle physics. Everyone needs to buy into it. It has largely been overtaken by major software companies.

claim - libraries have attempted to make sure that you can manage all the different data, but scientists need to know in depth what the data actually is, and a library can't provide this service. You NEED domain specialists. Similarly, there can't just be a 'random' person focused on integrating data; they won't be able to give you this information.

- evidence - how platforms emerge. Coring + tipping. The breadth of view is so large that they aren't going to spot the core of the data.

- evidence - Cambridge, at the institutional level, can enforce data sharing. BUT sharing data isn't necessarily what makes it useful.

- Claim - we're not making the most of the networks that we have right now, it feels like you need a single person who has the knowledge to share and integrate data to the 'institutional' repository and then the institutional repository can be integrated at the more global level.

- is there not enough interest for this to happen?

- claim - main motivation for open source is to reduce the maintenance cost of things we build on in the future.

- evidence - when you write a grant application, you write a lit review, but when you later need an updated lit review, you have to bring forward one that is 1.5 years out of date and that you haven't revisited since.

OPEN QUESTIONS -

- How do we identify the Goldilocks level: infrastructure wide enough for everyone to be on it, and narrow enough for specific groups to customize it to their needs?

- evidence - data analysis and genome searching are areas where people from many disciplines converge on shared tech, and that shared tech brings people together.