James Howison


James Howison
Timezone	America/Chicago (GMT−06:00/GMT−05:00)
Relevant Projects	SoftCite, CiteAs, GROBID
Topic Interests	Machine Learning, Bibliometrics, Credit Assignment, Research Software, Citation
Group(s)	Table 4, Interdisciplinary Models
Table Assignment	Table 4

Discord

sneakers-the-rat #discourse-modeling 22-11-12 19:32:48

Page Schemas#Creating a new Schema

Page Schemas is mostly a handy way to generate boilerplate templates and link them to semantic properties. A Form (using Page Forms) is an interface for filling in values for a template.

For an example of how this shakes out, see Category:Participant, Template:Participant, and Form:Participant.

Go to a `Category:CategoryName` page, creating it if it doesn't already exist.
Click "Create schema" in the top right.
If you want a form, check the "Form" box. It is possible to make a schema without a form. The schema just defines what pages will be generated, and the generated pages can be further edited afterwards (note that this might make them inconsistent with the schema).
Click "add template". If you are only planning on having one template per category, name the template the same thing as the category.
Add fields! Each field can have a corresponding form input (with a type, e.g. a textbox, token input, date selector, etc.) and a semantic property.
Once you're finished, save the schema.
Click "Generate pages" on the category page. Typically you want to uncheck any pages that are already bluelinks so you don't overwrite them. You might have to do the "Generate pages" step a few times, and it can take a few minutes, because it's pretty buggy.
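
To make the relationship between a schema, its template, its fields, and the generated pages more concrete, here is a rough conceptual sketch in Python. It is not the Page Schemas extension's API or storage format; every class, function, and field name below is hypothetical and chosen only for illustration, and the set of generated page types is an assumption based on the steps above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SchemaField:
    """One field of a template (hypothetical model, not the extension's own)."""
    name: str
    input_type: Optional[str] = None         # e.g. "text", "tokens", "date"
    semantic_property: Optional[str] = None  # property the field is linked to


@dataclass
class SchemaTemplate:
    name: str
    fields: List[SchemaField] = field(default_factory=list)


@dataclass
class Schema:
    category: str
    templates: List[SchemaTemplate]
    has_form: bool = True  # the "Form" checkbox; a schema can exist without one

    def pages_to_generate(self) -> List[str]:
        """List the page titles this schema would ask the wiki to create."""
        pages = [f"Template:{t.name}" for t in self.templates]
        if self.has_form:
            pages.append(f"Form:{self.category}")
        for t in self.templates:
            for f in t.fields:
                if f.semantic_property:
                    pages.append(f"Property:{f.semantic_property}")
        return pages


def select_new_pages(pages: List[str], existing: Set[str]) -> List[str]:
    """Mimic unchecking bluelinks: keep only pages that do not exist yet."""
    return [p for p in pages if p not in existing]


# Illustrative schema: one template named after its category.
schema = Schema(
    category="Participant",
    templates=[
        SchemaTemplate(
            name="Participant",
            fields=[
                SchemaField("Timezone", "text", "Timezone"),
                SchemaField("Topic Interests", "tokens", "Interested In"),
            ],
        )
    ],
)

pages = schema.pages_to_generate()
to_create = select_new_pages(pages, existing={"Template:Participant"})
print(to_create)
```

The point of the sketch is only that the schema is a description of pages to generate; once generated, the pages are ordinary wiki pages and can drift from the schema if edited by hand.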

Workshop Submission

What's your interest in this workshop?

With what "frame" do you approach the workshop? (or identity)?

Tool-builder

What materials can you contribute to the workshop for consideration?

We have created and published:

A gold-standard human annotation dataset, called SoftCite, consisting of just under 5,000 annotated academic PDFs (in biomedicine and economics): https://github.com/howisonlab/softcite-dataset/
A machine learning pipeline for software entity extraction, which processes scholarly PDFs directly and recognizes software mentions at good levels of performance on data with a real-world distribution (F-scores ~0.80): https://github.com/softcite/software_mentions_client
New methods for machine learning on unbalanced datasets (Lopez et al., 2021).
A dataset of extractions from the CORD-19 paper collection (Du et al., 2021), which we have used to manually sample and describe current progress on software citation: https://zenodo.org/record/5140437 and https://peerj.com/articles/cs-1022/

A prototype knowledge base which links software mentions and bibliometric data, allowing location of publications mentioning specific packages. https://github.com/softcite/softcite_kb

I have also been part of a group studying the connection between "Theory" and "Code": https://arxiv.org/abs/1910.09902

Organizer-estimated Topics

Machine Learning, Credit Assignment, Bibliometrics