James Howison | |
---|---|
Timezone | America/Chicago (GMT−06:00/GMT−05:00) |
Relevant Projects | SoftCite, CiteAs, GROBID |
Topic Interests | Machine Learning, Bibliometrics, Credit Assignment, Research Software, Citation |
Group(s) | Table 4, Interdisciplinary Models |
Table Assignment | Table 4 |
Discord
Page Schemas#Creating a new Schema
Page Schemas is mostly a handy way to generate boilerplate templates and link them to semantic properties. A form (built with Page Forms) is an interface for filling in the values of a template.
For an example of how this shakes out, see Category:Participant, Template:Participant, and Form:Participant. To create a new schema:
- Go to a `Category:CategoryName` page, creating it if it doesn't already exist.
- Click "Create schema" in the top right.
- If you want a form, check the "Form" box. It is possible to make a schema without a form. The schema just defines which pages will be generated; the generated pages can be edited further afterwards (note that this may make them inconsistent with the schema).
- Click "add template". If you only plan to have one template per category, give the template the same name as the category.
- Add fields! Each field can have a corresponding form input (with an input type, e.g. a textbox, token input, or date selector) and a semantic property.
- Once you're finished, save the schema.
- Click "Generate pages" on the category page. Typically you want to uncheck any pages that are already bluelinks so you don't overwrite them. Generation can take a few minutes, and because the extension is fairly buggy, you may have to repeat the "Generate pages" step a few times.
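Before clicking "Generate pages", one way to see which of the schema's pages already exist (the bluelinks to uncheck) is to ask the wiki's MediaWiki Action API. The sketch below is an illustration, not part of Page Schemas itself: it assumes the wiki exposes the standard `api.php` endpoint (the URL used here is a placeholder), and the helper names `build_exists_query` and `existing_titles` are hypothetical. It only builds the query URL and parses a JSON reply, so it runs without network access.

```python
from urllib.parse import urlencode


def build_exists_query(api_url: str, titles: list[str]) -> str:
    """Build a MediaWiki Action API URL asking whether each title exists.

    `api_url` is a placeholder, e.g. "https://wiki.example.org/api.php".
    """
    params = {
        "action": "query",
        "titles": "|".join(titles),  # the API accepts pipe-separated titles
        "format": "json",
        "formatversion": "2",  # pages come back as a list; absent ones carry "missing": true
    }
    return f"{api_url}?{urlencode(params)}"


def existing_titles(response: dict) -> set[str]:
    """Return the titles in an action=query reply that already exist (bluelinks)."""
    pages = response.get("query", {}).get("pages", [])
    return {page["title"] for page in pages if not page.get("missing", False)}


if __name__ == "__main__":
    # Hypothetical reply for a schema on Category:Participant:
    # the template already exists, the form does not yet.
    sample = {
        "query": {
            "pages": [
                {"pageid": 42, "ns": 10, "title": "Template:Participant"},
                {"ns": 106, "title": "Form:Participant", "missing": True},
            ]
        }
    }
    print(existing_titles(sample))  # {'Template:Participant'}
```

Titles returned in the set are the ones to uncheck in the "Generate pages" dialog. Actually fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch stays free of network dependencies.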
Workshop Submission
What's your interest in this workshop?
With what "frame" (or identity) do you approach the workshop?
Tool-builder
What materials can you contribute to the workshop for consideration?
We have created and published:
- SoftCite, a gold-standard human-annotation dataset of just under 5,000 annotated academic PDFs (in biomedicine and economics): https://github.com/howisonlab/softcite-dataset/
- A machine-learning pipeline for software entity extraction, which processes scholarly PDFs directly and recognizes software mentions at good levels of performance on the real-world distribution (F-scores ~0.80): https://github.com/softcite/software_mentions_client
- New methodological approaches to machine learning on unbalanced datasets (Lopez et al., 2021)
- A dataset of extractions from the CORD-19 paper collection (Du et al., 2021), which we have used to manually sample and describe current progress on software citation: https://zenodo.org/record/5140437 and https://peerj.com/articles/cs-1022/
- A prototype knowledge base that links software mentions and bibliometric data, making it possible to locate publications that mention specific packages: https://github.com/softcite/softcite_kb
I have also been part of a group studying the connection between "Theory" and "Code": https://arxiv.org/abs/1910.09902
Organizer-estimated Topics
Machine Learning, Credit Assignment, Bibliometrics