James Howison: Difference between revisions

From Synthesis Infrastructures
(Undo revision 466 by Jonny (talk))
No edit summary
 
Line 1:
  {{Participant
  |Timezone=America/Chicago (GMT−06:00/GMT−05:00)
− |Projects=SoftCite
+ |Projects=SoftCite, CiteAs, GROBID
+ |Interests=Research Software, Citation, Credit Assignment, Bibliometrics
  |Table Assignment=Table 4
  }}

Latest revision as of 14:35, 13 November 2022



James Howison
Timezone: America/Chicago (GMT−06:00/GMT−05:00)
Relevant Projects: SoftCite, CiteAs, GROBID
Topic Interests: Machine Learning, Bibliometrics, Credit Assignment, Research Software, Citation
Group(s): Table 4, Interdisciplinary Models
Table Assignment: Table 4






Discord

sneakers-the-rat, #discourse-modeling, 22-11-12 19:32:48

Page Schemas#Creating a new Schema

Page Schemas is mostly a handy way to generate boilerplate templates and link them to semantic properties. A Form (using Page Forms) is an interface for filling in values for a template.

For an example of how this shakes out, see Category:Participant, Template:Participant, and Form:Participant.

  • Go to a `Category:CategoryName` page, creating it if it doesn't already exist.
  • Click "Create schema" in the top right.
  • If you want a form, check the "Form" box. It is possible to make a schema without a form. The schema just defines what pages will be generated, and the generated pages can be further edited afterwards (note that this might make them inconsistent with the schema).
  • Click "add template". If you are only planning on having one template per category, give the template the same name as the category.
  • Add fields! Each field can have a corresponding form input (with a type, e.g. a textbox, token input, date selector, etc.) and a semantic property.
  • Once you're finished, save the schema.
  • Click "Generate pages" on the category page. Typically you want to uncheck any pages that are already bluelinks so you don't overwrite them. You might have to do the "generate pages" step a few times, and it can take a few minutes, because it's pretty buggy.
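
As a rough illustration of where these steps lead: once the template exists, a page in the category typically carries a call to it with one parameter per field. The call below is the Participant template call from this page's own wikitext, so the field names (Timezone, Projects, Interests, Table Assignment) are specific to that schema; a different schema would have different fields.

```wikitext
{{Participant
|Timezone=America/Chicago (GMT−06:00/GMT−05:00)
|Projects=SoftCite, CiteAs, GROBID
|Interests=Research Software, Citation, Credit Assignment, Bibliometrics
|Table Assignment=Table 4
}}
```

Filling in the form (if one was generated) should produce a call of this shape, and it can also be edited by hand afterwards.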


Workshop Submission

What's your interest in this workshop?

With what "frame" (or identity) do you approach the workshop?

Tool-builder

What materials can you contribute to the workshop for consideration?

We have created and published:

  • A gold-standard human annotation dataset, called SoftCite, consisting of just under 5,000 annotated academic PDFs (in biomedicine and economics): https://github.com/howisonlab/softcite-dataset/
  • A machine learning model for a software entity extraction pipeline, which processes scholarly PDFs directly and recognizes software mentions at good levels of performance on real-distribution data (F-scores ~0.80): https://github.com/softcite/software_mentions_client
  • New methodological approaches for machine learning on unbalanced datasets (Lopez et al., 2021)
  • A dataset of extractions from the CORD-19 paper collection (Du et al., 2021), which we have used to manually sample and describe current progress on software citation: https://zenodo.org/record/5140437 and https://peerj.com/articles/cs-1022/
  • A prototype knowledge base which links software mentions and bibliometric data, allowing location of publications mentioning specific packages: https://github.com/softcite/softcite_kb

I have also been part of a group studying the connection between "Theory" and "Code": https://arxiv.org/abs/1910.09902

Organizer-estimated Topics

Machine Learning, Credit Assignment, Bibliometrics