Welcome to the THYME project

The overarching long-term vision of our research is to create novel technologies for processing clinical free text. Such technologies will enable sophisticated and efficient indexing, retrieval, and data mining over the ever-increasing amounts of electronic clinical data. Processing free text poses a number of challenges on which the fields of artificial intelligence, natural language processing, and computer science in general have made advances. Methods for processing free text are informed by linguistic theory combined with the power of statistical inference. A key component of the next step, natural language understanding, is discovering events and their relations on a timeline. Temporal relations are of prime importance in biomedicine as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.

The best methods have been, or will be, released as part of cTAKES (ctakes.apache.org) for the larger community to use and contribute to. We will test the methods against biomedical queries.

Funding

Phase 1 of the project (2010-2014) was supported in part by the i2b2 project (U54LM008748 from the National Library of Medicine) and THYME R01LM010090 from the National Library of Medicine.

Phase 2 (2015-2018) is supported by THYME R01LM010090 from the National Library of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.

Drs. Pustejovsky, Palmer and Savova are members of the Program Committee of the 2012 i2b2 shared task whose topic is temporal relations in the clinical domain. The THYME annotation guidelines are the basis of the annotation guidelines for that shared task.

THYME investigators participated in the State of the Art of Clinical NLP workshop organized by the NLM in April 2012. Dr. Savova chaired a session, and Prof. Pustejovsky was an invited speaker presenting on temporal relations/TimeML.

THYME investigators participated and presented in the AMIA Fall Symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

Getting access to the THYME corpus and gold standard annotations

The THYME corpus with the gold standard annotations is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. The steps for obtaining a DUA are outlined below. After the DUA has been completed, the THYME corpus is available via a secure download mechanism. Distribution of the corpus is supported by grant LM010090 from the NIH; include the funding acknowledgment in your publications.

The corpus is released to an established or junior NLP investigator, formally associated with an institution; thus it is not released to a student. However, all students working with the investigator can have full access to the corpus under the DUA of the investigator. The investigator is urged to have the students work on the corpus on workstations that stay within the laboratory space of the investigator.

The steps for obtaining a DUA are:

Submit the THYME corpus request form, informing us about your institution, your principal investigator, and your intended use of the data.

A THYME investigator will send your principal investigator a DUA for you to complete and have signed by your site's official signatory. The THYME investigator will provide instructions for returning the signed and completed DUA.

When you return the DUA, a THYME investigator will arrange to talk with your principal investigator. Of note, the discussion must be with the lab's principal investigator, not a student/postdoc/administrator. Topics that will be addressed include allowable uses of the data and proper security measures.

Once the DUA is complete and a THYME investigator has confirmed your understanding of the DUA, you will be sent instructions for obtaining the corpus via a secure downloading mechanism.

Annotation guidelines

i2b2 Simplified THYME Guidelines (PDF): These are the guidelines provided to the organizers of the 2012 i2b2 temporal relations challenge for consideration during planning. They reflect an earlier stage of our guidelines.

Tool for viewing the gold standard annotations - Anafora

Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Chapter of the Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004.

We are also developing a visualization tool (THYME viz tool), which will be made available in cTAKES. A prototype and details of the THYME visualization tool were presented by Sean Finan at several annual workshops.