17
Natural Text: MEDLINE Journal Abstracts
BACKGROUND: The most challenging aspect of revision hip surgery is the management of bone loss. A reliable and valid measure of bone loss is important since it will aid in future studies of hip revisions and in preoperative planning. We developed a measure of femoral and acetabular bone loss associated with failed total hip arthroplasty. The purpose of the present study was to measure the reliability and the intraoperative validity of this measure and to determine how it may be useful in preoperative planning.
METHODS: From July 1997 to December 1998, forty-five consecutive patients with a failed hip prosthesis in need of revision surgery were prospectively followed. Three general orthopaedic surgeons were taught the radiographic classification system, and two of them classified standardized preoperative anteroposterior and lateral hip radiographs with use of the system. Interobserver testing was carried out in a blinded fashion. These results were then compared with the intraoperative findings of the third surgeon, who was blinded to the preoperative ratings. Kappa statistics (unweighted and weighted) were used to assess correlation. Interobserver reliability was assessed by examining the agreement between the two preoperative raters. Prognostic validity was assessed by examining the agreement between the assessment by either Rater 1 or Rater 2 and the intraoperative assessment (reference standard).
RESULTS: With regard to the assessments of both the femur and the acetabulum, there was significant agreement (p […] 0.75. There was also significant agreement (p […] 0.75.
CONCLUSIONS: With use of the newly developed classification system, preoperative radiographs are reliable and valid for assessment of the severity of bone loss that will be found intraoperatively.
Extract number of subjects, type of study, conditions, etc.

26
How can we pose this as a classification (or learning) problem? [Figure: classifier]

27
Landscape of ML Techniques for IE
Any of these models can be used to capture words, formatting, or both. Each is illustrated on the sentence "Abraham Lincoln was born in Kentucky."
–Classify Candidates: Classifier: which class?
–Sliding Window: Classifier: which class? Try alternate window sizes.
–Boundary Models: Classifier: which class? (BEGIN or END at each boundary)
–Finite State Machines: Most likely state sequence?
–Wrapper Induction: Learn and apply a pattern for a website (e.g., PersonName).
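As a rough illustration of the sliding-window idea on the slide's example sentence, the sketch below enumerates every window of a few alternate sizes and passes each one to a classifier. The function names and the capitalization test are hypothetical stand-ins, not part of the original slides; a real system would use a trained classifier over word and formatting features.

```python
def sliding_windows(tokens, min_len=1, max_len=3):
    """Yield (start, end, window) for every window of the allowed sizes."""
    for size in range(min_len, max_len + 1):
        for start in range(len(tokens) - size + 1):
            end = start + size
            yield start, end, tokens[start:end]


def toy_is_name(window):
    """Hypothetical stand-in for a trained classifier:
    accept a window only if every token is capitalized."""
    return all(tok[0].isupper() for tok in window)


tokens = "Abraham Lincoln was born in Kentucky".split()
all_windows = list(sliding_windows(tokens))
accepted = [(s, e, w) for s, e, w in all_windows if toy_is_name(w)]

for start, end, window in accepted:
    print(start, end, " ".join(window))
```

Note that the toy classifier accepts "Abraham", "Lincoln", and "Abraham Lincoln" at once, so overlapping windows can all pass and something must choose among them.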

41
Learning: IE as Classification
The set of training examples is all of the boundaries in a document.
The goal is to approximate two extraction functions, Begin and End:
Begin(i) = 1 if i begins a field, 0 otherwise
Example text: "Date : Thursday, October 25 Time : 4 : 15 - 5 : 30 PM"
The boundaries marked Begin and End are POSITIVE (1); all others are NEGATIVE (0).
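One way to picture the training set is to label every boundary between tokens. The sketch below is a minimal illustration under two assumptions that are not in the slides: boundary i sits just before token i (so n tokens give n + 1 boundaries), and the field chosen for the example ("4 : 15") is picked for illustration only.

```python
def boundary_labels(num_tokens, fields):
    """Return (begin, end) label lists over the num_tokens + 1 boundaries.

    fields is a list of (start, stop) token spans, covering tokens[start:stop].
    A boundary gets label 1 if it begins (or ends) a field, 0 otherwise.
    """
    begin = [0] * (num_tokens + 1)
    end = [0] * (num_tokens + 1)
    for start, stop in fields:
        begin[start] = 1
        end[stop] = 1
    return begin, end


tokens = ["Time", ":", "4", ":", "15", "-", "5", ":", "30", "PM"]
fields = [(2, 5)]  # assumed field for illustration: tokens[2:5] == "4 : 15"
begin, end = boundary_labels(len(tokens), fields)

print(begin)
print(end)
```

Most boundaries are negative examples, so the two classifiers are trained on heavily imbalanced data.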

44
BWI: Learning to detect boundaries
Another formulation: learn three probabilistic classifiers:
–Begin(i) = Prob(position i starts a field)
–End(j) = Prob(position j ends a field)
–Len(k) = Prob(an extracted field has length k)
Then score a possible extraction (i, j) by Begin(i) * End(j) * Len(j-i).
Len(k) is estimated from a histogram.
Begin(i) and End(j) are learned by boosting over simple boundary patterns and features.
[Freitag & Kushmerick, AAAI 2000]
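The scoring rule can be sketched in a few lines. This is only an illustration of the product Begin(i) * End(j) * Len(j-i): the boundary probabilities below are made-up numbers rather than the output of boosting, and the function names are hypothetical.

```python
from collections import Counter


def length_model(training_lengths):
    """Estimate Len(k) from a histogram of field lengths seen in training."""
    counts = Counter(training_lengths)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def score_extractions(begin, end, length_prob):
    """Score every pair (i, j) with i < j by Begin(i) * End(j) * Len(j - i).

    Returns (score, i, j) tuples with nonzero score, best first.
    """
    scored = []
    for i, b in enumerate(begin):
        for j in range(i + 1, len(end)):
            score = b * end[j] * length_prob.get(j - i, 0.0)
            if score > 0.0:
                scored.append((score, i, j))
    return sorted(scored, reverse=True)


# Made-up boundary probabilities for a five-token text (six boundaries).
begin = [0.0, 0.1, 0.9, 0.0, 0.0, 0.0]
end = [0.0, 0.0, 0.0, 0.2, 0.0, 0.8]
length_prob = length_model([3, 3, 3, 1])  # Len(3) = 0.75, Len(1) = 0.25

ranked = score_extractions(begin, end, length_prob)
best_score, best_i, best_j = ranked[0]
print(best_i, best_j, round(best_score, 3))
```

Here the length term rules out spans of unseen length, so only two of the candidate pairs receive a nonzero score.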

45
Problems with Sliding Windows and Boundary Finders
Decisions in neighboring parts of the input are made independently of each other.
–Sliding Window may predict a seminar end time before the seminar start time.
–It is possible for two overlapping windows to both be above threshold.
–In a Boundary-Finding system, left boundaries are laid down independently of right boundaries, and their pairing happens as a separate step.

47
Some sequential patterns
There is something interesting in the sequence of fields that we'd like to capture:
–Authors come first
–Title comes before journal
–Page numbers come near the end
–All types of things generally contain multiple words

57
Data regularity is important!
As regularity decreases, so does performance.
Algorithms behave differently at different levels of regularity.
[Figure: levels of regularity compared: natural text, highly structured, partially structured]

58
How important are features?

wildcards   speaker   location   stime   etime
none        15.1      69.2       95.7    83.4
just        49.4      73.5       99.3    95.0
default     67.7      76.7       99.4    94.6
lexical     73.5      -          -       -

default: a set of eight wildcards
lexical: task-specific lexical resources:
–common first names released by the U.S. Census Bureau
–common last names
–tokens not found in /usr/dict/words on Unix

- One of the challenges for IE methods is generalizability
- Wildcards can help with this
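To see why wildcards help with generalizability, consider a boundary pattern made of literal tokens and wildcards. The sketch below is illustrative only: the wildcard names and their tests are assumptions chosen for this example, not the set of eight wildcards used in the experiments above.

```python
# Hypothetical wildcards for illustration; each maps to a test on one token.
WILDCARDS = {
    "<Cap>": lambda tok: tok[:1].isupper(),  # token starts with a capital
    "<Num>": lambda tok: tok.isdigit(),      # token is all digits
    "<*>": lambda tok: True,                 # matches any token
}


def matches(pattern, tokens):
    """True if the pattern matches the tokens position by position.

    A pattern element is either a wildcard name or a literal token.
    """
    if len(pattern) != len(tokens):
        return False
    for element, tok in zip(pattern, tokens):
        test = WILDCARDS.get(element)
        if test is None:
            if element != tok:  # literal elements must match exactly
                return False
        elif not test(tok):
            return False
    return True


literal = ["Dr.", "Smith"]
general = ["Dr.", "<Cap>"]

print(matches(literal, ["Dr.", "Jones"]))  # literal pattern misses a new name
print(matches(general, ["Dr.", "Jones"]))  # wildcard pattern covers it
```

A pattern built only from literal tokens covers just the names seen in training, while the wildcard version extends to unseen ones at the cost of some precision.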

62
Collaborative Searching What are other gains that can be achieved through collaborative searching? What are cons to collaborative searching? Who do you think will be the primary users of collaborative searching sites?