Filatova

Event-Based Extractive Summarization: Elena Filatova, Vasileios Hatzivassiloglou, Columbia University
What I am not going to talk about: what summarization is; the fact that the summarization task is very difficult; how to evaluate summaries. What I am going to talk about: using events to identify the text snippets that contain important information; comparing events with other information-defining features; which features work better in which cases.
What do we need to create an extractive summary?: (1) Identify what information is important and should be included in the summary (the information features). (2) Break the input text into textual units (sentences, clauses, etc.). (3) Score every textual unit according to the information it covers. (4) Choose the textual unit that should be added to the summary. (5) Rescore the textual units based on what information the summary already covers. Repeat steps 4 and 5 until the summary reaches the desired length.
Identifying text snippets containing important information: Lexical features: words (tf*idf weights show which words are important); words used in titles and section headings (Luhn’59); presence of cue phrases in the textual unit, such as "in conclusion" or "significant" (Kupiec et al’95); co-occurrence of particular terms: lexical chains (Barzilay & Elhadad’97), topic signatures (Lin & Hovy’2000). Non-lexical features: the textual unit’s position in the input text, such as the headline or the first sentence of a paragraph (Baxendale’58); rhetorical representation of the source text (Marcu’97). We suggest using atomic events as features that single out the important sentences.
Atomic events: Atomic event = Relation + Connector (a potential label for the relation). A relation is a pair of named entities or significant nouns. For the input text, get all possible pairs of named entities within one sentence. For every relation, analyze all the verbs and action-defining nouns that occur between the two named entities of the relation; these verbs and nouns can be used as labels for the extracted relation. Atomic events are described in (Filatova & Hatzivassiloglou 2003).
Topic: China Airlines Crash: China Airlines Flight 676 from Bali to Taipei crashes. PLACE: Taipei, Taiwan. WHEN: February 16, 1998. Sample text: "The flight was from Bali to Taipei. It crashed several yards short of the runway and all 196 on board were believed dead. This crash also killed many people who lived in the residential neighborhood where the plane hit the ground." A topic is a collection of texts clustered around one major event.
Extract pairs of named entities (relations): For a collection of texts, analyze those sentences that contain more than one named entity (identified with BBN IdentiFinder). Extract all possible pairs of named entities (relations). Calculate normalized frequencies for all relations: normalized frequency of a relation = n/N, where n is the frequency of the current relation in the topic and N is the overall frequency of all relations in the topic.
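A minimal Python sketch of the relation step above. It assumes named-entity recognition has already been run (the talk uses BBN IdentiFinder; here each sentence is simply a list of entity strings), and it treats a relation as an unordered pair counted once per sentence. Both choices, and the name `relation_frequencies`, are assumptions made for illustration, not details confirmed by the slides.

```python
from collections import Counter
from itertools import combinations


def relation_frequencies(sentences):
    """Return {relation: n / N} for a topic.

    sentences: list of lists of named-entity strings, one list per sentence
               (assumed output of an external named-entity recognizer).
    A relation is modeled as an unordered pair of distinct entities
    co-occurring in one sentence (an assumption of this sketch).
    """
    counts = Counter()
    for entities in sentences:
        unique = sorted(set(entities))
        # Sentences with fewer than two entities yield no pairs.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    total = sum(counts.values())  # N: overall frequency of all relations
    return {relation: n / total for relation, n in counts.items()}


# Toy input, not data from the talk.
example = [
    ["China Airlines", "Taiwan", "Monday"],
    ["China Airlines", "Taiwan"],
    ["Bali"],
]
freqs = relation_frequencies(example)
print(freqs)
```

By construction the normalized frequencies of all relations in a topic sum to 1.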
Extract words in-between each pair of named entities (connectors): For every relation, extract all the verbs that appear between the elements of this relation (between the two named entities). Normalized frequency of a connector = c/S, where c is the frequency of the current connector in the relation and S is the overall frequency of all connectors for the relation.
Atomic Event = Relation + Connector: Examples of atomic events and their scores: China Airlines/ORG - crashed/VBD - Taiwan/LOCATION (score 0.0212*0.0312); China Airlines/ORG - crashed/VBD - Monday/DATE (score 0.0170*0.0311). The atomic event score predicts how important this atomic event is for the collection of texts.
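The products shown on this slide suggest that an atomic event's score is the relation's normalized frequency multiplied by the connector's normalized frequency within that relation. The sketch below works under that assumption; the function name and the toy observations are hypothetical, and the relation frequency is passed in rather than computed.

```python
from collections import Counter, defaultdict


def atomic_event_scores(observations, relation_freq):
    """Score atomic events as (n / N) * (c / S).

    observations: iterable of (relation, connector) pairs, one per time a
                  connector is seen between the entities of a relation.
    relation_freq: {relation: n / N}, the normalized relation frequencies.
    """
    per_relation = defaultdict(Counter)
    for relation, connector in observations:
        per_relation[relation][connector] += 1

    scores = {}
    for relation, connectors in per_relation.items():
        s_total = sum(connectors.values())  # S: all connectors of this relation
        for connector, c in connectors.items():
            scores[(relation, connector)] = relation_freq[relation] * (c / s_total)
    return scores


# Toy input, not data from the talk.
rel = ("China Airlines", "Taiwan")
obs = [(rel, "crashed"), (rel, "crashed"), (rel, "crashed"), (rel, "flew")]
scores = atomic_event_scores(obs, {rel: 0.5})
print(scores)
```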
Experiment 1: The summarization pipeline with sentences as the textual units and two alternative kinds of information features, words (weighted by tf*idf) and atomic events: (1) Identify what information is important and should be included in the summary. (2) Break the input text into textual units (sentences, clauses, etc.). (3) Score every textual unit according to the information it covers. (4) Choose the textual unit that should be added to the summary. (5) Rescore the textual units based on what information the summary already covers. Repeat until the summary reaches the desired length.
Experiment 1: The same pipeline without the rescoring step; information features are words (tf*idf) or atomic events, and textual units are sentences: (1) Identify what information is important and should be included in the summary. (2) Break the input text into textual units (sentences, clauses, etc.). (3) Score every textual unit according to the information it covers. (4) Choose the textual unit that should be added to the summary, and repeat until the summary reaches the desired length.
Scoring textual units: Every textual feature used for marking important information can have a weight (F1 has weight W1, F2 has weight W2, and so on). A 1 in the table means that the textual unit contains the feature; the score of a textual unit is the sum of the weights of the features it contains.
      F1  F2  F3  F4  F5   Score
T1     1   1   0   1   0   W1+W2+W4
T2     1   0   1   1   0   W1+W3+W4
T3     0   1   0   0   1   W2+W5
T4     1   0   1   0   1   W1+W3+W5
COLING’2004 has more details on the model, which is based on mapping the set of features marking important information (words/events) onto the set of textual units (sentences).
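The scoring on this slide can be sketched in a few lines of Python. The 0/1 rows below are the T1..T4 rows over F1..F5 from the slide; the numeric weights are hypothetical stand-ins for W1..W5, chosen only so the sums are easy to check.

```python
def score_units(coverage, weights):
    """Score each textual unit as the sum of the weights of its features.

    coverage: list of 0/1 rows, one per textual unit, one column per feature.
    weights:  list of feature weights, aligned with the columns.
    """
    return [
        sum(w for present, w in zip(row, weights) if present)
        for row in coverage
    ]


coverage = [
    [1, 1, 0, 1, 0],  # T1 -> W1 + W2 + W4
    [1, 0, 1, 1, 0],  # T2 -> W1 + W3 + W4
    [0, 1, 0, 0, 1],  # T3 -> W2 + W5
    [1, 0, 1, 0, 1],  # T4 -> W1 + W3 + W5
]
weights = [5, 4, 3, 2, 1]  # hypothetical values for W1..W5

scores = score_units(coverage, weights)
print(scores)
```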
Scoring sentences: Example sentence: "China Airlines Flight 676 crashed this Monday in Taiwan."
Sentence score (in events) = 0.0170*0.0311 + 0.0212*0.0312 = 0.00118
Sentence score (in words) = 107*3.78 + 66*3.99 + … + 52*4.79 = 2079.9
Events                               Event scores
CA Flight 676 - crashed - Monday     0.0170*0.0311
CA Flight 676 - crashed - Taiwan     0.0212*0.0312
Words      tf*idf (tf*log(idf))
China      107*3.78 (404.46)
Airlines   66*3.99 (269.34)
Flight     47*4.13 (194.11)
676        41*0 (0)
crashed    116*5.79 (671.64)
this       245*0.56 (137.2)
Monday     49*2.76 (135.24)
in         378*0.05 (18.9)
Taiwan     52*4.79 (249.08)
Experiment 1 (choosing sentences): Algorithm: (1) Choose the sentence with the maximum score. (2) Continue choosing sentences until the overall length of the summary exceeds the allowed limit (e.g., 100 words). (3) Truncate the last sentence so that the summary contains exactly the allowed number of words. Goal: compare the evaluation scores of summaries obtained with words (weighted by tf*idf) vs. events (weighted by event scores).
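A minimal sketch of the selection procedure above, assuming whitespace tokenization and precomputed sentence scores (from either words or events). The function name and the toy input are hypothetical. The loop stops once the word budget is reached, which after truncation yields the same summary as stopping only once it is exceeded.

```python
def static_greedy(sentences, scores, limit):
    """Pick sentences by descending score, then cut to exactly `limit` words.

    sentences: list of sentence strings.
    scores:    list of precomputed sentence scores, aligned with sentences.
    limit:     allowed summary length in words (whitespace tokens here).
    """
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    words = []
    for i in order:
        if len(words) >= limit:
            break  # the budget is already filled
        words.extend(sentences[i].split())
    # Truncate the last sentence so the summary has at most `limit` words.
    return " ".join(words[:limit])


# Toy input, not data from the talk.
summary = static_greedy(["a b c", "d e f g", "h i"], [1.0, 3.0, 2.0], limit=5)
print(summary)
```

Scores are never updated here, so two high-scoring sentences that say the same thing can both be selected.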
Data and evaluation: 30 document sets used in DUC 2001. Summaries of length 50, 100, 200, and 400 words. For every possible output there are 4 human models against which we evaluate our summaries. ROUGE scores are used for evaluation: unigrams, averaged, with stopwords disregarded. For every DUC 2001 document set we get 8 summaries: 50-, 100-, 200-, and 400-word summaries based on words’ tf*idf weights, and 50-, 100-, 200-, and 400-word summaries based on events’ weights.
Evaluation methodology: ROUGE scores (Lin & Hovy’2003) are based on the presence in the evaluated summary of n-grams from one or more model summaries. ROUGE is the official evaluation method in DUC 2004. Strength: fully automatic. Weakness: the scores are not absolute, so simply averaging a system's scores across all the document sets is not informative. To address this, we count in how many cases the event-based summarizer outperforms the word-based summarizer and vice versa. It is also not clear whether the human summaries would have been the same if the annotators had been asked not for a general summary but for a list of major events.
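The pairwise comparison described above can be sketched as a simple win count per document set. The function name, the document-set labels, and the scores below are placeholders invented for illustration; they are not results from the talk.

```python
def count_wins(event_scores, word_scores):
    """Count, per document set, which summarizer has the higher score.

    event_scores, word_scores: {document_set_id: evaluation score} for the
    same document sets and the same summary length.
    """
    wins = {"events": 0, "words": 0, "ties": 0}
    for doc_set, e in event_scores.items():
        w = word_scores[doc_set]
        if e > w:
            wins["events"] += 1
        elif w > e:
            wins["words"] += 1
        else:
            wins["ties"] += 1
    return wins


# Placeholder scores, not results from the talk.
event_scores = {"set_a": 0.30, "set_b": 0.25, "set_c": 0.40}
word_scores = {"set_a": 0.28, "set_b": 0.27, "set_c": 0.40}
wins = count_wins(event_scores, word_scores)
print(wins)
```

Counting wins only compares scores within a document set, which sidesteps the problem that scores are not comparable across sets.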
Experiment 1. Results (Static Greedy Algorithm): Table over the 30 document sets: the number of cases in which one feature outperforms the other feature, together with a baseline (table values not preserved).
Experiment 1 (length: 400 words)
Domain dependence of features: For what document sets did atomic events get better ROUGE scores than words with tf*idf weights? Those whose documents are clustered around one specific event. In what cases did words with tf*idf weights outperform atomic events? For the types of documents in which atomic events have very low weights, for example: history of airplane crashes; Clarence Thomas's ascendancy to the Supreme Court.
Avoiding redundancy: Clustering (McKeown et al’99): cluster all the textual units and add to the final summary only one representative from every cluster. MMR: Maximal Marginal Relevance (Goldstein et al’00): minimize textual overlap. Our suggestion: separate information and text, and minimize the overlap of the features marking important information. No mechanism for avoiding redundancy is used in Experiment 1.
Experiment 2: The full pipeline, including the rescoring step; information concepts are words (tf*idf) or atomic events, and textual units are sentences: (1) Identify what information is important and should be included in the summary. (2) Break the input text into textual units (sentences, clauses, etc.). (3) Score every textual unit according to the information it covers. (4) Choose the textual unit that should be added to the summary. (5) Rescore the textual units based on what information the summary already covers. Repeat until the summary reaches the desired length.
Rescoring sentences: After a sentence is added to the final output, recalculate the scores of all the sentences: from the initial score of every sentence, subtract the weights of all the information features that are already covered in the output.
Rescoring sentences: T1 (feature vector 1 1 0 1 0) is added to the output. Output covers: F1, F2, F4.
Rescoring sentences: After T1 is added, the output covers F1, F2, F4, so the weights of these features no longer count toward any score.
      F1  F2  F3  F4  F5   Initial score   Rescored
T1     1   1   0   1   0   W1+W2+W4        0
T2     1   0   1   1   0   W1+W3+W4        W3
T3     0   1   0   0   1   W2+W5           W5
T4     1   0   1   0   1   W1+W3+W5        W3+W5
T4 (1 0 1 0 1) is added next; the output then covers F1, F2, F4, F3, F5.
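The rescoring on this slide can be checked numerically. The sketch below reuses the T1..T4 rows from the slide with hypothetical values for W1..W5, marks F1, F2, F4 as covered (the features of T1), and recomputes each score from the uncovered features only.

```python
def rescore(coverage, weights, covered):
    """Recompute unit scores, ignoring features already covered by the output.

    coverage: list of 0/1 rows, one per textual unit.
    weights:  list of feature weights, aligned with the columns.
    covered:  set of column indices already covered by the summary.
    """
    return [
        sum(w for j, w in enumerate(weights) if row[j] and j not in covered)
        for row in coverage
    ]


coverage = [
    [1, 1, 0, 1, 0],  # T1
    [1, 0, 1, 1, 0],  # T2
    [0, 1, 0, 0, 1],  # T3
    [1, 0, 1, 0, 1],  # T4
]
weights = [5, 4, 3, 2, 1]  # hypothetical values for W1..W5
covered = {0, 1, 3}        # F1, F2, F4 are covered once T1 is selected

rescored = rescore(coverage, weights, covered)
next_pick = max(range(len(rescored)), key=lambda i: rescored[i])
print(rescored, "next: T%d" % (next_pick + 1))
```

With these weights the rescored values are 0, W3, W5, and W3+W5, so T4 is the next unit selected, as on the slide.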
Experiment 2 (Adaptive Greedy Algorithm): (1) Choose the sentence with the maximum score. (2) Rescore all the sentences. (3) Continue choosing sentences until the overall length of the summary reaches or exceeds the allowed length (50, 100, 200, or 400 words). (4) Truncate the last sentence so that the summary contains exactly the allowed number of words.
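A minimal sketch of the adaptive loop above, assuming each sentence comes with the set of information features it contains and each feature has a weight. Names, tokenization by whitespace, and the toy input are assumptions for illustration.

```python
def adaptive_greedy(sentences, features, weights, limit):
    """Greedy selection with rescoring after every pick.

    sentences: list of sentence strings.
    features:  list of sets; features[i] holds the feature ids in sentence i.
    weights:   {feature id: weight}.
    limit:     allowed summary length in words (whitespace tokens here).
    """
    covered, words = set(), []
    remaining = list(range(len(sentences)))
    while remaining and len(words) < limit:
        # Rescoring: only features not yet covered contribute to a score.
        best = max(
            remaining,
            key=lambda i: sum(weights[f] for f in features[i] - covered),
        )
        words.extend(sentences[best].split())
        covered |= features[best]
        remaining.remove(best)
    return " ".join(words[:limit])


# Toy input, not data from the talk.
sentences = ["plane crashed in Taiwan", "plane crashed Monday", "rescue teams arrived"]
features = [{"crash", "taiwan"}, {"crash", "monday"}, {"rescue"}]
weights = {"crash": 5, "taiwan": 2, "monday": 1, "rescue": 3}

summary = adaptive_greedy(sentences, features, weights, limit=7)
print(summary)
```

In this toy input the second sentence has the second-highest initial score, but after the first pick its only uncovered feature is the low-weight one, so the third sentence is chosen instead.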
Experiment 2 (Adaptive Greedy). Results.: The Adaptive Greedy Algorithm gives the event-based summarizer a greater advantage over the tf*idf-based summarizer than the Static Greedy Algorithm does.
Observation: Static vs Adaptive Greedy: Not only does the event-based summarizer work better with the Adaptive Greedy Algorithm, but the word-based (tf*idf) summarizer also works worse with it. Why is tf*idf not compatible with the presented information redundancy component? In contrast to atomic events, words exhibit more dependence on each other: sentences containing important words (e.g., names of people) that are already in the final output are not likely to be selected, even though in the new sentences these words can express different relations between each other.
Information features’ weights: The weights of the information features can be sorted in decreasing order: F1 - w1, F2 - w2, F3 - w3, F4 - w4, F5 - w5. It is more important to cover feature F1 than feature F5, so: (1) First, analyze only those sentences that contain F1. (2) Extract the one with the highest score. (3) Check the features that are covered (e.g., T1 covers F1, F2, F4). (4) Choose the next most important uncovered feature (here F3), analyze the sentences containing it, and go to step 2. This modification prevents very long sentences containing many unimportant information features from being put into the summary.
Experiment 3 (Modified Adaptive Greedy): (1) Among the sentences containing the most important uncovered information feature, choose the one with the maximum score. (2) Rescore all the sentences. (3) Continue choosing sentences (go to step 1) until the overall length of the summary reaches or exceeds the allowed length (50, 100, 200, or 400 words). (4) Truncate the last sentence so that the summary contains exactly the allowed number of words.
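A sketch of the modified selection step, under the same input assumptions as a plain feature-set representation (one set of feature ids per sentence, one weight per feature). The early exit when no uncovered feature remains is an assumption of this sketch; the slides do not say what happens in that case.

```python
def modified_adaptive_greedy(sentences, features, weights, limit):
    """Adaptive greedy restricted to sentences holding the top uncovered feature.

    sentences: list of sentence strings.
    features:  list of sets; features[i] holds the feature ids in sentence i.
    weights:   {feature id: weight}.
    limit:     allowed summary length in words (whitespace tokens here).
    """
    covered, words = set(), []
    remaining = list(range(len(sentences)))
    while remaining and len(words) < limit:
        uncovered = set().union(*(features[i] for i in remaining)) - covered
        if not uncovered:
            break  # assumption: stop when nothing new can be covered
        # The most important feature that the summary does not cover yet.
        top = max(sorted(uncovered), key=lambda f: weights[f])
        candidates = [i for i in remaining if top in features[i]]
        best = max(
            candidates,
            key=lambda i: sum(weights[f] for f in features[i] - covered),
        )
        words.extend(sentences[best].split())
        covered |= features[best]
        remaining.remove(best)
    return " ".join(words[:limit])


# Toy input, not data from the talk: one long sentence with many low-weight
# features and one short sentence with the single most important feature.
sentences = ["minor one minor two minor three minor four", "plane crashed"]
features = [{"m1", "m2", "m3", "m4"}, {"crash"}]
weights = {"crash": 5, "m1": 2, "m2": 2, "m3": 2, "m4": 2}

summary = modified_adaptive_greedy(sentences, features, weights, limit=2)
print(summary)
```

Ranking by total score alone would pick the long sentence first (total weight 8 against 5); restricting candidates to sentences that contain the top uncovered feature selects the short one.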
Experiment 3. Results. (Modified Adaptive): Tackles the redundancy problem for word features. Event features still work better than word features.
Conclusions: The event-based summarizer gives high performance for newswire documents. When it is not possible to extract good atomic events, the word-based summarizer gives better performance. The choice of the information-marking feature influences the algorithm that should be used: an algorithm that performs well for some features can perform badly for others. Separating information (e.g., events) from the text helps avoid redundancy: overlap of information features is more informative than pure text overlap.
Thank you very much: For attending my talk on this pleasant evening. Do you have questions?
Slide35: Clustering (McKeown et al’99): cluster all the textual units and add to the final summary only one representative from every cluster. MMR: Maximal Marginal Relevance (Goldstein et al’00): minimize the textual overlap between the textual unit that is about to be added to the summary and the textual units that are already in the summary. Our suggestion: separate information and text, and minimize the overlap between the features marking important information in the textual unit under consideration and the concepts already covered in the summary. No mechanism for avoiding redundancy is used in Experiment 1.
Adaptive Greedy Algorithm (length: 400 words)
What is the optimal way?: Get all the information that is of interest, because this is why we are creating the output. Avoid repetitions of the same information in the output, because the final output usually has a length limit and it is important to include as much of the information as possible, while repetitions waste valuable space.
Maximum Coverage Problem: Given a set of vectors T = {Ti, i = 1..N}, where every Ti covers a subset of weighted elements from the set C = {Cj, j = 1..M}, and an integer k, choose a subset of k elements from T so that the coverage of the elements from C is maximized.
      C1  C2  C3  C4  C5
T1     1   1   0   1   1
T2     1   0   0   1   0
T3     0   1   0   0   1
T4     1   0   1   1   1
For summarization: extract sentences (as many as the length permits) to cover as many concepts of interest as possible. T is the set of textual units, each of which might contain some concepts; C is the set of concepts that should be covered. If the extractive summarization task is treated as a Maximum Coverage Problem, then we need: a set of textual units; a set of conceptual units; the length of the summary (in textual units, e.g., sentences). COLING’2004 gives a detailed analysis of the model, which is based on mapping the set of conceptual units onto the set of textual units.
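One way to write the problem on this slide as a formula, as a sketch: the symbols $w_j$ (weight of concept $C_j$), $M$ (number of concepts), and $S$ (the chosen subset of textual units) are notation introduced here and do not appear on the slide.

```latex
\max_{\substack{S \subseteq T \\ |S| = k}} \; I(S),
\qquad
I(S) \;=\; \sum_{j=1}^{M} w_j \,
  \mathbf{1}\bigl[\, \exists\, T_i \in S \text{ such that } T_i \text{ covers } C_j \,\bigr]
```

Each concept contributes its weight at most once, however many selected textual units contain it, which is why repeating already covered information adds nothing to the objective.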
Slide41: The summarization task, formulated as a maximum coverage problem (a close relative of the set cover problem), is NP-hard. The summarization task is a very difficult problem. There are so many topics for PhD dissertations. No efficient exact algorithm is known; only approximation algorithms are available.
Solving Maximum Coverage Problem: The greedy algorithm provably achieves the best approximation ratio attainable in polynomial time for MCP (unless P = NP). At every step, add the textual unit that maximizes the overall score (the overall weight of the concepts covered by the summary). $I(\mathrm{OPT}) \ge I(\mathrm{GREEDY}) \ge \left[1 - (1 - 1/k)^{k}\right] I(\mathrm{OPT}) > (1 - 1/e)\, I(\mathrm{OPT}) \approx 0.6321\, I(\mathrm{OPT})$, where $I$ is the overall weight of the concepts covered by the summary and $k$ is the maximal number of textual units that the summary can contain.
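The approximation factor in the bound above is easy to tabulate. This short sketch evaluates 1 - (1 - 1/k)^k for a few values of k and compares it with the limit 1 - 1/e; the function name is introduced here for illustration.

```python
import math


def greedy_bound(k):
    """Guaranteed fraction of the optimum for greedy with k selected units."""
    return 1 - (1 - 1 / k) ** k


limit = 1 - 1 / math.e  # the value the factor approaches as k grows

for k in (1, 2, 5, 10, 100):
    print(k, round(greedy_bound(k), 4))
print("limit", round(limit, 4))
```

The factor decreases as k grows but always stays above 1 - 1/e, so the guarantee is tightest for short summaries.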
Avoiding redundancy: Clustering (McKeown’99): cluster all the textual units and add to the final summary only one representative from every cluster. MMR: Maximal Marginal Relevance (Goldstein’00): minimize the textual overlap between the textual unit that is about to be added to the summary and the textual units that are already in the summary. Our suggestion: minimize the overlap between the concepts covered in the textual unit under consideration and the concepts already covered in the summary. Greedy algorithms for the summarization task formulated in terms of the Maximum Coverage Problem can tackle the problem of redundancy reduction.
Experiments: 3 variations of the greedy algorithm for MCP. Textual units: sentences. Conceptual units: all the words, with their tf*idf scores as concept weights; atomic events, with their scores as concept weights. Collection: 30 document sets from DUC 2001 with human-created models of various lengths: 50, 100, 200, and 400 words. Evaluation methodology: ROUGE scores. Length: exactly 50, 100, 200, or 400 words from the top N sentences.
Static Greedy Algorithm: (1) Break the input document(s) into textual units. (2) For every textual unit, calculate its weight as the sum of the weights of all the concepts it covers. (3) Choose the textual unit with the maximum weight and add it to the final output. (4) Continue extracting other textual units in order of total weight until the summary reaches the desired length. This algorithm does not have any mechanism for avoiding redundancy.
Results. Static Greedy Algorithm
Static Greedy Algorithm (length: 400 words)
Domain dependence of features: For what document sets did atomic events get better ROUGE scores than words with tf*idf scores? Those whose documents are clustered around one specific event. In what cases did words with tf*idf scores outperform atomic events? For the types of documents in which atomic events have very low scores, for example: history of airplane crashes; Clarence Thomas's ascendancy to the Supreme Court.
Adaptive Greedy Algorithm: (1) Break the input document(s) into textual units. (2) For every textual unit, calculate its weight as the sum of the weights of all the concepts it covers. (3) Choose the textual unit with the maximum weight and add it to the final output. (4) Add the concepts covered by this textual unit to the list of concepts covered in the final output. (5) Recalculate the weights of the textual units: subtract from each unit’s weight the weight of all concepts in it that are already covered in the output. (6) Continue extracting textual units in order of their total weight (going back to step 3 each time) until the summary reaches the desired length. This algorithm avoids redundancy by excluding from the analysis those concepts that are already covered.
Results. Adaptive Greedy Algorithm: The Adaptive Greedy Algorithm gave the event-based summarizer a greater advantage over the tf*idf-based summarizer than the Static Greedy Algorithm did.
Modified Adaptive Greedy Algorithm: (1) Break the input document(s) into textual units. (2) For every textual unit, calculate its weight as the sum of the weights of all the concepts it covers. (3) Consider only those textual units that contain the highest-weight concept that has not yet been covered. (4) Out of these, choose the one with the highest total weight and add it to the final output. (5) Add the concepts covered by this textual unit to the list of concepts covered in the final output. (6) Recalculate the weights of the textual units: subtract from each unit’s weight the weight of all concepts in it that are already covered in the output. (7) Continue extracting textual units in order of their total weight (going back to step 3 each time) until the summary reaches the desired length.
Results. Modified Adaptive Greedy Algorithm: Tackles the redundancy problem for word features. Event features still work better than word features.
Usage of ROUGE for evaluation: Advantages: completely automatic; fair correspondence of unigram scores to human scores. Disadvantages: the scores are not absolute, so it is not possible to compare results across different document sets; it is not clear whether the human summaries would have been the same if the annotators had been asked not for a general summary but for a list of major events.
Conclusions: The nature of the input text suggests which feature should be used for marking important information. Overall, for newswire document summarization, the event-based summarizer gives better performance. For the document sets from which it is not possible to extract good events, the word-based summarizer (using tf*idf scores) gives better performance. The choice of the information-marking feature influences the algorithm that should be used: an algorithm that performs well for some features can perform badly for others.
What is an atomic event? (2): Atomic event = Relation + Label for the relation. Relation = 2 named entities (or high-frequency nouns that are related to each other). Connector = verb or action noun defining/labeling the type of the relation between the 2 named entities. Score = a prediction of how important this atomic event is for the collection of texts.
