Approaches for calculating recall and precision

... approaches for calculating recall and precision. In the Herrera and Puri approach [17], no threshold is specified. After the words are ranked according to an importance index, the position of the last glossary word in the ranked list is found. All words from the top of the ranked list down to that position, which together contain every glossary word, are then selected as keywords. In this approach, they introduce a cut-off frequency: only the words with frequencies greater than or equal to the cut-off frequency are kept, both in the ranked list and in the glossary, and all words with lower frequencies are omitted. For example, a cut-off frequency of 2 means that only words with frequencies greater than 1 are kept. The numbers of words in the ranked list and in the glossary for various cut-off frequencies are given in Table 4. In Fig 9, recall and precision are plotted against the cut-off frequency. According to Fig 9, recall for the Combined Measure is higher than for the other methods at cut-off frequencies greater than 5. This means that the proposed fractal method is superior to the others as a method for keyword extraction. The precision of the Combined Measure is higher than that of C Value for all cut-off frequencies.

If we rank the words according to their fractality, we find a power-law relationship between the fractality of a word and its rank. Therefore, it is rational to choose the words with rank lower than a specific value as the retrieved keyword list, instead of using a fractality threshold. In the Mehri and Darooneh approach [18], after ordering words by their fractality, a ...

PLOS ONE | DOI:10.1371/journal.pone.0130617 | June 19 | The Fractal Patterns of Words in a Text

Table 2. List of the twenty top-ranked words according to Combined Measure from the book The Origin of Species. These words are important with respect to the subject of the book. The word f comes from species labels such as f8, f10, f14, ... and from some proper names; it is kept because non-alphabetical characters are removed in our method.

Words          Frequency  Fractality  Combined Measure
slaves         34         17.42       26.68
wax            42         15.65       25.40
hybrids        135        10.89       23.20
instincts      87         11.85       23.00
sterility      100        11.27       22.53
cuckoo         32         14.89       22.40
illegitimate   21         16.52       21.85
floated        18         15.98       20.07
instinct       63         10.62       19.11
masters        17         15.52       19.10
lamellae       20         14.67       19.09
pedicellariae  15         16.03       18.85
cell           28         12.71       18.39
nest           55         10.23       17.80
f              46         10.62       17.66
pupae          13         15.72       17.51
cells          58         9.84        17.36
fertility      80         9.08        17.27
spheres        19         13.46       17.22
clover         15         14.55       17.

Table 3. List of ten words and their ranks from the book The Origin of Species. Words with equal Combined Measures take equal ranks.

Words        Combined Measure  Rank
forward      3.31199           2128
months       3.31199           2128
saved        3.31115           2130
treat        3.31115           2130
observers    3.30809           2132
gone         3.30749           2133
inferiority  3.30647           2134
agree        3.30564           2135
icebergs     3.30447           2136
laying       3.30447           2136
really       3.30164

doi:10.1371/journal.pone.0130617.t003

Table 4. Number of vocabulary words and number of glossary words for various cut-off frequencies. Nv and Ng are the number of vocabulary words from the book and the number of glossary words for each cut-off frequency, respectively.

Cut-off Frequency  Nv    Ng
1                  8842  229
2                  5351  157
3                  4092  126
4                  3428  109
5                  2957  89
6                  2624  79
7                  2352  72
8                  2141  65
9                  1968  57
10                 1855

Fig 9. Results of calculating Rec.
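The tie rule stated for Table 3 (words with equal Combined Measures take equal ranks) can be illustrated with a short Python sketch. It assumes the convention that the rank column appears to follow, where tied words share the rank of the first of them and the next distinct value skips ahead. The function name competition_ranks and the start parameter are hypothetical and are not taken from the paper.

```python
def competition_ranks(scores, start=1):
    """Assign ranks to scores already sorted in descending order.

    Equal scores share the rank of their first occurrence; the next
    distinct score gets the rank matching its position in the list.
    This tie convention is an assumption inferred from Table 3.
    """
    ranks = []
    for position, score in enumerate(scores):
        if position > 0 and score == scores[position - 1]:
            ranks.append(ranks[-1])          # tie: reuse the previous rank
        else:
            ranks.append(start + position)   # new value: rank follows position
    return ranks


# The first ten Combined Measure values listed in Table 3.
scores = [3.31199, 3.31199, 3.31115, 3.31115, 3.30809,
          3.30749, 3.30647, 3.30564, 3.30447, 3.30447]
ranks = competition_ranks(scores, start=2128)
print(ranks)
```

With start=2128 the sketch yields the ten ranks listed in Table 3 for these values.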
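The cut-off procedure described in the text can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard definitions of recall and precision, takes the retrieved keywords to be the first max_rank words of the filtered ranked list (a simplification that ignores tied ranks), and works on toy data. The names apply_cutoff, recall_precision, and max_rank are hypothetical.

```python
def apply_cutoff(words, freqs, cutoff):
    """Keep only the words whose frequency is >= cutoff, preserving order."""
    return [w for w in words if freqs.get(w, 0) >= cutoff]


def recall_precision(ranked, glossary, freqs, cutoff, max_rank):
    """Recall and precision after applying a cut-off frequency.

    The cut-off is applied to both the ranked list and the glossary,
    as the text describes. Retrieving the top max_rank words is an
    assumption made for this sketch.
    """
    ranked_kept = apply_cutoff(ranked, freqs, cutoff)
    glossary_kept = set(apply_cutoff(glossary, freqs, cutoff))
    retrieved = set(ranked_kept[:max_rank])
    hits = retrieved & glossary_kept
    recall = len(hits) / len(glossary_kept) if glossary_kept else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Toy data, invented for illustration only.
freqs = {"hybrids": 5, "the": 9, "instinct": 3, "cuckoo": 2, "wax": 1, "iceberg": 1}
ranked = ["hybrids", "the", "instinct", "cuckoo", "wax", "iceberg"]
glossary = ["hybrids", "instinct", "wax", "cuckoo"]

recall, precision = recall_precision(ranked, glossary, freqs, cutoff=2, max_rank=2)
print(recall, precision)
```

In this toy run, a cut-off of 2 removes wax and iceberg from the ranked list and wax from the glossary, so three glossary words remain. The two retrieved words contain one of them, which gives a recall of 1/3 and a precision of 1/2.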