Hello everyone! I have an array @tous_mots which contains all the words of two documents, and I want to find the TF-IDF score for each word and print it. Do you know how I could do it? The following code doesn't work. If you have any suggestions or ideas, it would be really helpful; I am new to modules.
my $i; foreach $i(@tous_mots){ my $a=new Text::TFIDF(file=>["a.sans_outils.txt","b.sans_outils.txt"]);

Thank you very much Bill!! I changed the code and it seems much better. I have one last question. I am using the module Text::TFIDF for a French text, and for the function TFIDF I get a lot of "Use of uninitialized value in multiplication (*) at TFIDF". Do you have any idea why this is happening? My professor at the university didn't know :(

Before you give up and post all your code and data, there are several things you should do. Try to limit the scope of your question.

Does the code appear to "work" despite the error?

Do you get the error for every word?

How are the words that cause the error special?

Are they all long words? Do they all contain special characters? Do they all appear in one document? Both documents? Neither document?

Does a short list of words always work? Does the error occur with English documents? Do the words in your list contain whitespace or other separator characters at their start or end?
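Some of the checks above can be automated. Here is a minimal sketch, assuming the words are already in an array; the sample words are made up for illustration, and the uppercase test only covers ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical sample words: the second has trailing whitespace,
# the third contains an uppercase letter.
my @tous_mots = ('chat', "maison \n", 'Jardin');

my @suspects;
foreach my $mot (@tous_mots) {
    my @raisons;
    push @raisons, 'leading or trailing whitespace' if $mot =~ /^\s|\s$/;
    push @raisons, 'uppercase letter (ASCII only)'  if $mot =~ /[A-Z]/;
    push @raisons, 'non-ASCII character'            if $mot =~ /[^\x00-\x7F]/;
    next unless @raisons;

    push @suspects, $mot;
    (my $visible = $mot) =~ s/\s/_/g;    # make whitespace visible when printed
    print "$visible: ", join(', ', @raisons), "\n";
}
```

Any word it reports is a good candidate for the short test list.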

Now create a very short program (one that we can execute) which demonstrates the error. The list should be no more than about five words (probably hard coded in the script). The program does not have to print correct answers, only demo the error. Post this program (and related documents) as attachments. Good Luck, Bill

You have some errors in the code you posted. The loop should be: foreach my $i (0 .. $#tous_mots) to iterate over the indices of the word array. Also, though not an error, you shouldn't use the $a and $b variables because they are special variables used by the sort routine (and some others). You might want to use something like: my $tf = new Text::TFIDF(file => ["a.sans_outils.txt","b.sans_outils.txt"]); (Also, that should be declared once before the loop begins, not inside it.)

The line my $b=$a->TFIDF("a.sans_outils.txt",$mots1[$i]); should avoid the $b variable and is better written as my $wgt = $tf->TFIDF("a.sans_outils.txt",lc($tous_mots[$i]));

The print uses an array you didn't define earlier; you probably want print "$tous_mots[$i] $wgt\n";

Note that this module lowercases the words internally, so you should lowercase the word you give it (see the lc($tous_mots[$i]) call above).
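Putting these corrections together, the loop might look roughly like the sketch below. The word list is hypothetical, the file names are the ones from the question, and the module is loaded at run time (an addition for this sketch) so the script still runs where Text::TFIDF or the files are missing.

```perl
use strict;
use warnings;

# Hypothetical word list; in the question this array is filled from the documents.
my @tous_mots = ('Maison', 'chat', 'Jardin');

# Lowercase once up front, because the module lowercases the document text.
my @mots_lc = map { lc } @tous_mots;

my @fichiers = ('a.sans_outils.txt', 'b.sans_outils.txt');

if (eval { require Text::TFIDF; 1 } && -e $fichiers[0] && -e $fichiers[1]) {
    # Build the object once, before the loop, and avoid the names $a and $b.
    my $tf = Text::TFIDF->new(file => [@fichiers]);

    foreach my $i (0 .. $#tous_mots) {
        my $wgt = $tf->TFIDF($fichiers[0], $mots_lc[$i]);
        print "$tous_mots[$i] $wgt\n";
    }
}
else {
    print "Text::TFIDF or the input files are not available here\n";
}
```

Note that lc only handles accented capitals correctly if the text has been decoded as characters, which this sketch does not attempt.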

The code for Text::TFIDF can be examined, and it shows the lowercasing operation. You'll find it on line 93 (my $line = lc($_);).

You stated in your post: I am using the module Text::TFIDF for a French text, and for the function TFIDF I get a lot of "Use of uninitialized value in multiplication (*) at TFIDF". Do you have any idea why this is happening?

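The most likely reason is that any word you look up that contains uppercase letters will not be found by the module, because internally it lowercases all the words in the documents. The lookup for such a word then presumably yields an undefined value, and multiplying by it triggers the warning.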
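The effect can be reproduced without the module. In this minimal sketch a plain hash with lowercase keys stands in for the module's internal table; the hash, its values, and the variable names are made up for illustration and are not the module's actual internals.

```perl
use strict;
use warnings;

# Stand-in table keyed by lowercased words; the numbers are invented.
my %idf = (maison => 0.3, chat => 0.1);
my $tf  = 2;    # hypothetical term frequency

# Capture warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $bad  = $tf * $idf{'Maison'};        # key is missing, so the value is undef
my $good = $tf * $idf{lc('Maison')};    # lowercased key is found

print "warnings captured: ", scalar(@warnings), "\n";
print "first warning: $warnings[0]" if @warnings;
print "good = $good\n";
```

Only the mixed-case lookup produces the "uninitialized value in multiplication" warning; the lowercased one does not.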

Hopefully, this will get you on the way to a solution.

The reason you are getting negative results is that the calculation for word frequency involves log base 10, and if the value passed to the log is less than 1, the log will be negative. (See the IDF function in Text::TFIDF.)
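A quick way to see the sign change is to compute a base-10 logarithm for a few values. This is a minimal sketch with made-up ratios; the log10 helper is defined here for illustration and is not the module's own code.

```perl
use strict;
use warnings;

# Perl's built-in log() is the natural logarithm; divide by log(10) for base 10.
sub log10 {
    my ($x) = @_;
    return log($x) / log(10);
}

# Hypothetical ratios, not taken from the module.
foreach my $ratio (0.5, 1, 2) {
    printf "log10(%s) = %.4f\n", $ratio, log10($ratio);
}
```

Values below 1 give a negative result, exactly 1 gives zero, and values above 1 give a positive result.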