Hi,
currently the score of a match is influenced only by the position and the
number of occurrences of a term. Shouldn't the length of the document also
play a role? If a word occurs twice in a short document, isn't that more
relevant than twice in a very long document?
Something like this:
$faktor = ($tdf{$doc_id} / $size) + 0.5;
$weight = $tdf{$doc_id} * $faktor * log($DN / $df);
(0.5 is just a trial-and-error value)
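
To make the idea easier to try out, here is a minimal standalone sketch of the
same formula. The sub name length_weight and the sample numbers are made up
for illustration, and I'm assuming $size is the document length in words and
that it is available per document; only $tdf{$doc_id}, $size, $DN and $df
correspond to the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the existing code): computes the proposed
# length-aware weight for one term in one document.
#   $tf   - occurrences of the term in the document ($tdf{$doc_id} above)
#   $size - document length, assumed here to be a word count
#   $DN   - total number of documents
#   $df   - number of documents containing the term
sub length_weight {
    my ($tf, $size, $DN, $df) = @_;
    return 0 if $size <= 0 || $df <= 0;    # guard against division by zero
    my $faktor = ($tf / $size) + 0.5;      # 0.5 is the trial-and-error offset
    return $tf * $faktor * log($DN / $df);
}

# Made-up numbers: same term count, very different document lengths.
my $short = length_weight(2, 50,   1000, 10);
my $long  = length_weight(2, 5000, 1000, 10);
printf "short: %.3f  long: %.3f\n", $short, $long;
```

With these numbers the short document scores higher than the long one, but
because of the 0.5 offset the gap stays fairly small, so the offset probably
needs some tuning.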
Regards
Daniel