Bug Description

This bug has expanded a bit since it was originally filed in 2006. Here is the current state of things.

The Problem:
============

Doing a bug search can fail (turn up no results) despite the fact that the *exact* search string appears in the titles of some bugs that should be within the scope of the search.

Examples:
=========

* See bug 2753 (now a dupe of this). The thing being searched for was 'div' and the text indexed contained ' <div> '.

* See bug #360642, which is now marked as a dup of this. The
reporter says that searching for "from" failed to find results,
even though that's in the title of https://bugs.edge.launchpad.net/ubuntu/+source/thunderbird/+bug/357864.
Here is the title of that bug, using "/" as the delimiter since the
title itself contains both double quotes and parens: /Editing the
"From" field for the current email only (as text, not dropdown)/.

I re-tested on 2010-02-22, and searching for either "from" or
"From", with or without double quotes around it, still fails to
turn up that bug. When I did a search for "from" (with no double
quotes) with the "Across all project" radio button selected, I got
exactly one result: 508760. It seems very unlikely that there'd be
exactly one hit for a search on "from" :-).

* There are two bugs with the string "community-contributions.py" in
their titles: bug #513608 (as of 2010-02-22 was in state
"confirmed", with summary "community-contributions.py script should
use Launchpad to determine who is not a Canonical employee") and
bug #432742 (state "fix committed", with summary
"community-contributions.py script erroring on some Unicode (?)
input"). Both are in launchpad-foundations (not sure why, but no
matter).

Anyway, searching for "community-contributions.py" fails to turn up
any results when done across all projects, nor in
"launchpad-project", nor in "launchpad-foundations", nor in
"launchpad".

Removing the ".py" and searching for "community-contributions" in
launchpad-project gets two hits: 393407 (which contains the words
"community" and "contributed" separately) and 484824 (which
contains "community" and "contributions" separately), but we still
don't get the bugs that have the exact match in their titles.

* In the original repro recipe for this bug, the reporter said "If I
search for 'sqlobject' on https://launchpad.net/products/launchpad/+bugs , I get no results
despite this term being in the title of Bug #3096, which is
currently in 'confirmed' status. Interestingly, you can see this
bug in the full bug list."

But bug #3096 is in "launchpad-foundations", and I'm not sure that
searching for it in "launchpad" would work anyway, since
"launchpad" is (AFAICT) just a grab-bag temporary holding area.
So it may be that the original bug report here was a
misunderstanding, but that coincidentally, there is a real bug
whose symptoms match those that the original report described!

Possible causes
===============

Tokenisation of terms is done both in the DB and in Python. If the two are mismatched, we may have terms that simply cannot be searched on, because the supplied search query won't ever match the indexed terms.
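
To make the suspected failure mode concrete, here is a minimal sketch in Python. Both tokenizers are hypothetical stand-ins invented for illustration; neither is Launchpad's actual code nor PostgreSQL's actual parser. The assumption is simply that the indexing side and the query side split text differently, and that a title matches only when every query token is among its indexed tokens.

```python
import re


def index_tokens(text):
    """Hypothetical index-side tokenizer (an assumption, not real Launchpad code).

    Splits on whitespace only and lowercases, so punctuation stays
    attached to the term ('<div>' is indexed as '<div>').
    """
    return {tok.lower() for tok in text.split()}


def query_tokens(text):
    """Hypothetical query-side tokenizer (also an assumption).

    Splits on every non-alphanumeric character and lowercases, so
    'community-contributions.py' becomes three separate terms.
    """
    return {tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok}


def matches(query, title):
    """A title matches only if every query token was indexed for it."""
    return query_tokens(query) <= index_tokens(title)


title = "community-contributions.py script erroring on some Unicode (?) input"

# The exact string from the title is split into 'community',
# 'contributions' and 'py' on the query side, none of which was indexed.
print(matches("community-contributions.py", title))  # False

# A plain word is tokenized the same way on both sides, so it is found.
print(matches("script", title))  # True
```

Under these assumptions, a search for 'div' would likewise miss text indexed as '<div>', which is the same shape of symptom as the examples above. Whether the real mismatch looks anything like this is exactly what needs investigating.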

I've assigned this to myself for the purposes of investigation; since Yellow Squad is but a couple of weeks (excepting the Christmas break and the Thunderepic in January) from feature rotation I might not be able to fix this in the time available, but I might at least be able to shed some more light on it.

Another example: In the Mixxx project, searching for '1.10 crash' or even '1.10 crashes' does not include a bug titled 'mixxx.exe 1.10 Beta1 immediately crashes at startup' in its results, I'm guessing due to the presence of the period within the version number.

Note that this is some of our oldest code, based on tsearch2 with
PostgreSQL 8.0. Since then, tsearch has been moved into PostgreSQL
core, improved and documented (Chapter 12 of the PostgreSQL 9.1
manual, Full Text Search). Many of the issues may be fixable with
proper configuration (stop word lists etc.), and new facilities may
make it possible to simplify this old code (pluggable parsers etc.).

Thank you so much for fixing such a longstanding bug. (Excuse me if you consider this confirmation just bug spam.) It affected me recently, when LP did not find existing reports with this subject:

package manpages 3.35-0.1ubuntu1 failed to install/upgrade: trying to overwrite '/usr/share/man/man1/getent.1.gz', which is also in package libc-bin 2.15-0ubuntu15

(That was bug 1017289.) I had to browse the list of recent bugs for that package, to find the report, and mark duplicates. I tested an upload of the same crashfile, today, and LP directed me to the existing bug. To me this is the difference between the bug reporter feeling lost in a maze of twisty little passages, or, feeling well-guided, and that's the difference between participating or giving up.

Seeing a bug that's a million numbers old, getting fixed, is a morale booster in itself, too. A million thanks. :)

On 29.06.2012 21:31, Edward Donovan wrote:
> Thank you so much for fixing such a longstanding bug. (Excuse me
> if you consider this confirmation just bug spam.) It affected me
> recently,

Bug reports are for conversations about bugs, and conversations also
have some social aspects. So your comment is definitely not spam.
