OSNews: http://www.osnews.com/story/16984/Comparison_of_Desktop_Indexers
Exploring the Future of Computing
en-us
Copyright 2001-2015, David Adams (adam+nospam@osnews.com)
Mon, 03 Aug 2015 00:54:42 GMT
OSNews.com: http://www.osnews.com
odd inconsistency
http://www.osnews.com/thread?203077
For some reason Beagle lost points for being written in C# and needing Mono, but JIndex didn't lose points for being written in Java and requiring a JVM.

Preferences aside, that seems a bit odd.
(r.m.graham, Thu, 18 Jan 2007 16:55:00 GMT)

not perfect but still nice
http://www.osnews.com/thread?203081
I think this is a nice article, though it has some shortcomings. One is the remarks on CPU usage: the author seems to fail to realize that Beagle not using 100% CPU is in fact a bad thing, for several reasons.

Second, on a laptop it's better to use 100% CPU for 1 minute than 50% for 2 minutes, in terms of power usage...

Thus the fact that Strigi uses maximum CPU is positive, not negative. And it makes for a good choice, being up to 40 times faster at indexing (4 minutes vs. 2 hours for Beagle vs. 3 hours for Tracker). The author states that most of the noted problems are rather trivial to fix: http://www.kdedevelopers.org/node/2639

Anyway, a common plugin engine and D-Bus interface would be good, for sure.
(superstoned, Thu, 18 Jan 2007 16:57:00 GMT)

RE: odd inconsistency
http://www.osnews.com/thread?203084
Well, it's written by a Sun (Java lovers, duh) guy. So it makes sense he doesn't trash Mono and JIndex too much, even though they clearly suck in terms of performance and memory usage...

If they made a smart choice, they would go for the best performance and fewest dependencies: Strigi.
(superstoned, Thu, 18 Jan 2007 16:58:00 GMT)

RE: not perfect but still nice
http://www.osnews.com/thread?203090
I imagine most users would run Beagle as 'nice -n 19 beagle' or the like, so the kernel pushes the priority down and workflow isn't interrupted.
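For reference, a minimal shell sketch of the `nice -n 19` approach, assuming a Linux system with GNU coreutils (or BusyBox) `nice`. `run_niced` is a hypothetical helper name, not part of Beagle. Called with no command, `nice` prints the niceness it is running at, which makes the effect easy to check. Keep in mind that niceness only shrinks the job's share of the CPU relative to other normal-priority tasks; the job still gets some CPU time while they run.

```shell
#!/bin/sh
# Run a command with the lowest CPU priority (niceness 19).
# run_niced is a hypothetical helper, not part of any indexer.
run_niced() {
    # nice -n 19 adds 19 to the current niceness; the result is capped at 19.
    nice -n 19 "$@"
}

# With no command, nice prints the niceness it is running at.
run_niced nice
```

Started from a normal-priority shell, the last line prints 19. Any daemon could be wrapped the same way, e.g. `run_niced some-indexer-daemon` (the daemon name is only a placeholder).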

I'm still a big fan of slocate, personally. I can force an update when I want (which can take under 10 minutes if you have updated recently and are on a relatively new computer). The results come in a simple list format, etc. It's not as advanced or user-friendly, but it's nice to still have a simple desktop indexer available.
(situation, Thu, 18 Jan 2007 17:04:00 GMT)

RE[2]: not perfect but still nice
http://www.osnews.com/thread?203094
If you run Beagle with nice -n 19 and it throttles, the kernel will increase its priority by... surprise, 19 points, so it'll run at priority 0. That's better than -19 (yeah, inverted, blah blah blah), which would be the case without the nice, but still not what you want.

And even if it does run at +19, it STILL uses CPU, even while you're playing a game. A scheduler policy like SCHED_BATCH would ensure it NEVER interrupts another running process; that's what you want.
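A sketch of setting a scheduling policy from the shell, assuming Linux with `chrt` from util-linux; `run_batch` is a hypothetical helper. One caveat: in mainline Linux, SCHED_BATCH tells the scheduler the task is a non-interactive, CPU-bound job and mildly disfavors it on wakeups, but it does not strictly guarantee the task never runs while something else wants the CPU. SCHED_IDLE (`chrt -i 0`, Linux 2.6.23 and later) runs a job at an even lower priority than nice +19 and is the closer match to what is described here.

```shell
#!/bin/sh
# Run a command under the SCHED_BATCH policy at the lowest niceness.
# run_batch is a hypothetical helper; chrt is part of util-linux.
run_batch() {
    # SCHED_BATCH only accepts a static priority of 0, hence "-b 0".
    # Niceness still applies within SCHED_BATCH, so keep nice -n 19 as well.
    nice -n 19 chrt -b 0 "$@"
}

# Print the scheduling policy the wrapped shell actually runs under.
run_batch sh -c 'chrt -p $$'
```

The last line should report SCHED_BATCH for the child shell. Swapping `-b 0` for `-i 0` gives the SCHED_IDLE variant.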

And slocate, does that look inside files as well? Anyway, I'd rather have incremental updates like Beagle & friends have.
(superstoned, Thu, 18 Jan 2007 17:18:00 GMT)

"Strigi fastest and smallest"
http://www.osnews.com/thread?203095
Here's a commentary from Strigi's creator (Strigi will be used in KDE 4, I think): http://www.kdedevelopers.org/node/2639
(diegocg, Thu, 18 Jan 2007 17:22:00 GMT)

RE: not perfect but still nice
http://www.osnews.com/thread?203102
While I think Strigi is nice and has potential, it proved to be horribly unstable (0.3.11) in my own tests, leaving many zombie processes around, etc.
It still has a lot of rough edges.
I also can't confirm that Tracker is 40x slower. It is a bit slower, but more in the region of 20% to 30% (*not* times).
Also, what good is being lightning fast when your search results are not good?
Again, Strigi seems to be a very young project, so there is hope that these issues will be fixed.
(meebee, Thu, 18 Jan 2007 17:38:00 GMT)

RE[2]: odd inconsistency
http://www.osnews.com/thread?203105
so it makes sense he doesn't trash Mono and JIndex too much, even though they clearly suck in terms of performance and memory usage...

I could never for the life of me understand why the decision was made to use Mono as a framework for Beagle, something intended to run as a transparent background service. That's not really a slag against Mono per se; it's just a question of whether it makes sense to use it for something that should by design be fast and light. In fact, Novell compounded this mistake by using Mono for ZEN as well, which led to much of the griping and complaints about SL 10.1 and how slow and freaking unstable its package management was.

Not sure I understand why Sun is making the same mistake with Java either, aside from the obvious. I guess in a way it does provide a benchmark for optimizing the performance of an app like this to accommodate the overhead costs, etc., but still...

I've played with Strigi, and I do like the fact that you never "feel" it running. That's how it should be. I don't care if it's spiking my CPU when my system is idle, as long as I get my juice back the moment I need to do something else. The cool thing is that the core engine seems to be pretty much worked through; now it's just a matter of further optimizing and building the hooks and plugins. Collaboration with the Tracker team on unified interfaces would be fantastic.
(elsewhere, Thu, 18 Jan 2007 17:50:00 GMT)

RE[3]: odd inconsistency
http://www.osnews.com/thread?203111
Well, Novell wrote Beagle to show how great Mono is and to promote it, and Sun wrote their Java indexer to show how great Java is and to promote it. Of course, in my mind it doesn't really work as advertising: if you see those numbers, it's clear there are some problems.

Yeah, Tracker and Strigi are the way to go. And as the latter seems to be much faster (30 times?), can index inside zip files etc., has fewer dependencies and is working with the Nepomuk project to add contextual information, I'm glad KDE 4 will be using Strigi.
(superstoned, Thu, 18 Jan 2007 18:02:00 GMT)

RE[2]: not perfect but still nice
http://www.osnews.com/thread?203113
Well, indeed, all these projects are pretty young, so we'll have to wait and see which one stands out as the best solution. Tracker and Strigi of course have (IMHO) the best chance, being reasonably performant and not depending on controversial stuff like Mono/Java. If both happen to deliver the same D-Bus interface, the best one will be used the most, and that's the optimal solution.

BTW, Strigi also delivers database services and is going to be the foundation of metadata extraction and manipulation in KDE 4, in addition to having Nepomuk (contextual linking, labeling, etc.) integration, so I think it holds the best cards right now... On the other hand, Tracker is close to integration in GNOME, and even though GNOME mostly doesn't integrate things very deeply (or at least does so slowly), gnomes don't like stuff smelling KDE-ish. After all, they even rejected aRts, even though it was plain C, had a gnome-lib dependency and was the only technically reasonable solution back then...

But things can change.
(superstoned, Thu, 18 Jan 2007 18:08:00 GMT)

Well one thing is obvious
http://www.osnews.com/thread?203115
If there's one thing that this report makes absolutely clear, it's that managed environments like Mono and Java have absolutely no place in the world of low-level userspace daemons. Beagle's memory usage was more than 10 times that of Tracker and Strigi, and JIndex was even worse. Beagle is good at what it does, but anyone who thinks that it is a long-term solution for metadata indexing is either mad, or working for Novell.

So, given that we are left with a choice between Tracker and Strigi, the question is, of course: which one? Clearly both of them still have some deficiencies (which is why most distros are going for Beagle right now), and both are in heavy development. A clear winner is hard to pick.

Unfortunately, I think it might be the case that Gnome goes for Tracker and KDE goes for Strigi, which would be a shame, as this is definitely an opportunity for the two desktops to work together to achieve a common goal. Even if they do go for separate solutions, I hope that they can work out a common search API, so that the user wouldn't need two daemons running to be able to use Gnome and KDE apps at the same time.

It's also worth noting that Tracker aims to be not "just" an indexer, but a complete metadata database. So, for example, Rhythmbox (or Amarok!) wouldn't need to maintain its own database, but would instead just be able to query tracker for all the audio files on the system. I don't know whether the author of Strigi has similar plans, or indeed whether such a thing is practical.

Lastly, I have to say that this is the least professionally written report I've ever seen. I realise that it's primarily for internal use, but if I were to hand my boss something like this ("yes, I do like cakes"), I would probably find myself in quite some hot water...
Edited 2007-01-18 18:14
(tristan, Thu, 18 Jan 2007 18:10:00 GMT)

Uhm...
http://www.osnews.com/thread?203128
...why would a search engine need to be desktop-environment specific?

What if I run GNOME, KDE, and Xfce?
Why should I have 3 indexes built, one for each environment?
(FunkyELF, Thu, 18 Jan 2007 18:34:00 GMT)

RE: not perfect but still nice
http://www.osnews.com/thread?203133
So using 100% of the CPU is now a feature and not a major bug?

I don't think so.
Edited 2007-01-18 18:49
(Hiev, Thu, 18 Jan 2007 18:44:00 GMT)

Grok
http://www.osnews.com/thread?203156
Does the average user really need more than an occasional grok? Maintaining the database would seem to be a completely wasted effort. Something more like an Illustra image DataBlade would seem much more useful.

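#!/bin/sh
# grok: descend the directory tree and search files whose names match a glob.

if [ -z "$1" ]; then
    echo "
descend directory tree and search files

usage:
grok 'pattern' 'filename-glob'
"
    exit 1
fi

# Search every regular file if no filename glob is given.
find . -type f -name "${2:-*}" -print | while IFS= read -r i
do
    # Quote the pattern and path so spaces survive; -- stops option parsing.
    if whack=$(grep -n -- "$1" "$i"); then
        echo "
FOUND: $i"
        # Quoted so each matching line keeps its own line.
        echo "$whack"
    fi
done

(Sphinx, Thu, 18 Jan 2007 19:07:00 GMT)

RE: Well one thing is obvious
http://www.osnews.com/thread?203163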
From what I've read (sort of hinted at in the review), Tracker and Strigi are working on a common API so that, in theory, one could use Tracker as the front-end database (that'll be used for more than just indexing, e.g. a bookmarks database) and Strigi as the back-end indexer. Not sure if this will pan out.
(g2devi, Thu, 18 Jan 2007 19:14:00 GMT)

RE: odd inconsistency
http://www.osnews.com/thread?203165
For some reason Beagle lost points for being written in C# and needing Mono, but JIndex didn't lose points for being written in Java and requiring a JVM.

I guess the summaries have been written by different authors and haven't been synchronized prior to publication.

For example, the Strigi summary lists "not clear ANSI C" as a con (being written in C++), which clearly also applies to the programs written in C# and Java.

I found it odd that evidently only Strigi needs a build framework while all the other projects seem to ship hand-written Makefiles. I would have expected Tracker to use autotools and the Java application something like Ant.
(anda_skoa, Thu, 18 Jan 2007 19:16:00 GMT)

RE[3]: not perfect but still nice
http://www.osnews.com/thread?203168
they even rejected aRts, even though it was plain C

aRts is written in C++, not plain C

had a gnome-lib dependency

It didn't, unless you are referring to using glib to some extent; however, I think it had a reduced copy of it inside its own source tree.
(anda_skoa, Thu, 18 Jan 2007 19:19:00 GMT)

RE: Well one thing is obvious
http://www.osnews.com/thread?203172
If there's one thing that this report makes absolutely clear, it's that managed environments like Mono and Java have absolutely no place in the world of low-level userspace daemons

True; however, I found it quite awesome how fast the Java indexer starts up after the first time (startup times diagram, page 10).

Well, Strigi already indexes metadata and I think there is a goal to make it collaborate with Nepomuk's data relation framework.
(anda_skoa, Thu, 18 Jan 2007 19:23:00 GMT)

RE[2]: not perfect but still nice
http://www.osnews.com/thread?203175
I also can't confirm that Tracker is 40x slower.

I can confirm it's most definitely not!

The article in question tested the ancient 0.5.0 release of Tracker, which was the first version to include our new indexer framework and was completely unoptimised.

The latest release, 0.5.3, is tons faster and should be much closer to Strigi. We are doing some more optimisation work in the next version, so it will be interesting to see how they compare then.

Note also that Strigi does not currently do any string processing like stemming, so it lacks the ability to do accurate searches on plurals and stems. This might account for its impressive raw speed as well, so we really need to see Strigi get features like this to make a more "fair" comparison.
(Jamie, Thu, 18 Jan 2007 19:25:00 GMT)

RE[4]: not perfect but still nice
http://www.osnews.com/thread?203177
I stand corrected on the C++ issue; I see it's indeed C++. But g(nome)lib is definitely a dependency; it's the main reason apt-getting KDE pulls it in...
(superstoned, Thu, 18 Jan 2007 19:26:00 GMT)

RE[3]: not perfect but still nice
http://www.osnews.com/thread?203178
But Strigi does create an MD5 (or something) hash for every file it indexes, to find duplicates. Does Tracker do that?
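As a rough illustration of hash-based duplicate detection (not Strigi's actual code, which this comment doesn't describe), here is a shell sketch assuming GNU coreutils, since `uniq -w` and `uniq -D` are GNU extensions. `find_duplicates` is a hypothetical helper name.

```shell
#!/bin/sh
# List files under a directory whose contents are identical,
# by hashing every regular file with MD5 (GNU coreutils md5sum).
# find_duplicates is a hypothetical helper, not Strigi's implementation.
find_duplicates() {
    # md5sum prints "<32-hex-digest>  <path>"; sorting puts equal digests
    # next to each other, and uniq -w32 -D prints every line whose first
    # 32 characters (the digest) occur more than once.
    find "${1:-.}" -type f -exec md5sum {} + | sort | uniq -w32 -D
}
```

For example, `find_duplicates ~/Documents` prints every file whose MD5 digest matches another file's, with each group of duplicates listed together.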

BTW, good to hear Tracker is getting some performance improvements.
(superstoned, Thu, 18 Jan 2007 19:27:00 GMT)

RE: Uhm...
http://www.osnews.com/thread?203180
Yeah, 3 index tools would be bad in the long run. But they'll all support D-Bus, hopefully through a standardized interface, so you can swap 'em and use the same one in each environment.

The mere fact that we (now...) have several index daemons is NOT a bad thing, BTW. It's how Free Software works, and in the end the one with the best design hopefully wins and gets used most. Darwinism rules the FOSS world, and that's a good thing.
(superstoned, Thu, 18 Jan 2007 19:30:00 GMT)

RE[2]: Well one thing is obvious
http://www.osnews.com/thread?203181
Well, Strigi already indexes metadata and I think there is a goal to make it collaborate with Nepomuk's data relation framework

Not as good as having an all-in-one integrated database/indexer like Tracker (and Vista). The problem with a dedicated indexer is that it can't efficiently be coupled with a database without duplicating all the metadata in both databases, and then which engine do you use for searching: Lucene or the DB?
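For readers unfamiliar with the term: Lucene, like Tracker's custom word index, is built around an inverted index, a mapping from each word to the documents that contain it. The toy shell sketch below illustrates only that idea, storing the index as plain "word, tab, file" lines; `build_index` and `lookup` are hypothetical helpers and have nothing to do with either project's on-disk format.

```shell
#!/bin/sh
# Toy inverted index stored as "word<TAB>file" lines, sorted by word.
# build_index and lookup are hypothetical helpers for illustration only.

# build_index DIR: print index lines for every regular file under DIR.
build_index() {
    find "$1" -type f | while IFS= read -r f; do
        # Split into words, lowercase them, keep each word once per file,
        # then tag every word with the file it came from.
        tr -cs '[:alnum:]' '\n' < "$f" |
            tr '[:upper:]' '[:lower:]' |
            grep -v '^$' |
            sort -u |
            F=$f awk '{ print $0 "\t" ENVIRON["F"] }'
    done | sort
}

# lookup WORD: read an index on stdin and print the files containing WORD.
lookup() {
    w=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    awk -v w="$w" 'BEGIN { FS = "\t" } $1 == w { print $2 }'
}
```

For example, `build_index ~/notes > index.txt` followed by `lookup tracker < index.txt` lists the files that mention "tracker". A real indexer would also update the index incrementally instead of rebuilding it from scratch.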

Tracker was designed from the ground up to integrate both, using a tightly coupled SQLite database and a custom inverted word index. This gives you tremendous power without duplicating metadata, with a single interface for searching.
Edited 2007-01-18 19:35
(Jamie, Thu, 18 Jan 2007 19:31:00 GMT)

RE: Grok
http://www.osnews.com/thread?203184
It doesn't seem to give instant results on a 500 GB disk, don't you think? And does it give results from compressed files (Strigi does...), or metadata from images and stuff? And does it tell you when and where you downloaded the file, or who you got it from? Does it give contacts if you search for a friend's name? And his emails and the files he sent you?
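To make the compressed-file point concrete: a plain `grep` over the tree never looks inside compressed files. A minimal shell sketch of one way around that, for gzip files only, assuming `gzip` is installed; `grok_gz` is a hypothetical helper in the spirit of the grok script in this thread (it takes a pattern and an optional directory), and it is not how Strigi reads archives. Zip archives would need a different extractor, such as `unzip -p`.

```shell
#!/bin/sh
# Search plain files and gzip-compressed files for a pattern.
# grok_gz is a hypothetical helper: grok_gz PATTERN [DIR]
grok_gz() {
    pattern=$1
    find "${2:-.}" -type f | while IFS= read -r f; do
        case $f in
            *.gz)
                # gzip -dc decompresses to stdout and leaves the file untouched.
                if gzip -dc -- "$f" | grep -q -- "$pattern"; then
                    echo "FOUND (gz): $f"
                fi ;;
            *)
                if grep -q -- "$pattern" "$f"; then
                    echo "FOUND: $f"
                fi ;;
        esac
    done
}
```

Even this tiny extension has to decompress every archive on every search, which is exactly the cost a persistent index avoids.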

Sorry to put it so harshly, but it's 2007.
(superstoned, Thu, 18 Jan 2007 19:33:00 GMT)

RE[5]: not perfect but still nice
http://www.osnews.com/thread?203188
But g(nome)lib is definitely a dependency

As I said, it depends on glib, but it definitely does not depend on gnome-lib; they are two very different libraries.

From someone as well-informed as you, I'd almost consider it flamebait.
(anda_skoa, Thu, 18 Jan 2007 19:37:00 GMT)

RE[3]: Well one thing is obvious
http://www.osnews.com/thread?203190
Not as good as having an all-in-one integrated database/indexer like Tracker

Strigi does this as well (SQLite, if I remember correctly). There should be slides from the Strigi presentation at aKademy06 somewhere with details about the Strigi architecture.
(anda_skoa, Thu, 18 Jan 2007 19:39:00 GMT)

RE[6]: not perfect but still nice
http://www.osnews.com/thread?203193
Hmmm. I'm getting confused now. For a long time I thought glib wasn't related to GNOME, then something came up (don't remember what, see my nick) which apparently made me think it WAS GNOME-related...

Now you say it isn't. Hmmm.
And Google says:
GLib is the low-level core library that forms the basis of GTK+ and GNOME.

So, who's right? You or Google?
(superstoned, Thu, 18 Jan 2007 19:47:00 GMT)

RE[2]: not perfect but still nice
http://www.osnews.com/thread?203196
When nothing else is competing for the CPU, why not use it all? With the appropriate nice setting, Strigi will happily settle into the background and give up the CPU to interactive tasks, so there's no impact on interactive users (or on other, higher-priority batch tasks either). That's why OSes have schedulers...
(borker, Thu, 18 Jan 2007 20:00:00 GMT)

RE[3]: not perfect but still nice
http://www.osnews.com/thread?203199
That's not accurate: the most the Linux kernel can adjust nice levels by is +/- 5.
(Jamie, Thu, 18 Jan 2007 20:03:00 GMT)

RE[7]: not perfect but still nice
http://www.osnews.com/thread?203203
So, who's right? You or Google?

I am, of course!

glib is definitely a core dependency of GNOME; it is the GTK+ platform abstraction library.

However, it became a general-purpose C utility library a long time ago: several projects not related to GNOME use it as well, it is always packaged separately, etc.
(anda_skoa, Thu, 18 Jan 2007 20:11:00 GMT)

RE[2]: odd inconsistency
http://www.osnews.com/thread?203216
"best performance" in terms of speed and memory usage, but not "best performance" in terms of actual usefulness (i.e. it missed a bunch of results that the other tools got).
(AdamW, Thu, 18 Jan 2007 20:44:00 GMT)

RE[3]: odd inconsistency
http://www.osnews.com/thread?203226
I'm not sure you can even compare performance all that usefully, since the fixes required to get those extra hits may end up slowing down the app. Still, it was quite impressive for a young project, and the developers seem to think the problems were rather minor.

For sure! The report is very useful in showing problems too. The main point that must be addressed is the low result count. This is something I was totally unaware of. The reason for that is that I haven't gotten around to writing unit tests to test search reliability. This is the first thing to pay attention to after the KDE metainfo work is finished.

Nevertheless, the overall impression is very good. Most negative points are rather vague and revolve around smaller issues. Please forgive me for being overjoyed at the huge speed differences.

-oever
(smitty, Thu, 18 Jan 2007 21:03:00 GMT)

RE[5]: not perfect but still nice
http://www.osnews.com/thread?203227
glib is basically a set of C functions to handle data structures, plus a few things like event loops and basic threading (see http://developer.gnome.org/doc/API/2.2/glib/index.html).

There's nothing GNOME-specific about it, and it's extremely useful if you're doing any C programming since it saves you from re-inventing the wheel (e.g. virtually every C programmer re-invents the linked list). The danger of this constant re-inventing is that you might accidentally re-invent it wrong or re-invent it non-optimally.
(g2devi, Thu, 18 Jan 2007 21:05:00 GMT)

Namazu
http://www.osnews.com/thread?203237
Has anyone tried Namazu? I used it when I needed to "crunch" 5 MB of documentation (mixed PDF, HTML and Word .doc) in a very short time. What I liked most was the ability to run a simple, Google-like query from the command line, with the results displayed in my preferred browser, just like Yahoo or Google.

One can have different index sets with Namazu, too.

I have not tried any of the reviewed programs, so I can't compare them to Namazu. Namazu can be useful as a search engine for a single website, too, because it can run as a CGI.

DG
(trenchsol, Thu, 18 Jan 2007 21:48:00 GMT)

RE[6]: not perfect but still nice
http://www.osnews.com/thread?203278
The danger of this constant re-inventing is that you might accidentally re-invent it wrong or re-invent it non-optimally.

The danger of constant re-inventing is that you'll make one or both of the above mistakes 99% of the time!
(MattV, Fri, 19 Jan 2007 00:00:00 GMT)

RE[8]: not perfect but still nice
http://www.osnews.com/thread?203414
Well, I was more or less wrong, then, but at least it's understandable.
(superstoned, Fri, 19 Jan 2007 07:48:00 GMT)

RE[4]: not perfect but still nice
http://www.osnews.com/thread?203415
Yes, that might be true. I don't use the mainline CPU scheduler but the staircase one, which might have more levels to change...
(superstoned, Fri, 19 Jan 2007 07:50:00 GMT)

memory usage
http://www.osnews.com/thread?203426
It may look strange that the reviewer went as far as using a November 8th CVS version of one indexer but a several-months-old release of Beagle (even though a Beagle release came out on November 1st). It may not be malice, just ignorance on his part.
Anyway, Beagle received many memory-usage fixes in the releases following 0.2.7, and some more even after November 1st: Beagle's actual memory usage is much lower now.
(lupus, Fri, 19 Jan 2007 08:59:00 GMT)

RE[3]: not perfect but still nice
http://www.osnews.com/thread?203427
Indeed. When indexing, you want 100% CPU; anything less is just wasting CPU cycles... And of course, when indexing is done, CPU and memory usage should be as low as possible (and AFAIK most indexers get that right...).
(superstoned, Fri, 19 Jan 2007 09:03:00 GMT)

RE[3]: not perfect but still nice
http://www.osnews.com/thread?203523
aRts also sucked.
(rayiner, Fri, 19 Jan 2007 18:09:00 GMT)

RE[4]: not perfect but still nice
http://www.osnews.com/thread?203531
You don't always want 100% CPU.
Here are examples:

--
Migi
(migi, Fri, 19 Jan 2007 18:28:00 GMT)

RE[4]: not perfect but still nice
http://www.osnews.com/thread?203883
Yeah, ESound was great... Now, after aRts has been unmaintained for 3 years, GStreamer is just a little better. OK, 0.10 is seriously better, but hey, it's beating 5-year-old tech...
(superstoned, Sat, 20 Jan 2007 21:53:00 GMT)

RE[5]: not perfect but still nice
http://www.osnews.com/thread?203885
Yeah, I read that; it's my own discussion with their developers.

And I think you should read the rest of the discussion...
(superstoned, Sat, 20 Jan 2007 21:54:00 GMT)