For users who want a better understanding of how the Kat desktop search program works, Roberto Cappuccio explains the inner workings of Kat, the difficulties encountered during development, and the future of this long-awaited (and still under heavy development) piece of software in the article Busy Kat in Linux Magazine.

Comments

A KAT-io-slave would take the project a _big_ step forward. The benefits of an io-slave can be seen in the demo of kio-clucene (http://kioclucene.objectis.net/). It's an easy but very powerful way to embed the desktop search into the desktop.
I hope KAT's development moves forward at the same speed as in the last few months - keep up the good work :-)

I don't remember where I read this, but won't KDE4 use postgresql to store metadata information? I realize this is a minor detail, but I was quite happy about that choice, provided this will be transparent for the user (no need to set up pgsql manually: create a default db for each user, and ask for a password once when you install KDE?).

The new architecture of Kat (the one which will be published with Kat 0.7.0, codename Lilith) is based on plugins and is therefore fully expandable.
You can provide plugins for both the Repositories (the storage layer, for example SQLite3, PostgreSQL, Lucene or even Reiser4 or XML...) and the Spaces (the information layer, for example FileSystem which indexes files, Communication which indexes emails and contacts, or Links which indexes the connections that hold between objects of the other spaces).

So, if KDE incorporates a metadata layer, it will be easy to build a Kat Repository plugin for it.
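
To make the Repository/Space split a bit more concrete, here is a minimal sketch in Python. It is not Kat's actual API (Kat is a C++/KDE project); the class names, method names and the in-memory backend are all hypothetical, and only illustrate how a storage plugin and an information plugin could be combined freely.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Storage layer plugin (hypothetical interface, not Kat's real one)."""

    @abstractmethod
    def store(self, space, key, text):
        """Save one indexed item coming from the given space."""

    @abstractmethod
    def search(self, word):
        """Return (space, key) pairs whose text contains the word."""


class Space(ABC):
    """Information layer plugin (hypothetical interface)."""

    name = "Unnamed"

    @abstractmethod
    def items(self):
        """Return (key, text) pairs that should be indexed."""


class MemoryRepository(Repository):
    """Toy backend standing in for SQLite3, PostgreSQL, Lucene, ..."""

    def __init__(self):
        self._records = []

    def store(self, space, key, text):
        self._records.append((space, key, text))

    def search(self, word):
        needle = word.lower()
        return [(s, k) for s, k, t in self._records if needle in t.lower()]


class FileSystemSpace(Space):
    """Toy space; takes a dict of path -> text instead of reading the disk."""

    name = "FileSystem"

    def __init__(self, files):
        self._files = files

    def items(self):
        return list(self._files.items())


def index(space, repo):
    """Feed every item of a space into a repository."""
    for key, text in space.items():
        repo.store(space.name, key, text)


repo = MemoryRepository()
index(FileSystemSpace({"/home/u/a.txt": "Desktop search",
                       "/home/u/b.txt": "nothing here"}), repo)
print(repo.search("search"))
```

Because `index` only talks to the two abstract interfaces, swapping the toy backend for a different storage plugin would not touch the space code, which is the point of the design described above.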

It works perfectly for me on KDE 3.5. Most likely you have a build of KPDF that uses the poppler PDF library. The rendering quality depends on the version of poppler you are using: older versions are not particularly good and are known to have problems.

Try a newer version of freetype. I had a similar problem with another PDF file, and the nice guys over on the kpdf team pointed me in the right direction when I filed a bug. An updated version of freetype fixed the problem.

For the moment Kat creates a repository for each user. We are planning to add the possibility to share the entire repository or single information spaces.
If you want to learn more about what repositories and information spaces are in Kat, please read the online API documentation at: http://kat.mandriva.com/apidox/
The documentation is under development but you will find some interesting information about Kat::Repository and Kat::Space.

I like the basic concept of indexing the contents of your data, but in most of my Linux installs, the home directories of users are on a network drive. Keeping a copy of all the indexed files in a database (I'm assuming this will also be located in the home directory, but I couldn't determine that from the article) seems like a huge overhead on the network and server disk space!

I could imagine that the index and cache are actually managed by a central service running on the network, not under the user's administration, but by the system admin. If it is not central or outside the network drive space, I would (as a sysadmin) have to disable the Kat functionality entirely (which I will do when we switch to Mandriva 2006.x).

Good observation. For situations like the one you describe, we are planning to suggest the use of a centralized database (PostgreSQL, MySQL, MS SQL Server, whatever) running on a central machine.
Every user will have his own repository and will only have access to it and to the repositories that other users mark as shared.
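
The access rule just described (your own repository plus whatever others have shared) could be sketched like this. The `Repo` record and `visible_repos` helper are made up for illustration and say nothing about how Kat will actually store ownership or sharing flags.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    owner: str            # user who owns the repository
    shared: bool = False  # True once the owner marks it as shared


def visible_repos(user, repos):
    """Repositories a user may search: their own plus shared ones."""
    return [r for r in repos if r.owner == user or r.shared]


repos = [Repo("alice"), Repo("bob", shared=True), Repo("carol")]
print([r.owner for r in visible_repos("alice", repos)])
```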

Yep, Tenor is the one I want to know about. I don't know why there's so much fuss about Beagle and Kat, or why GNOME's Dashboard project seems to have died. That was really astounding in its utility -- a true killer app. At this rate, Windows will have it (I believe they're working on it for Vista) before the Free Software community gets it from the drawing board and obscure projects to the everyday users' desktops. That's a real tragedy, since the kind of integration needed for Dashboard/Tenor is something open source patches should enable easily.

But, to answer your question from what little I've heard... Tenor is coming in KDE 4, and the Kat folks are working with them. I think Tenor is going to be another backend for Kat. Someone mentioned those backends above. Personally, I'm hoping it really focuses on Tenor technology, rather than watering down the possibilities of Tenor for a generic search system. After all, with no disrespect to the Kat team, Kat's own technology isn't really much more than find or grep.

I really wanted to like Kat, in the absence of something like Tenor, but I'm sad to say that Kat was basically useless for me. When indexing things, it doesn't really do anything except say that it found it, and in what file. That's really not good enough, if you're trying to, say, search irc conversations for a discussion that happened five hours in, when you only remember a few keywords. Likewise, when I search PDFs for text, I don't want to just know that the text is in that file *somewhere*.

For IRC, it would need to display READABLE context, preferably in the normal IRC log format, and have an "Open" or "View" or maybe even a plugin-aware button like "View discussion", which opens the appropriate app in a highly-integrated way.

So, for instance, when I find an IRC log, clicking View might bring it up in Kopete's log viewer, already centered on that first conversation, and automatically jumping to the right place if I select another IRC search hit.

Likewise, if I select a PDF, I need it to open at the page that actually has that text. Otherwise, I just know that the 500-page PDF contains the phrase "secure network infrastructure". Which, honestly, I probably already knew.

I don't need a file search tool. "find" and "grep" do that. I need an information search tool, that brings up what I ask for, ready to use.

Please don't take this as criticism. I'm not trying to insult Kat -- nor to demand things. I would help if I could. I'm just hoping you can make it fit my needs, and maybe give us all a really great tool that will make KDE even better :)

I like criticism, especially when it is constructive, and your post points out some of the most frequently criticized features (or the lack of them) of the current version of Kat.

The new version, on which I'm currently working, addresses most of them. In particular, it will feature the Google-like "two lines preview" that shows 2 (or more) lines of the text where the searched words are highlighted.
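
As a rough idea of what such a preview involves, here is a small Python sketch. It is not Kat's code: the function name, the `*` marker used for highlighting and the line-based approach are all assumptions made for the example.

```python
import re


def preview(text, words, max_lines=2, mark="*"):
    """Return up to max_lines lines containing any search word,
    with each hit wrapped in the marker (case-insensitive)."""
    if not words:
        return []
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    hits = []
    for line in text.splitlines():
        if pattern.search(line):
            hits.append(pattern.sub(lambda m: mark + m.group(0) + mark,
                                    line.strip()))
            if len(hits) == max_lines:
                break
    return hits


doc = ("Introduction\n"
       "A secure network is needed.\n"
       "Filler line\n"
       "Network infrastructure costs.\n"
       "More on secure setups")
for line in preview(doc, ["secure", "network"]):
    print(line)
```

A real implementation would more likely cut a window of characters around each hit rather than whole lines, since many document formats have no meaningful line breaks.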

To open a PDF right at the page where the words have been found, well, it depends on both Kat and the PDF viewer. We can issue a command like "open that file at page x" but then it is up to the PDF viewer to actually show that page.

For this kind of thing we will need collaboration from the authors of the applications.

> I don't need a file search tool. "find" and "grep" do that.

Well, if you really think that, continue to use them, but, as I said a thousand times on other forums, it won't work for the vast majority of file formats (like PDF, PS, XLS and the like) because they don't contain clear text.

Then, if the documents are saved in an encoding which is not the one you use on your command line (probably UTF-8), you will not find anything at all.
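
The encoding problem is easy to reproduce. In this small Python illustration (the sample sentence and helper names are made up), a byte-level search for a UTF-8 term misses a Latin-1 document, while decoding the document first finds it.

```python
def raw_match(term, data):
    """Byte-level search, like grep run from a UTF-8 terminal."""
    return term.encode("utf-8") in data


def decoded_match(term, data, encoding):
    """Decode the document with its real encoding, then search the text."""
    return term in data.decode(encoding)


# The same sentence as it would be stored by a Latin-1 application.
document = "Le café est fermé".encode("latin-1")

print(raw_match("café", document))                 # False: 'é' is 2 bytes in UTF-8, 1 in Latin-1
print(decoded_match("café", document, "latin-1"))  # True
```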

Moreover, if you have Gigabytes of documents to search, "find" and "grep" will take hours to give you what you need.

So, please don't say that "find" and "grep" are equivalent to Kat. They aren't.
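
To illustrate the speed argument: a scanning tool reads every file on every query, whereas an indexer reads each file once and then answers queries with a lookup. The following Python sketch of an inverted index is only a toy under that assumption, not how Kat stores its data.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of documents containing it.
    This is the one-time cost, paid at indexing time."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, word):
    """Answer a query without re-reading any document."""
    return sorted(index.get(word.lower(), set()))


docs = {
    "a.txt": "Secure network infrastructure",
    "b.txt": "Network cables and switches",
    "c.txt": "Nothing relevant",
}
idx = build_index(docs)
print(lookup(idx, "network"))
```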

You mention that with the current interface you can only search for a single word, but you plan on adding more advanced capabilities like AND, NOT, et cetera, later on. How about using the same sort of syntax for this as Google has? It has the advantage of being relatively simple, and probably the most well known (or should I rather say, the least obscure) -- out of the people out there who know any kind of search syntax at all and aren't programmers, this is probably your best bet. I've already added support to amaroK, as well :)

Basically, words are ANDed by default unless you put an OR between them yourself, to exclude something you put a - before it, use ""s to match exact phrases, and you can search in specific fields/attributes with field:word. GMail has basically the same thing, but you can use parentheses to group things as well.

For example, bats OR "flying mice" -baseball site:wikipedia.org
would search Wikipedia for either bats or "flying mice", but not for baseball bats.
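
For what it's worth, parsing this kind of syntax does not take much. Below is a minimal Python sketch, not the amaroK or Google implementation: the function name and the result layout are my own, parentheses are not handled, and a leading - on a field:word term is simply ignored.

```python
import re

# optional '-', optional 'field:', then a "quoted phrase" or a bare word
TOKEN = re.compile(r'(-?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(query):
    required = []   # list of OR-groups; every group must match (implicit AND)
    excluded = []   # terms prefixed with '-'
    fields = {}     # field:word restrictions
    pending_or = False
    for neg, field, phrase, word in TOKEN.findall(query):
        value = phrase or word
        if not value:
            continue
        # A bare, unquoted OR joins the next term to the previous group.
        if word == "OR" and not neg and not field:
            pending_or = True
            continue
        if field:
            fields[field] = value
        elif neg:
            excluded.append(value)
        elif pending_or and required:
            required[-1].append(value)
        else:
            required.append([value])
        pending_or = False
    return {"required": required, "excluded": excluded, "fields": fields}


print(parse_query('bats OR "flying mice" -baseball site:wikipedia.org'))
```

For the example query this yields one OR-group containing bats and "flying mice", one excluded term (baseball) and one field restriction (site).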