After a short delay due to a heavy dosage of Real Life(tm), I return to bring you more on the technologies behind KDE 4. This week I am featuring Strigi, an information extraction subsystem that is being fully deployed for KDE 4.0. KDE has previously had the ability to extract information about files of various types, and has used it in a variety of functional contexts, such as the Properties Dialog. Strigi promises many improvements over the existing implementation. Read on for more...

Strigi is a library that sits at a lower level than KDE. It is written in C++, and is designed to present a series of generic calls that a program can use to find more information about a given file or files. It is in no way tied to KDE except that the development version lives in KDE's SVN repository. It also has search capabilities, which are not really the focus of this article.

The Strigi libraries are used to get information from within files, such as the dimensions of an image, the length of an audio clip, embedded thumbnails, the number of lines in a log, or source code licensing info, or just to search a text file for a given string. Strigi has other advantages, as it can work seamlessly inside compressed files, archives, and so forth. In fact, it ships a few useful utility programs, called deepgrep and deepfind. These command line programs allow you to search for information within binary file formats as easily as using grep or find on plain text files. KDE is inheriting the same libraries, so we also get this unique advantage of being able to pull information out of files that are buried within binary formats, such as .tgz files.

There is a prototype kio_jstreams powered by Strigi that treats archives like local folders, allowing you to visit /home/user/tarball.tar.gz/icons/ for example. This works great when you are using solely KDE integrated applications, but there are currently problems when mixing with other programs. For example, if you're browsing with Konq, click on a file within a tarball, and want to open it in the Gimp, passing that sort of path would obviously break the Gimp. So for the time being, this mode of operation is available as an experimental ioslave only, and will remain so until these sorts of problems are solved. (The other problem is making a tgz or odp file behave as both a file and a directory simultaneously.)
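To give a feel for what "searching inside archives" means, here is a minimal sketch in Python. It is not how Strigi or deepgrep are implemented; it only uses the standard tarfile module to show the idea of descending into nested archives and reporting hits with a folder-like path. The function name deep_grep and the path layout are made up for this illustration.

```python
import io
import tarfile


def deep_grep(data, needle, path):
    """Yield virtual paths of all entries under `path` that contain `needle`.

    `data` holds the raw bytes of a file. If those bytes form a tar archive
    (plain or compressed), every regular member is searched recursively;
    otherwise the bytes themselves are searched.
    """
    try:
        archive = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
    except tarfile.ReadError:
        # Not an archive: treat it as a leaf file and search its bytes.
        if needle in data:
            yield path
        return
    with archive:
        for member in archive:
            if member.isfile():
                inner = archive.extractfile(member).read()
                # Report hits as if the archive were a directory.
                yield from deep_grep(inner, needle, f"{path}/{member.name}")
```

Called on the bytes of a tarball that itself contains another tarball, this would report hits with paths of the form outer.tar.gz/inner.tar/file.txt, which is the same "archive as folder" view described above. A real implementation would stream the data instead of loading each file into memory.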

There are many useful ways that Strigi can return data, once a query has been performed. For example, Jos notes: "The program xmlindexer is useful for extracting data from files in a very efficient manner. Because it outputs xml, it is easy to use from any program. Other search projects such as Beagle and Tracker would greatly benefit from using xmlindexer." The xmlindexer program is a binary, so programs can easily call it externally without having to link to Qt or use C++. That said, there are many ways to directly use the Strigi libraries...
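As a rough illustration of calling xmlindexer externally, here is a small Python sketch. It rests on two assumptions that should be checked against the installed version: that xmlindexer accepts a path as its only argument and writes XML to standard output, and that the output consists of file elements carrying a uri attribute, each with value children carrying a name attribute. The function names are invented for this example.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_index_xml(xml_text):
    """Map each file uri to a dict of {field name: [values]}.

    Assumed layout (not verified against a real xmlindexer run):
    <metadata><file uri="..."><value name="...">...</value></file></metadata>
    """
    result = {}
    for file_el in ET.fromstring(xml_text).iter("file"):
        fields = {}
        for value_el in file_el.iter("value"):
            # A field may occur several times, so collect values in a list.
            fields.setdefault(value_el.get("name"), []).append(value_el.text or "")
        result[file_el.get("uri")] = fields
    return result


def index_path(path, command="xmlindexer"):
    """Run the indexer on `path` and return the parsed metadata.

    Assumes `command path` prints the XML to standard output.
    """
    completed = subprocess.run([command, path], capture_output=True, check=True)
    return parse_index_xml(completed.stdout)
```

Because the only interface is a process and a stream of XML, the same approach would work from any language that can spawn a program and parse XML.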

The KDE libraries have had methods of extracting information (such as metadata via KFileMetaInfo) from files before, but in many cases they were either slow, or of limited functionality. With Strigi, we have seen as much as a several-fold increase in speed for extracting data from PNG files. I am not aware of any other speed tests actually being performed, but the general impression is that it is much faster at retrieving file data than most of the previously existing methods.

In KDE, there are not really any good screenshots to show Strigi in action, as it's really just a library. That's not to say that its effects will be invisible though, as things like the File Properties dialogs are already taking advantage of the Strigi backend to pull the data that was previously provided by KFileMetaInfo. Strigi is also planned to be used (or is already in use in some cases) for thumbnails and other metadata displayed in the file browsers, and preliminary results show massive speed improvements. But so far, this has had little effect on the actual KDE experience for the end user, at least in a visual sense. However, as more KDE subsystems become aware of Strigi, we should start to see more unique and useful applications of all the features that Strigi supports.

For example, one of the biggest beneficiaries of the Strigi work is NEPOMUK. According to Jos: "Nepomuk is a big European research project on enhancing computer applications to make them semantic and connected. Nepomuk-KDE is the work on a KDE implementation of the standards and ideas that come out of that project. I work together with the people of Nepomuk and especially Sebastian Trueg of Nepomuk-KDE to make sure our work fits together. At the moment Sebastian is writing [an] additional index implementation for Strigi that is better able to work with semantic data." This project uses a lot of metadata and other file contents (like the text of IRC logs, for example) to provide an easy-to-use search system for the desktop. NEPOMUK will undergo a name change before its final implementation is set.

So while Strigi does the actual digging through the data, other applications such as Dolphin/Konqueror, the File Properties Dialog or NEPOMUK are the ones that will see the fruits of this work. At the moment, however, work is mostly focused on porting the previously existing KFilePlugins to use the new backend classes. For status reports on this effort, check out the Porting KFilePlugins Progress page on the KDE wiki.

To learn more about Strigi, visit the website or join #strigi on irc.kde.org.

Comments

Thank you Troy for another great article about interesting technology behind KDE 4. It has become a habit to read this series every week, and I was very happy to see this new article about Strigi today.

Yeah, I actually moved during the break here, and haven't had a net connection until recently... no net means no SVN builds... therefore I had to choose a more abstract topic. Thankfully Jos was very helpful in answering all the questions I had while preparing the article :)

I can't guarantee it'll be weekly articles (at least for the next few weeks as I'm now in my exam block at the uni) but I'll try to keep 'em rolling...

I do see that I should be able to use tar://path/to/tarfile and file://path/to/tarfile, and it sure would be nice if there were a way to get from one to the other by means of a double-click or an open action in Dolphin in KDE 4.

If done correctly, going "up" in the file browser Dolphin would switch to the file:// protocol again.

Unsolved, forever, is the nesting of IO-Slaves, isn't it? What if I want to use the tar IO-slave over ssh? fish://path/to/tarfile can't be browsed with tar://, can it? The chaining of IO-Slaves would be nice.

Strigi reads files as a stream. This fits very well with kioslaves. On its own, Strigi can read embedded files from other files. To read files embedded in other files that are read over a network protocol like ftp is a bit more tricky. There would need to be a way to really nest kioslaves to make that possible.

I don't get it. I can do this stuff already without using Strigi on KDE 3? It sounds like nothing new at all. KIO slaves have rocked for a long time, and being able to navigate tars and other archives right from your trusted Konq is a very old feature. Works like a charm.

Yes and no. You can do some of what Strigi does in KDE 3, but it's slower than using Strigi, and you can't extract the same detail of information (the infrastructure is not there). For example, you can navigate a tarball in KDE 3, but you can't pull embedded thumbnails out of images when browsing within a tarball... Strigi can do that, and fast...

I used Dolphin (the IO-Slave shell) to specifically express that I regard the job as different from what Konqueror (the IO-Slave and KPart shell) would do, like using a KPart for tar and an IO-Slave for the ssh.

That said, I didn't intend to bring up the pointless debate. Which is btw mostly pointless because it's about a decision already made, and only about people not understanding what it is.

And other than that, the debate is by itself not bad. I think it helps to show the developers that the "people" (the mass of slightly informed users) really appreciate Konqueror and want it to stay and want to see continuation of this success story.

So please don't police my use of "Dolphin". While I feel that it was not well communicated by the developers, I do understand and appreciate its role. And my wish is exactly a point where Konqueror and Dolphin should behave differently. In Konqueror, browsing a tar should open a complex KPart with all the details, and in Dolphin it should be like browsing the files inside the tar.

konqu may load Ark as a kpart (which translates to "embed ark in konqu")

dolphin can _not_ load any kpart, so it's limited to the one file-browsing interface hardcoded into dolphin. Still it uses the same kio-slaves like konqu or all other kde-apps can use (making it possible to dive into a tgz-file e.g.)

> For example, if you're browsing with Konq, and click on a
> file within a tarball, and you want to open it in the Gimp,
> well passing that sort of directory would obviously break the
> Gimp.

You can easily view all KIO slaves in non-KDE apps (such as GIMP, Firefox, OpenOffice, even commandline utilities) through KIO-FUSE. It works by mounting remote locations (or tar archives, in your example) into the root filesystem hierarchy:

I can already use OpenOffice today to open files via IO-Slaves. It just takes a temporary file, created behind my back. And why not monitor that file for changes and push these back to the IO-Slave where it came from?

Admittedly a lame work-around, but with inotify it's going to work nicely.

> I already today can use OpenOffice to open files via IO-Slaves.
> It just takes a temporary file, created behind my back. And
> why not monitor that file for changes and push these back to
> the IO-Slave where it came from?

Because when OpenOffice crashes or misbehaves it leaves your /tmp directory with Gigs of orphaned temporary files.

It's a pain to make OpenOffice and other non-KDE applications aware of IO slaves, and it's outright impossible to do so in closed-source apps. With FUSE, they don't have to be recompiled or modified at all - they see remote files as normal local files. So it's great for backward compatibility.

This already works in KIO: it CAN create a temporary file, monitor it for changes, and upload the changed file back to the original location. I agree FUSE is cool, but it's not the cross-platform solution we need.
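The "download, work locally, upload" round trip discussed in this thread can be sketched roughly as follows. This is not KIO's actual code, just a minimal Python illustration under stated assumptions: fetch and store are hypothetical callbacks standing in for the ioslave, and edit stands in for the external application working on the local copy.

```python
import os
import tempfile


def edit_via_temp_copy(fetch, store, edit, suffix=""):
    """Let `edit` work on a local copy and upload it only if it changed.

    fetch() -> bytes   returns the remote content         (hypothetical callback)
    store(bytes)       writes content back to the remote  (hypothetical callback)
    edit(path)         stands in for the external application
    Returns True if the remote copy was updated.
    """
    original = fetch()
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(original)
        edit(path)  # blocks until the application is done with the file
        with open(path, "rb") as handle:
            updated = handle.read()
        changed = updated != original
        if changed:
            store(updated)
        return changed
    finally:
        # Clean up even if the application fails, so nothing is orphaned.
        os.remove(path)
```

The sketch only compares the content once the application returns. Watching the file while it is still open (for example with inotify, as suggested above) would be needed to upload intermediate saves.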

Getting the KIO-Slaves to work on the other platforms should be much easier than getting the applications. Especially on OS X where the primary differences between OS X and FreeBSD (or OS X vs Darwin+X) exist only with regards to the GUI.

I really like where KDE and its libraries are going. Except for the ioslaves.

It is becoming very obvious that ioslaves do not belong in kde/gnome/openoffice, but at least one layer deeper than that. They should be part of linux, available everywhere, so that commands like "less http://kdenews.org" become possible.

I guess that there is some good reason why this has never happened, but the current state really sucks!

The main problem is that it gets a lot more difficult to implement at a deeper layer. Even now ioslaves have some complex issues to solve.

One of them is that most interesting ioslaves may require user interaction.
Let's say "less http://kdenews.org" actually works but the given URL requires authentication. "less" obviously can't handle that, so what will?

What about progress bars? Even small files may take a long time to access. And in many cases "download, work locally, upload" is the only reasonable approach to work with a file.

It would just need a deeper interaction between a generic library and the above DE / applications. Even the mentioned interactions can be done.

For sure it is a huge task to get it done in a way that every DE / app accepts, and the flexibility to add new IO-slaves or adjust current ones from one KDE version to the next would be gone. I.e. it needs to be at least pure LGPL, so it cannot be done with Qt.

>Yes, it needs kernel support from BSD, Solaris, HP-UX, Linux and Win32.
If this were really the case, then KDE would not be possible at all on these platforms.
If you can do it within KDE, then you can do it separately as well.

it's possible in kde on all platforms because kde is programmed to do it on all platforms. sure, you can do this with every application, but that means you need to change the code. that's the problem - you just can't change the code of all applications for half a dozen platforms...

so if you want something like kio that works with all applications, you need kernel support.

For sure some support is in the kernel, as some basic network and file support is there for other reasons, but why on earth does a kernel need knowledge about IMAP or EXIF?
The kernel doesn't need to know all these different things.

What you want is a functionality available on Linux platforms and this can be done with a library as well.
Are you worried that it cannot be used in bash? For sure, bash would then need to be adjusted as well.

Yes, we can't change the code of all applications on all platforms. But if you keep it in KDE alone and in Gnome alone and in other Apps, it for sure will never happen.

just putting ioslaves into another library will not do anything. no one is going to rewrite their software just because some geeks like to really use networks and such.

the only way to make ioslaves fully work for non-kde software now (not in 10 years...) is to use kernel extensions. not because the kernel needs to know about imap or whatever, but because it's the only common api for filesystem access that every application uses. that's what fuse is about, to make things like that easier.

rewriting the ioslaves in pure c, with a minimum of dependencies - so that maybe someone else would use it, is not a good idea. the kde devs already have enough work ahead. and you still won't get less-over-http anytime soon, because these apps aren't maintained anyway.

With so many changes for KDE 4 I wonder how long will it take to make KDE 4 stable/usable.
I stopped using Konqueror because it crashes a lot when dealing with embedded multimedia. For a couple of months now, Kaffeine has been crashing when I open the playlist while a video is running... I hope we get a stable KDE 4 before jumping to an even cooler KDE 5.

all in all, that nasty crash when viewing embedded videos with kaffeine (which i personally don't like (ohh, see how lame it is)) was not konqueror's fault but a misuse of Xlib with regard to multithreading.

Nice of them to have fixed it. Unfortunately I only use Kaffeine 0.4.3. After that, I hate the UI. (Old UI: http://img64.imageshack.us/my.php?image=kaffeine7gu.png ) To prevent the crashes, a small patch of Konqueror is sufficient. That's obviously not the "right" answer, fixing Kaffeine was, but this is nice for those of us using older versions.

The differences amount to
+#include
and
+ XInitThreads(); // fix for kaffeine

(patch file attached)

Again, not so useful now that Kaffeine is fixed, but it allows me to continue using my preferred 0.4.3

Hmmm... That may be caused by poor packaging (if you used binary packages) or extreme CXXFLAGS and LDFLAGS (if you compiled KDE from source). Also, if many other programs start segfaulting for no apparent reason, check your computer - this may be a hardware failure (faulty cooling, dust, bad capacitors etc) -- but this is less likely.

Everyone is switching to KPlayer, it seems. It's so much more stable, and the default interface is simpler and easier to use. But it also lets you get under the hood and tweak things the way you like them. And it has a multimedia library to organize your stuff. And it uses MPlayer, which means more compatibility with formats, codecs, and so on.