The Amarok Squad is proud to announce the second beta release of Amarok 2.0. This release includes a lot of bug fixes and improvements, including the switch from SQLite to MySQL-embedded. The LibriVox service is back, as is lyrics support. Please read the release announcement for a detailed list of changes since the release of Beta 1 and more.

The most significant change in this release is the switch from SQLite to MySQL-embedded as the database backend. MySQL-embedded gives us the performance of the popular MySQL database while avoiding the non-trivial configuration that comes with a standalone server. Most noticeably, you will see much improved performance of collection scanning and searching, especially with very large music collections.

More generally, we have fixed a huge number of bugs, bringing Amarok 2 one step closer to the stability you would expect for your daily music needs. While it is not yet of stable quality, we encourage our users to test this release and continue submitting bug reports. Already this beta version is good enough for daily use in many cases.

We hope you enjoy this release which we have worked so hard to produce. Stay tuned for upcoming releases, and please do not forget to donate for Roktober!

Yah but with a relational engine you would still have to parse all the metadata before inserting it right?

Besides, no, I don't want to wait 10 seconds and I never said I would, but you keep ignoring the question I'm trying to ask, which is how many files it takes for MySQL to make a difference as opposed to a good algorithm.

I mean MySQL doesn't do magic, it just implements good searching and retrieval algorithms in a fast yet generic way, and I am confused why we can't implement good searching and retrieval algorithms in the much simpler case where we don't have to have the genericity of an RDBMS. Plus it'd be even faster because of no SQL overhead.

Sorry if I'm being bothersome with this. As a whole I think KDE is great, but this issue in particular smells a lot like people trying to rationalize bad design after the fact, and that'd be a pity for such a project as KDE. I mean back in the day it was GNOME that made the poor choices (such as CORBA for interprocess communication) rather than implement the correct solution themselves (DCOP, in the case of this comparison), right?

So if MySQL implements good searching and retrieval algorithms in a fast way, and provides them as a library to embed in your application, why not use it instead of implementing the same thing again? A lot of work went into MySQL to make it fast and powerful, plus you don't need to maintain it.

Oh because MySQL implements a lot of additional stuff such as SQL parser and network layer and the whole database engine independence system and query planner (and besides the database engine independence is the reason why the MySQL query planner is not as good as other databases because it has to remain generic across them all, and that's one more reason why too much genericity leads to more overhead). Kinda like using CORBA for communication between local processes indeed, it works and that's not the problem, the problem is the overhead. You are giving me the not reinventing the wheel argument which does make sense, except when you're trying to fit truck wheels on your sports car because that's all you have at hand.

They are using MySQL embedded, which removes much of the multi-user/network stuff (which is why it's called embedded).

This kind of metadata management is the RIGHT application for SQL. Manually recoding every possible kind of query that Amarok needs, when MySQL already does this, would be utterly silly. Especially since it _does_ have things like the query planner built in, so that it can take into account dynamic things like indexes, table sizes, join path optimization etc, which _could_ be reimplemented manually but with little noticeable performance benefit, and huge opportunity cost (I'd rather have the Amarok developers spend their limited time on Amarok than duplicating RDBMS features - the MySQL devs do that full time already). Not to mention that db versioning between releases of Amarok would be much easier with an SQL backend than some ad-hoc db implementation.

The query planner overhead you are talking about is irrelevant btw - for simple queries (one or two tables) it will be negligible, and if it is a complex query, it will often do a better job than someone who isn't a DBA coding it by hand.

Thank you for actually addressing my questions, it is good of you to do so.

I can see how MySQL embedded would be way better than the server version, yes, so that's less of a concern, thank you for this detail.

About the query planner and complex queries though, this is a music player, what kind of complex queries do you think would be running? It may be there is something I'm missing there, mind you, so this is an honest question.

I am pleased that you said "why we can't implement..." because I am interested in reading about your proposed alternative DB design for Amarok. Amarok requires dozens of different storage, searching, sorting and filtering queries across many different data types, so I'd be interested in seeing how you solve this problem while still allowing plugins and future versions of Amarok to perform additional queries without causing re-programming of the db algorithms, and taking into account data size and distribution like the MySQL query engine does.

And I am genuinely interested btw, because this is something that has been kicking around in my head since JuK implemented its own custom database for all of this. I am (passively) for more SQL use within KDE, so a good analysis of this case would be very useful fodder for debate.

Some possible avenues for exploration comparing JuK's db to Amarok's - is there a large delta in performance across all query types? Particular query types? At what point is this affected by database size? How about ad-hoc queries - are both able to handle future query requirements? How about versioning and updates? On-disk size? Reliability and fault-tolerance (ACID)?

Thank you for bringing up actual implementation requirements and not just generalities such as speed, because it helps make the issue clearer.

The way I was seeing the problem domain, it was essentially about sorting and filtering a medium-size set of data, because I don't think many users have more than, say, a few thousand songs at most? Is this correct? I'm going to work based on that from here down so please keep this in mind, thanks.

Because if so a simple tabular binary representation of data would work fine, that's actually how RDBMS engines work under the hood. By this I mean binary files with fixed-sized records (so you can scan sequentially with a fixed increment which is very fast).
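To make the fixed-size record idea concrete, here is a minimal sketch in Python. The record layout (field names and widths) is purely an assumption for illustration, not anything Amarok or JuK actually uses, and a real implementation would be in C++ anyway.

```python
import struct

# Hypothetical layout: 64-byte title, 32-byte artist, 32-byte album and a
# 2-byte track number. Strings are UTF-8 and NUL-padded to a fixed width.
RECORD = struct.Struct("<64s32s32sH")

def pack_song(title, artist, album, track):
    """Encode one song as a fixed-size record (over-long strings are truncated)."""
    return RECORD.pack(title.encode("utf-8"), artist.encode("utf-8"),
                       album.encode("utf-8"), track)

def unpack_song(raw):
    """Decode one fixed-size record back into a tuple of Python values."""
    title, artist, album, track = RECORD.unpack(raw)
    def text(field):
        return field.rstrip(b"\0").decode("utf-8", errors="ignore")
    return text(title), text(artist), text(album), track

def scan(table, needle):
    """Walk the table with a fixed increment and yield songs whose title contains needle."""
    key = needle.encode("utf-8")
    for offset in range(0, len(table), RECORD.size):
        record = table[offset:offset + RECORD.size]
        if key in record[:64]:  # the title occupies the first 64 bytes
            yield unpack_song(record)

table = b"".join([
    pack_song("Blue in Green", "Miles Davis", "Kind of Blue", 3),
    pack_song("So What", "Miles Davis", "Kind of Blue", 1),
    pack_song("Blue Train", "John Coltrane", "Blue Train", 1),
])
matches = list(scan(table, "Blue"))
```

Because every record has the same size, record number n always starts at byte n * RECORD.size, which is what makes the sequential scan (and random access by record number) cheap.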

Actually a RDBMS adds a lot of layers around such as indexes, which are depending on the implementation b-trees or hashes pointing to offsets in the tabular data, and of course JOINs and the query planner for non trivial queries, and in some cases collation (well for MySQL anyway because the support for collations in Postgres is kinda weak).

What I was thinking is that in a number of cases the query planner ends up deciding to not use indexes at all and scans the tabular data instead, it depends on how expensive index lookup is as opposed to scanning sequentially (which itself depends on the cost of loading a page of tabular data). I refer you to the Postgres documentation for EXPLAIN ANALYZE which does a good job of speaking about this.

So in the case of a list of song files (if there are at most a few thousand entries for instance) it's really not obvious that a good query planner would decide to use the index, or that the index if used would make a huge difference in speed. Plus in the case of text search (by fragments of title for instance) there will be no index used of course.
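The point about fragment searches not using an index can be checked with any engine that exposes its plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in (not MySQL or Postgres, whose planners differ), with a hypothetical songs table and index name. It only illustrates the text-fragment case; it does not show a planner choosing a scan on cost grounds for a small table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
conn.execute("CREATE INDEX idx_title ON songs (title)")
conn.executemany(
    "INSERT INTO songs (title, artist) VALUES (?, ?)",
    [(f"Song {i}", f"Artist {i % 50}") for i in range(2000)],
)

def plan(sql):
    """Return the planner's own description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

# Exact match: the planner can look the title up in the index.
exact_plan = plan("SELECT * FROM songs WHERE title = 'Song 42'")
# Fragment match with a leading wildcard: nothing to look up, so it scans.
fragment_plan = plan("SELECT * FROM songs WHERE title LIKE '%ong 4%'")

print(exact_plan)
print(fragment_plan)
```

The exact wording of the plan text varies between SQLite versions, but the exact-match query should be reported as a SEARCH using idx_title and the fragment query as a SCAN.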

So a simple tabular representation might work very well for this problem domain.

There is the problem of collations (which we will probably want to use for searching and sorting, for instance Gödel should appear alongside Goedel), which I think is what the QString::localeAwareCompare method exists to solve, although there might be too much overhead for this. In such a case I've at times used an algorithm where I store a normalized version of strings (meaning all lowercase, spaces simplified and accents removed) to search and order based on that, which works well but has the drawback of doubling the storage space required for a given string.
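A rough sketch of that normalization trick in Python, using only the standard unicodedata module (this is an illustration, not the code I actually used). Note that plain accent stripping turns "Gödel" into "godel", so it sorts next to "goedel" but does not match it exactly; that would need a language-specific transliteration rule which is not shown here.

```python
import unicodedata

def normalize(text):
    """Lowercase, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

titles = ["  Gödel,   Escher, Bach", "Éire", "goedel variations", "Zebra"]

# Store the normalized key next to the original string: this is the
# "double the storage" cost mentioned above.
index = [(normalize(title), title) for title in titles]

def search(fragment):
    """Return original titles whose normalized form contains the normalized fragment."""
    key = normalize(fragment)
    return [original for norm, original in index if key in norm]

# Ordering by the normalized key keeps accented and plain spellings together.
ordered = [original for norm, original in sorted(index)]
```

Searching and sorting then only ever touch the precomputed keys, so the per-comparison cost is that of a plain byte or string comparison.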

Then there's JOIN and I didn't think there would be a need for those in a playlist because the data model of artist / album / song can be designed as not following the usually correct normal form (because a song encompasses both album and artists, i.e. there are never more albums than songs for instance), but of course I may be wrong because I don't know what kind of queries the playlist would need to do anyway. Besides there is the issue of artists collaborating on an album or on a song which may require the use of some normalization at least if that's an issue for you.

Of course JOINs can be implemented all the same if needed; logically speaking it is only a matter of scans across different tables using the JOIN ON clause as the search key. This is actually where a query planner makes a difference by ordering the JOINs if there are several of them: you want to start scanning the smallest table first. But that particular optimization is not hard to implement, especially as in our case we'll nearly always have more songs than artists and albums, and more albums than artists, so we can work based on that assumption.
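As an illustration of a JOIN done by hand, here is a minimal nested-loop join in Python over two in-memory tables. The table contents and column order are made up for the example, and the "smaller table first" ordering is hard-coded on the assumption that there are fewer artists than songs.

```python
# Hypothetical tables: artists(id, name) and songs(id, title, artist_id).
artists = [(1, "Miles Davis"), (2, "John Coltrane")]
songs = [
    (10, "So What", 1),
    (11, "Blue in Green", 1),
    (12, "Blue Train", 2),
    (13, "Naima", 2),
]

def join_songs_by_artist(artist_fragment):
    """Nested-loop join: scan the small table first, then the large one per hit."""
    wanted = artist_fragment.lower()
    results = []
    for artist_id, name in artists:            # outer scan over the smaller table
        if wanted not in name.lower():
            continue                           # filtered out, so no inner scan at all
        for _song_id, title, song_artist_id in songs:  # inner scan over the larger table
            if song_artist_id == artist_id:    # this comparison is the JOIN ON condition
                results.append((name, title))
    return results

coltrane = join_songs_by_artist("coltrane")
```

Putting the small, filtered table on the outside means the large table is only scanned once per matching artist, rather than the small table being scanned once per song.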

There is also the issue of data and index buffers caching, which for small datasets (and a few thousand entries is still a small dataset I think) can probably be handled better by the OS-level cache while a RDBMS is generally designed to work best when it has the whole machine to itself and so manages the buffers itself for more efficiency, although of course that may not be true of MySQL embedded.

So for the problem domain as described thus far, a tabular data implementation with fixed-size records would probably work very well but, but but but of course I'm working based on my assumptions of how a playlist works and that may not fit at all the data model Amarok really uses. In particular it seems you mean that plugins should be able to create data columns, right?

For instance it would be harder to implement flexible data model description, so columns could be added and removed and changed dynamically, and if that's a very very important part of the way you see playlists then of course a simple tabular model would no longer fit the problem domain as such, we should at least consider an array of such tabular models and that's already more complicated.

So it would help if you could quote typical complex queries such as you would like to use, and I do mean typical and not too far fetched because the "what if we ever need to do..." kind of scenario is a recipe for overengineering I think. Typical use cases are what you need first and foremost to design such a system, otherwise it's all assumptions and assumptions are the mother of all fuckups. (Which is a way to say that the whole above text might be full of fuckups because I still don't know exactly what you truly need to do with your playlist data.)

Well! I'm sorry this reply was so long but I didn't have time to make it shorter. :)

Thank you for reading taj. I hope at least it's clear I'm really trying not to talk out of my ass and to give this thorough thought.

You didn't respond so I assume you aren't interested anymore, which I can understand because the Amarok developers said they'd still use MySQL as a datadump for various things even with other collection engines, so that makes the whole idea of a better collection engine to avoid the overhead of MySQL kinda moot.

It's a pity because I did some preliminary testing and a very very simple implementation of mmap'ed tabular records shows excellent performance. I mean we're talking almost one million records per second for a case insensitive search (think like 'ILIKE') and that's without any indexing whatsoever of course.

Sorry this is wrong, that number is for case-sensitive search (the equivalent of 'LIKE'), case-insensitive search is more like 500,000 records per second.

Mind you that's without the collating algorithm I told you about, with that algorithm it's back to 700,000 to 800,000 records/s at the cost of doubling the record size.

Also note this is a very simple proof-of-concept implementation so of course it means little about how the final implementation would perform, the sole idea is to show that a very simple and quick to implement algorithm using tabular records can be very fast.
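For anyone curious what such a scan looks like, here is a small sketch in Python using the standard mmap module. It is not the proof-of-concept the figures above come from, the one-field 64-byte record is an assumption, and the case folding is ASCII-only, so it says nothing about those throughput numbers.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 64  # hypothetical: one NUL-padded UTF-8 title per record

def write_table(path, titles):
    """Write each title as one fixed-size, NUL-padded record."""
    with open(path, "wb") as f:
        for title in titles:
            f.write(title.encode("utf-8")[:RECORD_SIZE].ljust(RECORD_SIZE, b"\0"))

def scan(path, needle, ignore_case=False):
    """Return the record numbers whose title contains needle."""
    key = needle.encode("utf-8")
    if ignore_case:
        key = key.lower()
    hits = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as table:
        for number in range(len(table) // RECORD_SIZE):
            record = table[number * RECORD_SIZE:(number + 1) * RECORD_SIZE]
            if ignore_case:
                record = record.lower()  # bytes.lower() folds ASCII letters only
            if key in record:
                hits.append(number)
    return hits

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "songs.tbl")
    write_table(path, ["Blue Train", "So What", "blue in green"])
    sensitive = scan(path, "Blue")                      # LIKE-style match
    insensitive = scan(path, "Blue", ignore_case=True)  # ILIKE-style match
```

The file is mapped read-only, so paging the data in and out is left entirely to the OS cache and there is no buffer management in the application.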

Perhaps I'll keep working on this for its own sake it's kinda fun, I like algorithmic problems.

Didn't someone on the planet comment the other day that several KDE apps are using very different approaches for storage, and wonder if perhaps they should come together and discuss what is the best way for them to store data, and perhaps unify?

A database backend for data is likely here to stay, but what about Strigi indexing that data, and Nepomuk keeping ratings info and tags?

Well actually facts are what I'm asking about, because all I've seen thus far is claims that the current method is too slow but for all I know it takes many more song files for it to become slow than most users have and if so we'd be making the RDBMS a mandatory dependency for the benefit of but a few, that's why my initial query was about making it non-mandatory.

As for facts to back my concern a default MySQL installation consumes several tens of megs of memory which is not that much for a RDBMS but all the same pretty crazy in the face of the problem domain as it appears to be to me, that being a playlist.

Granted as someone else pointed out politely the embedded version would have less overhead, I overlooked that.

Mind you I understand why developers would act defensive these days after the mean-spirited comments there have been about Plasma and KDE 4 as a whole, so I am not blaming you or anyone, but it worries me a little that the actual questions I've been asking are only now being answered after this thread has already grown more than it probably should have.

I am really missing the old quick search of Amarok 1.4. This is the feature I used most of the time and it was easily accessible using just the keyboard. But now search results are buried within the tree and I need to perform 2 mouse clicks to expand artist and album on every single search result to see the results. This makes the results totally inaccessible using the keyboard.

It would be great if search results were expanded automatically and became accessible by the keyboard like in the old days.

I really like the new internet services. Since I installed beta2, I've listened to a number of short stories through the librivox-service, and harassed my friends using their last.fm personal radios :)

My only gripe so far is concerning the phonon-xine engine: I can't seem to make it respect ~/.xine/config, and it insists on resampling everything to 48 kHz. Not the biggest issue in the world, and not really all that connected to Amarok.

I can't seem to find how to change the amount of info the playlist shows, the most important item being the actual rating of the song. I find the ability to rate a song useless if you can't look at a song's rating unless you are playing it. I mean that you can't see the ratings in the collection nor even in the playlist.

The simple reason for this is that there is no need to. One of the reasons we chose MySQL-embedded is that pointing it at a standalone external service is just a config option away and is completely transparent to the rest of the application. So all we really need is a config dialog for handling this (and potentially a way to transfer an existing db back and forth between the embedded and external server).

I am using a resolution of 1400x1050 and Amarok doesn't really fit into it...

In my opinion the main window is too crowded horizontally. I know that it is still in heavy development, so what about this idea: Move the plasma "workspace" in the mainwindow into its own tab on the left, and make it cover the whole window in this new tab?

Combining this with effects (maybe under the plasmoid layer) would be awesome imho :)

The logic of moving it to the center was to emphasize its importance in the Amarok world view. Appropriate contextual information should always be available. Unfortunately the world of plasma makes creating displays for this appropriate contextual information a whole lot less enjoyable than it could be, and therefore a lot of this appropriate contextual information does not exist at this point.

I do not understand your claim that Amarok doesn't fit into 1400 pixels wide.... I have no problem with it here.

We are actively working towards an alternative player, tentatively named the Amarok Mobile Companion, which is being based on the groundwork in Amarok 2's collection and service frameworks, incorporating a new UI designed specifically for touch-screen small form factor devices... So... watch this space :)

Uhm, sure. I don't see any reason why it should not be possible to write a thin QtScript <-> Ruby wrapper.

BUT, this would very much be against the entire point of why we switched to QtScript only. The reason is to avoid users running into dependency hell when installing or running scripts, a problem that is magnified several times when we go truly cross platform. Now, you can argue that Ruby itself is not that bad, but many scripts today use other external dependencies (Ruby-Qt bindings, XML parsing modules, ...) that the user also has to install. With QtScript, as you have access to the entire Qt framework, there is really no need to install additional dependencies, and we can ensure that all scripts will work "out of the box" for all users as the dependencies are met simply by having Amarok installed.

So while you might be able to get around this, it will severely limit the audience of your scripts as we will only allow pure QtScript scripts to be distributed via the "get hot new stuff" service.

Besides, although JavaScript has a very bad reputation, most of this (deservedly) comes from it being used as a web scripting language. As a proper scripting language, backed by the entire Qt and Amarok API, it's really not that bad!

My scripts are only for me, so portability is not an issue at all. I have no problem with Javascript per se, I just don't want to learn yet another language. I certainly understand your reasoning, and who knows, I might wind up learning it anyway!

i am still using amarok 1.4 and i have it running all the time because i use it to manage my podcasts.

in amarok 1.4, trying to open an audio file from konqueror will automatically add it to a playlist without starting to play it. the only way (as far as i know) to have amarok play a file from double clicking it in konqueror is to first make sure the playlist is empty, and this is not very practical and there is no way to change this ..

how do you guys plan to handle this in amarok 2? ..can i just double click a file in konqueror and have amarok start playing it? ..just adding it to the playlist and have me go to amarok to click it again(the way i have to do in amarok 1.4) to start playing it is not very convenient.

currently, i have xmms player set as a default player in order to be able to just double click an audio file and simply have it start playing ..i think amarok should be able to do this

Well, instant-play single-shot type stuff is what Dragon Player (or Codeine in KDE 3) was designed to do, so... you could always give that one a shot :)

But to answer your more immediate question, it's a case of default actions - try right-clicking on a sound file, there should be actions to append to playlist, load and play, and append and play (depending on the version of amarok you're running there may be fewer options, but at least the first and last option should be there). So... you can set up the default action to be what you want it to do on double-click (and since you've already set it to double-click instead of single-click, which is the default in KDE, you should be able to change that default action as well with not too much trouble :) ).