SqlScanResultProcessor: cope with non-unique uniqueid in the database
Unfortunately, the uniqueid column (or rather its index) of our urls
table is not defined as unique, and at least some code in
SqlCollection doesn't check for duplicates before inserting into the
table. This can be provoked, for example, by using the "Organize
Collection" functionality.
While fixing SqlCollectionLocation in the short term and making the
uniqueid index unique in the long term is probably needed, we need to
cope with existing user databases. This change is needed because
SqlScanResultProcessor identified tracks solely by their uniqueid,
which resulted in unpredictable and incorrect behaviour: for example,
it never removed the "old" duplicate entry in deleteDeletedTracks( int )
and sometimes found the wrong entry when importing a track in
commitTrack().
This does not solve bug 289338, but it dramatically reduces its
consequences: the (correct) duplicates are removed as soon as the
collection scanner fires.
v2: the test failure spotted by Sentynel uncovered a bug in patch v1;
there was an assertion that sometimes failed even during normal
operation because database updates were temporarily blocked. Fixed by
moving the assertion to a place where it is valid in all cases.
CCBUG: 289338
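To illustrate why a non-unique uniqueid is a problem, here is a minimal, hypothetical sketch. Amarok's real database is MySQL(-embedded) with a much richer schema; this simplified in-memory "urls" table only demonstrates how a lookup keyed on uniqueid becomes ambiguous once duplicates exist, and how existing duplicates could be spotted:

```python
import sqlite3

# Illustrative sketch only: a simplified "urls" table whose uniqueid
# column deliberately has no UNIQUE constraint, mirroring the buggy schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE urls (
        id INTEGER PRIMARY KEY,
        deviceid INTEGER,
        rpath TEXT,
        uniqueid TEXT  -- note: not declared UNIQUE
    )
""")
# Something like "Organize Collection" can leave two rows with one uniqueid:
conn.execute("INSERT INTO urls VALUES (1, 1, './old/track.mp3', 'uid://abc')")
conn.execute("INSERT INTO urls VALUES (2, 1, './new/track.mp3', 'uid://abc')")

# Identifying a track solely by uniqueid now yields two candidates;
# which one the caller picks is essentially arbitrary.
rows = conn.execute(
    "SELECT id, rpath FROM urls WHERE uniqueid = ?", ("uid://abc",)
).fetchall()
print(len(rows))  # -> 2: the lookup is ambiguous

# A query to spot existing duplicates in a user database:
dupes = conn.execute("""
    SELECT uniqueid, COUNT(*) AS n FROM urls
    GROUP BY uniqueid HAVING n > 1
""").fetchall()
print(dupes)  # -> [('uid://abc', 2)]
```

The long-term fix hinted at above would be declaring the index unique, after which the second INSERT would fail instead of silently creating the duplicate.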
SqlScanResultProcessor: skip removeTrack() if there were errors
Another change in a series that tries to minimize the chance of users
losing their tracks, statistics etc.
SqlScanResultProcessor: don't accidentally delete tracks, defensive rewrite
This fixes data-loss that I can trigger every time by toggling "Local
Files & USB Mass Storage Backend" in Config -> Plugins, restarting
Amarok and triggering collection update / rescan.
In theory, more things, such as cloning or changing a disk, could trigger
this problem; from the SqlScanResultProcessor::deleteDeletedDirectories() comment:
We need to match directories by their (absolute) path, otherwise following
scenario triggers statistics loss (bug 298275):
1. user relocates collection to different filesystem, but clones path structure
or toggles MassStorageDeviceHandler enabled in Config -> plugins.
2. collectionscanner knows nothing about directory ids, so it doesn't detect
any track changes and emits a bunch of skipped (unchanged) dirs with no
tracks.
3. SqlRegistry::getDirectory() called from there returns a different directory
id than in the past.
4. deleteDeletedDirectories() is called, and if it operates on directory ids,
it happily removes _all_ directories, taking the tracks with them.
5. Tracks disappear from the UI until full rescan, stats, lyrics, labels are
lost forever.
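The path-vs-id matching described in the scenario above can be sketched as follows. This is a toy Python illustration of the defensive idea, not Amarok's actual C++ code; all names and values are hypothetical:

```python
# Directories as stored in the database: id -> absolute path.
db_directories = {
    10: "/music/rock",
    11: "/music/jazz",
}
# After a device-id change (cloned disk, toggled backend) the scanner
# reports the same paths, but the registry hands out *new* ids for them.
scanned_paths = {"/music/rock", "/music/jazz"}
scanned_ids = {20, 21}

# Wrong: match by directory id -> nothing matches, so every existing
# directory looks deleted and is removed, taking its tracks along.
deleted_by_id = [d for d in db_directories if d not in scanned_ids]

# Right: match by absolute path -> nothing is wrongly removed.
deleted_by_path = [
    d for d, path in db_directories.items() if path not in scanned_paths
]

print(deleted_by_id)    # -> [10, 11]: the data-loss scenario
print(deleted_by_path)  # -> []
```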
Also add a handful of asserts; ScanResultProcessor is very complicated,
and a small error or corner case in its logic may result in horrible
data loss.
Reporters of linked bugs, please try to reproduce your data-loss with
this patch applied and report back in both cases. In negative case,
please reopen and attach full updated amarok --debug log.
After this patch, the only operations that should lose statistics,
lyrics and labels are:
a) moving a track out of the mounted collection folders [by design]
b) changing both the metadata and the url, from outside Amarok, of a
track not tagged by amarok_afttagger [we can do nothing about this]
BUG: 298275
FIXED-IN: 2.6
SqlMeta::Track: remove unused methods deviceId() and rpath()
They are unused and rather internal, so remove them.
id() and urlId() are unused too, but those are at least
in theory useful.
SqlScanResultProcessor: remove dead code, sanitize includes
SqlRegistry, DatabaseUpdater: delete stats and everything in removeTrack()
SqlRegistry::removeTrack() had code to remove the entry from the tracks
table while preserving the entry in the statistics table. This doesn't
work, because SqlRegistry construction calls
DatabaseUpdater::deleteAllRedundant( url ) on the next start-up, which
removes the url entry; plus we don't know how long we should keep the
entry, so we just delete everything. ScanResultProcessor should be
witty enough not to delete tracks that have been moved to another
directory and/or device, even if that device is currently unavailable.
Additionally, we clean up the statistics, urls_labels and lyrics tables
on start-up to avoid stale entries (pointing to deleted urls), in a way
similar to what is done with the other tables.
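The start-up cleanup of stale entries can be sketched like this. Again, this is a hypothetical sqlite illustration with table and column names simplified from the commit message, not the actual DatabaseUpdater code:

```python
import sqlite3

# Toy schema: dependent tables reference urls by an "url" column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urls (id INTEGER PRIMARY KEY);
    CREATE TABLE statistics (url INTEGER, score REAL);
    CREATE TABLE lyrics (url INTEGER, lyrics TEXT);
    INSERT INTO urls VALUES (1);
    INSERT INTO statistics VALUES (1, 0.8), (99, 0.5);  -- url 99 is stale
    INSERT INTO lyrics VALUES (99, 'la la');            -- stale too
""")

# Remove rows whose url no longer exists in the urls table.
for table in ("statistics", "lyrics"):
    conn.execute(
        f"DELETE FROM {table} WHERE url NOT IN (SELECT id FROM urls)"
    )

stats_left = conn.execute("SELECT COUNT(*) FROM statistics").fetchone()[0]
lyrics_left = conn.execute("SELECT COUNT(*) FROM lyrics").fetchone()[0]
print(stats_left)   # -> 1: the live entry survives
print(lyrics_left)  # -> 0: the stale entry is gone
```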
TestSqlScanManager is updated so that it doesn't test the removed
behaviour.
SqlScanResultProcessor: rename attributes in preparation for a fix
...because I was really confused by the old names.

Looks good for now.
What I would really like is some more test cases to check for the current problems.
Also I am working on a solution for the database schema as you can see from the database table .svg file in the docs directory.

> Looks good for now.
> What I would really like is some more test cases to check for the current problems.
Agreed, but due to their complexity, I won't be able to write them in the Amarok 2.6 timeframe. Do you think we can merge this for 2.6 even without tests?
> Also I am working on a solution for the database schema as you can see from the database table .svg file in the docs directory.
Yeah, I've seen it, and generally it's a step in the right direction IMO. But please let me look at it in more detail after 2.6 is released. Quick question: don't you think that multiple artists (etc.) per track will make the code much more complicated?

Okay, thanks. I've tested it again and again, so let's hope it won't eat our users' data (it should not; the worst case is that Amarok will abort on a too-eager assertion). I've put it on ReviewBoard mainly to get approval of the general direction.

Review request changed

July 25, 2012, 12:01 a.m.

Change Summary:

Update the patch to fix the tests, plus one too-brave assertion that was hit even during correct operation. The failing test actually uncovered a bug in the patch; kudos go to Ralf for fixing testsqlscanmanager in the first place to make this possible.
This is the final version that I'll merge to master soon.
