I've been busy. Almost too much to write about, none of it all that interesting in the grand scheme of things, so I'll just stick to recent stuff.

Our main problem lately has been the mysql database. Given the increased number of users, and the lack of astropulse work (which tends to "slow things down"), the result table in the mysql database is under constant heavy attack. Over the course of a week this table gets severely fragmented, thus resulting in more disk i/o to do the same selects/updates. This has always been a problem, which is why we "compress" the database every tuesday. However, the increased use means a larger, more fragmented table, and it doesn't fit so easily into memory.

This is really a problem when the splitter comes along every ten minutes and checks to see if there's enough work available to send out (thus asking the question: should I bother generating more work?). This is a simple count on the result table, but if we're in a "bad state" this count, which normally takes a second, could take literally hours and stall all other queries, like the feeder, so nobody can get any work. There are up to six splitters running at any given time, so multiply this problem by six.

We came up with several obvious solutions to this problem, all of which had non-obvious opposite results. Finally we had another thing to try, which was to make a tiny database table which contains these counts, and have a separate program that runs every so often do these counts and populate the proper table. This way instead of six splitters doing a count query every ten minutes, one program does a single count query every hour (and against the replica database). We made the necessary changes and fired it off yesterday after the outage.
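The shape of that counter-cache idea can be sketched in a few lines. This is only an illustration, not the actual SETI@home code: the table and column names are invented, and sqlite3 stands in for the real replica and master MySQL databases.

```python
import sqlite3, time

# Hypothetical sketch of the counter-cache idea: one periodic job does the
# expensive COUNT once and stores it in a tiny table; the splitters then
# read the cached number instead of each running their own count query.
# All names are made up for illustration; sqlite3 stands in for MySQL.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, state INTEGER)")
db.execute("CREATE TABLE result_counts (state INTEGER PRIMARY KEY, n INTEGER, updated REAL)")
db.executemany("INSERT INTO result (state) VALUES (?)", [(s,) for s in [0, 0, 0, 1, 2]])

def refresh_counts(conn):
    """Run hourly against the replica: one COUNT per state, cached in a tiny table."""
    for (state, n) in conn.execute("SELECT state, COUNT(*) FROM result GROUP BY state"):
        conn.execute(
            "INSERT OR REPLACE INTO result_counts (state, n, updated) VALUES (?, ?, ?)",
            (state, n, time.time()),
        )

def results_ready_to_send(conn, unsent_state=0):
    """What a splitter would call: a cheap primary-key lookup, not a table scan."""
    row = conn.execute("SELECT n FROM result_counts WHERE state = ?", (unsent_state,)).fetchone()
    return row[0] if row else 0

refresh_counts(db)
print(results_ready_to_send(db))  # 3 unsent results in this toy data
```

The win is that six splitters polling every ten minutes become one primary-key lookup each, with a single real count running hourly against the replica.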

Of course it took forever to recover from the outage. When I checked in again at midnight last night I found the splitters finally got the call to generate more work.. and were failing on science database inserts. I figured this was some kind of compile problem, so I fell back to the previous splitter version... but that one was failing to read the raw data files! Then I realized we were in a spate of raw data files that were deemed "questionable," so this wasn't a surprise. I let it go as it was late.

As expected, nature took its course and a couple hours later the splitter finally found files it could read and got to work. That is, until our main /home account server crashed! When that happens, it kills *everything*.

Jeff got in early and was already recovering that system before I noticed. He pretty much had it booted up just when I arrived. However, all the systems were hanging on various other systems due to our web of cross-automounts. I had to reboot 5 or 6 of them and do all the cleanup following that. In one lucky case I was able to clean up the automounter maps without having to reboot.

So we're recovering from all that now. Hopefully we can figure out the new splitter problems and get that working as well or else we'll start hitting those bad mysql periods really soon.

- Matt-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

... We came up with several obvious solutions to this problem, all of which had non-obvious opposite results. Finally we had another thing to try, which was to make a tiny database table which contains these counts, and have a separate program that runs every so often do these counts and populate the proper table...

Hey.
Thanks for updating us. Must have been hellish booting all those 'puters and making them work.
But, what really interests me, why did you choose to use a separate program instead of triggers?
BR,

Our main problem lately has been the mysql database. Given the increased number of users, and the lack of astropulse work (which tends to "slow things down")...

As in it "slows down" the flood of results from the users, and so lets the servers speed up! :-)

... hanging on various other systems due to our web of cross-automounts. I had to reboot 5 or 6 of them and do all the cleanup following that. ...

Any possibility of arranging the mounts to be in a tree structure rather than a random mapping?...

Or rearrange the data locations to avoid them in the first place?...

(I'm sure you're on to that already, if it's at all possible. I've played with multiple network mounts. Very useful on guaranteed dedicated bandwidth, or as a temporary fix. Otherwise, it is a nightmare just waiting to happen...)

or else we'll start hitting those bad mysql periods really soon.

The central database is a recurring theme... Can the database itself be split up across multiple machines to ease the central bottleneck?

Could some operations run offline from updating the database in real-time to be instead batched through more efficiently?

Instead of deleting results every 24 hours, could you just schedule this operation late Monday night/Tue morning (weekly) before the Tuesday outage? That would avoid the fragmented table extents and indexes during the week. Before the outage, the validated results at least 24 hours old would be purged. During the outage, you can do your compress as normal.

The only downside is that this method will consume more disk space on the database server. But the performance should increase.

It might also help the database if the compress operation you do has the option to "reuse storage"; Oracle has this feature. This keeps the extents allocated for the table and index empty but locked for its table. That way, when more inserts happen, there is no need for the DB system to "find free space". You may have to drop index(es) before this delete operation and re-create them afterward for acceptable performance.
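The drop-index/purge/recreate pattern described above can be sketched generically. This is a toy illustration only: sqlite3 stands in for the real databases (which have their own bulk-delete behavior, and in Oracle's case the "reuse storage" option), and all table and column names here are invented.

```python
import sqlite3

# Generic sketch of the "drop indexes, bulk-delete, recreate indexes"
# pattern suggested above. The idea: a large DELETE with indexes in place
# pays per-row index maintenance; dropping first and rebuilding once
# afterward is often cheaper. sqlite3 and all names are stand-ins.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, validated INTEGER, age_hours INTEGER)")
db.execute("CREATE INDEX idx_result_validated ON result (validated)")
db.executemany(
    "INSERT INTO result (validated, age_hours) VALUES (?, ?)",
    [(1, 30), (1, 48), (0, 30), (1, 10), (0, 5)],
)

def weekly_purge(conn):
    """Purge validated results older than 24h, rebuilding the index
    afterward instead of maintaining it row by row during the delete."""
    conn.execute("DROP INDEX idx_result_validated")
    conn.execute("DELETE FROM result WHERE validated = 1 AND age_hours >= 24")
    conn.execute("CREATE INDEX idx_result_validated ON result (validated)")
    conn.commit()

weekly_purge(db)
print(db.execute("SELECT COUNT(*) FROM result").fetchone()[0])  # 3 rows remain
```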

Please point the DBA (Jeff?) to this post. He could tell in about 10 seconds if either or both of these strategies will work or not. I hope it helps.

The only downside is that this method will consume more disk space on the database server. But the performance should increase.

Far from the only downside I'm afraid. You just replaced one problem (fragmentation) with another (seven times more records). The whole point of db_purge is to stop the tables getting so big that they won't fit in memory.

The optimal solution is to design the query such that it does not need to traverse all rows - an inherently expensive operation - but I think we can safely assume that isn't possible or it would have been done already. The next best is to reduce the number of times the query is used, for example by caching the result, which is what was attempted. After that comes keeping the table in memory to avoid using slow disks, which is what db_purge is supposed to do (but isn't doing very well right now).

Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking.
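That middle option - run the expensive query less often by caching its result - is easy to picture as a time-to-live cache. A toy sketch, with a made-up stand-in for the slow count query:

```python
import time

# Toy illustration of caching an expensive query's result for a while
# instead of re-running it on every caller. count_unsent_results() is a
# made-up stand-in for the slow COUNT(*), not the project's actual code.

def cached(ttl_seconds):
    """Decorator: remember the last return value for ttl_seconds."""
    def wrap(fn):
        state = {"value": None, "expires": 0.0}
        def inner():
            now = time.monotonic()
            if now >= state["expires"]:
                state["value"] = fn()
                state["expires"] = now + ttl_seconds
            return state["value"]
        return inner
    return wrap

calls = {"n": 0}

@cached(ttl_seconds=3600)
def count_unsent_results():
    calls["n"] += 1          # pretend this is the slow COUNT(*) query
    return 123456

for _ in range(6):           # six splitters all asking within the hour...
    count_unsent_results()
print(calls["n"])            # ...but the expensive query ran only once
```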

Sorry to hear about all the difficulties. Thanks for all the work.
Hang in there!

I wasn't sure what version of MySQL you're currently using (or even planning to use in the future) so at the risk of teaching you to suck eggs I thought I'd better mention that MySQL began shipping version 7 in April. It supports clustering, in-memory tables, load balancing, fault tolerance etc.

If you haven't upgraded to it, it might be worth a thought so you can spread the load around as per ML1's suggestion.

The only downside is that this method will consume more disk space on the database server. But the performance should increase.

Far from the only downside I'm afraid. You just replaced one problem (fragmentation) with another (seven times more records). The whole point of db_purge is to stop the tables getting so big that they won't fit in memory.

The optimal solution is to design the query such that it does not need to traverse all rows - an inherently expensive operation - but I think we can safely assume that isn't possible or it would have been done already. The next best is to reduce the number of times the query is used, for example by caching the result, which is what was attempted. After that comes keeping the table in memory to avoid using slow disks, which is what db_purge is supposed to do (but isn't doing very well right now).

What query are you talking about that must traverse all rows? That's a bad query in any context.

The problem I was addressing relates to minimizing disk I/O. The table is obviously too big to fit completely into memory anyway (remember to include indexes). So, reducing I/O should also reduce the amount of steady-state memory usage for that table. Having contiguous records in an index also helps it when searching, because the search algorithm goes from log(n)+(total_records / deleted_records) to log(n). By improving the index seek time, queries will average shorter. By optimizing the most common case, we're making the most impact on performance.

There are several other things they could do, including lowering the value of query_cache_min_res_unit to be the data size of one row from the result table (default is 4K). But without actually being there to work on this and see the performance metrics (e.g., Qcache_lowmem_prunes, etc.), I can only make thoughtful, constructive suggestions and research them before saying anything. It's up to Matt, Jeff, and the rest of them to find the time for discussion, decision, and implementation.

Performance vs storage space has always been a hallmark trade-off in computer science applications, but here I am simply presenting an alternative to status quo, that to the best of my professional experience, will accomplish what I said it would.

Performance vs storage space has always been a hallmark trade-off in computer science applications, but here I am simply presenting an alternative to status quo, that to the best of my professional experience, will accomplish what I said it would.

Well, to avoid hitting the db, I'd just add another line to the code that would add +n or -n to a persistent key in memcache after the sql update. It would only use a few bytes instead of causing such grief.
The memcache key could be reset daily or weekly with absolute values from one or more count(*) queries, and then just record deltas as the db is updated.

The values would be available immediately to all machines aware of the memcache pool, and not hit the db at all.
It could also be used for real time stats on the public website with no hit on the db either.
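The delta-counter idea above is simple enough to sketch. Here a plain dict stands in for memcache (the real thing would use incr/decr against a shared memcached pool), and every key name and number is invented for illustration:

```python
# Sketch of the delta-counter idea: seed a counter from one real COUNT(*),
# then adjust it by +n/-n alongside each insert/delete instead of ever
# re-counting. A dict stands in for memcache; all names are invented.

class CounterCache:
    def __init__(self):
        self.counts = {}

    def reset_from_db(self, key, absolute_count):
        """Daily/weekly: seed the counter from one real COUNT(*) query."""
        self.counts[key] = absolute_count

    def incr(self, key, n=1):
        """Called right after each sql update instead of re-counting."""
        self.counts[key] = self.counts.get(key, 0) + n

    def get(self, key):
        return self.counts.get(key, 0)

cache = CounterCache()
cache.reset_from_db("results_unsent", 100000)  # one count(*) on the replica
cache.incr("results_unsent", +250)             # splitter created 250 results
cache.incr("results_unsent", -1)               # scheduler sent one out
print(cache.get("results_unsent"))  # 100249
```

The counter drifts between resets, but for a "should I bother generating more work?" decision an approximate answer is plenty.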

Could some operations run offline from updating the database in real-time to be instead batched through more efficiently?

Martin

Yes, that reminds me of the dark ages (well the 1970s actually) where updates were glorified merges of transactions with the previous master to create the new generation of the master. Master files were copied to indexed-sequential when random access was required.

1) I'm sure you've noticed, but just in case you run straight to the database when you get in - you've got another problem: no uploads since yesterday. That can't be a database problem, of course, and it's not a bandwidth problem either (been below 40 Mbit/s all day). So I presume it's one of those pesky mounts, or an underlying file system problem on the upload data store.

If they are going to enjoy a satisfactory user experience (and not take their spare cycles away again), you've GOT to get that database under control: and if it's grown too big to fit in memory, I don't see how that's going to be possible.

Instead of trying to get it to work at the present (bloated) size, have you thought about trying to reduce it to a manageable size?

Size will be proportional to (active_users)*(tasks_per_user): you want to maximize the first term, so that would mean reducing the second.
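As a back-of-the-envelope illustration of that proportionality (every number here is invented):

```python
# Back-of-the-envelope: result-table size scales with both factors, so
# halving average cache size halves the table even with the same user
# base. All numbers are invented for illustration.

def result_rows(active_users, avg_tasks_per_user):
    return active_users * avg_tasks_per_user

before = result_rows(150_000, 40)   # big caches
after = result_rows(150_000, 20)    # same users, smaller caches
print(before, after)  # 6000000 3000000
```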

There is an unfortunate tendency among crunchers to increase cache sizes to the maximum at the first sign of trouble. That solves the problem from their own personal perspective, but I think too few of them (us?) stop to think about the strain they impose on the project they purport to support.

I saw an interesting FAQ at GPUgrid this morning, while looking for an alternative CUDA project: they "... give an additional 25% [credit] for WUs returned within two days. This is useful for us to reduce latency of the results ..." - in other words, use a bit of economic social engineering instead of brute database force to solve the problem. Look at host 4947578: eight cores, 5 GPUs, and a turnround of 19.4 days. How much does that add to your query load?

This is really a problem when the splitter comes along every ten minutes and checks to see if there's enough work available to send out (thus asking the question: should I bother generating more work). This is a simple count on the result table, but if we're in a "bad state" this count which normally takes a second could take literally hours, and stall all other queries, like the feeder, and therefore nobody can get any work. There are up to six splitters running at any given time, so multiply this problem by six.

Is it possible to have different settings for each of the splitters?

If you could set individual timings, and turn each one on/off based on queue size.....

If they are going to enjoy a satisfactory user experience (and not take their spare cycles away again), you've GOT to get that database under control: and if it's grown too big to fit in memory, I don't see how that's going to be possible.

Hmmm....

Admittedly, I learned about databases when 64k was a lot of memory (and it was core, not RAM) but you sure couldn't keep the whole database in memory then, and I'm not sure why it "must" fit now. I don't claim to be a MySQL expert by any means, I just assume that it's competent.

You have to measure to work this out, but scanning the whole database is likely rarely needed. There is probably an active region that'd be nice to have cached.

But the whole database? Seems unlikely, even in this day of much RAM.

I can definitely see that some operations (like getting a count) would be expensive -- and would tend to push the active records from the cache, and it sounds like they've got a start on preventing those.

It does sound like MySQL does not recover used space well on the scale that SETI adds and removes records. That's a (bigger) flaw.