Saturday, April 23, 2005

I went to this for one reason only: It was the presentation immediately preceding mine, so I was guaranteed to be on time and not have any surprises about the room.

The presenters were from Dell, and their demo app was a DVD sale system using MySQL running a (surprise!) Dell server (dual Xeons at 3.06 GHz). The three approaches were PHP, JSP, and ASP.net. I was really, really glad to see this as the presentation before mine, because the PHP database code was such a mess. In PHP, you have to piece together your own literal query, typically via string concatenation and $ substitution, and then call functions that are different for each database. If you want to even try to be database-independent, you need to add an abstration layer of some sort. I've done this before with PHP, implementing something like Python's DB-API, and having subclasses for both MySQL and PostgreSQL, but this was not done here. Worse, I am pretty sure that many of their PHP code examples were susceptible to SQL injection attacks. In short, it makes Python look damn good, which it is.

My presentation is now available on-line. I was happy with the way it turned out. We had 45 minutes, with the last five minutes for questions. I had 37 slides, and I managed to finish with seven minutes for questions, and ended up getting 10-15 minutes of questions. It was not a huge crowd -- less than 30 people I think -- but it was also the last day of the conference, with checkout time at noon, and I don't think I had anyone leave before the question session. People I talked with afterwards seemed to think it was useful, at least, including people that were already MySQLdb users, and the presentation was intended primarily for non-users.

Lunch

This was a more focused version of the "Tour of the Source" presentation, with more details on creating new column types and storage engines.

There are quite a few MySQL storage engines out there; in fact, a lot more than you might think:

ISAM. This is the original MySQL storage engine, before there were storage engines, and in fact it is now removed from 5.0.

MyISAM. This is the replacement for ISAM, around since 3.23. It's not transactional -- though it is planned to eventually make it transactional -- but very fast if you have to append new rows.

MERGE. This is, in a way, a limited sort of view. You can define a MERGE table to be the UNION of several identical (schema) MyISAM tables. For example, if you are recording log data or audit trails, you could segment your data into seperate tables for each month, and then use a MERGE table to treat them like one.

BDB. This is the Sleepycat Berkeley DB. BDB itself is not a relational database but a hashed-key database which is transactional.

InnoDB. InnoDB has been around for a long time, supposedly more than 10 years. It provides Oracle-style multiversioned concurrency control.

Archive. This is an engine that was developed by Yahoo! for logging. It supports SELECT and INSERT, and compresses records as they are inserted.

CSV. This does what you would think: Each table is stored in a CSV (comma-separated value) file, which is suitable for importing into lots of programs, especially spreadsheets.

FEDERATED. This engine actually acts as a proxy to another database server. At the moment, the remote server must be a MySQL server, but other servers are expected to be supported in the future. Transactions are not supported, but SELECT, UPDATE, DELETE, and INSERT are. In theory, it seems like transactions could be supported, and the documentation seems to hint that it could be at some point. Referential integrity can't be guaranteed, since the tables could change on the remote server without the local server being informed about it.

MEMORY. Also known as HEAP tables, these only exist in RAM and are never written to disk. They are used internally sometimes for JOINs and views, but can be used for other temporary uses. The schema is persistent; only the data is volatile.

BLACKHOLE. All SQL operations are supported, but they do nothing. Data written to BLACKHOLE tables is discarded; SELECT returns no rows. These tables are supposed to be good for replicating the schema but not the data.

EXAMPLE. This doesn't do anything useful, except provide a skeleton example of how to make your own storage engine.

Brian said that it generally takes a few hours to write a specialized storage engine. Obviously if you need a real storage engine with persistent storage and transactions, this will take longer. However, you do not need to worry about details like parsing queries, or the query cache, since these are all managed higher up in the server. For now, you have to re-compile the server to add a new storage engine, but Brian says this will change in the future, and that the MySQL module configuration will look a lot like Apache.

Post-Conference

The conference was wrapped up by 5 p.m. PDT. Since I had some time on my hands, I checked the train schedules, and decided to make a quick trip to San Francisco. This involved taking the VTA train/trolley to Mountain View and catching Caltrain to San Francisco: $6 for both trains. Once there, I got on the 30 bus and went to Chinatown. Had dinner at Cafe Honolulu. Walked up-hill a block and rode the cable car down to Fisherman's Wharf. By this time, it was dark, but you still get a good view of the city. From Fisherman's Wharf, you can see the Golden Gate bridge.

I had figured out from the train schedules that the last Caltrain train I could take and still make the last VTA train out of Mountain View left at 10:07 pm PDT. My other options were: a) take the 12:07 a.m. PDT train to Mountain View and catch a taxi back; and b) take a train that left after 4 a.m. and wander around SF for six hours. In the end, I decided to take the cable car to the other end of the line. This put me close to 4th Street, which was one of the cross streets at the Caltrain station. I missed one bus and decided to walk there, which turned out to be a pretty good idea, since I got there with about 10 minutes to spare and was not passed by any more busses. I thought it would be about six blocks, but it was more like 1.3 miles. Then took Caltrain to Mountain View, and waited on the train about half an hour before it left. By this point I was nearly falling asleep, so skipping town early was probably the right choice. I figured out that to catch my 1:03 p.m. flight the next day, I should take the VTA train leaving around 10:30 a.m., and that would take me back to the San Jose airport. Got back to my hotel after midnight; set the alarm for 8:30 a.m. and sent to sleep.

8:30 a.m. arrives. Hit snooze.

8:37 a.m. or so arrives. Turned off alarm.

10:00 a.m. arrives. Now I am wide-awake. Pack in a frenzy. Checked out about 10:25 a.m. Made it to train station in time. Had to change trains and then catch the airport flyer bus to the airport, which turns out to be only about other mile away. Checked in. Got breakfast burrito at Señor Jalopeño. Ate a pastry I had gotten from Cafe Honolulu the night before in SF and put in my coat pocket, just in case. Flight to Atlanta and airport shuttle to Athens were uneventful, in comparison.

Wednesday, April 20, 2005

Hotel Hopping

I checked out of the Four-Points Sheridan in Sunnyvale and into the Santa Clara Westin (the conference hotel). I had planned to stay at the Westin the entire trip, but they were booked up the first few days of the conference. I definitely wanted to be there the night before my own presentation. Travelling back and forth from the Sheridan cost me at least an hour every day.

In general (i.e. not just MySQL, but relational databases), not all views are updatable; some cannot be updated. Whether or not a view is updatable seems to depend mostly upon any JOIN condition that may be present in the underlying query. Aggregate functions and derived columns will also make a view (or at least the relevant columns) non-updatable. MySQL-5.0 does not support all theoretically-possible updatable views, but it supports a lot. The theoretically-possible ones it does not are mainly due to implementation or performance issues. Some of the exceptions are listed in the CREATE VIEW manual section.

When they say a view is updatable, they mean you can use UPDATE, INSERT, and DELETE.

Some new VIEW-related privileges were added. In particular, you can restrict examining the view's schema, while still allowing access. For access, you need SELECT, UPDATE, etc. on the view itself and not the underlying tables.

You can enforce constraints on views locally, or have them cascade if you have a view that is based on other views.

This was an overview of best practices, and what you should do to protect your server. The Windows-only worm that was active for one day a few months back was taking advantage of three security issues. The first was that the Windows installation (unlike the default UNIX installation) was adding access for root@%. The second: A lot of admins were choosing crappy passwords. The initial worm had a small dictionary of passwords to try, like "abc123", and would simply try everything. One user there said the list got longer as time went on, which seems entirely possible, since it would connect to IRC servers and websites. The final problem: A lot of admins were using a system account to run the server, which effectively gave it root.

The worm would log in as root, and create a table with a BLOB column. Then it would INSERT a record containing the worm payload (a DLL), and then use SELECT INTO DUMPFILE to write it to the disk. Finally it would use CREATE FUNCTION to load the DLL. To midigate the problem, mysqld now will not load a shared library for a user-defined function (UDF) unless it implements the required API calls. This is not much protection, since the attacker can add these to the payload, but it would thwart a naïve worm implementation. They also removed the root@% account, or at least no longer install it, and seem to more strongly encourage running mysqld with an unprivileged account.

At least one person voiced support for adding group/role support to the privilege system, so that you could configure privileges for a few roles and then assign multiple users to that role, or to multiple roles. The presenter was fully in agreement, saying that this was something he also wanted badly. Afterward, we had a brief discussion of how something like this might be implemented now, before there is official support. I had two separate ideas. One was to add a column to the user privilege table that would indicate whether or not this was a real user or a role, the difference being that a role account could not log in, and was simply a prototype for other users with that role. Then have a separate table that would associate users with multiple roles. The actual user privileges would be manipulated outside of mysqld by some script. The other idea was use a view for the user table which would somehow consolidate all the privileges for the roles that user was a member of.

Lunch

This was a broad overview of how the MySQL source code is organized, with some information on adding UDFs and storage engines.

There already exists some UDFs to run perl and PHP, and Brian Aker told me there was one for Python, since he was the one who originally wrote it. However, my Google searches were turning up nothing relevant, so I spent the rest of the afternoon writing one. I've gotten it to work, except that there is a dynamic linking problem which causes imports of extension modules to fail: It can't find symbols that are definitely in there from libpython. I can probably fix this with the linker flags, but I ran out of time to work on it.

One of the quirks/features of the UDF API is, if you want to have a function foo, you need to define foo_init(), foo(), and maybe foo_deinit(). foo_init() is supposed to do all the initialization and argument checking. If there are any problems with the arguments, you have to report it here; foo() cannot return an error code. Checking the arguments is not really practical with Python; it can be done with introspection but would be expensive. If an exception gets raised, the only real option is to return NULL. It does send tracebacks to the log, though. foo_deinit() is supposed to clean up, but is only called if foo_init() was sucessful. There is a separate set of API functions to define for aggregate functions.

In short, they do a lot of caching on the front end. On the database side, they split databases across servers in some cases. They also do some master-slave replication. Another setup is to use InnoDB with the database on a remote filesystem, and if the master goes down, the backup master mounts the filesystem and recovers. They hate the Linux NFS implementation (probably with good reason), and so I suspect they use some type of NAS like iSCSI for this; it wasn't clear to me. They also do a lot of perl.

Other advice: Use InnoDB. Despite some early InnoDB problems, it works quite well for them. The exception to this is logging, in which case the advice is: Use MyISAM. It's not that InnoDB is bad, but that MyISAM is so good at appending rows.

Apparently InnoDB is about 15 years old, according to one of the presenters; it's just the MySQL support that is relatively new.

MySQL-5.0 (5.0.4 is the second beta, just released) has support for views, stored procedures, triggers, and an information schema. The most interesting thing about views is that they are updatable in most cases. With PostgreSQL, it appears you do not have directly updatable views; you have to write update rules.

Stored procedures follow the SQL 2003 standard. Apparently IBM DB2 is the only other DBMS which satisfies the standard.

The information schema looks like a database, but it is virtual. It's also an SQL standard for getting metadata about your tables. According to Monty, this is not currently implemented as a storage engine, as you might expected, because there some of the hooks that would be needed don't exist yet. I think he said this was one of Brian Aker's projects.

Lunch

At least O'Reilly feeds you well at these conferences. Day 1 was Asian-themed: Thai noodle salad, stir-fried vegetables, teriaki chicken, hot and sour soup, fried rice, spring rolls. Day 2 was Mexican-themed: Field green salad with jalapeno vinegrette dressing, fajita/taco fixings, black bean soup. OSCON 2002 (the last in San Diego) also had good lunches, as I recall. OSCON 2003 had a brown bag lunch, which was provided by Microsoft. My comment at the time was that Microsoft should get used to us eating their lunch.

PyCon 2005 furnished box lunches that, while quite good, many people had to sit in the hallways to eat. However, PyCon was only $250 for a three day conference, while the early-bird price for MySQL UC is $895 for the conference without tutorials. PyCon doesn't pay speakers, but then neither does MySQL UC, except for tutorials.

MySQL-5.0.4 can act as a transaction manager for performing distributed transactions. What this means is that there is a two-stage commit (prepare and commit), and once one of the servers in the transaction successfully prepares, it guarantees that it can commit or rollback. If the TM crashes, it can recover open transactions and either commit them or roll them back. This can take an indefinite amount of time, or there can be timeouts.

An example of how you could use this is writing to multiple databases -- perhaps multiple MySQL servers or MySQL and Oracle and ZODB, etc. -- and ensuring that all servers involved either commit entirely or rollback entirely.

Clustering first showed up in early 4.1, though it had some serious limitations. Some of these limitations are removed in 4.1.10a, and more are removed in 5.0.4.

In the 4.1 series, if you do a SELECT with a WHERE clause, the MySQL server has to fetch all rows from the NDB nodes and then filter them based on the WHERE condition. In 5.0, the WHERE clause is evaluated by the NDB nodes, which can greatly reduce network traffic between the nodes and the server.

Support for BLOBs was added in 4.1.10a, but NDB is really not intended for large BLOBs. In 5.0.4 it is still an in-memory database, which is checkpointed to disk periodically.

In 4.1, NDB tables cannot use the query cache. The query cache is a function of the MySQL server, not the NDB nodes. In 5.0.4, the storage engine API has apparently been improved so that the NDB nodes can invalidate the cache in the MySQL server(s).

Replication does work with clustering, but it's not perfect. Since you can have multiple MySQL servers as a front-end to a cluster of NDB nodes, the server being replicated doesn't necessarily see all transactions, so they don't all get replicated. However, if you had a single read-write server and the rest read-only, you could replicate the data to another server which used a different cluster; the example given was for west and east coast clusters, with west replicating to east. This is in addition to the normal replication features of the cluster. If you want a disk analogy, this would be like RAID-110: A mirrored array of two RAID-10 (striped and mirrored) arrays.

Free as in Food and Beer

Happy happy joy joy

I built a new kernel last night (gentoo-sources-2.6.11-r6) and as a result, when I resume after suspending to disk, my PCMCIA and USB work. This used to work and broke at some point. It's really nice to be able to suspend and resume and still have net and trackball afterwards. In my suspend script, I shutdown PCMCIA and USB and then remove the modules, and afterwards restart PCMCIA and USB. I don't know if that's still strictly necessary, but I don't feel like breaking it again just yet.

I really hate the Blogger editor. Well, both of them. There's no tool to add headers, but you can change fonts all damn day (bad idea).

Here's a stupid trick I just found: In the composer, you can enter HTML tags, switch to edit HTML mode, then switch back to compose, and your HTML tags will now be treated as HTML.

As another example, take <h2>. The preceding is entered as SGML literals for the angle brackets. They show up as SGML literals in both the composer and HTML mode. In preview mode (and presumably published mode), they show up as angle brackets. However, if you preview your post from the posting list, surprise! the angle bracket literals now become real angle brackets.

When you switch between Edit HTML and Compose, which I do a lot, you lose your place and have to scroll.

It is totally retarded about paragraphs. A return in the composer gives you a blank line in the HTML, which can either translate into a <br /> or normal whitespace. In the former case, the so-called WYSIWYG editor shows a line break where there is none; in the latter, you get line breaks instead of proper paragraphs. So it fix it, I have to go into Edit HTML, add <p> and </p> around my paragraphs, and then go back into Compose mode and delete the extra breaks it has thrown in.

To delete characters, I have to be very careful to use Del or Backspace, because if I use CTRL-D, which is what I normally use to delete the character to the right of the cursor, it saves my post as a draft.

Insanity.

I despise the trackpad on my laptop. Periodically it decides to go completely spastic on me and is nearly unusable. Other times it is fairly smooth. I don't know which would be worse, having it like it is now or spastic all the time. At least if it were spastic all the time, I'd have more incentive to disable it completely.

Wake at 6:30 am PDT. Legs are a little stiff, but I expected much worse.

The free breakfast, rather than being the typical "continental breakfast", is a "full breakfast", which turns out to be a buffet: Pancakes, eggs, potatoes, bacon, sausage, cereal, etc. The normal price for it is $10, which is way expensive, but you get coupons which make it free.

Caught the 55 bus to Old Ironsides. Bus driver is not familiar with the route; he's driven it before, but not lately, and so he drives the route with one of the little route maps for riders. I still don't think the last stop was in the right place, but it was a block away from the hotel, so I can't complain.

There's also a trolley. One interesting thing about the trolley is the machine that sells you tickets speaks three languages: English, Spanish, and Vietnamese.

The clustering works like a new storage engine: When you create your tables, use ENGINE=NDB to use the cluster. You can have multiple mysqld front-end servers use the same cluster.

The data storage is handled by data nodes. The nodes implement replication and distribution. For a given table, rows are distributed across nodes based on a hash function. Each row is stored on n nodes, where n>0. n=1 is less reliable than no clustering at all. n=2 is most common; n=4 is the practical upper limit. For the rest of this, n=2 is assumed.

The nodes are partitioned into node groups. Each node group stores n fragments (some subset of a table). You can lose all but the last node of a node group before the cluster shuts down. If you have 32 nodes with two replicas (n=2), you get 16 node groups, and your table is split across all 32 nodes, with each fragment replicated.

WTF Moment: This is an in-memory database. Your database size is limited by the amount of RAM in your data nodes. However, the node perform checkpoints and snapshots, and so your data does get saved to disk. I spoke to Arjen Lentz during lunch, and he said this feature was purchased from Ericsson, and was developed for a telecom application where latency had to be minimized, and it was specified as an in-memory database. Disk database support is planned for the future. It was somewhat disturbing to find out about about 1/3rd of the way through, though, since it's probably the most important constraint on designing your cluster.

I knew this tutorial was in trouble when we got through 36 of 138 slides by the halfway break. We only got to slide 87 by the end. Most of the advanced material was in the last half of the slides, as near as I can tell. You know you're in trouble on your presentation when the screen saver keeps coming on. I don't think I learned anything cool from this.

The tutorial sessions are three hour presentations (with a half-hour break in the middle), or 180 minutes, so if you've got 138 slides, and want 15 minutes at the end for questions, that gives you about 70 seconds per slide. With my own presentation coming up on thursday, I'm targeting about a minute per slide, and the first third I'm planning to whiz through in about 10 minutes.

I don't think I learned anything cool from this.

Dinner

Monday, April 18, 2005

Got up before 6 a.m. to drive to Atlanta for a 10 a.m. flight to San Jose, CA (Delta). Flight left mostly on time, and full.
Fun travel fact: You don't need a driver's license to travel. When I went to PyCon in DC, I found out once I was on the airport shuttle that mine was gone. I used my university ID instead; no one questioned it. I decided to use it again for this trip, with the same results.
Fun travel fact #2: When travelling to Washington, DC, you are not allowed to leave your seat during the last 30 minutes. If you do, air marshalls might shoot you. The same rule applies for outgoing flights, although "30 minutes" is somewhat loosely interpreted: We sat on the ground for almost an hour, and we still weren't supposed to get up, although several people did, and no one got shot.
I managed to watch about 1.25 DVDs before the battery gave up on the laptop.
Arrived around noon local time. After an hour or so at the San Jose airport, I managed to catch an airport shuttle to my hotel in Sunnyvale (4-Points Sheridan). The room is nice: Large with a vaulted ceiling. Staying here was not my first choice, but the conference hotel was full for the first few days.
Fine dining at the local Jack-in-the-Box.
Walked to the conference hotel: The Westin in Santa Clara. This is about a 2.5 mi hike from my hotel. On the way up Great American Parkway, I passed a lot of the big names in the tech industry: Sun, Yahoo!, McAfee, NetGear, ATI. I also passed Globix. Once upon a time I worked for Globix in Atlanta (pre-bankruptcy). Passing it in Santa Clara is kind of running into your creepy uncle.
Santa Clara is pretty clean, at least in the sense of not having a lot of litter. The weather is nice, and the mountains are scenic. It's urban, but not so densely urban as Atlanta or New York or Washington, DC.
I registered for the conference. At this point I am pretty burned out, and I still have to get back to my hotel. I decide to walk back the other way, which is down Tasman Drive to Lawrence Expressway. This is somewhat more roundabout, as Lawrence Expressway doesn't have sideways all the way down. But I do manage to drag my ass back. The 55 bus looks like the best one to take tomorrow. Bus fare is $1.75.
I'm hurting by the time I am back to the hotel, so I decide to make use of the outdoor Jacuzzi. However, no one else has used it for some time, so it is pretty cold. I stay in about 45 minutes because I need the massage, and it does help a bit. Some high school kids come over to check it out and they all puss out fairly quickly.
Too tired to write anything that night, so I watch a DVD and go to bed about 1 a.m. EDT.

About Me

My G+ Circle policy: If you put me in one of your circles, I am not likely to reciprocate unless I either know who you are, or you have something interesting to say. I'll only see your posts if they are Public or visible via Extended Circles. Addendum: I rarely have time to look at Incoming anymore, but you should be interesting anyway.