joabj writes "While MySQL is the subject of much high-profile wrangling between the EU and Oracle (and the MySQL creator himself), the MySQL developers have been quietly moving the widely-used database software forward. The new beta version of MySQL, the first publicly available, features such improvements as near-asynchronous replication and more options for partitioning. A new release model has been enacted as well, bequeathing this version the title of 'MySQL Server 5.5.0-m2.' Downloads here."

If you look at the current state of data storage, the new trend is for *less* features and for more speed, concurrency, throughput and *eventual consistency*. So not supporting strict ACID and/or parts of ANSI SQL can allow databases to perform faster. It really depends on what you want to do with your data. There is no one-size-fits-all DB anymore. Even Oracle has different versions (with a huge variance in price) for different use cases.

So depending on your use case, you can still make fun of it for not sup

Exactly! Especially since MySQL is already well known for both data loss and corruption in the name of performance. Made all the more embarrassing is that PostgreSQL consistently either meets or beats MySQL in performance and leaves it far behind in scalability. In short, PostgreSQL is literally the poster boy proving such an errant trend is bad for everyone.

At the end of the day, that's just MySQL marketing trying to explain why MySQL is inferior to PostgreSQL and other commercial offers. After all, bringi

Especially since MySQL is already well known for both data loss and corruption in the name of performance. Made all the more embarrassing is that PostgreSQL consistently either meets or beats MySQL in performance and leaves it far behind in scalability.

We moved our DB from MySQL to PG about a year ago and experienced significant performance improvements over both InnoDB and MyISAM. Pretty interesting when you consider that PG is also more secure than MySQL.

My guess is that once PG reaches critical mass as an Open Source DB name, people will be moving away from MySQL in droves.

Because of other things? To name one: I find MySQL a great database for agile database development. I can write SQL in a way that upgrades toward a situation instead of failing when the incorrect old situation was there. Lots of those things can be done without stored procedures, in a legible and maintainable manner. In PostgreSQL, I would probably have to write lots and lots of procedures, making this totally unmaintainable.

I moved our application of now over 1 million lines from MySQL to Postgres with only a few minor changes and no stored procedures.

I'm not sure what you mean by "I can write SQL in a way that upgrades toward a situation instead of failing when the incorrect old situation was there." Can you elaborate?

In my experience, MySQL doesn't offer any more flexibility on the whole than Postgresql. Some stuff is better with MySQL, like replication, whereas other stuff is better with Postgres (performance of joins, var

You can't. I wasn't saying that MySQL was perfect in all cases when you didn't care about ACID. Twitter can't use PostgreSQL. Facebook can't use PostgreSQL. The performance hit of ACID is too much. There isn't a single tool that solves all data storage and retrieval problems. You can argue that PostgreSQL is better for most use cases than MySQL (if you're dead set on arguing MySQL vs PostgreSQL while ignoring every other database). I wouldn't agree with that statement, but it makes a lot more sense than

I wasn't trying to imply that PG is better than everything else for all workloads. But, for relational database scenarios, I've found that PG is faster and more secure than MySQL. We've noticed a significant performance improvement even comparing PG to MyISAM tables. Given that,

Facebook, Twitter, Amazon, etc, I believe use Key-Value stores for performance reasons. Hadoop is a big player in that game, Google, Yahoo and IBM are using it. Not sure about Amazon, Facebook, Twitter and so on.

Really? Any DB engine can do that. Just do a minimal install, remove all constraints and stop transaction logging. You'll be amazed how fast a DB engine can run.

BTW, *eventual* consistency is an oxymoron. Once it's gone it is gone. Unless you do a total wipe of all the tables and then reload them from the original data sources. Once a DB is corrupted, good luck cleaning it up. As many victims of identity theft have learned.
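For anyone unsure what "eventual consistency" meant in the post being replied to, here is a toy sketch. It assumes a simple highest-version-wins merge; the `Replica` class and `sync` function are made up for illustration and are not any real database's protocol. The point is only that a replica can serve stale data for a while and still converge once replicas exchange state.

```python
# Toy model of eventual consistency (illustrative only, not a real protocol).
# Each replica stores key -> (version, value); sync() keeps the highest version.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: afterwards both replicas hold the newest version of each key."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                version, value = src.data[key]
                dst.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("status", "hello", version=1)
stale = r2.read("status")   # r2 has not heard about the write yet
sync(r1, r2)
fresh = r2.read("status")   # after the sync pass both replicas agree
print(stale, fresh)
```

The window between the write and the sync is where readers can see old data. Whether that is acceptable depends on the application, which is really what this thread is arguing about.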

If you look at the current state of data storage, the new trend is for *less* features and for more speed, concurrency, throughput and *eventual consistency*.

Speed, concurrency, throughput, distributability, scalability, etc. are all features. So it's not "less features" but "a different mix of features".

And, actually, the trend of having new DBs that sacrifice ACID features for speed and performance features, which then evolve to have ACID features added in as people build bigger systems with them and real

Maybe it's just me, but I find 'Psotgres' to be far lacking compared to mysql. Ease of use also counts for something when working with the masses...and yes I am making fun of your inability to spell the very product you're trying to troll with.
Fun times.

I can't think of many criteria in which PostgreSQL is lacking compared to MySQL. In my experience, MySQL is "easier to use" only in that the default security configuration on some distribution packages is easier to understand.

Stop being a tool; it was obvious it was not his only criterion. And in any marketplace it is an important one that will gain more users.

Do I see netbsd high in the usage ranks?

Actually I didn't think it was obvious.

As I read it, the first guy was just saying he preferred MySQL to PostgreSQL and that one of the deciding factors in that decision was ease of use.

The second guy as I read it was trying to discount the original argument by showing that ease of use should not be considered because that means Access would win which we consider absurd knowing many of the weaknesses with Access.

I don't think that pointing out that that is absurd reasoning is "being a tool" but I am

Actually, most desktop people who are just after ease of use end up using Excel :)

This is true; I actually thought about saying Excel instead of Access. I chose Access because it actually is a relational database.

When used with SharePoint, Access 2007/2010 is easier - SharePoint will automatically create an Access database using SharePoint lists/libraries as tables, and Access will synchronize the content. Setting up a custom SharePoint list (or customizing an existing one) isn't too difficult - certainly

Yeah, transactions. Those are a real bitch, aren't they? I mean, they get in the way all of the time, protecting your data's integrity. We can't fucking have atomicity. No fucking way. PostgreSQL totally lacks the random and unexpected data corruption that makes MySQL great.

And foreign key constraints! Stupid little motherfuckers, preventing arbitrary data entry and orphan records. In my MySQL database, I want to insert any sort of crap I feel like, even if it violates all sorts of constraints.
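A small sketch of the two things being defended here, atomicity and foreign key enforcement. It uses Python's built-in `sqlite3` module purely as a stand-in because it runs without a server; it is not MySQL or PostgreSQL, and the `customers` and `orders` tables are made up for illustration. The same idea applies to any engine that supports transactions and enforces foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.commit()

fk_error = None
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")
        # No customer 999 exists, so this would create an orphan record.
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

# Atomicity: the valid first insert was rolled back together with the bad one.
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(fk_error, order_count)
```

On an engine without transactions or foreign key enforcement, such as MyISAM, order 10 would stay in the table and order 11 would be accepted as an orphan.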

What ease of use issues? That hasn't been an issue in years. PostgreSQL is well supported even on Windows these days.

For the vast majority of users, PostgreSQL scales better, has far more features, supports far more PLs, is technically more advanced, has a vastly superior query optimizer, is more stable, is well supported, and doesn't have the politics surrounding it like MySQL does. Even better, it teaches proper ANSI SQL which carries over to any number of other engines, excepting MySQL.

Because most web publishers are deployers, rather than developers, of web software. The overwhelming majority of this software is written in PHP and assumes the presence of MySQL. Even those packages that support other databases often treat them as second-class citizens; they tend to be much less developed and tested.

I am a sane person, and I care more about using the database that'll work best with the apps I want to use (such as phpBB), than I do about promoting tech for its own sake.

Every time I see this debate on Slashdot it invariably degenerates into claims of mysterious missing features (what are they?) or non-transactional characteristics of MyISAM. Too bad, because I'm genuinely interested in a good comparison. I use MySQL extensively today but have worked on both in the past.

Anyone still tempted to go with the "non-acid" argument... give it up, else risk looking ignorant or foolish. MySQL is designed to support many storage engines. Anyone using MySQL this past decade who car

Why do you think so many PostgreSQL supporters are so rabid about how inferior MySQL is in just about every metric that matters for an RDBMS? It's like constantly watching Pinto [wikipedia.org] owners rave about how great their car is when for the same money they could have gotten just about anything else and been better off, not to mention safer.

I like firebird a lot. What I really like is that it's easy to use the embedded version for applications, then transition to a stand alone RDBMS install with nothing more than a config change. It doesn't scale/replicate so well, but for small to mid sized projects it's far better than mysql, and for mid-large projects postgres is way better. Other than logging or chat (higher concurrency, low/no relational mapping), I can't see mysql being the best choice for anything.

Okay I'll bite. I'm not familiar with postgresql versions since about 7.x.

From http://www.postgresql.org/about/:

An enterprise class database, PostgreSQL boasts sophisticated features such as Multi-Version Concurrency Control (MVCC), point in time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, Un

Security. Scalability. And recently, raw performance, with much more room still to grow. Superior query plan generation for non-trivial queries, which also feeds into the first three items listed. Extensibility to a degree that MySQL can't even be compared. Geospatial capabilities with indexes + ACID. PLs for stored procedures, with a multitude of choices and capabilities. Real-life deployments where ACID counts, compared to MySQL, where people generally use it as a large, non-ACID storage retrieval system where

"near-asynchronous replication" is wrong; it should be "semi-synchronous replication" as stated in the article. Striving for almost-asynchronous replication sounds like a poor implementation of synchronous replication :)

MySQL 5.5 will also support the ANSI/ISO SQL standard method of programmatically returning errors inside SQL procedures, called Signal/Resignal, which some users have called for.

This was never really an issue, because MySQL always had its way of performing whatever you needed it to do, but I used it in Oracle and it really does make a difference. Here's a link that will show you a bit of what it does, for those who don't know.

All in all, I'm glad things are moving forward. Still not the forerunner but still in the game.
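For those who haven't seen SIGNAL/RESIGNAL, a rough sketch of what a procedure using them might look like, wrapped in Python so it can be sent through a driver. The `withdraw` procedure and `accounts` table are hypothetical, and `install` assumes only a generic DB-API cursor connected to a MySQL 5.5+ server; no particular driver is implied.

```python
# SQL text intended for MySQL 5.5 or later. The procedure and table names
# are hypothetical and exist only for illustration.
CREATE_PROCEDURE = """
CREATE PROCEDURE withdraw(IN p_account INT, IN p_amount DECIMAL(10,2))
BEGIN
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;  -- pass the original error on to the caller
    END;

    IF p_amount <= 0 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'withdrawal amount must be positive';
    END IF;

    START TRANSACTION;
    UPDATE accounts SET balance = balance - p_amount WHERE id = p_account;
    COMMIT;
END
"""


def install(cursor):
    """Send the DDL through a DB-API cursor connected to a MySQL 5.5+ server."""
    cursor.execute("DROP PROCEDURE IF EXISTS withdraw")
    cursor.execute(CREATE_PROCEDURE)
```

SQLSTATE '45000' is the generic class for user-defined errors. Before 5.5 the usual workaround was to provoke an error on purpose, for example by calling a procedure that does not exist.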

The last two times I tested it for a true shared-nothing HA cluster, NDBCLUSTER failed miserably without a lot of tweaking. The optimizer was buggy to the point of being broken. And basically the response I got from MySQL AB at the time was, "If you want to use NDBCLUSTER, you'd better get the Enterprise Support Package". After pricing out what it would cost in support from MySQL AB AND the cost of having to go through and rewrite a bunch of our code to optimize it, it was cheaper to buy DB2.

Company I work for now uses PostgreSQL for main product lines. But two of their packages are third party and use MySQL, including their billing system. It works, but as it stands right now, neither of those systems is being taxed on a Dual-Quad Core DB server with 12GB RAM. In fact, it barely runs at 5% of resource utilization. We still use MySQL for one of our website's CMS. And it does the job well.

MySQL works well up until you need more than one box. Replication can work in some circumstances, but as an HA solution, it loses any advantages it had in terms of cost vs. extremely proven and reliable systems.

What are you using to cluster Postgres? I have looked at a couple of options here [postgresql.org] but didn't see an open source solution that was current and able to handle multi-master synchronous replication with some sort of automatic failover.

You generally would do the failover using another product like heartbeat, to promote a slave or whatever else needs to be done during failover. The unix way is each app does one thing and is good at it.
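To make the "promote a slave on failover" idea concrete, here is a toy sketch of the decision logic. The `Node` and `FailoverMonitor` classes and the threshold of three missed beats are assumptions for illustration; this is not the Heartbeat product's API.

```python
class Node:
    def __init__(self, name, role):
        self.name = name
        self.role = role    # "primary", "standby" or "failed"
        self.alive = True


class FailoverMonitor:
    """Promotes the standby after `max_missed` consecutive missed heartbeats."""

    def __init__(self, primary, standby, max_missed=3):
        self.primary = primary
        self.standby = standby
        self.max_missed = max_missed
        self.missed = 0

    def tick(self):
        # One heartbeat interval: check the primary, fail over if it stays silent.
        if self.primary.alive:
            self.missed = 0
            return self.primary
        self.missed += 1
        if self.missed >= self.max_missed:
            self.primary.role = "failed"
            self.standby.role = "primary"
            self.primary, self.standby = self.standby, self.primary
            self.missed = 0
        return self.primary


db1, db2 = Node("db1", "primary"), Node("db2", "standby")
monitor = FailoverMonitor(db1, db2)
db1.alive = False
history = [monitor.tick().name for _ in range(3)]
print(history)
```

A real setup also has to fence the old primary so it cannot come back and accept writes (split brain), which this sketch ignores.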

I agree about the HA, and if you've got a highly transactional application and need enterprise-grade fail-over, Oracle RAC or some variation of PostgreSQL might work great for you. But for many people MySQL is still a good option and has some nice/useful features (online fs-based backups sans datapump, previously mentioned replication). There's also an amazing amount of information available about MySQL tailored to just about any skill level, including a number of alternative approaches to HA.

The NDB releases basically forked from mainline MySQL. Latest is NDB 7.x, which is actually very good at what it does. Not *quite* ready for what we need but getting very close. ALTER ONLINE is a nifty feature... I've used it to add/drop indexes on the fly from tables in continuous use.

Earlier releases of NDB required that all indexes and data fit in main memory, but that has been remedied with disk data tables. Currently, disk data tables can only store fixed-length data (all strings are padded to thei

Nice to see some honest feedback from someone who's obviously tried the product. Glad you like ALTER ONLINE -- I'll pass that along to the devs.

MySQL 5.1.41 mainline has just been merged into what will be the next set of MySQL Cluster releases, BTW.

True variable-width columns on disk, indexes on disk, and better join performance are high on our list of priorities. As well as a few other goodies that'll be coming out early next year, but I can't talk about those just yet. :)

How about using DRBD for MySQL High Availability clustering [mysql.com] instead of NDBCLUSTER? In a nutshell, one DB instance is used to handle writes and this is synchronously replicated to a Heartbeat cluster standby node using DRBD. Asynchronous replication to more DBs handles all the reads (with load balancing between them). Use sharding to scale out when you need more capacity.

They're also still missing 99% of the subquery optimizations they had in the 6.0 Alpha codebase and 99% of the other improvements. When they went Sun they started worrying too much about BC and improvements slowed down substantially. In my opinion, if you want less buggy software on a faster release model, you need to not give priority to BC. But then you lose the support contracts, which is all Sun cares about.
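The routing side of that layout can be sketched in a few lines. This is only an illustration under assumptions: the `Shard` and `Router` classes, the host names, and the "starts with SELECT" test are all made up, and the DRBD/Heartbeat standby sits behind the primary's address so it does not appear here.

```python
import itertools
import zlib


class Shard:
    """One write primary plus asynchronously replicated read replicas (names only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Writes go to the primary; reads rotate across the replicas.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


class Router:
    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() on strings.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def route(self, key, sql):
        return self.shard_for(key).route(sql)


router = Router([Shard("db1-primary", ["db1-r1", "db1-r2"])])
first_read = router.route("user:1", "SELECT * FROM users WHERE id = 1")
second_read = router.route("user:1", "SELECT * FROM users WHERE id = 1")
write = router.route("user:1", "UPDATE users SET name = 'x' WHERE id = 1")
print(first_read, second_read, write)
```

Because the read replicas are asynchronous, a read issued right after a write may not see it yet. Applications that need read-your-own-writes usually send those reads to the primary.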

Is that supposed to be near-synchronous? What the hell is "near-asynchronous"? I don't even see how "near-asynchronous" would be possible. If you aren't synchronous, you're asynchronous, and it's just a matter of how far away from synchronous you are. That's like saying "he's traveling at near-not-the-speed-of-light".

What's hard is synchronous replication. It would be a very useful enhancement if 5.5 had a reliable synchronous replication option, and supported clustering, failover/hot-standby, and failed-node recovery/resync.
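A toy model of where semi-synchronous sits between the two, for what it's worth. It assumes the usual description of the MySQL 5.5 feature: the master waits until a slave acknowledges that it has received the event, not until the slave has executed it. The `Slave` and `Master` classes are made up for illustration and model a single slave.

```python
class Slave:
    def __init__(self):
        self.relay_log = []   # events received, possibly not executed yet
        self.applied = []     # events actually executed

    def receive(self, event):
        self.relay_log.append(event)

    def apply_pending(self):
        self.applied.extend(self.relay_log[len(self.applied):])


class Master:
    def __init__(self, slave):
        self.slave = slave
        self.binlog = []
        self.unsent = []      # events not yet shipped to the slave

    def commit(self, event, mode):
        """Return once the commit has reached the point the mode requires."""
        self.binlog.append(event)
        if mode == "async":
            self.unsent.append(event)     # shipped later; caller does not wait
        elif mode == "semisync":
            self.slave.receive(event)     # wait for receipt only
        elif mode == "sync":
            self.slave.receive(event)
            self.slave.apply_pending()    # wait until the slave has executed it
        else:
            raise ValueError(mode)


def state_after_commit(mode):
    slave = Slave()
    Master(slave).commit("INSERT 1", mode)
    return len(slave.relay_log), len(slave.applied)


for mode in ("async", "semisync", "sync"):
    print(mode, state_after_commit(mode))
```

The pairs are (events received by the slave, events executed by the slave) at the moment the commit returns to the client. In the semi-synchronous case the event has reached the slave but has not been executed there yet.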

It has working async (log shipping isn't synchronous) but it has lots of bugs that can jump up and bite you in the ass that haven't been fixed yet. Search the MySQL bug database for examples. It still requires you to stop your whole cluster and replicate the master to the slave to work. For large datasets this is unworkable if you need continuous uptime.