Posted by kdawson on Tuesday July 21, 2009 @02:02PM from the try-this-at-home dept.

ericatcw writes "'NoSQL' alternatives such as Hadoop and MapReduce may be uber-cheap and scalable, but they remain slower and clumsier to use than relational databases, say some. Now, researchers at Yale University have created a database-Hadoop hybrid that they say offers the best of both worlds: fast performance and the ability to scale out near-indefinitely. HadoopDB was built using PostgreSQL, though MySQL has also successfully been swapped in, according to Yale computer science professor Daniel Abadi, whose students built this prototype."
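The hybrid idea can be illustrated with a toy scatter/gather: each node runs an SQL aggregation against its own local database, and a reduce step merges the partial results. This is a minimal sketch under stated assumptions, not HadoopDB's code or API: in-memory SQLite databases stand in for the per-node PostgreSQL instances, plain Python functions stand in for Hadoop's map and reduce tasks, and the `sales` table and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one in-memory SQLite 'node' holding a partition of a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn):
    """Push the aggregation down to the node's local database (the 'map' side)."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials):
    """Merge per-node partial sums into global totals (the 'reduce' side)."""
    totals = defaultdict(int)
    for rows in partials:
        for region, subtotal in rows:
            totals[region] += subtotal
    return dict(totals)


nodes = [
    make_node([("east", 10), ("west", 5)]),
    make_node([("east", 7), ("north", 3)]),
    make_node([("west", 1)]),
]
totals = reduce_phase(map_phase(n) for n in nodes)
print(sorted(totals.items()))
```

Pushing the `GROUP BY` into each local database means only small partial results cross the network, which is the kind of work a relational engine does well on a single node.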

German prepositions do not have direct English equivalents. I suppose being an "Ubermensch" would be talking about the HATS that people wear, since that's what's Over the Mensch (person). Stop getting your panties in a twist over things you're wrong about.

Considering that "Ubermensch" was translatable to "Superman," "Ubercheap" would be "Supercheap."

No, it wouldn't. It would be word soup that any German would find to be awkward. To say something is "super cheap" they would say something like "superpreiswertes" which would literally translate as "super inexpensive". They wouldn't use über in such a situation.

Oh sure, German is an extremely verbose language at times, but the fact of the matter is that CorporateSuit, despite all his blusterings, is about as clueless in German as he tries to claim others are.

the fact of the matter is that CorporateSuit, despite all his blusterings, is about as clueless in German as he tries to claim others are.

I suppose that all comes from living in Germany for several years, speaking nothing but German with around 1,000 people a week, face to face. I suppose anyone who had gone through such rigors would end up being "clueless" in German as well. All sarcasm aside, perhaps you are more right than you think. Some Germans don't consider Koelsch or Hessisch (the dialects I ended up speaking) to be real German at all (Although they are more understandable than Bayerisch or Frankfurterisch - which is like Hessisch o

Compounds parse more easily with correct parentheses: (herz)(kreislauf)(wiederbelebung) or (heart)(circle-run)(re-activation), where each of the bracketed words is itself a common compound. FWIW, Cardiopulmonary resuscitation has more characters than the German term. German and English aren't very different, in fact, in terms of compounds; English also has a huge number of compound words, even though they are often not spelled as a single word: circuit breaker, for instance. As English compounds get increasingly

We've been co-opting other languages' words into English for a long, long time now. To a growing number of US citizens, prefixing anything with "uber" is the same as saying "ultra" or "super". You know the saying "it's all over except for the shouting"? Yeah, that's pretty much where this is.

Feel free to mod this entire thread, including the parent, uber off-topic.

You're right that ueber would not conventionally be used as a prefix in this situation, but we weren't talking about the German prefix ueber, but about the English prefix uber, which was adopted from German. The fact that you wouldn't say ueberbillig in German doesn't mean that it's improper to use ubercheap. It makes you sound a bit like an ass, but I would argue that it's in line with other conventional uses of "uber" in English. To put this in a different perspective, English probably uses the Latin

Uber and Super both mean "above", knucklehead. Same proto-indo-european root, in fact.

Today may just be the day that you learn that a word may have more than one definition. In fact, the word you used, "root," refers not just to a word's origin, but can also refer to a very important part of plants. Do not squander this opportunity. It will open an entire new world of linguistics. I have nothing but hope for the grand future that awaits you and your once-tunneled view of the English language.

"Overcheap" would work, although it would have to be read somewhat ironically since "over" usually has a negative connotation. Anyway the Germans call cell phones "Handys", and really shouldn't complain about what we borrow from their language.

Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap".

You remind me of my English teachers. Every year they kept saying that "ain't" isn't a word because it's not in the dictionary. Then one day I looked in the dictionary and it was there. The lesson I learned was that humans create words and "rules" of language aren't really rules at all. They are merely traditions. I suppose you think the French are just speaking bad Latin? No, languages change. From Old English to Middle English to Modern English it changed. I bet all along the way there was some know-it-al

The way I see it, the real question should be "does it increase the ambiguity of the language or decrease its expressive power?". As long as someone understands what is being said (with slang like "ain't" that has been in use long enough that it is widely known), I don't see a problem with it. We may become somewhat Balkanized in the short term, but, hopefully, this will serve to get those conservatives used to living in a pluralistic society and will wear down some of their xenophobia. I see the rea

If both the performance and scalability are as good as described, I can safely say that this is the most important thing of the decade, and not only for DBMSs. Handling large portions of data would get cheaper by at least an order of magnitude, and scaling out would be way cheaper than now as well. I do hope it's true.

It won't deliver. In the meantime, for those of us living and working in the real world, hard drives will be bigger and faster, file systems will get better, and SSDs will start to shit all over spinning platters.

If it delivers, it will change a lot. Not for your average blogger with $10 hosting, WordPress, and all of his 100 readers, but for all the folks whose sites are successful enough to go beyond what a single DB server can deliver. Today you have to work really, really hard to make it all work with replication, as pretty much no free CMS offers data sharding. Then you won't have to. Just get a DB cluster (as a service) that works out of the box with no or very little modification to the software you are using. The

I thought Essbase was supposed to be one of the best databases for managing too much information. Is this supposed to be an alternative, or to act as something in between Essbase and a MySQL server?

The grad students do all the work, and the professor takes all the credit. Anyone can come up with ideas; the real work is in actually getting things done. This is the reason I stopped grad school with my MS even though I LOVE computer science, more than anyone I've ever met.

Anyone can come up with ideas is true, HOWEVER not all ideas are GOOD ones. The problem with coming up with GOOD ideas is often people don't have a basic understanding of the problem or the implications of various ways of implementing an idea.

Getting people to do the work is often not quite as easy as it seems. First you have to have qualified people. They have to be motivated to actually complete the work given.

I take a look at what I know, verses what I knew graduating college, and I know substantially more, and more practical knowledge, things that no MS piece of paper can show.

Does your extensive post-collegiate learning include constructing a multi-clause sentence in the English language (...and I won't even mention the spelling error)? Ordinarily I wouldn't be an asshole about this, except you screwed up the exact sentence where you're bragging about your amazing skills, acquired over a long professional career. And whatever that career might be, writing a readable sentence in your main language is a basic skill (and I know you're an American from your other posts).

Can't spell worth shit, doesn't negate my intelligence. I know idiots who spell perfectly. I know very intelligent people who spell well, and idiots who can't spell like me. Spelling is NOT a sign of intelligence nor education level.

And I didn't know I was going to be "graded" on a spur of the moment post to a web log. Had I known you were a lurking grammar nazi, I would have proofread my post more carefully. Perhaps even hiring someone to draft(write) it for me to post as my own.

It's not just your writing that sucks. Going by your response, your reading comprehension is not so hot either. I like your rant (which, ironically, you probably did get proofread), but it was about *THE WRONG THING*. I'd leave it at that. Maybe you should spend some time on reflection rather than defending the indefensible.

Scalability is one thing, but what we appreciate in SQL-free databases is also that they don't require SQL.

When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.

The Tokyo Cabinet API is absolutely excellent in this regard. And there's no need to learn yet another domain-specific language like SQL, just use the language you use for the rest of the app.

Now, SQL zealots would troll "but how would you do with ?". And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.

But for the rest of us, SQL is cumbersome. Databases like MongoDB make you achieve similar results in a more natural way instead of forcing you to learn SQL and to rethink everything in a tabular way.

Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory. Look at MongoDB: http://www.mongodb.org/

> you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.

That's not a disadvantage in many cases, especially for the long term. There's a benefit of forcing people to use SQL to talk to the DB. It becomes a layer of abstraction, somewhat like a standard protocol or interface.

When you use SQL, your database can be used by 100 different people and programs, and when you add co

Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.

Well, that's all very interesting, but it doesn't sound any different than your average OODBMS, something which has been around for a *long* time (I worked with Versant nearly a decade ago doing exactly what you describe). Heck, the Smalltalk world has had intelligent object databases for

I would argue that all solutions that currently exist for databases are ideal for some specific set of problems AND some specific set of users for each problem within that initial set.

There is no "perfect" solution that will work for all types of data, be it a flatfile structure, a hierarchical structure, a relational structure, object-oriented or some combination of those. (The star-structure of OLAP databases is a hybrid, for example.)

What would be good is if there was a suitable metalanguage in which you

I would just like to state for the record, that IMHO SQL is a beautiful thing. Its ease of interoperability (both between languages and backends) has saved my butt on numerous occasions (not to mention the ease with which you can go from very simple to very complex depending on the need of your application)...

...and you can get rid of it and replace it with OOP when you pry it from my cold dead hands.

When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.

Yes, I'm an SQL troll, but... if using SQL to get a row by a unique ID is too hard for you and too insecure, there is no amount of code which is going to fix your problem, which is that you are a shitty developer who is far too lazy to make a function or macro to wrap around the simple SQL request.

So you're relating "id" to "a record." I assume that the record in question is a blob of potentially binary data that your program parses however it wants. So you want to relate unique identifiers to blobs. You can do that quite easily with SQL. Looking up a given unique identifier quickly is something your average relational database is very good at. And writing the wrapper function to implement your hypothetical get() function is trivial in most languages. I'm completely at a loss for what your SQL-free database is offering me in this case. It's saving you from the horror of writing 10 lines of code, once, to implement get(id)? 60 minutes with a good SQL tutorial will teach you everything you need to know. Sure, there is a lot more you can learn, but for the simple case you're describing you only need to understand SQL at the most basic level.
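For the simple case described here, a get() wrapper over an SQL table might look like the following minimal sketch. It assumes SQLite via Python's built-in `sqlite3` module; the `KVStore` class, the `kv` table, and the `put`/`get` names are hypothetical illustrations, not any particular product's API.

```python
import sqlite3


class KVStore:
    """A tiny key-value store layered on one SQL table (hypothetical names)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # The PRIMARY KEY gives the id column an index, so lookups by id are fast.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (id TEXT PRIMARY KEY, value BLOB)"
        )

    def put(self, key, value: bytes):
        # INSERT OR REPLACE overwrites any existing row with the same id;
        # using the connection as a context manager commits the transaction.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (id, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key):
        # The ? placeholder binds the key as a value; it never becomes SQL text.
        row = self.conn.execute(
            "SELECT value FROM kv WHERE id = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]


store = KVStore()
store.put("ChaosDiscord", b"\x00\x01 opaque record bytes")
print(store.get("ChaosDiscord"))
```

The program stays free to parse the blob however it wants; the database only relates the identifier to the bytes.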

Or are you handwaving that "a record" is actually automatically squeezed into one or more variables or objects in your code? You say get("ChaosDiscord") and out pops the UserObject populated with the relevant information. Of course, at this point you need to start teaching your database, or at least your database wrapper, how your objects are structured and how to serialize them. This is admittedly a bit of a nuisance, but an SQL-free database doesn't magically make the problem go away. Sure, an SQL-free database can provide a layer to simplify or automate it, but so can a layer on an SQL database (Ruby on Rails is perhaps the best known). Sure, you'll need to tell it that username is a string, userid is an integer, and so on, but you only have to say it once, in SQL instead of in your program. The total work hasn't gone up.
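The "say it once" point can be sketched with a dataclass whose fields mirror a table declared once in SQL. This assumes Python's `sqlite3` module; `User`, the `users` table, `save_user`, and `get_user` are hypothetical names chosen for illustration, and a real ORM layer such as the one in Ruby on Rails does far more.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class User:
    # Hypothetical user record; field order matches COLUMNS below.
    username: str
    userid: int
    email: str


# Column names come from the class definition, never from user input,
# so splicing them into the SQL text is safe here.
COLUMNS = ", ".join(f.name for f in fields(User))


def create_table(conn):
    # The types are declared once, in SQL: username is text, userid is an integer.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(username TEXT PRIMARY KEY, userid INTEGER, email TEXT)"
    )


def save_user(conn, user):
    # Serialize: the object becomes a tuple of column values.
    with conn:
        conn.execute(
            f"INSERT OR REPLACE INTO users ({COLUMNS}) VALUES (?, ?, ?)",
            astuple(user),
        )


def get_user(conn, username):
    # Deserialize: one row back into one User object, or None if absent.
    row = conn.execute(
        f"SELECT {COLUMNS} FROM users WHERE username = ?", (username,)
    ).fetchone()
    return None if row is None else User(*row)


conn = sqlite3.connect(":memory:")
create_table(conn)
save_user(conn, User("ChaosDiscord", 42, "chaos@example.org"))
print(get_user(conn, "ChaosDiscord"))
```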

Ultimately, you appear to be complaining that SQL is too powerful (and thus complex) for your needs. But you can easily learn and use a subset of SQL that corresponds to what you claim you're looking for in an SQL-free database! You might as well complain that Java is too powerful because it has thousands of classes you don't need. The time to learn the relatively minor amount of SQL you need is insignificant compared to the time to develop any non-trivial application. If even that hour is too much, you can outsource the work to a geeky college student for some pizza and soda.

There are some compelling reasons to look at SQL-free databases, but "SQL is too powerful" isn't one.

It's because SQL isn't just SQL. It's all the cruft that goes with it. Accessing a DB from an OO language is simply a major fucking pain in the ass and much harder than it should be, even when using the ORM du jour. A lot of this complexity comes from the fact that OO and RDBMSs just don't play well together no matter how you slice it. Instead of focusing on the business domain, you end up spending far too much time dicking with the data layer. A lot of this would go away by using an OO database but then y

I believe the great-grandparent poster was talking about simple key-value stores, similar to the Tokyo Cabinet system he mentioned. When people talk about Anti-SQL or SQL-Free, that seems to be what they're always talking about, although usually on the larger end with things like BigTable and HBase. My criticism was directed in that direction. Comparing a key-value store to a subset of SQL, or even a key-value store implemented in SQL, the complexity difference is negligible for any but the most simplisti

Judging from your response to ahabswhale, it seems that you've pretty much made up your mind that everyone else is wrong and you're right, but I'll take a stab anyway at why I think you're missing the point:

Looking up a given unique identifier quickly is something your average relational database is very good at. And writing the wrapper function to implement your hypothetical get() function is trivial in most languages. I'm completely at a loss for what your SQL-free database is offering me in this case. It

The grandparent ultimately asserted that "for the rest of us, SQL is cumbersome," calling out that an "SQL-free" database is "easier," "more secure," and "cheaper."

If we're talking about essentially key-value stores, SQL can do it well.

It's harder, but we're talking about an hour or so of work, an hour's worth of value you can reuse for future projects. For all of the complaints about needing to worry about tables with lots of fields and serialization, it's moot if you just want a key store. All that

And there's no need to learn yet another domain-specific language like SQL,

SQL, "domain specific"? Wow. I am taken aback. Over 30 years of coding, I think SQL is singlehandedly the most productive addition to the development environment I can think of since the compiler. There are a lot of reasons that using a SQL database might not make sense (small platform, single user, low cost, small required footprint, etc) but domain specificity isn't on my list. I can't think of a less domain specific development

> WTF! I think that ranks as one of the stupidest statements I have ever read on slashdot!

Tons of people aren't exactly writing PHP websites, but are still able to install vBulletin, phpBB, PHP-Nuke, Joomla, or WordPress on shared hosting. And then they fire up phpMyAdmin in order to remove bogus users, count the number of posts or visits, etc. SQL makes perfect sense for this.

SQL injection vulnerabilities don't exist because of the database; they exist because of the crappy programmer who doesn't know how to use the database being let loose writing production code. It's a bit like saying "Let's blame the internet for cross-site scripting vulnerabilities!".
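The difference between a programmer who pastes input into SQL and one who uses the database properly fits in a few lines. In this sketch (Python's `sqlite3`, with a hypothetical `users` table), the first function concatenates the input into the SQL text, while the second binds it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_unsafe(name):
    # BAD: the input is pasted into the SQL text, so quotes in it rewrite the query.
    return conn.execute(
        "SELECT name FROM users WHERE name = '" + name + "'"
    ).fetchall()


def find_safe(name):
    # GOOD: the input is bound as a value and is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


hostile = "nobody' OR '1'='1"
print(find_unsafe(hostile))
print(find_safe(hostile))
```

With the hostile input, `find_unsafe` returns every row because the injected `OR '1'='1'` makes the WHERE clause always true, while `find_safe` returns nothing because no user is literally named `nobody' OR '1'='1`. Same database, same SQL; only the programmer's habit differs.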

And there's nothing wrong with SQL. There's a lot wrong with people who think SQL will solve every single problem under the sun. Unfortunately, those people seem to be employed writing 3rd-party abstraction layers and ORMs.

It's a bit like saying "Let's blame the internet for cross-site scripting vulnerabilities!".

No, it's about having fewer issues by using modern tools, rather than trying to find who's to blame.

If HTML/JS/CSS/HTTP could be redesigned today, do you think that the way a browser manages cookies, XHR requests and sandboxing in general would be the same as it is today? Do you think that the SMTP protocol, which was good enough 30 years ago, is not a big pile of crap nowadays, even though, just like ORMs, its content is now shown in webmail? SQL is just like SMTP. Or the FAT filesystem. An old thing. There are worthy proposals and even working products that could supersede them, but because of legacy applications and people who want to stick with the same technologies until the end of the universe, these old things remain. They just get bloated with new extensions instead, in order to keep up with mandatory requirements.

Of course, if you were designing something today to do the job of 'relational database', you'd probably get something different from SQL. That doesn't change the fact that today, the SQL / RDBMS combo is the best tool for solving a lot of problems. That doesn't mean that people won't try and use it improperly, but those people are idiots. People don't stick with SQL because it's old, they stick with SQL because it gets the job done, and in the hands of someone who knows what they're doing, it gets the job d

The answers to those questions will say a whole lot about why PHP sucks, but very little about SQL.

in particular:

Why does a stock PHP have 5 different APIs just to issue basic MySQL queries?

Because the PHP developers have re-invented the wheel five times and still haven't figured out it's not supposed to have sharp corners. Nothing to do with SQL. Perl's DBI is a good example of a database abstraction layer done right.

And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.

But for the rest of us....

Sorry, but I couldn't help thinking of this line from "Life of Brian":

But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, a

PostgreSQL allows turning any programming language into a query language, AFAIK. BTW, this may be off-topic, but I'm pretty sure I'll get the most DBMS-geek eyes on this article, so here goes: would it be possible/feasible to integrate the compiler system and cache with the VCS within the database? My idea is about getting the flexibility of Portage/pkgsrc systems without the hassle of compiling the whole thing, start to finish. I'm pretty sure most compile time options can be recalculated quickly, and r

Actually it's quite the opposite. For any complex task, writing a script for MongoDB, CouchDB or TC/TT is way easier and faster than an unbearable 100-line SQL statement that even you are unable to understand the day after. Plus it's able to get things that just can't be written as an SQL query. And your "we'll have to run it over the weekend because it'll kill the server" is also why when you need to extract stuff out of a large dataset, you write a script to process data in chunks, not a single SQL state

Actually it's quite the opposite. For any complex task, writing a script for MongoDB, CouchDB or TC/TT is way easier and faster than an unbearable 100-line SQL statement that even you are unable to understand the day after. Plus it's able to get things that just can't be written as an SQL query.

Can you provide concrete proof of this? "For any complex task" pretty much uses universal quantification over all problems dealing with data representation. Considering that some (not all) data (with its associated meaning and function) is best represented using relations, while others are better represented using network models (think cyclical graphs), and others are best represented as simple key-value mappings, I would find it hard to believe that truly and verily any complex data manipulation task is best represented writing a MongoDB/CouchDB/TC/TT script.

Now, if you decide to reply by saying "well, not all, but most", please provide proof, or at least some logical demonstration that this is indeed true. For, if it is, holy crap, you need to write a dissertation on this.

And your "we'll have to run it over the weekend because it'll kill the server" is also why when you need to extract stuff out of a large dataset, you write a script to process data in chunks, not a single SQL statement. If SQL is so wonderful and the answer to everything, why do stored procedures exist?
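Processing a large table in chunks does not require abandoning SQL; issuing one bounded query per chunk works against a relational database too. The sketch below uses keyset pagination (resume after the last primary key seen) with Python's `sqlite3`; the `events` table, its columns, and `iter_chunks` are hypothetical, and the chunk size is arbitrary.

```python
import sqlite3


def iter_chunks(conn, chunk_size=1000):
    """Yield rows in primary-key order, one bounded query per chunk (keyset pagination)."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        last_id = rows[-1][0]  # the next query resumes after the last id seen
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO events (amount) VALUES (?)", [(i,) for i in range(1, 2501)]
)

total = 0
chunks = 0
for chunk in iter_chunks(conn, chunk_size=1000):
    total += sum(amount for _, amount in chunk)  # per-chunk work goes here
    chunks += 1
print(chunks, total)
```

Keyset pagination is used here instead of `OFFSET` because each query can seek directly through the primary-key index rather than skipping over rows that were already processed.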

Denormalized data (except in certain cases) is usually a sign of bad design, not an intrinsic RDBMS attribute.

As for sharded data (assuming that it's properly normalized, otherwise see previous paragraph), and assuming that it's properly sharded among functionally sound partitions, what's the trouble in implementing the hypothetical request? Badly partitioned data is just like denormalized data: a sign of someone who didn't know what he or she was doing.
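Partitioning along a functionally sound key can be sketched as routing every row for a given customer to the same shard, so per-customer requests touch one shard while global aggregates fan out and sum partial results. This assumes in-memory SQLite databases as shards, a stable CRC-32 hash from `zlib`, and a hypothetical `orders` schema; it is an illustration, not a production sharding scheme (it ignores resharding and cross-shard transactions).

```python
import sqlite3
import zlib


class ShardedOrders:
    """Route each customer's rows to one shard (hypothetical schema)."""

    def __init__(self, n_shards=4):
        self.shards = []
        for _ in range(n_shards):
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE orders (customer_id TEXT, item TEXT, qty INTEGER)"
            )
            self.shards.append(conn)

    def shard_for(self, customer_id):
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        index = zlib.crc32(customer_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def add_order(self, customer_id, item, qty):
        conn = self.shard_for(customer_id)
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)", (customer_id, item, qty)
            )

    def orders_for(self, customer_id):
        # Partitioning on customer_id means this query touches exactly one shard.
        return self.shard_for(customer_id).execute(
            "SELECT item, qty FROM orders WHERE customer_id = ? ORDER BY item",
            (customer_id,),
        ).fetchall()

    def total_qty(self):
        # Cross-customer questions fan out to every shard and sum the partial results.
        return sum(
            conn.execute("SELECT COALESCE(SUM(qty), 0) FROM orders").fetchone()[0]
            for conn in self.shards
        )


orders = ShardedOrders()
orders.add_order("c1", "apple", 2)
orders.add_order("c1", "pear", 1)
orders.add_order("c2", "fig", 5)
print(orders.orders_for("c1"), orders.total_qty())
```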

No offense to the creators (well, maybe some offense) but why the heck would you want to put MySQL in where PostgreSQL already was? That's like taking out your star quarterback and putting in, well, me!

MySQL has its fanboys from circa 1994-2001. During this period, the MySQL license was much more permissive, and it gained a certain momentum from PHP that carries it through to this day. At the same time, PostgreSQL was still using Cygwin on Windows, the INSTALL file had a table of contents, and it was lacking performance enhancements (particularly on Windows). Eventually Cygwin was dropped, threading worked properly on Windows, and the performance enhancements were good. Along with this came a much shorter INSTALL file, and any reason to use MySQL had disappeared. But once people know something, they like to keep on using it. Then MySQL got things like triggers, foreign key constraints and full ACID compliance. So in the end it ended up being a wash.
However, and not to start a flame war, it seems that PostgreSQL, having been feature-complete (ACID, foreign keys, etc) maintained a performance edge. But also to this day MySQL has a very fast table implementation, provided you don't need things like ACID compliance. For a variety of applications, this is "good enough" and the trade-offs of feature completeness vs performance are worth it.
Disclaimer: I have used both extensively in the past. I prefer PostgreSQL, but now use neither. Now I only do SQLite (embedded tables) or Oracle (for hot replication).

We might create the software intending it to do and be used in one way, but how it will actually be used is determined by the users. Postgres and MySQL don't carry any intrinsic values, only the values which their users discover and, well, use. Without users they have no good or bad features.

So why is it that people feel the need to rally around or defend them? After all, only the developers who have done the work are capable of understanding the snips and criticism leveled against them, and these are the

I've used both pretty extensively in a wide variety of environments, and I don't take such a balanced view at all. IMHO, the best answer to most database-related problems is to use PostgreSQL or SQLite. MySQL sits somewhere between them in terms of reliability, scalability, ACIDity, etc, and kinda fails at being good at anything in particular. For that matter, even if you *like* where MySQL lies on those tradeoffs, compared to either of the other two mentioned products (especially Pg), the quality of the

Not sure what you were talking about, but hadoop and postgres are open source. Unless they're stupid, they wouldn't make the resulting product closed source.

I'm not going to make the whole free software pitch here, but let's just say I believe in the superiority of the development process and the end product through my experiences developing and using software.