Carfax Selects MongoDB To Drive 11 Billion Records

Vehicle-history service switches to open source, NoSQL database with an eye to exploring its massive data set in new ways.

There's a 30-year-old relational database up on blocks at Carfax's Columbia, Mo., office.

On Tuesday, the Web service, which supplies used-vehicle history reports to millions of consumers and 30,000 dealerships every year, announced plans to retire its VMS-based RDBMS and switch to MongoDB, the open source, document-oriented database developed and supported by 10gen.

"VMS has been a very valuable OS for us," Carfax CTO Joedy Lenz told InformationWeek in a phone interview. "Unfortunately, with our data volumes, it became fairly expensive to operate and maintain." The production VMS system will be retired within 12 months, he said.

Carfax's Vehicle History Report, created in 1986, is the largest vehicle-history database ever assembled, with nearly 11.5 billion records and growing at 1 billion new records a year. It comprises information from more than 75,000 sources, such as U.S. and Canadian motor vehicle departments, service and repair facilities, insurance companies, and police departments.

When it takes over the driver's seat, MongoDB will run across 50 servers. Lenz declined to name the hardware vendor, but 10gen CEO Max Schireson told InformationWeek by phone that "using inexpensive commodity servers means they can scale out."

Although MongoDB is an open source product, 10gen claims some 500 customers worldwide who pay for its consulting and services. The customer list includes marquee Web brands like eBay and Craigslist as well as traditional businesses, including three of the top 10 global banks and telcos, among others.

Another advantage of using MongoDB is its built-in redundancy. If a node fails, work is picked up by one or more secondary nodes.
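The failover idea can be illustrated with a toy model. The sketch below is not MongoDB's election protocol or its API; the class name ToyReplicaSet, the node names, and the "first healthy node wins" rule are all assumptions made for illustration. It only shows the general pattern of a secondary taking over writes when the primary fails.

```python
class ToyReplicaSet:
    """Toy model of primary/secondary failover (not MongoDB's real protocol)."""

    def __init__(self, names):
        # Every node starts healthy; the first node is the initial primary.
        self.healthy = {name: True for name in names}
        self.primary = names[0]

    def fail(self, name):
        # Mark a node as down and promote a secondary if it was the primary.
        self.healthy[name] = False
        if name == self.primary:
            self.primary = self._elect()

    def _elect(self):
        # Assumption: promote the first healthy node in insertion order.
        for name, ok in self.healthy.items():
            if ok:
                return name
        return None

    def write(self, doc):
        # Writes are routed to whichever node is currently primary.
        if self.primary is None:
            raise RuntimeError("no healthy node available")
        return self.primary


nodes = ToyReplicaSet(["node-a", "node-b", "node-c"])
before = nodes.write({"vin": "VIN001"})
nodes.fail("node-a")
after = nodes.write({"vin": "VIN001"})
```

In this sketch, writes go to node-a until it fails, after which node-b handles them without any change on the caller's side.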

In fact, Carfax already uses a seven-node VMS system. However, Lenz shared that in early performance testing, MongoDB ran transactions up to four times faster. But speed and cost savings weren't the only reasons Carfax decided to migrate to a NoSQL architecture.

Unlike their relational predecessors, NoSQL databases like MongoDB, Cassandra and Riak use a flexible, schema-less design that is especially well suited for massive amounts of variable data.
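To make "schema-less" concrete, here is a minimal sketch using plain Python dictionaries as stand-ins for documents. The field names, VINs, and values are hypothetical and are not taken from Carfax's data model. The point is that records from different sources can carry different fields and still sit side by side in one collection.

```python
from collections import defaultdict

# Hypothetical vehicle-history documents; each source reports different fields.
records = [
    {"vin": "VIN001", "source": "dmv", "event": "title_issued", "state": "MO"},
    {"vin": "VIN001", "source": "repair_shop", "event": "service",
     "odometer": 42000, "work": ["oil change"]},
    {"vin": "VIN002", "source": "insurer", "event": "claim",
     "damage": {"area": "rear", "severity": "minor"}},
]


def history_by_vin(docs):
    """Group documents by VIN without requiring a shared set of columns."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["vin"]].append(doc)
    return dict(grouped)


def fields_seen(docs):
    """Return every field name that appears in at least one document."""
    return sorted({key for doc in docs for key in doc})


histories = history_by_vin(records)
all_fields = fields_seen(records)
```

A relational table would need a column (or a separate table) for each of these fields up front; here a new source with new fields can be added without altering the existing records.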

"Mongo does [transaction processing] with the added benefit of analytics and data mining," Lenz said. "The sky's the limit ... we're just scratching the surface."

As NoSQL products like MongoDB win new adherents, relational database vendors haven't been sitting still. Just last month, Oracle announced a major upgrade, MySQL 5.6, which includes features for high-scale deployments. For example, Oracle announced it would support direct access to data through the Memcached API, which is up to nine times faster than accessing data through SQL parsing.
