Since the problem-plagued rollout of Healthcare.gov, there’s been a lot of blame to go around — between the government’s Centers for Medicare and Medicaid Services (CMS) and the contractors, including CGI Federal, that had to build the site. What I hadn’t heard till now was that the choice of MarkLogic, a NoSQL database, may also have played a role. According to the story:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.

A report in The Hill a few weeks back quoted an email from Healthcare.gov project leader Henry Chao mentioning MarkLogic in passing. To be fair, the issue seems to be that it’s harder to find database admins and other techies who know NoSQL databases inside and out, whereas there’s a ton of existing expertise in SQL databases from Oracle, Microsoft and IBM.

According to MarkLogic’s website, it offers the “only government-grade security NoSQL database,” and healthcare is a big vertical industry for its software. Neither Healthcare.gov nor CMS is listed among its reference accounts.

I’ve reached out to the company for a comment and will update when it’s forthcoming.

When MarkLogic founder Christopher Lindblad started working on a database for unstructured data in 2001, his efforts were prescient. The database market has since seen a proliferation of non-relational, or NoSQL, startups built to handle the wide variety of data types that new sources such as web applications and digital documents generate. The space has grown so big, in fact, that it has already started to consolidate. Amid all this, MarkLogic has managed to stand out by generating more revenue than pretty much any other vendor, according to figures Wikibon released in February.

On Wednesday, MarkLogic’s success was validated again, as the company announced a $25 million round of venture funding, bringing the total it has raised to $71.2 million. Sequoia Capital and Tenaya Capital led the round; CEO Gary Bloom and other MarkLogic executives also contributed.

MarkLogic likes to tout the fact that it’s geared for enterprise use. Features such as high availability, replication, clustering and ACID compliance help differentiate the company from other NoSQL databases, Bloom told me. And although the company is taking in revenue and looks robust enough to go public now, Bloom said he would rather boost revenues to the point that MarkLogic could sustain success after an IPO.

Rather than go after the revenues that open-source NoSQL databases generate, Bloom said he wants to take away database market share from legacy companies peddling SQL databases, including IBM, SAP and Bloom’s previous employer, Oracle. That means MarkLogic salespeople will have to convince slower-to-change enterprises that relational databases might not be the best choice if they want to take advantage of unstructured data. MarkLogic also will have to put up with fellow NoSQL players that are adding enterprise functions, such as MongoDB.

But if MarkLogic’s plan turns out to be fruitful, a public offering could come within a year or two, Bloom said.

Knowing what to expect in the big data market is now key for companies of all sizes. In this latest GigaOM Research podcast, analysts George Gilbert and Jo Maitland discuss the future of this market and what to expect over the next 12 months.

Businesses now “get” the fact that big data technologies can help them wring value out of their legacy — and largely unused — data. The Montreux Jazz Festival, for example, had archives of music sitting on tape since its 1967 inception; it was able to turn those long-dormant performances into streamable form.

This ability to monetize unproductive assets is a huge selling point for big data, said Paul Speciale, VP of products at Amplidata, the object storage company that worked with the festival on that project. So is the ability to look outside your company to see and analyze what users, would-be users and competitors are saying about your products and services — thus all the talk about analyzing the Twitter firehose and Facebook data.

But such projects are fueling expectations for more, better, and faster big data interactions, according to speakers at the GigaOM Structure:Data 2012 event in New York on Wednesday. The advent of consumer technologies like the iPhone’s Siri has educated consumers about the uses of natural language processing. The ability to handle unstructured speech is a key component of many big data applications.

“We are past the distinction between consumer and business,” said Staffan Truve, CTO and co-founder of Recorded Future. “They drive each other.”

Jason Hunter, deputy CTO of MarkLogic, agreed that the explosion of fast, powerful consumer devices is driving demand for better big data applications. “I remember waiting for [compute] jobs to process over night. Now if I’m not sure I’m getting 60-frames-per-second on my iPad, I’m upset. Expectations change. I want lots of data, smart data. I want it free and I want it pretty.”

This exploding demand means the technologies for outputting data in a usable, understandable format, for ingesting it into storage so that it’s manageable and searchable, and for parsing it with analytics will only grow.

What the AP wanted to do, VP of information management Amy Sweigert told me, is build an application that would let it search through its mountains of archived content so it could better analyze that information. Internally, the AP wants to better understand how much content it’s publishing on any given topic and in what formats (e.g., stories, photos, videos), but it also wants to deliver custom data sets to business-to-business customers based on whatever their needs might be.

According to Sweigert, the AP had to go with a non-relational database for a variety of reasons, chief among them scale and freedom from schemas. Her team had actually built a relational database, but as content volumes grew (the new system holds about 120 million pieces of content) and the team wanted the flexibility to perform new types of searches without complicated queries and — more importantly — without having to reconfigure the database to support new methods of searching, the old database had to go.
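The appeal of schema freedom can be sketched with a toy example. This is purely illustrative — not the AP’s actual system or MarkLogic’s API — but it shows why a new kind of search on document-style records needs no schema change, whereas a relational table would have to be altered to hold new fields:

```python
# Illustrative sketch (hypothetical data, not the AP's system): documents
# stored schema-free as dicts, where different records carry different fields.
docs = [
    {"type": "story", "topic": "election", "words": 800},
    {"type": "photo", "topic": "election"},
    {"type": "video", "topic": "weather", "duration": 90},
]

def search(docs, **criteria):
    """Return documents matching every given field; missing fields simply don't match."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# A brand-new query on fields that not every document has — no schema migration needed:
election_photos = search(docs, type="photo", topic="election")
```

In a relational design, adding the `duration` field for videos alone would have meant altering a table (or adding a sparse column) before any such query could run.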

Sweigert said many large publishers are moving toward an XML-centric data model, if they’re not already there, because the format makes it so much easier to work with old content that doesn’t necessarily have metadata associated with it. What’s more, she said, the AP is actually using MarkLogic to help add metadata to some of that old content.
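The metadata-backfill idea Sweigert describes can be sketched in a few lines. The keyword rules and element names below are hypothetical, chosen only to illustrate the pattern of scanning archived XML that lacks metadata and attaching inferred tags based on the body text:

```python
# Conceptual sketch of metadata backfill on archived XML content.
# The element names and keyword-to-topic rules are invented for illustration.
import xml.etree.ElementTree as ET

KEYWORDS = {"election": "politics", "touchdown": "sports"}

def add_topic_metadata(xml_text):
    """Append a <metadata> block with topics inferred from the body text."""
    root = ET.fromstring(xml_text)
    body = (root.findtext("body") or "").lower()
    meta = ET.SubElement(root, "metadata")
    for word, topic in KEYWORDS.items():
        if word in body:
            ET.SubElement(meta, "topic").text = topic
    return ET.tostring(root, encoding="unicode")

enriched = add_topic_metadata("<article><body>Election night coverage</body></article>")
```

A real system would use richer classification than keyword matching, but the shape is the same: the text itself drives the metadata, rather than relying on tags applied at publication time.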

In that regard, the AP’s new database sounds similar to the value proposition for publishing analytics tools like Parse.ly, which launched earlier this year and already has some big-name clients under its belt. Parse.ly analyzes clients’ web content based on the text rather than the metadata, which means publishers without strict metatagging procedures or crack data analysts can still get deep insights into what topics are driving traffic.

However they do it, the rationale is the same: find a way to keep making money off of years’ worth of archived content, either directly or indirectly. The direct route is probably akin to what the AP is doing with its business partners, while the indirect route is the same story as any analytics effort: using older content to help identify trends that can influence future decisions on both content and products.

The company, like many others in this era of big data, is updating its infrastructure to better handle a wealth of information from many sources — in this case a lot of XML documents as well as “screen scrapings” of other documents. Toward that end, it has rebuilt its infrastructure as a service-oriented architecture (SOA) that gives it flexibility in tool choice.

When LexisNexis started in the 1970s, there were not a lot of off-the-shelf search technologies, so it built its own search on the mainframe, Barton said. That has all changed in recent years with the advent of several quality search engines. “Some are better suited for some jobs than others. Our new SOA architecture means we can pick the right search engine for the right job,” Barton said.

LexisNexis uses HPCC for public records searches and still uses some FAST search, now owned by Microsoft, for some legal web searches, Barton said. HPCC is LexisNexis’ own technology for analyzing information in the intelligence and financial services industries.

For Lexis Advance, MarkLogic had an edge, because it comes with its own repository, whereas some other tools, including FAST, require a separate repository built on SQL Server or Oracle databases. “Since the bulk of our content is XML, it made sense to use MarkLogic as the repository,” Barton said.

MarkLogic positions its offering as a true big data solution. “We like to view the world as XML so we convert a lot [of documents to that format] but we’re also great at storing binaries and video and we can index that with the metadata associated with it,” said Bill Vega, the VP of solutions marketing for MarkLogic, which is based in San Carlos, Calif.

The big Hadoop news today is Hortonworks’ entrée into the product space with a new distribution, but it’s just one of many companies trying to sell big-data-hungry businesses on their Hadoop prowess with new products. Individually, none of these announcements is particularly earth-shaking, but they’re very meaningful when taken as a whole. They’re part of a larger trend in which everyone with a data-driven business — Informatica, Microstrategy, HP, EMC, Oracle, ParAccel, IBM, Dell, Pentaho, Jaspersoft, you name it — has a Hadoop story to tell customers.

Karmasphere. Karmasphere and Amazon Web Services have teamed to make Karmasphere’s Analyst product available in a pay-as-you-go pricing model. This means AWS users can create Hadoop workflows using Karmasphere’s graphical interface and run them on Elastic MapReduce without having to purchase Karmasphere licenses through the traditional sales model. Of course, because the jobs run in Amazon’s cloud, users don’t have to purchase hardware either.

MarkLogic. Unstructured database provider MarkLogic is souping up version 5.0 of its product with a Hadoop connector that lets users run MapReduce jobs on MarkLogic data without the data having to leave the database. That’s a potentially powerful feature because it speeds up MapReduce jobs by avoiding transmission across the network and by taking advantage of the database’s native performance features.
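For readers less familiar with the pattern, the shape of a MapReduce job is simple. The sketch below is generic Python — not MarkLogic’s actual connector, which is a Java API — showing the two phases a connector would run against in-database documents: a map step that emits key/value pairs per document, and a reduce step that aggregates per key:

```python
# Generic MapReduce shape (illustrative only; MarkLogic's real connector is Java).
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit a (word, 1) pair for every word in a document.
    for word in text.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Sum the values for each key across all documents.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = {1: "big data", 2: "big databases"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
word_counts = reduce_phase(pairs)
# word_counts == {"big": 2, "data": 1, "databases": 1}
```

The connector’s selling point is *where* this runs: the map phase executes alongside the stored documents, so only the small aggregated results — not the raw data — cross the network.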

Sybase. Sybase, the analytic database company owned by German software giant SAP, released version 15.4 of its IQ product, which includes a native MapReduce API within the database as well as integration with Hadoop environments. The former capability is designed for structured data stored within Sybase IQ, but the Hadoop integration will, according to the announcement, allow for “different techniques to integrate Hadoop data and analysis with Sybase IQ.”

Syncsort. Data integration specialist Syncsort released DMExpress 7.0, which includes enhanced Hadoop integration to make it easier and faster to extract data from all data environments and load it into Hadoop.

You’ll hear various estimates about how much data will be stored in Hadoop in the years to come — somewhere between half and all of the world’s data — but it’s a lot any way you slice it. Big data is driving everything now because there’s so much to learn and so many business opportunities for companies that truly understand what data says about consumers, systems, climate change or anything else they want to know. Hadoop is driving the big data ship because so much of that data, and more every day, is unstructured and not suitable for traditional relational database environments.

Thus all the chest-beating about Hadoop integrations, connectors and new products: Data-focused vendors without a Hadoop story don’t have much of a story at all.

Unstructured database provider MarkLogic has a new CEO with big-business experience and plans to take the fast-growing company public. CEO Ken Bado comes to MarkLogic from computer-aided design leader Autodesk, where he served as EVP of sales and services and helped drive revenue up to $2.3 billion during his time there. MarkLogic is nowhere near those levels yet, but it does have a healthy business that belies its relative youth and NoSQL ties.

Although it’s technically a NoSQL database, MarkLogic has never really embraced that label, and it hasn’t had to. Since launching in 2005, MarkLogic has grown to 250 employees and has about 240 customers, including several prominent ones in the government and media industries. According to VP of Engineering Ron Avnur, MarkLogic Server, the company’s flagship product, is ideal for storing and querying everything from intelligence data to tweets to actual documents, and can scale to more than a petabyte.

Bado says the goal going forward is to add about 150 employees and keep revenue growth around its 45-percent-per-year level before ultimately filing for an IPO when it reaches a certain, but undisclosed, revenue mark. Before that happens, Bado said, MarkLogic will have to “move from the children’s table to the adult table,” a process that will require expanding its geographic presence and attracting more developers to drive bottom-up adoption of MarkLogic Server in even more businesses. He said he’s surprised MarkLogic is still relatively unknown despite its stellar reputation and customer base, and that it beats out Oracle in some situations.

To meet its growth goals, Bado said the company will step up its aggressiveness in touting MarkLogic Server for “big data” applications, a term that has caught fire lately, and that pretty accurately describes what MarkLogic does. Avnur says it will do more than just talk about big data, though, and actually is working to further improve its story by creating a Hadoop connector that lets customers run MapReduce jobs on unstructured data stored in MarkLogic Server.

In coming on board, Bado says he wants to strike while the iron is hot in terms of all the hype around big data and MarkLogic’s natural fit into that discussion. Some things, like a hiring boom and new features, will happen quickly, he said, while others, like the IPO, will happen in due time.

Frankly, though, a lot more could end up happening in a hurry for MarkLogic, especially if its big data message catches on. There are still a few potential large-vendor suitors that might be looking to round out their strategies with an unstructured database, and MarkLogic seems to provide a safer and more robust option than buying any traditional NoSQL vendor. If anyone does approach, MarkLogic’s fate might rely on how well Bado is able to deliver on his goals and how strongly its investors believe in the company’s ability to compete against large database vendors like Oracle, IBM and HP in the long run.