Can't Hire Big Data Staff? Try Enterprise Crowdsourcing

For organizations launching a big data platform, paying remote workers by task is one way to reduce expenses.


A big data management system, whether on-premises or in the cloud, often requires significant startup costs, and not just in hardware and software. Companies also need people to perform data-related tasks, but may be unable (or unwilling) to hire full-time employees to do the work.

That's where enterprise crowdsourcing comes in. What is it? One recent high-profile example of big data crowdsourcing is The Human Face of Big Data project, whose organizers used a mobile app to collect personal information from volunteer participants around the world.

In a big data crowdsourcing model, an organization might distribute tasks -- not jobs -- to workers anywhere in the world who have online access. The goal is to complete those tasks quickly and with high quality, at a much lower cost.

Crowdsourcing specialists like Lionbridge, a 17-year-old company based in Waltham, Mass., provide translation, testing and other professional services to enterprise clients. With offices in 22 countries, Lionbridge draws on some 150,000 "crowd workers" worldwide.

"We have facilities in India, China and Poland that do software product development for companies," said Martha Crow, Lionbridge senior VP for global testing, development and crowdsourcing, in a phone interview with InformationWeek. "And what became the next evolution was this concept of enterprise crowdsourcing."

For big data operations, crowd workers could perform a variety of tasks, including data entry, cleansing and validation, Crow said.

"When you think of things like data tagging, normalization and sentiment analysis, (companies) do it either with employees or with very large pools of contractors," she added.

Enterprise crowdsourcing and big data may be a good match, particularly as data-related work can often be broken down into tasks or projects.

"Data lends itself very well to task-based work," Crow said. For example, a worker completes a specific task and is paid by an outsourcing firm such as Lionbridge, which in turn is paid by its enterprise client.

"When you think about economics here, there's something very compelling about the model of paying for what you use," said Crow.

Enterprise crowdsourcing may evoke images of technically adept workers in developing nations toiling for low pay. However, Crow said these workers' locations vary widely. "They could be anywhere," she said. "Some clients have a requirement -- for regulatory or security reasons, for instance -- and want us to build crowds for them here in the U.S., or in a particular state."

Similarly, other enterprises may have language requirements, or may request workers in specific countries.

According to the Intel Science and Technology Center (ISTC) for Big Data, crowdsourcing is becoming a popular way to accomplish tasks that computers aren't particularly good at, such as audio transcription, image annotation and document editing. But human-based solutions have their shortcomings as well.

"Although humans are often more accurate than machines at these tasks, using humans for annotating large datasets soon becomes infeasible -- say, when dealing with Web-scale data," wrote Barzan Mozafari, a postdoctoral associate at MIT, in a recent post on the ISTC for Big Data blog.

Researchers at MIT are developing ways to solve this problem by integrating machine learning into crowdsourcing workflows -- a solution that could potentially reduce the need for enterprise crowdsourcing. But that's strictly a research project at this time.
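The general idea of mixing machines and humans can be illustrated with a common pattern: let a model label everything, accept only the labels it is confident about, and route the rest to crowd workers. The sketch below is not the MIT researchers' method; the function names `route_items` and `toy_sentiment`, the 0.9 confidence threshold and the keyword-based "model" are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Routed:
    auto_labeled: List[Tuple[str, str]]  # (item, machine label) accepted without human review
    crowd_queue: List[str]               # items that still need a human crowd worker


def route_items(
    items: List[str],
    classify: Callable[[str], Tuple[str, float]],
    threshold: float = 0.9,  # hypothetical cutoff; tune per task
) -> Routed:
    """Accept confident machine labels and send everything else to the crowd.

    `classify` is a hypothetical model returning (label, confidence in [0, 1]).
    """
    auto: List[Tuple[str, str]] = []
    queue: List[str] = []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            auto.append((item, label))
        else:
            queue.append(item)
    return Routed(auto, queue)


def toy_sentiment(text: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a trained sentiment model: keyword matching
    # with fixed confidence scores.
    lowered = text.lower()
    if "great" in lowered:
        return "positive", 0.95
    if "terrible" in lowered:
        return "negative", 0.95
    return "neutral", 0.5


if __name__ == "__main__":
    reviews = ["Great product", "Terrible support", "It arrived on Tuesday"]
    result = route_items(reviews, toy_sentiment)
    print(result.auto_labeled)  # machine-labeled items
    print(result.crowd_queue)   # items sent to crowd workers
```

Raising the threshold sends more items to people, which improves accuracy but raises per-task costs; lowering it does the reverse. That trade-off is why a hybrid approach could reduce, rather than eliminate, the need for crowd workers.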

For now, there's a strong need for crowd workers in the big data space, Crow said.

"When we think of big data, we think of enterprise crowdsourcing," she added. "We're looking at the next evolution ... a different way of business getting done."
