Posted by Soulskill on Friday August 22, 2014 @12:23PM
from the i'm-sure-the-marketroids-have-something-in-mind dept.

gthuang88 writes: As the marketing hype around "big data" subsides, a recent wave of startups is solving a new class of data-related problems and showing where the field is headed. Niche analytics companies like RStudio, Vast, and FarmLink are trying to provide insights for specific industries such as finance, real estate, and agriculture. Data-wrangling software from startups like Tamr and Trifacta is targeting enterprises looking to find and prep corporate data. And heavily funded startups such as Actifio and DataGravity are trying to make data-storage systems smarter. Together, these efforts highlight where emerging data technologies might actually be used in the business world.

Gartner is the king/pusher of course. But I think they were actually insightful about this 5 or so years ago. They predicted about 3 years of all hype, no product "cloud", another 3 years of practical, useful cloud infrastructure with nothing really taking advantage of it, and only after that would we see startups (and VC investment opportunities) making use of the cloud to make actual products. I think we're almost there.

Even for hobby programming, the cloud is becoming quite appealing. For example, take a look at this remarkable Mandelbrot zoom [youtube.com] to 10^275. This required 6 core-years to render (6 months wall clock). If you have the patience, the machines sitting idle (perhaps discarded bitcoin rigs) and no fear of power bills, then sure, turn on 3 old high-CPU towers for 6 months. But if you're good at massively parallel coding (and Mandelbrot rendering is great to learn that!) you can usually get AWS Spot [amazon.com] machines for under a penny per core-hour. That means you can get that 6 core-years of CPU for about the price of a midrange geek PC, and you can get thousands of cores in parallel, and be done rendering in a day.
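For anyone wondering why Mandelbrot rendering is such a good parallel exercise: every row (every pixel, really) is independent, so the work splits across cores with no coordination. Here's a rough sketch, not the renderer from the video. The function names, the viewport (-2..1 by -1.5..1.5) and the iteration cap are all my own assumptions, and a real 10^275 zoom would need arbitrary-precision arithmetic instead of Python's built-in complex type.

```python
from concurrent.futures import ProcessPoolExecutor


def escape_count(c: complex, max_iter: int = 100) -> int:
    """Iterate z -> z*z + c and return the step at which |z| exceeds 2.

    Returns max_iter if the point has not escaped by then (treated as "inside").
    """
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render_row(task):
    """Render one image row. Rows share no state, so each can run on its own core."""
    y, width, height, max_iter = task
    # Assumed viewport: real axis -2..1, imaginary axis -1.5..1.5.
    im = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
        for x in range(width)
    ]


def render(width: int = 60, height: int = 30, max_iter: int = 100, workers=None):
    """Farm the rows out to a pool of worker processes and collect them in order."""
    tasks = [(y, width, height, max_iter) for y in range(height)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, tasks))
```

Call render() from under an `if __name__ == "__main__":` guard so the worker processes start cleanly. On a cloud setup you'd swap the local process pool for a job queue feeding many machines, but the row-per-task split stays the same.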

For a hobby project it might be hard to justify spending $hundreds this way, but for a start-up it makes perfect sense. So there's something to the "cloud" IMO if you're trying to do supercomputer parallelism on a shoestring budget, something that's really only become possible in the past couple of years. I'm not sure how cheap 10000 core-hours for $100 is, really, but 10000 cores in parallel for an hour for $100 is something wonderful.
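If you want to check the numbers, here's the back-of-envelope version. The only inputs are the 6 core-years from the video and $0.01 per core-hour, which is my assumed upper bound for "under a penny"; real spot prices move around, so treat the output as a ceiling, not a quote. The function names are just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def core_hours(core_years: float) -> float:
    """Convert core-years of CPU time into core-hours."""
    return core_years * HOURS_PER_YEAR


def cost_usd(hours: float, price_per_core_hour: float) -> float:
    """Total cost for a given number of core-hours at a flat price."""
    return hours * price_per_core_hour


def cores_needed(hours: float, wall_clock_hours: float) -> float:
    """Cores required to finish the work in the given wall-clock time,
    assuming perfect parallel scaling (optimistic, but close for Mandelbrot)."""
    return hours / wall_clock_hours


ASSUMED_PRICE = 0.01  # USD per core-hour; assumption, "under a penny"

total = core_hours(6)                         # the 6 core-year render
print(f"{total:.0f} core-hours")              # 52560 core-hours
print(f"${cost_usd(total, ASSUMED_PRICE):.2f} at the assumed price")
print(f"{cores_needed(total, 24):.0f} cores to finish in one day")
print(f"${cost_usd(10_000, ASSUMED_PRICE):.2f} for 10000 core-hours")
```

So the whole render comes in at a bit over $500 at most, and finishing in a day takes a couple of thousand cores, which matches the "thousands of cores" figure.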

After big data they will hire people to think and actually produce useful/actionable insights.

Haha, you wish!

Marketing will just start a new buzzword trend. Investors will all dump shit tons of money into projects believing it's the new ".com" and try to cash in. Management will think it's the "next big thing." Sheep will perform their normal function of following the herd. Techies will all scratch their heads wondering why people continue to fall into the game of hype, and continue to believe that one day people will learn.

Techies don't pull the purse strings, and until that changes the market wi

Common sense will never come into style, and "they" will never hire people to think and actually produce useful/actionable insights. You see, it's a bit of a catch-22. No one will make good decisions until someone sensible is in charge, but we'll never put sensible people in charge until we've started making good decisions. It's ignorant sociopaths all the way down.

I thought the idea of big data was that looking at ALL the data obviated the need to sample the data and all the attendant issues that come with that. Ferreting out bits and pieces of the big data set is a step backward from the idea of big data. Real numbers people can jump in here and set me straight.

However, my biggest hope for what is after big data is no one ever again having the title "Data Scientist."

The problem with 'Big Data' is everyone is trying to use it as a substitute for actual hypothesizing and experimentation.

I am not suggesting it isn't useful, it is, and it can be a huge help in identifying non-intuitive relationships that may exist. It's not being marketed that way though! Everyone is trying to sell it as the solution to all their unresolved problems and knowledge gaps.

At the end of the day all it can ever show is correlation, never causation. All the fancy AIs we add on top are really just correlation engines as well. One day real-soon-now WATSON or something like it will diagnose your cancer. It won't 'discover' the cure though, it will just apply the 'KNOWN' treatment that statistically correlates with the best outcome, hopefully excluding some which correlate with especially unpleasant side effects.

Same is true with the financial markets. Big Data alone will never discover a unified theory that explains market behavior. It will probably make a handful of people stupid amounts of money based again on event correlation and speed. As long as those are the drivers though we will remain forever at risk of sudden meltdowns.

Agreed 1000 times over. The same problem Big Data has faces statistics in general... a lot of people have been convinced that a tool created to aid human/scientific insight is now the insight itself. And these big-data/statistical insights are rarely useful, except in squeezing out extra business pennies at the expense of human agency.

The way the mind gets from input to knowledge is first through "analysis of data", then sorting the data through a search for "analogy", and in the end "synthesis" of the sorted data based on a new hypothesis to verify correctness of understanding. I'd call those Map / Reduce / Produce. Many people are forgetting the last part, the verification. That turns the whole process into experimentation and results in a new hypothesis or true understanding.

Big Data is really good at the Analysis and Analogy part of the proces

Judging from all the new aggregated travel sites that say they search "all travel sites to get you the best price", my guess is an aggregated big data warehouse that searches "all big data to get you the best target profile for your advertising. Canoe(tm). Search one and done, the best profile for the right price. Guaranteed."

In hindsight, his remark was a clear sign that the marketing hype around "big data" had peaked.

This is true, and it provides the context missing from TFS: "Big Data" is over as a marketing term. But as a technological term and as far as actual implementation, it is the status quo and forevermore will be.

From a technological perspective, "Big Data" has a simple definition: more data than can be stored on a single machine. And this need will only grow as hard drives and maybe even SSDs p [datascienceassn.org]

Ads on the insides of your eyelids, random "disappearances", a stock market crash or two, a bunch of middle-aged White guys who achieved one thing and are now kicking back and raking it in as pundits, WW3, roving bands of thugs allied with motorcycle gangs terrorizing the nation, or any number of other random things that might be benign or catastrophic.

Coming up with my own startup is just too hard to do. Instead, can one of you think of a cool technology I can half-assedly write in the hopes of getting a huge payout from some monolithic corp that decides they too want to jump into the hot market of the moment?

Full disclosure/shameless plug: I work internal IT for ExtraHop Networks.

Analytics platforms like ExtraHop do the analysis on streams of data in real-time so that what gets sent to the Big Data store (such as Elasticsearch) is structured or clean data that's more immediately useful.