Big data researchers can find spurious statistical relationships, cherry-pick whatever statistics confirm their beliefs, or report only the results that look good. Do you think big data may mean more information but also more false information?

Well, you might think you can hide by entering a false birth date, transposing dates, changing parts of your name, giving a false address, and so on. But for companies that really want to know who you are, there are products like Entity Analytics from IBM that use some very interesting algorithms to piece disparate bits of information together, work out who you really are, and "fix" the "mistakes" people make when they type in information about themselves.

A valid point, Chad, but big databases turn up false associations because so many associations get tested. At the conventional 5% significance level, roughly 5% of the tests on variables that are in fact unrelated will still come out "significant" (null hypothesis rejected) purely by chance. It's doing multiple tests that's the problem, even running the same test over and over on the same data as it gradually accumulates. Martin Bland's book on statistics explains this much better than I have.
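
To make the multiple-testing point concrete, here is a rough simulation sketch (my own illustration, not taken from Bland's book). It assumes a simple two-sample z-test on pure noise with known variance 1, and the function names and default sizes are made up for the example. Every "association" it finds is false by construction, so the fraction flagged should land near the 5% cutoff.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_tests=2000, n_per_group=50, alpha=0.05, seed=1):
    """Run many tests on pure noise and return the share flagged 'significant'.

    Both groups come from the same N(0, 1) distribution, so the null
    hypothesis is true in every test and every rejection is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        mean_diff = sum(a) / n_per_group - sum(b) / n_per_group
        # Standard error of the difference of two means, variance known to be 1.
        z = mean_diff / math.sqrt(2.0 / n_per_group)
        if two_sided_p(z) < alpha:
            hits += 1
    return hits / n_tests


def chance_of_any_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    print("share of noise-only tests flagged:", false_positive_rate())
    print("chance of at least one false hit in 20 tests:",
          round(chance_of_any_false_positive(20), 2))
```

The second function is just the arithmetic 1 - 0.95^k: with 20 independent tests on unrelated variables, the chance of at least one false "finding" is already about 0.64, which is why trawling a big database for associations is risky without some correction for the number of tests.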

The other problem with big data is heterogeneity of the population in the database, which makes it harder to spot something happening in a subgroup in that population.
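
A quick sketch of the heterogeneity point, under assumptions I have picked purely for illustration: only 5% of the population belongs to a subgroup that responds to some exposure (by one standard deviation), and everyone else does not respond at all. The function name and all the numbers are hypothetical.

```python
import random
import statistics


def simulate_dilution(n=20000, subgroup_share=0.05, subgroup_effect=1.0, seed=0):
    """Compare the exposed-vs-unexposed difference in the pooled population
    with the same difference inside the responsive subgroup only."""
    rng = random.Random(seed)
    pooled_exposed, pooled_unexposed = [], []
    sub_exposed, sub_unexposed = [], []
    for _ in range(n):
        in_subgroup = rng.random() < subgroup_share
        exposed = rng.random() < 0.5
        outcome = rng.gauss(0.0, 1.0)
        # Only exposed members of the subgroup get the effect.
        if in_subgroup and exposed:
            outcome += subgroup_effect
        (pooled_exposed if exposed else pooled_unexposed).append(outcome)
        if in_subgroup:
            (sub_exposed if exposed else sub_unexposed).append(outcome)
    pooled_diff = statistics.mean(pooled_exposed) - statistics.mean(pooled_unexposed)
    sub_diff = statistics.mean(sub_exposed) - statistics.mean(sub_unexposed)
    return pooled_diff, sub_diff


if __name__ == "__main__":
    pooled, sub = simulate_dilution()
    print("difference in whole population:", round(pooled, 3))
    print("difference within subgroup:   ", round(sub, 3))
```

With these assumed numbers the pooled difference is expected to be only about 0.05 standard deviations (the subgroup share times the subgroup effect), so a real effect of a full standard deviation in the subgroup all but disappears once it is averaged over the whole database.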

If the statistics are based on the data, then they are not false. If the
data is accurate and the analysis is performed correctly (whether you
agree with the results or not and whether you believe the results
conform to your world view or not), then the results are not false.

Now, having said that, most people in the world do not understand
statistics; therefore, most people in the world can be totally misled by
someone who selectively presents correlations as "cause and effect"
relationships. Is that a fault of the science of statistics, the person
who presents the results, or the people who make assumptions about the
results? Well, that's kind of hard to say but, most commonly, people
will decry "statistics" when they finally figure out that they have been
misled. Similarly, when the analysis is presented but it contradicts
their personal world view and beliefs or it calls into question their
lifestyle and/or has negative implications with regard to their
earnings, people will declare that the information is "false",
"misleading", or "unbelievable" (as though that last one is some sort of
scientific "truth" ;-) and they will refuse to believe the results.

So, whether the statistics confirm or refute your beliefs or the beliefs
of others, declaring the statistics to be "false" is the refuge of the
person who is not willing to think through the problem. If the analysis
is flawed, then find the flaw and expose it. If the calculations are
correct but presented incorrectly (e.g. the correlation exists but is
not actually a causal relationship) then expose the misleading
presentation. In other words, argue the facts not the emotions.

If you want to see the difference between arguing the facts and arguing
the emotions, read the chapter in the book "Freakonomics" about the
statistical correlation between Roe v. Wade and the decrease in crime in
New York City in the 1990s.