notes on numbers and other randomness

Tag: big data

Depression: A genetic Faustian bargain with infection? [Emily Deans/Evolutionary Psychiatry]. Discusses the Pathogen Host Defense (PATHOS-D) theory of depression described by Raison and Miller [pdf]. Genes that make people susceptible to depression may also protect them from infection. Depression is associated with brain inflammation; inflammation is also part of the immune response that combats infectious disease. “Since infections in the developing world tend to preferentially kill young children, there is strong selection pressure for genes that will save you when you are young, even if those genes have a cost later in life.”

The people of the petabyte [Venkatesh Rao/Forbes blogs]. An “informal taxonomy and anthropological survey of data-land” based on Rao’s observations at the Strata conference. Apparently everyone’s a data scientist now:

The taxonomy part is simple. Apparently the list of species in data land is very short. It has only one item:

Data scientist

What is the value of big data research vs. good samples [LinkedIn Advanced Business Analytics, Data Mining and Predictive Modeling group]. Interesting and lengthy discussion of whether and when to analyze a good sample rather than a full big data set.
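To make the sampling side of that argument concrete, here is a minimal sketch (mine, not from the discussion) of estimating a mean from a simple random sample and attaching a standard error to it. The data set is simulated and the names `population` and `sample_mean_with_error` are made up for illustration.

```python
import random
import statistics


def sample_mean_with_error(population, n, seed=0):
    """Return (mean, standard error) from a simple random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean, std_err


# Hypothetical "big" data set: simulated order values, not real data.
rng = random.Random(42)
population = [rng.gauss(100.0, 20.0) for _ in range(200_000)]

full_mean = statistics.fmean(population)
est_mean, est_se = sample_mean_with_error(population, n=10_000)
print(f"full mean {full_mean:.2f}, sample mean {est_mean:.2f} +/- {est_se:.2f}")
```

With a well-drawn sample the estimate usually lands within a couple of standard errors of the full-data answer. That only holds when the sample really is random, which is often the hard part in practice.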

This year and beyond, we will see enterprises place greater emphasis on real-world experiments as a fundamental best practice to be cultivated and enforced within their data science centers of excellence. In a next best action program, real-world experiments involve iterative changes to the analytics, rules, orchestrations, and other process and decision logic embedded in operational applications. You should monitor the performance of these iterations to gauge which collections of business logic deliver the intended outcomes, such as improved customer retention or reduced fulfillment time on high-priority orders.
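One simple way to gauge whether an iteration improved an outcome such as customer retention is a two-proportion z-test. The sketch below is my own illustration, not something the quoted piece prescribes. The function name `compare_retention` and the counts are hypothetical, and it assumes customers were randomly assigned to the two versions.

```python
from math import sqrt
from statistics import NormalDist


def compare_retention(retained_a, total_a, retained_b, total_b):
    """Two-sided two-proportion z-test; returns (lift, z, p_value)."""
    p_a = retained_a / total_a
    p_b = retained_b / total_b
    # Pooled retention rate under the null hypothesis of no difference.
    pooled = (retained_a + retained_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: current logic (A) vs. revised logic (B).
lift, z, p_value = compare_retention(800, 1000, 840, 1000)
print(f"lift {lift:.3f}, z {z:.2f}, p {p_value:.3f}")
```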

Kathy Sierra on gamification in education [Larry Ferlazzo/Larry Ferlazzo’s Websites of the Day… for Teaching ELL, ESL, & EFL]. Kathy offers guidelines on when gamification may be safe vs. dangerous. What falls in the dangerous category? Learning and engagement that are intrinsically rewarding, since psychology studies suggest that rewarding such activity destroys a person’s (or a monkey’s) interest in doing the activity for its own sake:

The studies are both counter-intuitive and disturbing. The monkeys that enjoyed playing with wooden puzzles until given their favorite treat reward for solving the puzzles, at which time their puzzle-solving diminished. The kids given ribbons for their drawings then showed less interest in drawing. The writers shown a list of possible external reasons for writing immediately wrote less complex and interesting poems than those shown a list of intrinsically-rewarding reasons for writing. And on and on and on and on. Animals, humans, children, adults, across wide-ranging domains and in studies conducted by dozens of independent researchers.

If 99.9% of big data is irrelevant, why do we need it [Michael Wu/Lithium Lithosphere blogs]. Lithium’s Principal Scientist of Analytics Wu says “Just because you can track, store, and analyze big data, doesn’t mean you should.” He argues that in many cases you can answer the questions you need to answer just by getting the relevant data, which may well be small enough to load and analyze on a single beefy computer.

Lazily musing about sharing [JP Rangaswami/Confused of Calcutta]. “Sharing is serious business”: it has serious consequences for businesses, especially for those built upon not-sharing. Five ideas about sharing:

1. For anything to be social, it must be shared.

2. Sharing, the act of making social, happens because people are made social.

Getting managers to give up their pet theories, their ideological convictions, their vested interests, their intuition, their past experience and use data and analytics to make decisions. That is the central issue that you have to and should deal with.

Apache Hadoop is unquestionably the center of the latest iteration of big data solutions. At its heart, Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries.
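The "distributing computation" idea is easier to see with a toy version of the map, shuffle, and reduce steps. This is a single-process sketch of the programming model only, in plain Python. It does not use any Hadoop or Hive API, and the function names are my own.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster, each line (or block) could be mapped on a different server.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big hype", "data science"])
print(counts)
```

Because each map call looks at only one line and each reduce call at only one key, the work can be spread across many commodity machines without the pieces needing to talk to each other.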

I’m starting my first ever project with Hadoop this week: a prototype of an analytics warehouse using Amazon Elastic MapReduce. Colleagues have told me EMR is a great way to get your head around Hadoop-based data processing.

The initial study was small and involved highly screened people with a lot of support. And it seems to have suffered from publication bias–the most spectacular results got the most attention, even though these might just have been outliers.

This is distressingly common–not just in government or social-do-gooding research, but in organizations of all kinds–including corporations.

Programs at scale often don’t show results as good as pilot studies of those programs. More generally in program evaluation, it’s hard to find evidence of strong (or even weak) effects of interventions. Social systems are complex; factors other than those targeted by the intervention often determine outcomes. This is something I need to communicate regularly to my colleagues and our partners: student learning is largely determined by factors outside our control. That’s not to say we shouldn’t improve our course design, teaching practices, and so forth, but it is to say that there aren’t many easy pickings out there for improving student outcomes.

I know full well that a lot of not-for-profit organizations are run in a dreadful fashion; I’m just not convinced that introducing a profit motive is always or even often the best way to fix that problem…. I very much doubt that for-profit education is ever a good idea. I just don’t see how the incentives there could possibly be aligned.

But the profit motive can’t provide optimal outcomes if there isn’t consumer discipline along with it. For-profit higher education is subsidized by the government in the form of grants and low-interest loans (and note that nonprofit education is subsidized in additional ways as well, in the case of public institutions). Would-be students do not have an incentive to seriously evaluate whether the education they are purchasing is worth what they pay, because there is a third-party payer involved. The situation is much like health care. The post has a good discussion of the issues and controversy around for-profit higher education.