notes on numbers and other randomness

Category: links

He told me to get a big wall calendar that has a whole year on one page and hang it on a prominent wall. The next step was to get a big red magic marker.

He said for each day that I do my task of writing, I get to put a big red X over that day. “After a few days you’ll have a chain. Just keep at it and the chain will grow longer every day. You’ll like seeing that chain, especially when you get a few weeks under your belt. Your only job next is to not break the chain.”

On January 2nd of this year I started publishing a daily data science blog post for my team at IQNavigator with analytic results of one sort or another: charts, statistical analyses, machine learning output. My goal is to write such a post every working day for 2015, following Seinfeld’s advice of seeking consistent daily action. I’ve missed one working day so far (last Friday) but otherwise it’s been a great way to ensure I stay engaged with hands-on data science work and consistently discover interesting insights in our data set.

As value shifts from software to the ability to leverage data, companies will have to rethink their businesses, just as Netflix and Google did. In the next decade, data-driven, personalized experiences will continue to accelerate, and development efforts will shift towards using contextual data collected through passive user behaviors.

We in the West hate to acknowledge – and most refuse to believe – that our leaders have been flagrantly wasteful of Muslim lives for a century now, in countless wars and military encounters instigated by overwhelming Western power. What is the message to Muslims of the US-led invasion of Iraq in 2003? More than 100,000 Iraqi civilians – a very conservative estimate – died in a war that was based on utterly false pretenses. The US has never apologized, much less even recognized the civilian slaughter.

“The Google search algorithm” names something with an initial coherence that quickly scurries away once you really look for it. Googling isn’t a matter of invoking a programmatic subroutine—not on its own, anyway. Google is a monstrosity. It’s a confluence of physical, virtual, computational, and non-computational stuffs—electricity, data centers, servers, air conditioners, security guards, financial markets—just like the rubber ducky is a confluence of vinyl plastic, injection molding, the hands and labor of Chinese workers, the diesel fuel of ships and trains and trucks, the steel of shipping containers.

The bottom line is that science is not merely a bag of clever tricks that turn out to be useful in investigating some arcane questions about the inanimate and biological worlds. Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview, centered on the modest insistence that empirical claims must be substantiated by empirical evidence.

I have said many times that teamwork is over-rated. It can be a smoke screen for office bullies to coerce fellow workers. The economic stick often hangs over the team: be a team player or lose your job, is the implication in many workplaces. One of my main concerns with teams is that people are placed on them by those holding hierarchical power and are then told to work together (or else). However, there are usually power plays internal to the team so that being a team player really means doing what the leader says. For example, I know many people who work in call centres and I have heard how their teams are often quite dysfunctional. Teamwork too often just means toeing the party line.

A more accurate title for this role might be CDMO – Chief Data Monetization Officer – as their role needs to be focused on deriving value from, or monetizing, the organization’s data assets. This also needs to include determining how much to invest to acquire additional data sources that would complement the organization’s existing data sources and enhance their analytic results.

I know many others that are like me in this regard and for you I have these recommendations: 1- Avoid unnecessary meetings, especially if you are already in full-productivity mode. Don’t be afraid to use this as an excuse to cancel. If you are in a soft $ institution, remember who pays your salary. 2- Try to bunch all the necessary meetings together into one day. 3- Set aside at least one day a week to stay home and work for 10 hours straight. Jason Fried also recommends that every work place declare a day in which no one talks. No meetings, no chit-chat, no friendly banter, etc. No talk Thursdays anyone?

We have identified that when these four skills are brought together as one, they produce an optimal collaborative environment that breeds the most successful teams and a workplace culture that continuously propels innovation and initiative:

Seeing opportunities with broadened observation

Sowing opportunities with extensive innovation

Growing the seeds of opportunity of greatest potential

Sharing the opportunities you create and sustain with others

In fact, a study by my organization revealed that the workplace is not innovative enough because employees are mostly proficient “sowers” (with the propensity of doing what they are told very well).

The winds of change originate in the unconscious minds of domain experts. If you’re sufficiently expert in a field, any weird idea or apparently irrelevant question that occurs to you is ipso facto worth exploring. [3] Within Y Combinator, when an idea is described as crazy, it’s a compliment—in fact, on average probably a higher compliment than when an idea is described as good.

Today I believe that a major transition towards what some futurists call a “knowledge-based society” is underway. In that context what I call wirearchy represents an evolution of traditional hierarchy. I don’t think most humans can tolerate a lack of some hierarchical structure, primarily for the purposes of decision-making. The working definition I developed (and which has been ‘tested’ by a range of colleagues and friends interested in the issue(s)) recognizes that the necessary adaptations to new conditions will likely involve temporary, transient but more intelligent hierarchy. The implication is that people in a wirearchy should be focused on seeking to better understand and use the growing presence of feedback loops and double-loop learning.

In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different “cause-effect pairs” selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
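
For readers who want a feel for what an additive-noise method does, here is a minimal sketch in plain Python. It is not the paper's implementation. The idea, as I understand it from Hoyer et al. (2009), is to regress the candidate effect on the candidate cause, measure how dependent the residuals are on the cause, do the same in the reverse direction, and prefer the direction with the less dependent residuals. The kernel regression, the HSIC dependence score, the bandwidth choices and all function names below are my own illustrative assumptions, and on small samples the answer can easily be wrong.

```python
import math
import random


def _spread(values):
    """Population standard deviation, used as a kernel bandwidth (falls back to 1.0)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) or 1.0


def residuals(cause, effect, bandwidth_scale=0.3):
    """Residuals of a Nadaraya-Watson kernel regression of effect on cause.

    Assumption: the regressor and bandwidth_scale are illustrative choices,
    not the ones used in the paper.
    """
    h = bandwidth_scale * _spread(cause)
    out = []
    for xi, yi in zip(cause, effect):
        weights = [math.exp(-((xi - xj) ** 2) / (2 * h * h)) for xj in cause]
        fitted = sum(w * yj for w, yj in zip(weights, effect)) / sum(weights)
        out.append(yi - fitted)
    return out


def _gram(values):
    """Gaussian-kernel Gram matrix over a list of scalars."""
    h = _spread(values)
    return [[math.exp(-((a - b) ** 2) / (2 * h * h)) for b in values]
            for a in values]


def hsic(u, v):
    """Biased HSIC estimate: a dependence score, close to 0 for independent samples."""
    n = len(u)
    k, l = _gram(u), _gram(v)
    row = [sum(r) / n for r in k]   # row means (equal to column means, k is symmetric)
    total = sum(row) / n            # grand mean of k
    score = 0.0
    for i in range(n):
        for j in range(n):
            centered = k[i][j] - row[i] - row[j] + total
            score += centered * l[i][j]
    return score / (n * n)


def anm_direction(x, y):
    """Return (guess, score_xy, score_yx); the lower score marks the preferred direction."""
    score_xy = hsic(x, residuals(x, y))   # residual dependence if x causes y
    score_yx = hsic(y, residuals(y, x))   # residual dependence if y causes x
    guess = "x->y" if score_xy < score_yx else "y->x"
    return guess, score_xy, score_yx


if __name__ == "__main__":
    # Toy data (hypothetical): y is a nonlinear function of x plus independent noise.
    random.seed(0)
    x = [random.uniform(-2, 2) for _ in range(150)]
    y = [xi ** 3 + random.uniform(-1, 1) for xi in x]
    print(anm_direction(x, y))
```

The sketch compares two scores instead of running a proper independence test with p-values, which keeps it short but also makes it cruder than what a real evaluation would use.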

In an interview with Kevin Smith, writer and television producer Paul Dini complained about a worrying trend he sees in television animation and superhero shows in particular: executives spurning female viewers because they believe girls and women don’t buy the shows’ toys.

My personal feeling is that this will really take off if you can start linking performance information to the more objective factual data within the various systems. How does the performance of interim staff vary and is that linked to which agency they come through, their employment history, the length of their assignment or other factors? We’ve probably all had experience of working with interim staff who were brilliant; and with others who weren’t worth a fraction of their day rate. So you can imagine some really powerful analysis that might give a strong steer into how you best choose, structure and manage your contingent workforce – and maybe even take that into the permanent staff world!

Totally agree! Now we just need to get hold of comprehensive and reliable performance data…

This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)

People are searching for products on Amazon, rather than using Google. The only reason search makes money for Google is that people use it to search for products they would like to buy on the internet, and Google shows ads for those products. Increasingly, however, people are going straight to Amazon to search for products. Desktop search queries on Amazon increased 47% between September 2013 and September 2014, according to ComScore.

Jeff: I think it takes more time to analyze something like that. Again, one of my jobs is to encourage people to be bold. It’s incredibly hard. Experiments are, by their very nature, prone to failure. A few big successes compensate for dozens and dozens of things that didn’t work. Bold bets — Amazon Web Services, Kindle, Amazon Prime, our third-party seller business — all of those things are examples of bold bets that did work, and they pay for a lot of experiments.

…

What really matters is, companies that don’t continue to experiment, companies that don’t embrace failure, they eventually get in a desperate position where the only thing they can do is a Hail Mary bet at the very end of their corporate existence. Whereas companies that are making bets all along, even big bets, but not bet-the-company bets, prevail. I don’t believe in bet-the-company bets. That’s when you’re desperate. That’s the last thing you can do.

“The dirty secret is that a significant majority of big-data projects aren’t producing any valuable, actionable results,” said Michael Walker, a partner at Rose Business Technologies, which helps enterprises build big-data systems. According to a recent report from the research firm Gartner Inc., “through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned.”

3) The Convergence of VMS and FMS

The continued adoption of FMS software in 2015 will produce ramifications for other segments of the labor ecosystem, particularly project-based contingent labor. Vendor Management Systems (VMS), which are used primarily to manage temporary staff and contract labor, do not address the specific needs of freelance management.

Yet data science, as a business, is still young. As the technology moves beyond the Internet incubators like Google and Facebook, it has to be applied company by company, in one industry after another.

At this stage, there is a lot of hand craftsmanship rather than software automation.

So the aspiring software companies find themselves training, advising and building pilot projects for their commercial customers. They are acting far more as services companies than they hope to be eventually.

While that may sound like a condition to be remedied, in fact we are living in an era where uncertainty and ambiguity are increasing. The reality is that we can’t shoo it away by becoming more rigid, creating more rules, or imposing more authoritarian controls. We need to loosen control, make more whitespace, give people more autonomy, and rely on the network of loose connections to influence everyone’s actions. We need a climate of soft power in a social network based on sparsity, not density, where weak and lateral connections dominate. That is the wellspring of organizational flexibility and adaptability.

In his 2003 book, Open Innovation, Henry Chesbrough defined this important concept. In short, open innovation is a product or technology development model that extends beyond the boundaries of a firm to involve others in a collaborative way. Today, much of this activity uses various social networking tools and technologies to empower people to generate ideas, fine-tune concepts, share knowledge or solve critical problems.

When you look at the evolution of digital measurement in the enterprise and study organizations that have achieved a significant degree of maturity, you’ll notice that they come in two distinct flavors: the analytic and the informational. Analytic organizations have strong teams studying the data and driving testing, personalization and customer lifecycle strategies. Informational organizations have widespread, engaged usage of data across the organization with key stakeholders absorbing and using data intelligently to make decisions. It’s not impossible for an enterprise to be both analytic and informational, but the two aren’t necessarily related either. You might expect that organizations that have gotten to be good in measurement would be mature in both areas, but that’s not really the common case. Instead, it seems that most enterprises have either a culture or a problem set that drives them to excel in one direction or the other.

“Garbage in, garbage out” is the cliché of data-haters everywhere. “It is not true that companies need good data to use predictive analytics,” Taylor said. “The techniques can be robust in the face of terrible data, because they were invented by people who had terrible data,” he noted.

Revolution R Open (RRO) is the enhanced distribution of R from Revolution Analytics. RRO is based on version 3.1.1 of the statistical software R and includes additional capabilities for improved performance, reproducibility and platform support.

But the world operates differently today. Companies own less infrastructure, inventory and manufacturing equipment than ever. They’ve outsourced everything from customer service to supply chain. And a growing portion of their workforce is not on their full-time payroll.

As a result, predictive API providers will face increasing pressure to specialize in one or a few verticals. At this point, elegant and general APIs become not only irrelevant, but a potential liability, as industry- and domain-specific feature engineering increases in importance and it becomes crucial to present results in the right parlance. Sadly, these activities are not thin adapters that can be slapped on at the end, but instead are ravenous time beasts that largely determine the perceived value of a predictive API. No single customer cares about the generality and wide applicability of a platform; each is looking for the best solution to the problem as he conceives it.