All posts by oddur.bjarnason@gmail.com

I recently came across Kumu, a software platform for developing network maps. The developers introduce it as follows:

Kumu is a powerful visualization platform for mapping systems and better understanding relationships. We help the world’s top influencers turn ideas into impact through a creative blend of systems thinking, stakeholder mapping, and social network analysis.

Kumu’s cloud-based platform enables users to:

Create a deeper understanding of key influencers and system dynamics in order to identify levers for enacting change

Discover paths of influence and make the most out of your team’s combined network

Analyze and improve how your organization collaborates to enhance innovation and execution

Tap into the power of networks in activating support for your cause or organization

Evaluate the critical role relationships play in order to change behaviors and create lasting impact

And they continue with:

Most mapping tools are overwhelmingly technical or frustratingly one dimensional. While they might work well for researchers and academics, they’re not very approachable and aren’t effective at engaging stakeholders.

Kumu helps you create and share maps and strategies that are comprehensive, yet comprehensible. Our rich set of tools will support you all the way from the initial idea to impact.

Use filter and focus to explore subsections of your maps without being distracted and overwhelmed by its entirety.

Use metrics to run powerful calculations that highlight key players and leverage points.

Hans Rosling is an extremely fine presenter of data. His visualizations using Gapminder are excellent and very effective – sometimes perhaps seductive.

In his TED talk “The best stats you have ever seen” (2006) he shows a visualization of the percentage of the world population as a function of income per person per day. He maintains that the income gap has been decreasing and is disappearing. This depends on his definition of gap. If he means the dip/relative minimum in the curve, he is right. But if gap means income inequality between the poor and the rich, then he is not right. In fact income inequality has been increasing in recent years.

Hans Rosling exhorts all of us to use the enormous amount of data that exists for the benefit of all. He says:

“We need really to see them. We need to get them into graphic formats, where you can instantly understand them. Now, statisticians don’t like it, because they say that this will not show the reality; we have to have statistical, analytical methods.”

When Rosling says “instantly understand” I take him to mean “intuitively understand”. He is on the verge of seducing us into accepting that the relationship/correlation between the variables he visualizes implies causation.

But then he seems to feel uncomfortable with this and says:

“Many people say data is bad. There is an uncertainty margin, but…. the differences (in the data I use) are much bigger than the weakness of the data.”

This is of course an application of statistical thinking and he finally escapes by the skin of his teeth from giving the impression that he thinks that correlation implies causation by saying:

“But this is hypothesis-generating.”

The visualizations that can be made with Gapminder are extremely fine and if you are not on your guard you can easily be seduced by them. The same applies to the equally fine visualizations made with Tableau.

In a blogpost called Random Thoughts: Points and Poisson (d’Avril) Peter Coles says:

“The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern….

[The above picture] was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.

I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically scientists often use perceived patterns in order to construct hypotheses. However these hypotheses must be tested objectively and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.”

When I began to use the new generation of data analysis and visualization software like Tableau I thought that I would first use them to address some of the most important problems of humanity like Resource Scarcity, Inequality, Poverty, Human Migration, Refugees, …

I have found large amounts of data relevant to these problems published on the Internet by various organizations and institutions, like the United Nations, The World Bank, The World Health Organization, … The data are usually in the form of data tables with countries, regions, and locations as rows, time periods as rows or columns, and variables as columns.

The data have been collected in surveys. The completeness of the data and their reliability are uncertain and variable.

The presentations of the data in the worksheets and dashboards of Tableau workbooks are very fine and I have no doubt that such presentations can increase the viewers' knowledge and understanding of the problems. But in order to solve a problem it is necessary to identify, eliminate or minimize its cause or causes.

The presentations can be seductive. Viewers may be tempted to identify causes by calculating correlations between the variables in the data and assuming that correlations imply causation.

Statisticians know that correlation does not imply causation. What does this mean? Correlation is a measure of how closely two things are related. You may think of it as a number describing the relative change in one thing when there is a change in the other, with 1 being a perfect positive relationship between two sets of numbers, -1 being a perfect negative relationship and 0 being no linear relationship. “Correlation does not imply causation” means that just because two things correlate, one does not necessarily cause the other. Although this is an important fact, most people do not sufficiently take it into account. Their preconceptions tempt them to leap from correlation to causation without sufficient evidence.
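To make the number concrete, here is a minimal Python sketch of the usual (Pearson) correlation coefficient. The function name `pearson_r` and the three small data series are made up for illustration. Note that this coefficient measures only linear association, and that nothing in the calculation says anything about which variable, if either, causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of deviations, scaled by the size of the deviations.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


# Hypothetical illustration data, not taken from any real survey.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # rises in step with hours
falling = [10, 8, 6, 4, 2]   # falls in step with hours

print(pearson_r(hours, rising))   # close to +1
print(pearson_r(hours, falling))  # close to -1
```

The calculation is symmetric: swapping the two arguments gives the same value, which is one way to see that the coefficient cannot tell cause from effect.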

This can result in absurd and ridiculous causal claims. Tyler Vigen has recently published the second edition of his book “Spurious Correlations” (May 8, 2015), in which he writes:

“Provided enough data, it is possible to find things that correlate even when they shouldn’t. The method is often called “data dredging.” Data dredging is a technique used to find something that correlates with one variable by comparing it to hundreds of other variables. Normally scientists first hypothesize about a connection between two variables before they analyze data to determine the extent to which that connection exists.

Instead of testing individual hypotheses, a computer program can data dredge by simply comparing every dataset to every other dataset. Technology and data collection in the twenty-first century makes this significantly easier….This is the world of big data and big correlations….

Despite the humor, this book has a serious side. Graphs can lie, and not all correlations are indicative of an underlying causal connection. Data dredging is part of why it is possible to find so many spurious relationships….Correlations are an important part of scientific analysis, but they can be misleading if used incorrectly.”
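Data dredging is easy to simulate. The following Python sketch is my own illustration, not Vigen's method: it draws one random "target" series and compares it with many other random series, all of them pure noise. The function names, the number of candidate series, the series length and the seed are arbitrary assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def dredge(n_candidates=1000, n_points=10, seed=0):
    """Largest |r| between one random series and many unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is independent noise, so any correlation is spurious.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson_r(target, candidate)))
    return best


print(dredge(n_candidates=1))     # a single comparison
print(dredge(n_candidates=1000))  # the best of a thousand comparisons
```

With short series and enough candidates, the best match looks impressively strong even though every series is unrelated noise by construction. This is why a correlation found by searching can only suggest a hypothesis.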

Why is it that people are so easily allured/seduced into assuming that correlation implies causation? Vigen states: “Humans are biologically inclined to recognize patterns”. This reminds me of a blogpost in “Science or not” by Graham Coghill called “Confusing correlation with causation: rooster syndrome”.

Coghill says: “This is the natural human tendency to assume that, if two events or phenomena consistently occur at about the same time, then one is the cause of the other. Hence “rooster syndrome”, from the rooster who believed that his crowing caused the sun to rise….

We have an evolved tendency to believe in false positives – when event B follows soon after event A, we assume A was the cause of B, even if this is untrue. In evolution, such beliefs are harmless, whereas the belief that A is not the cause of B when it actually is (false negative) can be fatal. Michael Shermer explains: “For example, believing that the rustle in the grass is a dangerous predator when it is only the wind does not cost much, but believing that a dangerous predator is the wind may cost an animal its life.”

Michael Shermer wrote an article in Scientific American with the title “Patternicity: Finding Meaningful Patterns in Meaningless Noise”.

He says: “Why do people see faces in nature, interpret window stains as human figures, hear voices in random sounds generated by electronic devices or find conspiracies in the daily news? A proximate cause is the priming effect, in which our brain and senses are prepared to interpret stimuli according to an expected model.

Is there a deeper ultimate cause for why people believe such weird things? There is. I call it “patternicity,” or the tendency to find meaningful patterns in meaningless noise. Traditionally, scientists have treated patternicity as an error in cognition. A type I error, or a false positive, is believing something is real when it is not (finding a nonexistent pattern). A type II error, or a false negative, is not believing something is real when it is (not recognizing a real pattern—call it “apatternicity”).

In my 2000 book How We Believe (Times Books), I argue that our brains are belief engines: evolved pattern-recognition machines that connect the dots and create meaning out of the patterns that we think we see in nature. Sometimes A really is connected to B; sometimes it is not. When it is, we have learned something valuable about the environment from which we can make predictions that aid in survival and reproduction.”

When data are collected in a non-random, uncontrolled survey, it is very hazardous to base decisions and actions on the assumption that correlation implies causation. It is impossible to know which correlations correspond to causation with a high probability and which are spurious. And it is impossible to estimate the risks associated with decisions and actions based on the assumption.

Correlations between variables calculated from data collected in a non-random, uncontrolled survey cannot be used for anything but stating hypotheses that can be tested in statistically sound research.

Causation is extremely important. It is the most fundamental relation or connection in the universe. Without it there would be no science or technology. Our thoughts would not be connected with our actions and they would not be connected with consequences. There would be no moral responsibility and no legal system. Causation is the basis of prediction and explanation. Any intervention we make in the world around us is premised on there being causal connections that are at least to some degree predictable. Without it we would not be able to predict or explain anything. We would not be able to make decisions and not be able to act on these decisions. There would be no natural laws. There would be total chaos. Such a world is illustrated by the following picture of random points in a two-dimensional space. There is no correlation, no causal claims can be made, no prediction or explanation is possible, no rational decisions can be made and no rational actions can be taken.

The picture was generated by a Poisson process using a Monte Carlo random number generator. I took it from Peter Coles's blog “In the Dark”.
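A picture of this kind can be reproduced with a few lines of code. The Python sketch below is my own and is not the program Peter Coles used. It assumes a homogeneous Poisson process on the unit square with an arbitrary rate of 200 points per unit area, and the function names and seed are hypothetical.

```python
import math
import random


def poisson_points(rate=200.0, seed=0):
    """Sample a homogeneous Poisson point process on the unit square."""
    rng = random.Random(seed)
    # The number of points is Poisson(rate): count how many exponential
    # waiting times with that rate fit into one unit of "time".
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    # Given the count, the points are independent and uniform on the square.
    return [(rng.random(), rng.random()) for _ in range(count)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


points = poisson_points()
xs = [x for x, _ in points]
ys = [y for _, y in points]
print(len(points), "points; correlation of x and y:", round(pearson_r(xs, ys), 3))
```

Plotting `points` as a scatter diagram gives clumps and voids that the eye readily takes for structure, although the two coordinates are independent by construction and their correlation is expected to be close to zero.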

We therefore cannot do without causation, and it is very important to be able to identify and establish causes.

Anyone who makes a causal claim must state the premises on which he/she bases the claim. He/she must have a theory of causation.

What is it for one phenomenon/event to cause another phenomenon/event?

Everybody thinks that he/she intuitively knows what causation means or is and how to make valid causal claims. However, philosophers and scientists have proposed a large number of theories and there is as yet no consensus about a single theory.

Because of the uncertainty about causation I recently decided to read deeply about it. I want to be able to identify causal relations that have a very high probability of being true, and to distinguish true positives and negatives from false positives and negatives. I want to avoid being seduced by spurious correlations.

This has proved to be very difficult and time-consuming. I have been reading about 15 books and a large amount of other material, but I have to admit that my knowledge and understanding are still marginal. This is understandable considering that philosophers and scientists have not been able to reach consensus about causation.

I shall continue to publish posts and pages about causation to this website in the hope that this will increase my knowledge and understanding about causation and perhaps also that of others.

During recent weeks I have been busy reading books and listening to interviews and lectures about Cosmology. The timeline of our universe is extremely long – 13.8 billion years. In order to visualize the whole timeline it is necessary to transform the linear scale to a logarithmic scale, and in order to visualize relatively short parts of the timeline it must be possible to zoom in on them. Fortunately I came across a software program on the net called ChronoZoom:

“You can browse through all of history on ChronoZoom to find data in the form of articles, images, video, sound, and other multimedia. ChronoZoom links together a wealth of information that has been curated by experts and enthusiasts to tell important stories from history. By drawing upon the latest discoveries from many different disciplines, you can visualize the temporal relationships between events, trends, and themes. Some of the disciplines that contribute information to ChronoZoom include biology, astronomy, geology, climatology, prehistory, archeology, anthropology, economics, cosmology, natural history, and population and environmental studies.”
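The idea of a logarithmic timeline can be shown with a short calculation. The Python sketch below is my own illustration and has nothing to do with how ChronoZoom is implemented. It places a time before the present on a log10 axis running from 0 (nearest) to 1 (farthest). The function name `log_position` and the default range of one year to 13.8 billion years are assumptions.

```python
import math

AGE_OF_UNIVERSE_YEARS = 13.8e9  # the figure quoted in the text


def log_position(years_ago, nearest=1.0, farthest=AGE_OF_UNIVERSE_YEARS):
    """Map a time before present to [0, 1] on a log10 axis (0 = nearest)."""
    if not nearest <= years_ago <= farthest:
        raise ValueError("years_ago is outside the plotted range")
    span = math.log10(farthest) - math.log10(nearest)
    return (math.log10(years_ago) - math.log10(nearest)) / span


events = [
    ("one year ago", 1.0),
    ("one thousand years ago", 1e3),
    ("one million years ago", 1e6),
    ("one billion years ago", 1e9),
    ("start of the timeline", AGE_OF_UNIVERSE_YEARS),
]
for label, years_ago in events:
    print(f"{label:>24}: {log_position(years_ago):.3f}")
```

On this scale every factor of a thousand takes up the same width, so the last thousand years are as visible as the step from a million to a billion years. Zooming in on a short part of the timeline amounts to calling the function with a narrower `nearest` and `farthest`.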

I have been reading the Kindle Edition of the training manual and working through its examples.

Keller says in the preface:

“This document is intended for new users who range from days to a few months. It contains some lessons that introduce advanced material to enable the user to “see” what the future can be with Tableau. The manual begins at a level geared to the average user and is written in simple language unaffected by a need to promote extraneous features.”

“Every step in this manual is supported by examples in seven (7) Tableau Packaged Workbooks that each user receives with the purchase of the manual.”

“One of my biggest regrets is that I had to work the first 20 years of my career without Tableau desktop software. Looking back, I can see how so many computer codes I designed and wrote, or software teams that I directed, were necessary because we didn’t have a tool like Tableau to help us perform our analysis…..

There were capable tools we used …, but much of the quantitative analysis had to be programmed on a case-by-case basis. If I had Tableau throughout my career, things would have been much easier, more insights would have been possible, and better models could have been built.

Now that Tableau is here, I’d like to take another crack at analyzing some of my earlier work.”

I had a similar experience, and in a comment on his post I wrote:

“I found this post very interesting. It reminded me of how I became interested in dynamical systems. I happened to come across an article about a hydrologic model of the Okefenokee Swamp in Georgia. The article contained differential equations representing the model. I developed various dynamical models of the swamp in various programs like STELLA, Vensim, Mathematica, and even LiveMath (which for some reason still is close to my heart). When I later became chief of Acute Psychiatric Services at a Psychiatric Hospital I developed a lot of models of health services, especially acute psychiatric services, in the hope that these models would increase my understanding and the understanding of health service administrators and politicians and thus lead to an improvement in the services. This however did not happen and has still not happened. Now, STELLA has quite good presentation methods, so I thought that the fact that administrators and politicians did not see the light immediately was an indication that they were not really interested in solving the serious problems of the acute psychiatric health services in Norway but mainly interested in decreasing the cost of those services. This is surely at least partly true. After reading your post I began to wonder whether it might partly be due to their inability to gain the necessary insight and understanding from the STELLA presentations and my comments on them. Perhaps it would be possible to increase their insight and understanding by presenting the underlying data and the data from simulations in Tableau Workbooks. Perhaps the clarity of these presentations and their availability to the public would make it impossible for them to avoid doing something about the problems.”