Association Analytics Blog

The Correlation vs Causation Conundrum

Apr 27, 2015 8:21:32 AM / by Tamsen Haught

I find that people are often confused about the difference between correlation and causation. This can lead to erroneous conclusions when performing something like a factor analysis. Correlation is a statistical measure of the degree of relatedness between two variables. Two variables can be highly related and still have no direct cause-and-effect relationship. Causation is the connection between cause and effect. For example, walking into a door caused me to break my nose. I might have been texting while I was walking, so texting and breaking my nose might appear to be correlated, but it was the door, not the text message, that broke my nose.
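If you want to see what "degree of relatedness" looks like as a number, here is a minimal sketch of the Pearson correlation coefficient, one common way to measure correlation. The function name `pearson` and the toy numbers (minutes spent texting while walking, and number of bumps into things) are invented purely for illustration and are not real data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    The result ranges from -1 (perfect inverse relationship) through 0
    (no linear relationship) to +1 (perfect direct relationship).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not measurements.
texting_minutes = [0, 5, 10, 20, 30]
bumps = [0, 1, 1, 2, 3]
print(round(pearson(texting_minutes, bumps), 2))
```

A value close to +1 or -1 only tells you the two lists move together. It says nothing about which one, if either, is causing the other.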

Now, let’s walk through a scenario with which you are likely familiar. Let’s pretend we are looking at car accidents and the factors related to them. You perform a factor analysis and determine that the following factors are associated with car accidents:

  • Red cars
  • Younger drivers
  • Inclement weather
  • Higher speed limits

If I asked you which of these factors were merely correlated with car accidents and which caused them, I am sure you would notice the red cars right away. And you are right. Obviously, having a red car doesn’t cause an accident.

Now, what about younger drivers? Is that correlation or causation? I am guessing you might have hesitated for a second before saying correlation, and you are right again. In this case, being a younger driver may mean you have a statistically higher probability of being in an accident, but that is not the direct cause of the accident.

Let’s look at inclement weather. Do you think that is correlation or causation? It may be a little harder to determine that one. It is actually correlation as well. Rain, similar to being a younger driver, may mean you have a higher probability of being in an accident, but it is not the direct cause. The rain has an indirect relationship to the accident. The rain makes the roads slick and affects the car’s ability to stop, but the rain is not actively pushing one car into another.

The last factor is the hardest to determine right away. Are higher speed limits correlation or causation? I hope you won’t think I tricked you once I say it is actually just a correlation. Driving at higher speeds means it takes you longer to stop and increases your risk of a crash, but it is not a direct cause-and-effect relationship.

An example of a direct cause of a car accident might be malfunctioning brakes. Malfunctioning brakes directly lead to car accidents.
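To make the red car example concrete, here is a small simulation sketch. Everything in it is an assumption I made up for illustration: a hidden "risk taker" trait, the idea that risk takers pick red cars more often, and all of the probabilities. By construction, car color has no effect on accidents, yet red cars still end up with a higher accident rate overall.

```python
import random


def simulate(n=100_000, seed=0):
    """Generate hypothetical drivers as (risk_taker, red_car, accident) tuples.

    All probabilities below are invented for illustration only.
    """
    rng = random.Random(seed)
    drivers = []
    for _ in range(n):
        risk_taker = rng.random() < 0.3
        # Assumption: risk takers choose red cars more often.
        red_car = rng.random() < (0.5 if risk_taker else 0.1)
        # The accident chance depends ONLY on risk taking, never on color.
        accident = rng.random() < (0.20 if risk_taker else 0.05)
        drivers.append((risk_taker, red_car, accident))
    return drivers


def accident_rate(drivers, red, risk_taker=None):
    """Accident rate for red or non-red cars, optionally within one driver type."""
    group = [
        d for d in drivers
        if d[1] == red and (risk_taker is None or d[0] == risk_taker)
    ]
    return sum(d[2] for d in group) / len(group)


drivers = simulate()
print("All drivers:      red", round(accident_rate(drivers, True), 3),
      "vs other", round(accident_rate(drivers, False), 3))
print("Risk takers only: red", round(accident_rate(drivers, True, True), 3),
      "vs other", round(accident_rate(drivers, False, True), 3))
print("Cautious only:    red", round(accident_rate(drivers, True, False), 3),
      "vs other", round(accident_rate(drivers, False, False), 3))
```

Looking at all drivers together, red cars appear riskier. Once you compare red and non-red cars within the same type of driver, the gap mostly disappears, because the hidden trait, not the paint, is driving the accidents in this made-up world.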

Understanding the difference between correlation and causation will keep you from making incorrect assumptions based on your data. Understanding correlated factors and how they influence your association can help you run your organization, but only if you look at them through a critical lens. If you make business decisions based only on correlated factors, you could end up marketing membership to people whose favorite color is red because members who like red have a higher renewal rate. Instead, use your data to identify causal factors so you can focus on the ones with the highest impact on your success.

Topics: Predictive Analytics, Business Intelligence

Written by Tamsen Haught