The Puzzle of Visualizing Correlations

Why Correlation?

In survey research, correlation measures the strength of the relationship between two scale or numerical questions. Knowing which questions move together lets decision makers reallocate resources toward one feature of a product or service rather than another. For customers who research the features of a product, overall satisfaction may be tied to their satisfaction with pre-purchase customer service. Alternatively, for price-sensitive customers, it may be the purchase cost of a product that correlates highly with overall satisfaction.

Not surprisingly, correlation analysis is a powerful initial step in analyzing complex data and building statistical models of consumers’ likes, dislikes, and decisions. Still, there is a catch! With so many question-to-question correlation values, absorbing the results of the analysis can be overwhelming. Plus, input variables can cause complications by being intercorrelated. That happens when multiple input variables are correlated with one another as well as with the main variable. Intercorrelated variables tend to be tossed out of further analyses, which means a loss of valuable information.
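To make the idea concrete, here is a minimal Python sketch of that first step. It assumes each question's answers are stored as an equal-length list of numeric ratings. The respondent data, the 0.7 cutoff for flagging intercorrelation, and all variable names are illustrative assumptions, not part of any particular survey tool.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up 1-5 ratings from six respondents (hypothetical data).
responses = {
    "Overall Satisfaction": [5, 4, 2, 3, 1, 4],
    "Dialect": [5, 4, 1, 3, 2, 4],
    "Staff Friendliness": [4, 5, 2, 3, 2, 4],
    "Interior Panelling": [3, 2, 3, 4, 3, 2],
}

main = "Overall Satisfaction"
inputs = [q for q in responses if q != main]

# Correlation of every input question with the main question.
with_main = {q: pearson_r(responses[q], responses[main]) for q in inputs}

# Rank input questions by the strength (absolute value) of that correlation.
ranked = sorted(inputs, key=lambda q: abs(with_main[q]), reverse=True)

# Flag pairs of input questions that are strongly correlated with each other.
INTERCORRELATION_CUTOFF = 0.7  # assumed threshold, chosen for illustration
intercorrelated = [
    (a, b)
    for a, b in combinations(inputs, 2)
    if abs(pearson_r(responses[a], responses[b])) >= INTERCORRELATION_CUTOFF
]

for q in ranked:
    print(f"{q}: r = {with_main[q]:+.2f}")
print("Intercorrelated pairs:", intercorrelated)
```

Even with only three input questions, the output already consists of two separate lists of numbers, which hints at how quickly a full survey becomes hard to read.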

Instant Visual Insight

The solution is a clear visualization that allows the viewer to focus on the highest correlations between her key question and a number of other questions. Ideally, the chart also shows mutual influences among the non-central questions. A Solar Correlation Map captures both aspects.
The chart on the right displays correlation values between Overall Satisfaction (in the center of the system) and a number of experience or feature questions. The fictional Spaceship Purchase Experience Survey shows that Overall Satisfaction is closely tied to Dialect (the dealership staff's ability to speak the native dialect of the buyer), Staff Friendliness, and Showroom Quality.

On the other end of the system, the quality of the Interior Panelling of the spaceship is only loosely connected to Overall Satisfaction. In addition, the Solar Correlation Map shows that Staff Friendliness has a stronger relationship with Dialect than with Showroom Quality, a clear example of intercorrelation.

The Solar Correlation Map was originally presented, in a Python version, by Stefan Zapf and Christopher Kraushaar (see their article). In their approach, variables are plotted according to the value of their Pearson coefficient with the main variable, the Sun. The variable with the strongest correlation to the Sun becomes a Planet, and the variable most highly intercorrelated with that Planet becomes its Moon. Positive and negative correlations are then color-coded to show the direction of the relationships within the Solar System.
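The Planet and Moon assignment can be sketched in a few lines of Python. This is one possible reading of the description above, not Zapf and Kraushaar's actual implementation: the greedy procedure, the 0.7 Moon threshold, the function names, and every coefficient below are assumptions made for illustration.

```python
# Hypothetical Pearson coefficients with the Sun (Overall Satisfaction).
sun_corr = {
    "Dialect": 0.82,
    "Staff Friendliness": 0.74,
    "Showroom Quality": 0.68,
    "Interior Panelling": -0.15,
}

# Hypothetical coefficients between input questions.
# Pairs that are not listed are treated as uncorrelated (0.0).
pair_corr = {
    frozenset({"Dialect", "Staff Friendliness"}): 0.78,
    frozenset({"Staff Friendliness", "Showroom Quality"}): 0.45,
    frozenset({"Dialect", "Showroom Quality"}): 0.30,
}

MOON_THRESHOLD = 0.7  # assumed cutoff for calling two questions intercorrelated


def build_solar_system(sun_corr, pair_corr, threshold=MOON_THRESHOLD):
    """Greedily assign each question to be a Planet or a Moon of a Planet."""
    # Strongest correlation with the Sun first.
    remaining = sorted(sun_corr, key=lambda q: abs(sun_corr[q]), reverse=True)
    system = {}
    while remaining:
        planet = remaining.pop(0)
        # Unassigned questions strongly tied to this Planet become its Moons.
        moons = [
            q
            for q in remaining
            if abs(pair_corr.get(frozenset({planet, q}), 0.0)) >= threshold
        ]
        remaining = [q for q in remaining if q not in moons]
        system[planet] = moons
    return system


def direction(r):
    """Label used for color coding the sign of a correlation."""
    return "positive" if r >= 0 else "negative"


solar_system = build_solar_system(sun_corr, pair_corr)
for planet, moons in solar_system.items():
    print(f"{planet} ({direction(sun_corr[planet])}): moons = {moons}")
```

With these made-up values, Staff Friendliness ends up orbiting Dialect instead of being dropped from the analysis, which is the point of the visualization: the intercorrelated variable stays visible.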

What to Expect from Your Correlation Tool?

Tried and true approaches to correlation insights can get cluttered by many rows and columns with numbers. Consider a standard correlation matrix:

Each variable is listed as both a row and a column. As a result, each value appears twice: once in the lower triangle and once in the upper triangle. It is hard to tell which ones a decision maker should focus on, isn’t it? Instead, you should look for a tool that generates a simplified matrix with color-coded correlation levels: high, medium, low. In this simplified correlation matrix, hot correlations are red and cold correlations are blue. For example, in the Overall Satisfaction column below, the eye is immediately drawn to Dialect and Staff Friendliness.

Correlation heatmap
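One way such a simplified matrix could be produced is sketched below in Python. It keeps only the lower triangle, so no value is repeated, and replaces each coefficient with a level. The 0.7 and 0.4 cutoffs and the coefficients themselves are assumptions chosen for illustration; a real tool may define its levels differently.

```python
questions = [
    "Overall Satisfaction",
    "Dialect",
    "Staff Friendliness",
    "Interior Panelling",
]

# Hypothetical symmetric correlation matrix, in the same order as `questions`.
matrix = [
    [1.00, 0.82, 0.74, -0.15],
    [0.82, 1.00, 0.78, 0.10],
    [0.74, 0.78, 1.00, 0.45],
    [-0.15, 0.10, 0.45, 1.00],
]


def level(r, high=0.7, medium=0.4):
    """Bucket a coefficient by its strength, ignoring its sign."""
    strength = abs(r)
    if strength >= high:
        return "high"
    if strength >= medium:
        return "medium"
    return "low"


# Keep only the lower triangle: each pair appears once, the diagonal is dropped.
simplified = {}
for i, row_q in enumerate(questions):
    for j in range(i):
        simplified[(row_q, questions[j])] = level(matrix[i][j])

for (row_q, col_q), lvl in simplified.items():
    print(f"{row_q} x {col_q}: {lvl}")
```

Four questions produce sixteen cells in the full matrix but only six in the simplified one, and each of those six carries a label a reader can scan at a glance.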

In addition, you should expect your correlation tool to produce a report that breaks down correlations with the main question by categorical questions, such as gender. The sample correlation report below offers a deeper dive into the gender differences in the Overall Satisfaction correlations. The report highlights that Dialect is very highly correlated with satisfaction for women, but not as much for men. In contrast, Suede Interior is highly correlated with satisfaction for men, but not at all for women.

Correlation heatmap
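A breakdown of this kind amounts to splitting respondents by their answer to a categorical question and computing the correlation within each group. The Python sketch below illustrates the idea with made-up records; the field layout, group labels, and ratings are all hypothetical and are not taken from the sample report.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up records: (gender, Dialect rating, Overall Satisfaction rating).
records = [
    ("Women", 5, 5), ("Women", 4, 4), ("Women", 2, 1), ("Women", 1, 2),
    ("Men", 5, 3), ("Men", 4, 4), ("Men", 2, 4), ("Men", 1, 3),
]

# Split the paired ratings by the categorical answer.
groups = defaultdict(lambda: ([], []))
for gender, dialect, overall in records:
    groups[gender][0].append(dialect)
    groups[gender][1].append(overall)

# One coefficient per group instead of a single pooled value.
by_group = {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}

for g, r in by_group.items():
    print(f"Dialect vs Overall Satisfaction, {g}: r = {r:+.2f}")
```

In this toy data set the two groups differ sharply, a contrast that a single pooled coefficient would blur.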

From Correlation Coefficients to Actionable Insights

It is hard to get excited about a Pearson’s r of .6825 unless you’re a statistician. After all, amusing quotes attributed to statisticians have a low probability of winning the Mark Twain Prize for American Humor. Consider this one attributed to Edward Fulton Denison: “Three percent exceeds 2 percent by 50 percent, not by 1 percent.” Still, decision makers need the insights that researchers derive from the data. The right visualization has the power to bridge the two sides and deliver a clear message to the appropriate audience.

 

Authors:

Maksim Kokushkin, PhD, Dashboard Analytics Manager

Ryan Rogers, Quality Assurance Intern