Common misconceptions about Correspondence Analysis

Correspondence Analysis is a critical tool for brands trying to gain insight into their image, but it is notoriously easy to misinterpret. We asked two of our PhDs to talk us through some of the common misconceptions and how to overcome them.

Misconception 1: People think the distance between two data points represents how similar they are 

On the scatterplot (perceptual map), two points can sit at the same coordinates. It is tempting to conclude that they occupy the same position, but they do not share the same space when they represent different types of data, for example a brand and a characteristic.

It is natural to look at the scatterplot and assume that the closest points are the ones with the strongest associations. To read the association between a brand and a characteristic correctly, we should calculate the so-called ‘dot product’ between the two points. This is difficult to do by eye, and most people struggle with it (especially without a calculator or a worksheet).
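
As a minimal sketch of why distance and dot product can disagree, the Python snippet below uses made-up coordinates for one brand and three characteristics. The names and numbers are hypothetical, and the sketch assumes a map scaling in which the dot product between a brand point and a characteristic point is a meaningful estimate of their association.

```python
import numpy as np

# Hypothetical map coordinates (illustrative only, not from a real study).
brand = np.array([0.6, 0.2])
characteristics = {
    "A": np.array([0.5, 0.3]),    # plotted very close to the brand
    "B": np.array([1.5, 0.5]),    # farther out, same direction from the origin
    "C": np.array([-0.6, -0.1]),  # on the opposite side of the origin
}

distances = {}
dot_products = {}
for name, point in characteristics.items():
    # Euclidean distance: what the eye reads off the chart.
    distances[name] = float(np.linalg.norm(brand - point))
    # Dot product: direction and distance from the origin both matter.
    dot_products[name] = float(np.dot(brand, point))
    print(f"{name}: distance={distances[name]:.2f}, "
          f"dot product={dot_products[name]:.2f}")
```

With these illustrative numbers, characteristic B lies farther from the brand than A yet has the larger dot product, because it points in the same direction from the origin and sits farther out. Characteristic C, on the opposite side of the origin, gets a negative dot product.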

Resolution: In mTab Correspondence Analysis, the user has access to a moon plot, which reduces the impact of this misconception. On a moon plot, the user can read the distance between points as an estimate of their association with one another.

More importantly, the user can select any brand point on the scatterplot to see a visual representation of the strength of association between that brand and each characteristic.

Misconception 2: People think that they have a complete view of the data

The scatterplot presents a two-dimensional view of the data: each axis (x or y) represents a dimension, and together the two axes give every data point a coordinate on the chart. The underlying data usually has more than two dimensions, so the view you see on a scatterplot is generally a simplified one. The two dimensions shown ‘explain’ the majority of the patterns and variance in the data, but there are often further dimensions along which data points can be compared.

Correspondence Analysis is a statistical model, which means there is a degree of error: the plot does not account for 100% of the actual variance in the data. Only the most important patterns are shown, so some of the variance may be omitted. Users are not always aware of this.
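
To make "how much of the data the two-dimensional view covers" concrete, here is a hedged sketch of the textbook computation: a singular value decomposition of the standardized residuals of a brand-by-characteristic table. The table is invented for illustration, and this is not necessarily how mTab computes its model-quality figure.

```python
import numpy as np

# Hypothetical counts: rows are brands, columns are characteristics.
counts = np.array([
    [50, 20, 30, 10, 15],
    [20, 45, 25, 30, 10],
    [15, 25, 40, 20, 35],
    [30, 10, 15, 45, 25],
], dtype=float)

P = counts / counts.sum()   # correspondence matrix (proportions)
r = P.sum(axis=1)           # row masses (brands)
c = P.sum(axis=0)           # column masses (characteristics)
expected = np.outer(r, c)   # proportions expected under independence

# Standardized residuals; their squared singular values are the
# inertia (variance) carried by each dimension.
S = (P - expected) / np.sqrt(expected)
singular_values = np.linalg.svd(S, compute_uv=False)
inertia = singular_values ** 2

share = inertia / inertia.sum()   # share of total inertia per dimension
two_dim_share = share[:2].sum()   # what a 2D scatterplot can display
print("Share per dimension:", np.round(share, 3))
print(f"Shown on a 2D map: {two_dim_share:.1%}")
```

Whatever is left after the first two dimensions is the part of the pattern that the scatterplot cannot show.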

Resolution: In mTab, the user can look at the data from a different perspective and support the interpretation of the scatter or moon plot with values from heatmaps and line charts. This more comprehensive approach makes it easier to see ‘the whole picture’, and a footnote tells users about the quality of the statistical model.

Misconception 3: People confuse different types of scores (e.g. the raw score vs the association value)

While looking at the ‘residual table’, the user may misread the values of characteristics. For instance, if characteristic “A” has the strongest association with the “main brand”, it doesn’t necessarily mean that “A” has the highest raw score. Another characteristic, say “B”, may have a higher raw score but a lower association score with the “main brand”.

How does this happen? The score for “A” could be low for every brand except the “main brand”, so the scores differ widely across brands. The score for “B” might be high across all brands, so the scores differ very little. The result is that “A” has a higher association with the “main brand” than “B” does, while “B” still has the higher raw score.
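
The scenario above can be sketched with a small invented table. The snippet uses the standardized residual (observed minus expected, divided by the square root of expected) as the association measure; that is one common definition, and treating it as what the ‘residual table’ reports is an assumption. Brand and characteristic names are hypothetical.

```python
import numpy as np

brands = ["main brand", "brand 2", "brand 3"]
traits = ["A", "B", "C"]

# Hypothetical raw scores: "A" is high only for the main brand,
# "B" is equally high for every brand.
raw = np.array([
    [40, 60, 30],
    [10, 60, 40],
    [10, 60, 50],
], dtype=float)

# Scores expected if brands and characteristics were unrelated.
row_totals = raw.sum(axis=1, keepdims=True)
col_totals = raw.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / raw.sum()

# Standardized residuals: positive means "more than expected".
residuals = (raw - expected) / np.sqrt(expected)

main = brands.index("main brand")
for j, trait in enumerate(traits):
    print(f"{trait}: raw={raw[main, j]:.0f}, "
          f"association={residuals[main, j]:+.2f}")
```

For the main brand, “B” has the higher raw score (60 against 40), but “A” has the higher association, because 40 is far above what the other brands' scores for “A” would lead you to expect.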

People looking at the results of a Correspondence Analysis are often not aware of this, which makes the chart harder to read.

Resolution: In mTab the user can freely switch between raw scores and association scores, which helps them understand the meaning of both types of data. To keep the visualization clear, we also explain “association” in information bubbles so that it is accessible to everyone. This gives users a powerful statistical tool of a kind that is not normally available for general use across the business.

And finally, the mTab heat map helps the user gain a fuller understanding of Correspondence Analysis, as it clearly shows the meaning of the “association” between data points.