Surveys & Forecasts: Research That Drives Business

Correspondence Analysis

Surveys & Forecasts regularly integrates correspondence analysis (CA) into its work because it condenses a large cross-tabulation into a single, readable map. CA is a data reduction and mapping technique that uses cross-tabular (i.e., aggregate-level) data as its primary input. It is known by several names, including additive scoring, canonical scoring, dual scaling, homogeneity analysis, optimal scaling, and reciprocal averaging. A correspondence map replaces a rectangular table of rows and columns (such as top-box attribute ratings for a set of brands) with a two-dimensional graphical representation of the data. In effect, what is happening "under the hood" is a principal-components-style reduction of the data (technically, a singular value decomposition of the table's standardized residuals from independence), with the row and column points plotted on each of the factors that emerge.
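A minimal sketch of that "under the hood" step, assuming NumPy and a small hypothetical table of top-box counts (attributes in rows, brands in columns). The function name `correspondence_analysis` and the data are illustrative only, not taken from any study. The table is converted to proportions, the residuals from the independence expectation are standardized, and their singular value decomposition is rescaled into row and column coordinates.

```python
import numpy as np

def correspondence_analysis(table):
    """Return row principal coordinates, column principal coordinates,
    and principal inertias (squared singular values) for a two-way table.
    Assumes every row and column has a positive total."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix (sums to 1)
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]    # principal coordinates of rows
    cols = (Vt.T * sv) / np.sqrt(c)[:, None] # principal coordinates of columns
    return rows, cols, sv ** 2

# Hypothetical top-box counts: 4 attributes (rows) x 3 brands (columns).
table = [[30, 10, 20],
         [15, 25, 20],
         [10, 30, 25],
         [25, 15, 10]]

rows, cols, inertias = correspondence_analysis(table)
# The first two columns give each point's x/y position on the map.
print(np.round(rows[:, :2], 3))
print(np.round(cols[:, :2], 3))
```

The sign of each axis from an SVD is arbitrary, so two programs can produce mirror-image maps of the same data; only relative positions carry meaning.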

The origin of the map represents the average (marginal) profile across both brands and attributes. That is, a column point (i.e., a brand) whose profile matches the average column profile will appear at or near the origin. Similarly, a row point (i.e., an attribute) whose profile matches the average row profile will appear near the origin. The origin is therefore the "center of gravity" of both the row-point cloud and the column-point cloud, with each point weighted by its share of the table total.
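The "center of gravity" property can be checked directly. In this sketch (NumPy assumed; the table and the helper `ca_coordinates` are hypothetical), brand C's column is the sum of brands A and B, so its profile equals the average column profile and it should land exactly on the origin. The mass-weighted average of each point cloud should also be zero.

```python
import numpy as np

def ca_coordinates(table):
    """Principal coordinates of rows and columns, plus row and column masses."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)      # row and column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, r, c

# Hypothetical counts: 3 attributes (rows) x brands A, B, C (columns).
# Column C = column A + column B, so C's profile equals the marginal profile.
table = [[30, 10, 40],
         [10, 30, 40],
         [15, 15, 30]]

rows, cols, r, c = ca_coordinates(table)
print(np.round(cols[2], 6))      # brand C's coordinates: numerically zero
print(np.round(r @ rows, 6))     # mass-weighted centroid of attribute points
print(np.round(c @ cols, 6))     # mass-weighted centroid of brand points
```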

Points whose profiles differ from the marginal profile appear farther from the origin on the plot. The farther an attribute lies from the origin, the more it differs from the other attributes; likewise, the farther a brand lies from the origin, the more it differs from the other brands. Equivalently, the points farthest out are those whose profiles are predicted worst by the chi-square expected values calculated from the row and column marginals. Correspondence analysis is primarily descriptive: except when the input is a true contingency table of counts, it cannot be used as an inferential, hypothesis-testing technique.
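The link between distance from the origin and the chi-square expected values can be made concrete. In this sketch (NumPy assumed; table and helper names hypothetical), a row point's squared distance from the origin, measured across all dimensions, equals the chi-square distance between that row's profile and the average profile. Multiplied by the row's mass and the grand total, it reproduces that row's contribution to the chi-square statistic. A 2-D map shows only a projection, so plotted distances can understate the full distance.

```python
import numpy as np

def row_coordinates(table):
    """Row principal coordinates in all dimensions, plus row and column masses."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    return (U * sv) / np.sqrt(r)[:, None], r, c

def chi_square_by_row(table):
    """Each row's share of the Pearson chi-square statistic."""
    N = np.asarray(table, dtype=float)
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return ((N - expected) ** 2 / expected).sum(axis=1)

# Hypothetical top-box counts: 4 attributes (rows) x 3 brands (columns).
table = [[30, 10, 20],
         [15, 25, 20],
         [10, 30, 25],
         [25, 15, 10]]
n = np.sum(table)

rows, r, c = row_coordinates(table)
dist2 = (rows ** 2).sum(axis=1)              # squared distance from the origin

# Chi-square distance of each row profile from the average profile.
N = np.asarray(table, dtype=float)
profiles = N / N.sum(axis=1, keepdims=True)
chi_dist2 = (((profiles - c) ** 2) / c).sum(axis=1)

contrib = chi_square_by_row(table)
print(np.allclose(dist2, chi_dist2))         # True
print(np.allclose(n * r * dist2, contrib))   # True
```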

The plots produced by correspondence analysis are best used as an exploratory tool for uncovering patterns in the data. Care is needed in interpreting them, however: the raw data matrix may show relatively little absolute variation in scores, while the mapping algorithm spreads the points as far apart as the available variance allows. Correspondence mapping reports how much of the total variance (called inertia in CA) is explained by each underlying dimension. As a general rule, a map is considered to represent the underlying data fairly when the first two dimensions together explain at least 70% of the variance. If they fall short, it may be worth exploring additional dimensions.
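The percentage explained by each dimension is each squared singular value divided by their sum. A minimal sketch, assuming NumPy, a hypothetical 5 × 4 table, and an illustrative helper `inertia_shares`, applies the 70% rule of thumb from the text. A table with R rows and C columns has at most min(R, C) − 1 non-trivial dimensions, so the final share here is always numerically zero.

```python
import numpy as np

def inertia_shares(table):
    """Share of total inertia explained by each dimension, largest first."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    inertias = np.linalg.svd(S, compute_uv=False) ** 2   # descending order
    return inertias / inertias.sum()

# Hypothetical top-box counts: 5 attributes (rows) x 4 brands (columns).
table = [[30, 10, 20, 15],
         [15, 25, 20, 10],
         [10, 30, 25, 20],
         [25, 15, 10, 30],
         [20, 20, 15, 25]]

shares = inertia_shares(table)
first_two = shares[:2].sum()
print(f"Dimension 1: {shares[0]:.1%}, Dimension 2: {shares[1]:.1%}, "
      f"together: {first_two:.1%}")
if first_two < 0.70:   # rule-of-thumb threshold from the text
    print("The 2-D map may be misleading; inspect dimension 3.")
```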

Case Study

Consider a correspondence map of the results of a multi-country study that compared the appeal of different continuing education formats. In this example, the percentage of variance explained by the map is high (97%), which implies that the map captures the variability in the underlying data very well. In addition, the x-axis alone accounts for most of the variance (85%), meaning that this single underlying dimension explains the bulk of the relationship between countries and formats.

In this map, the closer a country is to a specific format, the more strongly the country is associated with that format. For example, respondents in France and Spain appear more interested in brief continuing education formats (i.e., a few days at most), while those in the US, UK, and Germany are more interested in longer, degree-oriented programs. The formats themselves also group by time horizon, with shorter ones on the left and longer ones on the right. The absolute position of countries and formats on the map is unimportant; only their positions relative to one another matter.

If we were marketing specific continuing education programs to these countries, the implications would appear quite clear. However, we would strongly recommend examining the cross-tabular data used to create the map to make sure the differences are large enough, in absolute terms, to support any specific action.
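Why the absolute check matters can be shown with a small experiment. This sketch (NumPy assumed; the table and helpers `ca_summary` and `shrink_toward_independence` are hypothetical) blends a table toward its independence expectation so that the absolute differences become tiny while the margins stay fixed. Total inertia shrinks by the square of the blending factor, yet the share explained per dimension and the row standard coordinates are unchanged (up to the arbitrary sign of each axis). Principal coordinates merely shrink uniformly, so an auto-scaled map looks the same.

```python
import numpy as np

def ca_summary(table):
    """Return (total inertia, share of inertia per dimension,
    row standard coordinates) for a two-way table of non-negative values."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    inertias = sv ** 2
    return inertias.sum(), inertias / inertias.sum(), U / np.sqrt(r)[:, None]

def shrink_toward_independence(table, t):
    """Blend a table with its independence expectation: t=1 keeps the
    original; t near 0 leaves only tiny absolute differences."""
    N = np.asarray(table, dtype=float)
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return expected + t * (N - expected)

# Hypothetical counts: 5 rows x 4 columns.
original = [[30, 10, 20, 15],
            [15, 25, 20, 10],
            [10, 30, 25, 20],
            [25, 15, 10, 30],
            [20, 20, 15, 25]]
flattened = shrink_toward_independence(original, 0.05)

total_a, shares_a, coords_a = ca_summary(original)
total_b, shares_b, coords_b = ca_summary(flattened)
print(total_b / total_a)                 # about 0.0025, i.e. 0.05 squared
print(np.allclose(shares_a, shares_b))   # True: same "% explained"
```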
