Crosstab Terminology Explained

Even those familiar with crosstab software and cross-tabulation in general may not be familiar with its unique terminology. Terms you may run across include banners, stubs, expected values and chi-square statistics. An example of cross-tabulation in action can help explain the terms connected with the tool.

Crosstab Example: Banners and Stubs

An example used by Qualtrics aimed to determine whether there was any link between survey respondents’ city of residence and their favorite baseball team.

                     Favorite Baseball Team
City of Residence    Toronto Blue Jays   Boston Red Sox   New York Yankees   TOTALS
Boston, MA           11                  33               7                  51
  Row percent        21.57 %             64.71 %          13.73 %            34.93 %
Montreal, Canada     23                  14               9                  46
  Row percent        50 %                30.43 %          19.57 %            31.51 %
Montpelier, VT       22                  13               14                 49
  Row percent        44.9 %              26.53 %          28.57 %            33.56 %
TOTALS               56                  60               30                 146
  Column percent     38.36 %             41.1 %           20.55 %            100 %

Crosstab software could be used to easily create this table, which can be used to explain banners and stubs:

  • Banners: Column variables, or the baseball team names
  • Stubs: Row variables, or the cities of residence
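As a rough sketch of how such a table might be produced, the Python below stores the counts from the table above with stubs as the outer keys and banners as the inner keys, then derives the totals and percentages. The variable and function names are illustrative assumptions and do not come from any particular crosstab package.

```python
# Observed counts from the table above: each stub (row) maps to its banner (column) counts.
counts = {
    "Boston, MA": {"Blue Jays": 11, "Red Sox": 33, "Yankees": 7},
    "Montreal, Canada": {"Blue Jays": 23, "Red Sox": 14, "Yankees": 9},
    "Montpelier, VT": {"Blue Jays": 22, "Red Sox": 13, "Yankees": 14},
}
banners = ["Blue Jays", "Red Sox", "Yankees"]

# Totals for each stub, each banner, and the whole table.
row_totals = {stub: sum(cells.values()) for stub, cells in counts.items()}
col_totals = {b: sum(cells[b] for cells in counts.values()) for b in banners}
grand_total = sum(row_totals.values())


def row_percent(stub, banner):
    """Share of one stub's respondents who chose the given banner."""
    return 100 * counts[stub][banner] / row_totals[stub]


def column_percent(banner):
    """Share of all respondents who chose the given banner."""
    return 100 * col_totals[banner] / grand_total


# Print one line of row percents per stub.
for stub in counts:
    cells = "  ".join(f"{row_percent(stub, b):6.2f} %" for b in banners)
    print(f"{stub:<18}{cells}")
```

Keeping the raw counts separate from the derived percentages means the totals and percents can always be recomputed if a count changes.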

Chi-Square Statistics

A chi-square statistic measures how far actual (observed) results depart from expected results. For example, the expected results of flipping a fair coin 100 times would be 50 heads and 50 tails. Go through the motions and you may find heads comes up 42 times and tails comes up 58 times, giving you actual results to compare to your expected results.

To compute the chi-square statistic for crosstab tables:

  1. Determine a chi-square value for each cell within the table
  2. Sum up the cells to produce a chi-square value for the entire table

Determining a chi-square value for each cell uses the equation:

  • (Observed Value – Expected Value)² / (Expected Value)
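The two steps and the per-cell equation can be sketched in Python for the baseball table. The table does not list expected values, so this sketch assumes the standard expected count for a crosstab cell: row total × column total / grand total. The function name chi_square is a hypothetical helper, not part of any crosstab product.

```python
# Observed counts; columns are Blue Jays, Red Sox, Yankees.
observed = [
    [11, 33, 7],   # Boston, MA
    [23, 14, 9],   # Montreal, Canada
    [22, 13, 14],  # Montpelier, VT
]


def chi_square(table):
    """Return (per-cell chi-square values, chi-square value for the whole table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    cell_values = []
    for i, row in enumerate(table):
        cell_row = []
        for j, obs in enumerate(row):
            # Assumed expected count if stubs and banners were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Step 1: chi-square value for this cell.
            cell_row.append((obs - expected) ** 2 / expected)
        cell_values.append(cell_row)

    # Step 2: sum the cells to get the value for the entire table.
    total = sum(sum(cell_row) for cell_row in cell_values)
    return cell_values, total


cells, table_chi_square = chi_square(observed)
print(f"chi-square for the whole table: {table_chi_square:.2f}")
```

The per-cell values are returned alongside the total so you can see which cells contribute most to the table's chi-square value.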

The chi-square value for tails on your coin flip would translate to:

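  • (58 – 50)² / 50 = 8² / 50 = 64 / 50 = 1.28

The heads cell works out the same way, (42 – 50)² / 50 = 1.28, so the chi-square value for the whole coin flip is 1.28 + 1.28 = 2.56.

A chi-square value is not itself a probability. It is converted into a probability, called the p-value, which is the chance of seeing a difference at least this large if the variables were actually unrelated. By convention, when the p-value falls below 0.05, or 5 percent, the result is described as “statistically significant,” which suggests the variables are linked or related.

For the coin flip, a chi-square value of 2.56 with one degree of freedom corresponds to a p-value of roughly 0.11, or 11 percent, which is above the 0.05 cutoff. The result is therefore described as “non-significant”: 42 heads and 58 tails is consistent with a fair coin. In a crosstab, a non-significant result likewise means the data do not show a link between the variables, so they may well be independent.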
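The coin flip arithmetic can be checked with a short Python sketch. It assumes a fair coin (50 expected heads and 50 expected tails) and uses the fact that, for one degree of freedom, the p-value of a chi-square value x equals erfc(sqrt(x / 2)). The variable names are illustrative.

```python
import math

observed = {"heads": 42, "tails": 58}
expected = {"heads": 50, "tails": 50}  # assumption: a fair coin flipped 100 times

# Chi-square value for each cell: (observed - expected)^2 / expected.
cell_chi_square = {
    side: (observed[side] - expected[side]) ** 2 / expected[side]
    for side in observed
}

# Chi-square value for the whole experiment: the sum of the cells.
chi_square = sum(cell_chi_square.values())

# p-value for one degree of freedom (two outcomes minus one).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(cell_chi_square)
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.3f}")
print("significant" if p_value < 0.05 else "non-significant")
```

The erfc shortcut only applies with one degree of freedom; larger tables need the general chi-square distribution, which statistics packages provide.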

Because the expected values may not always be included in the table, another way to check for a relationship is to compare a specific column’s total percent to the cell percent of the row in which you’re interested.

If you wanted to find out if the Boston Red Sox were favored heavily by those living in Boston, for instance, you could compare column and row percentages of Red Sox and Boston:

  • Column percent of Red Sox: 41.1 percent
  • Row percent of Boston residents: 64.71 percent

Here you would see the percentage of Boston residents that favored the Red Sox was indeed much higher than the overall percentage of people who favored the Red Sox.
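The same comparison can be sketched in a few lines of Python, using the Boston row and the TOTALS row from the table. The ratio at the end is an illustrative extra that the table does not report.

```python
# Counts from the Boston, MA row and from the TOTALS row of the table.
boston = {"Blue Jays": 11, "Red Sox": 33, "Yankees": 7}
all_cities = {"Blue Jays": 56, "Red Sox": 60, "Yankees": 30}

# Row percent: share of Boston residents who favor the Red Sox.
row_pct = 100 * boston["Red Sox"] / sum(boston.values())

# Column percent: share of all respondents who favor the Red Sox.
col_pct = 100 * all_cities["Red Sox"] / sum(all_cities.values())

# How many times more common Red Sox fans are in Boston than overall.
ratio = row_pct / col_pct

print(f"Boston residents favoring the Red Sox: {row_pct:.2f} %")
print(f"All respondents favoring the Red Sox:  {col_pct:.2f} %")
print(f"Ratio: {ratio:.2f}")
```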

While this analysis could support the hypothesis that there is a relationship between living in Boston and having the Red Sox as your favorite team, it’s dangerous to draw further conclusions from it. You may not necessarily be able to conclude that people moved to Boston because they liked the Red Sox, or that living in Boston automatically turned you into a loyal Red Sox fan.
