What You Need to Know about Principal Component Analysis (PCA)

Whether your survey aims to produce a customer retention analysis or feedback on your updated website, your survey is only good if it’s valid, and principal component analysis (PCA) is a useful step in the validation process. PCA is most effective with the help of a knowledgeable expert, but you still want to understand how it works and why it helps, even if you’re letting an expert and your software do the calculations.

What It Is, What It Does

PCA is a way to identify the underlying components in your survey questions. The numbers that link each question to a component are called factor loadings or component loadings, and they show which underlying factors your questions are actually measuring.

Perhaps one question asks about the one company your customers would never give up. Another asks how long they’ve been using that company. A third asks what customers look for in a company before giving it the bulk of their business. While the three questions ask about different specifics, the underlying factor, or component, in each question is customer loyalty.

Factor loadings range from -1.0 to 1.0, and a common rule of thumb treats a question as belonging to a component when its loading is at least 0.6 in absolute value. In the case of the three questions above, all load heavily onto the customer loyalty component.

A fourth question asking about respondents’ income has a loading near zero on the customer loyalty factor, although it may load strongly onto a factor shared by other questions about customer demographics.

The number of factors your questions load onto determines how many factors your overall survey is measuring. Validating your survey means ensuring it measures what you intend to measure, and that’s much easier to assess with the help of PCA.

PCA helps you pull out the underlying factors from the questions, or from any other type of data, to determine the principal components.
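
To make factor loadings concrete, here is a minimal sketch in Python with NumPy. It assumes simulated (made-up) answers from 500 respondents rather than any real survey: three questions are driven by a hypothetical latent "loyalty" trait plus noise, and a fourth (income) is unrelated to it. Loadings are computed the usual way for PCA on standardized data, as each eigenvector of the correlation matrix scaled by the square root of its eigenvalue, so each loading is the correlation between a question and a component. The question names are illustrative only.

```python
import numpy as np

# Hypothetical simulated survey: 500 respondents (not real data).
rng = np.random.default_rng(0)
n = 500
loyalty = rng.normal(size=n)  # hypothetical latent "customer loyalty" trait

# Three loyalty-driven questions (trait + noise) and one unrelated income question.
answers = np.column_stack([
    loyalty + 0.4 * rng.normal(size=n),  # company you'd never give up
    loyalty + 0.4 * rng.normal(size=n),  # how long you've used that company
    loyalty + 0.4 * rng.normal(size=n),  # what you look for before committing
    rng.normal(size=n),                  # income, independent of loyalty
])
questions = ["never_give_up", "tenure", "selection_criteria", "income"]

# PCA on the correlation matrix, i.e. on standardized answers.
corr = np.corrcoef(answers, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # returned in ascending order
order = np.argsort(eigenvalues)[::-1]             # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings: correlation between each question and each component.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Flag which questions load strongly on component 1 (rule of thumb: |loading| >= 0.6).
for name, load in zip(questions, loadings[:, 0]):
    flag = "strong" if abs(load) >= 0.6 else "weak"
    print(f"{name:20s} {load:+.2f} {flag}")
```

The sign of a component is arbitrary, which is why the check uses the absolute value. In this simulation the three loyalty questions come out strong on component 1, while income lands on its own component.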
Extracting the principal components, in turn, helps you transform large amounts of data into smaller, easier-to-digest sets that can be analyzed more quickly and easily. Put another way, PCA can help you strip away unnecessary components of the data so it’s reduced to its basic, or principal, components.

Another PCA Example

A solid example of PCA in action comes from Oxford Internet Survey (OxIS) research, specifically its OxIS 2013 report, which analyzed 10 years’ worth of Internet use in Britain. The survey asked roughly 2,000 people about their Internet use, with multiple questions designed to uncover different factors.

If the survey contained 50 questions, you would be left with results containing 50 variables. Analyzing 50 separate variables would take a lot of time and effort, but PCA can make the task much easier.

PCA lets you boil the data down to its principal components, greatly reducing the number of variables that need to be analyzed. In the OxIS 2013 report, for example, analysts identified four principal components.

These components represented the four sets of attitudes and beliefs that captured the most variance across a number of different items. The report referred to these components as:

  • Enjoyable escape
  • Instrumental efficiency
  • Social facilitator
  • Problem generator
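
The 50-questions-to-a-few-components idea can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the OxIS analysis: the data are simulated, with 2,000 hypothetical respondents answering 50 questions that are each driven by one of four made-up latent attitudes plus noise. To decide how many components to keep, the sketch uses the Kaiser rule (keep components whose correlation-matrix eigenvalue exceeds 1), which is one common rule of thumb among several; scree plots and cumulative-variance thresholds are alternatives.

```python
import numpy as np

rng = np.random.default_rng(42)
n_respondents, n_questions, n_attitudes = 2000, 50, 4

# Hypothetical latent attitudes; each question is driven by one of them plus noise.
attitudes = rng.normal(size=(n_respondents, n_attitudes))
driver = np.arange(n_questions) % n_attitudes  # which attitude drives each question
answers = attitudes[:, driver] + 0.6 * rng.normal(size=(n_respondents, n_questions))

# PCA via the eigenvalues of the correlation matrix (standardized variables).
corr = np.corrcoef(answers, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first

explained = eigenvalues / eigenvalues.sum()  # share of total variance per component
n_keep = int(np.sum(eigenvalues > 1.0))      # Kaiser rule: keep eigenvalues > 1

print(f"{n_questions} variables -> {n_keep} components")
print(f"variance explained by kept components: {explained[:n_keep].sum():.0%}")
```

Because the simulated answers were built from four attitudes, the rule recovers four components; with real survey data the cut-off is a judgment call an expert would review.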

Instead of analyzing every variable in the survey separately, PCA lets you set aside components that account for only a small share of the variance.

PCA as Coordinate System Transformation

Another way to view PCA is as a coordinate system transformation, one that lets you choose a suitable coordinate system to find the simplest way to describe an object. Let’s say you were trying to describe a standard shape, such as an ellipse.

One way to describe the ellipse would be to use three axes: length, width and height. In this case, the data for the ellipse would be written as a function of those three variables, or:

  • Ellipse data = Length, Width, Height

You can see right away how a simple object ends up with unnecessary variables: the ellipse you’re describing is a two-dimensional object, so it doesn’t need the extra variable that indicates height.

Choose a different coordinate system, and you can instead use the center of the ellipse as the origin from which you measure two components. The first runs along the direction in which the ellipse has its longest radius. The second runs perpendicular to the first.

Now the ellipse can be written as a function of only two variables, known as component 1 and component 2, or:

  • Ellipse data = Component 1, Component 2

Using a suitable coordinate system transformation gives you fewer variables (a lower-dimensional description) without losing any information, because the ellipse has no extent along the direction you dropped. The variables you choose let you describe your data much more concisely and conveniently, and those variables are known as your principal components. A side-by-side comparison of the two ellipse descriptions shows how PCA simplifies the data, with the second much more concise and convenient than the first:

  • Example 1: 3 Variables: Length, Width, Height
  • Example 2: 2 Variables: Component 1, Component 2
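
The ellipse example can be reproduced numerically. The sketch below, in Python with NumPy, assumes a hypothetical flat ellipse with semi-axes 3 and 1, rotated 30 degrees and shifted off the origin, stored with three columns (length, width, height) where height is always 0. PCA is done by centering the points and taking a singular value decomposition; the variance along the third component comes out as zero, and two components reconstruct every point exactly.

```python
import numpy as np

# Hypothetical ellipse: semi-axes 3 and 1, rotated 30 degrees, lying flat (height = 0).
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
angle = np.deg2rad(30)
flat = np.column_stack([3 * np.cos(theta), np.sin(theta)])
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
xy = flat @ rotation.T + np.array([5.0, 2.0])          # rotate, then move the center
points = np.column_stack([xy, np.zeros(len(theta))])   # columns: length, width, height

# PCA: use the center as the origin, then find the principal directions via SVD.
center = points.mean(axis=0)
centered = points - center
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
variances = singular_values**2 / (len(points) - 1)     # variance along each component

# Describe each point with just component 1 and component 2.
scores = centered @ components[:2].T

# Map the two-component description back to (length, width, height).
reconstructed = scores @ components[:2] + center
print("variance per component:", np.round(variances, 3))
print("lossless with 2 components:", np.allclose(reconstructed, points))
```

Component 1 points along the ellipse’s longest radius (the 30-degree direction), component 2 is perpendicular to it, and the third direction carries no variance, which is why it can be dropped without losing information.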

To function properly, principal components must meet two main criteria:

  • They must be perpendicular to each other and linearly uncorrelated with each other. PCA builds the components this way, so related variables, such as job security and time on the job, tend to load onto the same component instead of producing two separate, overlapping ones. Correlated components would leave you with redundant information.
  • They must be numbered in order of the variance (the spread, or radius, of the data points) along them. Component 1 therefore has the largest variance, component 2 the second-largest, component 3 the third-largest, and so on. The lower a component’s number, the more important it is, and the minor, higher-numbered components are often sacrificed to reduce the number of variables and simplify the data set.
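
Both criteria can be checked directly. This minimal Python/NumPy sketch assumes simulated data for 1,000 hypothetical employees: standardized time on the job, a job-security score correlated with it, and an unrelated commute variable (all made up for illustration). It confirms that the component directions are perpendicular, that the component scores are uncorrelated, that the components are numbered by decreasing variance, and that the two correlated variables share component 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
tenure = rng.normal(size=n)                               # hypothetical time on the job
job_security = 0.8 * tenure + 0.6 * rng.normal(size=n)    # correlated with tenure
commute = rng.normal(size=n)                              # hypothetical unrelated variable
data = np.column_stack([tenure, job_security, commute])

# Standardize, then eigen-decompose the covariance of the standardized data.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
cov = np.cov(standardized, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # number components by variance, largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project the data onto the components.
scores = standardized @ eigenvectors
score_cov = np.cov(scores, rowvar=False)

print("perpendicular:", np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))
print("uncorrelated: ", np.allclose(score_cov, np.diag(eigenvalues)))
print("variances:    ", np.round(eigenvalues, 2))
print("component 1 weights (tenure, security, commute):", np.round(eigenvectors[:, 0], 2))

# Dropping the minor component 3 keeps most of the total variance.
kept_share = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"share kept with 2 of 3 components: {kept_share:.0%}")
```

Because tenure and job security move together, they combine into component 1 rather than forming two redundant components, and commute ends up on its own.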

Being aware of PCA basics makes it easier to understand why it can be a necessary step in the validation process, one with the power to simplify your survey result analysis while making your metrics much more meaningful.
