Data Preparation for Crosstab Software

Crosstab software provides an easy way to review your data. What’s not as easy, however, is preparing your data for crosstab software in the first place. While crosstab software platforms have advanced to make the process as simple and streamlined as possible, there is still a lot of work to be done to prepare your data for analysis.

Data Format 

For most crosstab software programs, your first order of business is to ensure your data is in a format your software supports. Some of the most common market research data formats include: 

·         SPSS: Statistical Package for the Social Sciences (SPSS) is a highly popular statistical package that can execute complex manipulation and analysis of data. File extensions include *.sav and *.por.

·         SAS: Statistical Analysis System (SAS) is a software suite used for various types of data analysis and management. File extensions include *.sas7bdat (data sets), *.sas7bcat (catalogs) and *.xpt (transport files).

·         Excel: Microsoft Excel is a well-known spreadsheet program. Its legacy *.xls files use what is known as the Binary Interchange File Format (BIFF), while newer versions save *.xlsx files by default.

·         Triple-S: The Triple-S format uses two ASCII text files to transfer key elements of entire surveys between various survey software packages. File extensions for ASCII files include *.txt and *.dat.

If your data is in a compressed file, which is indicated by extensions that include *.zip, *.rar or *.gz, you’ll need to extract your data files with file compression software before formatting appropriately.
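As a rough illustration of this first check, the sketch below identifies a file's format from its extension and unpacks *.zip or *.gz archives using only Python's standard library. The function names `identify_format` and `extract_if_compressed` are hypothetical, and the lookup table is an assumption built from the extensions named above plus *.sas7bdat; extend it to match what your own crosstab software accepts.

```python
import gzip
import shutil
import zipfile
from pathlib import Path

# Illustrative lookup table (an assumption, not an exhaustive list).
KNOWN_FORMATS = {
    ".sav": "SPSS",
    ".por": "SPSS",
    ".sas7bdat": "SAS",
    ".sas7bcat": "SAS",
    ".xpt": "SAS",
    ".xls": "Excel",
    ".txt": "ASCII text (for example Triple-S)",
    ".dat": "ASCII text (for example Triple-S)",
}


def identify_format(filename):
    """Return the format name for a file, or None if the extension is unknown."""
    return KNOWN_FORMATS.get(Path(filename).suffix.lower())


def extract_if_compressed(path, out_dir):
    """Unpack a *.zip or *.gz file into out_dir and return the resulting paths.

    Files that are not compressed are returned unchanged.
    """
    path, out_dir = Path(path), Path(out_dir)
    suffix = path.suffix.lower()
    if suffix == ".zip":
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(path) as archive:
            archive.extractall(out_dir)
            return [out_dir / name for name in archive.namelist()]
    if suffix == ".gz":
        out_dir.mkdir(parents=True, exist_ok=True)
        target = out_dir / path.stem  # "survey.sav.gz" becomes "survey.sav"
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [target]
    # *.rar is not supported by the standard library and needs a separate tool.
    return [path]
```

Calling `identify_format` on each extracted path shows whether the file is ready to load or still needs converting.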

Data Preparation 

Once your data is in the proper format, you still need to explore and prepare the data before you perform any type of analysis. The steps are easier to follow once you know some basic terminology, which is best explained with an example.

id    var1    var2    var3
1     3.7     11.9    Male
2     2.8     6.5     Male
3     8.4     33.1    Female

·         var: var1 through var3 are variables, each shown as a column holding three values, one per observation. A variable is an identifiable piece of data that contains one or more values, and values are numbers or text that can be converted into numbers.

·         id: “id” refers to the identifier for each observation. In the above case, each id identifies a person, but it could also identify a state, a country or another unit of observation.

Once you understand the terminology, a series of steps outlined by Princeton University’s Data and Statistical Services (DSS) can lead you through the final data preparation process prior to statistical analysis.

1.      Ensure your variables are in columns and your observations are in rows. The above example is laid out correctly: each row is one observation, holding the values of every variable for a single identifier.

2.      Ensure you have all the variables you need. Missing data leads to erroneous results.

3.      If data is missing, make sure it is indicated by a blank space or a dot (‘.’).

4.      Ensure you have at least one id. The above example uses people as the id.

5.      If you are analyzing data over a set length of time, ensure you have included all the years you want represented in your study.
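Several of the steps above can be checked automatically before the data reaches your crosstab software. The following is a minimal sketch, assuming the data has been exported to CSV with variables in columns and observations in rows (step 1); the function `check_dataset` and its parameters are hypothetical names chosen for illustration, and the sample rows are the example table from earlier in this article.

```python
import csv
import io

MISSING_MARKERS = {"", "."}  # a blank or a dot, as described in step 3


def check_dataset(rows, id_column, required_variables,
                  year_column=None, expected_years=()):
    """Return a list of problems found in a list of row dictionaries."""
    problems = []
    columns = set(rows[0].keys()) if rows else set()

    # Steps 2 and 4: the id column and every required variable must exist.
    for name in [id_column, *required_variables]:
        if name not in columns:
            problems.append(f"missing column: {name}")

    # Step 3: report each observation whose value is blank or a dot.
    for row in rows:
        for name in required_variables:
            if name in row and (row[name] or "").strip() in MISSING_MARKERS:
                problems.append(
                    f"missing value: {name} for {id_column}={row.get(id_column)}"
                )

    # Step 5: every expected year must appear at least once.
    if year_column and year_column in columns:
        present = {(row[year_column] or "").strip() for row in rows}
        for year in expected_years:
            if str(year) not in present:
                problems.append(f"missing year: {year}")
    return problems


SAMPLE = """id,var1,var2,var3
1,3.7,11.9,Male
2,2.8,6.5,Male
3,8.4,33.1,Female
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(check_dataset(rows, "id", ["var1", "var2", "var3"]))  # prints []
```

An empty list means none of these checks failed. It does not prove the data is correct, so a manual review of the variables is still worthwhile.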

One final tip is to always back up a copy of your original dataset, an absolute must to avoid potential catastrophe due to computer glitches or human error.
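One simple way to do this is to copy the file into a separate folder under a timestamped name before touching it. The sketch below assumes a local file and uses a hypothetical helper called `back_up`; the folder name `backups` is likewise only an example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up(dataset_path, backup_dir="backups"):
    """Copy the dataset into backup_dir under a timestamped name; return the new path."""
    source = Path(dataset_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}.{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also keeps the file's modification time
    return target
```

Because the timestamp is part of the file name, backups taken at different times sit side by side instead of replacing one another.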

Properly preparing your data is the only way to ensure accurate results with your crosstab software, and accurate results are critical for any type of analysis.
