The following links provide some information on how to get started using a data file. You should also look at our pages on SAS, Stata and SPSS.
- Research Data Tutorials - A series of tutorials on the use and processing of data.
- The Quartz Guide to Bad Data - An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
- Research data management guidance - Guidance and support for managing, sharing and preserving research data.
- Depositing Data - Even if you are not planning to deposit data with ICPSR, their guidelines on preparing and documenting data are a must-read for any researcher.
- How to Use a Codebook - You can't use data without a codebook. This guide explains how to use one.
- Teaching and Learning Data - ICPSR offers a broad range of learning tools and courses. These tools help undergraduates acquire basic skills in quantitative data analysis, support teaching faculty with tools for the classroom, and provide basic and advanced training in social science methods through the internationally renowned Summer Program.
- Data Documentation Initiative - An international effort to establish a standard for technical documentation describing social science data.
Downloading Data From The Web
Different websites provide data in several formats: ASCII, SAS, SPSS, Stata and others. Not all data are available in every format, so you need to choose the one that best suits your needs. Sometimes the data you want will not be available in the format you want. Don't worry about this; data can almost always be converted from one format to another. Here are some tips:
- Data and related files are often bundled together, and usually compressed, into a single archive, commonly called a "zip" file. Common file extensions for these are ".zip" on Windows and ".tar", ".gz" or ".tar.gz" on Unix. You will need to "unzip" (extract) the files before you can do anything else with them. WinZip is a good Windows program; on Unix, the "gunzip" command decompresses ".gz" files and the "tar" command unpacks ".tar" archives.
- If there is an ASCII (i.e., plain text) data set with a setup (program) file for the statistical package you intend to use, select that option. Sometimes there is an option for data files already in the format you want ("system," "portable," "transport"), but these may have some "glitches" caused by differences between the type of machine they were created on and the type you are using. It's rare, but it does happen.
- If there is no ASCII data set and setup file for the package you want to use, but there is one for another package, use the other package to create a system file and then convert it. For example, if you prefer SPSS but there is only an option for SAS, use SAS to read the data and create a SAS data file, then convert that file to SPSS format using Stat/Transfer.
- If you are downloading data from a geospatial data site, the file may be in dBase format, with a ".dbf" extension. This is the format ArcView uses for attribute data (a "shapefile" is actually a set of files, one of which is a .dbf file holding the attributes). These files can be read directly into SAS, Stata, SPSS and Excel.
- The setup files are written to read the entire data set, which you may not need. Rather than editing the program to read only the variables/observations you want, let it read the entire data set, then add drop and/or keep statements in the appropriate place to retain what you want. Make absolutely sure that you keep all identification and weighting variables. If you are not sure whether you want a particular variable, keep it: it's easier to ignore or drop a variable later than to go back and add it to your dataset.
- Sometimes the programs have large sections "commented out" so those statements are not executed. If you want these statements to run, be sure to un-comment them. Typically, these are statements that convert missing-value codes (such as "999") to system-missing values.
- If possible, run some descriptive statistics on your data and compare them to the codebook or some other source to make sure you have read the data correctly.
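The unzipping tip above can also be handled from a script. Below is a minimal Python sketch using only the standard library (`zipfile`, `tarfile`, `gzip`). The function name `extract_archive` is hypothetical, and the sketch assumes the download is a .zip file, a tar file (compressed or not), or a single gzipped file.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list:
    """Unpack a .zip, .tar/.tar.gz or single .gz file into dest.

    Returns the names of the extracted files. extract_archive is a
    hypothetical helper, not part of any library.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)

    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(str(dest_path))
            return zf.namelist()

    if tarfile.is_tarfile(archive):
        # "r:*" lets tarfile detect gzip/bzip2/xz compression on its own.
        with tarfile.open(archive, mode="r:*") as tf:
            # Recent Python versions provide a "data" filter that refuses
            # unsafe member paths; use it when it is available.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tf.extractall(str(dest_path), **extra)
            return tf.getnames()

    if archive.endswith(".gz"):
        # A lone .gz holds one compressed file, which is what `gunzip` restores.
        out = dest_path / Path(archive).name[: -len(".gz")]
        with gzip.open(archive, "rb") as src, open(out, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [out.name]

    raise ValueError(f"unrecognized archive format: {archive}")
```

For example, `extract_archive("study.zip", "study_files")` would unpack a downloaded study into a `study_files` folder (both names are placeholders).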
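A setup file for an ASCII data set mostly tells the statistical package which columns hold which variable. As a minimal Python sketch of that idea, each record below is sliced by 1-based, inclusive column ranges, the way SAS `INPUT` statements and many codebooks give them. The layout (variables `caseid`, `wt`, `age`, `income`) is hypothetical; the real positions come from the setup file or codebook.

```python
# Hypothetical record layout: (variable, first column, last column),
# 1-based and inclusive. Copy the real positions from the setup file.
LAYOUT = [
    ("caseid", 1, 4),
    ("wt", 5, 10),
    ("age", 11, 13),
    ("income", 14, 19),
]


def read_fixed_width(lines, layout):
    """Split fixed-width ASCII records into dicts of text values."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at end of file
        records.append(
            {name: line[start - 1:end].strip() for name, start, end in layout}
        )
    return records


rows = read_fixed_width(["10011.2500 34 52000\n", "10020.8000 61   999\n"], LAYOUT)
# rows[0] == {"caseid": "1001", "wt": "1.2500", "age": "34", "income": "52000"}
```

Values stay as text here; converting them to numbers and handling missing-value codes are separate steps, much as they are in a setup file.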
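The geospatial tip notes that .dbf files can be read directly by SAS, Stata, SPSS and Excel. To show roughly what those programs read, here is a minimal Python sketch that parses a dBase III-style .dbf (the variant used for shapefile attribute tables) with only the standard library. It is an illustration, not a complete reader: it assumes 32-byte field descriptors, returns every value as text, skips records flagged as deleted, and ignores memo files and code pages. The name `read_dbf` is hypothetical.

```python
import struct


def read_dbf(path):
    """Return (field_names, records) from a dBase III-style .dbf file.

    Minimal sketch: every value is returned as stripped text, and records
    marked as deleted ("*") are skipped.
    """
    with open(path, "rb") as f:
        header = f.read(32)
        # Bytes 4-11: record count (uint32), header length and record length (uint16).
        n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])

        fields = []
        while True:
            desc = f.read(32)
            if len(desc) < 32 or desc[0] == 0x0D:  # 0x0D ends the field list
                break
            name = desc[:11].split(b"\x00", 1)[0].decode("latin-1")
            fields.append((name, desc[16]))  # byte 16 holds the field width

        f.seek(header_len)  # records start right after the header
        records = []
        for _ in range(n_records):
            raw = f.read(record_len)
            if len(raw) < record_len:
                break  # truncated file
            if raw[:1] == b"*":
                continue  # deleted record
            values, pos = {}, 1  # byte 0 is the deletion flag
            for name, width in fields:
                values[name] = raw[pos:pos + width].decode("latin-1").strip()
                pos += width
            records.append(values)

    return [name for name, _ in fields], records
```

In practice, letting SAS, Stata, SPSS or Excel import the file is simpler; the sketch only shows that a .dbf is a fixed-width table with a self-describing header.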
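The keep/drop advice above can be mirrored outside a statistics package. A minimal Python sketch, assuming records are dicts keyed by variable name and using hypothetical identification and weight variables `caseid` and `wt`: those variables are always kept, whatever else you ask for, and an optional `where` function keeps only the observations you want.

```python
# Hypothetical names; take the real identification and weight variables
# from the codebook.
ALWAYS_KEEP = ["caseid", "wt"]


def keep_vars(records, wanted, where=None, always_keep=ALWAYS_KEEP):
    """Keep the wanted variables (plus ID/weight variables) and, optionally,
    only the records for which where(record) is true."""
    # Required variables first; dict.fromkeys drops duplicates but keeps order.
    keep = list(dict.fromkeys([*always_keep, *wanted]))
    subset = []
    for rec in records:
        absent = [v for v in keep if v not in rec]
        if absent:
            raise KeyError(f"variables not in data: {absent}")
        if where is None or where(rec):
            subset.append({v: rec[v] for v in keep})
    return subset
```

For example, `keep_vars(records, ["age", "region"], where=lambda r: r["region"] == "1")` would keep `caseid`, `wt`, `age` and `region` for region 1 only. Raising an error for a missing variable, rather than silently skipping it, makes a typo in a variable name visible right away.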
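The commented-out statements mentioned above typically turn codes such as "999" into missing values. A minimal Python sketch of the same step, assuming values are still text as read from an ASCII file and using hypothetical per-variable codes (the real ones are in the codebook); `None` plays the role of a system-missing value.

```python
# Hypothetical missing-value codes per variable; copy the real ones from the codebook.
MISSING_CODES = {
    "age": {"998", "999"},
    "income": {"999", "999998", "999999"},
}


def recode_missing(records, missing_codes=MISSING_CODES):
    """Return copies of records with missing-value codes replaced by None."""
    recoded = []
    for rec in records:
        new = dict(rec)  # copy so the raw data stay untouched
        for var, codes in missing_codes.items():
            if new.get(var) in codes:
                new[var] = None
        recoded.append(new)
    return recoded
```

The codes are listed per variable because a value that means "missing" for one variable can be a legitimate value of another, such as a case ID of 999.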
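For the last tip, one quick check is to tabulate a categorical variable and compare the counts with the frequencies printed in the codebook, and to summarize a continuous one. A minimal Python sketch using the standard library's `collections.Counter` and `statistics`; the function names are hypothetical, and values are assumed to be text or `None` for missing.

```python
import statistics
from collections import Counter


def frequency_mismatches(records, var, codebook_counts):
    """Compare observed frequencies of var with the codebook's counts.

    Returns {code: (observed, expected)} for every code that disagrees,
    including codes that appear in the data but not in the codebook.
    An empty dict means the two agree.
    """
    observed = Counter(rec[var] for rec in records)
    mismatches = {}
    for code in set(observed) | set(codebook_counts):
        got, want = observed.get(code, 0), codebook_counts.get(code, 0)
        if got != want:
            mismatches[code] = (got, want)
    return mismatches


def summarize(values):
    """Count, mean, minimum and maximum of the non-missing numeric values."""
    nums = [float(v) for v in values if v not in (None, "")]
    if not nums:
        return {"n": 0}
    return {"n": len(nums), "mean": statistics.mean(nums),
            "min": min(nums), "max": max(nums)}
```

A code that appears in the data but not in the codebook can be a sign that a variable was read from the wrong columns.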