SPSS Statistics Tutorials
 
 



  Determine if data are normally distributed using SPSS


See www.stattutorials.com/SPSSDATA for files mentioned in this tutorial. TexaSoft, 2008

These SPSS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SPSS Software.

Assessing Data Normality (Gaussian, Bell Shaped Curve)

    

Hypothetical data from several branch banks in Southern California contain information on how many IRAs (Individual Retirement Accounts) were set up in 21 locations during a three-month period. The variable is called IRA Setup. These data are counts and are appropriately classified as quantitative data since, for example, it makes sense to calculate a mean number of accounts per bank. Before calculating and reporting the mean or other parametric measures of these values, you may want to assess the normality of the data. One way to do that is to perform a statistical test. This example illustrates how you can assess the normality of the IRA Setup variable.

Step 1: Open the dataset IRA.SAV. Here are the first few records:

[Figure: first records of the IRA.SAV dataset]

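Step 2: Select Analyze/Descriptive Statistics/Explore. Select the IRASetup variable for the Dependent List. Click the Plots button and check “Normality plots with tests” (and “Histogram”) so that the normality tests and plots discussed in this tutorial are produced.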

[Figure: SPSS dialog for Step 2]

Click Continue and OK. The following output is displayed (abbreviated here).

Case Processing Summary

                                  Cases
                  Valid           Missing           Total
               N    Percent     N    Percent     N    Percent
IRA Setup     19    100.0%      0      .0%      19    100.0%

The Kolmogorov-Smirnov and Shapiro-Wilk tests can be used to test the hypothesis that the distribution is normal. (SPSS recommends these tests only when your sample size is less than 50.) The hypotheses used in testing data normality are:

Ho: The distribution of the data is normal
Ha: The distribution of the data is not normal
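
To make the idea behind these tests concrete, here is a minimal Python sketch of the Kolmogorov-Smirnov statistic D: the largest gap between the sample's cumulative distribution and a normal curve fitted to the sample. The data values and the function name ks_normal_statistic are hypothetical (they are not the values in IRA.SAV), and the sketch stops at the statistic itself. Turning D into a significance value when the mean and standard deviation are estimated from the data requires a correction that statistical packages apply for you.

```python
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: the normal curve uses the sample mean and standard deviation.
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Hypothetical counts, NOT the values in IRA.SAV.
sample = [11, 12, 12, 13, 14, 14, 15, 15, 16, 17, 18, 19]
with_outlier = sample + [60]

d_clean = ks_normal_statistic(sample)
d_outlier = ks_normal_statistic(with_outlier)
print(f"D without the extreme value: {d_clean:.3f}")
print(f"D with the extreme value:    {d_outlier:.3f}")
```

A larger D means the sample departs further from the fitted normal curve, so a single extreme value can be enough to push a small sample toward rejection of Ho.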

If a test does not reject normality, this suggests that a parametric procedure that assumes normality (e.g. a t-test) can reasonably be used. However, we emphasize that it is always a good idea to examine data graphically in addition to the formal tests for normality.
The plots in the output provide a visual description of the distribution of the data. They include:

    • Histogram: When a histogram’s shape approximates a bell curve, it suggests that the data may have come from a normal population.
    • Boxplot: A boxplot that is symmetric, with the median line in approximately the center of the box and with symmetric whiskers somewhat longer than the subsections of the center box, suggests that the data may have come from a normal distribution.
    • Q-Q plot: A quantile-quantile (Q-Q) plot is a graph used to display the degree to which the quantiles of a reference (known) distribution (in this case the normal distribution) differ from the sample quantiles of the data. When the data fit the reference distribution, the points will lie in a tight random scatter around the reference line. For the IRA data, the curvature of the points in the plot indicates a possible departure from normality, and the point lying outside the overall pattern of points indicates an outlier.
    • Stem-and-leaf plot: A stem-and-leaf plot is a method of displaying data that shows the data in a histogram-like pattern but retains information about actual data values. Each observation is broken down into a stem and a leaf, where typically the stem of the number includes all but the last digit and the leaf is the last digit.

Here are two of those plots:

[Figure: SPSS histogram of IRA Setup]

[Figure: SPSS normal Q-Q plot of IRA Setup]

In both plots, there is a single value that appears to be considerably different from the rest. One term used to describe such a point is an “outlier.” This happens to be observation number 5 in the data set.
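
The way a Q-Q plot exposes such a point can be sketched in a few lines of Python. Each sorted observation is paired with the value a fitted normal distribution would expect at that rank, and a point far from its expected value stands out. The data, the function name normal_qq_points, and the plotting position (i - 0.5)/n are assumptions made for illustration; SPSS may use a different rank formula, and these are not the values in IRA.SAV.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # one common plotting position (an assumption here)
        points.append((x, fitted.inv_cdf(p)))
    return points

# Hypothetical counts with one unusually low value, NOT the values in IRA.SAV.
counts = [2, 11, 12, 12, 13, 14, 14, 15, 15, 16, 17, 18, 19]

points = normal_qq_points(counts)
for observed, expected in points:
    print(f"observed={observed:5.1f}  expected={expected:6.2f}  gap={observed - expected:6.2f}")

# The observation farthest from its expected normal value is the outlier candidate.
worst = max(points, key=lambda pt: abs(pt[0] - pt[1]))
print("largest gap at observed value:", worst[0])
```

On a real Q-Q plot the same information appears visually: points with small gaps hug the reference line, and the point with the largest gap sits apart from the pattern.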

Step 3: To exclude the outlying value, select only the cases that satisfy IRASetup >= 5. Return to the data editor, select Data/Select Cases…, and select the option “If condition is satisfied…” Click on the “If…” button. In the formula text box enter the expression

IRASetup >= 5

Click Continue and OK. A slash appears in the IRA data file next to record 16, indicating that the record will not be included in subsequent analyses (as shown here). This is not to imply that you can arbitrarily exclude data from an analysis.

[Figure: SPSS data editor showing the filtered record]
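
It is easy to read a Select Cases condition backwards, so here is a minimal Python sketch of what the filter does: records for which the condition is true are kept, and records for which it is false are the ones marked with a slash. The list ira_setup holds hypothetical values, not the contents of IRA.SAV.

```python
# Hypothetical counts, NOT the values in IRA.SAV; record numbers start at 1.
ira_setup = [12, 15, 11, 2, 14, 17, 13]

# Records that satisfy the condition stay in the analysis.
selected = [(rec, value) for rec, value in enumerate(ira_setup, start=1) if value >= 5]

# Records that fail the condition are filtered out (shown with a slash in the data editor).
filtered_out = [(rec, value) for rec, value in enumerate(ira_setup, start=1) if not value >= 5]

print("kept:", selected)
print("filtered out:", filtered_out)
```

In this made-up list only the fourth record fails the condition, so it is the only one excluded. The data themselves are unchanged, which is why the filter can later be switched off.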

Step 4: To display the revised histogram, select Graphs/Histogram and select IRA Setup as the analysis variable; the filter created in Step 3, IRASetup >= 5 (FILTER), remains in effect. Select the “Display normal curve” checkbox and click OK. A histogram is displayed. If the curve does not appear, double-click the graph and, in the Chart Editor, select Elements/Show Distribution Curve. This places a normal (bell-shaped, Gaussian) curve on the graph. Exit the Chart Editor. The following graph is displayed.

[Figure: SPSS histogram of IRA Setup with normal curve, outlier excluded]

Note that the plot no longer has the “outlier.” Also, rerun the Explore procedure from Step 2 and check the normality tests: both are now non-significant, which means the tests no longer reject the hypothesis of normality.
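
The comparison that the normal-curve overlay in Step 4 invites can be sketched numerically in Python. For each histogram bin, count the observations and compute how many a fitted normal distribution would place there. The data, the bin width, and the function name histogram_with_normal are hypothetical choices for illustration, not values taken from IRA.SAV or from SPSS.

```python
from statistics import NormalDist, mean, stdev

def histogram_with_normal(values, bin_width):
    """Return (bin_start, observed_count, expected_count) for each bin."""
    fitted = NormalDist(mean(values), stdev(values))
    start = (min(values) // bin_width) * bin_width
    rows = []
    while start <= max(values):
        end = start + bin_width
        observed = sum(1 for v in values if start <= v < end)
        # Expected count = sample size times the normal probability of the bin.
        expected = len(values) * (fitted.cdf(end) - fitted.cdf(start))
        rows.append((start, observed, expected))
        start = end
    return rows

# Hypothetical counts after the outlier has been filtered out.
counts = [11, 12, 12, 13, 14, 14, 15, 15, 16, 17, 18, 19]
width = 2

rows = histogram_with_normal(counts, width)
for start, observed, expected in rows:
    bar = "#" * observed
    print(f"[{start:>3}, {start + width:>3})  {bar:<5} observed={observed}  expected={expected:.1f}")
```

When the observed and expected counts are close in every bin, the bars follow the curve, which is the visual impression a roughly normal histogram gives.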

Step 5: To remove the Select Cases criterion, return to the data editor, select Data/Select Cases…, select the option “All cases,” and click OK.


End of tutorial

See http://www.stattutorials.com/SPSS

 


 


Copyright TexaSoft, 1996-2008

This site is not affiliated with SPSS(r)