SAS Tutorials
Descriptive Statistics using SAS

PROC UNIVARIATE

See www.stattutorials.com/SASDATA for files mentioned in this tutorial © TexaSoft, 2006

 

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

 

 

If PROC MEANS does not produce the statistic you need for a data set, PROC UNIVARIATE may be a better choice. Although it is similar to PROC MEANS, its strength is that it calculates a wider variety of statistics, particularly those useful for examining the distribution of a variable.

 

Use PROC UNIVARIATE to examine the distribution of your data, including an assessment of normality and discovery of outliers.

 

The syntax of the PROC UNIVARIATE statement is:

 

PROC UNIVARIATE <options>; <statements>;

 

Commonly used options for PROC UNIVARIATE include:

 

DATA= - Specifies the data set to use

NORMAL - Produces tests of normality

FREQ - Produces a frequency table

PLOT - Produces a stem-and-leaf plot, a box plot, and a normal probability plot

 

Statements commonly used with PROC UNIVARIATE include:

 

BY variable list;

VAR variable list;

OUTPUT OUT = datasetname;
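
As a minimal sketch of how these statements fit together, the program below computes statistics separately for each group and saves them to a data set. The data set TRIAL, its variables TREATMENT and LOSS, and the output names are hypothetical and chosen only for illustration. Note that the data must be sorted by the BY variable first, and that the OUTPUT statement needs keyword=name pairs to say which statistics to save.

```sas
/* Hypothetical data set TRIAL with a grouping variable TREATMENT
   and a numeric variable LOSS (names are assumptions). */
PROC SORT DATA=trial;
  BY treatment;                 /* BY-group processing expects sorted data */
RUN;

PROC UNIVARIATE DATA=trial NOPRINT;   /* NOPRINT suppresses the printed tables */
  BY treatment;                 /* one set of statistics per treatment */
  VAR loss;
  OUTPUT OUT=loss_stats         /* new data set, one row per BY group */
         N=n MEAN=mean_loss STD=sd_loss MEDIAN=median_loss;
RUN;

PROC PRINT DATA=loss_stats;     /* inspect the saved statistics */
RUN;
```

Dropping NOPRINT would print the full set of UNIVARIATE tables for each treatment group in addition to creating LOSS_STATS.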

 

The BY statement causes UNIVARIATE to calculate statistics separately for each group of observations (for example, statistics for each treatment group); the data must first be sorted by the BY variables. The OUTPUT OUT= statement allows you to save statistics, such as the means, to a new data set. The following SAS program (PROCUNI1.SAS) produces a large number of statistics on the variable AGE:

 

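DATA EXAMPLE;

/* Read AGE, the variable analyzed below; @@ allows several values per line */

INPUT AGE @@;

/* The data lines are not reproduced here; see PROCUNI1.SAS */

DATALINES;

;

/* NORMAL requests tests of normality; PLOT requests the text plots */

PROC UNIVARIATE NORMAL PLOT data=example; var age;

/* Histogram of AGE with a fitted normal curve drawn in red */

HISTOGRAM age/normal (color=red w=5);

TITLE 'PROC UNIVARIATE Example';

FOOTNOTE 'Evaluate distribution of variables';

run;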

 

The output from this program follows. The first table (Moments) gives standard descriptive statistics, which give you an idea of the distribution of the data in the variable AGE.

 

Moments

N                         50    Sum Weights                 50
Mean                   10.46    Sum Observations           523
Std Deviation     2.42613323    Variance            5.88612245
Skewness          -0.5119219    Kurtosis            -0.2610615
Uncorrected SS          5759    Corrected SS            288.42
Coeff Variation   23.1943903    Std Error Mean      0.34310705
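
Several entries in the Moments table are simple functions of N, the mean, and the variance. The DATA step below is a small sketch that recomputes them from the printed values so the relationships are explicit; the variable names are arbitrary, and the results written to the SAS log should agree with the table up to rounding.

```sas
/* Recompute derived Moments entries from N, Mean and Variance
   as printed in the table above. Results go to the SAS log. */
DATA _NULL_;
  n        = 50;
  mean     = 10.46;
  variance = 5.88612245;

  sd      = SQRT(variance);       /* Std Deviation                      */
  se_mean = sd / SQRT(n);         /* Std Error Mean = s / sqrt(n)       */
  cv      = 100 * sd / mean;      /* Coeff Variation, in percent        */
  css     = (n - 1) * variance;   /* Corrected SS = (n - 1) * variance  */
  uss     = css + n * mean**2;    /* Uncorrected SS = CSS + n * mean^2  */

  PUT sd= se_mean= cv= css= uss=;
RUN;
```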

 

The next table provides basic measures of central tendency and spread.

 

Basic Statistical Measures

     Location                  Variability
Mean      10.46000    Std Deviation           2.42613
Median    11.00000    Variance                5.88612
Mode      12.00000    Range                  11.00000
                      Interquartile Range     3.00000

 

The table “Tests for Location” tests the null hypothesis that the mean is zero. It can be used, for example, as a paired t-test (using Student’s t) when the variable analyzed is the difference between paired values. The hypotheses are:

 

H0: μ = 0               (The mean is 0)

Ha: μ ≠ 0               (The mean differs from 0)

 

The Sign test and the Signed Rank test are nonparametric tests of location.

 

 

Tests for Location: Mu0=0

Test             Statistic          p Value
Student's t      t    30.48611      Pr > |t|     <.0001
Sign             M          25      Pr >= |M|    <.0001
Signed Rank      S       637.5      Pr >= |S|    <.0001
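
To illustrate the paired use of the Tests for Location table, the sketch below analyzes the difference between two measurements on the same subjects. The data set PAIRED and its variables BEFORE and AFTER are hypothetical names used only for this example.

```sas
/* Hypothetical data set PAIRED with one row per subject and
   two measurements, BEFORE and AFTER (names are assumptions). */
DATA diffs;
  SET paired;
  diff = after - before;        /* paired difference for each subject */
RUN;

/* Tests for Location now tests H0: mean difference = 0.
   Student's t here corresponds to a paired t-test. */
PROC UNIVARIATE DATA=diffs;
  VAR diff;
RUN;
```

To test against a value other than zero, the MU0= option on the PROC UNIVARIATE statement can be used, for example MU0=10.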

 

 

The tests for normality are one way of assessing whether the data appear to be normally distributed. Four tests are provided:

 

Tests for Normality

Test                   Statistic           p Value
Shapiro-Wilk           W      0.958283     Pr < W        0.0753
Kolmogorov-Smirnov     D      0.148067     Pr > D       <0.0100
Cramer-von Mises       W-Sq   0.145762     Pr > W-Sq     0.0259
Anderson-Darling       A-Sq   0.834989     Pr > A-Sq     0.0301

 

Notice that in this case the tests differ in outcome (assuming a criterion of 0.05 is strictly followed): the Shapiro-Wilk test does not reject the hypothesis that the data are normally distributed (p=0.0753), while the other three tests reject it.
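
When only the normality tests are of interest, one option is to restrict the printed output to that table with an ODS SELECT statement. This is a sketch that assumes the EXAMPLE data set and the AGE variable used in this tutorial; TestsForNormality is the ODS name of the table shown above.

```sas
/* Print only the Tests for Normality table for AGE */
ODS SELECT TestsForNormality;

PROC UNIVARIATE DATA=example NORMAL;
  VAR age;
RUN;
```

The ODS SELECT statement applies to the procedure step that follows it, so later procedures print their full output again.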

 

The inclusion of the NORMAL and PLOT options in

 

PROC UNIVARIATE NORMAL PLOT data=example; var age;

 

provides the tests for normality plus a box-and-whiskers plot and a stem-and-leaf diagram.

 

Additional output that is useful in visually assessing normality may be created by including the HISTOGRAM statement, as shown below:

 

PROC UNIVARIATE NORMAL PLOT data=example; var age;

HISTOGRAM age/normal (color=red w=5);

 

 

 

 

The normal curve superimposed on the histogram lets you see not only whether the data are approximately normally distributed but also where they depart from normality. In this case, the histogram appears to have more values than expected at the upper end of the range.
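
A normal quantile-quantile plot is another visual check that can complement the histogram. The sketch below uses the QQPLOT statement of PROC UNIVARIATE and again assumes the EXAMPLE data set and the AGE variable; the choice to estimate the reference line from the data is an illustration, not part of the original program.

```sas
/* Q-Q plot of AGE against normal quantiles. The reference line uses
   the sample mean and standard deviation (MU=EST SIGMA=EST). */
PROC UNIVARIATE DATA=example NOPRINT;
  VAR age;
  QQPLOT age / NORMAL(MU=EST SIGMA=EST);
RUN;
```

Points that fall close to the reference line suggest approximate normality, and systematic curvature at either end shows where the data depart from it.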

 

 

End of tutorial

See http://www.stattutorials.com/SAS

 

 


© Copyright TexaSoft, 1996-2007

This site is not affiliated with SAS(r) or SAS Institute