Descriptive Statistics using SAS

PROC MEANS

See www.stattutorials.com/SASDATA for files mentioned in this tutorial
© TexaSoft, 2007

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

Preliminary information about PROC MEANS

PROC MEANS produces descriptive statistics (mean, standard deviation, minimum, maximum, etc.) for numeric variables in a data set. PROC MEANS can be used for:

·       Describing continuous data where the average has meaning

·       Describing the means across groups

·       Searching for possible outliers or incorrectly coded values

·       Performing a single sample t-test

 The syntax of the PROC MEANS statement is:

PROC MEANS <options>; <statements>;

Statistical options that may be requested are listed below. The default statistics, reported when none are requested, are N, MEAN, STD, MIN and MAX.

·        N - Number of observations

·        NMISS - Number of missing observations

·        MEAN - Arithmetic average

·        STD -  Standard Deviation

·        MIN -  Minimum (smallest)

·        MAX -  Maximum (largest)

·        RANGE - Range

·        SUM -  Sum of observations

·        VAR -  Variance    

·        USS -  Uncorrected sum of squares

·        CSS -  Corrected sum of squares

·        STDERR - Standard Error

·        T - Student's t value for testing Ho: μ = 0 (the population mean is zero)

·        PRT - P-value associated with t-test above

·        SUMWGT - Sum of the WEIGHT variable values    

(New to version 8.0)

 

·        MEDIAN – 50th percentile

·        P1 – 1st percentile

·        P5 - 5th percentile

·        P10 – 10th percentile

·        P90 - 90th percentile

·        P95 – 95th percentile

·        P99 - 99th percentile

·        Q1 - 1st quartile

·        Q3 - 3rd quartile

·        QRANGE - Interquartile range (Q3 - Q1)
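
As a minimal sketch of requesting the percentile-based statistics listed above, the following step asks for the quartiles, the median and the interquartile range. The data set name SCORES, the variable SCORE and the data values are hypothetical and are included only so that the example can be run on its own.

```sas
* Hypothetical data set used only for illustration *;
DATA SCORES;
INPUT SCORE @@;
DATALINES;
12 15 11 19 22 14 17 30 16 13
;
RUN;

* Request quartile-based statistics instead of the defaults *;
PROC MEANS DATA=SCORES N Q1 MEDIAN Q3 QRANGE MAXDEC=1;
VAR SCORE;
TITLE 'Quartile-based statistics (illustrative data)';
RUN;
```

Because statistic keywords are given in the PROC MEANS statement, only the requested statistics are printed; the defaults are not added automatically.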

 

 

Other commonly used options available in PROC MEANS include:

  • DATA= Specify data set to use
  • NOPRINT Do not print output
  • MAXDEC=n Use n decimal places to print output

Commonly used statements with PROC MEANS include:

  • BY variable list -- Statistics are reported for groups in separate tables
  • CLASS variable list – Statistics reported by groups in a single table
  • VAR variable list – specifies which numeric variables to use
  • OUTPUT OUT = datasetname – statistics will be output to a SAS data file
  • FREQ variable - specifies a variable that represents a count of observations
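
The FREQ statement is not used in the examples that follow, so here is a small sketch of how it might be applied. The data set SCORECOUNTS and its variables SCORE and COUNT are hypothetical: each row holds a score and the number of times that score occurred.

```sas
* Hypothetical tabulated data: a score and how often it occurred *;
DATA SCORECOUNTS;
INPUT SCORE COUNT;
DATALINES;
1 3
2 5
3 2
;
RUN;

* FREQ COUNT treats each row as COUNT separate observations *;
PROC MEANS DATA=SCORECOUNTS N MEAN MAXDEC=2;
VAR SCORE;
FREQ COUNT;
TITLE 'Using the FREQ statement (illustrative data)';
RUN;
```

With these values the reported N should be 10 (the sum of COUNT) rather than 3, and the mean should be 1.90, since (1*3 + 2*5 + 3*2)/10 = 1.9.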

A few quick examples of PROC MEANS

* Simplest invocation - on all numeric variables *;
PROC MEANS;
RUN;

* Specified statistics and variables *;
PROC MEANS N MEAN STD; VAR SODIUM CARBO;
RUN;

* Subgroup descriptive statistics using BY statement *;
PROC SORT; BY SEX;
PROC MEANS; BY SEX;
VAR FAT PROTEIN SODIUM;
RUN;

* Subgroup descriptive statistics using CLASS statement *;
PROC MEANS; CLASS SEX;
VAR FAT PROTEIN SODIUM;
RUN;

 

Example 1: A simple use of PROC MEANS

This example calculates the means of several specified variables, limiting the output to two decimal places. (PROCMEANS1.SAS)

 

***************************************************************
* Data on weight, height, and age of a random sample of 12    *
* nutritionally deficient children                            *
***************************************************************;
DATA CHILDREN;
INPUT WEIGHT HEIGHT AGE;
DATALINES;
64 57 8
71 59 10
53 49 6
67 62 11
55 51 8
58 50 8
77 55 10
57 48 9
56 42 10
51 42 6
76 61 12
68 57 9
;
ODS RTF;
proc means;
Title 'Example 1a - PROC MEANS, simplest use';
run;
proc means maxdec=2;var WEIGHT HEIGHT;
Title 'Example 1b - PROC MEANS, limit decimals, specify variables';
run;
proc means maxdec=2 n mean stderr median;var WEIGHT HEIGHT;
Title 'Example 1c - PROC MEANS, specify statistics to report';
run;
ODS RTF CLOSE;

Output for Example 1:

Example 1a - PROC MEANS, simplest use

Variable    N          Mean       Std Dev       Minimum       Maximum
WEIGHT     12    62.7500000     8.9861004    51.0000000    77.0000000
HEIGHT     12    52.7500000     6.8240884    42.0000000    62.0000000
AGE        12     8.9166667     1.8319554     6.0000000    12.0000000

Example 1b - PROC MEANS, limit decimals, specify variables

Variable    N     Mean    Std Dev    Minimum    Maximum
WEIGHT     12    62.75       8.99      51.00      77.00
HEIGHT     12    52.75       6.82      42.00      62.00

Example 1c - PROC MEANS, specify statistics to report

Variable    N     Mean    Std Error    Median
WEIGHT     12    62.75         2.59     61.00
HEIGHT     12    52.75         1.97     53.00

 

Example 2: Using PROC MEANS with BY-group and CLASS statements

 

This example uses PROC MEANS to calculate means for an entire data set or by a grouping variable. (PROCMEANS2.SAS):

 

***************************************************
* Example 2 for PROC MEANS                        *
***************************************************;
DATA FERTILIZER;
INPUT FEEDTYPE WEIGHTGAIN;
DATALINES;
1 46.20
1 55.60
1 53.30
1 44.80
1 55.40
1 56.00
1 48.90
2 51.30
2 52.40
2 54.60
2 52.20
2 64.30
2 55.00
;
ODS RTF;
PROC SORT DATA=FERTILIZER;BY FEEDTYPE;
PROC MEANS; VAR WEIGHTGAIN; BY FEEDTYPE;
TITLE 'Summary statistics by group';
RUN;
PROC MEANS; VAR WEIGHTGAIN; CLASS FEEDTYPE;
TITLE 'Summary statistics USING CLASS';
RUN;
ODS RTF CLOSE;

 Output for this SAS code is:

Summary statistics by group

FEEDTYPE=1

Analysis Variable : WEIGHTGAIN
N          Mean       Std Dev       Minimum       Maximum
7    51.4571429     4.7475808    44.8000000    56.0000000

FEEDTYPE=2

Analysis Variable : WEIGHTGAIN
N          Mean       Std Dev       Minimum       Maximum
6    54.9666667     4.7944412    51.3000000    64.3000000

 

In this first version of the output, the BY statement (along with the PROC SORT) creates two tables, one for each value of the BY variable. In the next version, the CLASS statement produces a single table broken down by group (FEEDTYPE).

 

Summary statistics USING CLASS

Analysis Variable : WEIGHTGAIN
FEEDTYPE    N Obs    N          Mean       Std Dev       Minimum       Maximum
1               7    7    51.4571429     4.7475808    44.8000000    56.0000000
2               6    6    54.9666667     4.7944412    51.3000000    64.3000000

Hands on Exercise: 

1. Modify the above program to output the following statistics:

 

N MEAN MEDIAN MIN MAX

 

2. Use MAXDEC=2 to limit number of decimals in output.
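
One possible way to approach the exercise is sketched below for the CLASS version of the program. The data are re-entered with the @@ line-hold specifier only to keep the sketch short and runnable on its own; the TITLE text is an arbitrary choice.

```sas
* Same FERTILIZER data as Example 2, entered several pairs per line *;
DATA FERTILIZER;
INPUT FEEDTYPE WEIGHTGAIN @@;
DATALINES;
1 46.20 1 55.60 1 53.30 1 44.80 1 55.40 1 56.00 1 48.90
2 51.30 2 52.40 2 54.60 2 52.20 2 64.30 2 55.00
;
RUN;

* List the wanted statistics and MAXDEC=2 in the PROC MEANS statement *;
PROC MEANS DATA=FERTILIZER N MEAN MEDIAN MIN MAX MAXDEC=2;
VAR WEIGHTGAIN;
CLASS FEEDTYPE;
TITLE 'Exercise: selected statistics by feed type';
RUN;
```

The same statistic keywords and MAXDEC=2 option can be added to the BY version of the step, provided the data are sorted by FEEDTYPE first.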

 

Example 3: Using PROC MEANS to find outliers

 

PROC MEANS is a quick way to find large or small values in your data set that may be considered outliers (see also PROC UNIVARIATE). This example shows the results of using PROC MEANS, where the minimum and maximum identify unusual values in the data set. (PROCMEANS3.SAS)

 

DATA WEIGHT;
INPUT TREATMENT LOSS @@;
DATALINES;
2 1.0 1 3.0 1 -1.0 1 1.5 1 0.5 1 3.5 1 -99
2 4.5 3 6.0 2 3.5 2 7.5 2 7.0 2 6.0 2 5.5
1 1.5 3 -2.5 3 -0.5 3 1.0 3 .5 3 78 1 .6 2 3 2 4 3 9 1 7 2 2
;
ODS RTF;
PROC MEANS; VAR LOSS;
TITLE 'Find largest and smallest values';
RUN;
ODS RTF CLOSE;

 

Notice that in this output, PROC MEANS indicates that there is a small value of -99 (which could be a missing value code) and a large value of 78 (which could be a miscoded number). This is a quick way to find outliers in your data set.

Analysis Variable : LOSS
 N         Mean       Std Dev        Minimum       Maximum
26    2.0423077    25.4650062    -99.0000000    78.0000000
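
Once a suspicious value has been spotted, a follow-up step might recode it and rerun the summary. The sketch below assumes that -99 really is a missing-value code, which would have to be confirmed against the study documentation; the data set name WEIGHT_CLEAN is hypothetical.

```sas
* Same data as Example 3 *;
DATA WEIGHT;
INPUT TREATMENT LOSS @@;
DATALINES;
2 1.0 1 3.0 1 -1.0 1 1.5 1 0.5 1 3.5 1 -99
2 4.5 3 6.0 2 3.5 2 7.5 2 7.0 2 6.0 2 5.5
1 1.5 3 -2.5 3 -0.5 3 1.0 3 .5 3 78 1 .6 2 3 2 4 3 9 1 7 2 2
;
RUN;

DATA WEIGHT_CLEAN;
SET WEIGHT;
* Assumption: -99 is a missing-value code, not a real measurement *;
IF LOSS = -99 THEN LOSS = .;
RUN;

* NMISS reports how many values are now missing *;
PROC MEANS DATA=WEIGHT_CLEAN N NMISS MEAN STD MIN MAX MAXDEC=2;
VAR LOSS;
TITLE 'LOSS after recoding -99 to missing';
RUN;
```

After the recode, N should drop from 26 to 25 with NMISS equal to 1. The maximum of 78 is still present and would need to be checked separately.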

 

Example 4: Using PROC MEANS to perform a single sample t-test (or paired t-test)

 

To compare two paired groups (such as in a before-after situation) where both observations are taken from the same or matched subjects, you can perform a paired t-test using PROC MEANS. To do this, convert the paired data into a difference variable and perform a single sample t-test. For example, suppose your data contained the variables WBEFORE and WAFTER (before and after weight on a diet) for 8 subjects. To perform a paired t-test using PROC MEANS, follow these steps:

 

  1. Read in your data.
  2. Calculate the difference between the two observations (WLOSS is the amount of weight lost), and
  3. Report the mean loss, t-statistic and p-value using PROC MEANS.

The hypotheses for this test are:

 

Ho: μLoss = 0 (The average weight loss was 0)

Ha: μLoss ≠ 0 (The average weight loss was different from 0)

 

For example, the following code performs a paired t-test for weight loss data: (PROCMEANS4.SAS)

 

DATA WEIGHT;
INPUT WBEFORE WAFTER;
* Calculate WLOSS in the DATA step *;
WLOSS=WAFTER-WBEFORE;
DATALINES;
200 190
175 154
188 176
198 193
197 198
310 240
245 204
202 178
;
ODS RTF;
PROC MEANS N MEAN T PRT; VAR WLOSS;
TITLE 'Paired t-test example using PROC MEANS';
RUN;
ODS RTF CLOSE;

 

Notice that the actual test is performed on the new variable called WLOSS, and that is why it is the only variable named in the VAR statement. This is essentially a one-sample t-test. The statistics of interest are the mean of WLOSS, the t-statistic associated with the null hypothesis for WLOSS, and the p-value. The SAS output is as follows:

Paired t-test example using PROC MEANS

Analysis Variable : WLOSS
N           Mean    t Value    Pr > |t|
8    -22.7500000      -2.79      0.0270

 

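The mean of the variable WLOSS is -22.75. The value is negative because WLOSS is calculated as WAFTER - WBEFORE, so a negative number means that weight went down. The t-statistic associated with the null hypothesis is -2.79, and the p-value for this paired t-test is p = 0.027, which provides evidence to reject the null hypothesis at the 0.05 significance level.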
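
As a cross-check on the difference-variable approach, the same test can be sketched with the PAIRED statement of PROC TTEST. This assumes that SAS/STAT is installed; the TITLE text is an arbitrary choice, and the data are the same eight subjects as above.

```sas
DATA WEIGHT;
INPUT WBEFORE WAFTER;
DATALINES;
200 190
175 154
188 176
198 193
197 198
310 240
245 204
202 178
;
RUN;

* The pair WAFTER*WBEFORE is analyzed as the difference WAFTER - WBEFORE *;
PROC TTEST DATA=WEIGHT;
PAIRED WAFTER*WBEFORE;
TITLE 'Paired t-test cross-check using PROC TTEST';
RUN;
```

Because the difference is taken in the same direction as WLOSS, this step should report the same mean difference (-22.75) and t value (-2.79, with 7 degrees of freedom) as the PROC MEANS output. The t value is the mean divided by its standard error, so adding STD and STDERR to the PROC MEANS statement is another way to see where it comes from.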

 

Example 5: Using PROC MEANS to output statistics (advanced)

 

Suppose you have a data set and you want to add a column containing a z-score based on the mean and standard deviation of a variable. Here is one way to do that.

The following data set contains the weights of 12 children. You want to add a column containing the difference of each weight from the mean of the WEIGHT variable and, for good measure, the corresponding z-score.

 

DATA WT;
INPUT WEIGHT;
DATALINES;
64
71
53
67
55
58
77
57
56
51
76
68
;
PROC MEANS NOPRINT DATA=WT;VAR WEIGHT;OUTPUT OUT=WTMEANS
MEAN=WTMEAN STDDEV=WTSD;
RUN;
DATA WTDIFF;SET WT;
IF _N_=1 THEN SET WTMEANS;
DIFF=WEIGHT-WTMEAN;
Z=DIFF/WTSD; * CREATES STANDARDIZED SCORE (Z-SCORE);
RUN;
ODS RTF;
PROC PRINT DATA= WTDIFF;VAR WEIGHT DIFF Z;
RUN;
ODS RTF CLOSE;

 

The statement

OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;

creates a SAS data set containing a single record with the variables WTMEAN and WTSD (and the automatic variables _TYPE_ and _FREQ_). You can then use that information to calculate the desired values, as is done in the code:

 

DATA WTDIFF;SET WT;
IF _N_=1 THEN SET WTMEANS;
DIFF=WEIGHT-WTMEAN;
Z=DIFF/WTSD; * CREATES STANDARDIZED SCORE (Z-SCORE);
RUN;

 

The first SET statement (SET WT) reads the WT data set, one observation on each pass through the DATA step. The statement

IF _N_=1 THEN SET WTMEANS;

reads the first (and only) record from the WTMEANS data set on the first pass only. Because variables read with a SET statement are retained from one observation to the next, WTMEAN and WTSD (along with _TYPE_ and _FREQ_) are carried onto every observation of the new WTDIFF data set, allowing you to do the calculations to come up with the DIFF and Z values.
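
To see exactly what the OUTPUT statement stores, it can help to print the WTMEANS data set itself. The sketch below repeats the WT data (entered on one line with @@ only for brevity) and adds a PROC PRINT step; the TITLE text is an arbitrary choice.

```sas
DATA WT;
INPUT WEIGHT @@;
DATALINES;
64 71 53 67 55 58 77 57 56 51 76 68
;
RUN;

* Write the mean and standard deviation to WTMEANS without printing *;
PROC MEANS NOPRINT DATA=WT;
VAR WEIGHT;
OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;
RUN;

* Show the single record, including the automatic variables *;
PROC PRINT DATA=WTMEANS;
TITLE 'Contents of the WTMEANS data set';
RUN;
```

The listing should contain one observation with _TYPE_ = 0 (no CLASS variables were used), _FREQ_ = 12 (the number of observations summarized), WTMEAN = 62.75 and WTSD of about 8.986.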

 

The resulting data set contains the following information

 

Obs

WEIGHT

DIFF

Z

1

64

1.25

0.13910

2

71

8.25

0.91808

3

53

-9.75

-1.08501

4

67

4.25

0.47295

5

55

-7.75

-0.86244

6

58

-4.75