One-Way ANOVA using SAS
PROC ANOVA & PROC GLM
See
www.stattutorials.com/SASDATA
for files mentioned in this tutorial
These SAS statistics tutorials briefly explain the use and
interpretation of standard statistical analysis techniques for Medical,
Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples
include how-to instructions for SAS Software.
A one-way
analysis of variance is an extension of the independent group t‑test where
there are more than two groups.
Assumptions:
It is assumed that subjects are randomly assigned to one of 3 or more groups
and that the data within each group are normally distributed with equal
variances across groups. Sample sizes between groups do not have to be equal,
but large differences in sample sizes for the groups may affect the outcome of
some multiple comparisons tests.
Test: The hypotheses for the comparison of k independent groups (k is the number of groups) are:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (the means of all groups are equal)
$H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$ (the means of two or more groups are not equal)
The test statistic reported is an F test with k-1 and N-k degrees of freedom, where N is the total number of subjects. A low p-value for the F test is evidence to reject the null hypothesis. In other words, there is evidence that at least one pair of means is not equal. For example, suppose you are interested in comparing WEIGHT (gain) across the 4 levels of a GROUP variable, to determine whether weight gain differs significantly across groups.
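As a supplementary sketch in standard one-way ANOVA notation (the symbols $y_{ij}$, $n_i$, $\bar{y}_i$ and $\bar{y}$ are introduced here for illustration only), the F statistic is the ratio of the between-group mean square to the within-group mean square, which SAS reports as the Model and Error mean squares:

$$
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
  = \frac{\sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 / (k-1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 / (N-k)}
$$

Here $y_{ij}$ is the j-th observation in group i, $n_i$ is the size of group i, $\bar{y}_i$ is the mean of group i, $\bar{y}$ is the overall mean, and $N = n_1 + \cdots + n_k$. Under $H_0$ this ratio follows an F distribution with k-1 and N-k degrees of freedom, so a large F gives a small p-value.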
The following SAS
code can perform the test:
PROC ANOVA DATA=ANOVA;
   CLASS GROUP;
   MODEL WEIGHT=GROUP;
   TITLE 'Compare WEIGHT across GROUPS';
RUN;
GROUP is the "CLASS" or grouping variable (containing four levels), and WEIGHT is the continuous variable whose means are to be compared across groups. The MODEL statement can be thought of as
DEPENDENT VARIABLE = INDEPENDENT VARIABLE(S);
where the DEPENDENT variable is the "response" variable (the one you measured) and the INDEPENDENT variable(s) are the explanatory variables, here the grouping variable. The MODEL statement indicates that, given the information on the right side of the equal sign, you may be able to predict something about the value on the left side of the equal sign. (Under the null hypothesis, there is no relationship.)
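For reference, the model fitted by MODEL WEIGHT=GROUP can be written as the textbook one-way effects model below. This is a standard formulation shown for illustration, not SAS's internal parameterization; the symbols $\mu$, $\tau_i$, $\varepsilon_{ij}$ and $\sigma^2$ are introduced only for this sketch:

$$
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \text{ independently}
$$

Here $y_{ij}$ is the response (WEIGHT) for subject j in group i, $\mu_i = \mu + \tau_i$ is the mean of group i, and the single variance $\sigma^2$ for every group expresses the equal-variance assumption. "No relationship" under the null hypothesis corresponds to $\tau_1 = \tau_2 = \cdots = \tau_k$, which is the same statement as $\mu_1 = \mu_2 = \cdots = \mu_k$.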
Since the
rejection of the null hypothesis does not specifically tell you which means
are different, a multiple comparison test is often performed following
a significant finding in the One‑Way ANOVA. To request multiple comparisons in
PROC ANOVA, include a MEANS statement with a multiple comparison option. The syntax for this statement is
MEANS GROUP / testname;
where testname is a multiple comparison test. Some of the tests available in SAS include:
BON - Bonferroni t-tests of differences
DUNCAN - Duncan's multiple range test
SCHEFFE - Scheffe's multiple comparison procedure
SNK - Student-Newman-Keuls multiple range test
LSD - Fisher's Least Significant Difference test
TUKEY - Tukey's studentized range (HSD) test
DUNNETT('x') - Dunnett's test, which compares each group with a single control group 'x'
You may also specify
ALPHA=p - sets the significance level for the comparisons (the default is 0.05)
For example, to
select the TUKEY test, you would use the statement
MEANS GROUP /TUKEY;
Graphical comparison: A graphical comparison lets you see the distribution of the groups. If the p-value is low, there will usually be relatively little overlap between at least some of the groups. If the p-value is not low, the groups will typically overlap considerably. A simple graph for this analysis can be created using the PROC PLOT or PROC GPLOT procedure. For example:
PROC GPLOT;
   PLOT GROUP*WEIGHT;
RUN;
will produce a plot showing WEIGHT by group.
Thus, the code
for the complete analysis becomes:
PROC ANOVA DATA=ANOVA;
   CLASS GROUP;
   MODEL WEIGHT=GROUP;
   MEANS GROUP /TUKEY;
   TITLE 'Compare WEIGHT across GROUPS';
RUN;
PROC GPLOT DATA=ANOVA;
   PLOT GROUP*WEIGHT;
RUN;
QUIT;
Following is a SAS job that performs a one-way ANOVA and produces plots.
Suppose you are comparing the time to relief of three headache medicines: brands 1, 2, and 3. The time to relief is reported in minutes. For this experiment, 15 subjects were randomly assigned to one of the three medicines. Which medicine (if any) is the most effective? The data for this example are as follows:
Brand 1   Brand 2   Brand 3
  24.5      28.4      26.1
  23.5      34.2      28.3
  26.4      29.5      24.3
  27.1      32.2      26.2
  29.9      30.1      27.8
Notice that SAS
expects the data to be entered as two variables, a group and an observation.
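If your data are already laid out with one column per brand, as in the table above, a DATA step with an array can convert them into the two-variable (group, observation) form. This is an illustrative sketch rather than part of the original tutorial; the data set name ACHE2 and the variables B1-B3 are hypothetical names chosen for the example.

```sas
/* Hypothetical sketch: read the data in wide form (one column per  */
/* brand) and write one observation per subject with BRAND, RELIEF. */
DATA ACHE2;
   INPUT B1 B2 B3;            /* each row holds one subject from each brand */
   ARRAY B{3} B1-B3;          /* B{1} = brand 1, B{2} = brand 2, B{3} = brand 3 */
   DO BRAND = 1 TO 3;
      RELIEF = B{BRAND};      /* relief time for this brand */
      OUTPUT;                 /* one output observation per brand */
   END;
   KEEP BRAND RELIEF;
   DATALINES;
24.5 28.4 26.1
23.5 34.2 28.3
26.4 29.5 24.3
27.1 32.2 26.2
29.9 30.1 27.8
;
RUN;
```

ACHE2 holds the same 15 observations as the ACHE data set below, only in a different order (interleaved by brand), which does not affect the analysis.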
Here is the SAS
code to analyze these data. (AANOVA EXAMPLE2.SAS)
DATA ACHE;
   INPUT BRAND RELIEF;
   CARDS;
1 24.5
1 23.5
1 26.4
1 27.1
1 29.9
2 28.4
2 34.2
2 29.5
2 32.2
2 30.1
3 26.1
3 28.3
3 24.3
3 26.2
3 27.8
;
ODS RTF;
ODS LISTING CLOSE;
PROC ANOVA DATA=ACHE;
   CLASS BRAND;
   MODEL RELIEF=BRAND;
   MEANS BRAND/TUKEY CLDIFF;
   TITLE 'COMPARE RELIEF ACROSS MEDICINES - ANOVA EXAMPLE';
PROC GPLOT;
   PLOT RELIEF*BRAND;
PROC BOXPLOT;
   PLOT RELIEF*BRAND;
   TITLE 'ANOVA RESULTS';
RUN;
QUIT;
ODS RTF CLOSE;
ODS LISTING;
Following is the
(partial) output for the headache relief study:
ANOVA Procedure
Dependent Variable: Relief

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        66.7720000     33.3860000       7.14    0.0091
Error              12        56.1280000      4.6773333
Corrected Total    14       122.9000000

R-Square    Coeff Var    Root MSE    RELIEF Mean
0.543303     7.751664    2.162714       27.90000

Source             DF          Anova SS    Mean Square    F Value    Pr > F
BRAND               2       66.77200000    33.38600000       7.14    0.0091
The first table in this listing is the Analysis of Variance table. The most important line to observe in this table is "Model." At the right of this line is the p-value for the overall ANOVA test, listed under "Pr > F": p = 0.0091. This tests the overall model to determine whether there is a difference in mean RELIEF between BRANDs. In this case, since the p-value is small (below 0.05), you can conclude that there is a statistically significant difference among the brands.
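The entries in this table can be checked by hand. The worked arithmetic below is a supplementary sketch computed from the data listed above (brand means 26.28, 30.88 and 26.54, overall mean 27.9, n = 5 subjects per brand); it is not part of the SAS output:

$$
\begin{aligned}
SS_{\text{Model}} &= 5\left[(26.28-27.9)^2 + (30.88-27.9)^2 + (26.54-27.9)^2\right] = 5\,(2.6244 + 8.8804 + 1.8496) = 66.772 \\
MS_{\text{Model}} &= 66.772 / 2 = 33.386, \qquad MS_{\text{Error}} = (122.9 - 66.772) / 12 = 56.128 / 12 = 4.6773 \\
F &= 33.386 / 4.6773 = 7.14 \quad \text{with } (2,\ 12) \text{ degrees of freedom} \\
R^2 &= 66.772 / 122.9 = 0.5433, \qquad \text{Root MSE} = \sqrt{4.6773} = 2.1627 \\
\text{Coeff Var} &= 100 \times 2.1627 / 27.9 = 7.75
\end{aligned}
$$

The reported p-value, 0.0091, is the probability that an F random variable with 2 and 12 degrees of freedom exceeds 7.14.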
Now that you know that there are differences among the BRANDs, you need to determine where the differences lie. In this case, that comparison is performed by the Tukey Studentized Range comparison (at the alpha = 0.05 level). See the tables below.
The Tukey Grouping table displays those differences. Notice the grouping labels "A" and "B" in this table. There is only one mean associated with the "A" group, and that is brand 2. This indicates that the mean for brand 2 is significantly larger than the means of all other groups. There are two means associated with the "B" group: brands 1 and 3. Since these two means share a letter, they were not found to be significantly different.
Tukey's Studentized Range (HSD) Test for RELIEF

Alpha                                      0.05
Error Degrees of Freedom                     12
Error Mean Square                      4.677333
Critical Value of Studentized Range     3.77278
Minimum Significant Difference            3.649

Means with the same letter are not significantly different.

Tukey Grouping      Mean    N    BRAND
             A    30.880    5    2

             B    26.540    5    3
             B
             B    26.280    5    1
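The Minimum Significant Difference in the table above can be reproduced from the other entries. As a sketch that assumes equal group sizes (n = 5 per brand, as in this example), Tukey's MSD is the critical value of the studentized range times the standard error of a group mean:

$$
\text{MSD} = q_{0.05;\,3,\,12}\sqrt{\frac{MS_{\text{Error}}}{n}} = 3.77278\sqrt{\frac{4.677333}{5}} = 3.77278 \times 0.96719 \approx 3.649
$$

Any two brand means that differ by more than 3.649 are declared significantly different: $30.880 - 26.540 = 4.340$ and $30.880 - 26.280 = 4.600$ both exceed it, while $26.540 - 26.280 = 0.260$ does not, which matches the A/B grouping.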
Thus, the Tukey comparison concludes that the mean for brand 2 is significantly higher than the means of brands 1 and 3, and that there is no significant difference between brands 1 and 3. Another way to express the differences is to use the CLDIFF option with TUKEY (same results, different presentation). For example:
MEANS BRAND/TUKEY CLDIFF;
Using this option produces this version of the comparison table:
Comparisons significant at the 0.05 level are indicated by ***.

BRAND         Difference Between    Simultaneous 95%
Comparison                 Means    Confidence Limits
2 - 3                      4.340     0.691     7.989    ***
2 - 1                      4.600     0.951     8.249    ***
3 - 2                     -4.340    -7.989    -0.691    ***
3 - 1                      0.260    -3.389     3.909
1 - 2                     -4.600    -8.249    -0.951    ***
1 - 3                     -0.260    -3.909     3.389
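With equal group sizes, each simultaneous confidence interval in this table is the difference between the two means plus or minus the Minimum Significant Difference (3.649) reported in the Tukey output. A short worked check, shown for illustration:

$$
\begin{aligned}
2 - 3:&\quad 4.340 \pm 3.649 = (0.691,\ 7.989) \\
2 - 1:&\quad 4.600 \pm 3.649 = (0.951,\ 8.249) \\
3 - 1:&\quad 0.260 \pm 3.649 = (-3.389,\ 3.909)
\end{aligned}
$$

An interval that excludes 0 corresponds to a significant difference (marked ***). The 3 - 1 interval contains 0, so brands 1 and 3 are not significantly different.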
|
Visual Comparisons: Two graphs of RELIEF by BRAND show the distribution of relief times across brands, which visually confirms the ANOVA results. The first is a "dot" plot produced by PROC GPLOT that shows each data point by group. The second is a box-and-whiskers plot created with PROC BOXPLOT. Note that brand 2 relief times tend to be longer (higher values) than those for brands 1 and 3.


Hands-on exercise:
Modify the PROC ANOVA program to perform the Scheffe, LSD, and Dunnett's tests using the following code, and compare the results.
MEANS BRAND/SCHEFFE;
MEANS BRAND/LSD;
MEANS BRAND/DUNNETT ('1');
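Several comparison options can also be requested in a single MEANS statement, together with ALPHA= to change the significance level. The sketch below assumes the ACHE data set created earlier is still available in the SAS session; the value ALPHA=0.01 is an arbitrary choice that only illustrates the option. In DUNNETT('1'), brand 1 is the control group.

```sas
/* Sketch: request three multiple comparison procedures at once.  */
/* Assumes the data set ACHE (BRAND, RELIEF) from the example above. */
PROC ANOVA DATA=ACHE;
   CLASS BRAND;
   MODEL RELIEF=BRAND;
   /* Scheffe, Fisher's LSD, and Dunnett with brand 1 as control, */
   /* all at the 0.01 significance level                          */
   MEANS BRAND / SCHEFFE LSD DUNNETT('1') ALPHA=0.01;
   TITLE 'MULTIPLE COMPARISONS AT ALPHA=0.01';
RUN;
QUIT;
```

Comparing these results with the 0.05-level output is a quick way to see how the choice of alpha changes which differences are declared significant.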
One-Way ANOVA using GLM
PROC GLM will produce essentially the same results as PROC ANOVA, with the addition of a few more options. For example, you can include an OUTPUT statement and output residuals that can then be examined. (PROCGLM1.SAS)
ODS RTF;
ODS GRAPHICS ON;
PROC GLM DATA=ACHE;
   CLASS BRAND;
   MODEL RELIEF=BRAND;
   MEANS BRAND/TUKEY CLDIFF;
   OUTPUT OUT=FITDATA P=YHAT R=RESID;
* Now plot the residuals;
PROC GPLOT DATA=FITDATA;
   PLOT RESID*BRAND;
   PLOT RESID*YHAT;
RUN;
ODS RTF CLOSE;
ODS GRAPHICS OFF;
Notice also the statements ODS GRAPHICS ON and ODS GRAPHICS OFF. With ODS Graphics turned on, PROC GLM produces better looking plots than we were able to get using PROC GPLOT in conjunction with PROC ANOVA, including the more detailed box-and-whiskers plot shown here:

However, there are still a couple of other plots that might be of interest.
These are requested using the code
PROC GPLOT DATA=FITDATA;
   PLOT RESID*BRAND;
   PLOT RESID*YHAT;
RUN;
The resulting plots (below) are an analysis of the residuals. The first plot shows the residuals by brand. Typically, you want the residuals to be randomly scattered within each group, with a similar spread across groups (which looks okay in this plot).

The second plot shows the residuals against YHAT (the predicted RELIEF). There are three predicted values, one for each brand; in a one-way ANOVA each predicted value is that brand's mean. For each predicted value, the residuals appear randomly scattered around zero.
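The residual plots give an informal check of the normality and equal-variance assumptions stated at the beginning of this tutorial. If you also want formal tests, one possible sketch (not part of the original program) uses the NORMAL option of PROC UNIVARIATE on the residuals saved in FITDATA, and the HOVTEST=LEVENE option of the MEANS statement in PROC GLM. It assumes the ACHE and FITDATA data sets from the steps above exist in the current session.

```sas
/* Sketch: formal checks of the one-way ANOVA assumptions.      */
/* Assumes ACHE and FITDATA (containing RESID) were created above. */

/* Normality of the residuals: NORMAL adds a "Tests for Normality" */
/* table (Shapiro-Wilk and others) to the output                  */
PROC UNIVARIATE DATA=FITDATA NORMAL;
   VAR RESID;
RUN;

/* Equal variances across brands: Levene's test */
PROC GLM DATA=ACHE;
   CLASS BRAND;
   MODEL RELIEF=BRAND;
   MEANS BRAND / HOVTEST=LEVENE;
RUN;
QUIT;
```

A low p-value from either test suggests the corresponding assumption may be violated. With only five subjects per brand these tests have little power, so they complement the plots rather than replace them.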

End of
tutorial
See
http://www.stattutorials.com/SAS