Two-Way Frequency Table Analysis

PROC FREQ, Part 2

See www.stattutorials.com/SASDATA for files mentioned in this tutorial

© TexaSoft, 2006

 

 

Analyzing Two-Way Tables

 

To create a table in PROC FREQ comparing two variables, list both variables in the TABLES statement, separated by an asterisk (for example, A*B). PROC FREQ will then produce a crosstabulation table (also called a two-way table).

 

When you create a two-way crosstabulation, you may want to know the statistics associated with the table. The CHISQ option, placed after a slash (/) in the TABLES statement, requests chi-square tests and related measures of association. For example:

 

PROC FREQ; TABLES GENDER*GP/CHISQ;

 

will create a two-way crosstabulation table and will also cause SAS to report a battery of statistics associated with the table.

 

Test Assumptions: For the Chi-square statistic, the observed data are assumed to be counts of qualitative (categorical) data such as hair color or the presence of a condition (e.g., disease or no disease).

 

A crosstabulation table (also sometimes called a contingency table) is formed by counting the number of occurrences in a sample across two grouping variables (specified in TABLES). The number of rows in a table is usually denoted by r and the number of columns by c. Thus, a table is said to be an r x c table, with r x c "cells." For example, a table of dominant hand (left or right) by hair color, with 5 hair colors used, would be referred to as a 2 x 5 table. Two types of tests are commonly associated with an r x c table: the test of independence and the test of homogeneity. The hypotheses for the test of independence are:

 

Ho: The variables are independent (no association between the two variables)

Ha: The variables are not independent

 

Thus, in the "hair" example, the null hypothesis would mean that there is no association between dominant hand and hair color (each hand dominance category has the same distribution of hair color). The alternative hypothesis would mean that left-handed and right-handed people have different distributions of hair color; perhaps left-handed people are more likely to be brunette.

 

Another test that can be performed for a contingency table is a test of homogeneity. Here the table is built from data sampled from two or more populations, and the test asks whether those populations share the same distribution. The hypotheses are:

 

Ho: The populations are homogeneous.

Ha: The populations are not homogeneous.

 

Rows (or columns) represent data from different populations, and the other variable represents what was observed on each population. SAS reports a single Chi-square test that serves as the test of homogeneity or of independence (the two tests are mathematically equivalent). Also included in the output are a likelihood ratio chi-square, Mantel-Haenszel chi-square, phi coefficient, contingency coefficient, and Cramer's V. For a 2 x 2 table, Fisher's exact test is also performed.

 

For example, you could create a two-by-three table of GENDER by GP from the SOMEDATA data set by using the following statements (PROCFREQ4.SAS):

 

* ASSUMES YOU HAVE A SAS LIBRARY NAMED MYDATA;

ODS RTF;

PROC FREQ DATA=MYDATA.SOMEDATA;

     TABLES GENDER*GP/CHISQ;

TITLE 'Chi Square Analysis of a Contingency Table';

RUN;

* RUN IT AGAIN, REQUESTING EXPECTED VALUES;

PROC FREQ DATA=MYDATA.SOMEDATA;

     TABLES GENDER*GP/CHISQ EXPECTED NOROW NOCOL NOPERCENT;

RUN;

ODS RTF CLOSE;

 

The output for the first two-way table in this job (in part) follows:

 

                                    

Table of GENDER by GP

GENDER       GP (Intervention Group)

Frequency
Percent
Row Pct
Col Pct           A         B         C     Total

Female            6        16         8        30
              12.00     32.00     16.00     60.00
              20.00     53.33     26.67
              54.55     55.17     80.00

Male              5        13         2        20
              10.00     26.00      4.00     40.00
              25.00     65.00     10.00
              45.45     44.83     20.00

Total            11        29        10        50
              22.00     58.00     20.00    100.00

 

 

The four numbers in each cell are the frequency, the percent of the overall total, the percent of the row total, and the percent of the column total. The statistics for this table are given in the next table:

 

Statistics for Table of GENDER by GP

Statistic                      DF     Value      Prob
Chi-Square                      2    2.0846    0.3526
Likelihood Ratio Chi-Square     2    2.2433    0.3257
Mantel-Haenszel Chi-Square      1    1.3157    0.2514
Phi Coefficient                      0.2042
Contingency Coefficient              0.2001
Cramer's V                           0.2042

WARNING: 33% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Sample Size = 50

 

The Chi-Square value is 2.08 with p=0.3526. Because this p-value is large, you would not reject the null hypothesis; the data do not provide evidence of a relationship between gender and group. However, notice the warning at the bottom of the table. It tells you that 33% of the cells (2 of 6) have expected counts less than 5, which may make the Chi-Square test invalid. To check this, look at the version of the table requested in the second PROC FREQ, which adds the expected values to the table using

 

TABLES GENDER*GP/CHISQ EXPECTED NOROW NOCOL NOPERCENT;

 

Table of GENDER by GP

GENDER       GP (Intervention Group)

Frequency
Expected          A         B         C     Total

Female            6        16         8        30
                6.6      17.4         6

Male              5        13         2        20
                4.4      11.6         4

Total            11        29        10        50

 

The NOROW, NOCOL and NOPERCENT options in the TABLES statement suppress the row, column and total percentages, so each cell shows only the frequency and the expected value. From the resulting table you can see that two of the cells have expected values less than 5 (4.4 and 4). Viewing the expected values can also help you understand why a Chi-Square statistic is significant, by showing which observed values depart most from their expected values.
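The expected values and the Chi-Square statistic can also be recomputed by hand, which may help show where the numbers come from. The following is a minimal sketch, not part of the original program files: it types in the six observed counts from the table above, computes each expected value as (row total x column total) / sample size, and sums (observed - expected)**2 / expected over the cells. The array and variable names (OBS, ROWTOT, COLTOT, CHISQ and so on) are chosen here for illustration only.

```sas
* SKETCH ONLY - RECOMPUTES EXPECTED COUNTS AND CHI-SQUARE BY HAND;
* THE COUNTS ARE TYPED IN FROM THE GENDER BY GP TABLE SHOWN ABOVE;
DATA _NULL_;
   * OBSERVED COUNTS - ROW 1 IS FEMALE, ROW 2 IS MALE;
   ARRAY OBS{2,3}  _TEMPORARY_ (6 16 8  5 13 2);
   ARRAY ROWTOT{2} _TEMPORARY_ (0 0);
   ARRAY COLTOT{3} _TEMPORARY_ (0 0 0);
   N = 0;
   * ACCUMULATE ROW TOTALS, COLUMN TOTALS AND THE SAMPLE SIZE;
   DO I = 1 TO 2;
      DO J = 1 TO 3;
         ROWTOT{I} = ROWTOT{I} + OBS{I,J};
         COLTOT{J} = COLTOT{J} + OBS{I,J};
         N = N + OBS{I,J};
      END;
   END;
   * EXPECTED = ROW TOTAL TIMES COLUMN TOTAL DIVIDED BY N;
   CHISQ = 0;
   DO I = 1 TO 2;
      DO J = 1 TO 3;
         EXPECTED = ROWTOT{I} * COLTOT{J} / N;
         CHISQ = CHISQ + (OBS{I,J} - EXPECTED)**2 / EXPECTED;
         PUT I= J= EXPECTED= 8.1;
      END;
   END;
   * DEGREES OF FREEDOM ARE (ROWS - 1) TIMES (COLUMNS - 1);
   DF = (2 - 1) * (3 - 1);
   P = 1 - PROBCHI(CHISQ, DF);
   PUT CHISQ= 8.4 DF= P= 8.4;
RUN;
```

The values written to the SAS log should agree, up to rounding, with the Expected entries and the Chi-Square line that PROC FREQ reports for this table.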

 

EXERCISE: Add FISHER to the TABLES statement to get Fisher's Exact test.

 

TABLES GENDER*GP/CHISQ FISHER EXPECTED NOROW NOCOL NOPERCENT;

 

Fisher's Exact test is often preferred over the Chi-Square test when the counts in the table are small or when the table contains expected values less than 5 (as is true in this example).

 

 

Creating a Contingency Table from Summarized Data

 

If your data are already summarized into counts, you can use the programming features of SAS to create a dataset appropriate for the analysis. (PROCFREQ5.SAS)

 

The 2x2 table contains the values 12, 15, 18, and 3:

         B=1   B=2
A=1       12    15
A=2       18     3

 

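In the following SAS code, nested DO loops read these counts into a data set with one observation per cell (the row index A, the column index B, and the cell count WT), which is the format PROC FREQ needs. The WEIGHT statement then tells PROC FREQ to treat WT as the number of occurrences in each cell.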

 

DATA;

   DO A = 1 TO 2;

       DO B = 1 TO 2;

          INPUT WT @@;

          OUTPUT;

       END;

   END;

DATALINES;

12 15

18 3

;

ODS RTF;

PROC FREQ;

   WEIGHT WT;

   TABLES A*B /CHISQ;

   TITLE 'CHI-SQUARE ANALYSIS FOR A 2X2 TABLE';

RUN;

ODS RTF CLOSE;
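If the DO loops feel opaque, the same data set can be typed in directly, one cell per line. This is only a sketch of an alternative, assuming the same four counts as above; the data set name COUNTS is a hypothetical name chosen for illustration and does not appear in the tutorial files.

```sas
* SKETCH ONLY - SAME 2X2 COUNTS ENTERED ONE CELL PER LINE;
* COUNTS IS A HYPOTHETICAL DATA SET NAME USED FOR ILLUSTRATION;
DATA COUNTS;
   * A IS THE ROW, B IS THE COLUMN, WT IS THE CELL COUNT;
   INPUT A B WT;
DATALINES;
1 1 12
1 2 15
2 1 18
2 2 3
;
PROC FREQ DATA=COUNTS;
   * WEIGHT TELLS PROC FREQ THAT EACH OBSERVATION STANDS FOR WT CASES;
   WEIGHT WT;
   TABLES A*B / CHISQ;
   TITLE 'CHI-SQUARE ANALYSIS FOR A 2X2 TABLE';
RUN;
```

Typing the row and column indexes explicitly takes more keystrokes than the DO loops, but it makes it easier to see which count belongs to which cell, and it extends naturally to character categories.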

 

The output for this program follows. The table has the same layout as in the previous example. The Chi-Square statistic is 8.58 (1 df) with p=0.0034. From this evidence you would reject the null hypothesis and conclude that A and B are associated. For example, looking at the row percentages, when A=1 the percentage rises from 44% for B=1 to 56% for B=2, whereas when A=2 it falls from 86% for B=1 to 14% for B=2. The pattern of B differs across the categories of A.

 

In the 2x2 case, SAS automatically includes Fisher's Exact Test as well. Most commonly, the two-sided Fisher's p-value (p=0.006) would be reported. Fisher's test is often preferred over the Chi-Square test when the counts in the table are small or when the table contains expected values less than 5.

    

The output for these tests is given below:

 

Table of A by B

A            B

Frequency
Percent
Row Pct
Col Pct           1         2     Total

1                12        15        27
              25.00     31.25     56.25
              44.44     55.56
              40.00     83.33

2                18         3        21
              37.50      6.25     43.75
              85.71     14.29
              60.00     16.67

Total            30        18        48
              62.50     37.50    100.00

 

 

Statistics for Table of A by B

Statistic                      DF     Value      Prob
Chi-Square                      1    8.5841    0.0034
Likelihood Ratio Chi-Square     1    9.1893    0.0024
Continuity Adj. Chi-Square      1    6.9136    0.0086
Mantel-Haenszel Chi-Square      1    8.4053    0.0037
Phi Coefficient                     -0.4229
Contingency Coefficient              0.3895
Cramer's V                          -0.4229

Fisher's Exact Test

Cell (1,1) Frequency (F)         12
Left-sided Pr <= F           0.0036
Right-sided Pr >= F          0.9996

Table Probability (P)        0.0032
Two-sided Pr <= P            0.0061


 

 

EXERCISE: Include RELRISK as an option in the TABLES statement:

 

TABLES A*B /CHISQ RELRISK;

 

This yields these additional statistics:

 

 

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
Case-Control (Odds Ratio)     0.1333      0.0316      0.5621
Cohort (Col1 Risk)            0.5185      0.3285      0.8184
Cohort (Col2 Risk)            3.8889      1.2937     11.6902

 

 

It is important to note that the Odds Ratio is based on Row1/Row2. If you switch the rows, the Chi-Square statistics are all the same, but the Odds Ratio becomes the inverse (1/0.1333 = 7.5).
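To see how these estimates relate to the four cell counts, the arithmetic can be reproduced in a short DATA step. This is a sketch for illustration rather than a description of how PROC FREQ computes its output internally: it uses the usual cross-product formula for the odds ratio and a large-sample confidence interval built on the log odds ratio. The variable names (N11, N12, N21, N22, ODDSRATIO and so on) are assumptions made for this example.

```sas
* SKETCH ONLY - ODDS RATIO AND RISK RATIOS FROM THE 2X2 COUNTS;
DATA _NULL_;
   * CELL COUNTS - FIRST DIGIT IS THE ROW A, SECOND IS THE COLUMN B;
   N11 = 12; N12 = 15;
   N21 = 18; N22 = 3;
   * ODDS RATIO FOR ROW1 OVER ROW2 IS THE CROSS-PRODUCT RATIO;
   ODDSRATIO = (N11 * N22) / (N12 * N21);
   * LARGE-SAMPLE 95 PERCENT LIMITS COMPUTED ON THE LOG SCALE;
   SE = SQRT(1/N11 + 1/N12 + 1/N21 + 1/N22);
   Z = PROBIT(0.975);
   LOWER = EXP(LOG(ODDSRATIO) - Z * SE);
   UPPER = EXP(LOG(ODDSRATIO) + Z * SE);
   * RISK RATIOS COMPARE ROW PROPORTIONS WITHIN EACH COLUMN;
   COL1RISK = (N11 / (N11 + N12)) / (N21 / (N21 + N22));
   COL2RISK = (N12 / (N11 + N12)) / (N22 / (N21 + N22));
   * SWITCHING THE ROWS INVERTS THE ODDS RATIO;
   INVERSE = 1 / ODDSRATIO;
   PUT ODDSRATIO= 8.4 LOWER= 8.4 UPPER= 8.4;
   PUT COL1RISK= 8.4 COL2RISK= 8.4 INVERSE= 8.4;
RUN;
```

The values written to the log should agree, up to rounding, with the Value column and the Odds Ratio confidence limits in the RELRISK table above.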

 

  

End of tutorial

See http://www.stattutorials.com/SAS

 

 


© Copyright TexaSoft, 1996-2007

This site is not affiliated with SAS(r) or SAS Institute