Inter-Rater Reliability/KAPPA
Cohen’s Kappa coefficient is a method for assessing
the degree of agreement between two raters. The weighted Kappa method is
designed to give partial, although not full, credit to raters who get “near” the
right answer, so it should be used only when the degree of agreement can be
quantified.
For example, using data from Fleiss (1981,
p. 213), suppose you have 100 subjects rated by two raters on a psychological
scale that consists of three categories. The data are given below:
|         |           | RATER A                          |       |
|         |           | Psych. 1 | Neuro. 2 | Organic 3 | Total |
| RATER B | Psych. 1  |    75    |     1    |     4     |   80  |
|         | Neuro. 2  |     5    |     4    |     1     |   10  |
|         | Organic 3 |     0    |     0    |    10     |   10  |
|         | Total     |    80    |     5    |    15     |  100  |
To perform this analysis in SAS, open the file
PROCFREQ-KAPPA.SAS, shown here:
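/* Read the 3x3 table of counts row by row.              */
/* RATER1 indexes the rows, RATER2 the columns, and WT   */
/* holds the count for each cell.                        */
DATA;
   DO RATER1 = 1 TO 3;
      DO RATER2 = 1 TO 3;
         INPUT WT @@;
         OUTPUT;
      END;
   END;
DATALINES;
75 1 4
5 4 1
0 0 10
;
ODS RTF;
/* WEIGHT treats WT as the cell frequency. */
PROC FREQ;
   WEIGHT WT;
   TABLE RATER1*RATER2 / AGREE;
   TEST WTKAP;
   TITLE 'KAPPA EXAMPLE FROM FLEISS';
RUN;
ODS RTF CLOSE;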
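The DATA step creates a data set holding the cell counts of
the 3x3 table shown above: each observation contains a row index (RATER1), a
column index (RATER2), and the count WT. The analysis is performed using PROC
FREQ, where the WEIGHT statement tells SAS to treat WT as the cell frequency.
To get the Kappa statistics, use the AGREE option on the TABLE statement. For a
square table larger than 2x2, this produces the test of symmetry together with
the simple and weighted Kappa coefficients. The “TEST WTKAP” statement
additionally requests a test of the hypothesis that the weighted Kappa is zero.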
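As a rough illustration of what the nested DO loops do, the following Python sketch (outside SAS, for checking only) reads the same nine counts in row-major order and rebuilds the marginal totals of the table. The variable names are illustrative and do not come from the SAS program.

```python
# Cell counts in the same row-major order as the DATALINES block.
counts = [75, 1, 4,
          5, 4, 1,
          0, 0, 10]

# Mimic the nested DO loops: RATER1 is the outer (row) index,
# RATER2 is the inner (column) index, and each "INPUT WT @@"
# consumes the next value from the data lines.
values = iter(counts)
records = []
for rater1 in range(1, 4):
    for rater2 in range(1, 4):
        records.append((rater1, rater2, next(values)))

# Marginal totals recovered from the long-format records.
row_totals = [sum(wt for r1, _, wt in records if r1 == i) for i in range(1, 4)]
col_totals = [sum(wt for _, r2, wt in records if r2 == j) for j in range(1, 4)]
n = sum(wt for _, _, wt in records)

print(records[:3])                 # [(1, 1, 75), (1, 2, 1), (1, 3, 4)]
print(row_totals, col_totals, n)   # [80, 10, 10] [80, 5, 15] 100
```

The totals agree with the margins of the table above, which is a quick way to confirm that the data lines were entered in the intended order.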
The code above produces the following
output.
1. Bowker’s Test of Symmetry tests the hypothesis that
pij = pji for every pair of categories
(symmetry, which implies marginal homogeneity). If r=c=2 then this is the same as McNemar’s test. If
this test is non-significant, there is no evidence that the two raters differ in their
propensity to select categories. If it is significant, it suggests that the raters
are selecting the categories in differing proportions.
Test of Symmetry
| Statistic (S) | 7.6667 |
| DF            | 3      |
| Pr > S        | 0.0534 |
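The symmetry statistic can be checked by hand. The sketch below, in Python rather than SAS, sums (nij - nji)^2 / (nij + nji) over the pairs of off-diagonal cells. The helper names `bowker_symmetry` and `chi2_sf_3df` are hypothetical, and the p-value helper uses a closed form that is valid only for exactly 3 degrees of freedom, which is the case for a 3x3 table.

```python
import math

table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

def bowker_symmetry(table):
    """Return (S, df) for Bowker's test of symmetry on a square table."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            # Assumption: a pair with no off-diagonal counts adds nothing.
            if total > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / total
    df = k * (k - 1) // 2
    return stat, df

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

s, df = bowker_symmetry(table)
p_value = chi2_sf_3df(s)
print(round(s, 4), df, round(p_value, 4))
```

For these data the three pair contributions are 16/6, 16/4 and 1/1, which sum to the 7.6667 reported by SAS.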
2. The simple Kappa coefficient
measures the level of agreement between two raters beyond what would be expected
by chance. When Kappa is large (most
would say .7 or higher), it indicates a strong level of agreement.
Simple Kappa Coefficient
| Kappa                | 0.6765 |
| ASE                  | 0.0877 |
| 95% Lower Conf Limit | 0.5046 |
| 95% Upper Conf Limit | 0.8484 |
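The simple Kappa value can be reproduced from the table as (observed agreement - chance agreement) / (1 - chance agreement). The Python sketch below is a hand check, not part of the SAS program. The function name `cohen_kappa` is illustrative, and the ASE is copied from the SAS output rather than computed.

```python
table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

def cohen_kappa(table):
    """Simple (unweighted) Kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: proportion of subjects on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the matching marginal proportions.
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

kappa = cohen_kappa(table)

# ASE is taken from the SAS output above (assumption: not recomputed here).
ase = 0.0877
lower = kappa - 1.96 * ase
upper = kappa + 1.96 * ase
print(round(kappa, 4), round(lower, 4), round(upper, 4))
```

Here the observed agreement is 0.89 and the chance agreement is 0.66, so Kappa is 0.23 / 0.34, which rounds to the 0.6765 shown in the output.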
3. The weighted Kappa method is designed to give partial, although not full,
credit to raters who get “near” the right answer, so it should be used only
when the degree of agreement can be quantified; that is, the categories must
be ordinal.
Weighted Kappa Coefficient
| Weighted Kappa       | 0.7222 |
| ASE                  | 0.0843 |
| 95% Lower Conf Limit | 0.5570 |
| 95% Upper Conf Limit | 0.8874 |

Test of H0: Weighted Kappa = 0
| ASE under H0         | 0.0879 |
| Z                    | 8.2201 |
| One-sided Pr > Z     | <.0001 |
| Two-sided Pr > |Z|   | <.0001 |
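To see how the partial credit works, the Python sketch below recomputes the weighted Kappa by hand. It assumes linear agreement weights, w = 1 - |i - j| / (k - 1), so a one-category disagreement earns half credit and a two-category disagreement earns none. That choice is an assumption made for this illustration; it does reproduce the value in the SAS output. The function name `weighted_kappa` is hypothetical.

```python
table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

def weighted_kappa(table):
    """Weighted Kappa using linear agreement weights (an assumption)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Full credit on the diagonal, less credit the further apart.
        return 1 - abs(i - j) / (k - 1)

    # Weighted observed agreement.
    p_obs = sum(weight(i, j) * table[i][j]
                for i in range(k) for j in range(k)) / n
    # Weighted agreement expected by chance from the margins.
    p_exp = sum(weight(i, j) * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

kappa_w = weighted_kappa(table)
print(round(kappa_w, 4))
```

With these weights the observed agreement is 0.925 and the chance agreement is 0.73, giving 0.195 / 0.27, which rounds to 0.7222.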
The Kappa and weighted Kappa results are displayed along with 95%
confidence limits. Kappa generally ranges in value from 0 to 1, with 1 meaning
perfect agreement and 0 meaning agreement no better than expected by chance.
Negative values are possible and indicate agreement worse than chance. The
higher the value of Kappa, the stronger the agreement.
See http://www.stattutorials.com/SAS