Attribute Agreement Analysis: Between Appraisers - Fleiss' Kappa Statistics
You can assess the consistency of ratings between appraisers.
If kappa = 1, then there is perfect agreement. If kappa = 0, then agreement is the same as would be expected by chance. The higher the value of kappa, the stronger the agreement between appraisers. Negative values occur when agreement is weaker than expected by chance, but this rarely happens. Depending on the application, a kappa value less than 0.7 indicates that your measurement system needs improvement. Kappa values greater than 0.9 are considered excellent.
Compare the kappa statistics for each Response and Overall. Are appraisers having difficulty with a particular response?
For the fabric data, all kappa statistics for all responses are greater than 0.7. The consistency between the appraisers' ratings is within acceptable limits.
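To make the kappa calculation concrete, the following is a minimal Python sketch of how an overall Fleiss' kappa can be computed from raw ratings. It is illustrative rather than the exact routine behind the output shown later; it assumes the appraisers' ratings have already been tallied into a subjects-by-categories count matrix, and the small example matrix is made up, not the fabric data.

```python
import numpy as np

def fleiss_kappa(counts):
    """Overall Fleiss' kappa.

    counts: (n_subjects, n_categories) array where counts[i, j] is the
    number of appraisers who assigned subject i to category j.
    Every subject must be rated by the same number of appraisers.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects, _ = counts.shape
    n_raters = counts[0].sum()               # ratings per subject

    # Observed agreement: proportion of agreeing appraiser pairs per subject
    p_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()

    # Agreement expected by chance, from the overall category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.sum(p_j**2)

    return (p_bar - p_e) / (1 - p_e)

# Illustrative example: 4 samples, 3 appraisers, ratings 1-5
example = np.array([
    [3, 0, 0, 0, 0],   # all three appraisers chose rating 1
    [0, 2, 1, 0, 0],
    [0, 0, 0, 3, 0],
    [0, 0, 0, 1, 2],
])
print(fleiss_kappa(example))
```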
Use the p-values to choose between two opposing hypotheses, based on your sample data:
H0: The agreement between appraisers is due to chance alone (kappa = 0).
H1: The agreement between appraisers is greater than would be expected by chance (kappa > 0).
The p-value provides the likelihood of obtaining your sample, with its particular kappa statistic, if the null hypothesis (H0) is true. If the p-value is less than or equal to a predetermined level of significance (α-level), then you reject the null hypothesis and claim support for the alternative hypothesis.
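The Z statistics and p-values in the output follow from each kappa estimate and its standard error. As a hedged sketch, assuming a one-sided normal test of H0: kappa = 0 against H1: kappa > 0 (which matches the P(vs > 0) column in the example output below), the calculation looks like this in Python; the numeric inputs here are hypothetical:

```python
from scipy.stats import norm

def kappa_test(kappa, se_kappa, alpha=0.05):
    """One-sided test of H0: kappa = 0 versus H1: kappa > 0."""
    z = kappa / se_kappa                  # Z statistic under H0
    p_value = norm.sf(z)                  # upper-tail probability P(Z > z)
    reject_h0 = p_value <= alpha
    return z, p_value, reject_h0

# Hypothetical kappa and standard error, for illustration only
z, p, reject = kappa_test(kappa=0.85, se_kappa=0.04)
print(f"Z = {z:.4f}, p = {p:.4f}, reject H0: {reject}")
```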
Note
The between-appraiser statistics do not compare the appraisers' ratings to the standard. Although the ratings between appraisers may be consistent, they are not necessarily correct.
Example Output
Fleiss’ Kappa Statistics
Response   Kappa     SE Kappa   Z        P(vs > 0)
1          0.974356  0.0345033  28.2395  0.0000
2          0.934060  0.0345033  27.0716  0.0000
3          0.884560  0.0345033  25.6370  0.0000
4          0.911754  0.0345033  26.4251  0.0000
5          0.973542  0.0345033  28.2159  0.0000
Overall    0.937151  0.0173844  53.9076  0.0000
Interpretation
For the fabric data, with α = 0.05, p = 0.0000 for all responses, so you can reject the null hypothesis. The between-appraiser agreement is significantly different from what would be expected by chance.
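As a quick consistency check on the table above, the Z column and the p-values can be reproduced from each kappa and its standard error. This sketch assumes the same one-sided normal test of kappa = 0 and an α of 0.05:

```python
from scipy.stats import norm

# Kappa and SE Kappa values from the Fleiss' Kappa Statistics table above
rows = {
    "1":       (0.974356, 0.0345033),
    "2":       (0.934060, 0.0345033),
    "3":       (0.884560, 0.0345033),
    "4":       (0.911754, 0.0345033),
    "5":       (0.973542, 0.0345033),
    "Overall": (0.937151, 0.0173844),
}

alpha = 0.05
for response, (kappa, se) in rows.items():
    z = kappa / se                        # matches the Z column
    p = norm.sf(z)                        # P(vs > 0); effectively 0.0000 here
    print(f"{response:>7}: Z = {z:.4f}, p = {p:.4f}, "
          f"reject H0 at alpha={alpha}: {p <= alpha}")
```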