Agreement Measures Statistics


    Kappa attains its theoretical maximum value of 1 only when both observers distribute codes identically, that is, when the corresponding row and column marginal totals are the same. Anything else is less than perfect agreement. Nevertheless, the maximum value that kappa could reach given unequal distributions makes it possible to interpret the value of kappa actually obtained. The equation for the maximum κ is:[16] κmax = (Pmax − Pe) / (1 − Pe), where Pe = Σi πi+ π+i and Pmax = Σi min(πi+, π+i).

    On the surface, such data seem amenable to the methods for 2 × 2 tables (if the variable is categorical) or to correlation (if it is numerical) that we discussed earlier in this series.[1,2] A closer look, however, shows that this is not so. In those methods, the two measurements made on each individual refer to different variables (e.g., exposure and outcome, or height and weight), whereas in agreement studies both measurements refer to the same variable (e.g., chest X-rays rated by two radiologists, or hemoglobin measured by two methods).

    Since the overall probability of agreement is Σi πii, the probability of agreement under the null hypothesis equals Σi πi+ π+i. Note also that Σi πii = 0 indicates no agreement and Σi πii = 1 indicates perfect agreement. Kappa statistics are defined so that a larger value implies stronger agreement; weighted kappa allows disagreements to be weighted differently[21] and is particularly useful when the codes are ordered.
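The maximum attainable kappa for fixed marginals can be sketched in a few lines of Python. This is a minimal illustration, assuming κmax = (Pmax − Pe)/(1 − Pe) with Pe = Σ πi+ π+i and Pmax = Σ min(πi+, π+i); the function name and the example marginal proportions are illustrative, not taken from the source:

```python
# Sketch: maximum attainable kappa when the two raters' marginal
# distributions are fixed (assumed formula: kappa_max = (P_max - P_e) / (1 - P_e),
# with P_e = sum_i pi+ * p+i and P_max = sum_i min(pi+, p+i)).

def kappa_max(row_marginals, col_marginals):
    """Maximum kappa achievable given each rater's marginal proportions."""
    p_e = sum(r * c for r, c in zip(row_marginals, col_marginals))
    p_max = sum(min(r, c) for r, c in zip(row_marginals, col_marginals))
    return (p_max - p_e) / (1 - p_e)

# Hypothetical example: raters with unequal marginal distributions
# cannot reach kappa = 1, no matter how their ratings line up.
print(kappa_max([0.7, 0.3], [0.5, 0.5]))
```

With these (made-up) marginals the ceiling is 0.6 rather than 1, which illustrates why an observed kappa is sometimes judged against κmax instead of against 1.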

    Weighted kappa involves three matrices:[8]:66 the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Cells of the weight matrix lying on the diagonal (upper left to lower right) represent agreement and therefore contain zeros. Off-diagonal cells contain weights indicating the seriousness of that disagreement. Often, cells one step off the diagonal are weighted 1, those two steps off 2, and so on.

    Kappa itself is defined as κ = (po − pe) / (1 − pe), where po is the observed relative agreement among raters (identical to accuracy) and pe is the hypothetical probability of chance agreement, the observed data being used to calculate the probability of each observer randomly selecting each category. If the raters are in complete agreement, then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by pe), then κ = 0. The statistic can be negative,[6] implying that there is no effective agreement between the two raters or that the agreement is worse than chance.
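The definition κ = (po − pe)/(1 − pe), and its weighted variant using a disagreement-weight matrix with zeros on the diagonal, can be sketched as follows. The function name and the example confusion matrix are invented for illustration; the linear weights |i − j| are one common choice, not the only one:

```python
# Sketch: unweighted and linearly weighted Cohen's kappa from a
# confusion matrix (rows: rater A's categories, columns: rater B's).
# Counts below are made up for illustration.

def cohens_kappa(table, weighted=False):
    n = sum(sum(row) for row in table)
    k = len(table)
    # Marginal proportions for each rater.
    row_m = [sum(table[i]) / n for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Disagreement weights: 0 on the diagonal; off-diagonal cells get
    # |i - j| (linear weights) or 1 (unweighted kappa).
    w = [[(abs(i - j) if weighted else 1) if i != j else 0
          for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row_m[i] * col_m[j]
                   for i in range(k) for j in range(k))
    # 1 - observed/expected disagreement; for 0/1 weights this equals
    # (po - pe) / (1 - pe).
    return 1 - observed / expected

table = [[20, 5, 0],
         [10, 15, 5],
         [0, 5, 20]]
print(cohens_kappa(table))                  # unweighted kappa
print(cohens_kappa(table, weighted=True))   # linear weights
```

On this example the weighted value exceeds the unweighted one because most disagreements land only one category off the diagonal, which linear weights penalize lightly.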

    Scatter plot showing the correlation between hemoglobin measurements by the two methods for the data presented in Table 3 and Figure 1. The dotted line is a trend line (least-squares line) through the observed values, and the correlation coefficient is 0.98. However, the individual points lie far from the line of perfect agreement (solid black line).

    One is often interested in whether measurements made by two (sometimes more than two) different observers, or by two different techniques, produce similar results. This is referred to as agreement, concordance, or reproducibility between measurements. Such an analysis considers pairs of measurements, either both categorical or both numerical, each pair taken on one individual (or one pathology slide, or one X-ray). Cohen's kappa is a single summary index that describes the strength of inter-rater agreement. Some researchers have raised concerns about the tendency of κ to take the frequencies of the observed categories as givens, which can make it unreliable for measuring agreement in situations such as the diagnosis of rare diseases.
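The distinction between correlation and agreement drawn above can be made concrete with a small sketch: two methods whose readings differ by a constant offset are perfectly correlated yet never agree. The hemoglobin values below are invented for illustration, not the data of Table 3:

```python
# Sketch: high correlation does not imply agreement. Method B reads a
# constant 1.5 g/dL above method A, so r is essentially 1 even though
# every point lies off the line of perfect agreement.
# Values are made up for illustration.

method_a = [10.2, 11.5, 12.8, 9.6, 13.4, 11.0, 12.1, 10.8]
method_b = [a + 1.5 for a in method_a]  # perfect correlation, constant bias

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)

bias = sum(b - a for a, b in zip(method_a, method_b)) / len(method_a)
print(pearson_r(method_a, method_b))  # r is essentially 1
print(bias)                           # yet B reads 1.5 g/dL high throughout
```

This is why agreement analyses look at the pairwise differences (or at kappa, for categorical data) rather than at the correlation coefficient alone.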