Table 7 Agreement Statistics

This is for 32-bit Facets 3.87. Here is Help for 64-bit Facets 4

When inter-rater= specifies a rater facet, Facets counts the situations in which different raters give ratings under identical circumstances.

 

If exact inter-rater statistics are required, please do a special run of Facets in which all unwanted facets are Xed out, so that matching only occurs on facets relevant to agreement. For instance, if "rater gender" is irrelevant to agreement, then X out that facet in the Models= specifications.

 

The percent of times those ratings are identical is reported, along with its expected value. This supports an investigation as to whether raters are rating as "independent experts" or as "rating machines". The report is:

 

Table 7.3.1  Reader Measurement Report  (arranged by MN).

------------------------------------------------------------------------------------------------

| Obsvd   Obsvd  Obsvd  Fair-M|        Model | Infit      Outfit   | Exact Agree. |            |

| Score   Count Average Avrage|Measure  S.E. |MnSq ZStd  MnSq ZStd | Obs %  Exp % | Nu Reader  |

------------------------------------------------------------------------------------------------

|   1524    288     5.3   5.26|   -.30   .05 | 1.2   2    1.2   2  |  28.2   20.9 |  8 8       |

|   1455    288     5.1   5.00|   -.16   .05 |  .5  -7     .5  -7  |  30.8   21.6 |  4 4       |

....

------------------------------------------------------------------------------------------------

RMSE (Model)  .05 Adj S.D.  .19  Separation  4.02  Strata  5.69  Reliability  .94

......

Inter-Rater agreement opportunities: 60480  Exact agreements: 17838 = 29.5%  Expected: 13063.2 = 21.6%

------------------------------------------------------------------------------------------------

 

Exact Agree. is exact agreements under identical rating conditions. Agreement is counted on qualitative levels relative to the lowest observed qualitative level.

So, imagine all your ratings are 4,5,6 and all my ratings are 1,2,3.

If we use the (shared) Rating Scale model, then we will have no exact agreements.

But if we use the (individual) Partial Credit model, #, then we agree when you rate a 4 (your bottom observed category) and I rate a 1 (my bottom observed category). Similarly, your 5 agrees with my 2, and your 6 agrees with my 3.

If you want "exact agreement" to mean "exact agreement of data values", then please use the Rating Scale model statistics.
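
The 4,5,6 versus 1,2,3 example can be sketched in a few lines of Python. This only illustrates the counting idea of "levels relative to the lowest observed level"; it is not how Facets estimates either model, and the rating lists and function names are hypothetical.

```python
# Hypothetical ratings of the same four performances by two raters.
yours = [4, 5, 6, 4]
mine = [1, 2, 3, 2]

def exact_matches(a, b):
    """Count positions where the two raters gave the same value."""
    return sum(x == y for x, y in zip(a, b))

def relative_to_lowest(ratings):
    """Re-express each rating as a level above that rater's lowest observed rating."""
    low = min(ratings)
    return [r - low for r in ratings]

# Shared scale: raw data values are compared directly.
shared = exact_matches(yours, mine)

# Individual scales: levels are compared relative to each rater's bottom category.
individual = exact_matches(relative_to_lowest(yours), relative_to_lowest(mine))

print(shared, individual)  # 0 3
```

With raw values nothing matches, but on relative levels the first three performances agree (4 with 1, 5 with 2, 6 with 3).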

 

Obs % = Observed % of exact agreements between raters on ratings under identical conditions.

Exp % = Expected % of exact agreements between raters on ratings under identical conditions, based on Rasch measures.
If Obs % ≈  Exp % then the raters may be behaving like independent experts.
If Obs % » Exp % then the raters may be behaving like "rating machines".

 

Here is the computation for "Expected Agreement %". We pair the target rater with another rater who rated the same ratee on the same item of the same task of the same ......, so that both raters rated the same performance under identical circumstances.

 

Then, for each rater we have an observed rating. The two ratings either agree or they do not. The percentage of times raters agree with the target rater is the "Observed Agreement %".
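
The pairing and counting described above might be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (rater, ratee, item, rating), the data, and the function name are hypothetical, and the way Facets tallies opportunities internally is not specified here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (rater, ratee, item, rating). Not Facets output.
ratings = [
    ("A", "p1", "i1", 3), ("B", "p1", "i1", 3), ("C", "p1", "i1", 2),
    ("A", "p2", "i1", 1), ("B", "p2", "i1", 2),
    ("A", "p1", "i2", 2), ("C", "p1", "i2", 2),
]

def observed_agreement(records):
    """Return {rater: (opportunities, exact agreements)}.

    Two ratings form an agreement opportunity when different raters
    rated the same ratee on the same item (identical conditions).
    """
    by_condition = defaultdict(list)
    for rater, ratee, item, rating in records:
        by_condition[(ratee, item)].append((rater, rating))

    tally = defaultdict(lambda: [0, 0])
    for group in by_condition.values():
        for (r1, x1), (r2, x2) in combinations(group, 2):
            if r1 == r2:
                continue  # a rater is never paired with themselves
            for rater in (r1, r2):
                tally[rater][0] += 1
                tally[rater][1] += int(x1 == x2)
    return {rater: tuple(counts) for rater, counts in tally.items()}

result = observed_agreement(ratings)
opportunities, agreements = result["A"]
print(f"Rater A: {100 * agreements / opportunities:.1f}%")  # Rater A: 50.0%
```

Only the facets used as the matching key (here ratee and item) decide what counts as "identical conditions", which is why unwanted facets need to be excluded before exact statistics are computed.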

 

For each rater we also have an (average) expected rating based on the Rasch measures. The (average) expected ratings will not agree unless the raters have the same leniency/severity measure.

 

But we also have the Rasch-model-based probabilities for each category of the rating scale for each rater. Suppose this is a 1,2,3 (3-category) rating scale.

 

--------------------------------------------------------------------------------
| Category | Rater A     | Rater B     | Expected agreement between Raters A   |
|          | probability | probability | and B (assuming they are rating       |
|          |             |             | independently)                        |
--------------------------------------------------------------------------------
|    1     |     10%     |     20%     | 10% * 20% =  2%                       |
|    2     |     40%     |     60%     | 40% * 60% = 24%                       |
|    3     |     50%     |     20%     | 50% * 20% = 10%                       |
--------------------------------------------------------------------------------

Expected agreement in any category = 2% + 24% + 10% = 36%

 

This expected-agreement computation is performed over all pairs of raters and averaged to obtain the reported "Expected Agreement %".
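
The arithmetic for Raters A and B can be checked with a short sketch. The function names are hypothetical, and the simple unweighted mean over pairs is an assumption made for illustration; the category probabilities themselves would come from the Rasch measures.

```python
from itertools import combinations

def expected_agreement(p_a, p_b):
    """Probability that two independent raters choose the same category."""
    return sum(pa * pb for pa, pb in zip(p_a, p_b))

def mean_expected_agreement(category_probs):
    """Unweighted average of expected agreement over all pairs of raters.

    category_probs maps rater -> list of category probabilities
    for one rating situation (an assumption of this sketch).
    """
    pairs = list(combinations(category_probs.values(), 2))
    return sum(expected_agreement(a, b) for a, b in pairs) / len(pairs)

# Category probabilities for categories 1, 2, 3 from the example.
rater_a = [0.10, 0.40, 0.50]
rater_b = [0.20, 0.60, 0.20]

pair_value = expected_agreement(rater_a, rater_b)
print(f"{100 * pair_value:.0f}%")  # 36%
```

With only two raters, `mean_expected_agreement({"A": rater_a, "B": rater_b})` returns the same 0.36.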

 

Higher than expected agreement indicates statistical local dependence among the raters. This biases all the standard errors towards zero. An approximate guideline is:
"True" Standard error = "Reported Standard Error" * Maximum( 1, sqrt (Exact agreements / Expected)) for all elements.

In this example, the inflator for the S.E.'s of all elements of all facets approximates sqrt( 17838/13063.2) = 1.17.
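
As a quick check of the guideline, the inflator can be computed from the counts in the example report. The function name is hypothetical, and applying the result to the reported S.E. of .05 is shown only as an illustration of the approximation.

```python
import math

def se_inflator(exact_agreements, expected_agreements):
    """Approximate S.E. inflation factor: Maximum(1, sqrt(observed / expected))."""
    return max(1.0, math.sqrt(exact_agreements / expected_agreements))

# Counts taken from the example report above.
inflator = se_inflator(17838, 13063.2)
print(round(inflator, 2))  # 1.17

# Illustration: a reported S.E. of .05 becomes roughly .058.
adjusted_se = 0.05 * inflator
print(round(adjusted_se, 3))  # 0.058
```

The `max(1.0, ...)` guard means the standard errors are never shrunk when observed agreement falls below its expected value.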

 

Alternatively, deflate the reported person-facet reliability, R, in accordance with the extent to which the raters are not independent. Based on the Spearman-Brown prophecy formula, an approximation is:
T = (100 - observed exact agreement%) / (100 - expected exact agreement%)
deflated reliability = T * R / ( (1-R) + T * R)
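
A minimal sketch of this deflation follows. The agreement percentages are those of the example report (29.5% observed, 21.6% expected); the person-facet reliability R = 0.90 is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
def deflated_reliability(reliability, observed_pct, expected_pct):
    """Spearman-Brown style deflation of a reported reliability R.

    T = (100 - observed %) / (100 - expected %)
    deflated = T * R / ((1 - R) + T * R)
    """
    t = (100.0 - observed_pct) / (100.0 - expected_pct)
    return t * reliability / ((1.0 - reliability) + t * reliability)

# R = 0.90 is hypothetical; the percentages come from the example report.
r_reported = 0.90
r_deflated = deflated_reliability(r_reported, observed_pct=29.5, expected_pct=21.6)
print(round(r_deflated, 2))  # 0.89
```

When observed and expected agreement are equal, T = 1 and the reliability is returned unchanged.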

 

Example: 100 raters with a wide range of rater severity/leniency:

 

Exact agreements: 781 = 18.8%  Expected: 577.5 = 13.9%

 

With this large spread of rater severities, the prediction is that only 13.9% of the observations will show the raters giving the same rating under the same conditions. This accords with the wide range of severities.

There is somewhat more agreement than this in the data, 18.8%. This is typical of the psychology of rater behavior. We are conditioned from babyhood to agree with what we conceive to be the expectations of others. This behavior continues even for expert raters: subconsciously, they still feel a mental pressure to agree with the expectations of others. In this case, that pressure has increased observed agreement from 13.9% to 18.8%.

Whether you report this depends on the purpose for your paper. If it is an investigation into rater behavior, then this provides empirical evidence for a psychological conjecture. If your paper is a validity study of the instrument, then this aspect is probably too obscure to be meaningful for your audience.

 

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900052/ "many texts recommend 80% agreement as the minimum acceptable interrater agreement."

 

See more at Inter-rater Reliability and Inter-rater correlations

 

 


Help for Facets Rasch Measurement and Rasch Analysis Software: www.winsteps.com Author: John Michael Linacre.
 
