Table 7 Reliability and Chi-square statistics

Table 7 also provides summary statistics by facet.

+------------------------------------------------------------------------------------------ ----------------------+
|-------------------------------+--------------+---------------------+------+-------------+ +---------------------|
.....
|-------------------------------+--------------+---------------------+------+-------------+ +---------------------|
| 460.8 96.0 4.8 4.73| .00 .08 | 1.00 -.1 .99 -.2| | .61 | | Mean (Cnt: 12) |
| 29.5 .0 .3 .32| .19 .00 | .23 1.8 .22 1.7| | .05 | | S.D. (Population) |
| 30.8 .0 .3 .33| .20 .00 | .24 1.9 .23 1.8| | .06 | | S.D. (Sample) |
+------------------------------------------------------------------------------------------ ----------------------+

Model, Populn: RMSE .08 Adj (True) S.D. .17 Separation 2.17 Strata 3.22 Reliability (not inter-rater) .82
Model, Sample: RMSE .08 Adj (True) S.D. .18 Separation 2.28 Strata 3.38 Reliability (not inter-rater) .84
Model, Fixed (all same) chi-squared: 66.3 d.f.: 11 significance (probability): .00
Model, Random (normal) chi-squared: 9.4 d.f.: 10 significance (probability): .49
Inter-Rater agreement opportunities: 384 Exact agreements: 108 = 28.1% Expected: 82.6 = 21.5%

With extremes, Model, Populn: RMSE 1.05 Adj (True) S.D. 1.98 Separation 1.88 Strata 2.84 Reliability .78
With extremes, Model, Sample: RMSE 1.05 Adj (True) S.D. 2.01 Separation 1.91 Strata 2.89 Reliability .79
Without extremes, Model, Populn: RMSE 1.02 Adj (True) S.D. 1.71 Separation 1.68 Strata 2.57 Reliability .74
Without extremes, Model, Sample: RMSE 1.02 Adj (True) S.D. 1.75 Separation 1.71 Strata 2.62 Reliability .75
With extremes, Model, Fixed (all same) chi-squared: 175.9 d.f.: 34 significance (probability): .00
With extremes, Model, Random (normal) chi-squared: 33.8 d.f.: 33 significance (probability): .43

In summary:

"model" = "the unexpectedness in this facet is considered to be the randomness predicted by the Rasch model"

"population" = the elements in this facet are the entire population of possible elements

"sample" = the elements in this facet are a sample from the entire population of possible elements

"fixed" = we are testing the hypothesis "all the elements of this facet have statistically the same measure"

"random" = we are testing the hypothesis "all the elements of this facet are a random sample from a normally-distributed population"

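RMSE = root-mean-square standard error: the statistical average of the standard errors of the measures, i.e., the square root of the mean of the squared standard errors. It reports the overall precision of the measurement of the elements in the facet. RMSE is heavily influenced by the count of observations of each element: more observations per element give smaller standard errors, and so a smaller RMSE.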
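The statistics reported on the "Model, Populn" and "Model, Sample" lines all build on RMSE. The following is a minimal Python sketch of how they fit together, assuming the relationships given in the definitions list below: Adj (True) S.D. = sqrt(S.D.² - RMSE²), Separation = Adj (True) S.D. / RMSE, Strata = (4*Separation + 1)/3, and Reliability = true variance / observed variance = Separation²/(1 + Separation²). The function names and the three standard errors passed to `rmse` are hypothetical illustrations, not part of Facets.

```python
import math


def rmse(standard_errors):
    """Root-mean-square standard error: sqrt(mean(SE_i^2))."""
    if not standard_errors:
        raise ValueError("need at least one standard error")
    return math.sqrt(sum(se * se for se in standard_errors) / len(standard_errors))


def separation_statistics(observed_sd, rmse_value):
    """Adj (True) S.D., Separation, Strata and Reliability from S.D. and RMSE."""
    true_variance = observed_sd ** 2 - rmse_value ** 2
    # Assumption of this sketch: if the error variance exceeds the observed
    # variance, the "true" variance is set to zero rather than left negative.
    adj_sd = math.sqrt(max(true_variance, 0.0))
    separation = adj_sd / rmse_value
    strata = (4 * separation + 1) / 3
    # Sep^2 / (1 + Sep^2) = true variance / (true variance + error variance)
    reliability = separation ** 2 / (1 + separation ** 2)
    return {"adj_sd": adj_sd, "separation": separation,
            "strata": strata, "reliability": reliability}


# Hypothetical standard errors for three elements (not from the output above).
print(f"RMSE: {rmse([0.07, 0.08, 0.09]):.3f}")

# Rounded values from the output above: S.D. (Population) = .19, RMSE = .08.
for name, value in separation_statistics(0.19, 0.08).items():
    print(f"{name}: {value:.2f}")
```

Starting from the two-decimal values shown in the output, the sketch gives Separation of about 2.15 rather than the reported 2.17. The gap most likely comes from Facets working with unrounded values; Adj (True) S.D. (.17) and Reliability (.82) agree with the "Model, Populn" line.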

Mean =	arithmetic average
Count =	number of elements reported
S.D. (Populn)	is the standard deviation when this sample comprises the entire population. If the element list includes every possible element for the facet: use the Population statistics, e.g., grade levels, genders (sexes), ...
S.D. (Sample)	is the standard deviation when this sample is a random sample from the population. If there are "more like this" elements in addition to the current elements: use the Sample statistics, e.g., candidates, items (usually), tasks, ....
With extremes	including elements with extreme (zero and perfect, minimum possible and maximum possible) scores
Without extremes	excluding elements with extreme (zero and perfect, minimum possible and maximum possible) scores
Model	Estimated as though all noise in the data is due to model-predicted stochasticity (i.e., the best-case situation for randomness in the data)
Real	Estimated as though all unpredicted noise is contradicting model expectations (i.e., the worst-case situation)
RMSE	root mean square standard error (i.e., the average S.E. statistically) for all non-extreme measures.
Adj (True) S.D.	"true" sample standard deviation of the estimates after adjusting for measurement error, i.e., sqrt(S.D.² - RMSE²), using the Population or Sample S.D. as appropriate
Separation	Adj "true" S.D. / RMSE, a measure of the spread of the estimates relative to their precision. The signal-to-noise ratio is the "true" variance/error variance = Separation². See also Separation.
Strata	(4*Separation + 1)/3, a measure of the spread of the estimates relative to their precisions, when extreme measures are assumed to represent extreme "true" abilities. See also Strata.
Reliability (not inter-rater)	Spearman reliability: Rasch-measure-based equivalent to the KR-20 or Cronbach Alpha raw-score-based statistic, i.e., the ratio of "True variance" to "Observed variance" (Spearman 1904, 1911). This shows how different the measures are, which may or may not indicate how "good" the test is. High (near 1.0) person and item reliabilities are preferred. This reliability is somewhat the opposite of an interrater reliability, so low (near 0.0) judge and rater reliabilities are preferred. See also Reliability.
Fixed (all same) chi-squared:	A test of the "fixed effect" hypothesis: "Can this set of elements be regarded as sharing the same measure after allowing for measurement error?" The chi-squared value and degrees of freedom (d.f.) are shown. The significance is the probability of observing a chi-squared value at least this large if this "fixed" hypothesis were true, so a significance near .00 indicates that the elements do not share the same measure. Depending on the sub-Table, this tests the hypothesis "Can these items be thought of as equally difficult?" The precise statistical formulation is:
$$w_i = \frac{1}{SE_i^2}, \quad i = 1, \dots, L \qquad \chi^2 = \sum_{i=1}^{L} w_i D_i^2 - \frac{\left(\sum_{i=1}^{L} w_i D_i\right)^2}{\sum_{i=1}^{L} w_i} \qquad \text{d.f.} = L - 1$$
where $L$ is the number of items and $D_i$ is the difficulty (easiness) of item $i$. Or this tests the hypothesis "Can these raters be thought of as equally lenient?", i.e., is there a statistically significant rater effect? The precise statistical formulation is:
$$w_j = \frac{1}{SE_j^2}, \quad j = 1, \dots, J \qquad \chi^2 = \sum_{j=1}^{J} w_j C_j^2 - \frac{\left(\sum_{j=1}^{J} w_j C_j\right)^2}{\sum_{j=1}^{J} w_j} \qquad \text{d.f.} = J - 1$$
where $J$ is the number of raters and $C_j$ is the leniency (severity) of rater $j$. And so on for the other facets.
Random (normal) chi-squared:	A test of the "random effects" hypothesis: "Can this set of elements be regarded as a random sample from a normal distribution?" The significance is the probability of observing a chi-squared value at least this large if this "random" hypothesis were true. This tests the hypothesis: "Can these persons (items, raters, etc.) be thought of as sampled at random from a normally distributed population?" The precise statistical formulation is:
$$\operatorname{var}(D) = \frac{\sum_{i=1}^{L} (D_i - \bar{D})^2}{L - 1} - \frac{\sum_{i=1}^{L} SE_i^2}{L} \qquad w_i = \frac{1}{\operatorname{var}(D) + SE_i^2}$$
$$\chi^2 = \sum_{i=1}^{L} w_i D_i^2 - \frac{\left(\sum_{i=1}^{L} w_i D_i\right)^2}{\sum_{i=1}^{L} w_i} \qquad \text{d.f.} = L - 2$$
where $\bar{D}$ is the mean of the $D_i$.
Rater agreement opportunities	Reported when Inter-rater= facet-number is specified. See Table 7 Agreement statistics.
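The Fixed (all same) and Random (normal) chi-squared formulas above share one weighted statistic and differ only in their weights and degrees of freedom. The following is a minimal Python sketch, assuming measures and standard errors are passed as plain lists in logits; the function names and the rater severities in the example are hypothetical, not Facets output or a Facets API. It returns only the chi-squared value and d.f.; the significance would come from the upper tail of the chi-squared distribution, e.g., `scipy.stats.chi2.sf(chi_squared, df)` if SciPy is available.

```python
def weighted_chi_square(measures, weights):
    """Sum(w*D^2) - (Sum(w*D))^2 / Sum(w), the statistic shared by both tests."""
    sum_w = sum(weights)
    sum_wd = sum(w * d for w, d in zip(weights, measures))
    sum_wd2 = sum(w * d * d for w, d in zip(weights, measures))
    return sum_wd2 - sum_wd * sum_wd / sum_w


def fixed_chi_square(measures, standard_errors):
    """Fixed (all same) test: w_i = 1/SE_i^2, d.f. = L - 1."""
    weights = [1.0 / (se * se) for se in standard_errors]
    return weighted_chi_square(measures, weights), len(measures) - 1


def random_chi_square(measures, standard_errors):
    """Random (normal) test: w_i = 1/(var(D) + SE_i^2), d.f. = L - 2."""
    n = len(measures)
    mean = sum(measures) / n
    var_d = (sum((d - mean) ** 2 for d in measures) / (n - 1)
             - sum(se * se for se in standard_errors) / n)
    # The help text does not say how a non-positive denominator is handled;
    # this sketch simply refuses to compute in that case.
    if any(var_d + se * se <= 0 for se in standard_errors):
        raise ValueError("var(D) + SE_i^2 must be positive for every element")
    weights = [1.0 / (var_d + se * se) for se in standard_errors]
    return weighted_chi_square(measures, weights), n - 2


# Hypothetical rater severities (logits) and standard errors, for illustration only.
severities = [-0.5, 0.0, 0.2, 0.6]
errors = [0.2, 0.2, 0.25, 0.25]
print("Fixed:  chi-squared %.2f, d.f. %d" % fixed_chi_square(severities, errors))
print("Random: chi-squared %.2f, d.f. %d" % random_chi_square(severities, errors))
```

For the 12 elements summarized in the example output, these d.f. rules give 11 (fixed) and 10 (random), matching the reported values.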

Help for Facets Rasch Measurement and Rasch Analysis Software: www.winsteps.com Author: John Michael Linacre.

