(Separation) Reliability and Strata
These statistics report "reliably different". They are the opposite of inter-rater reliability statistics, which are intended to report "reliably the same."
The reported "Separation" Reliability is the Rasch equivalent of the KR-20 or Cronbach Alpha "test reliability" statistic, i.e., the ratio of "True variance" to "Observed variance" for the elements of the facet. This shows how reproducible the ordering of the measures is. It may or may not indicate how "good" the test is in other respects. High (near 1.0) person and item reliabilities are preferred. This "separation" reliability is roughly the opposite of an inter-rater reliability, so low (near 0.0) judge and rater separation reliabilities are preferred.
Since the "true" variance of a sample can never be known, but only approximated, the "true" reliability can also only be approximated. All reported reliabilities, such as KR-20, Cronbach Alpha, and the Separation Reliability, are only approximations. These approximations are all attempts to compute:
"Separation" Reliability = True Variance / Observed Variance
Facets computes upper and lower boundary values for the region in which the true reliability lies. When SE=Model, the upper boundary, the "Model" reliability, is computed on the basis that all unexpectedness in the data is Rasch-predicted randomness.
When SE=Real, the lower boundary, the "Real" reliability, is computed on the basis that all unexpectedness in the data contradicts the Rasch model. The unknowable True reliability generally lies somewhere between these two. As contradictory sources of noise are removed from the data, the reported Model and Real reliabilities become closer, and the True Reliability approaches the Model Reliability.
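The ratio True Variance / Observed Variance can be sketched numerically. This is a minimal illustration, not Facets' own code. It assumes the usual Rasch approximation that true variance is the observed variance of the measures minus the mean squared standard error, and it assumes a population (not sample) variance. The function name and the data are hypothetical.

```python
from statistics import pvariance

def separation_reliability(measures, standard_errors):
    """Approximate True Variance / Observed Variance for one facet.

    Assumption: true variance = observed variance - mean squared S.E.,
    floored at zero so the ratio cannot go negative.
    """
    observed_var = pvariance(measures)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var if observed_var > 0 else 0.0

# Hypothetical element measures (logits) with two sets of standard errors.
measures = [-1.0, 0.0, 1.0]
model_rel = separation_reliability(measures, [0.5, 0.5, 0.5])  # smaller S.E.s
real_rel = separation_reliability(measures, [0.6, 0.6, 0.6])   # larger S.E.s
print(round(model_rel, 3), round(real_rel, 3))
```

With the same measures, the larger standard errors give the lower reliability, which is why the "Real" value sits at or below the "Model" value.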
The "model" reliability is based on the model standard errors, which are computed on the basis that all superfluous unexpectedness in the data is the randomness predicted by the Rasch model.
The "real" reliability is based on the hypothesis that superfluous randomness in the data contradicts the Rasch model:
Real S.E. = Model S.E. * sqrt(Max(INFIT MnSq, 1))
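As a small sketch of the formula above (the function name and the sample values are hypothetical, chosen only for illustration):

```python
import math

def real_se(model_se, infit_mnsq):
    """Real S.E. = Model S.E. * sqrt(Max(INFIT MnSq, 1)).

    An infit mean-square below 1 leaves the model S.E. unchanged.
    """
    return model_se * math.sqrt(max(infit_mnsq, 1.0))

# Hypothetical elements: (model S.E., infit mean-square)
elements = [(0.50, 0.80), (0.50, 1.00), (0.50, 1.44)]
real_ses = [real_se(se, infit) for se, infit in elements]
print(real_ses)
```

Only the third element, whose infit mean-square exceeds 1, has its standard error inflated, so the Real S.E. is never smaller than the Model S.E.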
Conventionally, only a Person Reliability is reported and called the "test reliability". Facets reports separation reliabilities for all facets. Separation reliability is estimated on the premise that the elements are locally independent, specifically that raters are acting as "independent experts", not as "scoring machines". When raters do act as "scoring machines", Facets overestimates reliability. It would be the same as running MCQ bubble sheets twice through an optical scanner, thereby doubling the number of "items" per person, and then claiming that we had increased test reliability! To help identify this situation, Facets reports the extent to which the raters are acting as "independent experts", an aspect of inter-rater reliability; see Table 7 Agreement Statistics.
Separation = True S.D. / Average measurement error
This estimates the number of statistically distinguishable levels of performance in a normally distributed sample with the same "true S.D." as the empirical sample, when the tails of the normal distribution are modeled as due to measurement error. See www.rasch.org/rmt/rmt94n.htm
Strata = (4*Separation + 1)/3
This estimates the number of statistically distinguishable levels of performance in a normally distributed sample with the same "true S.D." as the empirical sample, when the tails of the normal distribution are modeled as extreme "true" levels of performance. See www.rasch.org/rmt/rmt163f.htm
So, if the sample separation is 2, then strata are (4*2+1)/3 = 3.
Separation = 2: The test is able to statistically distinguish between high and low performers.
Strata = 3: The test is able to statistically distinguish between very high, middle and very low performers.
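The Separation and Strata formulas can be sketched as follows. The function names are hypothetical. Two assumptions are made: "average measurement error" is supplied as a single root-mean-square error value, and observed variance is taken to be true variance plus error variance, which gives Reliability = Separation^2 / (1 + Separation^2).

```python
def separation(true_sd, avg_error):
    """Separation = True S.D. / Average measurement error."""
    return true_sd / avg_error

def strata(sep):
    """Strata = (4*Separation + 1)/3."""
    return (4 * sep + 1) / 3

def reliability_from_separation(sep):
    """Assumes observed variance = true variance + error variance."""
    return sep ** 2 / (1 + sep ** 2)

# Hypothetical sample: true S.D. of 1.0 logit, average error of 0.5 logits.
sep = separation(1.0, 0.5)
print(sep, strata(sep), reliability_from_separation(sep))  # 2.0 3.0 0.8
```

Under these assumptions a separation of 2 corresponds to 3 strata and a reliability of 0.8.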
Strata vs. Separation: which one to report depends on the nature of the measure distribution.
Statistically:
If the distribution is hypothesized to be normal, then use separation.
If the distribution is hypothesized to be heavy-tailed, then use strata.
Substantively:
If very high and very low scores are probably due to accidental circumstances, then use separation.
If very high and very low scores are probably due to very high and very low abilities, then use strata.
If in doubt, assume that outliers are accidental, and use separation.
Example: I have 3 criteria in my analysis. Facets reports 32 Strata.
Explanation: "Strata" is a conceptual number, based on a hypothetical normal distribution of criteria with the same mean and S.D. as the observed criteria. Each of the infinitely many criteria in the hypothetical distribution has the same precision (S.E.) as the average S.E. of the observed criteria. The result is that there are 32 statistically different levels of difficulty in the hypothetical distribution. The number is large because each observed criterion has a small S.E., owing to the large number of observations of each criterion.
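To see what 32 strata implies, the Strata formula can be inverted to recover the separation. This is only an arithmetic sketch, and the function name is hypothetical.

```python
def separation_from_strata(strata):
    """Invert Strata = (4*Separation + 1)/3 to recover Separation."""
    return (3 * strata - 1) / 4

sep = separation_from_strata(32)
print(sep)  # 23.75
```

A separation of 23.75 means the "true" S.D. of the criteria is about 24 times their average S.E., which is consistent with very precisely measured criteria rather than with a large number of criteria.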
Help for Facets (64-bit) Rasch Measurement and Rasch Analysis Software: www.winsteps.com Author: John Michael Linacre.