Table 3.2+ Summary of dichotomous, rating scale or partial credit structures

(controlled by STEPT3=, STKEEP=, MRANGE=, MTICK=, DISCRIM=, CMATRIX=)

Table 3.1: Gives summaries for all persons and items.

Table 3.2: Summary of rating categories, probability curves and category (confusion) matrix

The average measures and category fit statistics are how the response structure worked "for this sample" (which might have high or low performers etc.). For each observation in category k, there is a person of measure Bn and an item of measure Di. Then:

average measure = sum( Bn - Di ) / count of observations in category. These are not estimates of parameters.

The probability curves are how the response structure is predicted to work for any future sample, provided it worked satisfactorily for this sample.

Our logic is that if the average measures and fit statistics don't look reasonable for this sample, why should they in any future sample? If they look OK for this sample, then the probability curves tell us about future samples. If they don't look right now, then we can anticipate problems in the future.

Persons and items with extreme scores (maximum possible and minimum possible) are omitted from Table 3.2 because they provide no information about the relative difficulty of the categories. See Table 14.3 for their details

a) For dichotomies,

SUMMARY OF CATEGORY STRUCTURE. Model="R"

FOR GROUPING "0" ACT NUMBER: 12 Go to museum

ACT DIFFICULTY MEASURE OF -1.14 ADDED TO MEASURES

-----------------------------------------------------------------------

|-------------------+------------+------------+-----------------+-----|

| 1 1 13 17| -.45 -.06| .83 .52| 100% 23% .6686| | 1 Neutral

| 2 2 62 83| 1.06 .98| .78 .85| 86% 100% .1725| 1.23| 2 Like

-----------------------------------------------------------------------

|MISSING 51 40| -.54 | | | |

-----------------------------------------------------------------------

OBSERVED AVERAGE is mean of measures in category. It is not a parameter estimate.

M->C = Does Measure imply Category?

C->M = Does Category imply Measure?

ITEM MEASURE OF -1.07 ADDED TO MEASURES

When there is only one item in a grouping (the Partial Credit model), the item measure is added to the reported measures.

CATEGORY LABEL is the number of the category in your data set after scoring/keying.

CATEGORY SCORE is the ordinal value of the category used in computing raw scores - and in Table 20.
MISSING are missing responses.

OBSERVED COUNT and % is the count of occurrences of this category. Counts by data code are given in the distractor Tables, e.g., Table 14.3. % is percentage of non-missing data, except for MISSING% which is of all data.

OBSVD AVERGE is the average of the (person measures - item difficulties) that are modeled to produce the responses observed in the category. This excludes observations in extreme response strings. The average measure is expected to increase with category value. Disordering is marked by "*". This is a description of the sample, not a Rasch parameter. Only observations used for estimating the Andrich thresholds are included in this average (not observations in extreme scores.) For all observations, see Table 14.3 For each observation in category k, there is a person of measure Bn and an item of measure Di. Then: average measure = sum( Bn - Di ) / count of observations in category.

Disordered categories: since extreme scores are theoretically at infinity, their inclusion in the observed-average computation would tend to force the top and bottom categories to be ordered. For evaluating category disordering, we need response strings in which categories can be compared with each other, i.e., non-extreme response strings.

SAMPLE EXPECT is the expected value of the average measure for this sample. These values always advance with category. This is a description of the sample, not a Rasch parameter.

INFIT MNSQ is the average of the INFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. = Category infit mean-square = sum over all observations in the category ((category of response - expected value of the response)^2) / sum (model variance of the expected values)

OUTFIT MNSQ is the average of the OUTFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. This statistic is sensitive to grossly unexpected responses which are listed in Tables 6.6, 10.6 and similar.

Note: Winsteps reports the MNSQ values in Table 3.2. An approximation to their standardized values can be obtained by using the number of observations in the category as the degrees of freedom, and then looking at the plot.

COHERENCE: the usefulness of M->C and C->M depends on the inferences you intend to make from your data.

M->C shows what percentage of the measures that were expected to produce observations in this category actually did. Do the measures imply the category?

Some users of Winsteps, particularly in medical applications, need to know

1) how accurately patient measure is predicted from single observations. This is C->M.
C->M is 10% means that we can predict the estimated measure from the observed category about 1 time in 10.

2) how accurately patient measure predicts single observations. This is M->C.
M->C is 50% means that we can predict the observed category from the estimated measure half the time.

In general, the wider the categories on the latent variable (more dispersed thresholds), the higher these percents will be.

Example with two observations:

observed category expected score

1 1.1 Yes for category 1 Yes for measure 1

2 1.1 No for category 2 No for measure 1

C->M for category 1 = 100%

C->M for category 2 = 0%

M->C for category 1 = 50%

Guttman's Coefficient of Reproducibility is the count-weighted average of the M->C, i.e., Reproducibility = sum across categories (COUNT * M->C) / sum(COUNT * 100)

C->M shows what percentage of the observations in this category were produced by measures corresponding to the category. Does the category imply the measures?

Computation: For each observation in the XFILE=,

"expected response value" - round this to the nearest category number = expected average category

if "expected average category" = "response value (after scoring and recounting)" then MC = 1, else, MC = 0

Compute average of MC for each observed category across all the relevant data for C->M

Compute average of MC for each expected category across all the relevant data for M->C

RMSR is the root-mean-square residual, summarizing the differences between observations in this category and their expectations (excluding observations in extreme scores).

ESTIM DISCR is an estimate of the local discrimination when the model is parameterized in the form: log-odds = aj (Bn - Di - Fj). Its expected value is 1.0.

RESIDUAL (when shown) is the residual difference between the observed and expected counts of observations in the category. Shown as % of expected, unless observed count is zero. Then residual count is shown. Only shown if residual count is >= 1.0. Indicates lack of convergence, structure anchoring, or large data set.

CATEGORY CODES are shown to the right from on CODES=. The original data code is shown. If several data codes are scored to the same value, only the first data code found in CODES= is shown. Change the reported data code by changing the order of the data codes on CODES= with matching changes in NEWSCORE= and IVALUE=.

CATEGORY LABELS are shown to the extreme right based on CFILE= and CLFILE=. The label from CFILE= is reported, if any. If none, then the label from CLFILE= is reported, if any.

Measures corresponding to the dichotomous categories are not shown, but can be computed using the Table at "What is a Logit?" and LOWADJ= and HIADJ=.

DICHOTOMOUS CURVES

P -+--------------+--------------+--------------+--------------+-

R 1.0 + +

O | |

B |0 1|

A | 000000 111111 |

B .8 + 00000 11111 +

I | 0000 1111 |

L | 0000 1111 |

I | 000 111 |

T .6 + 000 111 +

Y | 000 111 |

.5 + *** +

O | 111 000 |

F .4 + 111 000 +

| 111 000 |

R | 1111 0000 |

E | 1111 0000 |

S .2 + 11111 00000 +

P | 111111 000000 |

O |1 0|

N | |

S .0 + +

E -+--------------+--------------+--------------+--------------+-

-2 -1 0 1 2

KID [MINUS] TAP MEASURE

Dichotomous category probability curves. The curve for "1" is also the model Item Characteristic Curve (ICC), also called the Item Response Function (IRF). Rasch model dichotomous curves are all the same.

b) For rating scales and partial credit items, the structure calibration table lists:

SUMMARY OF CATEGORY STRUCTURE. Model="R"

FOR GROUPING "0" ACT NUMBER: 1 Watch birds

ACT DIFFICULTY MEASURE OF 1.25 ADDED TO JMLE MEASURES

ACT DIFFICULTY MEASURE OF 1.15 ADDED TO CMLE MEASURES

-------------------------------------------------------------------------------

|CATEGORY OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| JMLE | CMLE |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT| MNSQ MNSQ||THRESHOLD|THRESHOLD| MEASURE|

|---------------------+------------+------------++---------+---------+--------|

| 0 0 3 4| -1.16 -.63| .73 .73|| NONE | NONE |( -3.83)| 0 Dislike

| 1 1 33 43| -.12 .05| .61 .75|| -3.94 | -3.62 | -1.19 | 1 Neutral

| 2 2 37 49| 1.19 .85| .98 .72|| -.92 | -.74 | 1.90 | 2 Like

| 3 3 1 1| 1.05* 1.70| 1.56 1.90|| 3.65 | 3.65 | 2.70 | 3

| 4 4 0 0| | .00 .00|| NULL | NULL | 3.16 |

| 5 5 2 3| .39* 2.53| 2.50 7.38|| 1.22 | .71 |( 3.74)| 4

-------------------------------------------------------------------------------

OBSERVED AVERAGE is mean of measures in category. It is not a parameter estimate.

Unobserved category. Consider: STKEEP=NO

ITEM MEASURE OF -.89 ADDED TO MEASURES

When there is only one item in a grouping (the Partial Credit model), the item measure is added to the reported measures.

CATEGORY LABEL, the number of the category in your data set after scoring/keying.

CATEGORY SCORE is the value of the category in computing raw scores - and in Table 20.

OBSERVED COUNT and %, the count of occurrences of this category. Counts by data code are given in the distractor Tables, e.g., Table 14.3.

OBSVD AVERGE is the average of the (person measures - item difficulties) that are model led to produce the responses observed in the category. The average measure is expected to increase with category value. Disordering is marked by "*". This is a description of the sample, not the estimate of a parameter. For each observation in category k, there is a person of measure Bn and an item of measure Di. Then: average measure = sum( Bn - Di ) / count of observations in category.

SAMPLE EXPECT is the expected value of the average measure for this sample. These values always advance with category. This is a description of the sample, not a Rasch parameter.

INFIT MNSQ is the average of the INFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. Only values greater than 1.5 are problematic. Category infit mean-square = sum over all observations in the category ((category of response - expected value of the response)^2) / sum (model variance of the expected values)

ANDRICH THRESHOLD or JMLE THRESHOLD, the calibrated measure of the transition from the category below to this category. This is an estimate of the Rasch-Andrich model parameter, Fj. Use this for anchoring in Winsteps. (This corresponds to Fj in the Di+Fj parametrization of the "Rating Scale" model, and is similarly applied as the Fij of the (delta) Dij=Di+Fij of the "Partial Credit" model.) The bottom category has no prior transition, and so that the measure is shown as NONE. This parameter, sometimes called the Step Difficulty, Step Calibration, Rasch-Andrich threshold, Tau or Delta, indicates how difficult it is to observe a category, not how difficult it is to perform it. The Rasch-Andrich threshold is expected to increase with category value. Disordering of these estimates (so that they do not ascend in value up the rating scale), sometimes called "disordered deltas", indicates that the category is relatively rarely observed, i.e., occupies a narrow interval on the latent variable, and so may indicate substantive problems with the rating (or partial credit) scale category definitions. These Rasch-Andrich thresholds are relative pair-wise measures of the transitions between categories. They are the points at which adjacent category probability curves intersect. They are not the measures of the categories. See plots below. NULL indicates that this intermediate category is not observed in the data, but is included in the threshold structure. See Category Widths, Thurstone Thresholds and Andrich Thresholds

CMLE THRESHOLD, the calibrated measure of the transition from the category below to this category. This is an estimate of the Rasch-Andrich model parameter, Fj, using CMLE estimation.

CATEGORY MEASURE, the difficulty of the category. Add the item difficulty to the category measure to obtain the difficulty of the category for each item for comparison with the person ability.

The sample-free measure corresponding to this category. ( ) is printed where the matching calibration is infinite. The value shown corresponds to the measure .25 score points (or LOWADJ= and HIADJ=) away from the extreme. This is the best basis for the inference: "ratings averaging x imply measures of y" or "measures of y imply ratings averaging x". This is implied by the Rasch model parameters. These are plotted in Table 2.2 and the Expected Score ICC.

"Category measures" answer the question "If there were a thousand people at the same location on the latent variable and their average rating was the category value, e.g., 2.0, then where would that location be, relative to the item?" This seems to be what people mean when they say "a performance at level 2.0". It is estimated from the Rasch expectation

To discover this ability location, we start with the Rasch model, log (Pnij / Pni(j-1) ) = Bn - Di - Fj, For known Di, Fj and trial Bn. This produces a set of {Pnij}.

Compute the expected rating score: Eni = sum (jPnij) across the categories.

Adjust Bn' = Bn + (desired category - Eni) / (large divisor), until Eni = desired category, when Bn is the desired category measure.

Here is how to apply this formula. The purpose of this formula is to find the person ability measure corresponding to an expected score on an item.

a) Usually Winsteps has done this for us. Look at ISFILE= - the columns labeled "AT CAT", "BOT+0.25", "Top-0.25" show the person abilities, Bn, corresponding to an expected score equal to each category value of the rating scale, or a reasonable proxy for the extreme categories.

b) If you want to apply the formula yourself to find the expected value corresponding to an expected value of, say, 2.3 on an item with a 1-5 rating scale. Here's what to do.

1) Select the item, i, and the expected value, E, that you want.

2) Look at the Winsteps EFILE= output

If the expected value, E, is not there, then ...

c) set Bn = the ability with the Eni closest to the E you want. Note down Eni, the expected value corresponding to Bn in the EFILE=

3) Look up the item difficulty Di in Table 14, and the Andrich thresholds, {Fj}, in Table 3.2

4) Now start to apply the formula:

Bn' = Bn + (desired category - Eni) / (large divisor)

desired category = your value E

Bn and Eni are from step 2)

(large divisor) = 10 - this produces a small increment along the latent variable. Its size doesn't really matter. If this iterative process does not converge, try again with a ten times larger divisor.

We now have Bn'

5) Using Excel or similar, apply the Rasch model with Bn', Di, {Fj}, to obtain Eni' correspond to Bn'

6) If Eni' is not close enough to your desired E, then back to step 4)

with Bn = value of Bn', and Eni = value of Eni'

---------------------------------------------------------------------------------------------------------------

| LABEL MEASURE S.E. | MEASURE| AT CAT. ----ZONE----|PROBABLTY| M->C C->M RMSR |DISCR|RESIDUAL DIFFERENCE|

|------------------------+--------+---------------------+---------+-----------------+-----+-------------------|

| 0 NONE | NONE |( -3.83) -INF -2.82| | 0% 0% 1.0236| | .1% .0 | 0 Dislike

| 1 -2.69 .61 | -2.47 | -1.19 -2.82 .41| -2.74 | 68% 79% .3834| 1.12| .1% .0 | 1 Neutral

| 2 .32 .26 | .41 | 1.90 .41 2.42| .36 | 74% 68% .5298| 1.72| .0% .0 | 2 Like

| 3 4.90 .68 | 4.80 | 2.70 2.42 2.92| 2.85 | 0% 0% 1.3095| .74| -.2% .0 | 3

| 4 NULL | NULL | 3.16 2.92 3.47| 2.94 | 0% 0% .0000| 1.00| .0 |

| 5 2.47 .83 | 1.86 |( 3.74) 3.47 +INF | 2.94 | 0% 0% 3.5260|-12.8| -.8% .0 | 4

---------------------------------------------------------------------------------------------------------------

M->C = Does Measure imply Category?

C->M = Does Category imply Measure?

CATEGORY LABEL, the number of the category in your data set after scoring/keying.

JMLE STRUCTURE MEASURE, is the Rasch-Andrich threshold, the item measure add to the calibrated measure of this transition from the category below to this category. The bottom category has no prior transition, and so that the measure is shown as NONE. The Rasch-Andrich threshold is expected to increase with category value, but these can be disordered. "Dgi + Fgj" locations are plotted in Table 2.4, where "g" refers to the ISGROUPS= assignment. See Rating scale conceptualization.

For structures with only a single polytomous item, the Partial Credit Model, this is is the Rasch model parameter δ ij in the δ ij parametrization of the "Partial Credit" model.)
Item difficulty: Di = average(δ ij) for all thresholds j of item i.
Andrich threshold relative to the item difficulty: Fij = δ ij - Di
Andrich threshold relative to the latent variable: STRUCTURE MEASURE = δ ij = Di + Fij
When we map the location of Di on the latent variable (or inspect the algebra of the PCM model), we discover that Di is at the location on the latent variable where the highest and lowest categories of the item are equally probable. Conveniently, the sufficient statistic for Di is the item's raw score.

STRUCTURE S.E. is an approximate standard error of the Rasch-Andrich threshold measure.

CMLE MEASURE is the CMLE equivalent of the JMLE STRUCTURE MEASURE

SCORE-TO-MEASURE

These values are plotted in Table 21, "Expected Score" ogives. They are useful for quantifying category measures. This is implied by the Rasch model parameters. See Rating scale conceptualization.

AT CAT is the Rasch-full-point-threshold, the measure (on an item of 0 logit measure) corresponding to an expected score equal to the category label, which, for the rating (or partial credit) scale model, is where this category has the highest probability. See plot below.

( ) is printed where the matching calibration is infinite. The value shown corresponds to the measure .25 score points (or LOWADJ= and HIADJ=) away from the extreme.

--ZONE-- is the Rasch-half-point threshold, the range of measures from an expected score from 1/2 score-point below to the category to 1/2 score-point above it, the Rasch-half-point thresholds. Measures in this range (on an item of 0 measure) are expected to be observed, on average, with the category value. See plot below.

50% CUMULATIVE PROBABILITY gives the location of median probabilities, i.e. these are Rasch-Thurstonian thresholds, similar to those estimated in the "Graded Response" or "Proportional odds" models. At these calibrations, the probability of observing the categories below equals the probability of observing the categories equal or above. The .5 or 50% cumulative probability is the point on the variable at which the category interval begins. This is implied by the Rasch model parameters. See Rating scale conceptualization.

COHERENCE

M->C shows what percentage of the measures that were expected to produce observations in this category actually did. Do the measures imply the category?
For RSM, everything is relative to each target item in turn. So, the 69% for category 1 is a summary across the persons with abilities in the range, "zone", from -2.63 to .85 logits relative to each item difficulty, which are all the observations where (-2.63 ≤ (Bn-Di) ≤ .85). Of these, 69% are in category 1. Depending on the targeting of the persons and the items, there could be many eligible observations or only a few.

Guttman's Coefficient of Reproducibility is the count-weighted average of the M->C, i.e., Reproducibility = sum across categories (COUNT * M->C) / sum(COUNT * 100)

C->M shows what percentage of the observations in this category were produced by measures corresponding to the category. Does the category imply the measures?

RMSR is the root-mean-square residual, summarizing the differences between observations in this category and their expectations (excluding observations in extreme scores).

ESTIM DISCR (when DISCRIM=Y) is an estimate of the local discrimination when the model is parameterized in the form: log-odds = aj (Bn - Di - Fj)

OBSERVED - EXPECTED RESIDUAL DIFFERENCE (when shown) is the residual difference between the observed and expected counts of observations in the category. This indicates that the Rasch estimates have not converged to their maximum-likelihood values. These are shown if at least one residual percent >=1%.

residual difference % = (observed count - expected count) * 100 / (expected count)

residual difference value = observed count - expected count

1. Unanchored analyses: These numbers indicate the degree to which the reported estimates have not converged. Usually performing more estimation iterations reduces the numbers.

2. Anchored analyses: These numbers indicate the degree to which the anchor values do not match the current data.

For example,

(a) iteration was stopped early using Ctrl+F or the pull-down menu option.

(b) iteration was stopped when the maximum number of iterations was reached MJMLE=

(d) anchor values (PAFILE=, IAFILE= and/or SAFILE=) are in force which do not allow maximum likelihood estimates to be obtained.

ITEM MEASURE ADDED TO MEASURES, is shown when the rating (or partial credit) scale applies to only one item, e.g., when ISGROUPS=0. Then all measures in these tables are adjusted by the estimated item measure.

CATEGORY PROBABILITIES: MODES - Andrich Thresholds at intersections

P ++---------+---------+---------+---------+---------+---------++

R 1.0 + +

O | |

B |00 22|

A | 0000 2222 |

B .8 + 000 222 +

I | 000 222 |

L | 00 22 |

I | 00 22 |

T .6 + 00 22 +

Y | 00 1111111 22 |

.5 + 0 1111 1111 2 +

O | 1** **1 |

F .4 + 11 00 22 11 +

| 111 00 22 111 |

R | 11 00 22 11 |

E | 111 0*2 111 |

S .2 + 111 22 00 111 +

P | 1111 222 000 1111 |

O |111 2222 0000 111|

N | 2222222 0000000 |

S .0 +22222222222222 00000000000000+

E ++---------+---------+---------+---------+---------+---------++

-3 -2 -1 0 1 2 3

PUPIL [MINUS] ACT MEASURE

Curves showing how probable is the observation of each category for measures relative to the item measure. Ordinarily, 0 logits on the plot corresponds to the item measure, and is the point at which the highest and lowest categories are equally likely to be observed. The plot should look like a range of hills. Categories which never emerge as peaks correspond to disordered Rasch-Andrich thresholds. These contradict the usual interpretation of categories as a being sequence of most likely outcomes.

Null, Zero, Unobserved Categories

STKEEP=YES and Category 2 has no observations:

+------------------------------------------------------------------

|CATEGORY OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| ANDRICH |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT| MNSQ MNSQ||THRESHOLD| MEASURE|

|-------------------+------------+------------++---------+--------+

| 0 0 378 20| -.67 -.73| .96 1.16|| NONE |( -2.01)|

| 1 1 620 34| -.11 -.06| .81 .57|| -.89 | -.23 |

| 2 2 0 0| | .00 .00|| NULL | .63 |

| 3 3 852 46| 1.34 1.33| 1.00 1.64|| .89 |( 1.49)|

| 4 20 1| | || NULL | |

+------------------------------------------------------------------

Category 2 is an incidental (sampling)zero. The category is maintained in the response structure.

Category 4 has been dropped from the analysis because it is only observed in extreme scores.

(1) Add one or more dummy person records to the data file with non-extreme scores, but with category 4 for the relevant item or rating scale. Rasch-analysis your data + dummy records, and save the Andrich thresholds: Winsteps "Output Files menu", SFILE=sf.txt

(2) Reanalyze your data. Omit the dummy records. Include SAFILE=sf.txt to anchor the Andrich thresholds at their (1) values.

This does an analysis with reasonable Andrich Thresholds for the items, but only uses the actual data for the sufficient statistics used to estimate the Di and the theta values.

STKEEP=NO and Category 2 has no observations:

+------------------------------------------------------------------

|CATEGORY OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| ANDRICH |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT| MNSQ MNSQ||THRESHOLD| MEASURE|

|-------------------+------------+------------++---------+--------+

| 0 0 378 20| -.87 -1.03| 1.08 1.20|| NONE |( -2.07)|

| 1 1 620 34| .13 .33| .85 .69|| -.86 | .00 |

| 3 2 852 46| 2.24 2.16| 1.00 1.47|| .86 |( 2.07)|

+------------------------------------------------------------------

Category 2 is a structural (unobservable) zero. The category is eliminated from the response structure.

Category Matrix : Confusion Matrix : Matching Matrix

-------------------------------------------------------------------------------

| Category Matrix : Confusion Matrix : Matching Matrix |

| Predicted Scored-Category Frequency |

|Obs Cat Freq| 0 1 2 3 | Total |

|------------+-------------------------------------------------+--------------|

| 0 | 12.19 10.22 2.57 .02 | 25.00 |

| 1 | 10.53 13.68 6.67 .11 | 31.00 |

| 2 | 1.93 6.57 8.63 .87 | 18.00 |

| 3 | .33 .52 .15 .00 | 1.00 |

|------------+-------------------------------------------------+--------------|

| Total | 24.98 30.99 18.03 1.01 | 75.00 |

-------------------------------------------------------------------------------

When CMATRIX=Yes, a category matrix (confusion matrix, matching matrix) is produced for each rating scale group defined in ISGROUPS=. In this example, there are four categories, 0, 1, 2, 3 observed in the data for the item. The row totals show the observed frequency of each category. According to the Rasch model, there is a probability that each observation will be in any category. These probabilities are summed for each category, and the model-predicted counts are shown in each column. When the data are strongly deterministic (Guttman-like) , then the major diagonal of the matrix will dominate.

In this matrix, there is one observation of category 3 in the data (see Row Total for category 3). Since category frequencies are sufficient statistics for the maximum likelihood estimates, one observation of category 3 is predicted in these data (see Column Total for category 3). However, the Category Matrix tells us that the observation of 3 is predicted to have been a 1 (Row 3, Column 1). And an observation of category 3 would be predicted to replace an observation of 2 (Row 2, Column 3). We can compare this finding with the report in Table 10.6. The observation of 3 has an expected value near 1.

TABLE 10.6

MOST UNEXPECTED RESPONSES

----------------------------------------------------

|------+--------+--------+--------+--------+--------+

| 3 | 3 | .83 | 2.17 | 3.23 | -2.05 |

Category Misfit

Usually any MNSQ (mean-square, red box in figure) less than 2.0 is OK for practical purposes. A stricter rule would be 1.5. Overfit (values less than 1.0) are almost never a problem.

A bigger problem than category MNSQ misfit is the disordering of the "observed averages" (blue box in Figure). These contradict the Rasch axiom that "higher measure -> higher score on the rating scale".

Also large differences between the "observed average" and the "expected average" (green box in Figure). These indicate that the misfit in the category is systematic in some way.

In principle, an "expected value" is what we would see if the data fit the Rasch model. The "observed" value is what we did see. When the "observed" and "expected" are considerably misaligned, then the validity of the data (as a basis for constructing additive measures) is threatened. However, we can usually take some simple, immediate actions to remedy these defects in the data.

Usually, a 3-stage analysis suffices:

1. Analyze the data. Identify problem areas.

2. Delete problem areas from the data (PDFILE=, IDFILE=, EDFILE=, CUTLO=, etc.). Reanalyze the data. Produce item and rating-scale anchor files (IFILE=if.txt, SFILE=sf.txt) which contain the item difficulties from the good data.

3. Reanalyze all the data using the item and rating-scale anchor files (IAFILE=if.txt, SAFILE=sf.txt) to force the "good" item difficulties to dominate the problematic data when estimating the person abilities from all the data.

Category MnSq fit statistics

For all observations in the data:

Xni is the observed value

Eni is the expected value of Xni

Wni is the model variance of the observation around its expectation

Pnik is the probability of observing Xni=k

Category Outfit statistic for category k:

[sum ((k-Eni)²/Wni) for all Xni=k] / [sum (Pnik * (k-Eni)²/Wni) for all Xni]

Category Infit statistic for category k:

[sum ((k-Eni)²) for all Xni=k] / [sum (Pnik * (k-Eni)²) for all Xni]

Where does category 1 begin?

When describing a rating-scale to our audience, we may want to show the latent variable segmented into rating scale categories:

0-----------------01--------------------12------------------------2

There are 3 widely-used ways to do this:

1. "1" is the segment on the latent variable from where categories "0" and "1" are equally probable to where categories "1" and "2" are equally probably. These are the Rasch-Andrich thresholds (ANDRICH THRESHOLD) for categories 1 and 2.

2. "1" is the segment on the latent variable from where categories "0" and "1+2" are equally probable to where categories "0+1" and "2" are equally probably. These are the Rasch-Thurstonian thresholds (50% CUM. PROBABILITY) for categories 1 and 2.

3. "1" is the segment on the latent variable from where the expected score on the item is "0.5" to where the expected score is 1.5. These are the Rasch-half-point thresholds (ZONE) for category 1.

Alternatively, we may want a point on the latent variable correspond to the category:

----------0-------------------1------------------------2-----------

4. "1" is the point on the latent variable where the expected average score is 1.0. This is the Rasch-Full-Point threshold (AT CAT.) for category 1.

Help for Winsteps Rasch Measurement and Rasch Analysis Software: www.winsteps.com. Author: John Michael Linacre

Rasch Books and Publications
Invariant Measurement: Using Rasch Models in the Social, Behavioral, and Health Sciences, 2nd Edn, 2024 George Engelhard, Jr. & Jue Wang	Applying the Rasch Model (Winsteps, Facets) 4th Ed., Bond, Yan, Heene	Advances in Rasch Analyses in the Human Sciences (Winsteps, Facets) 1st Ed., Boone, Staver	Advances in Applications of Rasch Measurement in Science Education, X. Liu & W. J. Boone	Rasch Analysis in the Human Sciences (Winsteps) Boone, Staver, Yale
Introduction to Many-Facet Rasch Measurement (Facets), Thomas Eckes	Statistical Analyses for Language Testers (Facets), Rita Green	Invariant Measurement with Raters and Rating Scales: Rasch Models for Rater-Mediated Assessments (Facets), George Engelhard, Jr. & Stefanie Wind	Aplicação do Modelo de Rasch (Português), de Bond, Trevor G., Fox, Christine M	Appliquer le modèle de Rasch: Défis et pistes de solution (Winsteps) E. Dionne, S. Béland
Exploring Rating Scale Functioning for Survey Research (R, Facets), Stefanie Wind	Rasch Measurement: Applications, Khine	Winsteps Tutorials - free Facets Tutorials - free	Many-Facet Rasch Measurement (Facets) - free, J.M. Linacre	Fairness, Justice and Language Assessment (Winsteps, Facets), McNamara, Knoch, Fan
Other Rasch-Related Resources: Rasch Measurement YouTube Channel
Rasch Measurement Transactions & Rasch Measurement research papers - free	An Introduction to the Rasch Model with Examples in R (eRm, etc.), Debelak, Strobl, Zeigenfuse	Rasch Measurement Theory Analysis in R, Wind, Hua	Applying the Rasch Model in Social Sciences Using R, Lamprianou	El modelo métrico de Rasch: Fundamentación, implementación e interpretación de la medida en ciencias sociales (Spanish Edition), Manuel González-Montesinos M.
Rasch Models: Foundations, Recent Developments, and Applications, Fischer & Molenaar	Probabilistic Models for Some Intelligence and Attainment Tests, Georg Rasch	Rasch Models for Measurement, David Andrich	Constructing Measures, Mark Wilson	Best Test Design - free, Wright & Stone Rating Scale Analysis - free, Wright & Masters
Virtual Standard Setting: Setting Cut Scores, Charalambos Kollias	Diseño de Mejores Pruebas - free, Spanish Best Test Design	A Course in Rasch Measurement Theory, Andrich, Marais	Rasch Models in Health, Christensen, Kreiner, Mesba	Multivariate and Mixture Distribution Rasch Models, von Davier, Carstensen
As an Amazon Associate I earn from qualifying purchases. This does not change what you pay.

Coming Rasch-related Events
Jan. 17 - Feb. 21, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
Feb. - June, 2025	On-line course: Introduction to Classical Test and Rasch Measurement Theories (D. Andrich, I. Marais, RUMM2030), University of Western Australia
Feb. - June, 2025	On-line course: Advanced Course in Rasch Measurement Theory (D. Andrich, I. Marais, RUMM2030), University of Western Australia
Apr. 21 - 22, 2025, Mon.-Tue.	International Objective Measurement Workshop (IOMW) - Boulder, CO, www.iomw.net
May 16 - June 20, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
June 20 - July 18, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Further Topics (E. Smith, Facets), www.statistics.com
Oct. 3 - Nov. 7, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com

Table 3.2+ Summary of dichotomous, rating scale or partial credit structures

Questions, Suggestions? Want to update Winsteps or Facets? Please email Mike Linacre, author of Winsteps mike@winsteps.com