Subsets and connection ambiguities

You see: Warning: Data are ambiguously connected into 6 subsets. Measures may not be comparable across subsets.

Quick (but arbitrary) solution:

Let's assume your data are scored 0 or 1. Then:

a. if subsets of items have the same average difficulty

Add two dummy persons with scored response strings:

10101010....

01010101....

b. if subsets of persons have the same average ability

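Add two dummy items (columns) whose scored responses alternate in the same way down the persons: the first column 1, 0, 1, 0, ... and the second column 0, 1, 0, 1, ...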

Subsets: are these because different persons were administered non-overlapping subsets of items?

If so, can we assume that each subset of items is equally difficult on average?

If so, we only need 2 dummy persons, because the subsetting is independent of the sample size.

The easiest two dummy persons have response strings for all the items:

101010....

010101...

So the two dummy persons will have about the same ability and about the same success rate on all the subsets of items. They will tie the subsets together.

The estimation may converge faster if you have more pairs of dummy persons, but then they will start to reduce the spread of the item difficulties in each subset.

More detail: add to the data file two dummy person records so that all persons and items become directly comparable.

Dichotomous data:

Dummy person 1: responses: 010101010...

Dummy person 2: responses: 101010101...

This says: "the middle level of performance for all subsets of persons is the same."

Rating scale data, where "1" is the lowest category, and "5" is the highest category:

Dummy person 1: responses: 1212121212...

Dummy person 2: responses: 2121212121...

This says: "the bottom level of performance for all subsets of persons is the same."
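The dummy response strings above can be typed by hand, but for long tests it may be easier to generate them. The following is a minimal Python sketch, not part of Winsteps; the function name `dummy_person_strings` and its parameters are hypothetical. It only builds the two complementary alternating strings. Placing them in the item columns of the data file (from `Item1=` onward) with suitable person labels is left to the analyst.

```python
def dummy_person_strings(n_items, low="0", high="1"):
    """Return two complementary alternating response strings of length n_items."""
    first = "".join(high if i % 2 == 0 else low for i in range(n_items))
    second = "".join(low if i % 2 == 0 else high for i in range(n_items))
    return first, second


# Dichotomous items scored 0/1, for a 12-item test
a, b = dummy_person_strings(12)
print(a)  # 101010101010
print(b)  # 010101010101

# Rating scale whose two lowest categories are 1 and 2, for a 10-item test
c, d = dummy_person_strings(10, low="1", high="2")
print(c)  # 2121212121
print(d)  # 1212121212
```

Using the two lowest categories corresponds to the "bottom level of performance" assumption described above.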

If you are concerned the dummy persons or items will skew your statistics, give them very small weights: PWEIGHT= or IWEIGHT=

Explanation: Connectivity (or subsetting) is a concern in any data analysis involving missing data. In general,

nested data are not connected.

fully-crossed data (also called "complete data") are connected.

partially-crossed data may or may not be connected.

Winsteps examines the response strings for all the persons. It verifies that every non-extreme response string is linked into one network of success and failure on the items. Similarly, the strings of responses to the items are linked into one network of success and failure by the persons.

If person response string A has a success on item 1 and a failure on item 2, and response string B has a failure on item 1 and a success on item 2, then A and B are connected. This examination is repeated for all pairs of response strings and all pairs of items. Gradually all the persons are connected with all the other persons, and all the items are connected with all the other items. But if some persons or some items cannot be connected in this way, then Winsteps reports a "connectivity" problem, and reports which subsets of items and persons are connected.
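The pairwise rule for two response strings can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not the Winsteps routine. The function name `directly_connected` is hypothetical, and it assumes each string covers the same items in the same order, with "x" marking a missing response.

```python
def directly_connected(a, b, missing="x"):
    """True if a scores higher than b on one shared item and lower on another."""
    a_higher = False
    b_higher = False
    for ra, rb in zip(a, b):
        if ra == missing or rb == missing:
            continue  # only items answered by both persons can connect them
        if int(ra) > int(rb):
            a_higher = True
        elif int(ra) < int(rb):
            b_higher = True
    return a_higher and b_higher


print(directly_connected("01111", "10111"))  # True
print(directly_connected("01111", "00101"))  # False
print(directly_connected("0x1", "10x"))      # False
```

The second pair is never reversed: the first person is at least as successful on every item. The third pair shares only one item, so it cannot be connected directly. It may still be connected indirectly through other persons.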

Mathematics: connectivity is part of Graph Theory. The person/item/judge/... parameters of the Rasch model are the vertices and the observations are the edges. In an undirected graph, we need every vertex to be connected directly or indirectly to every other vertex. A connection is established between two vertices when one vertex is observed to have both a higher observation and a lower observation than another vertex in the same context, or when both vertices have the same intermediate category of a rating scale in the same context.

Thus there are two situations in which vertices fail to connect:

1) there is no direct or indirect link between two vertices, e.g., two different datasets analyzed together with no common parameters. This is detected by the Winsteps/Facets subset routine.

2) the vertices are connected by observations, but the observations do not meet the requirements, e.g., all the persons respond to all the items, but half the persons score in the upper half of the rating scale on every item, and the other half of the persons score in the lower half of the rating scale on every item. This is called a "Guttman split" in the data. This is usually obvious in the reported estimates as a big gap on the Wright maps between the two halves of the person distribution.
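One way to picture both situations is as a directed graph in which an edge runs from person a to person b whenever a scores higher than b on an item they both answered. Persons then belong to the same group only if each can be reached from the other by following edges. The Python sketch below finds these mutually reachable groups with a simple search. It is an illustration under these assumptions, not the Winsteps/Facets subset routine, and its groupings can differ from those Winsteps reports. For instance, it does not pool persons with identical response strings, and it ignores the "same intermediate category" rule. All function names are hypothetical, and "x" is assumed to mark a missing response.

```python
def higher_than_graph(strings, missing="x"):
    """Edge i -> j when person i scores higher than person j on a shared item."""
    graph = {i: set() for i in range(len(strings))}
    for i, a in enumerate(strings):
        for j, b in enumerate(strings):
            if i != j and any(
                ra != missing and rb != missing and int(ra) > int(rb)
                for ra, rb in zip(a, b)
            ):
                graph[i].add(j)
    return graph


def reachable(graph, start):
    """All vertices that can be reached from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def mutually_connected_groups(strings, missing="x"):
    """Group persons so that every member can reach every other member."""
    graph = higher_than_graph(strings, missing)
    reach = {i: reachable(graph, i) for i in graph}
    groups, assigned = [], set()
    for i in graph:
        if i not in assigned:
            group = sorted(j for j in reach[i] if i in reach[j])
            groups.append(group)
            assigned.update(group)
    return groups


# Guttman split: the first two persons always score at least as high as the last two
print(mutually_connected_groups(["23", "32", "01", "10"]))  # [[0, 1], [2, 3]]

# Indirect loop: no pair is directly connected, but the loop joins all three
print(mutually_connected_groups(["0x1", "10x", "x10"]))     # [[0, 1, 2]]
```

In the first example all four persons answered both items, yet two groups result, which is the second situation above. In the second example no two persons meet the pairwise rule, but the loop of successes and failures puts all three in one group.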

Example 1: Connection problems and subsets in the data are shown in this dataset. It is Examsubs.txt.

Title = "Example of subset reporting"
Name1 = 1
Namelength = 24 ; include response string in person label
Item1 = 13
NI = 12
CODES = 0123 ; x is missing data
ISGROUPS = DDDDDDDDDDRR ; items 1-10 are dichotomies; items 11-12 share a rating scale
MUCON = 3 ; Subsetting can cause very slow convergence
TFILE=*
18.1
14.1
0.4
*
&End
01 Subset 1
02 Subset 1
03 Subset 2
04 Subset 2
05 Subset 7
06 Subset 4
07 Subset 4
08 Subset 5
09 Subset 5
10 Subset 5
11 Subset 6
12 Subset 6
END LABELS
01 Extreme  11111
02 Subset 1 01111
03 Subset 1 10111
04 Subset 2 00101
05 Subset 2 00011
06 Subset 3     011
07 Subset 3     011
08 Subset 4     001
09 Subset 4     010
10 Subset 5        0x1
11 Subset 5        10x
12 Subset 5        x10
13 Subset 6           01
14 Subset 6           10
15 Subset 6           23
16 Subset 6           32

The Iteration Screen reports:

CONVERGENCE TABLE

-Control: \HOLDW95\examples\examsubs.txt Output: \examples\ZOU571WS.TXT

| PROX ACTIVE COUNT EXTREME 5 RANGE MAX LOGIT CHANGE |

| ITERATION PERSON ITEM CATS PERSON ITEM MEASURES STRUCTURE|

>=====================================<

| 1 15 12 8 2.00 1.06 -2.0794 |

>=====================================<

| 2 15 12 6 2.38 1.84 2.6539 -1.6094 |

>=====================================<

| 3 14 12 6 2.67 1.60 2.7231 .0000 |

>=====================================<

| 4 14 12 6 2.68 2.33 -2.3912 .0000 |

>=====================================<

| 5 14 12 6 2.97 1.77 2.3246 |

>=====================================<

| 6 14 12 6 2.97 2.54 -2.2191 |

>=====================================<

| 7 14 12 6 3.22 2.10 2.0372 |

Probing data connection: to skip out: Ctrl+F - to bypass: subset=no

Processing unanchored persons ...

>=====================================<

Consolidating 9 potential subsets pairwise ...

>=================================

Consolidating 9 potential subsets indirectly pairwise ...

>=====================================<

Consolidating 8 potential subsets pairwise ...

>=================================

Consolidating 7 potential subsets pairwise ...

>=================================

Consolidating 7 potential subsets indirectly pairwise ...

>=====================================<

Warning: Data are ambiguously connected into 7 subsets. Measures may not be comparable across subsets.

Subsets details are in Table 0.4

Table 18.1

PERSON STATISTICS: ENTRY ORDER
------------------------------------
|ENTRY   |                         |
|NUMBER  | PERSON                  |
|--------+-------------------------|
|      1 | 01 Extreme  11111       | MAXIMUM MEASURE
           < Guttman split here >
|      2 | 02 Subset 1 01111       | SUBSET 1
|      3 | 03 Subset 1 10111       | SUBSET 1
           < Guttman split here >
|      4 | 04 Subset 2 00101       | SUBSET 2
|      5 | 05 Subset 2 00011       | SUBSET 2
           < Guttman split here >
|      6 | 06 Subset 3     011     | SUBSET 3
|      7 | 07 Subset 3     011     | SUBSET 3
           < Guttman split here >
|      8 | 08 Subset 4     001     | SUBSET 4
|      9 | 09 Subset 4     010     | SUBSET 4
           < Subset split here >
|     10 | 10 Subset 5        0x1  | SUBSET 5
|     11 | 11 Subset 5        10x  | SUBSET 5 < Indirect connection >
|     12 | 12 Subset 5        x10  | SUBSET 5
           < Subset split here >
|     13 | 13 Subset 6           01| SUBSET 6
|     14 | 14 Subset 6           10| SUBSET 6
           < undetected Guttman split here: Winsteps failed! >
|     15 | 15 Subset 6           23| SUBSET 6
|     16 | 16 Subset 6           32| SUBSET 6
|--------+-------------------------|

In Tables and Notes:	Explanation:
< Guttman split here >	The persons above the split differ in performance from the persons below the split by an unknowable amount. No item on which this subset succeeded and another subset failed is matched by an item on which this subset failed and the other subset succeeded. The data are not "well-conditioned" (Fischer G.H., Molenaar, I.W. (eds.) (1995) Rasch models: foundations, recent developments, and applications. New York: Springer-Verlag. p. 41-43).
< Subset split here >	The persons in this subset responded to different items than persons in other subsets. We don't know if these items are easier or harder than items in other subsets.
< Indirect connection >	The persons responded to different items, but they are connected by a loop of successes and failures.
< undetected Guttman split here >	Winsteps subset-detection did not report that persons 13 and 14 always score lower than persons 15 and 16, causing a Guttman split. We do not know how much better persons 15 and 16 are than persons 13 and 14. Winsteps subset-detection may fail to report subsets. Unreported subsets usually cause big jumps in the reported measures.
Data are ambiguously connected	Measures for persons in different subsets are not comparable. Winsteps always reports measures, but these are only valid within subsets. We do not know how the measures for persons in one subset compare with the measures for persons in another subset. Reliability coefficients are accidental and so is Table 20, the score-to-measure Table. Fit statistics and standard errors are approximately correct.
Measures may not be comparable across subsets	Please always investigate when Winsteps reports subsets, even if you think that all your measures are comparable.
MAXIMUM MEASURE, MINIMUM MEASURE, DROPPED, INESTIMABLE	Persons and items with special features are not included in subsets. Extreme scores (zero, minimum possible and perfect, maximum possible scores) imply measures that are beyond the current frame of reference. Winsteps uses Bayesian logic to provide measures corresponding to those scores.
SUBSET 1, 2, 4	These are directly connected subsets. Within each subset, a person has succeeded on an item and failed on an item, and vice-versa. The person performances are directly pairwise comparable within the subset. The persons in this subset have either succeeded on items in other subsets, or failed on items in other subsets, or have missing data on items in other subsets.
SUBSET 3	These two persons have the same responses, so they are in the same subset. No one succeeded on their failed item and also failed on one of their successful items.
SUBSET 5	This is an indirectly connected subset. There is a loop of successes and failures so that the performances of all three persons are connected indirectly pairwise.
SUBSET 6	Persons 13 and 14 are directly comparable using categories 0 and 1 of the rating scale. Persons 15 and 16 are directly comparable using categories 2 and 3 of the rating scale. Winsteps has not detected that persons 13 and 14 always rate lower than persons 15 and 16, causing a Guttman split.
SUBSET 7 (Table 14.1)	No person is in the same subset as this item. There is no subset in which persons both succeeded and failed on this item.
Connecting SUBSETs	Here are approaches: 1. Collect more data that links items across subsets. Please start Winsteps analysis as soon as you start data collection. Then subset problems can be remedied before data collection ends. 2. Dummy data. Include data for imaginary people in the data file that connect the subsets. 3. Anchor persons or items. Anchor equivalent items (or equivalent persons) in the different subsets to the same values, or adjust the anchor values so that, for example, the mean of each subset is the same. 4. Analyze each subset of persons and items separately. In Table 0.4, Winsteps reports entry numbers for each person and each item in each subset, so that you can compare their response strings. To analyze only the items and persons in a particular subset, such as subset 4 above, specify the items and persons in the subset: IDELETE= +6-7 PDELETE= +8-9
Memory was not allocatable to probe connectivity	If the data are complete, ignore this message. If the data are sparse, add dummy data records. They will have little influence on connected data, but will connect up data that are in subsets. See also Memory

Table 14.1

--------------------------
|ENTRY   |               |
|NUMBER  | ITEM        G |
|--------+---------------|
|      1 | 01 Subset 1 D | SUBSET 1
|      2 | 02 Subset 1 D | SUBSET 1
           < Guttman split here >
|      3 | 03 Subset 2 D | SUBSET 2
|      4 | 04 Subset 2 D | SUBSET 2
           < Guttman split here >
|      5 | 05 Subset 7 D | SUBSET 7
           < Guttman split here >
|      6 | 06 Subset 4 D | SUBSET 4
|      7 | 07 Subset 4 D | SUBSET 4
           < Guttman split here >
|      8 | 08 Subset 5 D | SUBSET 5
|      9 | 09 Subset 5 D | SUBSET 5
|     10 | 10 Subset 5 D | SUBSET 5
           < Guttman split here >
|     11 | 11 Subset 6 R | SUBSET 6
|     12 | 12 Subset 6 R | SUBSET 6
|--------+---------------|

Table 0.4 reports

SUBSET DETAILS

Subset 1 of 2 ITEM and 2 PERSON

ITEM: 1-2

PERSON: 2-3

Subset 2 of 2 ITEM and 2 PERSON

ITEM: 3-4

PERSON: 4-5

Subset 3 of 2 PERSON

PERSON: 6-7

Subset 4 of 2 ITEM and 2 PERSON

ITEM: 6-7

PERSON: 8-9

Subset 5 of 3 ITEM and 3 PERSON

ITEM: 8-10

PERSON: 10-12

Subset 6 of 2 ITEM and 4 PERSON

ITEM: 11-12

PERSON: 13-16

Subset 7 of 1 ITEM

ITEM: 5

Example 2: Analyzing two separate datasets together.

Dataset 1. The Russian students take the Russian items. This is connected. All the data are in one subset.

Dataset 2. The American students take the American items. This is connected. All the data are in one subset.

Dataset 3. Datasets 1 and 2 are put into one analysis. This is not connected. The data form two subsets: the Russian one and the American one. The raw scores or Rasch measures of the Russian students cannot be compared to those of the American students. For instance, if the Russian students score higher than the American students, are the Russian students more able or are the Russian items easier? The data cannot tell us which is true.
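The first kind of disconnection, two datasets with nothing in common, can be checked without looking at the scores at all. It is enough to ask which persons and items are linked by any observation. The Python sketch below does this with a small union-find structure. The person and item names are invented for illustration, the function name `observation_components` is hypothetical, and this is not the Winsteps routine. Passing this check is necessary but not sufficient: data in one component can still contain a Guttman split.

```python
def observation_components(observations):
    """Group persons and items linked, directly or indirectly, by any observation."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for person, item in observations:
        root_p = find(("person", person))
        root_i = find(("item", item))
        if root_p != root_i:
            parent[root_p] = root_i

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical (person, item) pairs: each student answers only their own country's items
separate = [
    ("Ivan", "R1"), ("Ivan", "R2"), ("Olga", "R1"), ("Olga", "R2"),
    ("Ann", "A1"), ("Ann", "A2"), ("Bob", "A1"), ("Bob", "A2"),
]
print(len(observation_components(separate)))  # 2

# One Russian student also answering an American item links the two datasets
linked = separate + [("Ivan", "A1")]
print(len(observation_components(linked)))    # 1
```

A single shared observation is enough to join the components in this sketch, but in practice several common items or persons are needed before the link between the two frames of reference is trustworthy.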

Winsteps attempts to estimate an individual measure for each person and item within one frame of reference. Usually this happens. But there are exceptions.

The initial implementation used the algorithm of David L. Weeks and Donald R. Williams (Technometrics 6:3, p. 319-324, August 1964), but this fails for indirect linking.

Help for Winsteps Rasch Measurement and Rasch Analysis Software: www.winsteps.com. Author: John Michael Linacre
