Subsets and connection ambiguities in the data
You see: Warning: Data are ambiguously connected into 6 subsets. Measures may not be comparable across subsets.
Quick (but arbitrary) solution:
Let's assume your data are scored 0 or 1. Then:
a. if subsets of items have the same average difficulty
Add two dummy persons with scored response strings:
10101010....
01010101....
b. if subsets of persons have the same average ability
Add two dummy items (columns) with scored response strings:
01
10
01
10
..
More detail: add to the data file two dummy person records so that all persons and items become directly comparable.
Dichotomous data:
Dummy person 1: responses: 010101010...
Dummy person 2: responses: 101010101...
This says: "the middle level of performance for all subsets of persons is the same."
Rating scale data, where "1" is the lowest category, and "5" is the highest category:
Dummy person 1: responses: 1212121212...
Dummy person 2: responses: 2121212121...
This says: "the bottom level of performance for all subsets of persons is the same."
If you are concerned the dummy persons or items will skew your statistics, give them very small weights: PWEIGHT= or IWEIGHT=
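The dummy response strings above can be generated rather than typed by hand. The following Python lines are a minimal sketch: the function name `dummy_records` and its parameters are illustrative and not part of Winsteps. It only builds the two alternating strings; they still have to be added to the data file as person records.

```python
def dummy_records(n_items, low="0", high="1"):
    """Return two alternating dummy response strings of length n_items.

    low and high are the two adjacent category codes to alternate
    (hypothetical parameter names, chosen for this sketch).
    """
    first = "".join(low if i % 2 == 0 else high for i in range(n_items))
    second = "".join(high if i % 2 == 0 else low for i in range(n_items))
    return first, second


# Dichotomous data, 10 items
print(dummy_records(10))                      # ('0101010101', '1010101010')

# Rating scale 1-5: alternate the two lowest categories
print(dummy_records(10, low="1", high="2"))   # ('1212121212', '2121212121')
```

Every item then has one dummy person scoring in the lower category and one scoring in the higher category.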
Explanation: Connectivity (or subsetting) is a concern in any data analysis involving missing data. In general,
nested data are not connected.
fully-crossed data (also called "complete data") are connected.
partially-crossed data may or may not be connected.
Winsteps examines the response strings for all the persons. It verifies that every non-extreme response string is linked into one network of success and failure on the items. Similarly, the strings of responses to the items are linked into one network of success and failure by the persons.
If person response string A has a success on item 1 and a failure on item 2, and response string B has a failure on item 1 and a success on item 2, then A and B are connected. This examination is repeated for all pairs of response strings and all pairs of items. Gradually all the persons are connected with all the other persons, and all the items are connected with all the other items. But if some persons or some items cannot be connected in this way, then Winsteps reports a "connectivity" problem, and reports which subsets of items and persons are connected.
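The pairwise test just described can be sketched in a few lines of Python. This is an illustration of the rule only, not the Winsteps routine. It assumes that both strings are aligned on the same items, that responses are single-digit scored codes, and that "x" marks missing data; the function name `directly_connected` is hypothetical.

```python
def directly_connected(a, b, missing="x"):
    """True if a scores higher than b on one item and lower on another."""
    a_higher = b_higher = False
    for ra, rb in zip(a, b):
        if ra == missing or rb == missing:
            continue  # no comparison is possible on this item
        if int(ra) > int(rb):
            a_higher = True
        elif int(rb) > int(ra):
            b_higher = True
    return a_higher and b_higher


print(directly_connected("01111", "10111"))  # True: each succeeds where the other fails
print(directly_connected("011", "011"))      # False: identical strings
print(directly_connected("0x1", "10x"))      # False: only one shared item
```

A pair that fails this test can still be connected indirectly through other persons, so a False result for one pair does not by itself mean the data are in subsets.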
Mathematics: connectivity is part of Graph Theory. The person/item/judge/... parameters of the Rasch model are the vertices and the observations are the edges. In an undirected graph, we need every vertex to be connected directly or indirectly to every other vertex. A connection is established between two vertices when one vertex is observed to have both a higher observation and a lower observation than another vertex in the same context, or when both vertices have the same intermediate category of a rating scale in the same context.
Thus there are two situations for failure to connect:
1) there is no direct or indirect link between two vertices, e.g., two different datasets analyzed together with no common parameters. This is detected by the Winsteps/Facets subset routine.
2) the vertices are connected by observations, but the observations do not meet the requirements, e.g., all the persons respond to all the items, but half the persons score in the upper half of the rating scale on every item, and the other half of the persons score in the lower half of the rating scale on every item. This is called a "Guttman split" in the data. It is usually obvious in the reported estimates as a big gap on the Wright maps between the two halves of the person distribution.
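One way to picture both situations is as mutual reachability in a directed "scored higher than" graph: person A points to person B if A scored higher than B on some shared item, and two persons belong to the same subset only if each can be reached from the other. The Python sketch below works under that assumption. It is not the Winsteps subset routine, it omits the rule about shared intermediate rating-scale categories, and the name `connected_subsets` is hypothetical.

```python
from itertools import combinations


def connected_subsets(responses, missing="x"):
    """Group persons whose response strings are mutually reachable.

    responses: dict of label -> response string, all aligned on the
    same items, with single-digit scored codes.
    """
    labels = list(responses)
    beats = {p: set() for p in labels}  # edge a -> b: a scored higher than b
    for a, b in combinations(labels, 2):
        for ra, rb in zip(responses[a], responses[b]):
            if missing in (ra, rb):
                continue
            if int(ra) > int(rb):
                beats[a].add(b)
            elif int(rb) > int(ra):
                beats[b].add(a)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in beats[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {p: reachable(p) for p in labels}
    groups, assigned = [], set()
    for p in labels:
        if p in assigned:
            continue
        group = sorted(q for q in reach[p] if p in reach[q])
        groups.append(group)
        assigned.update(group)
    return groups


# A loop of successes and failures connects three persons indirectly.
print(connected_subsets({"10": "0x1", "11": "10x", "12": "x10"}))
# [['10', '11', '12']]

# A Guttman split: two persons always score lower than the other two.
print(connected_subsets({"13": "01", "14": "10", "15": "23", "16": "32"}))
# [['13', '14'], ['15', '16']]
```

In the second call every person responded to every item, so the data are linked in the sense of situation 1, yet the one-way comparisons leave two groups that cannot be placed relative to each other.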
Example 1: Connection problems and subsets in the data are shown in this dataset. It is Examsubs.txt.
Title = "Example of subset reporting"
Name1 = 1
Namelength = 24 ; include response string in person label
Item1 = 13
NI = 12
CODES = 0123 ; x is missing data
ISGROUPS = DDDDDDDDDDRR ; items 1-10 are dichotomies; items 11-12 share a rating scale
MUCON = 3 ; Subsetting can cause very slow convergence
TFILE=*
18.1
14.1
0.4
*
&End
01 Subset 1
02 Subset 1
03 Subset 2
04 Subset 2
05 Subset 7
06 Subset 4
07 Subset 4
08 Subset 5
09 Subset 5
10 Subset 5
11 Subset 6
12 Subset 6
END LABELS
01 Extreme 11111
02 Subset 1 01111
03 Subset 1 10111
04 Subset 2 00101
05 Subset 2 00011
06 Subset 3 011
07 Subset 3 011
08 Subset 4 001
09 Subset 4 010
10 Subset 5 0x1
11 Subset 5 10x
12 Subset 5 x10
13 Subset 6 01
14 Subset 6 10
15 Subset 6 23
16 Subset 6 32
The Iteration Screen reports:
CONVERGENCE TABLE
-Control: \HOLDW95\examples\examsubs.txt Output: \examples\ZOU571WS.TXT
| PROX ACTIVE COUNT EXTREME 5 RANGE MAX LOGIT CHANGE |
| ITERATION PERSON ITEM CATS PERSON ITEM MEASURES STRUCTURE|
>=====================================<
| 1 15 12 8 2.00 1.06 -2.0794 |
>=====================================<
| 2 15 12 6 2.38 1.84 2.6539 -1.6094 |
>=====================================<
| 3 14 12 6 2.67 1.60 2.7231 .0000 |
>=====================================<
| 4 14 12 6 2.68 2.33 -2.3912 .0000 |
>=====================================<
| 5 14 12 6 2.97 1.77 2.3246 |
>=====================================<
| 6 14 12 6 2.97 2.54 -2.2191 |
>=====================================<
| 7 14 12 6 3.22 2.10 2.0372 |
Probing data connection: to skip out: Ctrl+F - to bypass: subset=no
Processing unanchored persons ...
>=====================================<
Consolidating 9 potential subsets pairwise ...
>=================================
Consolidating 9 potential subsets indirectly pairwise ...
>=====================================<
Consolidating 8 potential subsets pairwise ...
>=================================
Consolidating 7 potential subsets pairwise ...
>=================================
Consolidating 7 potential subsets indirectly pairwise ...
>=====================================<
Warning: Data are ambiguously connected into 7 subsets. Measures may not be comparable across subsets.
Subsets details are in Table 0.4
PERSON STATISTICS: ENTRY ORDER
--------- ---------------------------
|ENTRY | |
|NUMBER | PERSON |
|-------- +-------------------------|
| 1 | 01 Extreme 11111 | MAXIMUM MEASURE
< Guttman split here >
| 2 | 02 Subset 1 01111 | SUBSET 1
| 3 | 03 Subset 1 10111 | SUBSET 1
< Guttman split here >
| 4 | 04 Subset 2 00101 | SUBSET 2
| 5 | 05 Subset 2 00011 | SUBSET 2
< Guttman split here >
| 6 | 06 Subset 3 011 | SUBSET 3
| 7 | 07 Subset 3 011 | SUBSET 3
< Guttman split here >
| 8 | 08 Subset 4 001 | SUBSET 4
| 9 | 09 Subset 4 010 | SUBSET 4
< Subset split here >
| 10 | 10 Subset 5 0x1 | SUBSET 5
| 11 | 11 Subset 5 10x | SUBSET 5 < Indirect connection >
| 12 | 12 Subset 5 x10 | SUBSET 5
< Subset split here>
| 13 | 13 Subset 6 01| SUBSET 6
| 14 | 14 Subset 6 10| SUBSET 6
< undetected Guttman split here: Winsteps failed! >
| 15 | 15 Subset 6 23| SUBSET 6
| 16 | 16 Subset 6 32| SUBSET 6
|-------- +-------------------------|
Messages in Tables and Notes, with explanations:
< Guttman split here >
The persons above the split performed an unknowable amount differently from the persons below the split. There is no pair of items such that this subset succeeded on one where the other subset failed, and failed on the other where the other subset succeeded. The data are not "well-conditioned" (Fischer G.H., Molenaar, I.W. (eds.) (1995) Rasch models: foundations, recent developments, and applications. New York: Springer-Verlag. p. 41-43).
< Subset split here >
The persons in this subset responded to different items than persons in other subsets. We don't know if these items are easier or harder than items in other subsets.
< Indirect connection >
The persons responded to different items, but they are connected by a loop of successes and failures.
< undetected Guttman split here >
Winsteps subset-detection did not report that persons 13 and 14 always score lower than persons 15 and 16, causing a Guttman split. We do not know how much better persons 15 and 16 are than persons 13 and 14. Winsteps subset-detection may fail to report subsets. Unreported subsets usually cause big jumps in the reported measures.
Data are ambiguously connected
Measures for persons in different subsets are not comparable. Winsteps always reports measures, but these are only valid within subsets. We do not know how the measures for persons in one subset compare with the measures for persons in another subset. Reliability coefficients are accidental and so is Table 20, the score-to-measure Table. Fit statistics and standard errors are approximately correct.
Measures may not be comparable across subsets
Please always investigate when Winsteps reports subsets, even if you think that all your measures are comparable.
MAXIMUM MEASURE, MINIMUM MEASURE, DROPPED, INESTIMABLE
Persons and items with special features are not included in subsets. Extreme scores (zero, minimum possible and perfect, maximum possible scores) imply measures that are beyond the current frame of reference. Winsteps uses Bayesian logic to provide measures corresponding to those scores.
SUBSET 1, 2, 4
These are directly connected subsets. Within each subset, a person has succeeded on an item and failed on an item, and vice-versa. The person performances are directly pairwise comparable within the subset. The persons in this subset have either succeeded on items in other subsets, or failed on items in other subsets, or have missing data on items in other subsets.
SUBSET 3
These two persons have the same responses, so they are in the same subset. No one else both succeeded on their failed item and failed on one of their successful items.
SUBSET 5
This is an indirectly connected subset. There is a loop of successes and failures so that the performances of all three persons are connected indirectly pairwise.
SUBSET 6
Persons 13 and 14 are directly comparable using categories 0 and 1 of the rating scale. Persons 15 and 16 are directly comparable using categories 2 and 3 of the rating scale. Winsteps has not detected that persons 13 and 14 always rate lower than persons 15 and 16, causing a Guttman split.
SUBSET 7 (Table 14.1)
No person is in the same subset as this item. There is no subset in which persons both succeeded and failed on this item.
Connecting SUBSETs
Here are approaches:
1. Collect more data that links items across subsets. Please start Winsteps analysis as soon as you start data collection. Then subset problems can be remedied before data collection ends.
2. Dummy data. Include data for imaginary people in the data file that connects the subsets.
3. Anchor persons or items. Anchor equivalent items (or equivalent persons) in the different subsets to the same values, or adjust the anchor values so that the mean of each subset is the same (or meets whatever other relationship you choose).
4. Analyze each subset of persons and items separately. In Table 0.4, Winsteps reports entry numbers for each person and each item in each subset, so that you can compare their response strings. To analyze only the items and persons in a particular subset, such as subset 4 above (items 6-7, persons 8-9), specify the items and persons in the subset:
IDELETE= +6-7
PDELETE= +8-9
Memory was not allocatable to probe connectivity
If the data are complete, ignore this message. If the data are sparse, add dummy data records. They will have little influence on connected data, but will connect up data that are in subsets. See also: Memory
--------- -----------------
|ENTRY | |
|NUMBER | ITEM G |
|-------- +---------------|
| 1 | 01 Subset 1 D | SUBSET 1
| 2 | 02 Subset 1 D | SUBSET 1
< Guttman split here >
| 3 | 03 Subset 2 D | SUBSET 2
| 4 | 04 Subset 2 D | SUBSET 2
< Guttman split here >
| 5 | 05 Subset 7 D | SUBSET 7
< Guttman split here >
| 6 | 06 Subset 4 D | SUBSET 4
| 7 | 07 Subset 4 D | SUBSET 4
< Guttman split here >
| 8 | 08 Subset 5 D | SUBSET 5
| 9 | 09 Subset 5 D | SUBSET 5
| 10 | 10 Subset 5 D | SUBSET 5
< Guttman split here >
| 11 | 11 Subset 6 R | SUBSET 6
| 12 | 12 Subset 6 R | SUBSET 6
|-------- +---------------|
Table 0.4 reports
SUBSET DETAILS
Subset 1 of 2 ITEM and 2 PERSON
ITEM: 1-2
PERSON: 2-3
Subset 2 of 2 ITEM and 2 PERSON
ITEM: 3-4
PERSON: 4-5
Subset 3 of 2 PERSON
PERSON: 6-7
Subset 4 of 2 ITEM and 2 PERSON
ITEM: 6-7
PERSON: 8-9
Subset 5 of 3 ITEM and 3 PERSON
ITEM: 8-10
PERSON: 10-12
Subset 6 of 2 ITEM and 4 PERSON
ITEM: 11-12
PERSON: 13-16
Subset 7 of 1 ITEM
ITEM: 5
Example 2: Analyzing two separate datasets together.
Dataset 1. The Russian students take the Russian items. This is connected. All the data are in one subset.
Dataset 2. The American students take the American items. This is connected. All the data are in one subset.
Dataset 3. Datasets 1 and 2 are put into one analysis. This is not connected. The data form two subsets: the Russian one and the American one. The raw scores or Rasch measures of the Russian students cannot be compared to those of the American students. For instance, if the Russian students score higher than the American students, are the Russian students more able or are the Russian items easier? The data cannot tell us which is true.
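Example 2 is the "no common parameters" situation: no observation links the two groups at all. A minimal Python sketch of that check treats persons and items as nodes, joins each person to every item the person responded to, and counts the resulting groups. The function name `link_components` and the person and item labels are hypothetical, and this is not the Winsteps routine. It also says nothing about Guttman splits, which can occur even when every person is linked to every item.

```python
def link_components(observations):
    """Group persons and items that are linked by at least one observation.

    observations: iterable of (person, item) pairs that were observed.
    Returns a list of sets of ("P", person) and ("I", item) nodes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for person, item in observations:
        parent[find(("P", person))] = find(("I", item))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


russian = [("R1", "RU1"), ("R1", "RU2"), ("R2", "RU1"), ("R2", "RU2")]
american = [("A1", "US1"), ("A1", "US2"), ("A2", "US1"), ("A2", "US2")]

print(len(link_components(russian)))             # 1
print(len(link_components(russian + american)))  # 2

# One American student also answering a Russian item links the groups.
print(len(link_components(russian + american + [("A1", "RU1")])))  # 1
```

Being linked in this sense is necessary for comparable measures but not sufficient: the linking responses must also include both successes and failures in each direction.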
Winsteps attempts to estimate an individual measure for each person and item within one frame of reference. Usually this happens. But there are exceptions.
The initial implementation used the algorithm of David L. Weeks and Donald R. Williams, Technometrics 6:3, p. 319-324, August 1964, but this fails for indirect linking.
Help for Winsteps Rasch Measurement and Rasch Analysis Software: www.winsteps.com. Author: John Michael Linacre
Winsteps® is a registered trademark