Table 30.1 Differential item functioning DIF pairwise

Table 30 supports the investigation of item bias, Differential Item Functioning (DIF), i.e., interactions between individual items and types of persons.

 

Table
30.1 is best for pairwise comparisons, e.g., Females vs. Males
30.2 DIF report (measure list: person class within item)
30.3 DIF report (measure list: item within person class)
30.4 DIF report (item-by-person class chi-squares)
30.5 Within-class fit report (person class within item)
30.6 Within-class fit report (item within person class)
30.7 Item measure profiles for classes of persons
Excel DIF Plots
Excel DIF Scatterplots

 

You need to choose a baseline item difficulty for your DIF comparisons.

 

In Table 30.1, we usually choose one group (the majority group) to be the baseline, and DIF is computed pairwise relative to that group. Both groups' difficulty estimates have statistical uncertainty.

 

In Table 30.2, we have many roughly equally-sized groups, such as age groups, and we take the average of all the groups (the item difficulty from the main analysis, which is the best estimate when the data fit the model) as the baseline. DIF is then computed relative to this baseline, which is regarded as a known value. Only the focal group's estimate has statistical uncertainty.

 

The rules for DIF reporting are the same for Tables 30.1 and 30.2, but the underlying computations are somewhat different.

 

In Table 30.1 - the hypothesis is "this item has the same difficulty for two groups"
In Table 30.2, 30.3 - the hypothesis is "this item has the same difficulty as its average difficulty for all groups"

In Table 30.4 - the hypothesis is "this item has no overall DIF across all groups"
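
As a rough computational sketch of the difference (Python, illustrative only; this is not the Winsteps code, and the variable names are hypothetical): the Table 30.1 hypothesis is a two-sample comparison in which both estimates carry uncertainty, while the Table 30.2/30.3 hypothesis compares one class estimate against the overall difficulty treated as a known value. Table 30.4 aggregates such comparisons across all groups into a chi-square (not sketched here).

def pairwise_t(dif_a, se_a, dif_b, se_b):
    # Table 30.1: "same difficulty for two groups" - both estimates are uncertain
    contrast = dif_a - dif_b
    joint_se = (se_a ** 2 + se_b ** 2) ** 0.5
    return contrast / joint_se            # Welch t; d.f. from Welch-Satterthwaite

def against_baseline_t(dif_class, se_class, overall_difficulty):
    # Tables 30.2, 30.3: "same difficulty as its average difficulty for all groups"
    # The baseline is regarded as known, so only the class S.E. enters.
    return (dif_class - overall_difficulty) / se_class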

 

Table 30.1 reports a probability and a size for DIF statistics. Usually we want:

1. a probability so small that it is unlikely that the DIF effect is merely a random accident

2. a size so large that the DIF effect has a substantive impact on scores/measures on the test

 

A general thought: Significance tests, such as DIF tests, are always of doubtful value in a Rasch context, because differences can be statistically significant, but far too small to have any impact on the meaning, or practical use, of the measures. So we need both statistical significance and substantive difference before we take action regarding bias, etc.

 

Table 30.1 is a pairwise DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for Group B". Table 30.1 makes sense if there are only two groups, or there is one majority reference group.

 

Tables 30.2 and 30.3 are a global DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for all groups combined." Tables 30.2 and 30.3 make sense when there are many small groups, e.g., age-groups in 5 year increments from 0 to 100.

 

DIF results are considerably influenced by sample size, so if you have only two person-groups, go to Table 30.1. If you have many person-groups, go to Table 30.2.

 

Specify DIF= to identify the person-classifying columns in the person labels. Item bias and DIF are the same thing: the widespread use of "item bias" dates to the 1960s, "DIF" to the 1980s. The reported DIF is corrected for test impact, i.e., for differential average performance on the whole test. Use ability stratification to look for non-uniform DIF, using the selection rules. Tables 30.1 and 30.2 present the same information from different perspectives.

 

When this Table is requested from the Output Tables menu, the DIF/DPF dialog is displayed.

 

Table 31 supports the investigation of person bias, Differential Person Functioning (DPF), i.e., interactions between individual persons and classifications of items.

 

Table 33 reports bias or interactions between classifications of items and classifications of persons.

 

In these analyses, persons with extreme scores are excluded, because they do not exhibit differential ability across items. For background discussion, see DIF and DPF considerations.

 

Example output:

You want to examine item bias (DIF) between Females and Males. You need a column in your Winsteps person label that holds two (or more) demographic codes, say "F" for female and "M" for male (or "0" and "1" if you prefer dummy codes), here in column 9 of the person label.

 

Table 30.1 is best for pairwise comparisons, e.g., Females vs. Males.

DIF class specification is: DIF=@GENDER

 

-----------------------------------------------------------------------------------------------------------------------------------
| KID   Obs-Exp   DIF   DIF   KID   Obs-Exp   DIF   DIF      DIF    JOINT  Rasch-Welch   Mantel-Haenszel Size Active TAP          |
| CLASS Average MEASURE S.E.  CLASS Average MEASURE S.E.  CONTRAST  S.E.   t  d.f. Prob. Chi-squ Prob. CUMLOR Slices Number  Name |
|---------------------------------------------------------------------------------------------------------------------------------|
| F        .00   -6.59E  .00  M        .00   -6.59E  .00       .00   .00   .00   0 1.000                                  1 1-4   |
| F        .04   -5.24> 1.90  M       -.04   -3.87   .90     -1.37  2.10  -.65  28 .5194   .0000 1.000             7      4 1-3-4 |
| F        .01   -1.67   .68  M       -.01   -1.48   .70      -.19   .97  -.19  31 .8468   .1316 .7167   -.06      7     10 2-4-3-|
|---------------------------------------------------------------------------------------------------------------------------------|
| M        .00   -6.59E  .00  F        .00   -6.59E  .00       .00   .00   .00  30 1.000                                  1 1-4   |
-----------------------------------------------------------------------------------------------------------------------------------
Width of Mantel-Haenszel slice: MHSLICE = .010 logits

 

The most important numbers in Table 30.1: The DIF CONTRAST is the difference in difficulty of the item between the two groups. This should be at least 0.5 logits for DIF to be noticeable. "Prob." shows the probability of observing this amount of contrast by chance, when there is no systematic item bias effect. For statistically significant DIF on an item, Prob. ≤ .05.
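
A minimal screening sketch of this joint rule (Python, illustrative only; 0.5 logits and .05 are the guideline values stated above, not Winsteps settings):

def noticeable_dif(dif_contrast, prob, min_size=0.5, alpha=0.05):
    # Flag DIF that is both substantively large and statistically significant
    return abs(dif_contrast) >= min_size and prob <= alpha

# e.g., item 10 in the output above: DIF CONTRAST = -.19, Prob. = .8468
print(noticeable_dif(-0.19, 0.8468))     # False: neither large nor significant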

 

DIF class specification defines the columns used to identify DIF classifications, using DIF= and the selection rules.

For summary statistics on each class, use Table 28.

To eliminate unwanted classes: PSELECT=@GENDER={FM}

 

Reading across the Table 30.1 columns:

PERSON CLASS identifies the CLASS of persons. The person-group label ("KID" in this example) is specified with PERSON=. Here the first CLASS is "F".

 

Obs-Exp Average is the average difference between the observed and expected responses for the Class on the Item. When this is positive, the Class has higher ability than expected on this item, or equivalently, the item is easier than expected for this Class.
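
For a dichotomous item, a sketch of this computation (Python, illustrative only; assumes measures in logits and is not the Winsteps code):

import math

def obs_exp_average(responses, abilities, difficulty):
    # Average of (observed - Rasch-expected) responses for one person CLASS on one
    # dichotomous item. Positive => the item is easier than expected for this CLASS
    # (equivalently, the CLASS performs better than expected on this item).
    total = 0.0
    for x, b in zip(responses, abilities):
        expected = 1.0 / (1.0 + math.exp(-(b - difficulty)))   # Rasch P(x = 1)
        total += x - expected
    return total / len(responses)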

 

DIF estimates with the iterative-logit (Rasch-Welch) method:

DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -1.67 is the local difficulty of item 10 for CLASS "F". The more difficult the item is for the class, the higher the DIF measure. The measures are conveniently listed in the Excel file for the DIF plots, or can be copied from the Table into Excel.
For the raw scores corresponding to these measures, see Table 30.2.
A measure flagged with > (e.g., -5.24> above) corresponds to an extreme maximum person-class score. EXTRSCORE= controls the extreme-score estimate.
A measure flagged with < corresponds to an extreme minimum person-class score. EXTRSCORE= controls the extreme-score estimate.
A measure flagged with E (e.g., -6.59E above) corresponds to an item with an extreme score, which cannot exhibit DIF.
The DIF MEASURE is the same as the item difficulty obtained by doing a full analysis of the data, outputting PFILE=pf.txt and SFILE=sf.txt, and then doing another analysis with PAFILE=pf.txt, SAFILE=sf.txt, and PSELECT= set to the DIF class code.
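
Conceptually, the DIF MEASURE is the item's difficulty re-estimated from this CLASS's responses alone, with everything else anchored at the main-analysis values. A sketch for a dichotomous item (Python, illustrative only; Winsteps uses its own iterative estimation):

import math

def local_difficulty(responses, abilities):
    # Find the difficulty d at which the Rasch-expected score for the CLASS equals
    # its observed score, with the person abilities held fixed (anchored).
    # Extreme CLASS scores (all 0 or all 1) have no finite solution; Winsteps flags
    # the corresponding measures with < or >.
    observed = sum(responses)
    lo, hi = -10.0, 10.0                   # search interval in logits
    for _ in range(60):                    # bisection
        d = (lo + hi) / 2.0
        expected = sum(1.0 / (1.0 + math.exp(-(b - d))) for b in abilities)
        if expected > observed:
            lo = d                         # expected score too high: item must be harder
        else:
            hi = d
    return (lo + hi) / 2.0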

DIF S.E. is the standard error of the DIF MEASURE. A value of ".00" indicates that DIF cannot be observed in these data.

PERSON CLASS identifies the CLASS of persons, e.g., the second CLASS here is "M".

DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -1.48 is the local difficulty of item 10 for CLASS "M". > means "extreme maximum score".

DIF S.E. is the standard error of the second DIF MEASURE.

DIF CONTRAST is the "effect size" in logits (or USCALE= units): the difference between the two DIF MEASURES, i.e., the size of the DIF across the two classifications of persons, e.g., -1.67 - (-1.48) = -.19 logits for item 10. A positive DIF contrast indicates that the item is more difficult for the first, left-hand-listed CLASS.
If you want a sample-based effect size, then
effect size = DIF CONTRAST / (person sample measure S.D.)

JOINT S.E. is the standard error of the DIF CONTRAST = sqrt(first DIF S.E.² + second DIF S.E.²), e.g., 2.10 = sqrt(1.90² + .90²) for item 4 above.
Welch t gives the DIF significance as Welch's (Student's) t-statistic ≈ DIF CONTRAST / JOINT S.E. The t-test is a two-sided test for the difference between two means (i.e., the estimates) based on the standard errors of the means (i.e., the standard errors of the estimates). The null hypothesis is that the two estimates are the same, except for measurement error.

d.f. is the joint degrees of freedom, computed according to Welch-Satterthwaite. When the d.f. are large, the t statistic can be interpreted as a unit-normal deviate, i.e., z-score.

INF means "the degrees of freedom are so large they can be treated as infinite", i.e., the reported t-value is a unit normal deviate.

Prob. is the two-sided probability of Student's t. See t-statistics.
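
A sketch of these columns computed from the two DIF MEASURES and their standard errors (Python with scipy for the t-distribution; the class sizes n1 and n2 are hypothetical inputs, and the Winsteps degrees-of-freedom computation may differ in detail):

import math
from scipy import stats

def rasch_welch(m1, se1, n1, m2, se2, n2):
    # Returns DIF CONTRAST, JOINT S.E., Welch t, Welch-Satterthwaite d.f.,
    # and the two-sided probability.
    contrast = m1 - m2
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)
    t = contrast / joint_se
    df = (se1 ** 2 + se2 ** 2) ** 2 / (se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))
    prob = 2.0 * stats.t.sf(abs(t), df)
    return contrast, joint_se, t, df, prob

# e.g., item 4 above: rasch_welch(-5.24, 1.90, n1, -3.87, 0.90, n2) with the actual
# class sizes reproduces CONTRAST = -1.37, JOINT S.E. = 2.10, t = -.65 (to rounding).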

 

Mantel-Haenszel reports the Mantel-Haenszel (1959) DIF test for dichotomies, or the Mantel (1963) test for polytomies, using MHSLICE=. Statistics are reported when computable.

Chi-squ. is the Mantel-Haenszel (dichotomies) or Mantel (polytomies) chi-square, with 1 degree of freedom.

Prob. is the probability of observing these data (or worse) when there is no DIF based on a chi-square value with 1 d.f.

Size CUMLOR (cumulative log-odds ratio in logits) is an estimate of the DIF size (scaled by USCALE=). When the size is not estimable, "+." and "-." indicate its direction. For dichotomous items, this is the size of the DIF as a simple log-odds ratio. For polytomous items, no definitive polytomous DIF size statistic has been defined, but the cumulative log-odds ratio usually gives an approximate indication of the polytomous DIF size. CUMLOR is the Liu-Agresti (1996) Cumulative Log-Odds Estimator.

Active Slices is a count of the estimable stratified cross-tabulations used to compute MH. MH is sensitive to score frequencies. If you have missing data, or only small or zero counts for some raw scores, the MH statistic can go wild or not be estimable. Please try different values of MHSLICE= (thin and thick slicing) to see how robust the MH estimates are.
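
For a dichotomous item, a sketch of the stratified computation (Python, illustrative only; each MHSLICE= slice contributes one 2x2 cross-tabulation of reference/focal group by right/wrong, and only estimable slices enter the sums):

import math

def mantel_haenszel(tables):
    # tables: one (a, b, c, d) per ability slice, where a = reference right,
    # b = reference wrong, c = focal right, d = focal wrong.
    # Returns the MH chi-square (1 d.f.) and the MH log-odds ratio in logits
    # (the dichotomous counterpart of CUMLOR); positive means the item is
    # relatively easier for the reference group. Assumes at least one estimable slice.
    sum_a = sum_ea = sum_var = num = den = 0.0
    for a, b, c, d in tables:
        T = a + b + c + d
        if T < 2 or (a + c) == 0 or (b + d) == 0:
            continue                       # slice not estimable
        sum_a += a
        sum_ea += (a + b) * (a + c) / T
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (T * T * (T - 1))
        num += a * d / T
        den += b * c / T
    chi_sq = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var    # 0.5 = continuity correction
    return chi_sq, math.log(num / den)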

 

ITEM Number is the item entry number. The item-group label ("TAP" in this example) is specified with ITEM=.

Name is the item label.

 

Below "----", each line in the Table is repeated with the CLASSes reversed.

 

ETS DIF Category, with DIF Contrast and DIF Statistical Significance:

C = moderate to large:    |DIF| ≥ 0.64 logits  and  prob( |DIF| ≤ 0.43 logits ) ≤ .05 (2-sided)
                          approximately: |DIF| > 0.43 logits + 2 * DIF S.E.

B = slight to moderate:   |DIF| ≥ 0.43 logits  and  prob( |DIF| = 0 logits ) ≤ .05 (2-sided)
                          approximately: |DIF| > 2 * DIF S.E.

A = negligible:           otherwise

C-, B- = DIF against focal group; C+, B+ = DIF against reference group

ETS (Educational Testing Service) uses Delta δ units: 1 logit = 2.35 Delta δ units; 1 Delta δ unit = 0.426 logits.

Zwick, R., Thayer, D. T., & Lewis, C. (1999). An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis. Journal of Educational Measurement, 36(1), 1-28.

More explanation at www.ets.org/Media/Research/pdf/RR-12-08.pdf, pp. 3-4.
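
A sketch applying the approximate rules above to the Rasch-Welch columns of Table 30.1, together with the logit-Delta conversion (Python, illustrative only; the +/- suffix is omitted because it depends on which CLASS is designated the focal group):

def ets_category(dif_contrast, joint_se):
    # Approximate ETS classification from the DIF CONTRAST and its JOINT S.E., in logits
    size = abs(dif_contrast)
    if size >= 0.64 and size > 0.43 + 2 * joint_se:
        return "C"                         # moderate to large
    if size >= 0.43 and size > 2 * joint_se:
        return "B"                         # slight to moderate
    return "A"                             # negligible

def logits_to_delta(logits):
    # 1 logit = 2.35 Delta units; 1 Delta unit = 0.426 logits
    return 2.35 * logits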

 

For meta-analysis, the DIF Effect Size = DIF Contrast / S.D. of the "control" CLASS (or the pooled CLASSes). The S.D. for each CLASS is shown in Table 28.
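
As a sketch (Python, illustrative only):

def dif_effect_size(dif_contrast, control_class_sd):
    # DIF CONTRAST divided by the person-measure S.D. of the control CLASS
    # (or the pooled CLASSes), both in logits; the S.D.s are reported in Table 28.
    return dif_contrast / control_class_sd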

 

Example: The estimated item difficulty for Females, the DIF MEASURE, is 2.85 logits, and for Males the DIF MEASURE is 1.24 logits. So the DIF CONTRAST, the apparent bias of the item against Females, is 1.61 logits. An alternative interpretation is that, on this item, the Females are 1.61 logits less able than the Males.

 

                             Males          Females

Item 13: +---------+---------+-+-------+-------+-+>> difficulty increases

         -1        0          1.24     +2     2.85   DIF measure

                               +---------------> = 1.61 DIF contrast

