Simulated file specifications
Winsteps uses two methods to simulate data:
1) probabilistically-generated data based on anchored parameter estimates
2) re-sampling-with-replacement from the current dataset
Winsteps uses the estimated (or anchored) person measures, item measures and Andrich thresholds, or re-sampling-with-replacement of person response strings, to simulate a data file equivalent to the raw data. This can be used to investigate the stability of measures, the distribution of fit statistics and the amount of statistical bias. Each time SIFILE= is run, or selected from the Output Files pull-down menu, a simulated data file is produced. Analyze several simulated datasets to verify their overall pattern.
The parts of the dialog box outside the red rectangle are described in Output file specifications. The file format matches the input data file if both are in fixed-field format. When SIFILE= is written in comma-separated (CSV=Y) or tab-separated (CSV=T) format, the item responses precede the person label.
Dialog box field | Control specification
---|---
Simulated data files: | Invoked with SIFILE=
Number of files: | SINUMBER= specifies the number of simulated files to produce. If SINUMBER= is greater than 1, then the data file name is automatically incremented, and so is the SISEED= pre-set seed value.
Seed number (0 for random): | SISEED= controls whether the pseudo-random number generator is seeded with the system clock (0 or 1), or with a user-chosen value (2 and above).
Simulate: use measures or use the data | SIMEASURE= chooses whether the simulated data is generated from the estimated measures (use measures), or by re-sampling from the observed data (use the data). To override the estimated measures, use IAFILE=, PAFILE= and SAFILE=.
Re-sample persons: No or Yes: Persons | SIRESAMPLE= controls whether re-sampling occurs (sampling with or without replacement), and, if it does, how many person records to include in the simulated data file.
Complete data: Yes or No - allow missing data | SICOMPLETE= Yes for complete data. No for missing data patterns to be repeated in the simulated data file.
Extreme scores: Yes or No - avoid extreme scores | SIEXTREME= Yes to allow the simulated data to include extreme (zero, minimum possible or perfect, maximum possible) scores. No to avoid generating extreme scores (when possible).
Winsteps simulates data two ways:
i) the default: using the parameter values (persons, items, Andrich thresholds) from the current analysis. This way generates simulated data according to the probabilistic distributions defined by the Rasch model and the generating Rasch parameters. This is for investigations relating to exact Rasch conditions.
ii) by resampling (with replacement) the data (observations, responses) in the current analysis. This way generates data according to the empirical distribution of the generating data so pervasive misfit, such as DIF, is replicated.
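The two ways can be contrasted with a minimal Python sketch. This is not the Winsteps algorithm: it assumes dichotomous items only (no Andrich thresholds), and the function names, measures, response strings and seed are all hypothetical illustrations.

```python
import math
import random

def rasch_probability(ability, difficulty):
    """Probability of scoring 1 under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def simulate_from_measures(abilities, difficulties, rng):
    """Way (i): draw every response from its model probability."""
    return [
        [1 if rng.random() < rasch_probability(b, d) else 0 for d in difficulties]
        for b in abilities
    ]

def resample_persons(response_strings, n_rows, rng):
    """Way (ii): sample whole person records with replacement."""
    return [rng.choice(response_strings) for _ in range(n_rows)]

rng = random.Random(12345)              # arbitrary seed, used only for repeatability
abilities = [-1.0, 0.0, 1.5]            # hypothetical person measures
difficulties = [-2.0, -0.5, 0.5, 2.0]   # hypothetical item measures
simulated = simulate_from_measures(abilities, difficulties, rng)

observed = ["1100", "1110", "1000"]     # hypothetical observed response strings
resampled = resample_persons(observed, len(observed), rng)
```

In way (i) every simulated response conforms to the model by construction, while in way (ii) each simulated record is an unaltered copy of an observed record, so any misfit in the original responses carries over.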
Example 0. KCT.txt simulated with CSV=N fixed field format (re-sampling response strings):
Person label - simulated data - original person measure - original person entry number
Dorothy F 111111111100000000 -.2594 13
Elsie F 111101111100000000 -1.3696 14
Thomas M 111111111010000000 -.2594 31
Rick M 111111111010000000 -.2594 27
KCT.txt simulated with comma-separated, CSV=Y, HLINES=Y, QUOTED=Y format (re-sampling person measures):
"1-4","2-3","1-2-4","1-3-4","2-1-4", ... ,"KID","Measure","Entry"
1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0,"Rick M",-.2594,27
1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,0,"Helen F",-.2594,16
1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,"Rod M",1.9380,28
1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,"William M",.9229,34
Example 1. It is desired to investigate the stability of Rasch measures.
(1) Estimate measures from your control and data files (e.g., SF.txt)
(2) Choose SIFILE= from the Output Files menu.
(3) Choose to output a permanent file:
(4) Simulated data filename: SFSIMUL.TXT
(5) Rerun Winsteps with your Winsteps control file and DATA=SFSIMUL.TXT on the "Extra Specifications" line.
(6) Compare the person measures, item measures and Andrich thresholds from the original and simulated analyses.
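One possible way to summarize the comparison in step (6) is a correlation and a root-mean-square difference between the two sets of measures. The Python sketch below assumes the measures have already been read from the two output files into lists in the same entry order; `compare_measures` is a hypothetical helper, not a Winsteps feature.

```python
import math

def compare_measures(original, simulated):
    """Return (Pearson correlation, root-mean-square difference) for paired measures."""
    n = len(original)
    mean_o = sum(original) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(original, simulated))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    correlation = cov / math.sqrt(var_o * var_s)
    rmsd = math.sqrt(sum((o - s) ** 2 for o, s in zip(original, simulated)) / n)
    return correlation, rmsd
```

A correlation near 1 with a small root-mean-square difference would suggest that the measures are stable under this simulation.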
Example 2. To estimate the measure standard errors in a linked equating design.
1. Do a concurrent calibration with Winsteps
2. Simulate data files SIFILE= from the Output Files menu.
Specify "complete data" SICOMPLETE= as "No" to maintain the same data pattern.
Save 10 simulated sets, SINUMBER=, as S.txt S2.txt .....
3. Rerun your Winsteps analysis 10 times
Specify in Extra Specifications "DATA=S.txt PFILE=P1.txt CSV=TAB" etc.
This will produce 10 PFILE=s. Export them in Excel format.
4. Use Excel to compute the standard deviation of the measures for each person based on the 10 person measures
5. These standard deviations are the model standard errors of the measures for the equating design.
6. Inflate these values by 20%, say, to allow for systematic equating errors, misfit, etc.
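Steps 4 to 6 can also be done outside Excel. The Python sketch below assumes the person measures from each of the simulated analyses have been loaded into one list per analysis, in the same person order. It uses the sample standard deviation (as Excel's STDEV does), which is an assumption because the steps above do not say which form to use. `equating_standard_errors` is a hypothetical name.

```python
import statistics

def equating_standard_errors(replications, inflation=1.20):
    """replications: one list of person measures per simulated analysis.

    Returns, for each person, the S.D. across analyses multiplied by `inflation`
    (1.20 corresponds to the 20% allowance suggested in step 6).
    """
    per_person = zip(*replications)  # regroup the values by person
    return [statistics.stdev(values) * inflation for values in per_person]

# Hypothetical measures for 2 persons from 3 simulated analyses
example = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
errors = equating_standard_errors(example)
```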
Example 3. If you do need estimation-bias-correction (STBIAS=) that is as accurate as possible with your data set, you will need to discover the amount of bias in the estimates and correct for it:
1. In your control file, STBIAS=No and USCALE=1
2. Obtain the Winsteps estimates for your data
3. Simulate many datasets using those estimates. (SIFILE= on the Winsteps Output Files menu).
4. Obtain the Winsteps estimates from the simulated data sets
5. Regress the simulated estimates on your initial estimates. This will give a slope near 1.0.
6. Obtain the Winsteps estimates for your data with USCALE = 1/slope. The set of estimates from step 6 is effectively unbiased.
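The arithmetic of steps 5 and 6 can be sketched in Python as an ordinary least-squares slope followed by its reciprocal. The estimates below are hypothetical values for illustration; in practice they would come from the Winsteps output files.

```python
def regression_slope(initial, simulated):
    """Least-squares slope of the simulated estimates (y) on the initial estimates (x)."""
    n = len(initial)
    mean_x = sum(initial) / n
    mean_y = sum(simulated) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(initial, simulated))
    sxx = sum((x - mean_x) ** 2 for x in initial)
    return sxy / sxx

initial = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical initial estimates
simulated = [-2.2, -1.1, 0.0, 1.1, 2.2]   # hypothetical estimates from simulated data
slope = regression_slope(initial, simulated)
uscale = 1.0 / slope                      # value to enter as USCALE=
```

With these illustrative values the simulated estimates are 10% more dispersed than the initial ones, so USCALE= would be set a little below 1 to shrink the estimates accordingly.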
Example 4. You need to simulate data from generating values. You can use Winsteps to simulate a dataset.
1. Obtain the generating item difficulties, person abilities and threshold values. If you need a normal distribution of person abilities, you can generate this with Excel.
a. From your standard analysis, output IFILE=if.txt, SFILE=sf.txt
b. Use Excel or similar to simulate a normal distribution of person abilities with the mean and S.D. that you want.
In Excel:
Cell A1 = Mean
Cell B1 = S.D.
Cell A2 = =ROW()-1
Cell B2 = =NORMSINV(RAND())*$B$1 +$A$1
then copy A2, B2 for as many rows as you want the sample size.
c. Copy Columns A and B into a text file, pf.txt. Delete row 1.
For a random uniform distribution of item difficulties, use the Excel formula:
=(item difficulty range)*(RAND() - 0.5) + (mean item difficulty)
d. In your Winsteps control file:
IAFILE=if.txt
SAFILE=sf.txt
PAFILE = pf.txt
SIFILE= simulated.txt
2. Construct a Winsteps control file including the generating values in IAFILE=, PAFILE=, SAFILE=
3. Make a rectangular dataset with a row of valid responses (can be the same one) as wide as the number of items
and with a column of valid responses (can be the same one) as long as the number of persons.
For instance, number of persons = 2000, number of items =100, then an artificial dichotomous Winsteps control file and dataset can be:
ITEM1=1
NAME1=101
NI=100
CODES=01
EDFILE=* ; every person and item must have non-extreme data
1-1000 1-50 1
1-1000 51-100 0
1001-2000 1-50 0
1001-2000 51-100 1
*
IAFILE=*
1 2.3
2 1.5
.....
*
PAFILE=*
1 3.1
2 -2.8
.....
*
SAFILE=*
0 0
1 0
*
&END
END LABELS
and nothing else.
4. Run Winsteps. Ignore the results of this analysis. Choose SIFILE= option from the output files menu. Click on "complete data" to simulate the entire data matrix.
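As an alternative to Excel in step 1, the generating values can be produced with a short script. The Python sketch below mirrors the two Excel formulas above (a normal distribution of person abilities and a uniform distribution of item difficulties) and formats the results as "entry value" lines like those in the IAFILE=* and PAFILE=* lists. The function names, seed, sample sizes and two-decimal formatting are illustrative assumptions.

```python
import random

def normal_person_abilities(n_persons, mean, sd, rng):
    """(entry number, ability) pairs, like NORMSINV(RAND())*S.D. + Mean."""
    return [(entry, rng.gauss(mean, sd)) for entry in range(1, n_persons + 1)]

def uniform_item_difficulties(n_items, mean, difficulty_range, rng):
    """(entry number, difficulty) pairs, like range*(RAND() - 0.5) + mean."""
    return [
        (entry, difficulty_range * (rng.random() - 0.5) + mean)
        for entry in range(1, n_items + 1)
    ]

def anchor_lines(pairs):
    """Format pairs as 'entry value' text lines for an anchor file."""
    return [f"{entry} {value:.2f}" for entry, value in pairs]

rng = random.Random(2024)  # arbitrary seed, used only for repeatability
persons = normal_person_abilities(2000, 0.0, 1.0, rng)
items = uniform_item_difficulties(100, 0.0, 4.0, rng)
person_lines = anchor_lines(persons)  # could be saved as pf.txt
item_lines = anchor_lines(items)      # could be saved as if.txt
```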
Example 5. Bootstrap estimation of the confidence interval for a reliability.
Bootstrapping is based on generating new datasets using "sampling with replacement" from the current dataset. The new datasets can be generated in Winsteps using:
Simulate: use the data
Re-sample persons: yes, with same number of rows as the original data.
Compute the reliability of each new dataset. See "Performing multiple simulations in Batch mode".
Transform each reliability with Fisher's z. The mean and S.D. of the distribution of these linearized reliabilities are the linearized expected value and linearized S.E. of the observed linearized reliability for the original data set. Transform the linearized (mean-S.D.), mean, and (mean+S.D.) back to the original reliability metric by reversing the Fisher's z transformation.
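The transformation and back-transformation can be sketched in Python, where Fisher's z is the inverse hyperbolic tangent and its reverse is the hyperbolic tangent. The sketch assumes every reliability lies strictly between -1 and 1 and uses the sample S.D.; `reliability_interval` is a hypothetical name, and the reliabilities would come from the bootstrap analyses.

```python
import math
import statistics

def reliability_interval(reliabilities):
    """Return (lower, centre, upper) reliabilities from mean -/+ 1 S.D. in Fisher's z metric."""
    z_values = [math.atanh(r) for r in reliabilities]  # Fisher's z = atanh(r)
    z_mean = statistics.mean(z_values)
    z_sd = statistics.stdev(z_values)
    # Reverse the transformation with tanh to return to the reliability metric
    return tuple(math.tanh(z) for z in (z_mean - z_sd, z_mean, z_mean + z_sd))

# Hypothetical reliabilities from three bootstrap datasets
lower, centre, upper = reliability_interval([0.80, 0.85, 0.90])
```

Because the interval is symmetric in the z metric, it is asymmetric in the reliability metric, with the upper limit closer to the centre than the lower limit when reliabilities are high.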
Example 6. Multiple simulations in Batch mode. See BATCH=
These can construct bootstrap confidence intervals for DIF estimates, etc.
Set up 100 simulations in a batch file, and extract the relevant numbers from the 100 output DIF tables.
PowerGREP (or its freeware equivalents) is great software for extracting values from files. For instance:
To pick out lines 10-35 in the output files (after line 9, for 26 lines):
Action type: Search
File sectioning: Search and collect sections
Section search: \A([^\r\n]*+\r\n){9}(([^\r\n]*+\r\n){0,26})
Section collect: \2
Search: the search string: .* for everything
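For readers without PowerGREP, the same selection (skip the first 9 lines, then collect up to 26 lines) can be sketched in Python. `extract_lines` is a hypothetical helper that works on the text of one output file, which is assumed to have been read into a string already.

```python
def extract_lines(text, skip=9, count=26):
    """Return up to `count` lines that follow the first `skip` lines of `text`."""
    lines = text.splitlines()
    return lines[skip:skip + count]

# Illustration with a made-up 40-line file
sample = "\n".join(f"line {i}" for i in range(1, 41))
selected = extract_lines(sample)
```

If the file has 9 lines or fewer, the result is an empty list.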
Example 7. Simulate with more data lines than the original data.
Simple solution: copy and paste the existing data after itself in the data file as many times as required to obtain the desired number of data lines. Analyze all the data lines, then, in the Specification dialog box, enter PDELETE=+1-(desired number of data lines). Then choose SIFILE= from the Output Files menu.
1. Use NotePad to create a text file called "Simulate.bat"
2. In this file:
REM - produce the generating values: this example uses example0.txt:
START /WAIT c:\Winsteps\Winsteps BATCH=YES example0.txt example0.out.txt PFILE=pf.txt IFILE=if.txt SFILE=sf.txt
REM - initialize the loop counter
set /a test=1
:loop
REM - simulate a dataset - use anchor values to speed up processing (or use SINUMBER= to avoid this step)
START /WAIT c:\Winsteps\Winsteps BATCH=YES example0.txt example0%test%.out.txt PAFILE=pf.txt IAFILE=if.txt SAFILE=sf.txt SIFILE=SIFILE%test%.txt SISEED=0
REM - estimate from the simulated dataset
START /WAIT c:\Winsteps\Winsteps BATCH=YES example0.txt data=SIFILE%test%.txt SIFILE%test%.out.txt pfile=pf%test%.txt ifile=if%test%.txt sfile=sf%test%.txt TFILE=* 3.1 *
REM - do 100 times
set /a test=%test%+1
if not "%test%"=="101" goto loop
PAUSE
3. Save "Simulate.bat", then double-click on it to launch it.
4. The simulated files and their estimates are numbered 1 to 100.
5. The files of estimates can be combined and sorted using MS-DOS commands, e.g.,
Copy if*.txt combinedif.txt
Sort /+(sort column) <combinedif.txt >sortedif.txt
6. Individual lines from the output files can be written to one file using MS-DOS batch commands. For instance, using an MS-DOS batch routine (.bat or .cmd), the same text line can be extracted from many text files and output into a new text file. The new text file can be pasted into Excel. Save these MS-DOS commands as extract.bat in the folder that has the files of statistics. Double-click on extract.bat to execute it.
rem replace 2 with the number of lines to skip before the line you want
@echo off
setlocal EnableDelayedExpansion
if exist result.csv del result.csv
for %%f in (*.txt) do (
    echo %%f
    set i=a
    for /F "skip=2 delims=" %%l in (%%f) do (
        if "!i!" == "a" echo %%f, %%l >> result.csv
        set i=b
    )
)
notepad result.csv
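Where a batch file is inconvenient, a similar extraction can be sketched in Python. This is an approximation of extract.bat, not an exact equivalent: blank lines are counted here, whereas the batch `for /F` loop is understood to ignore them, and the CSV writer quotes any line that itself contains a comma. `extract_line` and its parameters are hypothetical names.

```python
import csv
from pathlib import Path

def extract_line(folder, skip=2, pattern="*.txt", result_name="result.csv"):
    """Write (file name, first line after skipping `skip` lines) for each matching file."""
    folder = Path(folder)
    rows = []
    for path in sorted(folder.glob(pattern)):
        lines = path.read_text().splitlines()
        if len(lines) > skip:          # files that are too short are left out
            rows.append((path.name, lines[skip]))
    with open(folder / result_name, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

For example, `extract_line(".", skip=2)` would collect the third line of every .txt file in the current folder into result.csv, which can then be opened in Excel.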
Help for Winsteps Rasch Measurement and Rasch Analysis Software: www.winsteps.com. Author: John Michael Linacre