References to Many-Facet Rasch Measurement

Please cite the current version of Facets as:

Linacre, J. M. (2024) Facets computer program for many-facet Rasch measurement, version 4.1.3. Beaverton, Oregon: Winsteps.com

 

For a webpage in Facets Help:

Linacre, J. M. (%YEAR%, %MONTH%) Web page title, Retrieved from www.winsteps.com/facetman/webpage.htm

 

Books

MFRM means Linacre J.M. (1994). Many-Facet Rasch Measurement, 2nd Ed. Chicago: MESA Press www.winsteps.com - free

BTD means Wright B.D. & Stone M.H. (1979) Best Test Design, Chicago: MESA Press www.rasch.org - free

RSA means Wright B.D. & Masters G.N. (1982) Rating Scale Analysis, Chicago: MESA Press www.rasch.org - free

 

 

"Introduction to Many-Facet Rasch Measurement" by Thomas Eckes (2015, 2nd Edn.), Frankfurt am Main: Peter Lang.

"Invariant Measurement with Raters and Rating Scales: Rasch Models for Rater-Mediated Assessments" (2017, 2018) George Engelhard Jr., Stefanie Wind. Routledge - Taylor & Francis

"Statistical Analyses for Language Testers" (2013) Rita Green, Palgrave Macmillan

 

"Measuring Second Language Performance" by T. F. McNamara, Addison-Wesley Longman, 1996.

"Applying the Rasch Model: Fundamental Measurement in the Human Sciences", by Trevor G. Bond & Christine M. Fox, 4th edn 2020. Routledge

"Introduction to Rasch Measurement", Everett V. Smith, Jr. & Richard M. Smith (Eds.) JAM Press, 2004: jampress.org

Rasch G. (1960, 1980, 1992) Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: MESA Press. www.rasch.org/models.htm

 

Other recommended sources:

Rasch Measurement Transactions: www.rasch.org/rmt/

Journal of Applied Measurement: jampress.org

 

References: (If you publish a paper using many-facet Rasch measurement, or know of one not on this list, please notify www.winsteps.com.)

For quick access to online papers, Google Scholar "+Facets +Rasch +Linacre filetype:pdf" - as of January 2018, this produced about 600 hits.

 

Ahlström S., Bernspång B. (2003) Occupational Performance of Persons Who Have Suffered a Stroke: a Follow-up Study. Scandinavian Journal of Occupational Therapy, 10, 2, 88 - 94.

Allen J.M. & Schumacker R.E. (1998) Team Assessment Utilizing a Many-Facet Rasch Model. Journal of Outcome Measurement 2:2, 142-158.

Atchison B.T., Fisher A.G., Bryze K. (1998) Rater reliability and internal scale and person response validity of the School Assessment of Motor and Process Skills. American Journal of Occupational Therapy, 52, 843-850.

Bachman L. F., Davidson, F., Ryan, K., & Choi, I. (1995). An investigation into the comparability of two tests of English as a foreign language: The Cambridge-TOEFL comparability study. Cambridge University Press.

Bachman L.F., Lynch, B.K., and Mason M. (1995) Investigating variability in tasks and rater judgements in a performance test of foreign language speaking. Language Testing, 12, 238-257.

Banerji M. (2000) Construct Validity of Scores/Measures from a Developmental Assessment of Mathematics using Classical and Many-Facet Rasch Measurement. Journal of Applied Measurement, 1:2, 177-198.

Barnett R.V., Easton J., Israel G.D. (2002) Keeping Florida's Children Safe in School: How One State Designed a Model Safe School Climate Survey. School Business Affairs, 68, 6, 31-38.

Barrett, S. (2001) The impact of training on rater variability. International Education Journal, 2 (1), 49-58

Basturk, R. (2008) Applying the many-facet Rasch model to evaluate PowerPoint presentation performance in higher education. Assessment and Evaluation in Higher Education, 33, (4) 431-444.

Batty, A.O. (2015). A comparison of video- and audio-mediated listening tests with many-facet Rasch modeling and differential distractor functioning. Language Testing, 32: 3-20.

Beck C.T. & Gable R.K. (2000) Postpartum Depression Screening Scale: Development and Psychometric Testing. Nursing Research. 49(5):272-282 www.nursingresearchonline.com

Beck C.T. & Gable R.K. (2001) Comparative Analysis of the Performance of the Postpartum Depression Screening Scale With Two Other Depression Instruments. Nursing Research. 50(4):242-250. www.nursingresearchonline.com

Beck C.T. & Gable R.K. (2001) Further Validation of the Postpartum Depression Screening Scale. Nursing Research. 50(3):155-164. www.nursingresearchonline.com

Beck C.T. & Gable R.K. (2003) Postpartum Depression Screening Scale: Spanish Version. Nursing Research. 52(5):296-306. www.nursingresearchonline.com

Bernspång B. (1999) Rater Calibration Stability for the Assessment of Motor and Process Skills. Scandinavian Journal of Occupational Therapy, 6, 3, 101-109

Bernspång, B., & Fisher, A.G. (1995) Differences between persons with right or left CVA on the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 76, 1144-1151.

Bernspång, B., & Fisher, A.G. (1995) Validation of the Assessment of Motor and Process Skills for use in Sweden. Scandinavian Journal of Occupational Therapy, 2, 3-9.

Bode R.K., Klein-Gitelman M.S., Miller M.L., Lechman T.S., Pachman L.M. (2003) Disease activity score for children with juvenile dermatomyositis: Reliability and validity evidence. Arthritis Care & Research, 49, 1, 7-15.

Bonk W. (2000) KEPT ARGENTINA: 2000 Administration Final Report. Kanda University of International Studies.

Bonk W.J. & Ockey G.J. (2003) A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20, 1, 89-110

Bray, K., Fisher, A.G., Duran, L. (2001) The validity of adding new tasks to the Assessment of Motor and Process Skills. American Journal of Occupational Therapy, 55, 409-415.

Breton, G., Lepage, S., & North, B. (2008). Cross-language benchmarking seminar to calibrate examples of spoken production in English, French, German, Italian and Spanish with regard to the six levels of the Common European Framework of Reference for Languages (CEFR). Strasbourg: Language Policy Division.

Brindley, G. (1998) Outcomes-based assessment and reporting in language learning programs: a review of the issues. Language Testing, 15, 1: 45-85.

Brown A. & Hill K. (1998) Interviewer style and candidate performance in the IELTS oral interview. IELTS Research 1, 1

Brown A. (2003) Interviewer variation and the co-construction of speaking proficiency. Language Testing, 20, 1, 1-25

Caban H.L. (2003) Rater group bias in the speaking assessment of four L1 Japanese ESL students. Second Language Studies, 21(2), 1-44.

Campbell S.K., Kolobe T.H.A., Osten E.T., Lenke M., Girolami G.L. (1995). Construct Validity of the Test of Infant Motor Performance. Physical Therapy 75:7 p.585-596.

Campbell S.K., Kolobe T.H.A., Wright B.D., Linacre J.M. (2002) Validity of the Test of Infant Motor Performance for prediction of 6-, 9- and 12-month scores on the Alberta Infant Motor Scale. Developmental Medicine & Child Neurology, 44: 263 - 272

Campbell, S.K., Osten, E.T., Kolobe, T.H. A., Fisher, A.G. (1993) Development of the Test of Infant Motor Performance. Physical Medicine and Rehabilitation Clinics of North America: New Developments in Functional Assessment, 4, 541-550.

Chatterji, M. (2002). Measuring leader perceptions of school readiness for reforms: Use of an iterative model combining classical and Rasch methods. Journal of Applied Measurement, 3, 455-485.

Chesnut R.M. et al. (1999) AHRQ Evidence Reports, Number 2. Rehabilitation for Traumatic Brain Injury. National Library of Medicine. HSTAT.

Chi, E. (2001) Comparing Holistic and Analytic Scoring for Performance Assessment with Many-facet Rasch Measurement. Journal of Applied Measurement 2:4, 379-388.

Choi, S.E. (1997) Rasch invents "Ounces". Rasch Measurement Transactions 11:2, 557.

College Board (2003) Monitoring Faculty Consultant Performance in the Advanced Placement English Literature and Composition Program with a Many-Faceted Rasch Model, RR No. 2003-1

Congdon, P. (1998). Unmodelled rater discrimination error. In M. Wilson & G. Engelhard (Eds.), Objective measurement: Theory into practice (Vol. 5). Stamford, CT: Ablex.

Coniam D. & Falvey P. (2002) Does Student Language Ability Affect the Assessment of Teacher Language Ability? Journal of Personnel Evaluation in Education, 16, 4, 269-285

Connally J, Jorgensen K, Gillis S, Griffin P. (2003) A multi-source measurement approach to the assessment of higher order competencies. AVETRA.

Cooke, K.Z., Fisher, A.G., Mayberry, W., Oakley, F. (2000) Differences in activities of daily living process skills of persons with and without Alzheimer's disease. Occupational Therapy Journal of Research, 20, 87-104.

Dapueto J.J., Francolino C., Servente L., Chang C.H., Gotta I., Levin R., del Carmen Abreu M. (2003) Evaluation of the Functional Assessment of Cancer Therapy-General (FACT-G) Spanish Version 4 in South America: Classic Psychometric and Item Response Theory Analyses. Health and Quality of Life Outcomes, 1, 32.

Darragh A.R., Sample P.L., Fisher A.G. (1998) Environment effect of functional task performance in adults with acquired brain injuries: use of the assessment of motor and process skills. Arch Phys Med Rehabil. 79, 4, 418-23.

Daud N.M. & Kasim N.L.A. (2004) Class assessment: can students be relied on? The Australian Association for Research in Education.

de Jong J.H.A.L. & Bernstein J. (2001) Relating PhonePass™ Overall Scores to the Council of Europe Framework Level Descriptors. Proceedings of Eurospeech 2001.

DeCarlo L. T. (2005) A Model of Rater Behavior in Essay Grading Based on Signal Detection Theory. Journal of Educational Measurement, 42, 1, 53-76

Derrickson J.P., Fisher A.G., Anderson J.E.L. (2001) Lessons learned from an assessment of the Individual-Level Core Food Security Module. In Andrews M.S. & Prell M.A. (Eds.) Second Food Security Measurement and Research Conference, Volume II: Papers. USDA, 50-58.

Derrickson J.P., Fisher A.G., Anderson J.E.L., Brown A.C. (2001) An Assessment of Various Household Food Security Measures in Hawai'i Has Implications for National Food Security Research and Monitoring. Journal of Nutrition, 131, 749-757.

Derrickson, J.P., Anderson, J.E. L., Fisher, A.G. (2000) Concurrent validity of a face valid food security measure [Online]. Institute for Research on Poverty, University of Wisconsin (Discussion Paper DP-1206-00).

Derrickson, J.P., Fisher, A.G., Anderson, J.E. L. (2000) The Core Food Security Module scale measure is valid and reliable when used with Asians and Pacific Islanders. Journal of Nutrition, 130, 2666-2674.

DeShea L. (2003) A scenario-based scale of Willingness to Forgive. Individual Differences Research, 1, 3, 201-217.

Dickerson, A.E., & Fisher, A.G. (1993) Age differences in functional performance. American Journal of Occupational Therapy, 47, 686-692.

Dickerson, A.E., & Fisher, A.G. (1995) Culture-relevant functional performance assessment of the Hispanic elderly. Occupational Therapy Journal of Research, 15, 50-68.

Dickerson, A.E., & Fisher, A.G. (1997) The effects of familiarity of task and choice on the functional performance of young and old adults. Psychology and Aging, 12, 247-254.

Doble, S.E., & Fisher, A.G. (1998) The dimensionality and validity of the Older Americans Resources and Services (OARS) activities of daily living (ADL) scale. Journal of Outcome Measurement, 2, 2-23.

Doble, S.E., Fisk, J.D., Fisher, A.G., Ritvo, P.G., Murray, T.J. (1994) Functional competence of community-dwelling persons with multiple sclerosis using the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 75, 843-851.

Doble, S.E., Fisk, J.D., MacPherson, K.M., Fisher, A.G., Rockwood, K. (1997) Measuring functional competence in older persons with Alzheimer's disease. International Psychogeriatrics, 9, 25-38.

Doble, S.E., Fisk, J.D., Rockwood, K. (1999) Assessing the ADL Functioning of Persons With Alzheimer's Disease: Comparison of Family Informants' Ratings and Performance-Based Assessment Findings. International Psychogeriatrics, 11, 399-409.

Du, Y. and Brown, W.L. (2000) Raters and Single Prompt-to-Prompt Equating Using the FACETS Model in a Writing Performance Assessment. In Objective Measurement: Theory into Practice, Vol. 5. Westport, CT: Ablex.

Du, Y. and Wright, B.D. (1997) Effects of Item Characteristics in a Large-scale Direct Writing Assessment. In M. Wilson, G. Engelhard, Jr., & K. Draney (Eds.), Objective Measurement: Theory into Practice (Vol. 4, pp. 1-24). Norwood, NJ: Ablex.

Duran, L., & Fisher, A.G. (1996) Male and female performance on the Assessment of Motor and Process Skills. Archives of Physical Medicine and Rehabilitation, 77, 1019-1024.

Dávid, G.A. (2007) Investigating the Performance of Alternative Types of Grammar Items. Language Testing. 24/1, pp. 65-97.

Dávid, G.A. (2010). Linking the General English Suite of Euro Examinations to the CEFR: a case study report. In: Martyniuk, W. (Ed.). Aligning Tests with the CEFR: Reflections on Using The Council of Europe’s Draft Manual. Cambridge: Cambridge University Press, 177-203.

Dávid, G.A. (2011). Linking the Euroexams to the Common European Framework of Reference: A full report. Budapest: Euroexam Nyelvvizsga Kft.

Dávid, G.A. (2012). A szintleírások nyelvének szerepe a közös európai referenciakeret magyar, angol és német nyelvű kiadásában. Magyar Pedagógia, 112(1), 19-39. (In Hungarian, with English summary.)

Eckes T. (2004) Beurteilerübereinstimmung und Beurteilerstrenge: Eine Multifacetten-Rasch-Analyse von Leistungsbeurteilungen im "Test Deutsch als Fremdsprache" (TestDaF). Diagnostica, 50, 2, 65-77.

Eckes T. (2004) Facetten des Sprachtestens: Strenge und Konsistenz in der Beurteilung sprachlicher Leistungen. In A. Wolff, T. Ostermann & C. Chlosta (Hrsg.), Integration durch Sprache (Materialien Deutsch als Fremdsprache, Bd. 73, S. 485 - 518). Regensburg: FaDaF.

Eckes T. (2005) Analyse und Evaluation sprachproduktiver Prüfungen beim TestDaF. In I. Kühn, W. Timmermann & M. Lehker (Hrsg.), Sprachtests in der Diskussion (S. 60 - 93). Frankfurt: Lang.

Eckes T. (2005) Evaluation von Beurteilungen: Psychometrische Qualitätssicherung mit dem Multifacetten-Rasch-Modell. Zeitschrift für Psychologie, 213, 77 - 96.

Eckes T. (2005) Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2, 3, 197-221.

Eckes T. (2006) Multifacetten-Rasch-Analyse von Personenbeurteilungen. [German] Zeitschrift für Sozialpsychologie, 37, 3, 185-195.

Eckes T. (2008) Assuring the quality of TestDaF examinations: A psychometric modeling approach. In L. Taylor & C. J. Weir (Eds.), Multilingualism and assessment: Achieving transparency, assuring quality, sustaining diversity – Proceedings of the ALTE Berlin Conference May 2005 (pp. 157–178). Cambridge, UK: Cambridge University Press.

Eckes T. (2008) Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25, 155–185.

Eckes T. (2009) Manual for relating Language Examinations to the Common European Framework of Reference for Languages (CEFR). Council of Europe. www.coe.int/t/dg4/linguistic/Source/CEF-refSupp-SectionH.pdf

Eckes T. (2016) Setting cut scores on an EFL placement test using the prototype group method: A receiver operating characteristic (ROC) analysis. Language Testing.

Eckes, T. & Grotjahn, R. (2006) A closer look at the construct validity of C-tests. Language Testing, 23(3), 290-325.

Elder, C., Barkhuizen, G., Knoch, U. and von Randow, J. (2007). Evaluating rater responses to an online training program for writing assessment. Language Testing 24, 1, 1-28.

Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing, 19, 347-368.

Elder, C., Knoch, U., Barkhuizen, G. and von Randow, J. (2005). Feedback to enhance rater training. Does it work? Language Assessment Quarterly 2, 3, 175-196

Ellison, S., Fisher, A.G., Duran, L. (2001) The alternate forms reliability of the new tasks added to the Assessment of Motor and Process Skills. Journal of Applied Measurement, 2, 120-133.

Engelhard G., Jr. & Anderson, D.W. (1998). A binomial trials model for examining the ratings of standard-setting judges. Applied Measurement in Education, 11(3), 209-230.

Engelhard G., Jr. & Stone, G.E. (1998) Evaluating the Quality of Ratings Obtained From Standard-Setting Judges, Educational and Psychological Measurement, 58(2), 179-196.

Engelhard G., Jr. (1992). The measurement of writing ability with a many-faceted Rasch model. Applied Measurement in Education, 5(3), 171-191.

Engelhard G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.

Engelhard G., Jr. (1996). Evaluating rater accuracy in performance assessments. Journal of Educational Measurement, 33(1), 56-70.

Engelhard G., Jr. (1997). Constructing rater and task banks for performance assessments. Journal of Outcome Measurement, 1(1), 19-33.

Engelhard G., Jr. (2001) Examining the Psychometric Quality of the National Board for Professional Teaching Standards Early ... Journal of Personnel Evaluation in Education 15:4, 253-285

Engelhard G., Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for ALL students: Development, implementation, and analysis (pp. 261-287). Mahwah, NJ: Lawrence Erlbaum Associates, Pub.

Engelhard G., Jr., Cramer S. (1997) Using Rasch Measurement to Evaluate Ratings of Standard-Setting Judges. In M. Wilson, G. Engelhard, Jr., & K. Draney (Eds.), Objective Measurement: Theory into Practice (Vol. 4, pp. 97-112). Norwood, NJ: Ablex.

Engelhard G., Jr., David, M., & Hansche, L. (1999). Evaluating the accuracy of judgments obtained from item review committees. Applied Measurement in Education, 12, 199-210.

Engelhard G., Jr., Myford, C. M. (2003). Monitoring Faculty Consultant Performance in the Advanced Placement English Literature and Composition Program with a Many-Faceted Rasch Model. College Board Research Report No. 2003-1. ETS RR-03-01.

Engelhard G., Jr., Myford, C. M., & Cline, F. (2000). Investigating assessor effects in National Board for Professional Teaching Standards assessments for Early Childhood/Generalist and Middle Childhood/Generalist certification (ETS Research Report RR-00-13). Princeton, NJ: Educational Testing Service.

Englund, B., Bernspång, B., Fisher, A.G. (1995) Development of an instrument for assessment of social interaction skills in occupational therapy. Scandinavian Journal of Occupational Therapy, 2, 17-23.

Fisher A.G. (1992) Commentary [Haley, S.M., Ludlow, L.H. Applicability of the hierarchical scales of the Tufts Assessment of Motor Performance for school-aged children and adults with disabilities]. Physical Therapy, 72, 202-203.

Fisher A.G. (1993, April) The assessment of IADL motor skills: An application of many-faceted Rasch analysis. American Journal of Occupational Therapy, 47(4), 319-329.

Fisher A.G. (1994) Development of a functional assessment that adjusts ability measures for task simplicity and rater leniency. In M. Wilson (Ed.), Objective measurement: Theory into practice. Vol II (pp. 145-175) Norwood, New Jersey: Ablex Publishing Corporation.

Fisher A.G. (1997) Multifaceted measurement of daily life task performance: Conceptualizing a test of instrumental ADL and validating the addition of personal ADL tasks. Physical Medicine and Rehabilitation: State of the Art Reviews, 11(2), 289-303.

Fisher A.G. (2003) Assessment of Motor and Process Skills (AMPS). Vol. 1: Development, Standardization, and Administration Manual (5th ed.) Fort Collins, CO: Three Star Press. www.ampsintl.com

Fisher A.G. (2004) Assessment of Motor and Process Skills (AMPS). Vol. 2: User Manual (revised 5th ed.) Fort Collins, CO: Three Star Press.

Fisher A.G., & Duran, G.A. (2004) Schoolwork Task Performance of Students At Risk for Delays. Scandinavian Journal of Occupational Therapy, 11, 1-8.

Fisher A.G., Bryze K.A., Granger C.V., Haley S.M., Hamilton B.B., Heinemann A.W., Puderbaugh J.K., Linacre J.M., Ludlow L.H., McCabe M.A. & Wright B.D. (1994) Applications of conjoint measurement to the development of functional assessments. International Journal of Educational Research, 21(6), 579-593.

Fisher A.G., Bryze, K., Hume, V. (2002) School AMPS: School Version of the Assessment of Motor and Process Skills. Ft. Collins, CO: Three Star Press.

Fisher A.G., Liu, Y., Velozo, C.A., Pan, A.W. (1992) Cross-cultural assessment of process skills. American Journal of Occupational Therapy, 46, 876-885.

Fisher P. (1991) Baseball Plays and Players, Rasch Measurement Transactions, 5:2 p.142

Fisher P. (1994) Measurement of Golf proficiency, Rasch Measurement Transactions, 7:4 p.332

Fisher W.P. (2003) Mathematics, Measurement, Metaphor and Metaphysics II. Theory & Psychology, 13, 6, 791-828

Fisher W.P., & Fisher, A.G. (1993) Applications of Rasch analysis to studies in occupational therapy. Physical Medicine and Rehabilitation Clinics of North America: New Developments in Functional Assessment, 4, 551-569.

Fitzpatrick A.R, Ercikan K., Yen W., Ferrara S. (1998) The consistency between raters scoring in different test years. Applied Measurement in Education, 11, 195-208.

Fujioka-Ito, N (2004) Development of Speech Contest appraisal. Central Association of Teachers of Japanese. tell.fll.purdue.edu/CATJ16/fujioka.html

Galbraith P. & Haines C. (1998) Disentangling the Nexus: Attitudes to Mathematics and Technology in a Computer Learning Environment. Educational Studies in Mathematics, 36, 3, 275-290

Gallegos P.J., Peeters M.J. (2011). A measure of teamwork perception for team-based learning. Currents in Pharmacy Teaching and Learning, 3(1), 30-35.

Gerdes R., Bauske E., Kaiser F.G. (2023) A general explanation for environmental policy support: An example using carbon taxation approval in Germany. Journal of Environmental Psychology, 90. https://doi.org/10.1016/j.jenvp.2023.102066

Girard C.R., Fisher A.G., Short M.A., Duran L. (1999) Occupational performance differences between psychiatric groups. Scandinavian Journal of Occupational Therapy, 6, 119 - 126

Goldman, S.L., & Fisher, A.G. (1997) Cross-cultural validation of the Assessment of Motor and Process Skills (AMPS) British Journal of Occupational Therapy, 60, 77-85.

Goto, S., Fisher, A.G., Mayberry, W.L. (1996) AMPS applied cross-culturally to the Japanese. American Journal of Occupational Therapy, 50, 798-806.

Griffin P. & Gillis S. (2000) A multi source measurement approach to assessment of higher order competencies. British Education Research Association Annual Conference.

Haans A., Kaiser F.G., Bouwhuis D.G., IJsselsteijn W.A. (2012) Individual differences in the rubber-hand illusion: Predicting self-reports of people's personal experiences. Acta Psychologica, 141, 169–177

Haines C. & Crouch R. (2001) Recognizing constructs within mathematical modelling. Teaching Mathematics and its Applications, 20, 3, 129-138

Haines C.R. & Houston S.K. (2004?) Assessing student project work. ICMI Study On the Teaching and Learning of Mathematics at University Level.

Haladyna T. & Hess R. (1999) An Evaluation of Conjunctive and Compensatory Standard-Setting Strategies for Test Decisions. Educational Assessment. 6, 2, 129-153

Harasym, P. H., Woloschuk, W., & Cunning, L. (2008). Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs. Advances in Health Sciences Education, 13, 617–632.

Hartman M.L., Fisher A.G., Duran L. (1999) Assessment of functional ability of people with Alzheimer's disease. Scandinavian Journal of Occupational Therapy, 6, 111-118.

Hayase D., Mosenteen D., Thimmaiah D., Zemke S., Atler K., Fisher A.G. (2004) Age-related changes in activities of daily living ability. Australian Occupational Therapy Journal, 51, 4, 192

Heller, J. I., Sheingold, K., & Myford, C. M. (1999). Reasoning about evidence in portfolios: Cognitive foundations for valid and reliable assessment. Educational Assessment, 5 (1), 5-40.

Hermansson L.M., Fisher A.G., Bernspång B., Eliasson A.-C. (2004) Assessment of Capacity for Myoelectric Control: a new Rasch-built measure of prosthetic hand control. Journal of Rehabilitation Medicine, 36, 1-6.

Hess R.J. & Becker M.S. (1997) Applied Assessment in the Glendale Union High School District: An Application of the Many-Faceted Rasch Model. National Association of Test Directors.

Hickey D.T., Wolfe E.W., Kindfield A.C.H. (2001) Assessing Learning in a Technology-Supported Genetics Environment: Evidential and Systemic Validity Issues. Educational Assessment, 6, 3, 155-196.

Hoskens, M., & Wilson, M. (2001). Real-time feedback on rater drift in constructed response items: An example from the Golden State Examination. Journal of Educational Measurement, 38, 121-146.

Houran J. (2004) The public's expectation of finding a soul mate. True Online Magazine.

Hsieh, Mingchuang (2013) An application of Multifaceted Rasch measurement in the Yes/No Angoff standard setting procedure. Language Testing, 30, 491-512.

Hung, L-F, Wang, W-C (2012). The Generalized Multilevel Facets Model for Longitudinal Data. Journal of Educational and Behavioral Statistics, 37, 2, 231–255.

Iramaneerat, C., Myford, C., Yudkowsky, R., Lowenstein T. (2009). Evaluating the effectiveness of rating instruments for a communication skills assessment of medical residents. Advances in Health Sciences Education, 14, 575–594.

Iramaneerat, C., Yudkowsky, R., Myford, C., & Downing, S. (2007). Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement. Advances in Health Sciences Education.

Iwashita, N., McNamara, T., & Elder, C. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing approach to task design. Language Learning, 51, 401-436.

Johnson, J. S., and Lim, G. S. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26(4), 485-505.

Jones N. & Shaw S.D. (2003) Task difficulty in the assessment of writing: Comparing performance across three levels of CELS. Cambridge Esol Research Notes, 11, 11-15.

Kaliski, P.K., Wind, S.A., Engelhard, G., Jr., Morgan, D.L., Plake, B.S., Reshetar, R.A. (2013) Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam. Educational and Psychological Measurement, 73(3) 386-411.

Kassim, Noor Lide Abu (2011) Judging behaviour and rater errors: an application of the many-facet Rasch model. GEMA: Online Journal of Language Studies, 11 (3). pp. 179-197.

Kinnman J., Andersson U., Wetterquist L., Kinnman Y., Andersson U. (2000) Cooling suit for multiple sclerosis: functional improvement in daily living? Scandinavian Journal of Rehabilitation Medicine, 32, 1, 20-24

Kirkley, K., & Fisher, A.G. (1999) Alternate forms reliability of the Assessment of Motor and Process Skills. Journal of Outcome Measurement, 3, 53-70.

Kjellberg A., Haglund L., Forsyth K., Kielhofner G. (2003) The measurement properties of the Swedish version of the assessment of communication and interaction skills. Scandinavian Journal of Caring Sciences, 17, 3, 271-277

Kline TL, Schmidt KM, Bowles RP. (2006). Using LinLog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 7(1):74-91.

Knoch U. (2007). Do empirically developed rating scales function differently to conventional rating scales for academic writing? Spaan Fellow Working Papers in Second or Foreign Language Assessment, 5

Knoch U. (2007). ‘Little coherence, considerable strain for reader’: A comparison between two rating scales for the assessment of coherence. Assessing Writing 12, 108-128

Knoch U. (2008). Diagnostic writing ability: A rating scale for the assessment of accuracy, fluency and complexity. New Zealand Studies in Applied Linguistics 14, 2, 1-25

Knoch U. (2008). The assessment of academic style in EAP writing: The case of the rating scale. Melbourne Papers in Language Testing 13, 1, 34-67

Knoch U. (2009). The development and validation of a rating scale for diagnostic writing assessment. Language Testing 26, 2, 275-304

Knoch U., Read, J and von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing 12, 26-43

Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese writing performance. Language Testing, 19(1), 3-31.

Kottorp A., Bernspång B., Fisher A.G. (2003) Validity of a performance assessment of activities of daily living for people with developmental disabilities. Journal of Intellectual Disability Research, 47, 8, 597-605.

Kottorp A., Bernspång B., Fisher, A.G., Bryze, K. (1995) IADL ability measured with the AMPS: Relation to two classification systems of mental retardation. Scandinavian Journal of Occupational Therapy, 2, 121-128.

Kottorp A., Bernspång B., Fisher, A.G. (2003) Activities of daily living in persons with intellectual disability: Strengths and limitations in specific motor and process skills. Australian Occupational Therapy Journal, 50, 195-204.

Kottorp A., Bernspång, B., Fisher, A.G. (2003) Validity of a performance assessment of activities of daily living for persons with developmental disabilities. Journal of Intellectual Disability Research, 47, 597-605.

Kottorp A., Hällgren, M., Bernspång, B., Fisher, A.G. (2003) Client-centred occupational therapy for persons with mental retardation: Implementation of an intervention programme in activities of daily living tasks. Scandinavian Journal of Occupational Therapy, 10, 51-60.

Kozaki Y. (2004) Using GENOVA and FACETS to set multiple standards on performance assessment for certification in medical translation from Japanese into English. Language Testing, 21, 1, 1-27

Kubinger, K.D. (2012). Ein mathematisches Modell zur Fußball-Toto-Prognose (12er-Wette) (in German). In M. Voracek (Hrsg.), Of things past – Memorial Book for Anton K. Formann (1949-2010) (S. 71-78).

Lai J.S., Velozo C.A., Linacre J.M. (1997) Adjusting for Rater Severity in an Un-linked FIM national Data Base: An Application of the Many-Facets model. Physical Medicine and Rehabilitation 11:2 325-332.

Lai, J.S., Fisher, A.G., Magalhaes, L.C., Bundy, A.C. (1996) Construct validity of the praxis tests on the Sensory Integration and Praxis Tests. Occupational Therapy Journal of Research, 16, 75-97.

Lang S. (2002) Ranking and Measuring of Sailing Teams. Rasch Measurement Transactions. 15:4, 851.

Lange R. (2003) Scaling Methods: Model Sailplane Competition: From Awarding Points to Measuring Performance Skills. R/C Soaring Digest. 20:8, 4-11.

Lange R., Greyson B., Houran J. (2004) A Rasch scaling validation of a 'core' near-death experience. British Journal of Psychology, 95, 2, 161-177

Lange R., Jerabeck, I., Houran J. (2004) Building blocks for satisfaction in long-term romantic relationships: Evidence for the complementarity hypothesis for romantic compatibility. Adult Development Symposium Society for Research in Adult Development Preconference, AERA

Lange R., Lange X. (2012) Quality Control in Crowdsourcing: An Objective Measurement Approach to Identifying and Correcting Rater Effects in the Social Evaluation of Products and Services. AAAI Spring Symposium Series, North America, Mar. 2012. Available at: www.aaai.org/ocs/index.php/SSS/SSS12/paper/view/4322/4691.

Lange, R., and Lange, X. (2012). Quality control in crowdsourcing: An Objective Measurement approach to identifying and correcting rater effects in the social evaluation of products and services. Paper presented at the AAAI Spring Symposium, Stanford University. March 26, 2012.

Lee M., Zhu W., Ulrich D.A. (2005) Many-Faceted Rasch Calibration of TGMD-2. AAHPERD aahperd.confex.com/aahperd/2005/preliminaryprogram/abstract_7094.htm

Lee Y.-W. & Kantor R. (2003) Investigating Differential Rater Functioning for Academic Writing Samples: An MFRM Approach. Educational Testing Service.

Li J. (2014) Examining genre effects on test takers' summary writing performance. Assessing Writing, 22, 75-90.

Liao, P. M., Campbell, S.K. (2002) Comparison of Two Methods for Teaching Therapists to Score the Test of Infant Motor Performance. Pediatric Physical Therapy. 14, 4, 191-198.

Liao, P. M., Campbell, S.K. (2004) Examination of the Item Structure of the Alberta Infant Motor Scale. Pediatric Physical Therapy, 16, 1, 31-38.

Lim, G. S. (2010). Investigating prompt effects in writing performance assessment. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 8, 95-116.

Linacre J.M. (1988) Rasch Analysis and Rank Ordering. Rasch Measurement Transactions, 2:4,  41.

Linacre J.M. (1991) Inter-rater reliability. Rasch Measurement Transactions, 5:3,  166.

Linacre J.M. (1991) Stress after Three Mile Island. Rasch Measurement Transactions, 5:4,  188.

Linacre J.M. (1992) Paired comparisons with ties. Rasch Measurement Transactions 9:2,  425.

Linacre J.M. (1993) Rasch-based generalizability theory. Rasch Measurement Transactions 7:1, 283-284.

Linacre J.M. (1994) Constructing measurement with a many-facet Rasch model. In M. Wilson (Ed.) Objective Measurement: Theory into Practice. Vol. II. Norwood NJ: Ablex.

Linacre J.M. (1996) Generalizability Theory and Rasch Measurement. Chapter 5 in G. Engelhard & M. Wilson (Eds.) Objective Measurement: Theory into Practice. Vol. 3. Norwood NJ: Ablex.

Linacre J.M. (1997) Communicating Examinee Measures as Expected Ratings. Rasch Measurement Transactions, 11:1,  550-551.

Linacre J.M. (2001) Generalizability theory and Rasch measurement. Rasch Measurement Transactions 15:1,  806-7.

Linacre J.M. (2001) Paired Comparisons for Measuring Team Performance. Rasch Measurement Transactions 15:1, 812

Linacre J.M. (2002) Facets, Factors, Elements and Levels. Rasch Measurement Transactions, 16:2,  880.

Linacre J.M. (2002) Judging debacle in Pairs Figure Skating. Rasch Measurement Transactions 15:4, 839-40.

Linacre J.M., Engelhard, G., Jr., Tatum, D.S., & Myford, C.M. (1994). Measurement with judges: Many-faceted conjoint measurement. International Journal of Educational Research, 21(6), 569-577.

Linacre J.M., Wright B.D. (2002) Understanding Rasch Measurement: Construction of Measures from Many-facet Data. Journal of Applied Measurement, 3:4, 486-512.

Linacre J.M., Wright B.D. (2004) Construction of measures from many-facet data. Chapter 8 in E.V. Smith & R. M. Smith (Eds.) Introduction to Rasch Measurement. Maple Grove MN: JAM Press.

Linacre J.M., Wright B.D., Lunz, M.E. (1990) A Facets Model for Judgmental Scoring. Memo 61. MESA Psychometric Laboratory. University of Chicago. www.rasch.org/memo61.htm

Looney M.A. (1996) Figure skating fairness. Rasch Measurement Transactions, 10:2, 500.

Looney M.A. (2004) Evaluating Judge Performance in Sport. Journal of Applied Measurement 5:1, 31-47.

Ludlow L.H. & Haley S.M. (1996) Effect of context in rating of mobility activities in children with disabilities. Educational and Psychological Measurement, 56, 122-129.

Lumley T, Congdon P., Linacre J. (1999) Rater Variability. Rasch Measurement Transactions 12:4, 671.

Lumley T. (2002) Assessment criteria in a large-scale writing test: what do they really mean to the raters? Language Testing, 19, 3, 246-276

Lumley T., & McNamara, T.F. (1995). Reader characteristics and reader bias: Implications for training. Language Testing, 12, 54-71.

Lunz M.E. & Stahl J.A. (1993, April) The effect of rater severity on person ability measures: A Rasch model analysis. American Journal of Occupational Therapy, 47(4), 311-317.

Lunz M.E. & Stahl, J.A. (1990). Judge consistency and severity across grading periods. Evaluation and the Health Professions, 13, 425-444.

Lunz M.E. & Wright B.D. (1997) Latent Trait Models for Performance Examinations. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.

Lunz M.E. (2004?) Standardized oral examinations. Measurement Research Associates.

Lunz M.E., Linacre J.M. (1998) Measurement designs using multi-facet Rasch modeling. Chapter 3 in Marcoulides G. (Ed.) Modern Methods for Business Research. New York: Lawrence Erlbaum.

Lunz M.E., Stahl, J.A., & Wright, B.D. (1996). The invariance of rater severity calibrations. In G. Engelhard, Jr., & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 99-112). Norwood, NJ: Ablex.

Lunz M.E., Wright, B.D., & Linacre, J.M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331-345.

Lynch B. & McNamara T.F. (1998) Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing 15: 158-180.

MacMillan, P. (2000) Classical, generalizability and multifaceted Rasch detection of interrater variability in large, sparse data sets. Journal of Experimental Education, 68, 167-190.

MacMillan, P. (2000) Simultaneous Measurement of Reading Growth, Gender, and Relative-Age Effects: Many-Faceted Rasch Approach to CBM Reading Scores. Journal of Applied Measurement 1:4, 393-408.

Magalhães, L., Fisher, A.G., Bernspång, B., Linacre, J.M. (1996) Cross-cultural assessment of functional ability. Occupational Therapy Journal of Research, 16, 45-63.

Malec J.F. (2004) Comparability of Mayo-Portland Adaptability Inventory ratings by staff, significant others and people with acquired brain injury. Brain Injury, 18, 6, 563-575

Maris, G. & Bechger, T. (2003) Two methods for the practical analysis of rating data. CITO Measurement and Research Department Report 2003-1.

McCutcheon L.E., Lange R., Houran J. (2002) Conceptualization and measurement of celebrity worship. British Journal of Psychology, 93, 1, 67-87

McManus I.C, Thompson M, Mollon J. (2006) Assessment of examiner leniency and stringency ('hawk-dove effect') in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling. BMC Medical Education 2006, 6:42. www.biomedcentral.com/1472-6920/6/42/

McNulty, M.C., & Fisher, A.G. (2001) Validity of using the Assessment of Motor and Process Skills to estimate overall home safety in persons with psychiatric conditions. American Journal of Occupational Therapy, 55, 649-655.

Merritt B.K., Fisher A.G. (2003) Gender differences in the performance of activities of daily living. Archives of Physical Medicine and Rehabilitation, 84, 12, 1872-1877.

Miceli R., Settanni M., Vidotto, G. (2008) Measuring change in training programs: An empirical illustration. Psychology Science Quarterly, 50(3), 433-447

Monsaas J, Engelhard G (1996) Examining Changes in the Home Environment with the Rasch Measurement Model. In G. Engelhard, Jr., & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 127-142). Norwood, NJ: Ablex.

Mulqueen C., Baker D., Dismukes R.K. (2000) Using Multifacet Rasch Analysis to Examine the Effectiveness of Rater Training. SIOP.

Mulqueen C., Baker D., Dismukes R.K. (2002) Pilot Instructor Rater Training: The Utility of the Multifacet Item Response Theory Model. The International Journal of Aviation Psychology, 12(3), 287-303

Myford C.M. (2002) Investigating Design Features of Descriptive Graphic Rating Scales. Applied Measurement in Education, 15, 2, 187-215

Myford C.M. (2004). [Review of the book Automated essay scoring: A cross-disciplinary perspective]. Journal of Applied Measurement, 5 (1), 111-114

Myford C.M. & Engelhard G., Jr. (2001) Examining the Psychometric Quality of the National Board for Professional Teaching Standards Early Childhood/Generalist Assessment System. Journal of Personnel Evaluation in Education, 15(4), 253-285.

Myford C.M. & Mislevy, R.J. (1995). Monitoring and improving a portfolio assessment system. ETS Center for Performance Assessment Report No. MS 94-05. Princeton, NJ: Educational Testing Service.

Myford C.M. & Sims-Gunzenhauser, A. (2004). The evolution of large-scale assessment programs in the visual arts. In E. W. Eisner & M. D. Day (Eds.). Handbook of research and policy in art education (pp. 637-666). Mahwah, NJ: Lawrence Erlbaum Associates.

Myford C.M. & Wolfe, E.W. (2000). Monitoring sources of variability within the Test of Spoken English assessment system (TOEFL Research Report No. 65). Princeton, NJ: TOEFL Research Program, Educational Testing Service.

Myford C.M. & Wolfe, E.W. (2000). Strengthening the ties that bind: Improving the linking network in sparsely connected rating designs (TOEFL Technical Report No. 15). Princeton, NJ: TOEFL Research Program, Educational Testing Service. www.ets.org/Media/Research/pdf/RR-00-09.pdf

Myford C.M. & Wolfe, E.W. (2002). When raters disagree, then what: Examining a third-rater discrepancy resolution procedure and its utility for identifying unusual patterns of ratings. Journal of Applied Measurement, 3, 300-324.

Myford C.M. & Wolfe, E.W. (2003) Detecting and Measuring Rater Effects using Many-Facet Rasch Measurement: Part I. Journal of Applied Measurement, 4(4), 386-421 and in E. V. Smith, Jr. & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 460-517). Maple Grove, MN: JAM Press.

Myford C.M. & Wolfe, E.W. (2004) Detecting and Measuring Rater Effects using Many-Facet Rasch Measurement: Part II. Journal of Applied Measurement, 5(2), 189-227 and in E. V. Smith, Jr. & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 518-574). Maple Grove, MN: JAM Press.

Myford C.M., Marr D., Linacre J.M. (1996) Reader Calibrations and Its Potential Role in Equating for the Test of Written English. TOEFL Research Report RR-52. (ETS Center for Performance Assessment Report No. MS 95-02). Princeton, NJ: Educational Testing Service.

Nakamura Y, (2003) Oral proficiency assessment: Dialogue test and multilogue test. JALT Pan-SIG Conference Proceedings. www.jalt.org/pansig/2003/HTML/Nakamura.htm

Nilsson, I, Löfgren, B, Fisher, A., Bernspång, B. (2005) Focus on leisure repertoire in the oldest old, the Umeå 85+ study.

Nilsson, I., & Fisher, A.G. (2006) Evaluation of leisure activities in the oldest old. Manuscript submitted for publication. Scandinavian Journal of Occupational Therapy.

North B. & Schneider G. (1998). Scaling descriptors for language proficiency scales. Language Testing, 15, 217-263.

North B. (2000). The development of a common framework scale of language proficiency. New York: Peter Lang.

North, B. (2008). The CEFR levels and descriptive scales. In L. Taylor & C. J. Weir (Eds.), Multilingualism and assessment: Achieving transparency, assuring quality, sustaining diversity – Proceedings of the ALTE Berlin Conference May 2005 (pp. 21–66). Cambridge, UK: Cambridge University Press.

North, B., & Jones, N. (2009). Further material on maintaining standards across languages, contexts and administrations by exploiting teacher judgment and IRT scaling. Strasbourg: Language Policy Division.

Nygård, L., Bernspång, B., Fisher, A.G., Winblad, B. (1994) Comparing motor and process ability of persons with suspected dementia in home and clinic settings. American Journal of Occupational Therapy, 48, 689-696.

O'Loughlin K. (2000) The impact of gender in the IELTS oral interview. IELTS Research 3, 1

O'Neill, T.R., & Lunz, M.E. (2000). A method to study rater severity across several administrations. In M. Wilson & G. Engelhard, Jr. (Eds.), Objective Measurement: Theory into Practice (Vol. 5, pp. 135-146). Stamford, CT: Ablex.

O'Sullivan B. (2002) Investigating variability in a test of second language writing ability. Cambridge ESOL Research Notes, 7, 14-17

Oakley F., Duran L., Fisher A.G., Merritt, B. (2003) Differences in activities of daily living motor skills of persons with and without Alzheimer's disease. Australian Occupational Therapy Journal, 50, 2, 72-78

Oakley F., Khin N.A., Parks R., Bauer L., Sunderland T. (2002) Improvement in activities of daily living in elderly following treatment for post-bereavement depression. Acta Psychiatrica Scandinavica, 105, 3, 231-234

Oakley F., Sunderland T. (1997) Assessment of Motor and Process Skills as a Measure of IADL Functioning in Pharmacologic Studies of People With Alzheimer's Disease: A Pilot Study. International Psychogeriatrics, 9: 197-206

Pan, A.W., & Fisher, A.G. (1994) The Assessment of Motor and Process Skills of persons with psychiatric disorders. American Journal of Occupational Therapy, 48, 775-780.

Pape T. L.-B., Heinemann A.W., Kelly J.P., Hurder A.G. (2005) A measure of neurobehavioral functioning after coma. Part I: Theory, reliability, and validity of the Disorders of Consciousness Scale. Journal of Rehabilitation Research and Development, 42, 1, 1-18.

Park, S., Fisher, A.G., Velozo, C.A. (1994) Using the Assessment of Motor and Process Skills to compare occupational performance between clinic and home settings. American Journal of Occupational Therapy, 48, 697-709.

Park, T. (2004) An investigation of an ESL placement test using Many-Facet Rasch Measurement. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 4, 1.

Parra-López, E. & Oreja-Rodríguez J.R. (2014) Evaluation of the competitiveness of tourism zones of an island destination: An application of a Many-Facet Rasch Model (MFRM). Journal of Destination Marketing and Management, 3(2), 114-121

Patomella A.-H., Caneman G., Kottorp A., Tham K. (2004) Identifying scale and person response validity of a new assessment of driving ability. Scandinavian Journal of Occupational Therapy, 11, 2, 70-77.

Paulukonis, S.T., Myford, C.M., & Heller, J.I. (2000). Formative evaluation of a performance assessment scoring system. In M. Wilson & G. Engelhard, Jr. (Eds.) Objective measurement: Theory into practice (Vol. 5, pp. 15-40). Stamford, CT: Ablex Publishing.

Peeters M.J., Churchwell M.D., Mauro L.S., Cappelletty D.M., Stone G.E. (2010). A student-inclusive pharmacotherapeutic textbook selection process. Currents in Pharmacy Teaching and Learning, 2(1), 31-38.

Peeters M.J., Sahloff E.G., Stone G.E. (2010). A standardized rubric for student presentations. American Journal of Pharmaceutical Education, 74(9), article 171.

Peeters M.J., Serres M.L., Gundrum T.E. (2014). Improving reliability of a resident interview process. American Journal of Pharmaceutical Education, 77(8), article 168.

Pollitt A. & Elliott G. (2003) Monitoring and investigating comparability: a proper role for human judgement. UCLES.

Pollitt, A. (1997). Rasch measurement in latent trait models. In C. Clapham, & D. Corson (Eds.), Encyclopedia of language and education Vol. 7: Language testing and assessment (pp. 243-253). Netherlands: Kluwer Academic.

Pollitt, A., & Hutchinson, C. (1987). Calibrated graded assessment: Rasch partial credit analysis of performance in writing. Language Testing, 4, 72-92.

Pomplun, M. & Custer, M. (2004) The Equivalence of Three Data Collection Methods with Field Test Data: A FACETS Application, Journal of Applied Measurement 5:3,  319-327

Puderbaugh, J.K., Fisher, A.G. (1992) Assessment of motor and process skills in normal young children and children with dyspraxia. Occupational Therapy Journal of Research, 12, 195-216.

Ravaud, J.-F., Delcey M., Yelnik A. (1999) Construct validity of the Functional Independence Measure (FIM): questioning the unidimensionality of the scale and the "value" of FIM scores. Scandinavian Journal of Rehabilitation Medicine, 31, 1, 31-41

Rehfeldt, T.K. (1994) Ranks in sensory measurement. Rasch Measurement Transactions 8:2,  368.

Rexroth, P., Fisher, A.G., Merritt, B.K., Gliner, J. (2004) Ability differences in persons with unilateral hemispheric stroke. Manuscript submitted for publication.

Robinson, S.E., & Fisher, A.G. (1999) Functional and cognitive differences between cognitively-well people and people with dementia. British Journal of Occupational Therapy, 62, 466-471.

Révész, A. (2009). Task complexity, focus on form, and second language development. Studies in Second Language Acquisition, 31(3), 437-470.

Sampson S.O. & Bradley K.D. (2003) Rasch Analysis of Educator Supply and Demand Rating Scale Data. University of Kentucky.

Schatz, R., Belloto R.J.Jr., White D.B., Bachmann, K. (2003) Provision of Drug Information to Patients by Pharmacists: The Impact of the Omnibus Budget Reconciliation Act of 1990 a Decade Later. American Journal of Therapeutics. 10(2):93-103

Schulman J.A., Trujillo M.J., Karney B.R. (2001) Electronic Notes: Facets: Computer Software for Evaluating Assessment Tools. American Journal of Health Behavior, 25, 1, 25-77

Schumacker, R.E. (1999) Many-facet Rasch Analysis with Crossed, Nested, and Mixed Designs. Journal of Outcome Measurement 3:4. 323-338.

Schumacker, R.E. & Lunz M.E. (1997) Interpreting the Chi-Squared Statistics Reported in the Many-Faceted Rasch Model. Journal of Outcome Measurement 1:3. 239-257.

Sellers, S.W., Fisher, A.G., & Duran, L. (2001) Validity of the Assessment of Motor and Process Skills with students who are visually impaired. Journal of Visual Impairment and Blindness, 95, 164-167.

Sluijsmans D. & Moerkerke G. (1999) Student involvement in performance assessment: A research project. European Journal of Open and Distance Learning.

Smith E.V.Jr. & Kulikowich J.M. (2004) An Application of Generalizability Theory and Many-Facet Rasch Measurement Using a Complex Problem-Solving Skills Assessment. Educational and Psychological Measurement, 64, 4, 617-639

Stahl J.A. & Lunz, M.E. (1991) Answering the "Call for a New Psychometrics". Rasch Measurement Transactions 5:1,  127

Stahl J.A. & Lunz, M.E. (1996) Judge Performance Reports: Media and Message. In G. Engelhard, Jr., & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 113-126). Norwood, NJ: Ablex.

Stahl J.A. (1994) What Does Generalizability Theory offer that Many-Facet Rasch Measurement cannot duplicate? Rasch Measurement Transactions, 8:1,  342-3.

Stahl J.A., Shumway, R., Bergstrom, B., & Fisher, A.G. (1997) On-line performance assessment using rating scales. Journal of Outcome Measurement, 1, 173-191.

Stauffer, L.M., Fisher, A.G., & Duran, L. (2000) ADL performance of black and white Americans on the Assessment of Motor and Process Skills. American Journal of Occupational Therapy, 54, 607-613.

Stemler, S. (2004) A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability. Practical Assessment, Research & Evaluation, 9(4).

Tatum, D. (2003) Assessing E-Portfolios. National Communication Association. www.eport2passport.com/presentations/nca_2003/2003NCAPaperEportfolio.doc

Taylor, L. & Jones N. (2001) Revising the IELTS Speaking Test. Research Notes of the University of Cambridge Local Examination Syndicate English as a Second Language. 4, 9-11. Feb.

Tham K., Bernspång B., Fisher A.G. (1999) Development of the assessment of awareness of disability. Scandinavian Journal of Occupational Therapy, 6, 184-190

Tham K., Ginsburg, E., Fisher, A.G., & Tegnér, R. (2001) Training to improve awareness of disabilities in clients with unilateral neglect. American Journal of Occupational Therapy, 55, 46-54.

Trace J., Janssen G., Meier V. (2015) Measuring the impact of rater negotiation in writing performance assessment. Language Testing, July 28, 2015

Twing J.S., Nichols P.D., Harrison I. (2003) The comparability of paper-based and image-based marking of a high-stakes, large-scale writing assessment. International Association for Educational Assessment.

Tyndall, B., & Kenyon, D. M. (1995). Validation of a new holistic rating scale using Rasch multifaceted analysis. In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 39-57). Clevedon, England: Multilingual Matters.

Upshur, J. A., & Turner, C. E. (1999). Systematic effects in the rating of second-language speaking ability: Test method and learner discourse. Language Testing, 16, 82-111.

Uto, M. (2021). A multidimensional generalized many-facet Rasch model for rubric-based performance assessment. Behaviormetrika, 48(2), 425-457. doi.org/10.1007/s41237-021-00144-w

Vianello, M., & Robusto, E. (2010). The Rasch Models in the Analysis of the Go/No Go Association Task. Behavior Research Methods, 42(4), 944-956. sites.google.com/site/michelangelovianello/rcv2008-pdf-1/VR.2010.pdf

Wang J.C., Chi C.C., Lian W.Y. (2005) The Application of Rasch Poisson counts model to construct norm-reference testing on the Badminton Skill Tests in Primary schools.  Journal of Physical Education and Sports, 16, 1, 73-83.

Wang N. (2003) Examining reliability and validity of job analysis survey data. Journal of Applied Measurement, 4, 4, 358-369

Wang N., Schnipke D., Witt E.A. (2005) Use of Knowledge, Skill, and Ability Statements in Developing Licensure and Certification Examinations. Educational Measurement: Issues and Practice, 24, 1, 15

Wang W.-C., Cheng Y.-Y. (1999) A Multi-Facet Rasch Analysis of the College Teacher Evaluation Inventory. Paper presented at AARE.

Watson J., Callingham R. (2003) Statistical literacy: A complex hierarchical construct. Statistics Education Research Journal, 2, 2, 3-46.

Weigle S. (1998) Using FACETS to model rater training effects. Language Testing 15: 263-287.

Wigglesworth G. (1993) Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing 10: 305-335.

Wigglesworth G. (1994) The investigation of rater and task variability using multi-faceted measurement. Report for the National Centre for English Language Teaching and Research, Macquarie University.

Williams E.J. (1999) Developmental Reading Assessment: Reliability Study. Pearson Learning Group.

Wilson M, & Wang W. (1995) Complex composites: Issues that arise in combining different modes of assessment. Applied Psychological Measurement, 19, 51-71.

Wilson M.R. & Case H. (2000) An examination of variation in rater severity over time: a study of rater drift. In M. Wilson and G. Engelhard, Jr. (Eds.) Objective Measurement: Theory into practice: Vol. 5 (pp. 113-134). Stamford CT: Ablex.

Winke P., Gass S., Myford C. (2013) Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30, 231-252.

Wolfe E.W. (1997). The Relationship between essay reading style and scoring proficiency in a psychometric scoring system. Assessing Writing, 4, 83-106.

Wolfe E.W. (2004) Identifying rater effects using latent trait models. Psychology Science, 46, 1, 35-51.

Wolfe E.W., & Chiu C. W. T. (1997) Detecting rater effects with a multi-faceted rating scale model. East Lansing MI: National Center for Research on Teacher Learning. ERIC ED408 324.

Wolfe E.W., & Dobria, L. (2008). Applications of the multifaceted Rasch model. In J. W. Osborne (Ed.), Best practices in quantitative methods (pp. 71–85). Los Angeles: Sage.

Wolfe E.W., & Gitomer, D. (2001). The influence of changes in assessment design on the psychometric quality of scores. Applied Measurement in Education, 14, 91-107.

Wolfe E.W., Chiu, W. T., & Myford, C. M. (1999). The manifestation of common rater effects in multi-faceted Rasch analyses (RR-97-02). Princeton, NJ: Educational Testing Service, Center for Performance Assessment.

Wolfe E.W., Chiu, W. T., & Myford, C. M. (2000). Detecting rater effects in simulated data with a multi-faceted Rasch rating scale model. In M. Wilson & G. Engelhard, Jr. (Eds.), Objective measurement: Theory into practice (Vol. 5, pp. 147-164). Stamford, CT: Ablex Publishing Co.

Wolfe E.W., Kao, C.W., & Ranney, M. (1998). Cognitive differences in proficient and nonproficient essay scorers. Written Communication, 15, 465-492.

Wolfe E.W., Moulder, B.C., Myford, C. (2001) Detecting Differential Rater Functioning over Time (DRIFT) Using the Rasch Multi-faceted Rating Scale Model. Journal of Applied Measurement 2:3, 256-280.

Wolfe E.W., Myford, C. M., Engelhard, G. E., & Manalo, J. R. (2007). Monitoring reader performance and DRIFT in the AP English Literature and Composition examination using benchmark essays. (College Board Research Report No. 2007-2). New York: The College Board.

Wright B.D. (1993) Discrete-time survival analysis. Rasch Measurement Transactions 7:3,  307

Yamauchi K. (1999) Comparing Many-facet Rasch Model and ANOVA model: Analysis of ratings of essays [in Japanese]. Japanese Journal of Educational Psychology. Vol 47(3), Sep., 383-392.

Zhang Y., Elder C. (2010) Judgments of oral proficiency by non-native and native English speaking teacher raters: Competing or complementary constructs? Language Testing 28(1) 31-50

Zhu W., Cole, E. L. (1996). Many-faceted Rasch calibration of a gross motor instrument. Research Quarterly for Exercise and Sport, 67(1), 24-34.

Zhu W., Ennis C.D., Chen A. (1998) Many-faceted Rasch modeling experts' judgement in test development. Measurement in Physical Education and Exercise Science, 2, 21-39.

Zhu W., Updyke, W.F, & Lewandowski, C. (1997). Post-Hoc Rasch Analysis of Optimal Categorization of an Ordered-Response Scale. Journal of Outcome Measurement, 1(4), 286-304.

 

Useful background:

Andrich D. (1978) A rating scale formulation for ordered response categories. Psychometrika, 43, 561-573

Edgeworth F.Y. (1890) The element of chance in competitive examinations. Journal of the Royal Statistical Society 53, 460-75 and 644-63.

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. Frankfurt, Germany: Lang.

Masters G.N. (1982) A Rasch model for partial credit scoring. Psychometrika 47, 149-174.

Saal, F.E., Downey, R.G. and Lahey, M.A (1980) Rating the Ratings: Assessing the Psychometric Quality of Rating Data, Psychological Bulletin, 88(2), 413-428.


Help for Facets (64-bit) Rasch Measurement and Rasch Analysis Software: www.winsteps.com Author: John Michael Linacre.
 
