A comparison of the estimation accuracy of different types of rating designs under the many-facet Rasch model
CHEN Qinglin1,2, YAN Desheng3, LI Guangming1
1 School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631; 2 Mingde Primary School of Baiyun District, Guangzhou 510407; 3 Inner Mongolia Minzu Preschool Education College, Ordos 017000
Abstract Rating designs for many-facet Rasch model analysis generally fall into crossed, nested, and mixed designs. This simulation study compares the results of many-facet Rasch analyses based on crossed, nested, and mixed designs at three examinee sample sizes -- 50, 200, and 600 -- and evaluates the impact of design and sample size on the estimation accuracy of three sets of parameters: rater severity, item difficulty, and examinee ability. First, R was used to simulate 100 batches of crossed-design data at each of the three sample sizes. Second, nested-design and mixed-design data were created by making the necessary changes to the simulated crossed-design data sets. Finally, the FACETS software was used to estimate the parameters. Across the three sample sizes, the results showed that: (1) rater severity parameters were estimated more accurately under the crossed design than under the nested and mixed designs, which performed similarly; (2) item difficulty parameters and the mean of examinee ability were estimated more accurately under the nested design than under the crossed and mixed designs; and (3) under all three designs, all parameters were estimated accurately except the standard deviation of examinee ability.
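To make the simulation step concrete, the R sketch below generates one batch of crossed-design ratings under a many-facet Rasch (rating scale) model, in which every rater scores every examinee on every item. Under this model, the log-odds of examinee n receiving category k rather than k-1 from rater j on item i is theta_n - delta_i - alpha_j - tau_k, where theta is examinee ability, delta is item difficulty, alpha is rater severity, and tau_k is a category threshold. The numbers of items and raters, the generating distributions, and the five-category rating scale used here are illustrative assumptions; the abstract does not report the generating specifications actually used in the study.

# Sketch: one batch of crossed-design ratings under a many-facet Rasch
# (rating scale) model. All generating values are illustrative assumptions.
set.seed(123)

n_examinees <- 50                          # one of the three sample sizes (50, 200, 600)
n_items     <- 4                           # assumed number of items
n_raters    <- 6                           # assumed number of raters
thresholds  <- c(-2, -1, 1, 2)             # tau_k for an assumed 5-category scale

theta <- rnorm(n_examinees, 0, 1)          # examinee ability
delta <- rnorm(n_items, 0, 1)              # item difficulty
alpha <- rnorm(n_raters, 0, 1)             # rater severity

# Category probabilities for one examinee-item-rater combination:
# P(X = k) is proportional to exp(sum over the first k terms of (eta - tau)).
category_probs <- function(th, d, a, tau) {
  eta <- th - d - a
  num <- exp(c(0, cumsum(eta - tau)))
  num / sum(num)
}

# Fully crossed design: every examinee x item x rater combination is rated.
design <- expand.grid(examinee = 1:n_examinees,
                      item     = 1:n_items,
                      rater    = 1:n_raters)
design$score <- apply(design, 1, function(row) {
  p <- category_probs(theta[row["examinee"]],
                      delta[row["item"]],
                      alpha[row["rater"]],
                      thresholds)
  sample(0:(length(p) - 1), 1, prob = p)   # draw the observed rating category
})

A nested-design data set could then be obtained from the same batch by retaining, for each examinee, only the ratings of the rater or rater group assigned to that examinee, and a mixed design by combining crossed and nested assignment; the exact assignment scheme is not described in the abstract.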