Date of Completion


Embargo Period



Helen J. Rogers, Hariharan Swaminathan, Aarti Bellara

Field of Study

Educational Psychology


Master of Arts

Open Access

Open Access


In educational measurement, a test item exhibits differential item functioning (DIF) when the probability of an examinee’s response to it depends on group membership for examinees of equal ability. Methods for detecting uniform and nonuniform DIF have been studied for decades as a means of improving test validity. The current study compared the effectiveness of six DIF detection methods: the Mantel-Haenszel (MH) procedure, the logistic regression procedure, the multiple indicators multiple causes (MIMIC) model, the item response theory likelihood-ratio test (IRT-LR), Lord’s IRT-based Wald test, and a randomization test based on an R-squared change statistic. A simulation study was conducted in which the manipulated factors were the percentage of DIF items (%DIF), sample size (number of examinees in each group), test length (number of items in the test), the type and magnitude of DIF, and the mean ability difference between groups of examinees.
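As an illustration of the simplest of the six methods, the following is a minimal sketch of the Mantel-Haenszel common odds ratio for a single item, with examinees stratified on a matching score. It is not the code used in the study; the function names, argument names, and the "ref"/"focal" group labels are assumptions made for this example. The rescaling by -2.35 is the conventional ETS delta metric.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, scores, groups):
    """Common odds ratio for one item, stratified by matching score.

    responses: 0/1 item scores, one per examinee.
    scores:    matching variable (for example, total test score).
    groups:    "ref" or "focal" label per examinee (hypothetical labels).
    """
    # One 2x2 table per stratum, stored as
    # [ref correct, ref incorrect, focal correct, focal incorrect].
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for u, s, g in zip(responses, scores, groups):
        idx = (0 if g == "ref" else 2) + (0 if u == 1 else 1)
        tables[s][idx] += 1

    num = 0.0
    den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # ref correct * focal incorrect
        den += b * c / n  # ref incorrect * focal correct
    if den == 0:
        raise ValueError("odds ratio is undefined for these data")
    return num / den


def mh_d_dif(alpha):
    """Rescale the common odds ratio to the ETS delta metric."""
    return -2.35 * math.log(alpha)
```

A value of the odds ratio near 1 (equivalently, a delta value near 0) indicates little evidence of uniform DIF under this sketch; values above 1 indicate that the item favors the reference group.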

The results showed that the MIMIC model had the greatest power for detecting uniform DIF items, and also for detecting nonuniform DIF items on longer tests. The logistic regression method and the randomization test were efficient at detecting uniform DIF items, but the randomization test is applicable only when the two groups have the same mean ability. The IRT methods were more useful for detecting nonuniform DIF items. The percentage of DIF items had little effect on the power of any method, whereas most methods had greater power for large-magnitude DIF than for small-magnitude DIF, and greater power when the sample size in each group was large.

Major Advisor

Helen J. Rogers