This study evaluated how different approaches to anchor test construction affect the accuracy of equating in test adaptation. In cross-lingual studies, “equating” refers to a statistical procedure that adjusts scores on the source language (SL) and target language (TL) versions of a test, using a set of common translated items from the same examination, so that the scores can be interpreted interchangeably. The verbal and non-verbal sections of each test were investigated. The Levine linear equating method and the Mean-Sigma equating method were used with an anchor item design and an equivalent group design, respectively. The double linking method and the standard errors of equating were used to evaluate the accuracy of equating for the different anchor tests. The average difference between the two anchor tests for the verbal and non-verbal sections, taken over three target language groups, reflected the degree of overall instability in the cross-lingual equating process. These differences were associated with real and systematic variance underlying that process. The data were scoring outcomes from an actual certification examination, with a sample of nearly 9,000 examinees who took the SL and TL versions of the test. Findings indicated that the differences between the double linking chains for each anchor test were greater for the verbal section than for the non-verbal section. The results of the double linking method supported the notion that different choices of anchor items can produce different equatings, and that selecting items with more stable parameters was a better choice than selecting items with less differential item functioning (DIF). The MSEE results did not show large differences between the parameter and DIF methods of anchor item selection. However, the MSEE differences were in the same direction as the double linking differences; that is, the parameter method was superior to the DIF method on both criteria.
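As a rough illustration of the linear equating idea summarized above, the following minimal sketch matches the mean and standard deviation of two score distributions under an equivalent group assumption. It is not the study's actual procedure: the function name `linear_equate` and the score lists are hypothetical, and the Levine method and the anchor item design are not shown.

```python
from statistics import mean, pstdev


def linear_equate(x, from_scores, to_scores):
    """Map score x from one test form's scale onto another's.

    Assumes the two groups are equivalent in ability, so that differences
    in mean and standard deviation reflect the forms, not the examinees.
    """
    mu_from, sd_from = mean(from_scores), pstdev(from_scores)
    mu_to, sd_to = mean(to_scores), pstdev(to_scores)
    slope = sd_to / sd_from                # ratio of standard deviations
    intercept = mu_to - slope * mu_from    # aligns the two means
    return slope * x + intercept


# Hypothetical score samples, invented purely for illustration.
tl_scores = [10, 12, 14, 16, 18]
sl_scores = [20, 24, 28, 32, 36]

# Express a TL score of 14 on the SL scale.
equated = linear_equate(14, tl_scores, sl_scores)
```

In this toy case the TL mean maps onto the SL mean, which is the defining property of a linear equating that matches the first two moments.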
