The purpose of this study was to compare and evaluate five online pretest item calibration/scaling methods in computerized adaptive testing (CAT): (1) marginal maximum likelihood estimation with one EM cycle (OEM); (2) marginal maximum likelihood estimation with multiple EM cycles (MEM); (3) Stocking's Method A (M. Stocking, 1988); (4) Stocking's Method B (M. Stocking, 1988); and (5) the BILOG/Prior method. The five methods were evaluated in terms of item parameter recovery under three sample size conditions (300, 1,000, and 3,000). The MEM method appears to be the best choice among the methods studied because it produced the smallest parameter estimation errors under all sample size conditions. Stocking's Method B also worked very well, but it requires anchor items, which would lengthen the test. The BILOG/Prior method did not seem to work with small sample sizes. Until more appropriate ways of handling sparse data with BILOG are devised, the BILOG/Prior method may not be a reasonable choice. Because Stocking's Method A had the largest weighted total error and also has a theoretical weakness (it treats estimated ability as true ability), there appears to be little reason to use it. The MEM method should be preferred to the OEM method unless the time required for iterative computation is a major concern: the two methods are mathematically similar, but the OEM method produces larger errors than the MEM method. (Contains 2 tables, 3 figures, and 16 references.) (Author/SLD)
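The theoretical weakness attributed to Stocking's Method A (treating estimated ability as true ability) can be illustrated with a small simulation. The following is a minimal sketch, not the study's procedure: it assumes a two-parameter logistic item rather than whatever model the study used, it assumes ability estimates equal true ability plus independent normal error (a simplification of real CAT estimates), and the function name `fit_2pl_fixed_theta`, the parameter values, and the error standard deviation of 0.4 are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_2pl_fixed_theta(theta, y, n_iter=25):
    """Maximum likelihood for one item's 2PL parameters with abilities held fixed.

    This is a logistic regression of the responses y on theta, solved by
    Newton-Raphson: P(y = 1) = 1 / (1 + exp(-(slope * theta + intercept))).
    Returns (a, b), where a = slope and b = -intercept / slope.
    """
    X = np.column_stack([theta, np.ones_like(theta)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)     # Newton step
    slope, intercept = beta
    return slope, -intercept / slope


# Hypothetical simulation settings (assumptions, not values from the study).
rng = np.random.default_rng(0)
n = 3000                      # largest sample size condition named in the abstract
a_true, b_true = 1.2, 0.3     # illustrative pretest item parameters

theta_true = rng.normal(size=n)
p_correct = 1.0 / (1.0 + np.exp(-a_true * (theta_true - b_true)))
y = (rng.random(n) < p_correct).astype(float)

# Assumed error model for ability estimates: true ability plus normal noise.
theta_hat = theta_true + rng.normal(scale=0.4, size=n)

# Calibration with true abilities (an unattainable reference in practice).
a_ref, b_ref = fit_2pl_fixed_theta(theta_true, y)
# Calibration in the spirit of Method A: estimated abilities treated as true.
a_fixed, b_fixed = fit_2pl_fixed_theta(theta_hat, y)

print(f"true abilities:      a = {a_ref:.3f}, b = {b_ref:.3f}")
print(f"estimated abilities: a = {a_fixed:.3f}, b = {b_fixed:.3f}")
```

Under these assumptions, the discrimination estimate obtained from the noisy ability estimates tends to be pulled toward zero relative to the estimate obtained from the true abilities, which is the usual consequence of ignoring measurement error in a predictor. The marginal maximum likelihood approaches named in the abstract avoid this particular issue by integrating over an ability distribution instead of conditioning on point estimates.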