Holcombe, Kanwisher, and Treisman (2001) reported that the relative order of a 4-letter sequence is easier to apprehend in a single presentation than in a cycling presentation, in which the sequence is presented many times. They termed this phenomenon the midstream order deficit (MOD) and concluded that it reflects the difficulty of apprehending relative order in an undifferentiated stream. In this study, using Japanese Kanji characters as stimuli, we investigated how relative order is encoded when MOD occurs and whether heterogeneity of the items in visual complexity improves the correct report of relative order in the cycling presentation. In Experiments 1 and 2, the four characters had different pronunciations, and participants reported relative order verbally. In Experiments 3 and 4, by contrast, the four characters had the same pronunciation, and participants reported relative order by connecting the four printed characters with arrows. Relative order accuracy was higher in the heterogeneous condition than in the homogeneous condition, but the difference in relative order accuracy between the single and cycling presentations did not depend on heterogeneity. In Experiments 1 and 2, relative order accuracy was lower in the cycling presentation than in the single presentation, indicating the occurrence of MOD; this deficit was not explained by letter identity accuracy, which was higher in the cycling presentation than in the single presentation. In contrast, the opposite difference in relative order accuracy between the single and cycling presentations was found in Experiments 3 and 4. Our results suggest that MOD can occur only when relative order is phonologically encoded and that phonological and visual codes have different properties in order encoding.