Visual crowding is the deterioration of object recognition in peripheral vision caused by the presence of neighboring objects. Recent studies have shown that, rather than being fully suppressed, crowded high-level information such as word meaning, number, and facial expression can reach conscious awareness (Fischer & Whitney, 2011; Huckauf, Knops, Nuerk, & Willmes, 2008; Kouider, Berthet, & Faivre, 2011; Peng, Zhang, Chen, & Zhang, 2013; Yeh, He, & Cavanagh, 2012). While it is well documented that the gist of a natural scene can be extracted rapidly (e.g., Greene & Oliva, 2009) and even without focal attention (Li, VanRullen, Koch, & Perona, 2002), no study so far, to our knowledge, has investigated whether gist can be extracted under crowding. To test this, we conducted a scene categorization task. The target scene, either presented alone or surrounded by four flanker scenes (of which 0, 2, or 4 shared the target's category), appeared at one of three eccentricities (9°, 11°, or 13°) to the left or right of fixation for 100 ms. Participants judged whether the target scene belonged to a basic-level category (building, highway, forest, or mountain) specified at the beginning of each block. Preliminary data showed that both the hit rate and the false alarm rate increased with the number of flankers sharing the specified category, indicating confusion between target and flankers. More importantly, crowding significantly impaired scene categorization; even so, accuracy at all three eccentricities remained above .71, significantly higher than chance (.50). Furthermore, reaction times were not affected by the presence of nearby scenes. Taken together, these results indicate that scene gist can largely get through crowding even though its extraction is significantly impaired. Further studies will examine these effects at the superordinate level (e.g., naturalness, indoor/outdoor) and with scrambled flankers.