In this paper, we improve speaker-independent emotion classification on the CASIA Mandarin emotional speech corpus, which is provided by Chinese-LDC and covers four basic emotions: angry, happy, neutral, and sad. We achieve this by modeling the human process of emotion perception with a three-layer model, which has acoustic features in the bottom layer, semantic primitives in the middle layer, and emotion dimensions in the top layer. To implement the proposed system, we first investigate the optimal set of acoustic features related to each emotion dimension, and then map these acoustic features to the emotion dimensions through the estimated semantic primitives using a fuzzy inference system (FIS). In addition, using the accurately estimated emotion dimensions, we perform emotion classification by exploiting knowledge of the commonalities and differences in human emotion perception. The experimental results show improved estimation performance compared with a previous study.