This paper presents an extension to our previously developed fusion framework [10] involving a depth camera and an inertial sensor in order to improve its view invariance aspect for real-time human action recognition applications. A computationally efficient view estimation based on skeleton joints is considered in order to select the most relevant depth training data when recognizing test samples. Two collaborative representation classifiers, one for depth features and one for inertial features, are appropriately weighted to generate a decision making probability. The experimental results applied to a multi-view human action dataset show that this weighted extension improves the recognition performance by about 5% over equally weighted fusion deployed in our previous fusion framework.