Analyzing large data sets requires a proper understanding of the data in advance. Such understanding helps domain experts influence the data mining process and properly evaluate the results of a data mining application. In this paper, we introduce an algorithm to identify anomalies in the data. We also propose an approach for incorporating the results of checking data characteristics into a data mining application. The application reported in this paper involves developing a disease model from gene expression data using machine learning techniques. We demonstrate how (i) simple models can be generated from a large set of attributes and (ii) the structure of the models changes when potentially anomalous cases are removed.