]]>
https://www.gepsoft.com/blog/genexprotools-5-mini-release-2-whats-new
Wed, 19 Feb 2014 19:07:35 GMT
Dr. Candida Ferreira
Variable Importance Chart
The importance of each variable in a model (what we call Variable Importance) is evaluated and shown in the Data Panel, both in the Statistics Report and in the Variable Importance Chart:

Because of its centrality in model assessment and analysis, we are making the Variable Importance Chart more accessible through both an icon and menu shortcuts. Until now, to reach the Variable Importance Chart you had to go to the Data Panel, select "Model Variables" at the top, and then select "Statistics" in the charts at the bottom. Now you can just click the new icon (the Gold Bars icon) to go directly to the Variable Importance Chart of the active model.

We've implemented this new feature for all modeling categories (Regression, Classification, Logistic Regression, Time Series Prediction, and Logic Synthesis), with the new icon and with menu shortcuts in both the Data Menu and the Results/Predictions Menu. This new feature is part of the new mini-release "New Project: Cross-Validation, Var Importance & More" and will be launched shortly with GeneXproTools 5.0 MR2.

]]>
https://www.gepsoft.com/blog/variable-importance-chart
Tue, 18 Feb 2014 19:51:02 GMT
Dr. Candida Ferreira
Hits/Outliers Favorite Statistics
With the new mini-release "New Project: Cross-Validation, Var Importance & More" we are re-implementing and improving the favorite statistics Hits and Outliers in order to make them available irrespective of the fitness function that you're using (previously, these statistics were available only if you were using fitness functions based on the relative or absolute errors).

The new implementation of these statistics in GeneXproTools 5.0 MR2 allows you to choose the error type (relative or absolute error) and the precision. And, as with all the other favorite statistics, you can now also evaluate the Cross-Validation Hits and Cross-Validation Outliers (see the post "Bootstrap Cross-Validation"):
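As a rough illustration of how the error type and the precision interact, here is a minimal Python sketch. It is not the GeneXproTools implementation: it assumes that a hit is a record whose error falls within the chosen precision and that every other record counts as an outlier, and it assumes the relative error is expressed as a fraction of the target value. The function names are hypothetical.

```python
def record_error(target, prediction, error_type):
    """Error of a single record (assumed definitions, see the lead-in)."""
    if error_type == "absolute":
        return abs(prediction - target)
    if error_type == "relative":
        # Assumption: relative error as a fraction of the target.
        # Raises ZeroDivisionError when the target is 0.
        return abs((prediction - target) / target)
    raise ValueError("error_type must be 'absolute' or 'relative'")


def count_hits_and_outliers(targets, predictions, error_type, precision):
    """Count records inside (hits) and outside (outliers) the precision."""
    hits = sum(
        1
        for t, p in zip(targets, predictions)
        if record_error(t, p, error_type) <= precision
    )
    outliers = len(targets) - hits
    return hits, outliers
```

With this sketch, the same predictions can give different counts depending on the error type: a record that misses an absolute precision of 1.0 may still be a hit under a relative precision of 0.2 if its target is large.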

These new stats are available in the Regression Framework and in Time Series Prediction and can also be used for model selection during Ensemble Deployment to Excel:

And by the way, these Hits and Outliers statistics are the same ones that you can conveniently visualize in the new multi-functional Data Panel using different charts (Sequential Distribution Chart, Bivariate Line Chart, and Scatter Plot):

]]>
https://www.gepsoft.com/blog/hits-outliers-favorite-statistics
Tue, 18 Feb 2014 18:07:23 GMT
Dr. Candida Ferreira
Bootstrap Cross-Validation
Some of the most important new features that we introduced in GeneXproTools 5.0 include different methods for dataset partitioning and subsampling. Now with Mini-Release 2 "New Project: Cross-Validation, Var Importance & More" we are building on these methods to implement what I call Bootstrap Cross-Validation.

The Bootstrap Cross-Validation technique consists of evaluating a particular measure of fit, for example, the classification accuracy or the R-square of a model, across k different random samples of a specific dataset (training or validation/test dataset) and then averaging the results for the k folds. For each dataset, the random sampling is done with replacement using the number of records chosen by the user in the Settings Panel for the training and validation/test datasets.
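The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GeneXproTools code: the model is assumed to be a plain function from a record to a prediction, the measure of fit is passed in as a function, and the names `bootstrap_cross_validation`, `accuracy`, `k`, and `sample_size` are hypothetical.

```python
import random
from statistics import mean


def accuracy(targets, predictions):
    """Classification accuracy: fraction of exact matches."""
    matches = sum(1 for t, p in zip(targets, predictions) if t == p)
    return matches / len(targets)


def bootstrap_cross_validation(model, records, targets, metric,
                               k, sample_size, seed=None):
    """Average a measure of fit over k random samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(records)
    scores = []
    for _ in range(k):
        # Sampling with replacement: the same record may be picked twice.
        indices = [rng.randrange(n) for _ in range(sample_size)]
        sample_targets = [targets[i] for i in indices]
        sample_predictions = [model(records[i]) for i in indices]
        scores.append(metric(sample_targets, sample_predictions))
    return mean(scores)
```

Here `sample_size` plays the role of the number of records chosen in the Settings Panel and `k` the number of folds; any other measure of fit with the same `(targets, predictions)` signature could be passed in place of `accuracy`.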

So, in conclusion, the Bootstrap Cross-Validation technique is a powerful tool for model selection as it allows you to cross-validate model performance across a wide range of performance metrics, including all the Favorite Statistics and Fitness Functions available for each modeling category and also User Defined Statistics through Custom Fitness Functions.

So, whether you have a big dataset or a small one, you can make the most of it to help you select the very best model using cross-validation. For example, if you have a big dataset, say, 20k records, and want both to speed up testing and to get a more accurate measure of the generalization error, you can instead use just 2k records in a 30-fold cross-validation. If, on the other hand, you have a small dataset and are afraid that your validation/test dataset is not representative of the sample population, by using Bootstrap Cross-Validation on the validation/test dataset or on the entire dataset you can increase the odds of selecting the very best model.

The classical cross-validation technique was developed to deal with model overfitting that plagues different algorithms, from Decision Trees to Linear Regression. Model overfitting is not much of an issue in Gene Expression Programming, but still I hope you’ll find this adaptation of cross-validation to the evolutionary context of model building and selection in GeneXproTools a valuable and powerful tool.

Now repeating the process for the R language (or any other language for that matter) is very similar to what we did for the Go language, so I won't repeat it here. Instead I recommend you take a look at the posts I wrote for the Go language as they cover all you need to know about generating all the Boolean Grammars for any programming language:

But now back to the R language and our new Boolean Grammars, more specifically my choice of template for the R Grammars.

I used the Matlab Boolean Grammars as the template for the new R Grammars because both languages implement the XOR function as a function call rather than as an operator (most programming languages implement XOR as an operator and, indeed, of all the programming languages supported by GeneXproTools, only Matlab, Octave, and R implement XOR as a function call). And as we saw for the Go language, the way XOR is implemented is crucial for the way we map each of the 258 built-in logical functions of GeneXproTools in terms of NOT-AND-XOR, which, as you know by now, are the building blocks of the Reed-Muller System.
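To make the idea of mapping functions onto NOT-AND-XOR concrete, here is a small Python sketch that expresses OR, NAND, and NOR using only those three building blocks, with XOR written as a function call. These are standard Boolean identities written by hand for illustration; they are not taken from the GeneXproTools grammars, and the function names are hypothetical.

```python
from itertools import product


def NOT(a):
    return not a


def AND(a, b):
    return a and b


def XOR(a, b):
    # XOR as a function call, as in Matlab, Octave, and R.
    return a != b


def OR_rm(a, b):
    # a OR b  ==  (a XOR b) XOR (a AND b)
    return XOR(XOR(a, b), AND(a, b))


def NAND_rm(a, b):
    return NOT(AND(a, b))


def NOR_rm(a, b):
    return NOT(OR_rm(a, b))


# Check every identity against Python's own operators on all inputs.
for a, b in product([False, True], repeat=2):
    assert OR_rm(a, b) == (a or b)
    assert NAND_rm(a, b) == (not (a and b))
    assert NOR_rm(a, b) == (not (a or b))
```

When XOR is an operator the same identities can be written inline, which is why the form XOR takes in a language affects how every mapped function has to be spelled out.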

Let's now take a look at some R code generated with the new Boolean Grammars.

For example, the code below is a minimal logic circuit for the 6-Multiplexer and was designed using just NOT, AND, and OR gates:

Now, thanks to the different grammars we have for different Universal Logical Systems, we can automatically convert the logic circuit above to just NAND gates or NOR gates or MUX gates or NOT-AND-XOR gates (the NOT-AND-OR System would obviously give us the same output as above).

As an example here's the corresponding MUX circuit for the code above generated with our new MUX Grammar for the R language:

We could also have generated NAND or NOR circuits for this circuit with the NAND or NOR Grammars, but both are too large and ungainly to show here. As we saw in the previous posts about the NAND System and NOR System, if we are concerned about performance and our goal is to design NAND or NOR circuits (or any other kind of circuit, for that matter), it's best to design the original circuit with building blocks that map compactly to the gates we are interested in. But if performance is not a concern and you just need to convert whatever circuit you have to NAND gates or NOR gates or MUX gates or what have you, you can use any of the Universal Logical Systems that GeneXproTools implements for automatic circuit conversion. (I for one love to enter a Zen state where I feed really huge NAND or NOR circuits to a compiler and marvel, each time it spits out the correct answer, at how perfect and reliable computers really are. By the way, R does not handle these long lines of code very well, which I must say interfered tremendously with my Zen states; on the other hand, the Go compiler worked like a charm…)
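The kind of conversion discussed in this post can be illustrated with a hand-written Python sketch: one 6-Multiplexer built from NOT, AND, and OR, one built only from MUX gates, and an exhaustive check that the two agree. This is not output from GeneXproTools or its grammars. The address/data ordering and the argument order of `MUX` (selector first, then the value for 0, then the value for 1) are assumptions made for the example.

```python
from itertools import product


def MUX(s, x, y):
    # Assumed convention: returns x when s is false, y when s is true.
    return y if s else x


def mux6_not_and_or(a0, a1, d0, d1, d2, d3):
    """6-Multiplexer using only NOT, AND, and OR."""
    return ((not a0 and not a1 and d0) or
            (not a0 and a1 and d1) or
            (a0 and not a1 and d2) or
            (a0 and a1 and d3))


def mux6_mux_only(a0, a1, d0, d1, d2, d3):
    """The same function using only MUX gates."""
    return MUX(a0, MUX(a1, d0, d1), MUX(a1, d2, d3))


def circuits_agree():
    """Compare both circuits on all 64 input combinations."""
    return all(
        mux6_not_and_or(*bits) == mux6_mux_only(*bits)
        for bits in product([False, True], repeat=6)
    )
```

Exhaustive comparison is practical here because six inputs give only 64 cases; the same check is a cheap way to confirm that a converted circuit still computes the original function.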

In the next post I'll move away from Boolean Grammars and Universal Logical Systems and talk about a new way of cross-validating your models in GeneXproTools.