Hi! Welcome back to the course on Data Mining with Weka. I'm Ian, up here in New Zealand. This is Lesson 1.2. Remember: there are five classes in this course, and each class consists of about six lessons. This is the second lesson of the first class, and we're going to explore the Explorer -- the Weka Explorer interface.
Actually, first we're going to download the Weka system. This is something you're going to have to do on your computer. We're going to download it from this URL. Without delay, let's go straight there. Here we are. This is www.cs.waikato.ac.nz/ml/weka. You can read about Weka here.
I'm going to go straight to the Download button and download and install Weka on my computer. I'm running on a Windows machine here, but there are versions down at the bottom you can see for Mac OS X and Linux and so on. You need to download the appropriate version for your machine. We want Weka 3.6.10; that's the latest version of Weka. I'm going to download a self-extracting executable without the Java Virtual Machine -- I already have the Java Virtual Machine on my computer. I'm going to click here, but you're going to need to do whatever's appropriate for your computer.
While it's downloading, let's have a word about the pronunciation of the word 'Weka'. It's pronounced Weh-kuh. We don't like calling it the 'weaker' system. It's not 'weaker', it's Weka, pronounced to rhyme with 'Mecca'. That's the name of the bird; that's the name of our software. Weka.
I think it has downloaded now, and I'm going to open it. This is a standard kind of setup wizard. We're installing Weka 3.6.10. I'm just going to keep clicking "Next". Yes, I'm happy with the GNU General Public License. I'm going to have a full install. I'm going to install it in the default place -- you just need to remember the name of this place. We're going to need to visit there in a moment. We're going to install the whole thing. This is going to take a couple of minutes. I'm just off for a cup of coffee; I'll be back in a second.
Now, it's installed. Let's just carry on here. I want to click "Finish", but actually I'm not going to start Weka. I'm going to uncheck that, and click "Finish", because there are a couple of things I want to do first. Let's go and see where Weka is. It's on my computer in Program Files. It should be down here -- Weka 3.6. I'm going to create a shortcut to that, because we're going to be using it a lot in this course. I'm just going to put it on the desktop. Then, I'm going to do one more thing. I'm going to go inside this folder, and I'm going to look at the data folder. This contains a bunch of datasets we're going to be using. I'm going to take this folder and copy it and put it somewhere convenient. Let's copy that, and I'm going to put it in the "My Documents" folder. I'm going to rename it "Weka datasets". I'm all set.
I've finished installing Weka. I've got my shortcut to Weka here. Oops -- I made my shortcut to the wrong place; I meant to make the shortcut to this here. Let me just make a shortcut here. Create shortcut, put it on the desktop. That's the one I want. Now, when I click here, it will open Weka. Back to the slide. There are four interfaces in Weka. The Explorer is the one that we'll be using throughout this course. We're just using the Explorer, but there is also the Experimenter, for large-scale performance comparisons of different machine learning methods on different datasets. There's the KnowledgeFlow interface, which is a graphical interface to the Weka tools, and there's a command-line interface. But we're just going to use the Explorer. So let's get on with it.
Here's the Explorer. Across the top, there are six panels: the Preprocess panel; the Classify panel, where you build classifiers for datasets; Clustering, another procedure Weka is good at, although we won't be talking about clustering in this course; Association rules; Attribute selection; and Visualization. In this course, we'll be using mainly the Preprocess panel to open files and so on, the Classify panel to experiment with classifiers, and the Visualize panel to visualize our datasets.
I'm going to open a dataset. The dataset that I'm going to open is the weather data; it's a little toy dataset that we'll be seeing a lot of in this course. It's got 14 instances, 14 days, and for each of these days, we have recorded the values of five attributes. Four are to do with the weather: Outlook, Temperature, Humidity, and Windy. The fifth, Play, is whether or not we're going to play a particular, unspecified game. Actually, what we're going to be doing is predicting the Play attribute from the other attributes.
Let's not worry about that at the moment. Let's just open the dataset and take a look at it in Weka. Here's "My Documents". Here are the Weka datasets; this is what I copied. I'm going to open weather.nominal.arff. All Weka data files are called ARFF files; we'll talk about that later on. This is the weather data. Just ignore these colorful bars at the moment. There are 14 instances; these correspond to the 14 days that we saw in the dataset on the slide. For each day we have five attributes: outlook, temperature, humidity, windy, and play. If you select one of these attributes -- outlook is selected at the moment -- we can see the values. The values for the outlook attribute are sunny, overcast, and rainy. These are the numbers of times they appear in the dataset: 5 sunny days, 4 overcast days, and 5 rainy days, for a total of 14 days, 14 instances. If we look at the temperature attribute, hot, mild, and cool are the possible values, and these are the numbers of times they appear in the dataset. Let's go to the play attribute. There are two values for play, yes and no.
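If you'd like to check those counts outside of Weka, here is a little sketch in Python. It is not how Weka itself reads files: the `parse_arff` helper is a hypothetical, minimal parser that only copes with simple nominal data, and the inline text is assumed to match the contents of weather.nominal.arff.

```python
from collections import Counter

# Inline copy of the weather data in ARFF layout
# (assumed to match weather.nominal.arff as shipped with Weka).
ARFF_TEXT = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Hypothetical minimal parser for nominal-only ARFF text (not a Weka API)."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


names, rows = parse_arff(ARFF_TEXT)

# Count how often each value of an attribute appears, as the Preprocess panel does.
outlook_counts = Counter(row[names.index("outlook")] for row in rows)
temperature_counts = Counter(row[names.index("temperature")] for row in rows)

print(len(rows), "instances;", len(names), "attributes:", names)
print("outlook:", dict(outlook_counts))
print("temperature:", dict(temperature_counts))
```

The counts for each attribute add up to 14, one for each instance.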
Now, let's look at these two bars here. Blue corresponds to yes, and red corresponds to no. If you look at one of the other attributes, like outlook, you can see that when the outlook is sunny -- this is like a histogram -- there are three "no" instances and two "yes" instances. When the outlook is overcast, there are four "yes" instances and zero "no" instances. These are like a histogram of the attribute values in terms of the attribute we're trying to predict. It makes it kind of useful to click around and visualize your data.
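The same breakdown behind those colored bars can be sketched in a few lines of Python. This is just an illustration, not Weka's own code: the `ROWS` list is assumed to hold the 14 days of the weather data, and `class_breakdown` is a hypothetical helper name.

```python
from collections import Counter

# The 14 days: (outlook, temperature, humidity, windy, play).
# Assumed to match the rows of weather.nominal.arff.
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def class_breakdown(rows, attr_index, class_index=-1):
    """Count (attribute value, class value) pairs; hypothetical helper."""
    return Counter((row[attr_index], row[class_index]) for row in rows)


# Break the outlook attribute (column 0) down by the play attribute (last column).
outlook_by_play = class_breakdown(ROWS, 0)

for value in ("sunny", "overcast", "rainy"):
    yes = outlook_by_play[(value, "yes")]  # a Counter gives 0 for missing pairs
    no = outlook_by_play[(value, "no")]
    print(f"{value}: yes={yes} no={no}")
```

Each printed line corresponds to one bar: the "yes" count is the blue part and the "no" count is the red part.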
We've opened the weather data, weather.nominal.arff. We've looked at the attribute values and the attributes in Weka. There's one more thing I want to do before we summarize here. If I go to the Edit panel, I see the data in the form that it was on the slide, with the 14 days down here and the 5 attributes across here. This is another view of the data. I can actually change this dataset. If I click here, I can change this "no" to "yes". Or, if I click here, I can change on this day the outlook from rainy to sunny. If only it were so easy in real life to change a day from rainy to sunny! Then I can click OK, and we've got this edited dataset, which we could save if we'd like. We haven't saved any of this; the dataset on the disk is still the same as it was. I'm not going to save it, and I don't think you should save it, because we're going to be using this dataset quite a bit in this course.
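The point that edits live only in memory until you save them can be illustrated with a small Python sketch. It has nothing to do with Weka's internals: the file name is hypothetical, a temporary folder stands in for your disk, and the row being edited is an arbitrary pick (the last day of the weather data).

```python
import os
import tempfile

# One day of weather data, written out as a stand-in for the file on disk.
original = "rainy,mild,high,TRUE,no\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "weather_row.csv")  # hypothetical file name
    with open(path, "w") as f:
        f.write(original)

    # Load the row into memory and edit the in-memory copy only.
    with open(path) as f:
        row = f.read().strip().split(",")
    row[0] = "sunny"   # change the outlook from rainy to sunny
    row[-1] = "yes"    # change play from no to yes

    # Nothing has been written back, so the file is unchanged.
    with open(path) as f:
        on_disk = f.read()

print("edited copy:", row)
print("file unchanged:", on_disk == original)
```

Only an explicit save would write the edited copy back, which is why the dataset on disk stays as it was.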
This is what we've done in this lesson. We've installed Weka. We've got the datasets. We've opened the Explorer. We've looked at a dataset -- the weather.nominal.arff dataset. We've looked at the attributes and their values. We've edited the dataset, but we didn't save it. You can read more about this in the course text; Section 1.2 talks about the weather data, and Chapter 10 is a little introduction to the Weka system. Now you should go and do the activity associated with this lesson. Good luck, and I'll see you in the next lesson. Bye for now!