What's One Rule?
- a classification algorithm
- simple but accurate
- works best with categorical data
- generates one rule for each predictor in the data, then selects the rule with the smallest total error as its "one rule".
What data set are we going to use?
We're going to use the famous Iris flower data set. It consists of 50 samples from each of three species of Iris (setosa, virginica, and versicolor). Four features were measured from each sample: sepal length, sepal width, petal length, and petal width. Here's a snippet of the table:
Sepal length | Sepal width | Petal length | Petal width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | I. setosa |
4.9 | 3.0 | 1.4 | 0.2 | I. setosa |
4.7 | 3.2 | 1.3 | 0.2 | I. setosa |
4.6 | 3.1 | 1.5 | 0.2 | I. setosa |
5.0 | 3.6 | 1.4 | 0.2 | I. setosa |
5.4 | 3.9 | 1.7 | 0.4 | I. setosa |
4.6 | 3.4 | 1.4 | 0.3 | I. setosa |
5.0 | 3.4 | 1.5 | 0.2 | I. setosa |
4.4 | 2.9 | 1.4 | 0.2 | I. setosa |
4.9 | 3.1 | 1.5 | 0.1 | I. setosa |
5.4 | 3.7 | 1.5 | 0.2 | I. setosa |
4.8 | 3.4 | 1.6 | 0.2 | I. setosa |
Learn more about its history and see the complete table here.
Ready? Before we proceed, here's a picture of an Iris versicolor:
[Image: Iris versicolor, from Wikipedia]
Goal:
Train a One Rule classifier that we can use to predict the species of a new flower we've never seen before. In other words, if we're given a new flower, which one of the features above (the predictors: sepal length, petal width, etc.) can best tell whether it's a setosa, a virginica, or a versicolor?
How to find the One Rule manually:
- Construct a frequency table for each predictor against the target
- For each value of the predictor, count how often each value of the target (class) appears. Our predictors are numeric, so for this example we put the values into bins (see below)
- Find the most frequent class for that value
- Make the rule assign that class to this value of the predictor
- Calculate the total error of the rules of each predictor
- Choose the predictor with the smallest total error
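The steps above can be sketched in a few lines of Python. This is only an illustration, not the tool used to produce the results in this post: the function name `one_rule`, the toy rows, and the bin labels (`short`, `medium`, `long`) are all made up for the example, and the values are assumed to be binned already.

```python
from collections import Counter, defaultdict

def one_rule(rows, target):
    """Return (best_predictor, rule, total_error) for rows of categorical values."""
    best = None
    predictors = [key for key in rows[0] if key != target]
    for predictor in predictors:
        # Frequency table: predictor value -> counts of each target class
        table = defaultdict(Counter)
        for row in rows:
            table[row[predictor]][row[target]] += 1
        # The rule assigns the most frequent class to each predictor value
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        # Total error: rows that do not belong to the majority class of their value
        total_error = sum(
            sum(counts.values()) - counts.most_common(1)[0][1]
            for counts in table.values()
        )
        # Keep the predictor with the smallest total error
        if best is None or total_error < best[2]:
            best = (predictor, rule, total_error)
    return best

# Hypothetical, already-binned toy data (not the real Iris measurements)
toy_rows = [
    {"petal": "short", "sepal": "short", "species": "setosa"},
    {"petal": "short", "sepal": "long", "species": "setosa"},
    {"petal": "medium", "sepal": "short", "species": "versicolor"},
    {"petal": "medium", "sepal": "long", "species": "versicolor"},
    {"petal": "long", "sepal": "long", "species": "virginica"},
    {"petal": "long", "sepal": "long", "species": "virginica"},
]

predictor, rule, total_error = one_rule(toy_rows, "species")
print(predictor, total_error)  # petal 0
print(rule)
```

In this toy data the `petal` predictor separates the classes perfectly, while `sepal` misclassifies three rows, so `petal` becomes the "one rule".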
Short Answer:
If you'd like to determine the classification of a new flower, there's one rule that corresponds to each type:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63] then Species = versicolor
If Petal.Width = (1.63,2.5] then Species = virginica
The accuracy of this method is very high: 144 of the 150 instances were classified correctly, which is 96%.
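As a quick sanity check, the three rules can be written as a small Python function. This is a sketch: the function name `predict_species` is hypothetical, and the intervals are read as open on the left and closed on the right, as the `(a,b]` notation suggests. The petal widths are the twelve rows from the table snippet above.

```python
def predict_species(petal_width):
    """Apply the three petal-width rules; each interval is (low, high]."""
    if 0.0976 < petal_width <= 0.791:
        return "setosa"
    if 0.791 < petal_width <= 1.63:
        return "versicolor"
    if 1.63 < petal_width <= 2.5:
        return "virginica"
    return None  # outside the range covered by the rules

# Petal widths of the twelve setosa rows shown in the table snippet
table_petal_widths = [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2]
predictions = [predict_species(width) for width in table_petal_widths]
print(all(p == "setosa" for p in predictions))  # True

# The reported accuracy: 144 correct out of 150
accuracy = 144 / 150
print(f"{accuracy:.0%}")  # 96%
```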
Finding our First Rule
Let's try it out for setosa:
So out of the 50 setosa samples that have a petal length of 0.994-2.46 and a petal width of 0.0976-0.791, there are
- 1 with a Sepal Length (SL) between 4.3-5.41 and a Sepal Width (SW) below 2.87
- 11 with the same SL and a SW between 2.87-3.19
- 33 with the same SL and a SW between 3.19-4.4
- 5 with a SL between 5.41-6.25 and a SW between 3.19-4.4
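Counts like these come from putting each measurement into a bin and tallying the combinations. Here is a minimal sketch of that idea in Python, using only the twelve rows from the table snippet above (so the counts are for those twelve rows, not all 50 setosas). The helper `bin_label` is hypothetical, and the bin edges are the ones quoted in this section, read as `(low, high]` intervals.

```python
from collections import Counter

def bin_label(value, edges):
    """Return the (low, high] interval containing value, or None if it fits no bin."""
    for low, high in zip(edges, edges[1:]):
        if low < value <= high:
            return f"({low}, {high}]"
    return None

# Bin edges quoted in the text above
sl_edges = [4.3, 5.41, 6.25]   # sepal length
sw_edges = [2.87, 3.19, 4.4]   # sepal width

# (sepal length, sepal width) for the twelve rows in the table snippet
rows = [
    (5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6), (5.4, 3.9),
    (4.6, 3.4), (5.0, 3.4), (4.4, 2.9), (4.9, 3.1), (5.4, 3.7), (4.8, 3.4),
]

# Tally how many rows fall into each (SL bin, SW bin) combination
counts = Counter(
    (bin_label(sl, sl_edges), bin_label(sw, sw_edges)) for sl, sw in rows
)
for bins, n in counts.items():
    print(bins, n)
```

For these twelve rows, 4 land in the SW bin 2.87-3.19 and 8 in the SW bin 3.19-4.4, all with a SL between 4.3-5.41.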