Iris Data

This cube uses a classic data set collected by Edgar Anderson in 1935 ("The irises of the Gaspé Peninsula," Bulletin of the American Iris Society 59: 2-5) and introduced by Sir Ronald Fisher in 1936 ("The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics 7: 179-188).

Anderson collected the data to quantify the geographic variation of Iris flowers amongst three distinct species (Iris setosa, Iris virginica and Iris versicolor). Four characters were measured on 50 samples of each species: sepal length and width, and petal length and width.

Using a linear combination of the four characters, Fisher developed a linear discriminant model to determine which species each sample belongs to. Anderson already knew the species of every sample (that is how he separated his three sets of samples), but Fisher showed how the species could be determined entirely with mathematics, given only the measurements.
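To make the idea concrete, here is a minimal sketch of the two-class form of Fisher's linear discriminant in plain Python. It is an illustration under stated assumptions, not Fisher's own computation and not how Antaeus works. The projection direction is w = S_W^-1 (m_A - m_B), where m_A and m_B are the class means and S_W is the pooled within-class scatter matrix. Placing the decision threshold at the midpoint of the projected class means is an assumed (common) choice. The function names and the sample values (SAMPLE_A, SAMPLE_B, given as petal length and petal width in cm) are hypothetical and invented for illustration; they are not Anderson's measurements.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Per-feature mean of one class."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: Sequence[Sequence[float]], m: Vector) -> Matrix:
    """Within-class scatter matrix: sum over samples of (x - m)(x - m)^T."""
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - m[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fisher_discriminant(class_a, class_b) -> Tuple[Vector, float]:
    """Return (w, threshold) for a two-class Fisher linear discriminant."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    d = len(ma)
    sw = [[sa[i][j] + sb[i][j] for j in range(d)] for i in range(d)]
    w = solve(sw, [ma[j] - mb[j] for j in range(d)])  # w = S_W^-1 (m_A - m_B)
    # Assumed rule: threshold halfway between the projected class means.
    threshold = (dot(w, ma) + dot(w, mb)) / 2
    return w, threshold


def classify(x: Sequence[float], w: Vector, threshold: float) -> str:
    """Class A projects above the threshold, class B below it."""
    return "A" if dot(w, x) > threshold else "B"


# Hypothetical (petal length, petal width) values in cm, invented for illustration.
SAMPLE_A = [[1.4, 0.2], [1.3, 0.3], [1.5, 0.2], [1.6, 0.4]]
SAMPLE_B = [[4.7, 1.4], [4.5, 1.5], [4.0, 1.3], [4.9, 1.6]]

if __name__ == "__main__":
    w, t = fisher_discriminant(SAMPLE_A, SAMPLE_B)
    print(classify([1.5, 0.25], w, t))  # prints "A"
```

The same functions accept all four characters per sample; the solver works for any number of features, provided the pooled scatter matrix is not singular. With three species, Fisher's approach is applied to pairs of species, or extended to several discriminant directions.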

The mathematics is not trivial, and few readers will follow it in full (Fisher's article can be found online). But if you look at the scatter plot matrix in this cube, the relationships amongst the three species leap out at you.

Iris Data.CUBE (16.1 KB)
150 records, 4 measures, 4 dimensions. Installed with Antaeus.