Here we will implement discriminant analysis with multiple groups. We will look at the famous iris data of Fisher. The problem is to classify folowers into three different species, namely Setosa, Versicolor and Virginica, based on four characetristics: sepal length, sepal width, petal length and petal width.
library(kernlab)
library(MASS)
# iris data
data(iris)
X<-iris
Let us visualize the data. There are four columns but let us look at the scatter plot w.r.t two columns.
speciesnames<-as.matrix(unique(X$Species))
colors <- ifelse(X$Species == speciesnames[1], "red", ifelse(X$Species == speciesnames[2],"blue", "dark green"))
plot(X[,1],X[,2], xlab=colnames(X)[1],ylab=colnames(X)[2], col= colors, main="Three Species of Iris" )
legend(7,4.2, legend=c("setosa", "versicolor", "virginica"), col=c("red","blue", "green"), pch=c(1,1,1))
Note that since there are 3 specicies (3 groups), we will need two discriminant functions.
fit <- lda(Species ~ . , data=X, na.action="na.omit",CV=FALSE)
ldavar1<- as.matrix(X[,1:4])%*% (fit$scaling[,1])
ldavar2<- as.matrix(X[,1:4])%*% (fit$scaling[,2])
plot(ldavar1,ldavar2, xlab="First Discriminant Function",ylab="Second Discriminant Function", col= colors , main="axes as discriminant functions")
pred<-predict(fit, X)
pred[[1]][1:10] # Classification based on maximum posterior probability
## [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
round(pred[[2]][1:10,],2) # Posterior probability of belonging to group
## setosa versicolor virginica
## 1 1 0 0
## 2 1 0 0
## 3 1 0 0
## 4 1 0 0
## 5 1 0 0
## 6 1 0 0
## 7 1 0 0
## 8 1 0 0
## 9 1 0 0
## 10 1 0 0
predclass<- as.matrix(pred[[1]])
# confusion Matrix on development sample
ct <- table(as.matrix(X$Species), predclass)
diag(prop.table(ct, 1))
## setosa versicolor virginica
## 1.00 0.96 0.98
# total percent correct
sum(diag(prop.table(ct)))
## [1] 0.98
prop.table(ct)
## predclass
## setosa versicolor virginica
## setosa 0.333333333 0.000000000 0.000000000
## versicolor 0.000000000 0.320000000 0.013333333
## virginica 0.000000000 0.006666667 0.326666667