In many applications, we deal with dissimilarity measures that are not absolute; in such cases it is mostly the ordering that matters. Let \(\delta_{ij}\) denote the dissimilarity between objects \(i\) and \(j\); these \(\delta_{ij}\) need not be Euclidean distances. Our aim in non-metric MDS is to obtain \(r\)-dimensional coordinate vectors \(X_1,X_2,\ldots,X_n\) for objects \(1,2,\ldots,n\) such that the distances \(d_{ij}\) between \(X_i\) and \(X_j\) "closely resemble" the ordering of the \(\delta_{ij}\).
Step 1: Choose dimension \(r\).
Step 2: Choose a starting set of coordinates (e.g. from metric MDS):\[X_1^T=(x_{11},x_{12},\ldots,x_{1r}),\ X_2^T=(x_{21},x_{22},\ldots,x_{2r}),\ \ldots,\ X_n^T=(x_{n1},x_{n2},\ldots,x_{nr})\]
Step 3: Compute \(d_{ij}=\sqrt{(x_{i1}-x_{j1})^2+\cdots+(x_{ir}-x_{jr})^2}\).
Step 4: Assess the closeness between \(d_{ij}\) and \(\delta_{ij}\). Kruskal's method finds the best monotonic transformation \(\phi\) of \(((\delta_{ij}))\) such that \(\hat{d}_{ij}=\phi(\delta_{ij})\) and minimizes \[STRESS=\sqrt{\frac{\sum\limits_i\sum\limits_j(d_{ij}-\hat{d}_{ij})^2}{\sum\limits_i\sum\limits_jd_{ij}^2}}.\] A rule of thumb, as provided in the text: Poor: 0.20, Fair: 0.10, Good: 0.05, Excellent: 0.02.
Step 5: Determine a new set of coordinates based on \(\hat{d}_{ij}\) and repeat Steps 3 and 4. Shepard's diagram gives a visual check of closeness: a plot of both the distances \(d_{ij}\) and the fitted values \(\hat{d}_{ij}\) against the dissimilarities \(\delta_{ij}\).
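The monotone fit of Step 4 can be sketched with isotonic regression. A minimal R sketch (the function name `stress1` is ours, not part of any package):

```r
# Kruskal's STRESS-1 for a configuration X against dissimilarities delta.
# The monotone transformation phi is obtained by isotonic regression of the
# configuration distances on the rank order of the dissimilarities.
stress1 <- function(X, delta) {
  d     <- as.vector(dist(X))        # d_ij from the current configuration
  delta <- as.vector(as.dist(delta))
  ord   <- order(delta)
  dhat  <- numeric(length(d))
  dhat[ord] <- isoreg(d[ord])$yf     # best monotone fit: dhat_ij = phi(delta_ij)
  sqrt(sum((d - dhat)^2) / sum(d^2))
}

# When delta already equals the configuration distances, the ordering is
# reproduced perfectly and STRESS is 0.
X <- matrix(rnorm(20), ncol = 2)
stress1(X, dist(X))  # 0 (up to rounding)
```

In a full implementation Step 5 would move the configuration `X` along the gradient of this quantity; here we only evaluate it for a fixed configuration.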
We consider the pairwise dissimilarities between car brands. The example is adapted from the book ``Analyzing Multivariate Data'' by Lattin, Carroll and Green.
setwd('C:\\Users\\IIMA\\Google Drive\\SMDA\\SMDA2020\\4. MDS')  # working directory (machine-specific)
library(MASS)
C <- read.csv("car_dis.csv")   # first column contains the brand names
C1 <- C[, 2:11]                # 10 x 10 table of rank dissimilarities
C1[upper.tri(C1)] <- 0         # keep only the lower triangle
CC <- C1 + t(C1)               # symmetrise; the diagonal stays 0
CC
## BMW Ford Infiniti Jeep Lexus Chrysler Mercedes Saab Porsche Volvo
## 1 0 34 8 31 7 43 3 10 6 33
## 2 34 0 24 2 26 14 28 18 39 11
## 3 8 24 0 25 1 35 5 20 41 22
## 4 31 2 25 0 27 15 29 17 38 12
## 5 7 26 1 27 0 37 4 13 40 23
## 6 43 14 35 15 37 0 42 36 45 9
## 7 3 28 5 29 4 42 0 19 32 30
## 8 10 18 20 17 13 36 19 0 21 16
## 9 6 39 41 38 40 45 32 21 0 44
## 10 33 11 22 12 23 9 30 16 44 0
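The lower-triangle-plus-transpose trick used to build `CC` can be seen on a small toy matrix (illustrative values only):

```r
# Keep the lower triangle of a square table and add its transpose:
# the result is a full symmetric dissimilarity matrix with a zero diagonal.
M <- matrix(c(0, 0, 0,
              3, 0, 0,
              5, 4, 0), nrow = 3, byrow = TRUE)
M + t(M)
#      [,1] [,2] [,3]
# [1,]    0    3    5
# [2,]    3    0    4
# [3,]    5    4    0
```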
fit <- isoMDS(as.dist(CC), k = 2, trace = TRUE, maxit = 1000, tol = 0.001)  # k is the number of dimensions
## initial value 10.466056
## iter 5 value 6.523088
## iter 10 value 4.865126
## iter 15 value 4.088399
## iter 20 value 4.003118
## final value 3.989006
## converged
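By default `isoMDS` takes its starting configuration from classical (metric) MDS, as described in Step 2; this can be made explicit with `cmdscale()`. A sketch on the built-in `eurodist` data (road distances between 21 European cities):

```r
library(MASS)

# Classical (metric) MDS supplies the starting coordinates of Step 2 ...
start <- cmdscale(eurodist, k = 2)

# ... which isoMDS then refines by minimising STRESS; passing y = start
# just makes the default behaviour explicit.
fit_eu <- isoMDS(eurodist, y = start, k = 2, trace = FALSE)
dim(fit_eu$points)  # 21 cities x 2 coordinates
```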
fit # view results
## $points
## [,1] [,2]
## BMW 17.936081 -0.4758516
## Ford -13.147315 10.0151781
## Infiniti 5.216760 -13.4062683
## Jeep -12.987540 9.9869380
## Lexus 5.541811 -13.5538525
## Chrysler -29.832310 -0.6206639
## Mercedes 14.369211 -10.2044803
## Saab 4.383495 5.8442803
## Porsche 23.614616 18.5987376
## Volvo -15.094809 -6.1840172
##
## $stress
## [1] 3.989006
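Note that `isoMDS` reports STRESS as a percentage, so the value 3.989 above corresponds to about 0.04 on the rule-of-thumb scale of Step 4, i.e. between "good" and "excellent". A self-contained illustration on the built-in `eurodist` data:

```r
library(MASS)

# isoMDS returns STRESS on a percentage scale; divide by 100 before
# comparing with the Poor/Fair/Good/Excellent thresholds.
fit_eu <- isoMDS(eurodist, k = 2, trace = FALSE)
fit_eu$stress        # percentage scale
fit_eu$stress / 100  # rule-of-thumb scale
```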
x <- fit$points[,1]
y <- fit$points[,2]
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2",
main="Nonmetric MDS", xlim=c(-40,25),ylim=c(-20,20), type="n")
text(x, y, labels = C[,1], pos=c(1,rep(c(2,3),5)), cex=.9)
lines(seq(-40,40, length=100), rep(0,100), col="red")
lines( rep(0,100),seq(-25,25, length=100), col="red")
shep <- Shepard(as.dist(CC), fit$points, p = 2)
plot(shep, pch="*", xlab="Dissimilarity", ylab="Distance", xlim=range(shep$x), ylim=range(shep$y))
lines(shep$x, shep$yf, type="S")