In many applications we deal with dissimilarity measures that are not absolute; in such cases it is mostly the ordering that matters. Let \(\delta_{ij}\) denote the dissimilarity between objects \(i\) and \(j\). These \(\delta_{ij}\) need not be Euclidean distances. Our aim in non-metric MDS is to obtain \(r\)-dimensional coordinate vectors \(X_1, X_2, \ldots, X_n\) for objects \(1, 2, \ldots, n\) such that the distances between \(X_i\) and \(X_j\), denoted by \(d_{ij}\), closely follow the ordering of the \(\delta_{ij}\).

Methodology

Step 1: Choose dimension \(r\).

Step 2: Choose a starting set of coordinates (e.g. by metric MDS): \[X_1^T=(x_{11},x_{12},\ldots,x_{1r}),\ X_2^T=(x_{21},x_{22},\ldots,x_{2r}),\ \ldots,\ X_n^T=(x_{n1},x_{n2},\ldots,x_{nr})\]
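A starting configuration can be obtained in R via classical (metric) MDS; a minimal sketch, where D is assumed to be an \(n \times n\) symmetric dissimilarity matrix and r the chosen dimension:

X0 <- cmdscale(as.dist(D), k = r)  # n x r matrix of starting coordinates from metric MDS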

Step 3: Compute \(d_{ij}=\sqrt{(x_{i1}-x_{j1})^2+\cdots+(x_{ir}-x_{jr})^2}\).
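In R these Euclidean distances come directly from the coordinate matrix; a sketch, assuming X0 is the configuration from Step 2:

d <- dist(X0)  # Euclidean distances d_ij between all pairs of rows of X0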

Step 4: Assess the closeness between \(d_{ij}\) and \(\delta_{ij}\). Kruskal's method finds the best monotonic transformation \(\phi\) of the \(\delta_{ij}\), giving fitted values \(\hat{d}_{ij}=\phi(\delta_{ij})\), so as to minimize \[STRESS=\sqrt{\frac{\sum\limits_i\sum\limits_j(d_{ij}-\hat{d}_{ij})^2}{\sum\limits_i\sum\limits_j d_{ij}^2}}.\] A rule of thumb for interpreting STRESS: Poor: 0.20, Fair: 0.10, Good: 0.05, Excellent: 0.02.
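The monotonic transformation amounts to an isotonic regression of the \(d_{ij}\) on the ordering of the \(\delta_{ij}\). A minimal sketch of the STRESS computation, where stress1 is a hypothetical helper and delta, d are assumed to be vectors holding \(\delta_{ij}\) and \(d_{ij}\) for the same pairs (ties in delta are ignored for simplicity):

stress1 <- function(delta, d) {
  ord <- order(delta)                 # sort pairs by increasing dissimilarity
  dhat <- numeric(length(d))
  dhat[ord] <- isoreg(d[ord])$yf      # monotone (isotonic) fit: the \hat{d}_ij
  sqrt(sum((d - dhat)^2) / sum(d^2))  # STRESS as defined above
}

Note that isoMDS() in the MASS package reports STRESS as a percentage, so a printed value of 4 corresponds to 0.04 on the scale of the rule of thumb above.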

Step 5: Determine a new set of coordinates based on \(\hat{d}_{ij}\) and repeat Steps 3 and 4 until STRESS stabilizes. A Shepard diagram is a visual way to check closeness: it plots the dissimilarities \(\delta_{ij}\) against both the distances \(d_{ij}\) and the fitted values \(\hat{d}_{ij}\).
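Steps 2 to 5 are what isoMDS() in the MASS package iterates internally; a sketch, assuming D and X0 are as in the earlier snippets:

library(MASS)
fit0 <- isoMDS(as.dist(D), y = X0, k = r)  # repeats Steps 3-5 until STRESS converges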

Example

We consider pairwise dissimilarities between ten car brands. The example is adapted from the book "Analyzing Multivariate Data" by Lattin, Carroll and Green.

setwd('C:\\Users\\IIMA\\Google Drive\\SMDA\\SMDA2020\\4. MDS')  # local folder containing car_dis.csv
library(MASS)  # provides isoMDS() and Shepard()
C <- read.csv("car_dis.csv")  # first column: brand names; remaining columns: pairwise dissimilarities

C1 <- C[, 2:11]         # keep only the dissimilarity columns
C1[upper.tri(C1)] <- 0  # zero out the upper triangle

CC <- C1 + t(C1)        # symmetrize into a full dissimilarity matrix
CC
##    BMW Ford Infiniti Jeep Lexus Chrysler Mercedes Saab Porsche Volvo
## 1    0   34        8   31     7       43        3   10       6    33
## 2   34    0       24    2    26       14       28   18      39    11
## 3    8   24        0   25     1       35        5   20      41    22
## 4   31    2       25    0    27       15       29   17      38    12
## 5    7   26        1   27     0       37        4   13      40    23
## 6   43   14       35   15    37        0       42   36      45     9
## 7    3   28        5   29     4       42        0   19      32    30
## 8   10   18       20   17    13       36       19    0      21    16
## 9    6   39       41   38    40       45       32   21       0    44
## 10  33   11       22   12    23        9       30   16      44     0
fit <- isoMDS(as.dist(CC), k = 2, trace = TRUE, maxit = 1000, tol = 0.001)  # k = number of dimensions
## initial  value 10.466056 
## iter   5 value 6.523088
## iter  10 value 4.865126
## iter  15 value 4.088399
## iter  20 value 4.003118
## final  value 3.989006 
## converged
fit # view results
## $points
##                [,1]        [,2]
## BMW       17.936081  -0.4758516
## Ford     -13.147315  10.0151781
## Infiniti   5.216760 -13.4062683
## Jeep     -12.987540   9.9869380
## Lexus      5.541811 -13.5538525
## Chrysler -29.832310  -0.6206639
## Mercedes  14.369211 -10.2044803
## Saab       4.383495   5.8442803
## Porsche   23.614616  18.5987376
## Volvo    -15.094809  -6.1840172
## 
## $stress
## [1] 3.989006
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Nonmetric MDS", xlim = c(-40, 25), ylim = c(-20, 20), type = "n")
text(x, y, labels = C[, 1], pos = c(1, rep(c(2, 3), 5)), cex = .9)  # label points with brand names
lines(seq(-40, 40, length = 100), rep(0, 100), col = "red")  # horizontal axis through the origin
lines(rep(0, 100), seq(-25, 25, length = 100), col = "red")  # vertical axis through the origin

Shepard's diagram

shep <- Shepard(as.dist(CC), fit$points, p = 2)  # dissimilarities, distances and monotone fit
plot(shep, pch = "*", xlab = "Dissimilarity", ylab = "Distance",
     xlim = range(shep$x), ylim = range(shep$y))
lines(shep$x, shep$yf, type = "S")  # step function: the fitted values \hat{d}_ij