In metric MDS, we start with data on pairwise Euclidean distances between various locations, and from these we estimate suitable coordinates to plot the locations on a map.
Suppose \(((d_{ij}))\) is the matrix containing pairwise distances between locations \(i=1,2,\ldots, n\). The problem in metric MDS is to estimate the coordinates \(X_i =\begin{bmatrix} x_i \\ y_i \end{bmatrix}\) of each location. The coordinates need not be 2-dimensional and can be of higher dimension, but we use two dimensions in our example.
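Fix one of the points, say location \(i\), as the origin, so that \(X_i = 0\). Then \(d_{ij}^2 = X_j^{T}X_j\) and \(d_{ik}^2 = X_k^{T}X_k\), and for any two locations \(j\) and \(k\), \[ d_{jk}^2 = (X_j - X_k)^{T}(X_j - X_k) = d_{ij}^2 + d_{ik}^2 - 2X_j^{T}X_k .\] Rearranging terms, we get \[ -\frac{1}{2}(d^{2}_{jk} - d^{2}_{ij}-d^{2}_{ik})= X_j^{T}X_k .\]
Suppose we write \[X= \begin{bmatrix} X_1^{T}\\ \vdots\\ X_n^{T} \end{bmatrix} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots\\ x_n & y_n \end{bmatrix}.\] Then \[A= \left(\left(-\frac{1}{2} (d_{jk}^2 -d_{ij}^2 -d_{ik}^2)\right)\right)_{n\times n}= X_{n\times r}X^{T}_{r\times n},\] where \(r\) is the number of coordinates (here \(r=2\)). Note that the matrix \(A\) on the left-hand side is known because all pairwise distances are known.
Suppose the spectral decomposition of \(A\) is \(A= U D U^{T}\), where \(U\) is an orthogonal matrix whose columns are eigenvectors of \(A\) and \(D\) is the diagonal matrix of the eigenvalues of \(A\), arranged in decreasing order. Then \(X\) is taken to be the first \(r\) columns of \(UD^{\frac{1}{2}}\).
Instead of fixing one of the points as the origin, a more stable approach is to fix the origin at the centroid of the points. It can be shown that this amounts to taking the matrix \(A\) to be \[ A = \left(\left( -\frac{1}{2}d^2_{jk}+ \frac{1}{2n}\sum_{l}d_{lk}^2 + \frac{1}{2n}\sum_{l}d_{lj}^2 -\frac{1}{2n^2}\sum_{l_1}\sum_{l_2}d^{2}_{l_1l_2}\right)\right)_{n\times n}.\]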
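The centroid-based construction can be sketched in a few lines of R. This is a minimal illustration, not the code used for the example that follows: the five points in `P` are hypothetical, and the names `D2`, `A`, `err_dist` and `err_cmd` are introduced only for this sketch. It builds \(A\) from the centred formula, takes the first \(r\) columns of \(UD^{\frac{1}{2}}\), and checks the result against the original distances and against `cmdscale()`.

```r
# Hypothetical 2-D configuration, used only to generate exact distances
P <- rbind(c(0, 0), c(3, 0), c(3, 2), c(0, 2), c(1, 5))
n <- nrow(P)
Dm <- as.matrix(dist(P))   # pairwise Euclidean distances d_jk
D2 <- Dm^2                 # squared distances

# Centroid-based A: -d_jk^2/2 + row mean/2 + column mean/2 - grand mean/2
A <- -0.5 * D2 + 0.5 * outer(rowMeans(D2), colMeans(D2), "+") - 0.5 * mean(D2)

# Spectral decomposition A = U D U^T; eigen() returns eigenvalues in decreasing order
e <- eigen(A, symmetric = TRUE)
r <- 2
X <- e$vectors[, 1:r] %*% diag(sqrt(e$values[1:r]))  # first r columns of U D^(1/2)

# The estimated coordinates should reproduce the input distances
err_dist <- max(abs(as.matrix(dist(X)) - Dm))

# and should agree with cmdscale() up to the sign of each column
X_cmd <- cmdscale(dist(P), k = r)
err_cmd <- max(abs(abs(X) - abs(X_cmd)))

c(err_dist = err_dist, err_cmd = err_cmd)
```

Both discrepancies should be at the level of rounding error, because the distances here are exactly Euclidean in two dimensions. With real data, such as road or flight distances, the fit is only approximate.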
We consider the pairwise distances between cities in Europe. The example is adapted from the book "Analyzing Multivariate Data" by Lattin, Carroll and Green.
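# Set the working directory to the folder containing cities.csv (machine-specific path)
setwd('C:\\Users\\IIMA\\Google Drive\\SMDA\\SMDA2020\\4. MDS')
C <- read.csv("cities.csv")
C  # print the distance table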
## City Athens Berlin Dublin London Madrid Paris Rome Warsaw
## 1 Athens 0 1119 1777 1486 1475 1303 646 1013
## 2 Berlin 1119 0 817 577 1159 545 736 327
## 3 Dublin 1777 817 0 291 906 489 1182 1135
## 4 London 1486 577 291 0 783 213 897 904
## 5 Madrid 1475 1159 906 783 0 652 856 1483
## 6 Paris 1303 545 489 213 652 0 694 859
## 7 Rome 646 736 1182 897 856 694 0 839
## 8 Warsaw 1013 327 1135 904 1483 859 839 0
The actual map and the map generated by metric MDS are given below. The idea here is to see whether we can estimate the relative locations of the cities using MDS.
# Build a dist object from the numeric columns (drop the City name column)
D <- as.dist(as.matrix(C[, -1]))
fit <- cmdscale(D, eig = TRUE, k = 2, add = FALSE)  # k is the number of dimensions
fit  # view results
# plot solution
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, xlab = "Coordinate 1", ylab = "Coordinate 2", main = "Metric MDS", type = "n")
text(x, y, labels = C[, 1], cex = 0.8)
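Before reading the plot, it can help to check how well two dimensions reproduce the distances. The sketch below is an optional addition: it retypes the distance table shown above so that it runs without `cities.csv`, and the names `M`, `fit2` and `resid_dist` are introduced only here. It uses the `eig` and `GOF` components that `cmdscale()` returns when `eig = TRUE`.

```r
# Distance table from the output above, typed in so the block is self-contained
cities <- c("Athens", "Berlin", "Dublin", "London",
            "Madrid", "Paris", "Rome", "Warsaw")
M <- matrix(c(
     0, 1119, 1777, 1486, 1475, 1303,  646, 1013,
  1119,    0,  817,  577, 1159,  545,  736,  327,
  1777,  817,    0,  291,  906,  489, 1182, 1135,
  1486,  577,  291,    0,  783,  213,  897,  904,
  1475, 1159,  906,  783,    0,  652,  856, 1483,
  1303,  545,  489,  213,  652,    0,  694,  859,
   646,  736, 1182,  897,  856,  694,    0,  839,
  1013,  327, 1135,  904, 1483,  859,  839,    0
), nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

fit2 <- cmdscale(as.dist(M), eig = TRUE, k = 2)

# All eigenvalues of the doubly centred matrix; a sharp drop after the
# second one suggests that two dimensions capture most of the structure
fit2$eig

# Goodness-of-fit measures: share of the eigenvalue total carried by the first k
fit2$GOF

# Difference between the distances implied by the 2-D map and the input distances
resid_dist <- as.matrix(dist(fit2$points)) - M
max(abs(resid_dist))
```

Goodness-of-fit values close to 1 and small residuals relative to the distances themselves would indicate that a flat two-dimensional map is a reasonable summary.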
The estimated locations and the actual map do not seem to be aligned with each other. Note that the estimated coordinates are only relative: pairwise distances are unchanged by rotating or reflecting the configuration, so we may have to find a suitable orthogonal transformation (a rotation, possibly combined with a reflection) to match the actual map.
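# (Europe) after rotation
fit <- cmdscale(D, eig = TRUE, k = 2, add = FALSE)  # k is the number of dimensions
# plot solution
x <- fit$points[, 1]
y <- fit$points[, 2]
theta <- -pi / 4  # rotation angle: 45 degrees clockwise
# Standard 2-D rotation, valid for any theta; at this angle it gives the same
# coordinates as swapping the two axes and combining them with theta = pi/4
xx <- x * cos(theta) - y * sin(theta)
yy <- x * sin(theta) + y * cos(theta)
plot(xx, yy, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Metric MDS (rotated)", type = "n")
text(xx, yy, labels = C[, 1], cex = 0.7)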
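The rotation angle above is set by hand. When reference coordinates are available, one alternative is orthogonal Procrustes alignment, which finds the orthogonal matrix that best maps the MDS configuration onto the reference. This technique is not part of the original example. The sketch below uses a hypothetical set of "true" coordinates (`truth`) rather than the actual city locations, and the names `Q`, `aligned` and `align_err` are introduced only for illustration.

```r
# Hypothetical "true" map coordinates (illustration only, not the city data)
truth <- rbind(c(0, 0), c(3, 0), c(3, 2), c(0, 2), c(1, 5))
truth_c <- scale(truth, center = TRUE, scale = FALSE)  # centre at the centroid

# Classical MDS recovers the configuration only up to rotation/reflection
X <- cmdscale(dist(truth), k = 2)

# Orthogonal Procrustes: the orthogonal Q minimising ||X Q - truth_c|| is U V^T,
# where t(X) %*% truth_c = U S V^T is a singular value decomposition
s <- svd(t(X) %*% truth_c)
Q <- s$u %*% t(s$v)
aligned <- X %*% Q

# Largest coordinate discrepancy after alignment
align_err <- max(abs(aligned - truth_c))
align_err
```

Here the discrepancy should be at rounding-error level because the distances are exactly Euclidean. For the city data one would need approximate planar coordinates of the cities as the reference, and the alignment would only be approximate.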