Metric MDS

In Metric MDS, we start with data on pair-wise Euclidean distances between various locations, and using that we estimate suitable coordinates to plot the locations on a map.

Suppose \(((d_{ij}))\) is the matrix containing pairwise distances between locations \(i=1,2,\ldots n\). The problem in metric MDS is to estimate \(X_i =\begin{bmatrix} x_i \\ y_i \end{bmatrix}\). The coordinates need not be 2-dimensional and can be of higher dimensions, but we use two dimension in our example.

Methodology

Firstly, we note that the positions are relative. Therefore, the origin can be arbitrarily fixed. Suppose, we fix the origin at a point \(i\), i.e \(X_i=\) an \(r-\)dimensional zero vector. For example, in 2 dimension this means \(X_i=(x_i,y_i)\) will be treated as \((0,0)\). Then,for the distance \(d_{jk}\) between points \(j\) and \(k\), we have the following equation. \[\begin{eqnarray*} d^{2}_{jk} &=& (X_j-X_k)^{T}(X_j-X_k)=X_j^{T}X_j + X_{k}^T X_k - 2 X_j^{T}X_k\\ &=& d^{2}_{ij}+d^{2}_{ik} - 2 X_j^{T}X_k \end{eqnarray*}\]

Rearranging terms, we get \[ -\frac{1}{2}(d^{2}_{jk} - d^{2}_{ij}-d^{2}_{ik})= X_j^{T}X_k .\] Suppose we write \[X= \begin{bmatrix} X_1^{T}\\\vdots\\X_n^{T} \end{bmatrix} = \begin{bmatrix} x_1 & y_1 \\ \vdots\\ x_n & y_n \end{bmatrix}.\] then, \[A= ((-\frac{1}{2} (d_{jk}^2 -d_{ij}^2 -d_{ik}^2))_{n\times n}= X_{n\times r}X^{T}_{r\times n}.\] Note that the matrix \(A\) on the L.H.S is known because all pairwise distances are known. Suppose, spectral decompostion of \(A= U D U^{T}\) where \(U\) is orthonormal and \(D\), the eigen values of A. Then, \(X\) is taken to be the first \(r\) columns of \(UD^{\frac{1}{2}}\). Instead of fixing one of the points as the origin, a more stable approach is to fix the origin at the centroid of the points. It can be shown that this amounts to considering the matrix \(A\) to be \[ A = (( -\frac{1}{2}d^2_{jk}+ \frac{1}{2n}\sum_{l}d_{lk}^2 + \frac{1}{2n}\sum_{l}d_{lj}^2 -\frac{1}{2n^2}\sum_{l_1}\sum_{l_2}d^{2}_{l_1l_2})).\]

Example

We consider the pairwise distances between cities in Europe. The example is adapted from the book ``Analyzing Multivariate Data" by Lattin, Carroll and Green.

setwd('C:\\Users\\IIMA\\Google Drive\\SMDA\\SMDA2020\\4. MDS')
C<-read.csv("cities.csv")
C

##     City Athens Berlin Dublin London Madrid Paris Rome Warsaw
## 1 Athens      0   1119   1777   1486   1475  1303  646   1013
## 2 Berlin   1119      0    817    577   1159   545  736    327
## 3 Dublin   1777    817      0    291    906   489 1182   1135
## 4 London   1486    577    291      0    783   213  897    904
## 5 Madrid   1475   1159    906    783      0   652  856   1483
## 6  Paris   1303    545    489    213    652     0  694    859
## 7   Rome    646    736   1182    897    856   694    0    839
## 8 Warsaw   1013    327   1135    904   1483   859  839      0

The actual map and map generated from metric MDS of the locations are given below. The idea here is to see whether we can estimate the relative locations using MDS.

D<-as.dist(as.matrix(C[,2:dim(C)[2]]), diag=NULL)

fit <- cmdscale(D,eig=TRUE, k=2, add=FALSE) # k is the number of dim
fit # view results

# plot solution 
x <- fit$points[,1]
y <- fit$points[,2]

plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2",main="Metric  MDS",   type="n")
text(x, y, labels = C[,1], cex=.8)

The estimated locations and actual map do not seem to be aligned with each other. Note that that estimated coordinates are relative, so we may have to find a suitable rotation (orthonormal transformation) to be able to match the actual map.

# (Europe) after rotation
fit <- cmdscale(D,eig=TRUE, k=2, add=FALSE) # k is the number of dim
# plot solution 
x <- fit$points[,2]
y <- fit$points[,1]
theta=  pi/4
yy<- (x*cos(theta)-y*sin(theta))
xx<- (x*cos(theta)+y*sin(theta))
plot(xx, yy, xlab="Coordinate 1", ylab="Coordinate 2",
     main="Metric  MDS (rotated)",   type="n")
text(xx, yy, labels = C[,1], cex=.7)

Metric MDS

Karthik Sriram

2 December 2020

Methodology

Example