Here, we will look at basic plots for visualization in R, using the “USairpollution” data from the HSAUR3 package. We first fetch the data. A package needs to be installed only once on a machine; installation need not be repeated every time R is used. However, an (already installed) package must be loaded with library() in each new R session before it can be used. This is why the install.packages command below is commented out: we don't want it to run every time this code is run.

#install.packages('HSAUR3')
library(HSAUR3)
## Warning: package 'HSAUR3' was built under R version 3.5.3
## Loading required package: tools
data(USairpollution)
Data<- USairpollution
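The install-once, load-per-session pattern described above is often wrapped in a small helper that installs a package only when it is missing. The sketch below uses a hypothetical helper name, `ensure_package()`. The functions it calls, `requireNamespace()`, `install.packages()` and `library()`, are standard R.

```r
# Hypothetical helper: install a package only if it is not yet installed,
# then attach it for the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only the first time on this machine
  }
  # character.only = TRUE lets library() take the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so nothing is installed here;
# ensure_package("HSAUR3") would cover both steps for this tutorial.
ensure_package("stats")
```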

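Alternatively, if the data has been saved as a CSV file, we can set the working directory to the folder that contains it and import it as follows.

# Replace this example path with the folder containing USairpollution.csv
setwd('C:\\Users\\IIMA\\Google Drive\\R introduction')
Data<- read.table("USairpollution.csv", header=TRUE, sep=",", row.names=1)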
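read.csv() is read.table() with header = TRUE and sep = "," already set, so it can do the same import more compactly. The round trip below writes a CSV file and reads it back to show the effect of `row.names = 1`. It uses a temporary file instead of the working directory, and two rows copied from the `head(Data)` output in this tutorial. It is a sketch for illustration only.

```r
# Two rows from the head(Data) snapshot, used only to demonstrate the import
small <- data.frame(SO2 = c(46, 11), temp = c(47.6, 56.8),
                    row.names = c("Albany", "Albuquerque"))

path <- tempfile(fileext = ".csv")  # temporary file, not the working directory
write.csv(small, path)              # row names are written as the first column

# row.names = 1 turns the first column back into row names
back <- read.csv(path, row.names = 1)
print(back)
```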

Let us look at a snapshot of the data.

dim(Data)
## [1] 41  7
head(Data)
##             SO2 temp manu popul wind precip predays
## Albany       46 47.6   44   116  8.8  33.36     135
## Albuquerque  11 56.8   46   244  8.9   7.77      58
## Atlanta      24 61.5  368   497  9.1  48.34     115
## Baltimore    47 55.0  625   905  9.6  41.31     111
## Buffalo      11 47.1  391   463 12.4  36.11     166
## Charleston   31 55.2   35    71  6.5  40.75     148

A scatter plot is a simple way to visualize the association between two variables.

with(Data,plot(manu, popul))
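A scatter plot shows an association. The Pearson correlation coefficient summarizes the strength of a linear association as a single number between -1 and 1. The sketch below uses only the `manu` and `popul` values of the six cities in the `head(Data)` snapshot above, so it is not representative of the full data. On the full data, the same call would be `with(Data, cor(manu, popul))`.

```r
# manu and popul for the six cities in the head(Data) snapshot
manu  <- c(44, 46, 368, 625, 391, 35)
popul <- c(116, 244, 497, 905, 463, 71)

r <- cor(manu, popul)  # Pearson correlation

# The same value from the definition: covariance divided by
# the product of the standard deviations
r_manual <- cov(manu, popul) / (sd(manu) * sd(popul))
print(c(r = r, r_manual = r_manual))
```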

We can add more informative axis labels and color the points.

with(Data,plot(manu, popul, col="blue", xlab="Manufacturing enterprises with 20 or more workers", ylab="Population size (1970 census) in 1000s"))

We can add a “rug plot” along each axis to get an idea of the density of points.

with(Data,plot(manu, popul, col="blue", xlab="Manufacturing enterprises with 20 or more workers", ylab="Population size (1970 census) in 1000s"))
with(Data, rug(manu, side=1))
with(Data, rug(popul, side=2))

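We may also want to know which cities are being plotted. The text() call below writes each city name (the row name) just above its point (pos=3) in a smaller font (cex=.6).

with(Data,plot(manu, popul, col="blue", xlab="Manufacturing enterprises with 20 or more workers", ylab="Population size (1970 census) in 1000s"))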
with(Data, rug(manu, side=1))
with(Data, rug(popul, side=2))
with(Data, text(manu, popul, cex=.6,pos=3, labels=row.names(Data) ))

To see the crowded lower-left region more clearly, we can restrict the x-axis to the range 0 to 500 and the y-axis to 0 to 1000 using xlim and ylim.

with(Data,plot(manu, popul, col="blue", xlab="Manufacturing enterprises with 20 or more workers", ylab="Population size (1970 census) in 1000s", xlim=c(0,500), ylim=c(0,1000)))
with(Data, text(manu, popul, cex=.6,pos=3, labels=row.names(Data) ))
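Zooming in with xlim and ylim hides the points that fall outside the window, so it is worth checking which cities drop out. The sketch below uses the six cities from the `head(Data)` snapshot above. One caveat: by default R pads each axis range by about 4% (`xaxs = "r"`), so a point just past 500 may still appear. Setting `xaxs = "i"` in plot() makes the limits exact.

```r
# manu and popul for the six cities in the head(Data) snapshot
manu   <- c(44, 46, 368, 625, 391, 35)
popul  <- c(116, 244, 497, 905, 463, 71)
cities <- c("Albany", "Albuquerque", "Atlanta", "Baltimore", "Buffalo", "Charleston")

# TRUE for points inside the window xlim = c(0, 500), ylim = c(0, 1000)
inside <- manu >= 0 & manu <= 500 & popul >= 0 & popul <= 1000
print(cities[!inside])  # cities left out of the zoomed-in plot
print(sum(inside))      # number of cities still visible
```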

We can use R graphics to show more than two variables in a scatter plot with only two axes. The chart below plots average annual wind speed against average annual temperature for each city. It then overlays circles whose size is scaled by the SO2 level.

Note that the first plot below does not use the with(Data, plot(…)) form. Without with(), we must write Data$wind, Data$temp, etc. each time we refer to a variable in the data.

plot(Data$temp,Data$wind, xlab="Avg annual temperature in Fahrenheit", ylab="avg annual wind speed in m.p.h", pch=10, ylim=c(.9,1.1)*range(Data$wind), xlim=c(.9,1.1)*range(Data$temp))
# introduce bubble showing SO2 levels
with(Data, symbols(temp, wind, circles=SO2, inches=.5, add=TRUE))
with(Data, text(temp, wind, cex=.75,pos=3, labels=row.names(Data)))
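In symbols(), the `circles` argument gives the circle radii, and `inches` sets the size of the largest circle. Radii proportional to SO2 therefore make the circle areas grow with the square of SO2. A city with twice the SO2 looks about four times as large. The sketch below compares relative areas for the six SO2 values in the `head(Data)` snapshot above. A common alternative is to pass `sqrt(SO2)`, so that area rather than radius is proportional to SO2. Treat this as a design choice, not a correction.

```r
# SO2 for the six cities in the head(Data) snapshot
SO2 <- c(46, 11, 24, 47, 11, 31)

# Radii proportional to SO2, relative to the largest circle
rel_radius <- SO2 / max(SO2)
rel_area   <- rel_radius^2  # area grows with the square of SO2

# Passing sqrt(SO2) as the radii makes relative area equal to relative SO2
rel_area_sqrt <- (sqrt(SO2) / max(sqrt(SO2)))^2
print(round(cbind(SO2, rel_area, rel_area_sqrt), 3))
```

On the full data, this variant would be `with(Data, symbols(temp, wind, circles=sqrt(SO2), inches=.5, add=TRUE))`.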

One may also obtain 3-dimensional plots. One package for 3D scatter plots is “scatterplot3d”; other packages, e.g. rgl, can be used for interactive 3D plots.

#install.packages("scatterplot3d")
library(scatterplot3d)
## Warning: package 'scatterplot3d' was built under R version 3.5.2
with(Data, scatterplot3d(temp, wind, SO2, type="p", angle=55))

with(Data, scatterplot3d(temp, wind, SO2, type="h", angle=55))

Statistical Plots

We can plot the frequency distribution of the data as follows. The y-axis shows how many times each value on the x-axis occurs.

d_h<- with(Data,table(SO2))

plot(d_h, ylab="frequency")
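table() counts how often each distinct value occurs and sorts the values in increasing order. The sketch below applies it to the six SO2 values in the `head(Data)` snapshot above. It also shows why this plot suits discrete data better than continuous measurements: most values occur only once.

```r
# SO2 for the six cities in the head(Data) snapshot
SO2 <- c(46, 11, 24, 47, 11, 31)

d_h <- table(SO2)  # counts per distinct value, sorted by value
print(d_h)
print(as.vector(d_h["11"]))  # the value 11 occurs twice
```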

For data of a continuous nature it is more appropriate to look at a histogram. A histogram plots the frequencies (or densities) of the data within successive intervals. Its shape depends on how many breaks, i.e. how many intervals, we use.

with(Data, hist(SO2))

with(Data, hist(SO2, breaks=20))
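hist() with `plot = FALSE` returns the interval boundaries and counts without drawing anything, which shows how `breaks` changes the binning. A single number such as `breaks = 20` is only a suggestion: R rounds the boundaries to "pretty" values, so the actual number of bins can differ. The sketch below uses the six SO2 values in the `head(Data)` snapshot above.

```r
# SO2 for the six cities in the head(Data) snapshot
SO2 <- c(46, 11, 24, 47, 11, 31)

h_default <- hist(SO2, plot = FALSE)               # default (Sturges) rule
h_20      <- hist(SO2, breaks = 20, plot = FALSE)  # ask for about 20 bins

print(h_default$breaks)  # interval boundaries
print(h_default$counts)  # observations in each interval
print(length(h_20$counts))  # bins actually used when 20 were requested
```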

It is helpful to plot the histogram on the density scale (freq=FALSE) and overlay a continuous smoothed estimate called the “kernel density estimate”. For example, the kernel density estimate below suggests the data may be bimodal, i.e. have two modes.

with(Data, hist(SO2, freq=FALSE, ylim=c(0,.025)))
with(Data,lines(density(SO2), col="red"))
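One way to check a visual impression of bimodality is to locate the local maxima of the estimated curve returned by density(). The sketch below uses deterministic toy data with two clear clusters rather than the SO2 values. The number of modes found depends on the bandwidth, which density() reports as `bw` and which can be scaled with its `adjust` argument, e.g. `density(SO2, adjust = 0.5)` for a wigglier curve.

```r
# Toy data with two well-separated clusters (not the SO2 data)
x <- c(seq(-1, 1, length.out = 50), seq(9, 11, length.out = 50))

d <- density(x)  # Gaussian kernel, default bandwidth rule bw.nrd0

# Local maxima: grid points where the slope changes from positive to negative
modes <- d$x[which(diff(sign(diff(d$y))) == -2) + 1]
print(d$bw)   # bandwidth actually used
print(modes)  # approximate locations of the modes
```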

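When we compute data summaries, we look at the minimum, maximum and quartiles. A graphical representation of this summary is the box-and-whisker plot (boxplot). By default, R draws the box from the lower to the upper quartile with a line at the median. The whiskers extend to the most extreme observations within 1.5 times the box length, and points beyond that are plotted individually. We can get this plot for any given variable as below.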

with(Data, boxplot(SO2))
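The numbers that boxplot() draws come from boxplot.stats(), which can be called directly to see exactly where the box, whiskers and outliers fall. The sketch below uses a small toy vector with one extreme value, not the SO2 data.

```r
x <- c(1:10, 50)  # toy data with one extreme value

b <- boxplot.stats(x)
# lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)
# points more than 1.5 box lengths beyond the box, drawn individually
print(b$out)
```

Here the box runs from 3.5 to 8.5, so any point above 8.5 + 1.5 × 5 = 16 is flagged. The upper whisker therefore stops at 10, and 50 is drawn as an outlier.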

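We can also compare boxplots across groups of cities defined by the size of their manufacturing base. For this purpose, let us split the cities by whether the number of enterprises with at least 20 workers is less than 500.

# firm_ind is 1 if a city has fewer than 500 such enterprises, 0 otherwise
Data$firm_ind <- (Data$manu < 500)*1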

with(Data, boxplot(SO2~firm_ind))

We can see that, on many of the summaries, SO2 levels are lower for cities with fewer firms (firm_ind = 1).
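A labelled factor in place of the 0/1 indicator makes the boxplot axis self-explanatory, and tapply() computes a summary per group. The sketch below uses the six cities from the `head(Data)` snapshot above. Only one of them has 500 or more firms, so it demonstrates the mechanics only and says nothing about the full data. The group labels are illustrative choices.

```r
# SO2 and manu for the six cities in the head(Data) snapshot
SO2  <- c(46, 11, 24, 47, 11, 31)
manu <- c(44, 46, 368, 625, 391, 35)

# Readable group labels; boxplot(SO2 ~ firm_grp) would show them on the axis
firm_grp <- factor(manu < 500, levels = c(FALSE, TRUE),
                   labels = c("500 or more firms", "fewer than 500 firms"))

m <- tapply(SO2, firm_grp, median)  # the median line drawn in each box
print(m)
print(table(firm_grp))              # group sizes
```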

One may want to fit a line or curve through a scatter plot to quantify the nature of the relationship. A common technique is linear regression, which fits a straight line by least squares.

with(Data,plot(temp, wind))

reg<-lm(wind~temp, data=Data)

print(round(summary(reg)$coefficients,2))
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    13.30       1.67    7.98     0.00
## temp           -0.07       0.03   -2.33     0.02
wind_pred<- predict(reg)
with(Data, points(temp, wind_pred, col="green"))  # Points are plotted
with(Data,lines(temp, wind_pred))   # Successively plotted points are joined by lines 
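For a simple regression with one predictor, the least-squares estimates have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope × mean(x). The sketch below checks this against lm() using the `temp` and `wind` values of the six cities in the `head(Data)` snapshot above. On an existing scatter plot, `abline(reg)` draws the fitted line directly, as an alternative to predicting values and joining them with lines().

```r
# temp and wind for the six cities in the head(Data) snapshot
d <- data.frame(temp = c(47.6, 56.8, 61.5, 55.0, 47.1, 55.2),
                wind = c(8.8, 8.9, 9.1, 9.6, 12.4, 6.5))

fit <- lm(wind ~ temp, data = d)

# Closed-form least-squares estimates
slope     <- cov(d$temp, d$wind) / var(d$temp)
intercept <- mean(d$wind) - slope * mean(d$temp)
print(rbind(lm = coef(fit), formula = c(intercept, slope)))
```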

R graphs can be customized to a large extent. For example, in the barplot below, not all city names on the x-axis are shown because they do not fit side by side.

with(Data,barplot(height=SO2, names.arg=row.names(Data)))

We can make the x-axis show all names by rotating the labels perpendicular to the axis.

par(las=2)  # makes labels appear perpendicular to axis
with(Data,barplot(height=SO2, names.arg=row.names(Data)))

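One may display several graphs on a grid. For example, below we display the scatter plot of SO2 versus population and the bar chart of SO2 in a grid of 2 rows and 1 column using par(mfrow=c(2,1)). Note that settings changed with par() stay in effect for later plots in the same session.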

par(mfrow=c(2,1))

with(Data, plot(popul, SO2, col="brown"))
par(las=2)  # makes labels appear perpendicular to axis
with(Data,barplot(height=SO2, names.arg=row.names(Data)))
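Because par() settings persist, a common idiom is to save the old values that par() returns and restore them once the grid of plots is done. The sketch below applies this to the six cities in the `head(Data)` snapshot above. The variable name `op` is just a convention.

```r
# SO2 and popul for the six cities in the head(Data) snapshot
d <- data.frame(SO2   = c(46, 11, 24, 47, 11, 31),
                popul = c(116, 244, 497, 905, 463, 71),
                row.names = c("Albany", "Albuquerque", "Atlanta",
                              "Baltimore", "Buffalo", "Charleston"))

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(2, 1), las = 2)
plot(d$popul, d$SO2, col = "brown", xlab = "popul", ylab = "SO2")
barplot(height = d$SO2, names.arg = row.names(d))
par(op)  # back to one plot per page and default label orientation
```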

We may want to save graphs as PDF, EPS or JPEG files for later use. After a plot is generated, the following commands copy the currently active plot to a file in the working directory. dev.off() then closes the copy device, which writes the file.

# To export as pdf

dev.copy(pdf, "test1.pdf")
dev.off()


# To export as eps

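# onefile, horizontal and paper settings make the PostScript output EPS-compatible
dev.copy(postscript, "test1.eps", onefile=FALSE, horizontal=FALSE, paper="special")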
dev.off()


# To export as jpeg

dev.copy(jpeg, "test1.jpg")
dev.off()
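dev.copy() needs a plot already on screen. In a script run without a screen device, a common alternative is to open the file device first, draw, and then close it. The sketch below writes to a temporary folder under a hypothetical file name, `test2.pdf`. The same open, draw, dev.off() pattern applies to png(), jpeg() and postscript().

```r
# Hypothetical output file in a temporary folder; use "test2.pdf" alone
# to write into the working directory instead
out <- file.path(tempdir(), "test2.pdf")

pdf(out, width = 7, height = 5)  # open the file device first (size in inches)
plot(c(46, 11, 24, 47, 11, 31), ylab = "SO2")  # SO2 values from the snapshot
dev.off()                        # closing the device writes the file

print(file.exists(out))
```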