Current version of this document (in English) is not up to date due to my time limitation. For the current functionality of the package, please consult the manual of the partial.plot
package in R. I hope I will update this document near future so please be patient. I’m sorry for your inconvenience
This package provides tools for visualizing a result of multivariate analyses. The goal of this package is providing a easy-to-use method to visualize results of many kinds of regression models. This package is still in development and may have bugs which can affect your interpretation of the results. Also I’m still designing the interface of the package so future version may lose some functionality in current version. If you find bugs or if you have any requests, please let me know by GitHub or email: paste following code into R to get my address: rawToChar(as.raw(c(109, 111, 103, 64, 102, 102, 112, 114, 105, 46, 97, 102, 102, 114, 99, 46, 103, 111, 46, 106, 112))).
To install the package, just copy & paste following commands into the R console.
install.packages(
c("model.adapter", "partial.plot"), type = "source",
repos = c(
"http://florivory.net/R/repos", options()$repos
)
)
Try this one! You can also see the examples.
# Load dataset.
data(iris)
# Create a prediction model by GLM.
model <- glm(
Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris
)
# Load the library.
library(partial.plot)
# Draw relationship between sepal length and petal length.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
# Add a legend.
pp.legend(info, "topleft")
## Loading required package: rgl
# Draw 3D plot.
info <- partial.plot(
model, c("Sepal.Length", "Petal.Width"), col = terrain.colors,
theta = 20
)
In this time, we use the Fisher’s iris dataset. This dataset having petal length (Petal.Length), petal width (Petal.Width), sepal length (Sepal.Length) and sepal width (Sepal.Width) data of 3 species of Iris (setosa, versicolor, and virginica).
We can load the dataset and see structure of the data set by following code.
# Load the dataset.
data(iris)
# See the structure of the dataset.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Visualize data.
par(mfrow = c(2, 2))
plot(
Petal.Length ~ Sepal.Length, data = iris,
pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)
plot(
Petal.Length ~ Sepal.Width, data = iris,
pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)
plot(
Petal.Length ~ Petal.Width, data = iris,
pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)
In this guide, we will make a model to predict petal length by other variables. For the modeling, we are going to use generalized linear model (GLM). For explanatory variables of the model we will use sepal length, petal width and species. By including interaction terms between sepal length and species, and between petal width and species, we assume that relationships between sepal length and petal length and between petal width and petal length differ between species.
# Make prediction model.
model <- glm(
Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris
)
# Show summary of the model.
summary(model)
##
## Call:
## glm(formula = Petal.Length ~ (Sepal.Length + Petal.Width) * Species,
## data = iris)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.72681 -0.11668 0.00073 0.12952 0.57611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.88128 0.47059 1.873 0.06318 .
## Sepal.Length 0.09342 0.09695 0.964 0.33693
## Petal.Width 0.45959 0.32429 1.417 0.15862
## Speciesversicolor -0.80183 0.60440 -1.327 0.18677
## Speciesvirginica -0.48261 0.60125 -0.803 0.42351
## Sepal.Length:Speciesversicolor 0.32734 0.12315 2.658 0.00877 **
## Sepal.Length:Speciesvirginica 0.63569 0.11088 5.733 5.76e-08 ***
## Petal.Width:Speciesversicolor 0.80957 0.38007 2.130 0.03490 *
## Petal.Width:Speciesvirginica -0.28686 0.34738 -0.826 0.41033
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.05280405)
##
## Null deviance: 464.3254 on 149 degrees of freedom
## Residual deviance: 7.4454 on 141 degrees of freedom
## AIC: -4.7749
##
## Number of Fisher Scoring iterations: 2
Now we are going to visualize the result of the GLM analysis with partial.plot()
. Basic usage of partial.plot()
function is as follow. The example below visualize predicted relationship between sepal length and petal length of the three species.
# Load library.
library(partial.plot)
# Visualize predicted relationship between sepal length and petal length.
partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
The first argument of the partial.plot()
is model object returned by a model function. In this case we used model
object returned by glm()
.
The second argument is a character vector specifying names of explanatory variables whose relationships to response variable are visualized. In the example, "Sepal.Length"
and "Species"
are specified. For this argument, multiple names of factors can be specified. Those two arguments must be specified.
In addition to the above arguments, we put pch = 16
to make the graph a bit pretty.
If we want to visualize relationship between petal width and petal length, we can specify the arguments as follow.
# Visualize predicted relationship between petal width and petal length.
partial.plot(model, c("Petal.Width", "Species"), pch = 16)
partial.plot()
returns information can be used to produce its legend. For example, if we write following command, information about the plot is stored in the info
variable.
# Store settings of partial.plot into info.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
By passing info
to pp.legend()
function, we can draw a legend using settings used for partial.plot()
.
# Add legend.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
pp.legend(info, "topleft")
The usage of pp.legend
function is almost similar to the legend
function. The first argument is result of partial.plot
, and other arguments are passed to legend
function.
partial.plot()
can visualize more complicated models. In this time, we use CO2
dataset in dataset
package of R. THis dataset having CO2 assimilation rate of plants collected from Quebec and Mississippi in several CO2 concentrations after the chilling experiment.
We can control some parameters of partial.plot
to change appearance of the graph.
We can suppress drawing symbols representing residuals by setting FALSE
to draw.residual
argument.
partial.plot(
model, c("Sepal.Length", "Species"), draw.residual = FALSE
)
Also, we can suppress drawing predicted partial relationship by specifying FALSE
to draw.relationship
argument.
partial.plot(
model, c("Sepal.Length", "Species"), draw.relationship = FALSE,
pch = 16
)
By setting ‘extrapolate’ to ‘TRUE’, predicted partial relationships are drawn for range of the explanatory variable for all groups.
info <- partial.plot(
model, c("Sepal.Length", "Species"), pch = 16, extrapolate = TRUE
)
pp.legend(info, "topleft")
We can change colors of the graph by several ways. First way to change colors of the graph is passing rainbow()
or heat.colors()
functions for col
argument of partial.plot()
. We can use functions that described in help page of rainbow()
function (type ?rainbow
to see the list).
# Change colors of the graph.
info <- partial.plot(
model, c("Sepal.Length", "Species"), pch = 16, col = rainbow
)
# Colors in the legend is automatically adjusted.
pp.legend(info, "topleft")
Second way to change colors is prepare named character vector denoting colors for each factor levels and specify it for col
argument.
# Prepare a named color vector.
col <- c(
setosa = "darkgreen", versicolor = "blue", virginica = "orange2"
)
# Specify the vector for col argument.
info <- partial.plot(
model, c("Sepal.Length", "Species"), pch = 16, col = col
)
pp.legend(info, "topleft")
We can change other graphic parameters like plot
function. We changed pch
to set symbol of the in the example above. But we also use other graphic parameters.
# Set other graphic parameters
info <- partial.plot(
model, c("Sepal.Length", "Species"), pch = 16, cex = 1.5,
xlab = "Sepal length (mm)", ylab = "Petal Length"
)
pp.legend(info, "topleft")
At this moment, (maybe) lm()
, glm()
, glm.nb()
, lme()
, lmer()
, glmer()
, glmer.nb()
, glmmadmb()
, MCMCglmm()
, cforest()
, ctree()
, svm()
, randomForest()
, ranger()
, rpart()
, tree()
are supported. Support for the models are provided by the model.adapter
package.
For the estimation of the relationship between the explanatory and response variable, partial.plot
depends on emmeans
package. If the specified model is supported by emmeans
package, the predicted relationship is calculated by emmeans
. Currently, emmeans
is used for lm
, glm
, glm.nb
, lme
, lmer
, glmer
, glmer.nb
, glmmadmb
and MCMCglmm
. When the model is supported by emmeans
, response variables other than focal response variable(s) are fixed at their mean and the relationship is estimated.
When the model is not supported by emmeans
, the relationship is calculated by partial.plot
. In this case, conditional relationship on the dataset is calculated: for the values of response variables other than focal response variables(s), all values in the dataset is used and mean values of predicted response variable is used for the relationship.
When the formula of a model having functions such as offset()
and log()
, partial.plot
stops because it can’t obtain data from the model object. To avoid this problem, please specify original data used for the modeling to the data
argument.
# Example
model <- glmer.nb(y ~ log(x) + (1 | random), data = dat)
partial.plot(model, "x", data = dat)
‘partial.plot’ treats offset term(s) as follows:
partial.plot
uses mean value(s) for offset term(s) for prediction. For example, when the response variable of a model is a number of individuals and the offset term is study area, prediction and confidence interval are calculated using the mean value of study area.
Due to these specification, prediction and partial residuals don’t match each other in some situations. Two methods can solve the problem: 1) disabling partial residual and 2) prepare special dataset for plotting. An easier method is disabling plotting partial residuals by specifying draw.residual = FALSE
to partial.plot
. Of course this method is not a better way, but would be used for some situations. A better, but more complicated method is calculating response variable/offset term before running partial.plot
as shown in the following example.
#-----------------------------------------------------------------------------
# Prepare dataset and model.
#-----------------------------------------------------------------------------
# Load 'Insurance' data containing number of claims from MASS package.
# For the details, see '?Insurance' after loading MASS package.
utils::data(Insurance, package="MASS")
# Convert age from factor to numeric.
Insurance$Age.numeric <- c(25, 27, 32.5, 35)[as.numeric(Insurance$Age)]
# Model the number of insurance claims by district, type of cars and age.
# The number of policyholders is used for the offset term.
model <- glm(
Claims ~ District + Group + Age.numeric + offset(log(Holders)),
data = Insurance, family = poisson
)
#-----------------------------------------------------------------------------
# Plotting 1: case of failure.
#-----------------------------------------------------------------------------
# Simply draw partial dependence plot.
# Predicted relationship and partial residual don't match each other.
pp <- partial.plot(model, c("Age.numeric", "Group"))
pp.legend(pp, "topright")
#-----------------------------------------------------------------------------
# Solution 1: disable drawing partial residual.
# In this case, predicted relationship with mean number of policyholders
# is drawn.
#-----------------------------------------------------------------------------
pp <- partial.plot(model, c("Age.numeric", "Group"), draw.residual = FALSE)
pp.legend(pp, "topright")
#-----------------------------------------------------------------------------
# Solution 2: use response variable/offset term.
# In this case, we calculate claims count/number of policyholders and set 1
# for number of policyholders of all records before plotting.
#-----------------------------------------------------------------------------
# Create a copy of the dataset.
Insurance2 <- Insurance
# Convert the values of response variable from number of claims to number of
# claims per number of policyholders.
Insurance2$Claims <- Insurance2$Claims / Insurance2$Holders
# Set 1 for number of policyholders of all records.
Insurance2$Holders <- 1
# Draw partial dependence plot for number of claims per number of policyholders.
pp <- partial.plot(model, c("Age.numeric", "District"), data = Insurance2)
pp.legend(pp, "topright")
partial.plot
package contains some functions which may be useful.
color.ramp
function creates color vector from factors.
library(partial.plot)
plot(
Petal.Length ~ Sepal.Length, col = color.ramp(Species),
data = iris, pch = 16
)
By similar way to control colors of partial.plot
, we can change colors produced by color.ramp
.
plot(
Petal.Length ~ Sepal.Length, col = color.ramp(Species, rainbow),
data = iris, pch = 16
)
col <- c(
setosa = "darkgreen", versicolor = "blue", virginica = "orange2"
)
plot(
Petal.Length ~ Sepal.Length, col = color.ramp(Species, col),
data = iris, pch = 16
)
gg.colors
creates gg.plot like colors.
barplot(1:10, col = gg.colors(10))
extraporate
argument to extrapolate
.coda
package when the model used MCMC for estimation.glm.nb
and glmer.nb
.lsmeans
to emmeans
due to discontinuation of lsmeans
.lmer()
, glmer()
, glmmadmb()
, glmmML()
, ranger()
, rpart()
and tree()
.add
option.lsmeans
package.lty
, lwd
and pch
for groups.color.ramp.data.frame()
methodcolor.ramp.numeric()
methodcforest()
, ctree()
, randomForest()
and svm()
.