Japanese

Introduction

Current version of this document (in English) is not up to date due to my time limitation. For the current functionality of the package, please consult the manual of the partial.plot package in R. I hope I will update this document near future so please be patient. I’m sorry for your inconvenience

This package provides tools for visualizing a result of multivariate analyses. The goal of this package is providing a easy-to-use method to visualize results of many kinds of regression models. This package is still in development and may have bugs which can affect your interpretation of the results. Also I’m still designing the interface of the package so future version may lose some functionality in current version. If you find bugs or if you have any requests, please let me know by GitHub or email: paste following code into R to get my address: rawToChar(as.raw(c(109, 111, 103, 64, 102, 102, 112, 114, 105, 46, 97, 102, 102, 114, 99, 46, 103, 111, 46, 106, 112))).


Installation

To install the package, just copy & paste following commands into the R console.

install.packages(
    c("model.adapter", "partial.plot"), type = "source",
    repos = c(
        "http://florivory.net/R/repos", options()$repos
    )
)

Quick start (for R wizards)

Try this one! You can also see the examples.

# Load dataset.
data(iris)

# Create a prediction model by GLM.
model <- glm(
    Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris
)

# Load the library.
library(partial.plot)

# Draw relationship between sepal length and petal length.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)

# Add a legend.
pp.legend(info, "topleft")
## Loading required package: rgl

# Draw 3D plot.
info <- partial.plot(
    model, c("Sepal.Length", "Petal.Width"), col = terrain.colors,
    theta = 20
)


Quick start guild (for normal users)

Preparing dataset and an example model

In this time, we use the Fisher’s iris dataset. This dataset having petal length (Petal.Length), petal width (Petal.Width), sepal length (Sepal.Length) and sepal width (Sepal.Width) data of 3 species of Iris (setosa, versicolor, and virginica).

We can load the dataset and see structure of the data set by following code.

# Load the dataset.
data(iris)

# See the structure of the dataset.
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Visualize data.
par(mfrow = c(2, 2))
plot(
    Petal.Length ~ Sepal.Length, data = iris,
    pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)
plot(
    Petal.Length ~ Sepal.Width, data = iris,
    pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)
plot(
    Petal.Length ~ Petal.Width, data = iris,
    pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1
)

In this guide, we will make a model to predict petal length by other variables. For the modeling, we are going to use generalized linear model (GLM). For explanatory variables of the model we will use sepal length, petal width and species. By including interaction terms between sepal length and species, and between petal width and species, we assume that relationships between sepal length and petal length and between petal width and petal length differ between species.

# Make prediction model.
model <- glm(
    Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris
)

# Show summary of the model.
summary(model)
## 
## Call:
## glm(formula = Petal.Length ~ (Sepal.Length + Petal.Width) * Species, 
##     data = iris)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.72681  -0.11668   0.00073   0.12952   0.57611  
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     0.88128    0.47059   1.873  0.06318 .  
## Sepal.Length                    0.09342    0.09695   0.964  0.33693    
## Petal.Width                     0.45959    0.32429   1.417  0.15862    
## Speciesversicolor              -0.80183    0.60440  -1.327  0.18677    
## Speciesvirginica               -0.48261    0.60125  -0.803  0.42351    
## Sepal.Length:Speciesversicolor  0.32734    0.12315   2.658  0.00877 ** 
## Sepal.Length:Speciesvirginica   0.63569    0.11088   5.733 5.76e-08 ***
## Petal.Width:Speciesversicolor   0.80957    0.38007   2.130  0.03490 *  
## Petal.Width:Speciesvirginica   -0.28686    0.34738  -0.826  0.41033    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.05280405)
## 
##     Null deviance: 464.3254  on 149  degrees of freedom
## Residual deviance:   7.4454  on 141  degrees of freedom
## AIC: -4.7749
## 
## Number of Fisher Scoring iterations: 2

Visualize prediction model

Now we are going to visualize the result of the GLM analysis with partial.plot(). Basic usage of partial.plot() function is as follow. The example below visualize predicted relationship between sepal length and petal length of the three species.

# Load library.
library(partial.plot)

# Visualize predicted relationship between sepal length and petal length.
partial.plot(model, c("Sepal.Length", "Species"), pch = 16)

The first argument of the partial.plot() is model object returned by a model function. In this case we used model object returned by glm().

The second argument is a character vector specifying names of explanatory variables whose relationships to response variable are visualized. In the example, "Sepal.Length" and "Species" are specified. For this argument, multiple names of factors can be specified. Those two arguments must be specified.

In addition to the above arguments, we put pch = 16 to make the graph a bit pretty.

If we want to visualize relationship between petal width and petal length, we can specify the arguments as follow.

# Visualize predicted relationship between petal width and petal length.
partial.plot(model, c("Petal.Width", "Species"), pch = 16)

Adding legend

partial.plot() returns information can be used to produce its legend. For example, if we write following command, information about the plot is stored in the info variable.

# Store settings of partial.plot into info.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)

By passing info to pp.legend() function, we can draw a legend using settings used for partial.plot().

# Add legend.
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
pp.legend(info, "topleft")

The usage of pp.legend function is almost similar to the legend function. The first argument is result of partial.plot, and other arguments are passed to legend function.


Visualizing slightly complicated models

partial.plot() can visualize more complicated models. In this time, we use CO2 dataset in dataset package of R. THis dataset having CO2 assimilation rate of plants collected from Quebec and Mississippi in several CO2 concentrations after the chilling experiment.


Controlling elements of graph

We can control some parameters of partial.plot to change appearance of the graph.

Removing elements

We can suppress drawing symbols representing residuals by setting FALSE to draw.residual argument.

partial.plot(
    model, c("Sepal.Length", "Species"), draw.residual = FALSE
)

Also, we can suppress drawing predicted partial relationship by specifying FALSE to draw.relationship argument.

partial.plot(
    model, c("Sepal.Length", "Species"), draw.relationship = FALSE,
    pch = 16
)

By setting ‘extrapolate’ to ‘TRUE’, predicted partial relationships are drawn for range of the explanatory variable for all groups.

info <- partial.plot(
    model, c("Sepal.Length", "Species"), pch = 16, extrapolate = TRUE
)
pp.legend(info, "topleft")

Changing colors

We can change colors of the graph by several ways. First way to change colors of the graph is passing rainbow() or heat.colors() functions for col argument of partial.plot(). We can use functions that described in help page of rainbow() function (type ?rainbow to see the list).

# Change colors of the graph.
info <- partial.plot(
    model, c("Sepal.Length", "Species"), pch = 16, col = rainbow
)
# Colors in the legend is automatically adjusted.
pp.legend(info, "topleft")

Second way to change colors is prepare named character vector denoting colors for each factor levels and specify it for col argument.

# Prepare a named color vector.
col <- c(
    setosa = "darkgreen", versicolor = "blue", virginica = "orange2"
)
# Specify the vector for col argument.
info <- partial.plot(
    model, c("Sepal.Length", "Species"), pch = 16, col = col
)
pp.legend(info, "topleft")

Other settings

We can change other graphic parameters like plot function. We changed pch to set symbol of the in the example above. But we also use other graphic parameters.

# Set other graphic parameters
info <- partial.plot(
    model, c("Sepal.Length", "Species"), pch = 16, cex = 1.5,
    xlab = "Sepal length (mm)", ylab = "Petal Length"
)
pp.legend(info, "topleft")


Supported models

At this moment, (maybe) lm(), glm(), glm.nb(), lme(), lmer(), glmer(), glmer.nb(), glmmadmb(), MCMCglmm(), cforest(), ctree(), svm(), randomForest(), ranger(), rpart(), tree() are supported. Support for the models are provided by the model.adapter package.


Details of the calculation

For the estimation of the relationship between the explanatory and response variable, partial.plot depends on emmeans package. If the specified model is supported by emmeans package, the predicted relationship is calculated by emmeans. Currently, emmeans is used for lm, glm, glm.nb, lme, lmer, glmer, glmer.nb, glmmadmb and MCMCglmm. When the model is supported by emmeans, response variables other than focal response variable(s) are fixed at their mean and the relationship is estimated.

When the model is not supported by emmeans, the relationship is calculated by partial.plot. In this case, conditional relationship on the dataset is calculated: for the values of response variables other than focal response variables(s), all values in the dataset is used and mean values of predicted response variable is used for the relationship.


Model specific information

glmer.nb

When the formula of a model having functions such as offset() and log(), partial.plot stops because it can’t obtain data from the model object. To avoid this problem, please specify original data used for the modeling to the data argument.

# Example
model <- glmer.nb(y ~ log(x) + (1 | random), data = dat)
partial.plot(model, "x", data = dat)

Models with offset term

‘partial.plot’ treats offset term(s) as follows:

Prediction and confidence intervals
Similar with explanatory variables, partial.plot uses mean value(s) for offset term(s) for prediction. For example, when the response variable of a model is a number of individuals and the offset term is study area, prediction and confidence interval are calculated using the mean value of study area.
Partial residual
Original values of the variable used for the offset term during model construction are used for calculation of partial residuals.

Due to these specification, prediction and partial residuals don’t match each other in some situations. Two methods can solve the problem: 1) disabling partial residual and 2) prepare special dataset for plotting. An easier method is disabling plotting partial residuals by specifying draw.residual = FALSE to partial.plot. Of course this method is not a better way, but would be used for some situations. A better, but more complicated method is calculating response variable/offset term before running partial.plot as shown in the following example.

#-----------------------------------------------------------------------------
#   Prepare dataset and model.
#-----------------------------------------------------------------------------

# Load 'Insurance' data containing number of claims from MASS package.
# For the details, see '?Insurance' after loading MASS package.
utils::data(Insurance, package="MASS")

# Convert age from factor to numeric.
Insurance$Age.numeric <- c(25, 27, 32.5, 35)[as.numeric(Insurance$Age)]

# Model the number of insurance claims by district, type of cars and age.
# The number of policyholders is used for the offset term.
model <- glm(
    Claims ~ District + Group + Age.numeric + offset(log(Holders)),
    data = Insurance, family = poisson
)

#-----------------------------------------------------------------------------
#   Plotting 1: case of failure.
#-----------------------------------------------------------------------------

# Simply draw partial dependence plot.
# Predicted relationship and partial residual don't match each other.
pp <- partial.plot(model, c("Age.numeric", "Group"))
pp.legend(pp, "topright")

#-----------------------------------------------------------------------------
#   Solution 1: disable drawing partial residual.
#   In this case, predicted relationship with mean number of policyholders
#   is drawn.
#-----------------------------------------------------------------------------
pp <- partial.plot(model, c("Age.numeric", "Group"), draw.residual = FALSE)
pp.legend(pp, "topright")

#-----------------------------------------------------------------------------
#   Solution 2: use response variable/offset term.
#   In this case, we calculate claims count/number of policyholders and set 1
#   for number of policyholders of all records  before plotting.
#-----------------------------------------------------------------------------

# Create a copy of the dataset.
Insurance2 <- Insurance

# Convert the values of response variable from number of claims to number of
# claims per number of policyholders.
Insurance2$Claims <- Insurance2$Claims / Insurance2$Holders

# Set 1 for number of policyholders of all records.
Insurance2$Holders <- 1

# Draw partial dependence plot for number of claims per number of policyholders.
pp <- partial.plot(model, c("Age.numeric", "District"), data = Insurance2)
pp.legend(pp, "topright")


Known issues


Other functions

partial.plot package contains some functions which may be useful.

color.ramp()

color.ramp function creates color vector from factors.

library(partial.plot)
plot(
    Petal.Length ~ Sepal.Length, col = color.ramp(Species),
    data = iris, pch = 16
)

By similar way to control colors of partial.plot, we can change colors produced by color.ramp.

plot(
    Petal.Length ~ Sepal.Length, col = color.ramp(Species, rainbow),
    data = iris, pch = 16
)

col <- c(
    setosa = "darkgreen", versicolor = "blue", virginica = "orange2"
)
plot(
    Petal.Length ~ Sepal.Length, col = color.ramp(Species, col),
    data = iris, pch = 16
)

gg.colors()

gg.colors creates gg.plot like colors.

barplot(1:10, col = gg.colors(10))


Version history