1 Introduction

The main fitting functions are:

2 Fitting one or several quantiles

Here we are fitting a regression model with an adaptive spline basis to quantile 0.8 of the motorcycle dataset.

library(qgam); library(MASS); library(mgcViz)

set.seed(6436)
fit <- qgam(accel~s(times, k=20, bs="ad"), # <- adaptive basis 
            data = mcycle, 
            err = 0.05,                    # <- smoothing pinball loss
            qu = 0.8)                      # <- chosen quantile
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.8..............done
# "fit" inherits from mgcv::gamObject 

# Convert to "gamViz": nsim = 0 because no simulation method available!
fit <- getViz(fit, nsim = 0)

plot(sm(fit, 1), trans = function(.x){ .x + fit$coefficients[1] }) +
  geom_point(data = mcycle, mapping = aes(x = times, y = accel), colour = "grey") +
  l_fitLine() + l_ciLine(colour = 2)

We might want to fit several quantiles at once. This can be done with mqgam.

quSeq <- seq(0.2, 0.8, length.out = 5)
set.seed(6436)
fit <- mqgam(accel~s(times, k=20, bs="ad"), 
             data = mcycle, 
             err = 0.05,
             qu = quSeq, 
             control = list("tol" = 0.01)) # <- sloppy tolerance to speed-up calibration 
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5......done 
## qu = 0.35.......done 
## qu = 0.65......done 
## qu = 0.2.......done 
## qu = 0.8.........done
# Fit contains a list of gamObjects + other stuff

The output of mqgam can be manipulated using the qdo function. Using qdo we can print out the summary for each quantile, for instance:

# Summary for quantile 0.2
qdo(fit, qu = 0.2, summary)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -43.256      1.874  -23.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 8.306  9.526  966.4  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.749   Deviance explained = 71.1%
## -REML = 498.42  Scale est. = 1         n = 133

We can convert a specific quantile fit to a gamViz object and then plot it:

o <- qdo(fit, qu = 0.2, getViz, nsim = 0)
plot(o)

We can look at the residuals:

qdo(fit, 0.2, check.gam)[[1]] 
## 
## Method: REML   Optimizer: outer newton
## full convergence after 14 iterations.
## Gradient range [7.805869e-06,0.0007285186]
## (score 498.4152 & scale 1).
## Hessian positive definite, eigenvalue range [7.805955e-06,1.53885].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##             k'   edf k-index p-value
## s(times) 19.00  8.31    0.78    0.17

Does not look good because the model is mispecified: it is based on the pinball loss, not on a probabilistic model of the response. Makes more sense checking fraction of observations falling above or below fit (see exercises).

3 Convergence checking

Let’s simulate some data:

library(qgam)
set.seed(15560)
n <- 1000
x <- rnorm(n, 0, 1); z <- rnorm(n)
X <- cbind(1, x, x^2, z, x*z)
beta <- c(0, 1, 1, 1, 0.5)
y <- drop(X %*% beta) + rnorm(n) 
dataf <- data.frame(cbind(y, x, z))
names(dataf) <- c("y", "x", "z")

We fit qgam model:

fit <- qgam(y ~ s(x) + z + I(x*z), 
            qu = 0.99, 
            err = 0.05,  # <- default
            data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.99.................done

Look at calibration loss:

check(fit$calibr, sel = 2)