Smoothing splines as a stochastic process


The (cubic) smoothing spline is usually introduced as a minimization problem: find the function, \(f\) that minimizes the following functional

\[ J(f) = \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda\int f''(x)^2\,dx \]

that balances fitting the data in the first term and the smoothness of the chosen function in the second, with the tradeoff determined by the smoothing parameter \(\lambda\).1 It turns out that the solution to this minimization is a piecewise cubic polynomial with knots at the data locations. Fitting this function is straightforward but requires some careful application of linear algebra to be efficient (Reinsch).

When the data are measured at regular intervals, such as in many time series applications, the integral can be replaced by the discrete sum

\[ \sum_{t=2}^{N-1} \left(f_{t+1} - 2f_t + f_{t-1}\right)^2. \]

where \(f_t = f(x_t)\) is the value of the unknown function at each of the data points. Our modified objective function now reads

\[ J(f_{1:t}) = \sum_{t=1}^N (y_t - f_t)^2 + \lambda \sum_{t=2}^{N-1} \left(f_{t+1} - 2f_t + f_{t-1}\right)^2 \]

and one way to interpret this optimization problem is as minimizing the sum of squares of the residuals, \(w_t\) and \(v_t\) in

\begin{align} f_{t+1} &= 2f_t - f_{t-1} + w_t \\ y_t &= f_t + v_t \end{align}

In fact, this pair of equations defines a state-space model, which means we can use Kalman filtering and smoothing techniques2, 3 to estimate the posterior mean \(E[f_{1:t}|y_{1:t}]\), which turns out to coincide with the smoothing spline solution. The one tricky point is that standard Kalman smoother implementations require you to specify an initial distribution for \(f_{-1:0}\). The Kalman smoother and the smoothing spline only coincide when this prior distribution becomes very diffuse. This is because the smoothing spline is technically equivalent to an "improper" Bayesian prior on \(f_{1:t}\).1, 4

Reframing the smoothing spline algorithm to a generative model for some data makes it possible to build smoothing splines into more complex models. One could, for example, combine the smoothing spline prior with a non-Gaussian likelihood to model count or classification data.



Wahba (1978). "Improper priors, spline smoothing, and the problem of guarding against model errors in regression." Journal of the Royal Statistical Society B., Vol. 40, No. 3.


Kohn and Ansley (1987). "A new algorithm for spline smoothing based on smoothing a stochastic process." SIAM Journal of Scientific and Statistical Computing, Vol. 8, No. 1.


Shumway and Stoffer (2017). Time Series Analysis and Its Applications.


Lindgren and Rue (2008). "On the second-order random walk model for irregular locations." Scandinavian Journal of Statistics, Vol. 35.