online-linear-regression

Description

Linear regression is a beautiful algorithm that models the relationship between two or more variables. It estimates the linear weights ⍵₀, ⍵₁ … ⍵_n given data (y, X) and the assumed model:

y = ⍵₀ + ⍵₁x₁ + … + ⍵_nx_n (1)

The classical approach for calculating ⍵ is minimizing the residual sum of squares:

RSS(⍵) = ∑_i=1..N (y_i - f(x_i))² = (y − X⍵)^T(y − X⍵) (2)

After differentiating the RSS we find its extremum:

⍵* = (X^TX)⁻¹X^Ty (3)

The formula is really great, but it has a few problems from a computational point of view:

Matrix inversion is quite an expensive procedure
We have to store all values of x and y
To update the model with a new sample we need to iterate over the whole dataset

Instead of the classical formula (3) for ⍵, online-linear-regression is based on the recursive least squares algorithm. It stores only one n×n matrix P and two n-dimensional vectors K and ⍵, where n is the number of weights.

K_i+1 = P_ix_i+1(1 + x^T_i+1P_ix_i+1)⁻¹
⍵_i+1 = ⍵_i + K_i+1(y_i+1 - x^T_i+1⍵_i)
P_i+1 = P_i − K_i+1x^T_i+1P_i

Installation

npm install online-linear-regression

const LinReg = require('online-linear-regression')

To use it in browsers, bundle it with browserify.

Usage

I've tried to make the function interface as simple as possible. However, because of the internal state and the online nature of the algorithm, the function still has to be initialized before use:

const LinReg = require('online-linear-regression')
const linreg = LinReg()

Then just call linreg with two arguments (x, y) to train the model, or with one argument (x) to predict ŷ.

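// Example of polynomial regression
for (let i = 0; i < 5; i += Math.random() * 0.5) {
  const noise = Math.random() * 10 - 5
  // Cubic target, matching the features [i, i², i³] passed below
  const y = 3 * i ** 3 - 2 * i ** 2 + 9 * i + 10 + noise
  linreg([i, i ** 2, i ** 3], y)
}

console.log(linreg([2, 4, 8])) // e.g. 43.22 (the noise-free value is 44; results vary between runs)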

LSE

Finding a solution for a system of linear equations that has more equations than unknowns is the problem solved by Least Squares Estimation (LSE). In our case we have data (X, y) with N pairs of input/output values. Each input is a vector x_i and each output is a scalar y_i. We are looking for the weight vector ⍵ with coefficients ⍵₀, ⍵₁ … ⍵_n.
If N is greater than the number of weights (i.e. there are more samples than unknowns), there is in general no ⍵ that satisfies all the equations exactly, so we use the LSE method to find the best fit.

y₁ = ⍵₀ + ⍵₁x₁₁ + … + ⍵_nx_1n
y₂ = ⍵₀ + ⍵₁x₂₁ + … + ⍵_nx_2n
…
y_N = ⍵₀ + ⍵₁x_N1 + … + ⍵_nx_Nn

or in vector form:

y = X⍵

We are looking for the ⍵ that minimizes the error function (2). To do that, let's expand the function and then find its derivative:

RSS(⍵) = (y − X⍵)^T(y − X⍵) = (y^T − ⍵^TX^T)(y − X⍵) = y^Ty - ⍵^TX^Ty - y^TX⍵ + ⍵^TX^TX⍵

Since ⍵^TX^Ty is a scalar, it equals its own transpose (⍵^TX^Ty)^T = y^TX⍵, so the two middle terms can be combined:

RSS(⍵) = y^Ty - 2⍵^TX^Ty + ⍵^TX^TX⍵

To differentiate the formula above, we use the following identities:

∂/∂⍵ (⍵^TX^Ty) = X^Ty
∂/∂⍵ (⍵^TX^TX⍵) = 2X^TX⍵
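
As a quick sanity check, the gradient built from these identities can be compared with a numerical one. The snippet below is only a sketch: the data and the names `rss`, `analyticGrad` and `numericGrad` are made up for illustration and are not part of the package.

```javascript
// Tiny made-up dataset; the leading 1 in each row is the intercept column
const Xd = [[1, 0], [1, 1], [1, 2]]
const yd = [1, 2, 4]

// RSS(w) = (y - Xw)ᵀ(y - Xw)
const rss = w => Xd.reduce((sum, row, s) => {
  const r = yd[s] - (row[0] * w[0] + row[1] * w[1])
  return sum + r * r
}, 0)

// Analytic gradient: -2Xᵀy + 2XᵀXw, written as -2Xᵀ(y - Xw)
const analyticGrad = w => {
  const g = [0, 0]
  Xd.forEach((row, s) => {
    const r = yd[s] - (row[0] * w[0] + row[1] * w[1])
    g[0] -= 2 * row[0] * r
    g[1] -= 2 * row[1] * r
  })
  return g
}

// Numerical gradient by central finite differences
const numericGrad = (w, h = 1e-6) => w.map((_, i) => {
  const plus = w.slice()
  const minus = w.slice()
  plus[i] += h
  minus[i] -= h
  return (rss(plus) - rss(minus)) / (2 * h)
})

const w0 = [0.5, -1]
console.log(analyticGrad(w0), numericGrad(w0)) // the two gradients agree closely
```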

Now differentiate RSS(⍵) and find a closed-form formula for ⍵ at the point where ∂RSS(⍵)/∂⍵ equals 0 (the function's extremum):

∂/∂⍵ RSS(⍵) = -2X^Ty + 2X^TX⍵ = 0
⍵ = (X^TX)⁻¹X^Ty
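
For comparison with the recursive version, here is a minimal batch sketch of the closed-form solution in plain JavaScript. It is not how the package works internally; `solve` and `leastSquares` are hypothetical helpers, and instead of inverting X^TX explicitly the sketch solves the equivalent linear system (X^TX)⍵ = X^Ty by elimination.

```javascript
// Solve A w = b with Gauss-Jordan elimination and partial pivoting
function solve(A, b) {
  const n = b.length
  // Augmented matrix [A | b], copied so the inputs stay untouched
  const M = A.map((row, i) => [...row, b[i]])
  for (let col = 0; col < n; col++) {
    // Move the row with the largest pivot into place
    let pivot = col
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r
    }
    const tmp = M[col]
    M[col] = M[pivot]
    M[pivot] = tmp
    // Eliminate this column from every other row
    for (let r = 0; r < n; r++) {
      if (r === col) continue
      const f = M[r][col] / M[col][col]
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c]
    }
  }
  return M.map((row, i) => row[n] / row[i])
}

// Batch least squares: accumulate XᵀX and Xᵀy, then solve (XᵀX) w = Xᵀy
function leastSquares(X, y) {
  const n = X[0].length
  const XtX = Array.from({ length: n }, () => new Array(n).fill(0))
  const Xty = new Array(n).fill(0)
  X.forEach((row, s) => {
    for (let i = 0; i < n; i++) {
      Xty[i] += row[i] * y[s]
      for (let j = 0; j < n; j++) XtX[i][j] += row[i] * row[j]
    }
  })
  return solve(XtX, Xty)
}

// Points on the line y = 1 + 2x; the leading 1 in each row is the intercept column
const Xb = [[1, 0], [1, 1], [1, 2], [1, 3]]
const yb = [1, 3, 5, 7]
const wb = leastSquares(Xb, yb)
console.log(wb) // approximately [1, 2]
```

Note that the whole dataset has to be kept around and processed again whenever a sample is added, which is exactly the cost the recursive algorithm avoids.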

RLSE

We are interested in a recursive algorithm for ⍵, so that when new data arrives the weights can be updated without iterating over the whole dataset. Let's start from formula (3) for the i+1 case:

⍵_i+1 = (X^T_i+1X_i+1)⁻¹X^T_i+1y_i+1 = P_i+1X^T_i+1y_i+1 (4)

Where:

P_i+1 = (X^T_i+1X_i+1)⁻¹ = (X^T_iX_i + x_i+1x^T_i+1)⁻¹ = (P_i⁻¹ + x_i+1x^T_i+1)⁻¹ (5)

Using the Woodbury matrix identity we get rid of the matrix inversion in (5):

P_i+1 = P_i - αP_ix_i+1x^T_i+1P_i

α is a scalar value equal to

α = (1 + x^T_i+1P_ix_i+1)⁻¹
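
The rank-one case of the Woodbury identity used here is also known as the Sherman-Morrison formula. The sketch below checks it numerically on an arbitrary 2×2 example: the update P - αPxx^TP should match the result of inverting P, adding xx^T and inverting again. The helper names `inv2` and `rankOneUpdate` and the numbers are illustrative assumptions only.

```javascript
// Inverse of a 2×2 matrix, used only to check the identity by brute force
function inv2([[a, b], [c, d]]) {
  const det = a * d - b * c
  return [[d / det, -b / det], [-c / det, a / det]]
}

// Rank-one update: P_next = P - α P x xᵀ P, with α = 1 / (1 + xᵀ P x)
function rankOneUpdate(P, x) {
  const Px = P.map(row => row[0] * x[0] + row[1] * x[1]) // P x
  const xP = [0, 1].map(j => x[0] * P[0][j] + x[1] * P[1][j]) // xᵀ P
  const alpha = 1 / (1 + x[0] * Px[0] + x[1] * Px[1])
  return P.map((row, i) => row.map((v, j) => v - alpha * Px[i] * xP[j]))
}

const Pm = [[2, 0.5], [0.5, 1]]
const xm = [1, 3]

// Direct route: invert P, add x xᵀ, invert again
const direct = inv2(inv2(Pm).map((row, i) => row.map((v, j) => v + xm[i] * xm[j])))

// Inversion-free route
const updated = rankOneUpdate(Pm, xm)

console.log(direct, updated) // the two matrices agree up to rounding
```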

Now we can rewrite equation (4) in a recursive way:

⍵_i+1 = P_i+1X^T_i+1y_i+1 = P_i+1(X^T_iy_i + x_i+1y_i+1) = (P_i - αP_ix_i+1x^T_i+1P_i)(X^T_iy_i + x_i+1y_i+1)
…
⍵_i+1 = ⍵_i + k_i+1(y_i+1 - x^T_i+1⍵_i) (6)

where

k_i+1 = P_i+1x_i+1 = αP_ix_i+1 = (1 + x^T_i+1P_ix_i+1)⁻¹P_ix_i+1 (7)

The last thing we need is an update formula for P_i+1 expressed through k:

P_i+1 = P_i - αP_ix_i+1x^T_i+1P_i = P_i - k_i+1x^T_i+1P_i (8)

Now we have all the pieces needed to calculate the weights ⍵ iteratively. For each new sample:

Calculate the vector k_i+1 from the previous matrix P_i and the new input vector x_i+1
Calculate the vector ⍵_i+1
Update the n×n matrix P_i+1
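
The three steps above can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions rather than the package's actual source: `createRLS` and its `delta` parameter are hypothetical names, the input vector is expected to already contain a leading 1 for the intercept, and P is initialized to a large multiple of the identity matrix, which is a common choice when nothing is known about the weights.

```javascript
// Recursive least squares for n weights.
// `createRLS` and `delta` are illustrative names, not the package's API.
function createRLS(n, delta = 1e6) {
  let w = new Array(n).fill(0)
  // P starts as delta * I (assumption: a weak prior on the weights)
  let P = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? delta : 0)))

  const dot = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0)

  return {
    update(x, y) {
      // Step 1: gain vector k = P x / (1 + xᵀ P x)
      const Px = P.map(row => dot(row, x))
      const denom = 1 + dot(x, Px)
      const k = Px.map(v => v / denom)
      // Step 2: correct the weights by the prediction error
      const err = y - dot(x, w)
      w = w.map((v, i) => v + k[i] * err)
      // Step 3: P = P - k xᵀ P (xᵀ P equals (P x)ᵀ because P is symmetric)
      P = P.map((row, i) => row.map((v, j) => v - k[i] * Px[j]))
    },
    predict: x => dot(x, w),
    weights: () => w.slice()
  }
}

const model = createRLS(2)
// Samples from y = 1 + 2x, fed one at a time; the leading 1 is the intercept input
for (const [xi, yi] of [[0, 1], [1, 3], [2, 5], [3, 7]]) {
  model.update([1, xi], yi)
}
console.log(model.weights()) // close to [1, 2]
```

Each update touches only P, k and ⍵, so the cost per sample depends on the number of weights and not on how many samples have been seen so far.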