6. VARs and DMDs#

This lecture applies computational methods that we learned about in this lecture on the Singular Value Decomposition to

  • first-order vector autoregressions (VARs)

  • dynamic mode decompositions (DMDs)

  • connections between DMDs and first-order VARs

6.1. First-Order Vector Autoregressions#

We want to fit a first-order vector autoregression

$$
X_{t+1} = A X_t + C \epsilon_{t+1}, \quad \epsilon_{t+1} \perp X_t \tag{6.1}
$$

where $\epsilon_{t+1}$ is the time $t+1$ component of a sequence of i.i.d. $m \times 1$ random vectors with mean vector zero and identity covariance matrix and where the $m \times 1$ vector $X_t$ is

$$
X_t = \begin{bmatrix} X_{1,t} & X_{2,t} & \cdots & X_{m,t} \end{bmatrix}^\top \tag{6.2}
$$

where $\cdot^\top$ again denotes complex transposition and $X_{i,t}$ is variable $i$ at time $t$.

We want to fit equation (6.1).

Our data are organized in an $m \times (n+1)$ matrix $\tilde X$

$$
\tilde X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n \mid X_{n+1} \end{bmatrix}
$$

where for $t = 1, \ldots, n+1$, the $m \times 1$ vector $X_t$ is given by (6.2).

Thus, we want to estimate a system (6.1) that consists of m least squares regressions of everything on one lagged value of everything.

The i’th equation of (6.1) is a regression of Xi,t+1 on the vector Xt.

We proceed as follows.

From $\tilde X$, we form two $m \times n$ matrices

$$
X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n \end{bmatrix}
$$

and

$$
X' = \begin{bmatrix} X_2 \mid X_3 \mid \cdots \mid X_{n+1} \end{bmatrix}
$$

Here $'$ is part of the name of the matrix $X'$ and does not indicate matrix transposition.

We use $\cdot^\top$ to denote matrix transposition or its extension to complex matrices.

In forming $X$ and $X'$, we have in each case dropped a column from $\tilde X$: the last column in the case of $X$, and the first column in the case of $X'$.

Evidently, $X$ and $X'$ are both $m \times n$ matrices.

We denote the rank of $X$ as $p \leq \min(m, n)$.

Two cases that interest us are

  • $n >> m$, so that we have many more time series observations $n$ than variables $m$

  • $m >> n$, so that we have many more variables $m$ than time series observations $n$

At a general level that includes both of these special cases, a common formula describes the least squares estimator A^ of A.

But important details differ.

The common formula is

$$
\hat A = X' X^{+} \tag{6.3}
$$

where $X^{+}$ is the pseudo-inverse of $X$.

To read about the Moore-Penrose pseudo-inverse, please see the lecture on the Moore-Penrose pseudo-inverse.

Applicable formulas for the pseudo-inverse differ for our two cases.

Short-Fat Case:

When $n >> m$, so that we have many more time series observations $n$ than variables $m$, and when $X$ has linearly independent rows, $X X^\top$ has an inverse and the pseudo-inverse $X^{+}$ is

$$
X^{+} = X^\top (X X^\top)^{-1}
$$

Here $X^{+}$ is a right-inverse that verifies $X X^{+} = I_{m \times m}$.

In this case, our formula (6.3) for the least-squares estimator of the population matrix of regression coefficients A becomes

$$
\hat A = X' X^\top (X X^\top)^{-1} \tag{6.4}
$$

This formula for least-squares regression coefficients is widely used in econometrics.

It is used to estimate vector autoregressions.

The right side of formula (6.4) is proportional to the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the second moment matrix of $X_t$.
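Formula (6.4) is straightforward to check numerically against the general pseudo-inverse formula (6.3). Here is a minimal numpy sketch of the short-fat case; the particular matrix `A`, the dimensions, and the noise scale are all hypothetical choices made only for illustration:

```python
import numpy as np

# Simulate a short-fat panel (n >> m) from a stable VAR; the matrix A,
# the dimensions, and the noise scale are hypothetical illustration choices.
rng = np.random.default_rng(0)
m, n = 3, 200
A = np.array([[0.8, 0.1, 0.0],
              [0.0, 0.7, 0.2],
              [0.1, 0.0, 0.6]])
X_tilde = np.empty((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A @ X_tilde[:, t] + 0.1 * rng.standard_normal(m)

# Drop the last column to get X and the first column to get X'
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Least-squares formula (6.4): A_hat = X' X^T (X X^T)^{-1}
A_hat = X_prime @ X.T @ np.linalg.inv(X @ X.T)
```

With many observations per variable, `A_hat` coincides with `X_prime @ np.linalg.pinv(X)` from formula (6.3) and sits close to the `A` that generated the data.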

Tall-Skinny Case:

When $m >> n$, so that we have many more attributes $m$ than time series observations $n$, and when $X$ has linearly independent columns, $X^\top X$ has an inverse and the pseudo-inverse $X^{+}$ is

$$
X^{+} = (X^\top X)^{-1} X^\top
$$

Here $X^{+}$ is a left-inverse that verifies $X^{+} X = I_{n \times n}$.

In this case, our formula (6.3) for a least-squares estimator of A becomes

$$
\hat A = X' (X^\top X)^{-1} X^\top \tag{6.5}
$$

Please compare formulas (6.4) and (6.5) for A^.

Here we are especially interested in formula (6.5).

The $i$th row of $\hat A$ is an $m \times 1$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t}, j = 1, \ldots, m$.

If we use formula (6.5) to calculate $\hat A X$ we find that

$$
\hat A X = X'
$$

so that the regression equation fits perfectly.

This is a typical outcome in an underdetermined least-squares model.
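The perfect fit is easy to see in a small simulation. The following sketch uses hypothetical random data in the tall-skinny case:

```python
import numpy as np

# Tall-skinny example (m >> n) with hypothetical random data: the
# regression fits perfectly because the system is underdetermined.
rng = np.random.default_rng(1)
m, n = 50, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Left-inverse pseudo-inverse: X^+ = (X^T X)^{-1} X^T, so X^+ X = I_n
X_plus = np.linalg.inv(X.T @ X) @ X.T

# Estimator (6.5): A_hat = X' X^+ fits the data exactly: A_hat X = X'
A_hat = X_prime @ X_plus
```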

To reiterate, in the tall-skinny case (described in Singular Value Decomposition) in which we have a number n of observations that is small relative to the number m of attributes that appear in the vector Xt, we want to fit equation (6.1).

We confront the facts that the least squares estimator is underdetermined and that the regression equation fits perfectly.

To proceed, we’ll want to calculate the pseudo-inverse $X^{+}$ efficiently.

The pseudo-inverse X+ will be a component of our estimator of A.

As our estimator A^ of A we want to form an m×m matrix that solves the least-squares best-fit problem

$$
\hat A = \mathop{\arg\min}_{\check A} \, \| X' - \check A X \|_F \tag{6.6}
$$

where $\| \cdot \|_F$ denotes the Frobenius (or Euclidean) norm of a matrix.

The Frobenius norm is defined as

$$
\| A \|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^m |A_{ij}|^2}
$$

The minimizer of the right side of equation (6.6) is

$$
\hat A = X' X^{+} \tag{6.7}
$$

where the (possibly huge) $n \times m$ matrix $X^{+} = (X^\top X)^{-1} X^\top$ is again a pseudo-inverse of $X$.

For some situations that we are interested in, $X^\top X$ can be close to singular, a situation that makes some numerical algorithms inaccurate.

To acknowledge that possibility, we'll use efficient algorithms to construct a reduced-rank approximation of $\hat A$ in formula (6.5).

Such an approximation to our vector autoregression will no longer fit perfectly.

An efficient way to compute the pseudo-inverse $X^{+}$ is to start with a singular value decomposition

$$
X = U \Sigma V^\top \tag{6.8}
$$

where we remind ourselves that for a reduced SVD, $X$ is an $m \times n$ matrix of data, $U$ is an $m \times p$ matrix, $\Sigma$ is a $p \times p$ matrix, and $V$ is an $n \times p$ matrix.

We can efficiently construct the pertinent pseudo-inverse $X^{+}$ by recognizing the following string of equalities.

$$
\begin{aligned}
X^{+} &= (X^\top X)^{-1} X^\top \\
      &= (V \Sigma U^\top U \Sigma V^\top)^{-1} V \Sigma U^\top \\
      &= (V \Sigma \Sigma V^\top)^{-1} V \Sigma U^\top \\
      &= V \Sigma^{-1} \Sigma^{-1} V^\top V \Sigma U^\top \\
      &= V \Sigma^{-1} U^\top
\end{aligned} \tag{6.9}
$$

(Since we are in the $m >> n$ case in which $V^\top V = I_{p \times p}$ for a reduced SVD, we can use the preceding string of equalities for a reduced SVD as well as for a full SVD.)

Thus, we shall construct a pseudo-inverse $X^{+}$ of $X$ by using a singular value decomposition of $X$ in equation (6.8) to compute

$$
X^{+} = V \Sigma^{-1} U^\top \tag{6.10}
$$

where the matrix $\Sigma^{-1}$ is constructed by replacing each non-zero element $\sigma_j$ of $\Sigma$ with $\sigma_j^{-1}$.

We can use formula (6.10) together with formula (6.7) to compute the matrix $\hat A$ of regression coefficients.

Thus, our estimator $\hat A = X' X^{+}$ of the $m \times m$ matrix of coefficients $A$ is

$$
\hat A = X' V \Sigma^{-1} U^\top \tag{6.11}
$$
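Formulas (6.10) and (6.11) translate directly into numpy; the random data below are hypothetical:

```python
import numpy as np

# Build the pseudo-inverse from a reduced SVD, formula (6.10), and the
# estimator (6.11); data are hypothetical random draws with m >> n.
rng = np.random.default_rng(2)
m, n = 40, 6
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD: U is m x p, sigma holds the p singular values, Vt is p x n
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# (6.10): X^+ = V Sigma^{-1} U^T
X_plus = Vt.T @ np.diag(1 / sigma) @ U.T

# (6.11): A_hat = X' V Sigma^{-1} U^T
A_hat = X_prime @ X_plus
```

The result agrees with `np.linalg.pinv(X)`, which uses the same SVD construction internally.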

6.2. Dynamic Mode Decomposition (DMD)#

We turn to the $m >> n$ tall and skinny case associated with Dynamic Mode Decomposition.

Here an $m \times (n+1)$ data matrix $\tilde X$ contains many more attributes (or variables) $m$ than time periods $n+1$.

Dynamic mode decomposition was introduced by [Schmid, 2010].

You can read more about Dynamic Mode Decomposition in [Kutz et al., 2016] and [Brunton and Kutz, 2019] (section 7.2).

Dynamic Mode Decomposition (DMD) computes a rank $r < p$ approximation to the least squares regression coefficients $\hat A$ described by formula (6.11).

We’ll build up gradually to a formulation that is useful in applications.

We’ll do this by describing three alternative representations of our first-order linear dynamic system, i.e., our vector autoregression.

Guide to three representations: In practice, we’ll mainly be interested in Representation 3.

We use the first two representations to present some useful intermediate steps that help us to appreciate what is under the hood of Representation 3.

In applications, we’ll use only a small subset of DMD modes to approximate dynamics.

We use such a small subset of DMD modes to construct a reduced-rank approximation to A.

To do that, we'll want to use the reduced SVDs affiliated with representation 3, not the full SVDs affiliated with representations 1 and 2.

Guide to impatient reader: In our applications, we’ll be using Representation 3.

You might want to skip the stage-setting representations 1 and 2 on first reading.

6.3. Representation 1#

In this representation, we shall use a full SVD of X.

We use the $m$ columns of $U$, and thus the $m$ rows of $U^\top$, to define an $m \times 1$ vector $\tilde b_t$ as

$$
\tilde b_t = U^\top X_t . \tag{6.12}
$$

The original data $X_t$ can be represented as

$$
X_t = U \tilde b_t \tag{6.13}
$$

(Here we use b to remind ourselves that we are creating a basis vector.)

Since we are now using a full SVD, $U U^\top = I_{m \times m}$.

So it follows from equation (6.12) that we can reconstruct $X_t$ from $\tilde b_t$.

In particular,

  • Equation (6.12) serves as an encoder that rotates the $m \times 1$ vector $X_t$ to become an $m \times 1$ vector $\tilde b_t$

  • Equation (6.13) serves as a decoder that reconstructs the $m \times 1$ vector $X_t$ by rotating the $m \times 1$ vector $\tilde b_t$

Define a transition matrix for an $m \times 1$ basis vector $\tilde b_t$ by

$$
\tilde A = U^\top \hat A U \tag{6.14}
$$

We can recover $\hat A$ from

$$
\hat A = U \tilde A U^\top
$$

Dynamics of the $m \times 1$ basis vector $\tilde b_t$ are governed by

$$
\tilde b_{t+1} = \tilde A \tilde b_t
$$

To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce

$$
\overline X_{t+1} = U \tilde A^t U^\top X_1
$$

where we use $\overline X_{t+1}, t \geq 1$ to denote a forecast.
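The encode-iterate-decode loop of Representation 1 can be sketched with hypothetical random data; with a full SVD, decoding $\tilde A \tilde b_1$ reproduces $\hat A X_1$ exactly:

```python
import numpy as np

# Representation 1 with a full SVD: encode X_1, iterate with A_tilde,
# decode; data are hypothetical random draws with m > n.
rng = np.random.default_rng(3)
m, n = 8, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]
A_hat = X_prime @ np.linalg.pinv(X)

U, _, _ = np.linalg.svd(X, full_matrices=True)  # full SVD: U is m x m
A_tilde = U.T @ A_hat @ U                       # (6.14)

b1 = U.T @ X[:, 0]               # encoder (6.12)
forecast = U @ (A_tilde @ b1)    # decode A_tilde b_1
```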

6.4. Representation 2#

This representation is related to one originally proposed by [Schmid, 2010].

It can be regarded as an intermediate step on the way to obtaining a related representation 3 to be presented later.

As with Representation 1, we continue to

  • use a full SVD and not a reduced SVD

As we observed and illustrated in a lecture about the Singular Value Decomposition

  • (a) for a full SVD, $U U^\top = I_{m \times m}$ and $U^\top U = I_{m \times m}$ are both identity matrices

  • (b) for a reduced SVD of $X$, $U U^\top$ is not an identity matrix.

As we shall see later, a full SVD is too confining for what we ultimately want to do, namely, cope with situations in which $U U^\top$ is not an identity matrix because we use a reduced SVD of $X$.

But for now, let’s proceed under the assumption that we are using a full SVD so that requirements (a) and (b) are both satisfied.

Form an eigendecomposition of the $m \times m$ matrix $\tilde A = U^\top \hat A U$ defined in equation (6.14):

$$
\tilde A = W \Lambda W^{-1} \tag{6.15}
$$

where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is an $m \times m$ matrix whose columns are eigenvectors corresponding to the eigenvalues in $\Lambda$.

When $U U^\top = I_{m \times m}$, as is true with a full SVD of $X$, it follows that

$$
\hat A = U \tilde A U^\top = U W \Lambda W^{-1} U^\top \tag{6.16}
$$

According to equation (6.16), the diagonal matrix $\Lambda$ contains eigenvalues of $\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $U W$.

It follows that the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by

$$
X_{t+1} = U W \Lambda W^{-1} U^\top X_t
$$

Multiplying both sides of the above equation by $W^{-1} U^\top$ gives

$$
W^{-1} U^\top X_{t+1} = \Lambda W^{-1} U^\top X_t
$$

or

$$
\hat b_{t+1} = \Lambda \hat b_t
$$

where our encoder is

$$
\hat b_t = W^{-1} U^\top X_t
$$

and our decoder is

$$
X_t = U W \hat b_t
$$

We can use this representation to construct a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:

$$
\overline X_{t+1} = U W \Lambda^t W^{-1} U^\top X_1 \tag{6.17}
$$

In effect, [Schmid, 2010] defined an $m \times m$ matrix $\Phi_s$ as

$$
\Phi_s = U W \tag{6.18}
$$

and a generalized inverse

$$
\Phi_s^{+} = W^{-1} U^\top \tag{6.19}
$$

[Schmid, 2010] then represented equation (6.17) as

$$
\overline X_{t+1} = \Phi_s \Lambda^t \Phi_s^{+} X_1 \tag{6.20}
$$
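With a full SVD, the columns of $\Phi_s = UW$ are genuine eigenvectors of $\hat A$, a fact we can check numerically. A minimal sketch with hypothetical random data:

```python
import numpy as np

# Representation 2 with a full SVD: columns of Phi_s = U W are
# eigenvectors of A_hat. Data are hypothetical random draws.
rng = np.random.default_rng(4)
m, n = 6, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]
A_hat = X_prime @ np.linalg.pinv(X)

U, _, _ = np.linalg.svd(X, full_matrices=True)  # full SVD: U is m x m
A_tilde = U.T @ A_hat @ U
Lam, W = np.linalg.eig(A_tilde)       # (6.15): A_tilde = W diag(Lam) W^{-1}

Phi_s = U @ W                         # (6.18)
Phi_s_plus = np.linalg.inv(W) @ U.T   # (6.19)
```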

Components of the basis vector $\hat b_t = W^{-1} U^\top X_t \equiv \Phi_s^{+} X_t$ are DMD projected modes.

To understand why they are called projected modes, notice that

$$
\Phi_s^{+} = (\Phi_s^\top \Phi_s)^{-1} \Phi_s^\top
$$

so that the $m \times n$ matrix

$$
\hat b = \Phi_s^{+} X
$$

is a matrix of regression coefficients of the $m \times n$ matrix $X$ on the $m \times m$ matrix $\Phi_s$.

We’ll say more about this interpretation in a related context when we discuss representation 3, which was suggested by [Tu et al., 2014].

It is more appropriate to use representation 3 when, as is often the case in practice, we want to use a reduced SVD.

6.5. Representation 3#

Departing from the procedures used to construct Representations 1 and 2, each of which deployed a full SVD, we now use a reduced SVD.

Again, we let $p \leq \min(m, n)$ be the rank of $X$.

Construct a reduced SVD

$$
X = \tilde U \tilde \Sigma \tilde V^\top ,
$$

where now $\tilde U$ is $m \times p$, $\tilde \Sigma$ is $p \times p$, and $\tilde V^\top$ is $p \times n$.

Our minimum-norm least-squares approximator of $A$ now has representation

$$
\hat A = X' \tilde V \tilde \Sigma^{-1} \tilde U^\top \tag{6.21}
$$

Computing Dominant Eigenvectors of $\hat A$

Paralleling a step used to construct Representation 1, define a transition matrix for a rotated $p \times 1$ state $\tilde b_t$ by

$$
\tilde A = \tilde U^\top \hat A \tilde U \tag{6.22}
$$

Interpretation as projection coefficients

[Brunton and Kutz, 2022] remark that $\tilde A$ can be interpreted in terms of a projection of $\hat A$ onto the $p$ modes in $\tilde U$.

To verify this, first note that, because $\tilde U^\top \tilde U = I_{p \times p}$, it follows that

$$
\tilde A = \tilde U^\top \hat A \tilde U = \tilde U^\top X' \tilde V \tilde \Sigma^{-1} \tilde U^\top \tilde U = \tilde U^\top X' \tilde V \tilde \Sigma^{-1} \tag{6.23}
$$

Next, we'll just compute the regression coefficients in a projection of $\hat A \tilde U$ on $\tilde U$ using a standard least-squares formula

$$
(\tilde U^\top \tilde U)^{-1} \tilde U^\top \hat A \tilde U = (\tilde U^\top \tilde U)^{-1} \tilde U^\top X' \tilde V \tilde \Sigma^{-1} \tilde U^\top \tilde U = \tilde U^\top X' \tilde V \tilde \Sigma^{-1} = \tilde A .
$$

Thus, we have verified that $\tilde A$ is a least-squares projection of $\hat A$ onto $\tilde U$.
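Equation (6.23) is easy to confirm numerically. The sketch below, with hypothetical random data, checks that the two routes to $\tilde A$ agree:

```python
import numpy as np

# Check (6.23): U~^T A_hat U~ equals U~^T X' V~ Sigma~^{-1} under a
# reduced SVD. Data are hypothetical random draws with m >> n.
rng = np.random.default_rng(5)
m, n = 30, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)  # reduced SVD
A_hat = X_prime @ Vt.T @ np.diag(1 / sigma) @ U.T     # (6.21)

A_tilde = U.T @ A_hat @ U                             # (6.22)
```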

An Inverse Challenge

Because we are using a reduced SVD, $\tilde U \tilde U^\top \neq I$.

Consequently,

$$
\hat A \neq \tilde U \tilde A \tilde U^\top ,
$$

so we can't simply recover $\hat A$ from $\tilde A$ and $\tilde U$.

A Blind Alley

We can start by hoping for the best and proceeding to construct an eigendecomposition of the $p \times p$ matrix $\tilde A$:

$$
\tilde A = \tilde W \Lambda \tilde W^{-1} \tag{6.24}
$$

where $\Lambda$ is a diagonal matrix of $p$ eigenvalues and the columns of $\tilde W$ are corresponding eigenvectors.

Mimicking our procedure in Representation 2, we cross our fingers and compute an $m \times p$ matrix

$$
\tilde \Phi_s = \tilde U \tilde W \tag{6.25}
$$

that corresponds to (6.18) for a full SVD.

At this point, where $\hat A$ is given by formula (6.21), it is interesting to compute $\hat A \tilde \Phi_s$:

$$
\hat A \tilde \Phi_s = (X' \tilde V \tilde \Sigma^{-1} \tilde U^\top)(\tilde U \tilde W) = X' \tilde V \tilde \Sigma^{-1} \tilde W \neq (\tilde U \tilde W) \Lambda = \tilde \Phi_s \Lambda
$$

That $\hat A \tilde \Phi_s \neq \tilde \Phi_s \Lambda$ means that, unlike the corresponding situation in Representation 2, columns of $\tilde \Phi_s = \tilde U \tilde W$ are not eigenvectors of $\hat A$ corresponding to eigenvalues on the diagonal of matrix $\Lambda$.
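The blind alley shows up clearly in a numerical experiment with hypothetical random data: under a reduced SVD, $\tilde U \tilde U^\top$ is not the identity, and the columns of $\tilde U \tilde W$ fail the eigenvector test:

```python
import numpy as np

# The "blind alley": with a reduced SVD, U~ U~^T != I, so columns of
# Phi_s~ = U~ W~ are not eigenvectors of A_hat. Hypothetical data.
rng = np.random.default_rng(6)
m, n = 30, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
Sigma_inv = np.diag(1 / sigma)
A_hat = X_prime @ Vt.T @ Sigma_inv @ U.T     # (6.21)
A_tilde = U.T @ X_prime @ Vt.T @ Sigma_inv   # (6.23)
Lam, W = np.linalg.eig(A_tilde)              # (6.24)

Phi_s_tilde = U @ W                          # (6.25)
```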

An Approach That Works

Continuing our quest for eigenvectors of $\hat A$ that we can compute with a reduced SVD, let's define an $m \times p$ matrix $\Phi$ as

$$
\Phi \equiv \hat A \tilde \Phi_s = X' \tilde V \tilde \Sigma^{-1} \tilde W \tag{6.26}
$$

It turns out that columns of $\Phi$ are eigenvectors of $\hat A$.

This is a consequence of a result established by [Tu et al., 2014] that we now present.

Proposition The $p$ columns of $\Phi$ are eigenvectors of $\hat A$.

Proof: From formula (6.26) we have

$$
\hat A \Phi = (X' \tilde V \tilde \Sigma^{-1} \tilde U^\top)(X' \tilde V \tilde \Sigma^{-1} \tilde W) = X' \tilde V \tilde \Sigma^{-1} \tilde A \tilde W = X' \tilde V \tilde \Sigma^{-1} \tilde W \Lambda = \Phi \Lambda
$$

so that

$$
\hat A \Phi = \Phi \Lambda . \tag{6.27}
$$

Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ be the corresponding $i$th eigenvalue of $\tilde A$ from decomposition (6.24).

Equating the $m \times 1$ vectors that appear on the two sides of equation (6.27) gives

$$
\hat A \phi_i = \lambda_i \phi_i .
$$

This equation confirms that $\phi_i$ is an eigenvector of $\hat A$ that corresponds to eigenvalue $\lambda_i$ of both $\tilde A$ and $\hat A$.

This concludes the proof.

Also see [Brunton and Kutz, 2022] (p. 238)
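The proposition can be verified numerically. A minimal sketch with hypothetical random data:

```python
import numpy as np

# Verify the Tu et al. proposition: columns of Phi = X' V~ Sigma~^{-1} W~
# are eigenvectors of A_hat. Data are hypothetical random draws.
rng = np.random.default_rng(7)
m, n = 30, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
Sigma_inv = np.diag(1 / sigma)
A_hat = X_prime @ Vt.T @ Sigma_inv @ U.T     # (6.21)
A_tilde = U.T @ X_prime @ Vt.T @ Sigma_inv   # (6.23)
Lam, W = np.linalg.eig(A_tilde)              # (6.24)

Phi = X_prime @ Vt.T @ Sigma_inv @ W         # (6.26)
```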

6.5.1. Decoder of $\check b$ as a linear projection#

From eigendecomposition (6.27) we can represent $\hat A$ as

$$
\hat A = \Phi \Lambda \Phi^{+} . \tag{6.28}
$$

From formula (6.28) we can deduce dynamics of the $p \times 1$ vector $\check b_t$:

$$
\check b_{t+1} = \Lambda \check b_t
$$

where

$$
\check b_t = \Phi^{+} X_t \tag{6.29}
$$

Since the $m \times p$ matrix $\Phi$ has $p$ linearly independent columns, the generalized inverse of $\Phi$ is

$$
\Phi^{+} = (\Phi^\top \Phi)^{-1} \Phi^\top
$$

and so

$$
\check b = (\Phi^\top \Phi)^{-1} \Phi^\top X \tag{6.30}
$$

The $p \times n$ matrix $\check b$ is recognizable as a matrix of least squares regression coefficients of the $m \times n$ matrix $X$ on the $m \times p$ matrix $\Phi$ and consequently

$$
\check X = \Phi \check b \tag{6.31}
$$

is an $m \times n$ matrix of least squares projections of $X$ on $\Phi$.

Variance Decomposition of $X$

By virtue of the least-squares projection theory discussed in this quantecon lecture https://python-advanced.quantecon.org/orth_proj.html, we can represent $X$ as the sum of the projection $\check X$ of $X$ on $\Phi$ plus a matrix of errors.

To verify this, note that the least squares projection $\check X$ is related to $X$ by

$$
X = \check X + \epsilon
$$

or

$$
X = \Phi \check b + \epsilon \tag{6.32}
$$

where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares orthogonality conditions $\epsilon^\top \Phi = 0$ or

$$
(X - \Phi \check b)^\top \Phi = 0_{n \times p} \tag{6.33}
$$

Rearranging the orthogonality conditions (6.33) gives $X^\top \Phi = \check b^\top \Phi^\top \Phi$, which implies formula (6.30).
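The orthogonality conditions (6.33) can be checked directly; note that $\Phi$ is generally complex, so $\cdot^\top$ here is the conjugate transpose. A sketch with hypothetical random data:

```python
import numpy as np

# Least-squares decomposition X = Phi b_check + eps and the orthogonality
# conditions (6.33); the transpose is a conjugate transpose because Phi
# can be complex. Data are hypothetical random draws.
rng = np.random.default_rng(8)
m, n = 20, 4
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
Sigma_inv = np.diag(1 / sigma)
A_tilde = U.T @ X_prime @ Vt.T @ Sigma_inv
Lam, W = np.linalg.eig(A_tilde)
Phi = X_prime @ Vt.T @ Sigma_inv @ W

b_check = np.linalg.pinv(Phi) @ X   # regression coefficients (6.30)
eps = X - Phi @ b_check             # residual matrix in (6.32)
```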

6.5.2. An Approximation#

We now describe a way to approximate the $p \times 1$ vector $\check b_t$ instead of using formula (6.29).

In particular, the following argument adapted from [Brunton and Kutz, 2022] (page 240) provides a computationally efficient way to approximate $\check b_t$.

For convenience, we'll apply the method at time $t = 1$.

For $t = 1$, from equation (6.32) we have

$$
\check X_1 = \Phi \check b_1 \tag{6.34}
$$

where $\check b_1$ is a $p \times 1$ vector.

Recall from representation 1 above that $X_1 = U \tilde b_1$, where $\tilde b_1$ is a time $1$ basis vector for representation 1 and $U$ is from the full SVD $X = U \Sigma V^\top$.

It then follows from equation (6.32) that

$$
U \tilde b_1 = X' \tilde V \tilde \Sigma^{-1} \tilde W \check b_1 + \epsilon_1
$$

where $\epsilon_1$ is a least-squares error vector from equation (6.32).

It follows that

$$
\tilde b_1 = U^\top X' \tilde V \tilde \Sigma^{-1} \tilde W \check b_1 + U^\top \epsilon_1
$$

Replacing the error term $U^\top \epsilon_1$ by zero, and replacing $U$ from a full SVD of $X$ with $\tilde U$ from a reduced SVD, we obtain an approximation $\hat b_1$ to $\tilde b_1$:

$$
\hat b_1 = \tilde U^\top X' \tilde V \tilde \Sigma^{-1} \tilde W \check b_1
$$

Recall that from equation (6.23), $\tilde A = \tilde U^\top X' \tilde V \tilde \Sigma^{-1}$.

It then follows that

$$
\hat b_1 = \tilde A \tilde W \check b_1
$$

and therefore, by the eigendecomposition (6.24) of $\tilde A$, we have

$$
\hat b_1 = \tilde W \Lambda \check b_1
$$

Consequently,

$$
\hat b_1 = (\tilde W \Lambda)^{-1} \tilde b_1
$$

or

$$
\hat b_1 = (\tilde W \Lambda)^{-1} \tilde U^\top X_1 , \tag{6.35}
$$

which is a computationally efficient approximation to the following instance of equation (6.29) for the initial vector $\check b_1$:

$$
\check b_1 = \Phi^{+} X_1 \tag{6.36}
$$

(To highlight that (6.35) is an approximation, users of DMD sometimes call components of the basis vector $\check b_t = \Phi^{+} X_t$ the exact DMD modes and components of $\hat b_t = (\tilde W \Lambda)^{-1} \tilde U^\top X_t$ the approximate modes.)

Conditional on $X_t$, we can compute a decoded $\check X_{t+j}, j = 1, 2, \ldots$ from the exact modes via

$$
\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t \tag{6.37}
$$

or compute a decoded $\hat X_{t+j}$ from the approximate modes via

$$
\hat X_{t+j} = \Phi \Lambda^j (\tilde W \Lambda)^{-1} \tilde U^\top X_t . \tag{6.38}
$$

We can then use a decoded $\check X_{t+j}$ or $\hat X_{t+j}$ to forecast $X_{t+j}$.
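Both decoders translate into a few lines of numpy. The sketch below computes one-step-ahead decoded values from the exact modes (6.37) and the approximate modes (6.38), using hypothetical random data:

```python
import numpy as np

# One-step decoded forecasts (j = 1) from exact modes (6.37) and
# approximate modes (6.38); data are hypothetical random draws.
rng = np.random.default_rng(9)
m, n = 30, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
Sigma_inv = np.diag(1 / sigma)
A_tilde = U.T @ X_prime @ Vt.T @ Sigma_inv
Lam, W = np.linalg.eig(A_tilde)
Phi = X_prime @ Vt.T @ Sigma_inv @ W

x_t = X[:, 0]
b_exact = np.linalg.pinv(Phi) @ x_t                     # (6.36)
b_approx = np.linalg.inv(W @ np.diag(Lam)) @ U.T @ x_t  # (6.35)

x_check = Phi @ (Lam * b_exact)    # (6.37) with j = 1
x_hat = Phi @ (Lam * b_approx)     # (6.38) with j = 1
```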

6.5.3. Using Fewer Modes#

In applications, we’ll actually use only a few modes, often three or fewer.

Some of the preceding formulas assume that we have retained all $p$ modes associated with singular values of $X$.

We can adjust our formulas to describe a situation in which we instead retain only the $r < p$ largest singular values.

In that case, we simply replace $\tilde \Sigma$ with the appropriate $r \times r$ matrix of singular values, $\tilde U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values, and $\tilde V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.

Counterparts of all of the salient formulas above then apply.
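In particular, the eigenvector property (6.27) survives truncation. The sketch below, with hypothetical random data and a hypothetical choice of $r = 3$, checks that the $r$ retained DMD modes are still exact eigenvectors of the rank-$r$ estimator:

```python
import numpy as np

# Retain only the r < p largest singular values; the truncated
# counterparts of (6.23)-(6.27) still deliver r eigenpairs exactly.
rng = np.random.default_rng(10)
m, n = 30, 6
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

r = 3                                   # hypothetical truncation level
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
U_r = U[:, :r]                          # m x r
Sigma_r_inv = np.diag(1 / sigma[:r])    # r x r
V_r = Vt[:r, :].T                       # n x r

A_tilde_r = U_r.T @ X_prime @ V_r @ Sigma_r_inv   # r x r transition matrix
Lam_r, W_r = np.linalg.eig(A_tilde_r)
Phi_r = X_prime @ V_r @ Sigma_r_inv @ W_r         # m x r: the r DMD modes

A_hat_r = X_prime @ V_r @ Sigma_r_inv @ U_r.T     # rank-r estimator
```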

6.6. Source for Some Python Code#

You can find a Python implementation of DMD here:

https://mathlab.sissa.it/pydmd