HOWTO calculate EOFs

Empirical Orthogonal Functions (EOFs) (or Principal Component Analysis (PCA) ) is a method of analysing a dataset to find the signals that explain the most variance. The rationale behind this is that the signals with large variance are probably the ones you are interested in.

The EOFs of a dataset are simply the eigenvectors of the covariance matrix of that dataset. Imagine the dataset plotted in some n-dimensional space. The dataset will form some kind of n-dimensional blob. The EOFs of this dataset will then lie along the principal axes of this blob.

EOFs and SVD

Singular Value Decomposition is a neat trick by which we can calculate EOFs. It goes something like this:

Assume we have an M*N matrix of data X that is space * time. In other words a collection of row time vectors. For example:

        t1   t2  t3  t4 ....  tN
        /                       \
     s1|x11 x21 x31 x41 .... xN1|
     s2|x12 x22 x32 x42 .... xN2|
 X=  s2|x13 x23 x33 x43 .... xN3|
        .  .   .  .       .       
        .  .   .  .       .
     sM|x1M x2M x3M x4M .... xNM|
        \                       /

Let's also assume that each time vector (row) has zero time mean.
We form the M*M covariance matrix C by
C=(1/(N-1))X(X^T) (1) Where ^T implies the matrix transpose and N is the number of time points.
Now the the matrix eigenvectors E of C is: CE=EL (2) Where L is a diagonal matrix of eigenvalues.
Hence E is the matrix of EOFs we are looking for and L is the variance associated with each EOF.

SVD says that we can decompose an M*N matrix of data X as follows:

X=QRP^T Where Q is an orthogonal M*N matrix, R an N*N diagonal matrix and P an orthogonal N*N matrix. (If you think about it, this is just representing the dataset in a new coordinate basis Q)
If we substitute this into the above expression (1) we get: C=(1/(N-1))QRP^TPR^TQ^T Which, because of orthogonality (ie P^TP=I, the identity matrix): C=(1/(N-1))QRR^TQ^T If we multiply both sides by Q: CQ=(1/(N-1))QRR^T Which is equivalent to (2) if: L=(1/(N-1))RR^T

E=Q

What's the point of all this? Well IDL has a built in SVD routine which takes in a matrix X and returns Q, R and P. We've shown that Q identical to the eigenvectors of C and hence the EOFs. R is related to the variance associated with each EOF. What about P? P is a matrix of time series. Each time series is associated with a given EOF and described the evolution of that pattern in time within the dataset.

Example + Idl Program

OK, so here's a couple of IDL routines:

EOFcalc.pro calculates the EOFs, WR_EOFcalc.pro reads in a netcdf file and passes it to EOFcalc. It then writes out the result as a netcdf file. EOF_example.idl is a IDL script file that uses both the previous two routines to calculate an example EOF file. To try this first download all three files to the same directory and then do:

idl EOF_example.idl

Here are the Results:

The upper left figure shows the first EOF of Sea level pressure data over a limited region of the globe. It clearly shows the dipole pattern commonly associated with the NAO. The lower left panel shows the time series associated with this EOF pattern (that is the P above). The upper right panel shows what happens when we regress the original SLP data back onto this time series. The time series clearly picks out the NAO within the data.

Other explanations

Halldor Bjornsson and Silvia A. Venegas have written an excellent guide to EOFs.
The classic reference is by Preisendorfer : Principal component analysis in meteorology and oceanography ISBN: 0444430148

Page updated on