Empirical Orthogonal Function (EOF) software
Principal Component Analysis (PCA) software

David W. Pierce
Climate Research Division
Scripps Institution of Oceanography

Download version 1.1 of the code. Updated 2003-01-30 to fix a bug in the calculation of variance in the non-switched case (which is typically selected much less often than the switched case). Thanks to Dan Marsh at UCAR for finding this bug. If you are NOT using version 1.1 or later (the README file states the version), please upgrade to this version.

This page provides Fortran software for calculating empirical orthogonal functions (EOFs).  EOFs are used for decomposing data sets that have two or more dimensions into pairs of loadings (also called the eigenvectors, or the EOFs) and associated principal components (PCs).  This technique is also called principal component analysis (PCA).
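
To make the decomposition concrete, here is a minimal pure-Python sketch. It assumes the data are arranged as data[t][s] (time by flattened space) with at least two time entries. It removes the time mean at each point, forms the spatial covariance matrix, and diagonalizes it with cyclic Jacobi rotations. The Jacobi method and the function name eof_analysis are swapped in for illustration only; they are not the Fortran package's code, which relies on LAPACK and BLAS.

```python
import math


def eof_analysis(data):
    """Sketch of an EOF/PC decomposition of data[t][s] (time x space).

    Returns (eigenvalues, eofs, pcs) ranked by decreasing eigenvalue, where
    eofs[k][s] is the k-th loading pattern and pcs[k][t] its principal component.
    Hypothetical helper; uses Jacobi rotations instead of LAPACK.
    """
    nt, ns = len(data), len(data[0])
    # Remove the time mean at each spatial point to get anomalies.
    means = [sum(row[s] for row in data) / nt for s in range(ns)]
    anom = [[row[s] - means[s] for s in range(ns)] for row in data]
    # Spatial covariance matrix (ns x ns), normalized by nt - 1.
    a = [[sum(r[i] * r[j] for r in anom) / (nt - 1) for j in range(ns)]
         for i in range(ns)]
    # Columns of v accumulate the eigenvectors.
    v = [[1.0 if i == j else 0.0 for j in range(ns)] for i in range(ns)]
    for _ in range(100):  # cyclic Jacobi sweeps
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] ** 2 for i in range(ns) for j in range(ns) if i != j)
        if off <= 1e-24 * total:
            break
        for p in range(ns - 1):
            for q in range(p + 1, ns):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # m <- m J (rotate columns p and q)
                    for row in m:
                        x, y = row[p], row[q]
                        row[p], row[q] = c * x - s * y, s * x + c * y
                for j in range(ns):  # a <- J^T a (rotate rows p and q)
                    x, y = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * x - s * y, s * x + c * y
    order = sorted(range(ns), key=lambda k: a[k][k], reverse=True)
    eigenvalues = [a[k][k] for k in order]
    eofs = [[v[i][k] for i in range(ns)] for k in order]
    # Each PC is the projection of the anomalies onto its EOF.
    pcs = [[sum(r[i] * e[i] for i in range(ns)) for r in anom] for e in eofs]
    return eigenvalues, eofs, pcs
```

For example, with data = [[10.6, 20.8], [9.4, 19.2], [11.2, 21.6], [8.8, 18.4]] (anomalies that all lie along one direction), the leading eigenvalue comes out near 10/3, the second near zero, and the leading EOF near (0.6, 0.8) up to an overall sign.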

EOFs are widely used in the oceanographic and meteorological sciences.  Typically, you start with a three-dimensional data set that extends over latitude, longitude, and time.  Computing the EOFs of the data leaves you with a set of two-dimensional loadings (functions of latitude and longitude) and one-dimensional principal components (functions of time).  The EOFs are ranked by the amount of variance in the original data set that they explain, so the leading EOF explains the greatest amount of variance that any single pattern can capture in this way.  This is purely a statistical technique, so you must then decide whether the pattern has any physical meaning.
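
When the EOFs are computed as eigenvectors of the covariance matrix, each eigenvalue is the variance carried by its mode, so the ranking above amounts to sorting the eigenvalues. A minimal sketch, assuming you already have the eigenvalues as a list (the function name explained_variance is hypothetical):

```python
def explained_variance(eigenvalues):
    """Rank modes by variance and return (mode_index, fraction_of_total) pairs.

    Hypothetical helper: assumes each eigenvalue is the variance of one mode.
    """
    total = sum(eigenvalues)
    ranked = sorted(range(len(eigenvalues)),
                    key=lambda k: eigenvalues[k], reverse=True)
    return [(k, eigenvalues[k] / total) for k in ranked]


print(explained_variance([1.0, 6.0, 3.0]))  # → [(1, 0.6), (2, 0.3), (0, 0.1)]
```

Here the mode with eigenvalue 6.0 becomes the leading EOF and explains 60% of the variance.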

EOFs are also often used as a data compression technique.  The EOFs and PCs reconstruct the data: the original data at any time can be thought of as the sum of the (time-independent) EOF patterns, each multiplied by its associated PC value at that time.  Often the first N EOFs capture a substantial part of the total variance of the original data set (say, >95%) even when N is much smaller than the number of time entries in the original data set.  Keeping just these N modes, with their associated one-dimensional PCs, then gives you back most of the variance in the original data at a considerable space savings: the retained modes take up roughly N/(# time entries) of the original storage, provided there are many more spatial points than time entries.
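
The reconstruction and the storage arithmetic can be sketched as follows. This assumes the time means, EOFs, and PCs are stored as plain Python lists (means[s], eofs[k][s], pcs[k][t]); the function names are hypothetical and not part of the Fortran package. Keeping N modes stores N*(space + time) numbers instead of space*time, a fraction of N*(1/time + 1/space), which is close to N/(# time entries) when there are many more spatial points than time entries.

```python
def reconstruct(means, eofs, pcs, n_modes):
    """Approximate data[t][s] as means[s] + sum_{k < n_modes} pcs[k][t] * eofs[k][s].

    Hypothetical helper; assumes the EOFs were computed from time-mean anomalies.
    """
    nt, ns = len(pcs[0]), len(means)
    return [[means[s] + sum(pcs[k][t] * eofs[k][s] for k in range(n_modes))
             for s in range(ns)]
            for t in range(nt)]


def storage_fraction(n_modes, n_space, n_time):
    """Storage of N EOFs plus N PCs relative to the full n_time x n_space array.

    Ignores the n_space time means, which are also needed for reconstruction.
    """
    return n_modes * (n_space + n_time) / (n_space * n_time)


print(storage_fraction(3, 1000, 30))  # → 0.103, close to 3/30 = 0.1
```

Calling reconstruct with n_modes equal to the total number of modes returns the original data (up to rounding); smaller n_modes gives the truncated, compressed approximation.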

I really don't want to get into a long description of EOFs.  You can find textbooks that do this already, such as "Statistical Methods in the Atmospheric Sciences," by Daniel S. Wilks, Academic Press, 1995.  Look at section 9.3 in that book for a good description of EOFs.  This page simply makes available the EOF software that I use.  It requires the LAPACK and BLAS libraries, which may very well already be on your machine somewhere or can be downloaded from netlib.

The software is in the form of a tar file that has the source code and a test data set.

The test data are observed yearly-averaged sea surface temperatures from the tropical Pacific Ocean over the period 1965-1993, from the daSilva data set (based on COADS).  After you get the software and successfully build the test program, you should check whether the output EOF and PC are correct.  Be aware that the EOF/PC combination can have BOTH signs switched and still be correct; only the product of the two has a determined sign.  Here is what they look like for me (note: you will have to plot these with your own graphics software!):
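
[Figures: the EOF pattern and PC time series computed from the test data; images not reproduced here.]
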
Again, remember that you may get BOTH with their signs switched.
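
When comparing your output with a reference, it can help to put both on a common sign convention first. The sketch below assumes one such convention: flip both the EOF and the PC so that the largest-magnitude loading is positive. Both the convention and the function name fix_sign are hypothetical, not something the Fortran code does. Because the EOF and PC are negated together, every product EOF(s) * PC(t), and hence the reconstructed data, is unchanged.

```python
def fix_sign(eof, pc):
    """Flip both EOF and PC so the largest-magnitude EOF loading is positive.

    Hypothetical comparison convention; the EOF/PC product is unchanged.
    """
    biggest = max(eof, key=abs)
    if biggest < 0:
        return [-x for x in eof], [-x for x in pc]
    return list(eof), list(pc)


eof, pc = fix_sign([-0.6, -0.8], [-1.0, 1.0])
print(eof, pc)  # → [0.6, 0.8] [1.0, -1.0]
```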
