Econometric Benchmarks
(Last updated: 4/29/2012)
Here are some standard benchmark datasets and models for testing
the accuracy of TSP or other econometrics packages.
Most of the models are just classic published results; others
are designed to be numerically difficult (designated as [difficult] here).
Jump down to section:
 Basic Statistics, OLS, NLS
 Simultaneous equations: SUR, 2SLS, LIML, 3SLS, FIML
 AR(1) regression
 unit root, cointegration testing
 Durbin-Watson test - tables of critical values
 ARIMA (Exact ML)
 GARCH
 Logit, Probit
 Count data: Poisson, NegBin1, NegBin2, Ordered Probit, panel data
 Panel data: Static models
 Panel data: Dynamic models
Basic Statistics, Linear and Nonlinear Regression
 linear regression:
Longley, JASA (1967) [difficult]
longley.tsp
text data, benchmark, TSP code
longley.wks
longley.xls
spreadsheet version of data
Note: to download .WKS and .XLS files, try
the right mouse button on some browsers.
longley.htm Some brief research
(2/1997) on obtaining a Longley coefficient vector accurate to
11 digits.
 missing data, means, correlation, linear regression:
Wilkinson's "Statistics Quiz" [difficult]
nasty.tsp text data, code
wilk.txt
Full original text of the "Statistics Quiz",
including the correct answers.
nasty.wks
nasty.xls spreadsheet data
For comparisons of how
several packages have fared on this benchmark, see:
 Sawitzki, G. "Testing numerical reliability of data
analysis systems,"
Computational Statistics and Data Analysis 18,
1994, pp.269-286.
 National Institute of Standards and Technology (NIST)
"Statistical Reference Datasets"
NIST StRD web page
Contains many test problems and certified results for univariate
statistics, anova, linear regression, and nonlinear regression.
 Univariate statistics [difficult]
 ANOVA [difficult]
 ano.tsp text data, benchmarks,
code to run 11 models
 Table of results for Sun 4
TSP obtains at least 7 correct digits for the
F-statistic in 8 of the 11 models.

nistanow.zip
siresist.wks
agatomic.wks
simonles.wks
nistanox.zip
siresist.xls
agatomic.xls
simonles.xls
spreadsheet data (including Simon-Lesage 1,4,7).
Simon-Lesage 3,6,9 are too large (18,009 obs) for
.xls files, and 2,5,8 are large but similar to 1,4,7.
 Ordinary Least Squares [difficult]
 Nonlinear Least Squares [difficult]
For comparisons of how
several packages have fared on these benchmarks, see:
 McCullough, Bruce D., "Assessing the Reliability of
Statistical Software: Part I,"
The American Statistician 52, November 1998, pp.355-363.
 McCullough, Bruce D., "Assessing the Reliability of
Statistical Software: Part II,"
The American Statistician 53, May 1999, pp.149-159.
 McCullough, Bruce D., "Econometric Software Reliability:
EViews, LIMDEP, SHAZAM, and TSP,"
Journal of Applied Econometrics 14, 1999, pp.191-202.
 Lilien, David M., "Econometric Software Reliability and
Nonlinear Estimation in EViews: Comment,"
Journal of Applied Econometrics 15, 2000, pp.107-110.
 McCullough, Bruce D., "Reply,"
Journal of Applied Econometrics 15, 2000, pp.111+
 McCullough, Bruce D. and Wilson, Berry,
"On the accuracy of statistical procedures in Microsoft Excel 97,"
Computational Statistics and Data Analysis 31, 1999, pp.27-37.
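One common way to express the "correct digits" counts quoted for these benchmarks is the log relative error (LRE) between a computed value and its certified value, as used in the reliability papers listed above. The following is a minimal Python sketch of that measure; the function name and the fallback for a zero certified value are illustrative assumptions and are not taken from any TSP or NIST code.

```python
import math

def log_relative_error(estimate, certified):
    """Approximate number of correct significant digits in `estimate`."""
    if estimate == certified:
        # Exact agreement; callers usually cap this at the certified precision.
        return float("inf")
    if certified == 0.0:
        # Relative error is undefined, so fall back to log absolute error.
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))

# Illustration with a value truncated after six significant digits.
print(log_relative_error(3.14159, math.pi))
```

A result near 7 would correspond to roughly seven correct digits; values are normally capped at the number of digits in the certified result.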
Simultaneous Equations
 SUR (Seemingly Unrelated Regressions), iterated SUR,
(iterated) SUR with cross-equation constraints:
Grunfeld investment data, 2 firms, following Theil (1971)
grunsur2.tsp text data, benchmarks, code
grunsur2.wks
grunsur2.xls spreadsheet data
 SUR (Seemingly Unrelated Regressions), iterated SUR,
(iterated) SUR with cross-equation constraints, singular equation
system:
Berndt and Christensen, US manufacturing labor/capital data,
3 equations, following Berndt and Savin (1975)
bs75d.tsp text data, benchmarks, code
 2SLS, LIML, 3SLS, FIML:
Klein I model
Combined in a single file, with references to original
articles with correct and incorrect results
klein.tsp text data, benchmarks, code
klein.wks
klein.xls spreadsheet data
Individual files
 2SLS and 3SLS: Klein I
klein3s.tsp text data, code,
references
 LIML: Klein I
kleinlml.tsp text data, code,
references
kleinc2.tsp LIML on consumption
equation, reproduced with FIML. Also shows LogL computation.
 FIML: Klein I
kleinfml.tsp text data, benchmark,
code, references
kleinfm2.tsp high precision
benchmark, including 3 different types of standard errors
 2SLS, LIML, 3SLS, iterated 3SLS, FIML:
Kmenta's simple supply/demand model
kmenta.tsp text data, benchmarks, code
kmenta.wks
kmenta.xls spreadsheet data
 nonlinear FIML:
Bodkin and Klein model
bodkin.tsp text data, benchmarks, code
Time Series
 AR(1) regression models (conditional and exact ML)
Changes to TSP's AR1 command 8/1998
Note: the AR1 code used in the examples below does not take
advantage of the 8/1998 changes. It uses the older grid search
and iteration methods before they were combined on 8/1998.
 AR(1) conditional ML via grid search (Hildreth-Lu),
AR(1) conditional ML via full iteration (Cochrane-Orcutt),
AR(1) exact ML via full iteration (Beach-MacKinnon),
Durbin-Watson statistic and its exact P-value:
Bartlett pears data, analyzed by Hildreth and Lu,
and by Henshaw
pears.tsp text data, benchmark, code
pears.wks
pears.xls spreadsheet data
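For readers unfamiliar with the methods named in this entry, here is a minimal Python sketch of the Durbin-Watson statistic and of a Hildreth-Lu style grid search for a regression with one regressor and a constant. It is not the TSP code in pears.tsp: the function names, the grid step, and the simulated data are all assumptions made for illustration, and the pears data are not reproduced here.

```python
import random

def durbin_watson(resid):
    """Sum of squared first differences of the residuals over their sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]

def hildreth_lu(x, y, step=0.01):
    """Grid search over rho in (-1, 1): quasi-difference the data, drop the
    first observation, and keep the rho with the smallest sum of squared residuals."""
    best = None
    k = int(round(1.0 / step))
    for i in range(-k + 1, k):
        rho = i * step
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        a_star, b, resid = ols_simple(xs, ys)
        ssr = sum(e * e for e in resid)
        if best is None or ssr < best[0]:
            # The transformed intercept equals a * (1 - rho).
            best = (ssr, rho, a_star / (1.0 - rho), b)
    return best[1], best[2], best[3]

def simulate(n=400, rho=0.6, seed=0):
    """Hypothetical data: y = 1 + 2x + u, with u following an AR(1) process."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u, y = 0.0, []
    for t in range(n):
        u = rho * u + rng.gauss(0.0, 1.0)
        y.append(1.0 + 2.0 * x[t] + u)
    return x, y

x, y = simulate()
print("DW of OLS residuals:", durbin_watson(ols_simple(x, y)[2]))
print("grid-search (rho, intercept, slope):", hildreth_lu(x, y))
```

Because the first observation is dropped, the grid search corresponds to the conditional rather than the exact likelihood; a coarse grid can also settle on a local optimum, which is the issue several of the [difficult] entries in this section are designed to expose.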
 AR(1) conditional and exact ML:
Longley data, following Lovell and Selover [difficult]
longar1.tsp text data, benchmark, code
longley.wks
longley.xls spreadsheet data
 AR(1) and AR(2) conditional and exact ML:
Klein I consumption function,
following Beach and MacKinnon (1978)
kleinar2.tsp text data, benchmark, code
 AR(1) with lagged dependent variable - multiple optima
and consistent standard errors,
testing for autocorrelation in OLS with a lagged
dependent variable - Durbin-Watson, Durbin's h, Durbin's m:
electric utility demand, NERC dataset, Berndt (1990)
ar1lag.tsp text data, benchmark, code
ar1lag.wks
ar1lag.xls spreadsheet data
 AR(1) conditional and exact ML, with multiple optima:
Dufour, Gaudry and Liem, also Lovell and Selover [difficult]
dufour.tsp text data, benchmark, code
dufour.wks
dufour.xls spreadsheet data
 unit root testing
 Durbin-Watson test
 univariate ARIMA models (exact ML)
pure AR (see also AR(1) above)
 AR(1) and AR(2) conditional and exact ML:
Klein I consumption function,
following Beach and MacKinnon (1978)
kleinar2.tsp text data, benchmark, code
 AR(2) and AR(3) with constant:
Box-Jenkins series E (sunspots)
bje.tsp text data, benchmark, code
bje.wks
bje.xls spreadsheet data
pure MA
 MA(1) - actually ARIMA(0,1,1),
Box-Jenkins series A
bja.tsp text data, benchmark, code
(includes general comments on ARIMA benchmarks)
bja.wks
bja.xls spreadsheet data
ma1.tsp hand-coded MA(1) LogL
The hand-coded version uses:
 MA(2) - actually ARIMA(0,2,2):
Box-Jenkins series C, 2 subsets, following Osborn (1976)
bjc.tsp text data, benchmark, code
bjc.wks
bjc.xls spreadsheet data
 ARIMA(0,0, 1,1, 1,1) - multiplicative seasonal MA:
Box-Jenkins series G (monthly airline passengers)
bjg.tsp text data, benchmark, code
bjg.wks
bjg.xls spreadsheet data
 ARIMA(0,0, 1,1, 1,1) - multiplicative seasonal MA:
Pankratz (1991) series 12 (log KWH), 23 (housing starts),
24 (housing sales) - follows Newbold, Agiakloglou, and Miller
(1994)
bjnam.tsp text data, benchmark, code
bjnama.wks
bjnambc.wks
bjnama.xls
bjnambc.xls spreadsheet data
mixed ARMA
 ARMA(1,1) with constant (exact ML):
Box-Jenkins series A
bja.tsp text data, benchmark, code
(same as above under MA(1))
bja.wks
bja.xls spreadsheet data
bjacls.tsp (conditional ML, 2 benchmarks)
 ARIMA(2,1,2):
illustrates multiple local optima, for different starting
values.
original data from Campbell and Mankiw - log(Real GNP quarterly)
as given by Perron
follows Newbold, Agiakloglou, and Miller (1994),
who actually used an extended version of these data.
The three local optima NAM found also seem to exist (at
roughly the same parameter values) for the original data.
bjrg.tsp text data, code
other
 Partial AutoCorrelation function - compares
Yule-Walker, OLS, Burg, and Exact ML methods:
Box-Jenkins series F (chemical yields)
bjfpac.tsp text data, benchmarks, code
bjf.wks
bjf.xls spreadsheet data
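Of the four methods compared in this entry, the Yule-Walker one is the simplest to state: the partial autocorrelations follow from the sample autocorrelations through the Durbin-Levinson recursion. The Python sketch below shows that recursion only; the function names are hypothetical, it is not the code in bjfpac.tsp, and the OLS, Burg, and exact ML variants are not covered.

```python
def autocorrelations(x, maxlag):
    """Sample autocorrelations r[0..maxlag], using the biased (divide by n) form."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
            for k in range(maxlag + 1)]

def pacf_from_autocorr(r):
    """Durbin-Levinson recursion: r[0] must be 1; returns the PACF at lags 1..len(r)-1."""
    pacf = []
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, len(r)):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical autocorrelations of an AR(1) process with coefficient 0.5:
# the PACF should be 0.5 at lag 1 and zero afterwards.
print(pacf_from_autocorr([1.0, 0.5, 0.25, 0.125]))
```

For real data, `pacf_from_autocorr(autocorrelations(series, maxlag))` gives the Yule-Walker estimates; the other methods generally give different values in short samples, which is what makes the comparison a useful benchmark.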
 GARCH models
 GARCH(1,1) with constant:
Bollerslev and Ghysels (1996) daily Deutschmark-British Pound
exchange rate
uses TSP 4.4 features as of 5/11/98:
different init options
for h(t) and e(t)**2, iteration with analytic second
derivatives, and QMLE standard errors
bg44.tsp code, 6-digit benchmarks
(for 3 different presample initialization options)
dmbp.dat text data
garch11w.zip
garch11x.zip (zipped) spreadsheet data
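The role of the presample initialization is easiest to see in the likelihood recursion itself. The following Python sketch evaluates a Gaussian GARCH(1,1) log likelihood with a constant mean; it is not the code in bg44.tsp. The function name and the default presample choice (the mean squared residual, used for both h(0) and e(0)**2) are assumptions made for illustration and need not match any of the three options benchmarked above.

```python
import math

def garch11_loglik(y, mu, omega, alpha, beta, presample=None):
    """Gaussian log likelihood of y(t) = mu + e(t), with
    h(t) = omega + alpha*e(t-1)**2 + beta*h(t-1)."""
    e = [yt - mu for yt in y]
    if presample is None:
        # Assumed initialization: mean squared residual for both h(0) and e(0)**2.
        presample = sum(et * et for et in e) / len(e)
    h_prev = presample
    e2_prev = presample
    loglik = 0.0
    for et in e:
        h = omega + alpha * e2_prev + beta * h_prev
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(h) + et * et / h)
        h_prev = h
        e2_prev = et * et
    return loglik

# With alpha = beta = 0 the model reduces to i.i.d. N(mu, omega) observations.
print(garch11_loglik([0.0, 0.0], mu=0.0, omega=1.0, alpha=0.0, beta=0.0))
```

Because every h(t) depends on the presample values through the recursion, different initializations give slightly different optima, which is why the benchmark reports results separately for each option.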
 bgfcp.tsp Same model as above, but gives
a full 11-digit solution (for one of the initialization options).
Verified with software from Fiorentini, Calzolari, and Panattoni,
as well as the independent TSP code in bg11 below.
 bg11i.tsp
Same model as above, but reproduces SEs from Information Matrix
and Bollerslev-Wooldridge, as given by FCP software.
Uses g11s.tsp below to compute recursive first derivatives.
 bg11.tsp
GARCH(1,1) with constant
Bollerslev and Ghysels (1996) daily Deutschmark-British Pound
exchange rate
same as bg44.tsp, but can run on earlier versions of TSP
(uses a lot of complicated code to evaluate the analytic
second derivatives by hand).
 Asymmetric EGARCH(1,1) with constant:
Bollerslev and Ghysels (1996) daily Deutschmark-British Pound
exchange rate
uses numeric second derivatives feature released in
TSP 4.5 on 3/30/00
egarch.tsp code, 6-digit benchmarks
(for 2 different presample initialization options)
 IGARCH(1,1) with constant:
Bollerslev and Ghysels (1996) daily Deutschmark-British Pound
exchange rate
uses ARCH command to compute unrestricted derivatives, and
then does the restricted (alpha0=0, alpha1=1-beta1) iterations.
bgi.tsp code, 5-digit benchmarks
Qualitative Dependent Variables
 binary Logit, Probit, Maximum Score Estimation:
Spector and Mazzeo economic education data (32 obs. x 4 vars.),
following Greene (1993, 1997)
greenelp.tsp text data, benchmarks, code
greenelp.wks
greenelp.xls spreadsheet data
 Trinomial Probit:
Daganzo transportation choice data (50 obs. x 4 vars.),
following Bunch (1991)
daganzo.tsp benchmarks, code
for 2 different normalizations of Sigma matrix; comparison
with Bunch and Limdep 7.0 beta (GHK simulator) results
daganzo.txt text data
daganzo.wks spreadsheet data
daganzo.lim Limdep/Nlogit code
 Poisson, Negative Binomial 1 and 2, Ordered Probit:
Doctor Visits model (5190 obs. x 13 vars.),
following
Cameron and Trivedi (1986, 1998)
count.tsp benchmarks, code
counta.zip (zipped) text data
countw.zip
countx.zip (zipped) spreadsheet data
 Poisson on panel data - fixed and random effects,
Patents and R&D model (346 obs. x 18 vars.),
following Cameron and Trivedi (1998)
poispan.tsp benchmarks, code
countpa.zip (zipped) text data
countpw.zip
countpx.zip (zipped) spreadsheet data
 Negative Binomial 1 on panel data - fixed and random effects,
Patents and R&D model (346 obs. x 18 vars.),
following Cameron and Trivedi (1998)
nbpan.tsp benchmarks, code
countpa.zip (zipped) text data
(same as above)
countpw.zip
countpx.zip (zipped) spreadsheet data
 Frontier production function
pure cross section model,
same as Example 9.6.4 in TSP User's Guide
EG1 model from Frontier program (60 obs. x 3 vars.),
following Coelli
fronteg1.tsp benchmarks, code
fronteg1.txt text data
Panel Data Models - Static
 Pooled OLS, Fixed Individual Effects, Random Individual Effects (ML),
Fixed Time Effects, Fixed Individual & Time Effects,
Random Time Effects (ML), Random Individual & Time Effects (ML)
Grunfeld investment data, balanced, 10 firms, 20 years
following Nerlove (2000), Baltagi (1995)
Different versions of Grunfeld data
grfere.tsp benchmarks, code
grunfeld.dat text data
grunfeld.wks
grunfeld.xls spreadsheet data
 Pooled OLS, Fixed Individual Effects, Random Individual Effects (ML),
Random Individual & Time Effects (ML), Fixed Individual and Time
Effects, nested Two-way ML
Datasets and benchmarks from Baltagi, Badi H., "Econometric
Analysis of Panel Data", second edition, 2001.
official book web site
TSP results generally match the book to 3 digits printed;
sometimes there are small differences in standard errors
of the nonlinear models.
TSP results use an updated (10/2003) PANEL(REI,REIT) command
 Coefficients varying by i
Proc Panbi - allows any set of coefficients to vary by i
Efficient computation by sweeping out effects in a loop,
so hundreds or thousands of different coefficients can be
estimated without inverting any large matrices.
Grunfeld investment data (n=10)
grbi.tsp benchmarks, code
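The idea of sweeping out individual-specific coefficients in a loop can be sketched in a few lines of Python. In this illustration the intercept and the coefficient on one variable z vary by individual, while the coefficient on x is common; each individual's y and x are residualized on (1, z), and the common slope is then a pooled regression on the residuals. This is not the Panbi procedure itself: the function names, the model, and the small data set are assumptions chosen to keep the example self-contained.

```python
def sweep(v, z):
    """Residuals from regressing v on a constant and z, for one individual."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    c = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    return [(vi - mv) - c * (zi - mz) for zi, vi in zip(z, v)]

def common_slope(panel):
    """panel: list of (y, x, z) triples, one per individual.
    Model: y = a_i + c_i * z + b * x + error; returns the estimate of b."""
    sxy = sxx = 0.0
    for y, x, z in panel:
        ry, rx = sweep(y, z), sweep(x, z)
        sxy += sum(p * q for p, q in zip(rx, ry))
        sxx += sum(p * p for p in rx)
    return sxy / sxx

def make_individual(a, c, x, z, b=2.0):
    """Noise-free hypothetical data for one individual."""
    return ([a + c * zi + b * xi for xi, zi in zip(x, z)], x, z)

panel = [
    make_individual(1.0, 0.5, [1.0, 0.0, 2.0, 5.0], [0.0, 1.0, 2.0, 3.0]),
    make_individual(-2.0, 3.0, [3.0, 1.0, 4.0, 1.0], [0.0, 1.0, 2.0, 3.0]),
]
print(common_slope(panel))  # close to the common slope of 2 used to build the data
```

Only per-individual sums are accumulated, so the work grows linearly with the number of individuals and no matrix with one column per individual coefficient is ever formed or inverted.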
 Random Individual Effects plus AR(1) (exact ML)
Grunfeld investment data
follows Baltagi and Li (1991)
grar1rei.tsp code
grar1rei.out TSP results
Although we believe the above results are correct for the
Grunfeld data, it would be preferable to replicate some
published results, such as those cited in Baltagi (2001), p.84.
 Diagonal heteroskedasticity, AR(1) (exact ML), diagonal het
with AR(1)
Grunfeld investment data
grhetar.tsp benchmarks, code
 SUR by firm (since N < T), SUR with AR(1) (exact ML)
Grunfeld investment data
grsurar.tsp benchmarks, code
 Robust (HAC) SEs for OLS coefficients, using diagonal and
block-diagonal patterns for Omega. Most code works for
unbalanced data. Includes Beck-Katz SEs, which are implemented for
balanced data and do not handle conditional heteroskedasticity.
Grunfeld investment data
ghac.tsp code
 Score/LM tests for AR(1) and Random Individual Effects,
robust to local misspecification.
Greene version of Grunfeld investment data (5 firms)
Different versions of Grunfeld data
following Bera et al (2001)
gscore.tsp benchmark, code
grun5.dat text data
gscore10.tsp same tests,
but computed on the complete/original Grunfeld data (10 firms)
glr.tsp equivalent Likelihood Ratio and
Wald tests, on all 10 firms
glr.out full output file for glr.tsp
 LM tests for Random Individual and Time Effects.
Breusch and Pagan (1980).
Grunfeld investment data (full dataset)
following Baltagi (2001)
grbpre.tsp benchmark, code
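As an illustration of what this test computes, the Breusch-Pagan LM statistic for random individual effects on a balanced panel can be written directly from the pooled OLS residuals. The Python sketch below covers only the individual-effects component; the function name is hypothetical, it is not the code in grbpre.tsp, and the residuals shown are made up rather than taken from the Grunfeld data.

```python
def breusch_pagan_lm_individual(resid):
    """LM statistic for random individual effects on a balanced panel.

    resid: list of N lists, each holding the T pooled-OLS residuals of one individual.
    LM = N*T / (2*(T-1)) * (sum_i (sum_t e_it)**2 / sum_i sum_t e_it**2 - 1)**2
    """
    n = len(resid)
    t = len(resid[0])
    if any(len(r) != t for r in resid):
        raise ValueError("this sketch assumes a balanced panel")
    num = sum(sum(r) ** 2 for r in resid)
    den = sum(e * e for r in resid for e in r)
    return n * t / (2.0 * (t - 1)) * (num / den - 1.0) ** 2

# Made-up residuals for N = 2 individuals and T = 2 periods.
print(breusch_pagan_lm_individual([[1.0, -1.0], [1.0, -1.0]]))
```

Under the null hypothesis of no individual effects, the statistic is asymptotically chi-squared with one degree of freedom; the time-effects version swaps the roles of i and t.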
Panel Data Models - Dynamic
 OLS, Fixed Effects, Random Effects (FGLS and Conditional ML),
Between, Anderson-Hsiao 2SLS (AHL, AHD)
Penn World Table, growth model, balanced data
following Nerlove (1999); see full citations in benchmark file
penngrow.tsp benchmarks, code
penngrow.txt text data
penngrow.wks
penngrow.xls spreadsheet data
 Fixed Effects Bias Correction (also does AHL)
Penn World Table, growth model, balanced data
following Kiviet (1995), Judson and Owen (1996)
lsdvc.tsp benchmarks, code
 OLS, Fixed Effects, Anderson-Hsiao 2SLS (AHL, AHD)
following Arellano and Bond (1991) - unbalanced data
arelbond.tsp benchmarks, code
(permission to distribute data pending)
arelbond.txt text data
arelbond.wks
arelbond.xls spreadsheet data
 GMM first-difference model, written by Yoshitsugu Kitazawa
following Arellano and Bond (1991) - unbalanced data
DPD page benchmarks, code
 PMG (Pooled Mean Group)
following Pesaran, Shin and Smith (1999) - unbalanced data
in this example, intercepts and many slopes vary by individual
pmg_jasa.tsp benchmarks, code
pmg_jasa.xls spreadsheet data
Random Number Generation
 Checks the new uniform generator in TSP 4.5,
by summing the first 10,000,000 variates.
The answer matches that given by L'Ecuyer (1999).
rantst.tsp
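The checksum idea can be illustrated in Python with MRG32k3a, a combined multiple recursive generator published by L'Ecuyer in 1999. This page does not say which generator TSP 4.5 uses, so treating MRG32k3a as representative is an assumption, as are the seed of 12345 and the function names; the sketch is not the code in rantst.tsp.

```python
import itertools

# MRG32k3a constants (L'Ecuyer, 1999).
M1 = 4294967087
M2 = 4294944443
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)

def mrg32k3a(seed=12345):
    """Yield uniform variates in (0, 1); all six state words start at `seed`."""
    s10 = s11 = s12 = seed
    s20 = s21 = s22 = seed
    while True:
        # First component: uses the two older state words.
        p1 = (A12 * s11 - A13N * s10) % M1
        s10, s11, s12 = s11, s12, p1
        # Second component: uses the newest and the oldest state words.
        p2 = (A21 * s22 - A23N * s20) % M2
        s20, s21, s22 = s21, s22, p2
        # Combine the two components so the result is never exactly 0.
        if p1 <= p2:
            yield (p1 - p2 + M1) * NORM
        else:
            yield (p1 - p2) * NORM

def checksum(n, seed=12345):
    """Sum of the first n variates, the quantity used as a check value."""
    return sum(itertools.islice(mrg32k3a(seed), n))

print(checksum(100000))
```

Calling `checksum(10_000_000)` would mirror the size of the test described above, although it takes noticeably longer in pure Python; comparing the sum with a published value verifies the whole sequence at once, since any error in the recursion changes it.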
If you have comments on these benchmarks, please send
email to Clint Cummins:
clint@leland.stanford.edu.