We also address the missing data problem by using multiple imputation (see
Rubin 1987, Schafer and Olsen 1998, and Schafer 1999 for more details).19 The multiple
imputation technique essentially replaces each missing value in the data with a set of
plausible values, yielding separate datasets that contain the observed values for nonmissing
observations and imputed values for missing observations. The imputations are
made by examining correlations among all available independent variables, with
restrictions placed on minimum and maximum values and on rounding.20 The variables are
assumed to have a multivariate normal distribution. Logit or linear regressions are then
run on five separately imputed datasets.21 The results from the five runs are combined for
inference and adjustments are made for sampling variance. The resulting coefficient
estimates summarize this information, and their standard errors capture the variability of
estimates across the five runs. Single imputation methods, by contrast, typically overstate the
statistical precision of estimates. We report the multiple
imputation coefficient estimates and their standard errors in Table 3. Despite the large
increase in sample size, the estimates are similar to those reported in Table 2. Thus, the
removal of observations with missing data does not appear to substantially affect our results.
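
The combination step can be illustrated with the standard combining rules in Rubin (1987). The following is a minimal sketch under that assumption and is not the code used to produce Table 3; the function name `combine_imputations` and the numerical inputs are hypothetical and chosen only for illustration. For a single coefficient, the combined estimate is the mean of the estimates from the m imputed datasets, and its variance is the average squared standard error (the within-imputation variance) plus (1 + 1/m) times the variance of the estimates across imputations (the between-imputation variance).

```python
import math
from statistics import mean, variance


def combine_imputations(estimates, std_errors):
    """Combine one coefficient across m imputed datasets (Rubin 1987).

    estimates  : coefficient estimate from each imputed dataset
    std_errors : standard error of that coefficient in each dataset
    Returns (combined_estimate, combined_standard_error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching inputs from at least two imputations")

    # Combined point estimate: simple average over the m runs.
    q_bar = mean(estimates)

    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])

    # Between-imputation variance: sample variance (divisor m - 1)
    # of the m point estimates.
    between = variance(estimates)

    # Total variance adds the between-imputation component, inflated
    # by (1 + 1/m) to account for the finite number of imputations.
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical estimates and standard errors from five imputed datasets.
example_estimates = [0.10, 0.12, 0.11, 0.09, 0.13]
example_std_errors = [0.05, 0.05, 0.05, 0.05, 0.05]

coef, se = combine_imputations(example_estimates, example_std_errors)
print(f"combined estimate = {coef:.4f}, combined s.e. = {se:.4f}")
```

Because the between-imputation term is nonnegative, the combined standard error is never smaller than the one implied by the average within-imputation variance alone. This is the sense in which the multiple imputation standard errors reflect the uncertainty due to missing data that a single imputed dataset ignores.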
19 The technique has been discussed recently in the economics literature (Brownstone and Valletta 2001) and has been used to impute income and wealth variables in the Survey of Consumer Finances (Kennickell 1998).
20 Information from all of the independent variables in the main specification, in addition to information on financial capital, industry, and start year, was used in the correlations.
21 The gains in efficiency are small after increasing the number of imputations above five