# Modelling individual health status

We estimate a joint model of labour supply and chronic disease. We begin by estimating three separate univariate probit models for labour force participation, diabetes and cardiovascular disease. To allow for potential endogeneity arising from unobserved heterogeneity and from the simultaneous determination of diabetes and cardiovascular disease, we then estimate a three-equation recursive simultaneous multivariate probit model. The simultaneous equation model allows us to test for correlation between the residuals of the three equations and yields consistent estimates of the effects when those residuals are correlated.

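We define the unobserved latent variables for labour supply ($Y_1^*$), cardiovascular disease status ($Y_2^*$) and diabetes status ($Y_3^*$), together with their observed binary indicators $Y_1$, $Y_2$ and $Y_3$:

$$Y_1^* = x_1'\beta_1 + \alpha_{12} Y_2 + \alpha_{13} Y_3 + e_1, \qquad Y_1 = 1 \text{ if } Y_1^* > 0,\ 0 \text{ otherwise} \tag{1}$$

$$Y_2^* = x_2'\beta_2 + \alpha_{23} Y_3 + e_2, \qquad Y_2 = 1 \text{ if } Y_2^* > 0,\ 0 \text{ otherwise} \tag{2}$$

$$Y_3^* = x_3'\beta_3 + e_3, \qquad Y_3 = 1 \text{ if } Y_3^* > 0,\ 0 \text{ otherwise} \tag{3}$$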
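The recursive structure of equations (1)-(3) can be made concrete by simulating it. The sketch below is illustrative only: the function name `simulate_recursive_probit`, the single standard-normal covariate per equation and all parameter values are hypothetical, not the paper's data or estimates. It generates diabetes first, then cardiovascular disease given observed diabetes status, then participation given both disease indicators, with the three errors drawn jointly normal with unit variances.

```python
import numpy as np


def simulate_recursive_probit(n, beta1, beta2, beta3, alpha12, alpha13, alpha23,
                              sigma, seed=0):
    """Simulate the recursive three-equation probit system (hypothetical setup).

    Each x_j is an intercept plus one standard-normal covariate, so each beta_j
    has length 2. sigma is the 3x3 error covariance matrix with unit diagonal.
    """
    rng = np.random.default_rng(seed)
    # Jointly normal errors [e1, e2, e3]' ~ MVN(0, sigma)
    e = rng.multivariate_normal(np.zeros(3), sigma, size=n)
    x1, x2, x3 = (np.column_stack([np.ones(n), rng.standard_normal(n)]) for _ in range(3))
    # Equation (3): diabetes depends only on exogenous regressors
    y3 = (x3 @ beta3 + e[:, 2] > 0).astype(int)
    # Equation (2): cardiovascular disease also depends on observed diabetes status
    y2 = (x2 @ beta2 + alpha23 * y3 + e[:, 1] > 0).astype(int)
    # Equation (1): participation depends on both observed disease indicators
    y1 = (x1 @ beta1 + alpha12 * y2 + alpha13 * y3 + e[:, 0] > 0).astype(int)
    return y1, y2, y3


# Hypothetical parameter values, for illustration only
sigma = np.array([[1.0, -0.2, -0.1],
                  [-0.2, 1.0, 0.3],
                  [-0.1, 0.3, 1.0]])
y1, y2, y3 = simulate_recursive_probit(
    5000,
    beta1=np.array([0.5, 0.3]), beta2=np.array([-1.2, 0.4]), beta3=np.array([-1.5, 0.5]),
    alpha12=-0.4, alpha13=-0.3, alpha23=0.5, sigma=sigma)
print(y1.mean(), y2.mean(), y3.mean())
```

The ordering of the three lines that build `y3`, `y2` and `y1` is what makes the system recursive: each outcome only uses outcomes already generated.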

We begin by assuming that $e_1$, $e_2$ and $e_3$ are independent and estimate equations (1)-(3) as separate univariate probit equations. We then assume that $e_1$, $e_2$ and $e_3$ are jointly normally distributed. That is, $[e_1, e_2, e_3]' \sim MVN(0, \Sigma)$, with means zero and covariance matrix

$$
\Sigma = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix},
$$

in which the variances are normalised to 1. A univariate approach that ignores the potentially non-zero off-diagonal elements of $\Sigma$ will produce inconsistent coefficient estimates when the error terms are correlated (Maddala 1983). Note that this is a recursive simultaneous equations model: diabetes status ($Y_3$) is a regressor in the equation for cardiovascular disease ($Y_2^*$), and both disease indicators are regressors in the equation for labour force participation ($Y_1^*$). It has been shown that, in a bivariate recursive simultaneous equation binary model, the endogeneity of the binary variable on the right-hand side can be ignored when forming the likelihood, so that the FIML estimator of the simultaneous model is obtained by standard bivariate probit estimation (Maddala 1983; Greene 2003). The same logic carries over to maximum likelihood estimation of this multivariate recursive simultaneous model. The recursive system also satisfies exclusion restrictions. The cardiovascular disease equation (2) includes the same exogenous regressors as the labour force participation equation (1), plus smoking status, weight, exercise and diabetes status. The diabetes equation (3) includes the same exogenous variables as equation (2) but also includes a variable indicating whether the individual's mother or father had diabetes.
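Because the errors are jointly normal, each individual's likelihood contribution is a trivariate normal probability, and the observed disease indicators simply enter the linear indices as regressors. A minimal sketch, assuming the standard sign-flip formulation $P(Y = y) = \Phi_3(q \circ \mu;\, Q \Sigma Q)$ with $q_j = 2y_j - 1$ and $Q = \mathrm{diag}(q)$: the helper names `recursive_indices` and `loglik_contribution` and all numbers are hypothetical, and SciPy's numerical integration of the multivariate normal CDF stands in here for the simulator that simulated maximum likelihood would use.

```python
import itertools

import numpy as np
from scipy.stats import multivariate_normal, norm


def recursive_indices(y, xb, alpha12, alpha13, alpha23):
    """Linear indices (mu1, mu2, mu3) for one observation of the recursive system.

    xb holds (x1'b1, x2'b2, x3'b3); the observed disease indicators Y2 and Y3
    enter as ordinary regressors, as the recursive FIML result permits.
    """
    _, y2, y3 = y
    return np.array([xb[0] + alpha12 * y2 + alpha13 * y3,
                     xb[1] + alpha23 * y3,
                     xb[2]])


def loglik_contribution(y, mu, sigma):
    """Log of P(Y1=y1, Y2=y2, Y3=y3) under trivariate normal errors."""
    q = 2 * np.asarray(y) - 1          # +1 if Y_j = 1, -1 if Y_j = 0
    Q = np.diag(q)
    prob = multivariate_normal.cdf(q * np.asarray(mu, dtype=float),
                                   mean=np.zeros(3), cov=Q @ sigma @ Q)
    return float(np.log(prob))


# Hypothetical values, for illustration only
sigma = np.array([[1.0, -0.2, -0.1],
                  [-0.2, 1.0, 0.3],
                  [-0.1, 0.3, 1.0]])
xb = (0.4, -0.8, -1.1)

# The 8 outcome patterns partition the error space, so their probabilities sum to ~1
patterns = list(itertools.product((0, 1), repeat=3))
total = sum(np.exp(loglik_contribution(p, recursive_indices(p, xb, -0.4, -0.3, 0.5), sigma))
            for p in patterns)
print(total)

# With independent errors the probability factorises into univariate probits
p_ind = np.exp(loglik_contribution((1, 0, 1), xb, np.eye(3)))
print(p_ind, norm.cdf(0.4) * norm.cdf(0.8) * norm.cdf(-1.1))
```

The second check shows why the univariate approach is a special case: it is exactly the joint likelihood with all off-diagonal elements of $\Sigma$ set to zero.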

We estimate the simultaneous equation multivariate probit model by simulated maximum likelihood, implemented in Stata 10.0 with the mvprobit routine. The variance-covariance matrix of the cross-equation error terms was estimated, and the null hypothesis $\rho_{12} = \rho_{13} = \rho_{23} = 0$ was tested with a Wald test at the 10% level. If the null hypothesis cannot be rejected, the model reduces to independent probit equations that can be estimated separately.
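The Wald test of the cross-equation correlations can be sketched as follows. Assuming the estimated correlations $\hat\rho = (\hat\rho_{12}, \hat\rho_{13}, \hat\rho_{23})'$ and their estimated covariance matrix $\hat V$ are taken from the fitted model, the statistic $W = \hat\rho' \hat V^{-1} \hat\rho$ is asymptotically $\chi^2(3)$ under the null. The function name `wald_test_zero_correlations` and the input numbers are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def wald_test_zero_correlations(rho, vcov, level=0.10):
    """Wald test of H0: rho12 = rho13 = rho23 = 0.

    rho  : estimated cross-equation correlations (rho12, rho13, rho23)
    vcov : their 3x3 estimated covariance matrix
    Returns the statistic, its p-value, and whether H0 is rejected at `level`.
    """
    r = np.asarray(rho, dtype=float)
    # W = r' V^{-1} r, computed with a linear solve rather than an explicit inverse
    w = float(r @ np.linalg.solve(np.asarray(vcov, dtype=float), r))
    p_value = float(chi2.sf(w, df=r.size))
    return w, p_value, p_value < level


# Hypothetical estimates, for illustration only
w, p, reject = wald_test_zero_correlations([0.1, 0.2, 0.0], np.diag([0.01, 0.01, 0.01]))
print(w, p, reject)
```

In this hypothetical case $W$ falls below the 10% critical value of $\chi^2(3)$ (about 6.25), so the equations would be estimated as independent probits.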

We are primarily interested in inferences about the effect of differences in disease status on labour force participation. The estimation and interpretation of these conditional treatment effects, their standard errors and the effects of regressors such as lifestyle and genetics are complicated both by the discrete nature of many of the variables and by the potential correlation of residuals across equations. For example, the effect of being a smoker on labour force participation, compared with the counterfactual of not smoking, involves first an increase in the risk of each chronic disease and then an indirect effect on labour force participation through the effect from

Chronic disease and labour force participation in Australia: an endogenous multivariate probit analysis of clinical prevalence data
