Both specific and general deterrence analyses should typically be conducted on a sector-by-sector basis. In the specific deterrence models, a key component of the statistical identification, and of the intuition behind it, is a behavioral comparison of facility/time pairs with an inspection or enforcement action to facility/time pairs without one. Restricting the analysis to one sector at a time ensures that the comparison facilities share roughly similar characteristics. Similarly, a key component of the statistical identification in the general deterrence models is a behavioral comparison of facility/time pairs with enforcement actions on neighboring facilities to facility/time pairs without such actions. General deterrence models should therefore also typically be estimated one sector at a time.
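The within-sector comparison described above can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not the regression models proposed in this document: it computes a raw difference in mean compliance and controls for nothing else. The sector names, the record layout, and the function `within_sector_gaps` are hypothetical and the data are made up.

```python
from collections import defaultdict

# Hypothetical facility/period records (all values invented for illustration):
# (sector, inspected_in_prior_period, in_compliance)
records = [
    ("pulp_and_paper", 1, 1), ("pulp_and_paper", 1, 1), ("pulp_and_paper", 1, 0),
    ("pulp_and_paper", 0, 1), ("pulp_and_paper", 0, 0),
    ("petroleum_refining", 1, 1), ("petroleum_refining", 1, 1),
    ("petroleum_refining", 0, 1), ("petroleum_refining", 0, 0),
    ("petroleum_refining", 0, 0),
    ("steel", 1, 1),  # no uninspected pairs, so no within-sector comparison
]


def within_sector_gaps(rows):
    """Return {sector: mean compliance (inspected) - mean compliance (not inspected)}.

    Facility/period pairs are only ever compared with pairs from the same sector.
    Sectors lacking either group are skipped.
    """
    # tallies[sector][inspected] holds [sum of compliance indicator, count]
    tallies = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for sector, inspected, compliant in rows:
        cell = tallies[sector][inspected]
        cell[0] += compliant
        cell[1] += 1

    gaps = {}
    for sector, cells in tallies.items():
        if cells[0][1] == 0 or cells[1][1] == 0:
            continue  # nothing to compare against within this sector
        mean_inspected = cells[1][0] / cells[1][1]
        mean_not_inspected = cells[0][0] / cells[0][1]
        gaps[sector] = mean_inspected - mean_not_inspected
    return gaps


gaps = within_sector_gaps(records)
for sector, gap in sorted(gaps.items()):
    print(f"{sector}: difference in compliance rate = {gap:+.3f}")
```

Pooling all sectors into a single comparison would instead mix facilities with very different characteristics, which is the concern the sector-by-sector restriction is meant to address.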

When the dependent variable is continuous, such as emissions or discharges, ordinary linear regression models are most appropriate. The values of the explanatory variables for a given observation predict a corresponding average or expected emissions level. For example, all else equal, we would expect a facility’s average emissions to be lower following an enforcement action.

When the dependent variable is discrete, however, such as a 0/1 compliance or non-compliance indicator, non-linear models are more appropriate than linear regression models. An ordinary linear regression applied to a dependent variable that takes only the values 0 and 1 is known as a linear probability model. In such a model, the values of the explanatory variables for a given observation predict a corresponding probability of compliance. For example, all else equal, we would expect a facility’s probability of compliance to be higher following an enforcement action. Linear probability models exhibit at least two well-known weaknesses. First, predicted values may lie outside the 0 to 1 range: the predicted probability of compliance may be negative or greater than 1. Second, linear probability models force the impact of an explanatory variable on the predicted probability to be the same regardless of the facility’s baseline probability. For example, the change in the predicted probability of compliance due to an enforcement action is the same for a facility with a low probability of compliance as for a facility with a high probability of compliance. Non-linear models, like the logit model, overcome these difficulties and are therefore used here when the dependent variable is discrete.
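The two weaknesses of linear probability models can be seen in a small numerical sketch. The effect sizes below (`LPM_EFFECT`, `LOGIT_EFFECT`) and the baseline probabilities are hypothetical values chosen for illustration, not estimates from any data, and the function names are our own. In the linear probability model the enforcement action shifts the probability directly; in the logit model it shifts the log-odds, which the logistic function then maps into the 0 to 1 range.

```python
import math


def lpm_probability(baseline_prob, action_effect, action):
    """Linear probability model: the action adds directly to the probability."""
    return baseline_prob + action_effect * action


def log_odds(p):
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return math.log(p / (1.0 - p))


def logit_probability(baseline_index, action_effect, action):
    """Logit model: the action shifts the log-odds index, and the logistic
    function maps the index to a probability strictly between 0 and 1."""
    z = baseline_index + action_effect * action
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical effect sizes (assumptions for illustration only).
LPM_EFFECT = 0.10    # change in probability after an enforcement action
LOGIT_EFFECT = 0.50  # change in log-odds after an enforcement action

for p0 in (0.50, 0.95):  # facilities with moderate and high baseline compliance
    lpm_after = lpm_probability(p0, LPM_EFFECT, 1)
    logit_after = logit_probability(log_odds(p0), LOGIT_EFFECT, 1)
    print(
        f"baseline {p0:.2f}: "
        f"LPM predicts {lpm_after:.3f} (change {lpm_after - p0:+.3f}), "
        f"logit predicts {logit_after:.3f} (change {logit_after - p0:+.3f})"
    )
```

For the facility that already complies with high probability, the linear probability model's prediction exceeds 1, while the logit prediction stays below 1 and the implied change in probability is smaller than for the facility with a baseline of 0.50.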

# Three regression approaches

This brief sub-section is for the more technically inclined reader and can be skipped if preferred. Here, we discuss the three regression approaches proposed in the next two sections. Each approach maps to the basic empirical model discussed above. All three are discussed in the related white paper, are easily implemented with common statistical packages, and are covered in most basic statistics/econometrics textbooks. The key difference among the three models is how they address the facility-specific regression parameter α_{i}. Recall that the key function of this facility-specific

compliance for a given period of time (“aggregate BOD and TSS discharges within a state fall approximately 7 percent in the year following a sanction within that state.”) without reference to a historical period.
