CWSs expected to have higher concentration levels ("vulnerability") is a good design to also begin acquiring the empirical data needed to build and test models that predict pesticide concentrations in raw and finished drinking water. At this stage in the research process, some SAP members feel we have neither a well-developed theoretical understanding nor the ancillary data needed to specify good models for single pesticides. The proposed survey, in combination with existing and new ancillary data for sampled CWSs and their source watersheds, should provide a good data set with which to begin preliminary model development and testing. The immediate goal should be to develop and test preliminary models for a high-priority set of pesticides. The Agency should also think ahead and begin building a framework for how measured compounds will be used as proxies to model the behavior of unmeasured compounds. This planning may well guide the choice of which representative pesticide compounds from general classes should be the focus of initial model development and testing. Finally, building initial models of the detailed multivariate and spatial form described by the Agency will require new ancillary data for model projections. New survey data or cross-validation based on the existing samples (i.e., building models with half the CWSs and testing their performance on the other half) will be needed to evaluate the performance of the initial modeling effort. Ultimately, if acceptable models can be defined, broader collection of ancillary data will be needed if the model is to be applied generally to predicting chronic exposures to pesticides over time and in other CWSs.
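The split-sample check mentioned above (fit on half the CWSs, test on the other half) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single predictor `use_intensity` and the response `log_conc` are hypothetical names, and a one-variable least-squares line stands in for the far richer multivariate and spatial models the Agency describes.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root-mean-square error of the fitted line on the given points."""
    sq_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def split_half_validation(xs, ys, seed=0):
    """Fit on a random half of the systems, evaluate on the held-out half."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    test_x = [xs[i] for i in test]
    test_y = [ys[i] for i in test]
    intercept, slope = fit_simple_ols(train_x, train_y)
    return {
        "intercept": intercept,
        "slope": slope,
        "train_rmse": rmse(train_x, train_y, intercept, slope),
        "test_rmse": rmse(test_x, test_y, intercept, slope),
        "n_train": len(train),
        "n_test": len(test),
    }


# Synthetic stand-in for 200 sampled CWSs (hypothetical values, not survey data).
data_rng = random.Random(42)
use_intensity = [data_rng.uniform(0, 10) for _ in range(200)]
log_conc = [0.5 + 0.3 * x + data_rng.gauss(0, 0.4) for x in use_intensity]

result = split_half_validation(use_intensity, log_conc)
print(result)
```

A held-out error that is much larger than the training error would suggest the model is overfitted to the CWSs used to build it. Note that a random split of this kind tests performance only within the range of conditions already sampled.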
On the related matter of improving models, models get better when their predictions are challenged. This happens when the situation being considered requires extrapolation beyond the range of conditions used to build the model. If the survey can extend the range of conditions beyond those currently predicted well by the existing models, then the models can be challenged and, when necessary, improved. Little new is learned when data are collected where we already know the model works well.
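One simple way to see which candidate systems would challenge a model in this sense is to flag those whose conditions fall outside the range used to build it. The sketch below is illustrative only: the covariate names (`use_rate`, `runoff_index`), the site labels, and all values are hypothetical, and a per-variable minimum/maximum check is a crude screen that misses novel combinations of values that are each within range.

```python
def outside_calibration_range(calibration, candidates):
    """Map each candidate site to the covariates lying outside the calibration min/max."""
    names = list(calibration[0].keys())
    bounds = {
        k: (min(row[k] for row in calibration), max(row[k] for row in calibration))
        for k in names
    }
    flagged = {}
    for site, covariates in candidates.items():
        out = [k for k in names if not bounds[k][0] <= covariates[k] <= bounds[k][1]]
        if out:
            flagged[site] = out
    return flagged


# Hypothetical conditions at systems used to build an existing model.
calibration = [
    {"use_rate": 0.2, "runoff_index": 1.0},
    {"use_rate": 1.5, "runoff_index": 3.0},
    {"use_rate": 0.8, "runoff_index": 2.2},
]

# Hypothetical candidate systems for the new survey.
candidates = {
    "CWS-A": {"use_rate": 1.0, "runoff_index": 2.0},
    "CWS-B": {"use_rate": 2.4, "runoff_index": 2.5},
    "CWS-C": {"use_rate": 0.1, "runoff_index": 3.6},
}

flagged = outside_calibration_range(calibration, candidates)
print(flagged)  # {'CWS-B': ['use_rate'], 'CWS-C': ['use_rate', 'runoff_index']}
```

In this toy example, CWS-A lies inside the calibration range and would add little new information about model performance, while CWS-B and CWS-C would test the model under extrapolation.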
If supporting model development and testing is a priority goal, stratification based on model predictions may also be warranted. As pointed out in the International Life Sciences Institute (ILSI) model report, the goals and data needs of model calibration and validation do not always coincide with those of developing good concentration distributions. This suggests that either design efficiency will have to be traded off to accommodate both objectives, or one objective may need to be dropped, depending on funding.
However, some of the SAP members do not agree with the ILSI findings that regression-based models have utility only in the lowest tier of assessments and that high precision can come only with detailed process models. Empirical modeling, of which regression is one approach, has a resolution that depends on the amount of data available: the more data available, the higher the resolution. In this case, a lot of data will be generated, and empirical models have the potential to be quite effective for higher-tier assessments without resorting to the more complex process models.
The SAP was concerned that at the proposed level of funding, it may not be possible to