Many variables in the study have missing values for some observations. Missing values arise for several reasons, including data that were not collected, whether by mistake or by design. Design decisions that create missing values include policy determinations to forgo the collection of certain data items at particular points in time. In addition, the university sometimes does not collect a data item for borrowers to whom it does not apply. For example, students who withdraw from the university permanently will not have values for degree attained or graduation date.
When defaulters and non-defaulters have different rates of missing values, examining the differences can be instructive. In some situations, a difference in the percentage of missing values might identify a risk factor associated with default. For example, Previous College Attended is missing for 33 percent of defaulters but for only 15 percent of non-defaulters. It might be that borrowers who did not attend another college before TAMU (and so have missing values for this variable) are not as well prepared for attending TAMU and, therefore, tend to default at a higher rate. In other cases, missing values have a more indirect relationship with default. When administrative decisions mean that data are collected at some points in time but not at others, the default rate for the missing group can mirror the overall default rate for the repayment year or set of years in which the data were not collected. In other words, if the default rate for the missing group is higher or lower than that of other groups, it might simply be because the university did not uniformly collect that data item during a period when default rates were generally higher or lower.
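The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function name `missing_rates`, the field names (`defaulted`, `previous_college`, `repayment_year`), and the toy records are all hypothetical. Grouping by default status alone gives the kind of comparison reported for Previous College Attended; adding repayment year to the grouping is one way to check whether a difference merely reflects when the data item was collected.

```python
from collections import defaultdict


def missing_rates(records, field, group_keys=("defaulted",)):
    """Share of records in each group where `field` is missing (None).

    Groups are identified by the tuple of values found under `group_keys`.
    All field names here are hypothetical, chosen for illustration.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = tuple(rec[key] for key in group_keys)
        totals[group] += 1
        if rec.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Toy records invented for this sketch; they are not study data.
records = [
    {"defaulted": True, "repayment_year": 2001, "previous_college": None},
    {"defaulted": True, "repayment_year": 2002, "previous_college": "College A"},
    {"defaulted": False, "repayment_year": 2001, "previous_college": None},
    {"defaulted": False, "repayment_year": 2001, "previous_college": "College B"},
    {"defaulted": False, "repayment_year": 2002, "previous_college": "College A"},
    {"defaulted": False, "repayment_year": 2002, "previous_college": "College B"},
]

# Missing rate by default status only.
by_status = missing_rates(records, "previous_college")

# Missing rate by repayment year and default status, to see whether
# missingness is concentrated in particular years.
by_year_and_status = missing_rates(
    records, "previous_college", ("repayment_year", "defaulted")
)
```

If the missing values cluster in particular repayment years, a difference between defaulters and non-defaulters may say more about those years than about the variable itself.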
The following table shows the rate of missing values for non-defaulters and defaulters. The table does not indicate whether the differences in rates are statistically significant, so some apparent differences might be due to random variation. The table is provided for suggestive purposes only. More research is needed to determine the precise relationships between missing data and default behavior.
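One common way to check whether a difference in missing-value rates could be due to random variation is a two-proportion z-test. The sketch below is an assumption about how such a check might be done, not part of the study: the function name `two_proportion_z_test` is hypothetical, and the group sizes in the example are invented, because only the percentages are reported here. The test uses the normal approximation, so it is appropriate only when both groups are reasonably large.

```python
import math


def two_proportion_z_test(missing_a, n_a, missing_b, n_b):
    """Two-sided z-test for a difference between two missing-value rates.

    missing_a, missing_b: counts of records with a missing value in each group
    n_a, n_b: total records in each group
    Returns (z, p_value), using the pooled standard error under the null
    hypothesis that both groups share the same missing-value rate.
    """
    p_a = missing_a / n_a
    p_b = missing_b / n_b
    pooled = (missing_a + missing_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical group sizes chosen only to reproduce rates of 33 and 15 percent.
z, p_value = two_proportion_z_test(33, 100, 150, 1000)
```

A small p-value would suggest that the difference is unlikely to arise from random variation alone, but it would not show why the values are missing or rule out the time-of-collection explanation.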