6th Oct 2023
In our project, we aimed to predict county-level diabetes rates from data on physical inactivity and obesity levels. We first tried simple linear regression models but found them insufficient: the residuals showed heteroskedasticity, meaning their spread was not constant across the range of the predictors. On closer inspection, we found that a quadratic model with an interaction term predicted diabetes rates more accurately.
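As a rough illustration of these two modelling steps, the sketch below builds the linear and quadratic-with-interaction design matrices, fits them by ordinary least squares, and computes a Breusch-Pagan statistic as one way to quantify heteroskedasticity. It assumes NumPy. It also assumes "quadratic" means squared terms for both predictors plus their product. The function names, the choice of the Breusch-Pagan test, and the synthetic data are illustrative, not the project's actual code or data.

```python
import numpy as np

# Hypothetical helper names: a sketch, not the project's actual code.


def design_matrix(inactivity, obesity, quadratic=True):
    """Regression design matrix with an intercept column.

    quadratic=False -> [1, x1, x2]
    quadratic=True  -> [1, x1, x2, x1**2, x2**2, x1*x2]  (assumed: squares plus interaction)
    """
    x1 = np.asarray(inactivity, dtype=float)
    x2 = np.asarray(obesity, dtype=float)
    cols = [np.ones_like(x1), x1, x2]
    if quadratic:
        cols += [x1 ** 2, x2 ** 2, x1 * x2]
    return np.column_stack(cols)


def fit_ols(X, y):
    """Ordinary least-squares coefficients (minimises ||y - X @ beta||^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def breusch_pagan_lm(X, residuals):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing the
    squared residuals on the columns of X. Under constant error variance it is
    approximately chi-squared with X.shape[1] - 1 degrees of freedom."""
    u2 = np.asarray(residuals, dtype=float) ** 2
    fitted = X @ fit_ols(X, u2)
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(u2) * r2


if __name__ == "__main__":
    # Synthetic, illustrative data only (not the project's county data):
    # the noise spread grows with inactivity, so the errors are heteroskedastic.
    rng = np.random.default_rng(0)
    n = 500
    inactivity = rng.uniform(15, 35, n)
    obesity = rng.uniform(20, 45, n)
    diabetes = 1 + 0.3 * inactivity + 0.2 * obesity + rng.normal(0.0, 0.1 * (inactivity - 14))
    X_lin = design_matrix(inactivity, obesity, quadratic=False)
    resid = diabetes - X_lin @ fit_ols(X_lin, diabetes)
    print("Breusch-Pagan LM statistic:", breusch_pagan_lm(X_lin, resid))
```

The linear design has two predictors, so the statistic is compared with a chi-squared distribution with 2 degrees of freedom. Its 5% critical value is about 5.99, and a value well above that points to non-constant error variance.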
We restricted the analysis to counties with complete data for all three variables (inactivity, obesity, and diabetes) and compared several models, including the quadratic one, using cross-validation to estimate their test errors. With more data, a broader trend might emerge that allows a simpler and more accurate model. These are our project’s findings so far, and they suggest room for further exploration with additional data.
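The next sketch shows what complete-case filtering followed by a cross-validated model comparison can look like. It assumes NumPy, NaN as the marker for a missing county value, and 5-fold cross-validation with mean squared error as the test-error measure. The function names and the synthetic data are hypothetical, not taken from the project.

```python
import numpy as np

# Hypothetical helper names and synthetic data: a sketch, not the project's code.


def complete_cases(*columns):
    """Keep only rows (counties) where every column is non-missing (not NaN)."""
    arrays = [np.asarray(c, dtype=float) for c in columns]
    mask = ~np.isnan(np.column_stack(arrays)).any(axis=1)
    return [a[mask] for a in arrays]


def design_matrix(x1, x2, quadratic):
    """Intercept + linear terms, optionally squares and the x1*x2 interaction."""
    cols = [np.ones_like(x1), x1, x2]
    if quadratic:
        cols += [x1 ** 2, x2 ** 2, x1 * x2]
    return np.column_stack(cols)


def kfold_mse(x1, x2, y, quadratic, k=5, seed=0):
    """Mean of the per-fold held-out mean squared errors over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        X_train = design_matrix(x1[train_idx], x2[train_idx], quadratic)
        X_test = design_matrix(x1[test_idx], x2[test_idx], quadratic)
        beta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)
        errors.append(np.mean((y[test_idx] - X_test @ beta) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Synthetic, illustrative data: a curved, interacting relationship, plus
    # ten counties with a missing obesity value that complete_cases drops.
    rng = np.random.default_rng(0)
    n = 300
    inactivity = rng.uniform(15, 35, n)
    obesity = rng.uniform(20, 45, n)
    diabetes = (0.01 * (inactivity - 25) ** 2 + 0.002 * inactivity * obesity
                + rng.normal(0, 0.1, n))
    obesity[:10] = np.nan
    x1, x2, y = complete_cases(inactivity, obesity, diabetes)
    for quadratic in (False, True):
        name = "quadratic + interaction" if quadratic else "linear"
        print(name, kfold_mse(x1, x2, y, quadratic))
```

Using the same seed for both models gives them identical folds, so the two cross-validated errors are compared on the same held-out counties.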