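library(haven)   # read_dta()
library(dplyr)   # %>% and select()
library(caret)   # trainControl() and train()
crossdata <- read_dta("C:/Users/buste/OneDrive/Desktop/Modeling/analysis1.dta") %>%
  select(ecghr, age, bmi, htn) %>%
  na.omit()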
## Sets the cross-validation method to leave-one-out
method <- trainControl(method = "LOOCV")
## Example models created to demonstrate leave-one-out
crossmodelfull <- train(as.factor(htn) ~ age + bmi + ecghr,
                        data = crossdata,
                        method = "glm",
                        trControl = method)
crossmodel1 <- train(as.factor(htn) ~ age + bmi,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
crossmodel2 <- train(as.factor(htn) ~ age + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
crossmodel3 <- train(as.factor(htn) ~ bmi + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
4.4.3 Leave-One-Out Cross-Validation
Training and Testing
print(crossmodelfull)
Generalized Linear Model
2646 samples
3 predictor
2 classes: '0', '1'
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:
Accuracy Kappa
0.7093726 0.4132869
print(crossmodel1)
Generalized Linear Model
2646 samples
2 predictor
2 classes: '0', '1'
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:
Accuracy Kappa
0.7071051 0.4090227
print(crossmodel2)
Generalized Linear Model
2646 samples
2 predictor
2 classes: '0', '1'
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:
Accuracy Kappa
0.685941 0.366857
print(crossmodel3)
Generalized Linear Model
2646 samples
2 predictor
2 classes: '0', '1'
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:
Accuracy Kappa
0.5684051 0.1202561
Results
Interpretation of results:
No pre-processing. We did not scale the data before fitting the models.
The re-sampling method we used to evaluate the model was leave-one-out cross-validation.
The sample size for each training set was exactly 2645, one fewer than the 2646 complete observations, because leave-one-out holds out a single observation at a time.
Accuracy: This is the proportion of held-out observations that the model classified correctly. The higher the Accuracy, the more often the model predicts the actual hypertension status.
By comparing the Accuracy metric of the 4 models, we can see that our FULL model produces the highest Accuracy (0.709), narrowly ahead of the age + bmi model (0.707), so we use it as our final model.
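To make the resampling and Accuracy definitions concrete, here is a minimal base-R sketch of what leave-one-out cross-validation does for a logistic model. It is not the caret implementation, and it does not use the analysis1.dta data: the data frame `sim`, its coefficients, and the 0.5 classification cutoff are assumptions chosen only for illustration.

```r
# Minimal base-R sketch of leave-one-out cross-validation for a logistic model.
# The data are SIMULATED for illustration; they are not the analysis1.dta data.
set.seed(1)
n <- 200
sim <- data.frame(age = rnorm(n, 50, 10), bmi = rnorm(n, 27, 4))

# Hypothetical outcome generated from an assumed logistic relationship
p <- plogis(-7 + 0.09 * sim$age + 0.08 * sim$bmi)
sim$htn <- rbinom(n, 1, p)

correct <- logical(n)
for (i in seq_len(n)) {
  # Fit on the n - 1 rows that remain after holding out row i
  fit <- glm(htn ~ age + bmi, data = sim[-i, ], family = binomial)
  # Predict the held-out row and classify at an assumed 0.5 probability cutoff
  p_hat <- predict(fit, newdata = sim[i, , drop = FALSE], type = "response")
  correct[i] <- as.integer(p_hat > 0.5) == sim$htn[i]
}

# LOOCV accuracy: proportion of held-out observations classified correctly
loocv_accuracy <- mean(correct)
loocv_accuracy
```

Each pass through the loop trains on n - 1 rows and tests on the single remaining row, which is why every training set in the output above has 2645 of the 2646 observations.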
crossmodelfull$finalModel
Call: NULL
Coefficients:
(Intercept) age bmi ecghr
-7.33430 0.08857 0.06191 0.01125
Degrees of Freedom: 2645 Total (i.e. Null); 2642 Residual
Null Deviance: 3655
Residual Deviance: 3072 AIC: 3080
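Our final model is a logistic regression, so the fitted equation is on the log-odds scale: log(p̂ / (1 - p̂)) = -7.33430 + 0.08857(age) + 0.06191(bmi) + 0.01125(ecghr), where p̂ is the estimated probability of hypertension.
Holding the other predictors constant, each 1-year increase in Age multiplies the odds of hypertension by exp(0.08857) ≈ 1.093, an increase of approximately 9.3%.
Holding the other predictors constant, each 1 kg/m² increase in BMI (Body Mass Index) multiplies the odds of hypertension by exp(0.06191) ≈ 1.064, an increase of approximately 6.4%.
Holding the other predictors constant, each 1 bpm increase in Ecghr (Heart Rate) multiplies the odds of hypertension by exp(0.01125) ≈ 1.011, an increase of approximately 1.1%.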
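Because caret's "glm" method fits a logistic regression for a two-class outcome, the coefficients are changes in log-odds, and exponentiating them gives odds ratios. The sketch below reproduces that arithmetic from the coefficients printed by crossmodelfull$finalModel. The `person` vector is a hypothetical individual invented for illustration, not a row of the data.

```r
# Coefficients copied from the crossmodelfull$finalModel output
coefs <- c(`(Intercept)` = -7.33430, age = 0.08857, bmi = 0.06191, ecghr = 0.01125)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(coefs[c("age", "bmi", "ecghr")])
round(odds_ratios, 4)

# Predicted probability for a HYPOTHETICAL person (values chosen for illustration)
person <- c(age = 60, bmi = 30, ecghr = 70)
log_odds <- unname(coefs["(Intercept)"] + sum(coefs[names(person)] * person))
prob <- plogis(log_odds)  # inverse logit: exp(x) / (1 + exp(x))
prob
```

With the fitted model object available, `exp(coef(crossmodelfull$finalModel))` would give the same odds ratios without retyping the coefficients.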
Note that k-fold cross-validation and leave-one-out cross-validation use nearly identical R code in caret: only the arguments passed to trainControl() change.
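As a small sketch of that similarity, the two control objects below differ only in their trainControl() arguments. The names `kfold_ctrl` and `loocv_ctrl` are hypothetical, and k = 10 is an assumed value rather than one taken from this analysis.

```r
library(caret)

# k-fold cross-validation: `number` sets k (10 is an assumed, commonly used value)
kfold_ctrl <- trainControl(method = "cv", number = 10)

# Leave-one-out cross-validation: each observation is held out exactly once
loocv_ctrl <- trainControl(method = "LOOCV")

# Either object is passed to train() through the trControl argument, e.g.
# train(as.factor(htn) ~ age + bmi + ecghr, data = crossdata,
#       method = "glm", trControl = loocv_ctrl)
```

Leave-one-out fits the model once per observation, so it takes noticeably longer than k-fold on larger data sets.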