4.4.3 Leave-One-Out Cross-Validation
Training and Testing
## Packages providing read_dta(), %>% and select(), and train() / trainControl()
library(haven)
library(dplyr)
library(caret)
crossdata <- read_dta("C:/Users/buste/OneDrive/Desktop/Modeling/analysis1.dta") %>%
  select(ecghr, age, bmi, htn) %>%
  na.omit()
## Sets method of cross-validation to use leave-one-out
method <- trainControl(method = "LOOCV")
## Example model created to demonstrate leave-one-out (all three predictors)
crossmodelfull <- train(as.factor(htn) ~ age + bmi + ecghr,
                        data = crossdata,
                        method = "glm",
                        trControl = method)
## Reduced models, each dropping one predictor
crossmodel1 <- train(as.factor(htn) ~ age + bmi,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
crossmodel2 <- train(as.factor(htn) ~ age + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
crossmodel3 <- train(as.factor(htn) ~ bmi + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
print(crossmodelfull)
Generalized Linear Model

2646 samples
   3 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:

  Accuracy   Kappa
  0.7093726  0.4132869
print(crossmodel1)
Generalized Linear Model

2646 samples
   2 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:

  Accuracy   Kappa
  0.7071051  0.4090227
print(crossmodel2)
Generalized Linear Model

2646 samples
   2 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:

  Accuracy  Kappa
  0.685941  0.366857
print(crossmodel3)
Generalized Linear Model

2646 samples
   2 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ...
Resampling results:

  Accuracy   Kappa
  0.5684051  0.1202561
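The four summaries are easier to compare side by side. The following is a minimal sketch, assuming the Accuracy and Kappa values are copied by hand from the printed output above; the data frame name `loocv_summary` is illustrative and not part of the original analysis. With the fitted objects in memory, the same numbers should also be available in each model's `$results` data frame.

```r
# Accuracy and Kappa copied from the four printed summaries above.
# `loocv_summary` and `best_model` are illustrative names, not from the original analysis.
loocv_summary <- data.frame(
  model      = c("crossmodelfull", "crossmodel1", "crossmodel2", "crossmodel3"),
  predictors = c("age + bmi + ecghr", "age + bmi", "age + ecghr", "bmi + ecghr"),
  accuracy   = c(0.7093726, 0.7071051, 0.685941, 0.5684051),
  kappa      = c(0.4132869, 0.4090227, 0.366857, 0.1202561)
)

# Sort from highest to lowest LOOCV accuracy.
loocv_summary <- loocv_summary[order(-loocv_summary$accuracy), ]
print(loocv_summary)

# Name of the model with the highest accuracy.
best_model <- as.character(loocv_summary$model[1])
best_model
```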
Results
Interpretation of results:
No pre-processing. We did not scale the data before fitting the models.
The re-sampling method we used to evaluate the model was leave-one-out cross-validation.
The sample size for each training set was exactly 2645: each of the 2646 observations is held out once, and the model is fit on the remaining 2645.
Accuracy: This is the proportion of held-out observations that the model classifies correctly. The higher the Accuracy, the more often the predicted class matches the observed class.
By comparing the Accuracy metric of the 4 models, we can see that our FULL model produces the highest Accuracy (0.709) and is therefore the model we use, although the age + bmi model (crossmodel1) is nearly as accurate (0.707).
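To make the resampling and the Accuracy metric concrete, here is a minimal base-R sketch of what leave-one-out cross-validation does for a logistic regression. It is not the caret implementation. It assumes the built-in `mtcars` data (outcome `am`, predictor `wt`) as a stand-in for the hypertension data, and a 0.5 probability cutoff for assigning a class.

```r
# Manual leave-one-out cross-validation for a logistic regression.
# Assumption: mtcars (outcome `am`, predictor `wt`) stands in for the
# hypertension data, which is not reproduced here.
dat <- mtcars[, c("am", "wt")]
n <- nrow(dat)

pred_class  <- integer(n)  # predicted class for each held-out row
train_sizes <- integer(n)  # size of each training set

for (i in seq_len(n)) {
  train_set <- dat[-i, ]               # all rows except the i-th
  test_row  <- dat[i, , drop = FALSE]  # the single held-out row
  train_sizes[i] <- nrow(train_set)

  fit <- glm(am ~ wt, data = train_set, family = binomial)

  # Predicted probability that am == 1 for the held-out row, cut at 0.5.
  p_hat <- predict(fit, newdata = test_row, type = "response")
  pred_class[i] <- as.integer(p_hat > 0.5)
}

# Accuracy: proportion of held-out observations classified correctly.
loocv_accuracy <- mean(pred_class == dat$am)

unique(train_sizes)  # every training set has n - 1 rows
loocv_accuracy
```

Each observation is predicted by a model that never saw it, which is why the training-set size is always one less than the full sample.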
crossmodelfull$finalModel
Call:  NULL

Coefficients:
(Intercept)          age          bmi        ecghr
   -7.33430      0.08857      0.06191      0.01125

Degrees of Freedom: 2645 Total (i.e. Null);  2642 Residual
Null Deviance:     3655
Residual Deviance: 3072    AIC: 3080
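Our final model is a logistic regression, so the fitted equation is on the log-odds scale: $\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -7.33430 + 0.08857(\text{age}) + 0.06191(\text{bmi}) + 0.01125(\text{ecghr})$, where $\hat{p}$ is the predicted probability of hypertension (htn = 1).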
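Because the coefficients are changes in log-odds, we exponentiate them to interpret them as odds ratios, holding the other predictors constant:
For each 1-year increase in Age, the odds of a person having hypertension increase by approximately 9.3% ($e^{0.08857} \approx 1.093$).
For each 1 kg/m² increase in BMI (Body Mass Index), the odds of a person having hypertension increase by approximately 6.4% ($e^{0.06191} \approx 1.064$).
For each 1 bpm increase in Ecghr (Heart Rate), the odds of a person having hypertension increase by approximately 1.1% ($e^{0.01125} \approx 1.011$).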
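Since `train()` with `method = "glm"` and a factor outcome fits a logistic regression, each coefficient is a change in log-odds, and exponentiating it gives an odds ratio. The following is a minimal sketch using the coefficients copied from the printed output above. The `new_person` values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the printed crossmodelfull$finalModel output.
coefs <- c(intercept = -7.33430, age = 0.08857, bmi = 0.06191, ecghr = 0.01125)

# Odds ratio per one-unit increase in each predictor.
odds_ratios <- exp(coefs[c("age", "bmi", "ecghr")])
round(odds_ratios, 3)              # approximately 1.093, 1.064, 1.011

# Percent change in the odds per one-unit increase.
round(100 * (odds_ratios - 1), 1)  # approximately 9.3, 6.4, 1.1

# Predicted probability for a hypothetical person (illustrative values only).
new_person <- c(age = 50, bmi = 25, ecghr = 70)
log_odds <- unname(coefs["intercept"]) + sum(coefs[names(new_person)] * new_person)
p_hat <- plogis(log_odds)          # inverse logit: 1 / (1 + exp(-log_odds))
round(p_hat, 3)                    # approximately 0.361
```

With the fitted object in memory, `exp(coef(crossmodelfull$finalModel))` should give the same odds ratios without copying numbers by hand.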
Note that the R code for k-fold cross-validation and leave-one-out cross-validation is nearly identical: only the resampling method passed to trainControl() changes, while the train() calls stay the same.
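As a short illustration of that similarity, the sketch below builds both control objects with caret. The choice of 10 folds and the object names `kfold_control` and `loocv_control` are assumptions made for this example.

```r
library(caret)

# k-fold cross-validation: `number` sets k (10 is an illustrative choice).
kfold_control <- trainControl(method = "cv", number = 10)

# Leave-one-out cross-validation, as used in this section.
loocv_control <- trainControl(method = "LOOCV")

# Only the resampling specification differs; either object can be
# passed to train() through its trControl argument.
kfold_control$method
loocv_control$method
```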