4.4.3 Leave-One-Out Cross-Validation

 

Training and Testing

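library(haven)   # read_dta()
library(dplyr)   # %>% and select()
library(caret)   # trainControl() and train()

## Loads the data, keeps the outcome and predictors, and drops incomplete rows
crossdata <- read_dta("C:/Users/buste/OneDrive/Desktop/Modeling/analysis1.dta") %>%
  select(ecghr, age, bmi, htn) %>%
  na.omit()

## Sets the method of cross-validation to leave-one-out
method <- trainControl(method = "LOOCV")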


## Example models created to demonstrate leave-one-out:
## the full model, followed by the three two-predictor models
crossmodelfull <- train(as.factor(htn) ~ age + bmi + ecghr,
                        data = crossdata,
                        method = "glm",
                        trControl = method)

crossmodel1 <- train(as.factor(htn) ~ age + bmi,
                     data = crossdata,
                     method = "glm",
                     trControl = method)

crossmodel2 <- train(as.factor(htn) ~ age + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)

crossmodel3 <- train(as.factor(htn) ~ bmi + ecghr,
                     data = crossdata,
                     method = "glm",
                     trControl = method)
print(crossmodelfull)
Generalized Linear Model 

2646 samples
   3 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ... 
Resampling results:

  Accuracy   Kappa    
  0.7093726  0.4132869
print(crossmodel1)
Generalized Linear Model 

2646 samples
   2 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ... 
Resampling results:

  Accuracy   Kappa    
  0.7071051  0.4090227
print(crossmodel2)
Generalized Linear Model 

2646 samples
   2 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ... 
Resampling results:

  Accuracy  Kappa   
  0.685941  0.366857
print(crossmodel3)
Generalized Linear Model 

2646 samples
   2 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 2645, 2645, 2645, 2645, 2645, 2645, ... 
Resampling results:

  Accuracy   Kappa    
  0.5684051  0.1202561

Results

Interpretation of results:

No pre-processing: we did not center, scale, or otherwise transform the predictors before fitting the models.

The resampling method used to evaluate each model was leave-one-out cross-validation.

Each training set contained exactly 2645 observations: the 2646 complete cases minus the single observation held out for testing.
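To make the resampling scheme concrete, the loop below is a minimal sketch of leave-one-out cross-validation written by hand in base R. It uses simulated data rather than analysis1.dta; the names toy, x, and y and the 0.5 classification cutoff are assumptions made for this illustration.

```r
# Manual leave-one-out cross-validation for a logistic regression.
# The data are simulated; all variable names here are illustrative.
set.seed(1)
n <- 60
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x))

correct     <- logical(n)   # was the held-out row classified correctly?
train_sizes <- integer(n)   # size of each training set

for (i in seq_len(n)) {
  train <- toy[-i, ]                 # all rows except row i
  test  <- toy[i, , drop = FALSE]    # the single held-out row
  fit   <- glm(y ~ x, data = train, family = binomial)
  p_hat <- predict(fit, newdata = test, type = "response")
  # Classify as 1 when the predicted probability exceeds 0.5 (assumed cutoff)
  correct[i]     <- (p_hat > 0.5) == (test$y == 1)
  train_sizes[i] <- nrow(train)
}

unique(train_sizes)   # every training set has n - 1 = 59 rows
mean(correct)         # LOOCV accuracy: share of held-out rows predicted correctly
```

With n observations the model is refit n times, which is why every entry under "Summary of sample sizes" is one less than the total sample size.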

Accuracy: This is the proportion of held-out observations whose class the model predicts correctly. The higher the Accuracy, the more often the model's predicted hypertension status matches the observed status.
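As a small illustration of how a classification accuracy is computed, the sketch below compares made-up predicted and observed classes. The two vectors are invented for this example and are not taken from the hypertension data.

```r
# Accuracy on a small made-up example (values are illustrative only).
observed  <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Cross-tabulate predictions against observations (a confusion matrix).
conf_mat <- table(Predicted = predicted, Observed = observed)
print(conf_mat)

# Accuracy = correctly classified / total = diagonal sum / grand total.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy   # 6 of the 8 predictions match, so 0.75
```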

Comparing the Accuracy of the 4 models, the full model has the highest Accuracy (0.7094), narrowly ahead of the age + bmi model (0.7071), so we select it as the model to use.

crossmodelfull$finalModel

Call:  NULL

Coefficients:
(Intercept)          age          bmi        ecghr  
   -7.33430      0.08857      0.06191      0.01125  

Degrees of Freedom: 2645 Total (i.e. Null);  2642 Residual
Null Deviance:      3655 
Residual Deviance: 3072     AIC: 3080

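Our final model, written on the log-odds (logit) scale, is: $\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -7.33430 + 0.08857\,\text{age} + 0.06191\,\text{bmi} + 0.01125\,\text{ecghr}$, where $\hat{p}$ is the estimated probability of hypertension (htn = 1).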

Holding the other predictors constant, each 1-year increase in Age multiplies the odds of hypertension by exp(0.08857) ≈ 1.093, an increase of approximately 9.3%.

Holding the other predictors constant, each 1 kg/m² increase in BMI (Body Mass Index) multiplies the odds of hypertension by exp(0.06191) ≈ 1.064, an increase of approximately 6.4%.

Holding the other predictors constant, each 1 bpm increase in Ecghr (Heart Rate) multiplies the odds of hypertension by exp(0.01125) ≈ 1.011, an increase of approximately 1.1%.
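Because the model is a logistic regression, its coefficients are on the log-odds scale, and exponentiating a coefficient gives an odds ratio. The sketch below does that arithmetic with the coefficients printed above. The example person (age 50, BMI 27, heart rate 70 bpm) is hypothetical and chosen only to show how a predicted probability is obtained.

```r
# Coefficients copied from the crossmodelfull$finalModel output above.
coefs <- c(intercept = -7.33430, age = 0.08857, bmi = 0.06191, ecghr = 0.01125)

# Exponentiating a slope gives the odds ratio for a one-unit increase.
odds_ratios <- exp(coefs[c("age", "bmi", "ecghr")])
round(odds_ratios, 3)

# Percentage change in the odds per one-unit increase.
round(100 * (odds_ratios - 1), 1)

# Predicted probability for a hypothetical person (illustrative values):
# age 50, BMI 27, heart rate 70 bpm.
eta <- coefs["intercept"] + coefs["age"] * 50 +
  coefs["bmi"] * 27 + coefs["ecghr"] * 70
p_hat <- plogis(unname(eta))   # inverse logit: 1 / (1 + exp(-eta))
p_hat
```

The odds ratios are constant across individuals, but the change in probability for a one-unit increase is not: it depends on where the person starts on the logistic curve.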

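The R code for k-fold cross-validation and leave-one-out cross-validation is nearly identical: only the trainControl() call changes, for example trainControl(method = "cv", number = 10) for 10-fold cross-validation instead of trainControl(method = "LOOCV").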
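The similarity goes beyond syntax: leave-one-out is k-fold cross-validation with k equal to the number of observations. The base R sketch below shows this with fold assignments alone. The helper make_folds() is a hypothetical function written for this illustration, not part of caret.

```r
# Assign each of n rows to one of k folds at random.
# (make_folds is a hypothetical helper for illustration, not a caret function.)
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

set.seed(42)
n <- 20

folds_5   <- make_folds(n, k = 5)   # 5-fold: each fold holds out 4 rows
folds_loo <- make_folds(n, k = n)   # k = n: each fold holds out exactly 1 row

table(folds_5)           # fold sizes for 5-fold cross-validation
max(table(folds_loo))    # largest fold under k = n contains a single row
```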