[Question 1]
code:
features <- training[, !colnames(training) %in% nonpredictors]  # drop the columns listed in `nonpredictors`
labels <- as.factor(training$label)  # class labels as a factor, so train() treats this as classification
output:
none

[Question 2]
code:
table(labels)
output:
labels
  0   1
500 500

[Question 3]
code:
library(caret)
set.seed(0)
# CART decision tree, tuned over the complexity parameter cp
model <- train(
  x = features,
  y = labels,
  method = 'rpart',
  trControl = trainControl(
    method = 'boot', # this is the default and can be excluded
    number = 5       # 5 bootstrap resamples, matching "Bootstrapped (5 reps)" below
  )
)
model
output:
CART

1000 samples
 408 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Bootstrapped (5 reps)
Summary of sample sizes: 1000, 1000, 1000, 1000, 1000
Resampling results across tuning parameters:

  cp     Accuracy   Kappa
  0.028  0.5945917  0.18955871
  0.061  0.5647587  0.13218871
  0.188  0.5182033  0.04084188

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.028.

[Question 4]
code:
print(varImp(model), top = 10)
output:
rpart variable importance

  only 10 most important variables shown (out of 408)

                  Overall
H3K36me3..window.  100.00
CTCF..window.       95.29
SPI1..window.       88.81
SMC3..window.       88.39
RAD21..window.      85.40
H4K20me1..window.   81.11
CTCFL..window.      79.53
CTCF..promoter.     65.12
SMC3..promoter.     52.47
SMC3..enhancer.     49.01

[Question 5]
code:
set.seed(0)
# random forest with 50 trees (ntree is passed through to randomForest), tuned over mtry
model <- train(
  x = features,
  y = labels,
  method = 'rf',
  ntree = 50,
  trControl = trainControl(
    method = 'cv',  # 5-fold cross-validation
    number = 5
  )
)
model
output:
Random Forest

1000 samples
 408 predictor
   2 classes: '0', '1'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 800, 800, 800, 800, 800
Resampling results across tuning parameters:

  mtry  Accuracy  Kappa
    2   0.733     0.466
  205   0.734     0.468
  408   0.751     0.502

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 408.

[Question 6]
code:
print(varImp(model), top = 5)
output:
rf variable importance

  only 5 most important variables shown (out of 408)

                  Overall
CTCF..window.      100.00
CTCF..promoter.     55.87
H4K20me1..window.   53.53
SMC3..promoter.     52.12
RAD21..window.      46.48

[Question 7]
CTCF, SMC3, RAD21

[Question 8]
False positives would likely increase genome-wide, causing a large decrease in precision, since precision = TP / (TP + FP). The model was trained on a balanced set (500 examples of each class), but genome-wide the negative class presumably far outnumbers the positive class, so even a modest false-positive rate produces many false positives relative to true positives.
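As a rough illustration of the reasoning in Question 8, the R sketch below computes expected precision, (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)), for a classifier whose sensitivity and specificity stay fixed while the fraction of positive examples shrinks. The helper name `expected_precision` and all input values (sensitivity = specificity = 0.75, loosely echoing the ~0.75 cross-validated accuracy above, and prevalences of 0.5, 0.1 and 0.01) are hypothetical assumptions, not measured properties of the fitted models.

```r
# Hypothetical helper (not from the assignment): expected precision,
# TP / (TP + FP), for a classifier with fixed sensitivity and specificity
# applied to data in which a fraction `prevalence` of examples is positive.
expected_precision <- function(sensitivity, specificity, prevalence) {
  tp <- sensitivity * prevalence              # expected true-positive fraction
  fp <- (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
  tp / (tp + fp)
}

# Assumed values for illustration only: balanced data (0.5, like the
# 500/500 training set) versus increasingly rare positives.
prevalences <- c(0.5, 0.1, 0.01)
round(expected_precision(0.75, 0.75, prevalences), 3)
# [1] 0.750 0.250 0.029
```

Under these assumed numbers, precision falls from 0.75 on balanced data to about 0.03 when only 1% of examples are positive, even though the classifier itself is unchanged.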