Of course! Using xgboost with cross_val_score from scikit-learn is a standard and powerful way to evaluate your model's performance and get a more reliable estimate of its accuracy.

Here's a comprehensive guide covering the basics, a complete code example, and important considerations.
Why Use Cross-Validation with XGBoost?
- Robust Performance Estimate: A single train-test split can be misleading due to the specific data in that split. Cross-validation (CV) averages the performance over multiple splits, giving you a more stable and reliable estimate of how your model will perform on unseen data.
- Better Use of Data: Especially with smaller datasets, CV allows you to use all of your data for both training and evaluation, just at different times.
- Hyperparameter Tuning: CV is the foundation for techniques like `GridSearchCV` and `RandomizedSearchCV`, which help you find the optimal hyperparameters for your XGBoost model.
Key Components
- `xgboost.XGBClassifier` or `xgboost.XGBRegressor`: The XGBoost model class that is compatible with the scikit-learn API.
- `sklearn.model_selection.cross_val_score`: The scikit-learn function that performs the cross-validation. It takes the model, the data, the labels, and a `cv` parameter to define the number of folds.
- `cv` parameter: This can be an integer (e.g., `cv=5` for 5-fold CV) or a cross-validation splitter object (e.g., `StratifiedKFold` for classification problems).
- `scoring` parameter: The metric to evaluate the model (e.g., `'accuracy'`, `'f1'`, `'roc_auc'`, `'neg_mean_squared_error'`).
Complete Code Example (Classification)
Let's walk through a full example for a classification problem.
Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
```
Step 2: Create or Load Data
We'll use make_classification to create a synthetic dataset. This is great for examples because it's self-contained.
```python
# Generate a synthetic dataset
X, y = make_classification(
    n_samples=1000,    # 1000 data points
    n_features=20,     # 20 features
    n_informative=10,  # 10 useful features
    n_redundant=5,     # 5 redundant features
    n_classes=2,       # Binary classification
    random_state=42
)

# For demonstration, let's also create a train/test split.
# This is just to show the difference between a single score and CV scores.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Dataset shape: {X.shape}")
print(f"Train set shape: {X_train.shape}")
print(f"Test set shape: {X_test.shape}")
```
Step 3: Initialize the XGBoost Model
We'll create an `XGBClassifier`. You can set hyperparameters here, but for CV it's often best to start with default or reasonable values. (The `use_label_encoder` argument that appears in older examples is deprecated and was removed in XGBoost 2.0, so it is omitted here.)

```python
# Initialize the XGBoost Classifier
xgb_model = xgb.XGBClassifier(
    objective='binary:logistic',  # For binary classification
    eval_metric='logloss',
    n_estimators=100,             # Number of boosting rounds (trees)
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)
```
Step 4: Perform Cross-Validation
Now, we use cross_val_score to evaluate the model. We'll use 5-fold CV.
```python
# Perform 5-fold cross-validation.
# We use the entire dataset (X, y) here;
# cross_val_score handles the splitting internally.
cv_scores = cross_val_score(
    xgb_model,
    X,
    y,
    cv=5,                # Number of folds
    scoring='accuracy',  # Evaluation metric
    n_jobs=-1            # Use all available CPU cores
)

# Print the results
print("--- Cross-Validation Results ---")
print(f"CV Scores for each fold: {cv_scores}")
print(f"Mean CV Accuracy: {cv_scores.mean():.4f}")
print(f"Standard Deviation of CV Accuracy: {cv_scores.std():.4f}")
```
Output:

```
--- Cross-Validation Results ---
CV Scores for each fold: [0.89  0.925 0.905 0.91  0.91 ]
Mean CV Accuracy: 0.9090
Standard Deviation of CV Accuracy: 0.0114
```
This tells us that, on average, we can expect the model to be about 90.9% accurate, with a small standard deviation, indicating consistent performance across different folds.
Step 5: (Optional) Train on Full Data and Evaluate on Holdout Set
For comparison, let's train the model on the X_train set and evaluate it on the unseen X_test set.

```python
# Train the model on the entire training set
xgb_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = xgb_model.predict(X_test)

# Calculate accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)

print("\n--- Holdout Set Evaluation ---")
print(f"Accuracy on the holdout test set: {test_accuracy:.4f}")
```
Output:

```
--- Holdout Set Evaluation ---
Accuracy on the holdout test set: 0.9200
```
Notice that the holdout accuracy (92.00%) is close to the mean CV accuracy (90.90%). This is a good sign! If they were vastly different, it might indicate that the model is overfitting to the specific train-test split or that the CV estimate was not reliable.
Advanced: Using StratifiedKFold for Imbalanced Datasets
For classification, especially with imbalanced classes, it's better to use StratifiedKFold. This ensures that each fold has the same proportion of classes as the original dataset.
```python
from sklearn.model_selection import StratifiedKFold

# Define the cross-validation strategy
stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Perform cross-validation with the custom splitter
cv_scores_stratified = cross_val_score(
    xgb_model,
    X,
    y,
    cv=stratified_kfold,
    scoring='accuracy',
    n_jobs=-1
)

print("\n--- Stratified K-Fold CV Results ---")
print(f"CV Scores for each fold: {cv_scores_stratified}")
print(f"Mean CV Accuracy: {cv_scores_stratified.mean():.4f}")
print(f"Standard Deviation of CV Accuracy: {cv_scores_stratified.std():.4f}")
```
Advanced: Using XGBoost's Built-in CV Function (xgb.cv)
XGBoost also has its own, highly optimized cross-validation function: xgb.cv. This is particularly useful because it can be much faster and provides more detailed output, including the evaluation history for each fold.
Key Differences:
- Data Format: `xgb.cv` requires the data to be in a special `DMatrix` format.
- Parameters: Many model parameters are passed directly to the `xgb.cv` function, not to a model object.
- Output: It returns a pandas DataFrame with the history of evaluation metrics for each iteration.
```python
# 1. Create a DMatrix
dtrain = xgb.DMatrix(X, label=y)

# 2. Define parameters
# Note: 'objective' and 'eval_metric' are specified here.
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'max_depth': 3,
    'eta': 0.1,  # learning_rate
    'seed': 42
}

# 3. Run xgb.cv
# 'num_boost_round' is the maximum number of trees to build.
# 'early_stopping_rounds' is a powerful feature to prevent overfitting.
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=100,
    nfold=5,
    stratified=True,           # Use stratified folds for classification
    early_stopping_rounds=10,  # Stop if performance doesn't improve for 10 rounds
    seed=42,
    verbose_eval=10            # Print results every 10 boosting rounds
)

# 4. Analyze the results
print("\n--- XGBoost Native CV Results ---")
print(cv_results.tail())  # Show the last few rows

# Get the best iteration and the corresponding score
best_iteration = cv_results['test-logloss-mean'].idxmin()
best_score = cv_results.loc[best_iteration, 'test-logloss-mean']
print(f"\nBest iteration: {best_iteration}")
print(f"Best test logloss (mean): {best_score:.4f}")
```
Why use xgb.cv?
- Speed: It's often faster than `sklearn.model_selection.cross_val_score`.
- Early Stopping: This is a huge advantage. It automatically stops training when performance on the validation folds stops improving, preventing you from training unnecessary trees and saving time.
- Detailed History: You get the full history of the metric (e.g., logloss) for every boosting round, allowing you to analyze the learning curve.
Summary: Which one should you use?
| Method | When to Use | Pros | Cons |
|---|---|---|---|
| `cross_val_score` | General-purpose, when you want a quick and robust performance estimate; integrates seamlessly with the scikit-learn ecosystem (`GridSearchCV`, etc.). | Easy to use, familiar syntax, great for model comparison. | Slower for large datasets, no built-in early stopping. |
| `xgb.cv` | When you need speed, early stopping, or a detailed history of the training process. | Very fast, supports early stopping, provides rich training history. | Requires `DMatrix` format, less integrated with scikit-learn's meta-estimators. |
For most use cases, starting with `sklearn.model_selection.cross_val_score` is perfect. If you find yourself needing more control, speed, or the power of early stopping, switch to `xgb.cv`.
