Here we configure WSL2 with the Debian distribution, stepping through each command needed to set up the environment and run an ML model that compares CPU and GPU performance. We use XGBoost, a gradient boosting library optimized for speed and efficiency that can train on multiple CPU cores or on a CUDA-capable GPU.
Step 1 – Install and configure CUDA and cuDNN on WSL2 (Debian).
Step 2 – Once the required libraries are installed and configured, run a Python script that compares CPU and GPU training times.
The results for the ASHRAE – Great Energy Predictor III dataset are shown below:
CPU Training Time: 10.0848 seconds
GPU Training Time: 1.1647 seconds
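The speedup implied by these two timings is the ratio of CPU time to GPU time. A minimal sketch using the numbers reported above (a single run of each on one machine, so treat the ratio as indicative rather than general):

```python
# Training times reported above for the ASHRAE dataset (seconds)
cpu_seconds = 10.0848
gpu_seconds = 1.1647

# Speedup = how many times faster the GPU run was than the CPU run
speedup = cpu_seconds / gpu_seconds
print(f"GPU speedup: {speedup:.2f}x")  # prints "GPU speedup: 8.66x"
```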
Command – Check which Linux distribution is installed.
cat /etc/os-release
Output –
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
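If you prefer to check the distribution from Python (for example inside a notebook), a small sketch can parse the same KEY=VALUE format shown above. The helper names `parse_os_release` and `is_debian_12` are hypothetical; on Python 3.10+ the standard library's `platform.freedesktop_os_release()` offers similar parsing.

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines (the /etc/os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_debian_12(info):
    """True if the parsed release info describes Debian 12 (bookworm)."""
    return info.get("ID") == "debian" and info.get("VERSION_ID") == "12"


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as f:
            release = parse_os_release(f.read())
    except OSError:
        release = {}  # file missing (not a systemd-style Linux system)
    print(release.get("PRETTY_NAME", "unknown"), "| Debian 12:", is_debian_12(release))
```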
Command – Update the package lists and upgrade the installed packages
sudo apt-get update && sudo apt-get upgrade -y
Output – (when no upgrades are available)
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://ftp.debian.org/debian bookworm-backports InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Hit:4 http://security.debian.org/debian-security bookworm-security InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Command – Install Python and check that it is installed correctly
apt list --installed | grep python
# If there is no output, Python is not installed; install it as follows
sudo apt install python3 -y
python3 --version
# Output should look like
Python 3.11.2
apt list --installed | grep python
libpython3-stdlib/stable,now 3.11.2-1+b1 amd64 [installed,automatic]
libpython3.11-minimal/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
libpython3.11-stdlib/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3-minimal/stable,now 3.11.2-1+b1 amd64 [installed,automatic]
python3.11-minimal/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3.11/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3/stable,now 3.11.2-1+b1 amd64 [installed]
Command – Install CUDA (NVIDIA GPUs only)
apt list --installed | grep cuda
# If this returns nothing, CUDA is not installed; try installing it from the default repositories
sudo apt install -y cuda
Output – If CUDA is not available in the default repositories, you should see the following message:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package cuda
Command – Download and add the GPG key for the NVIDIA CUDA repository.
# Check whether the repository is already configured
grep -r "developer.download.nvidia.com" /etc/apt/sources.list /etc/apt/sources.list.d/
sudo apt install gnupg2 -y
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub
Output –
Executing: /tmp/apt-key-gpghome.i4IxIlSRWV/gpg.1.sh --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub
gpg: requesting key from 'https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub'
gpg: key A4B469963BF863CC: public key "cudatools <cudatools@nvidia.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
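The `grep -r` check above can also be expressed in Python, which is handy when scripting the setup. This is a sketch only: `find_repo_entries` is a hypothetical helper, and unlike `grep` it deliberately skips commented-out lines such as the disabled `deb-src` entry.

```python
from pathlib import Path

NVIDIA_REPO_HOST = "developer.download.nvidia.com"


def find_repo_entries(paths, needle=NVIDIA_REPO_HOST):
    """Return (file, line) pairs for active apt source lines containing needle."""
    matches = []
    for path in paths:
        path = Path(path)
        files = sorted(path.glob("*")) if path.is_dir() else [path]
        for f in files:
            if not f.is_file():
                continue  # missing file or subdirectory
            for line in f.read_text(encoding="utf-8", errors="replace").splitlines():
                stripped = line.strip()
                # Skip comments such as the disabled "# deb-src ..." entry
                if stripped.startswith("#"):
                    continue
                if needle in stripped:
                    matches.append((str(f), stripped))
    return matches


if __name__ == "__main__":
    hits = find_repo_entries(["/etc/apt/sources.list", "/etc/apt/sources.list.d"])
    print("NVIDIA repo configured" if hits else "NVIDIA repo not found", hits)
```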
Command – Add the NVIDIA CUDA repository to your system
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /"
Possible error –
sudo: add-apt-repository: command not found
# If you get this error, install the package below and rerun the "sudo add-apt-repository" command
sudo apt install software-properties-common -y
Output – (the deprecation warning at the end, caused by apt-key storing the key in the legacy trusted.gpg keyring, can be ignored)
Repository: 'deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /'
Description:
Archive for codename: / components:
More info: https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/
Adding repository.
Press [ENTER] to continue or Ctrl-c to cancel.
Adding deb entry to /etc/apt/sources.list.d/archive_uri-https_developer_download_nvidia_com_compute_cuda_repos_debian12_x86_64_-bookworm.list
Adding disabled deb-src entry to /etc/apt/sources.list.d/archive_uri-https_developer_download_nvidia_com_compute_cuda_repos_debian12_x86_64_-bookworm.list
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm-updates InRelease
Hit:3 http://security.debian.org/debian-security bookworm-security InRelease
Get:4 http://deb.debian.org/debian bookworm/main amd64 DEP-11 Metadata [4,492 kB]
Hit:5 http://ftp.debian.org/debian bookworm-backports InRelease
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64 InRelease [1,581 B]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64 Packages [885 kB]
Fetched 5,379 kB in 5s (1,140 kB/s)
Reading package lists... Done
W: https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
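Command – Install CUDA. This takes a while; it installs the CUDA toolkit and drivers, providing the CUDA runtime, compiler (nvcc), and libraries.
# Note: on WSL2 the GPU driver comes from the Windows host. NVIDIA's CUDA on WSL guide recommends installing only the toolkit meta-package (e.g. cuda-toolkit-12-9) instead of "cuda", which also pulls in the Linux driver packages.
sudo apt-get update && sudo apt-get install -y cuda
Output – (last few lines)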
Setting up default-jre-headless (2:1.17-74) ...
Setting up openjdk-17-jre:amd64 (17.0.15+6-1~deb12u1) ...
Setting up default-jre (2:1.17-74) ...
Setting up cuda-nvvp-12-9 (12.9.19-1) ...
Setting up cuda-nsight-12-9 (12.9.19-1) ...
Setting up cuda-visual-tools-12-9 (12.9.0-1) ...
Setting up cuda-tools-12-9 (12.9.0-1) ...
Setting up cuda-toolkit-12-9 (12.9.0-1) ...
Setting up cuda-12-9 (12.9.0-1) ...
Setting up cuda (12.9.0-1) ...
Processing triggers for libc-bin (2.36-9+deb12u10) ...
Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.10+dfsg-1+deb12u1) ...
Command – Create a Python virtual environment and upgrade pip and setuptools (setuptools is a Python package that helps developers build, package, and distribute Python projects)
sudo ln -s /usr/bin/python3 /usr/bin/python
sudo apt install python3-pip -y
sudo apt install python3-venv # if not already installed
python3 -m venv ~/myenv
source ~/myenv/bin/activate
pip install -U pip setuptools
Output –
(myenv) sharad@SharadDellPC:~$ pip install -U pip setuptools
Requirement already satisfied: pip in ./myenv/lib/python3.11/site-packages (23.0.1)
Collecting pip
Downloading pip-25.1.1-py3-none-any.whl (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 13.3 MB/s eta 0:00:00
Requirement already satisfied: setuptools in ./myenv/lib/python3.11/site-packages (66.1.1)
Collecting setuptools
Downloading setuptools-80.8.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 15.3 MB/s eta 0:00:00
Installing collected packages: setuptools, pip
Attempting uninstall: setuptools
Found existing installation: setuptools 66.1.1
Uninstalling setuptools-66.1.1:
Successfully uninstalled setuptools-66.1.1
Attempting uninstall: pip
Found existing installation: pip 23.0.1
Uninstalling pip-23.0.1:
Successfully uninstalled pip-23.0.1
Successfully installed pip-25.1.1 setuptools-80.8.0
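To confirm that the interpreter you are running is the one from `~/myenv` rather than the system Python, you can compare `sys.prefix` with `sys.base_prefix`. A minimal sketch (the helper name `in_virtualenv` is hypothetical):

```python
import sys


def in_virtualenv():
    """True when running inside a venv created by `python3 -m venv`."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix still points at the system interpreter
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside virtual environment:", in_virtualenv())
```

With `myenv` activated, the interpreter path should point inside `~/myenv/bin`.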
Command – Install CuPy, a NumPy-compatible array library for GPU computing
pip install cupy-cuda12x
Output –
(myenv) sharad@SharadDellPC:~$ pip install cupy-cuda12x
Collecting cupy-cuda12x
Downloading cupy_cuda12x-13.4.1-cp311-cp311-manylinux2014_x86_64.whl.metadata (2.6 kB)
Collecting numpy<2.3,>=1.22 (from cupy-cuda12x)
Downloading numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Collecting fastrlock>=0.5 (from cupy-cuda12x)
Downloading fastrlock-0.8.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading cupy_cuda12x-13.4.1-cp311-cp311-manylinux2014_x86_64.whl (105.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.4/105.4 MB 13.2 MB/s eta 0:00:00
Downloading numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.8/16.8 MB 12.7 MB/s eta 0:00:00
Downloading fastrlock-0.8.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (54 kB)
Installing collected packages: fastrlock, numpy, cupy-cuda12x
Successfully installed cupy-cuda12x-13.4.1 fastrlock-0.8.3 numpy-2.2.6
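Because CuPy mirrors much of the NumPy API, the same function can often run on either the CPU or the GPU by passing in the array module. A sketch, assuming NumPy is installed and falling back to NumPy only when CuPy is not importable; `standardize_columns` is a hypothetical example function:

```python
import numpy as np

try:
    import cupy as cp  # available after `pip install cupy-cuda12x`
except ImportError:
    cp = None


def standardize_columns(xp, a):
    """Scale each column to zero mean and unit variance using array module xp."""
    mean = xp.mean(a, axis=0)
    std = xp.std(a, axis=0)
    return (a - mean) / std


if __name__ == "__main__":
    host = np.random.default_rng(0).normal(size=(1000, 4))
    print("NumPy column means:", standardize_columns(np, host).mean(axis=0).round(6))
    if cp is not None:
        device = cp.asarray(host)  # copy host memory -> GPU memory
        result = standardize_columns(cp, device)
        print("CuPy column means:", cp.asnumpy(result.mean(axis=0)).round(6))
```

Keeping the array module as a parameter lets you develop on the CPU and switch to the GPU without rewriting the numerical code.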
Command – Upgrade Jupyter and IPython to their latest versions, then register the virtual environment as a Jupyter kernel. IPython enhances Python's interactive shell with syntax highlighting, tab completion, and better debugging.
pip install --upgrade jupyter ipython
# --display-name "Python (myenv)" sets how the kernel appears in Jupyter Notebook
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
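To verify the kernel registration (the same information `jupyter kernelspec list` prints), a short sketch can query `jupyter_client`, which is installed along with Jupyter. The helper `installed_kernels` is hypothetical and returns an empty dict if `jupyter_client` is missing:

```python
def installed_kernels():
    """Map kernel name -> spec directory, as `jupyter kernelspec list` shows."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:  # jupyter_client comes with `pip install jupyter`
        return {}
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    kernels = installed_kernels()
    for name, path in sorted(kernels.items()):
        print(f"{name:10s} {path}")
    print("myenv kernel registered:", "myenv" in kernels)
```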
Command – Install cuML, a GPU-accelerated machine learning library that provides scikit-learn-like APIs but runs computations on NVIDIA GPUs using CUDA.
pip install cuml-cu12 --extra-index-url=https://pypi.nvidia.com
# CuPy is suited to GPU-accelerated numerical (array) computing, while cuML provides traditional ML algorithms on the GPU. For deep learning workloads CuPy helps with tensor operations; for classical ML algorithms, cuML is the better fit.
Command – Add the following lines to ~/.bashrc, then reload it with: source ~/.bashrc
export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.9/bin:$PATH
export LD_LIBRARY_PATH="/usr/lib/wsl/lib/":$LD_LIBRARY_PATH
export NUMBA_CUDA_DRIVER="/usr/lib/wsl/lib/libcuda.so.1"
export CUDA_HOME=/usr/local/cuda-12.9
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda-12.9/targets/x86_64-linux/lib
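After reloading `~/.bashrc`, a quick sanity check can confirm that the exported directories exist and that the WSL CUDA driver library can be loaded. This sketch calls the CUDA driver API functions `cuInit` and `cuDeviceGetCount` through `ctypes`; the helper names are hypothetical, and it assumes `NUMBA_CUDA_DRIVER` (set above) or `libcuda.so.1` points at the driver. Start Jupyter from a shell where the variables are set, otherwise the kernel will not see them.

```python
import ctypes
import os


def missing_dirs(var="LD_LIBRARY_PATH"):
    """Return entries of a colon-separated path variable that are not directories."""
    entries = [p for p in os.environ.get(var, "").split(":") if p]
    return [p for p in entries if not os.path.isdir(p)]


def cuda_driver_device_count(lib_name=None):
    """Ask the CUDA driver API for the GPU count; None if the driver can't be used."""
    lib_name = lib_name or os.environ.get("NUMBA_CUDA_DRIVER", "libcuda.so.1")
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    if libcuda.cuInit(0) != 0:  # 0 == CUDA_SUCCESS
        return None
    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


if __name__ == "__main__":
    print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
    print("Missing LD_LIBRARY_PATH dirs:", missing_dirs())
    print("Missing PATH dirs:", missing_dirs("PATH"))
    print("GPUs seen by driver:", cuda_driver_device_count())
```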
Command – Install TensorFlow
pip install tensorflow
Command – Install cuDNN (NVIDIA CUDA Deep Neural Network library), a GPU-accelerated library of primitives for deep neural networks
# Visit the NVIDIA cuDNN page (https://developer.nvidia.com/cudnn) to download the cuDNN installer for your platform
wget https://developer.download.nvidia.com/compute/cudnn/9.10.1/local_installers/cudnn-local-repo-debian12-9.10.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-debian12-9.10.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-debian12-9.10.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12
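To confirm which cuDNN version the system loads, you can call `cudnnGetVersion()` through `ctypes`. This is a sketch under two assumptions: the cuDNN 9 shared library is named `libcudnn.so.9` and is on the loader path, and the returned integer uses the cuDNN 9 encoding `major*10000 + minor*100 + patch` (older releases used `major*1000 + ...`).

```python
import ctypes


def decode_cudnn_version(v):
    """Split cudnnGetVersion() output into (major, minor, patch).

    Assumes the cuDNN 9 encoding major*10000 + minor*100 + patch.
    """
    return v // 10000, (v % 10000) // 100, v % 100


def cudnn_version(lib_name="libcudnn.so.9"):
    """Return (major, minor, patch), or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    print("cuDNN version:", cudnn_version())
```

For the 9.10.1 installer used above, you would expect `(9, 10, 1)` if the assumed encoding holds.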
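Command – Install Jupyter extensions for spell check, code folding, table of contents, cell execution times and more (jupyter_contrib_nbextensions targets the classic Notebook interface; jupyterlab-execute-time targets JupyterLab)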
pip install jupyter_contrib_nbextensions
pip install jupyterlab-execute-time
Command – Install XGBoost
pip install xgboost
Output –
(myenv) sharad@SharadDellPC:~$ pip install xgboost
Collecting xgboost
Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Requirement already satisfied: numpy in ./myenv/lib/python3.11/site-packages (from xgboost) (2.0.2)
Collecting nvidia-nccl-cu12 (from xgboost)
Downloading nvidia_nccl_cu12-2.26.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Requirement already satisfied: scipy in ./myenv/lib/python3.11/site-packages (from xgboost) (1.15.3)
Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl (253.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.9/253.9 MB 12.8 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.26.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (318.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 318.1/318.1 MB 12.4 MB/s eta 0:00:00
Installing collected packages: nvidia-nccl-cu12, xgboost
Successfully installed nvidia-nccl-cu12-2.26.5 xgboost-3.0.2
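To check whether the installed XGBoost wheel was compiled with CUDA support, `xgboost.build_info()` returns a dictionary describing the build. A sketch (the helper name is hypothetical, and it reads the `USE_CUDA` key defensively with `.get`):

```python
def xgboost_cuda_info():
    """Return (version, built_with_cuda), or None if xgboost is not installed."""
    try:
        import xgboost as xgb
    except ImportError:
        return None
    info = xgb.build_info()  # dict describing how this build was compiled
    return xgb.__version__, bool(info.get("USE_CUDA", False))


if __name__ == "__main__":
    print(xgboost_cuda_info())
```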
Code – Check whether the GPU is detected (Numba)
from numba import cuda
print(cuda.detect())
Output – If successful, you should see something like:
Found 1 CUDA devices
id 0 b'NVIDIA GeForce RTX 3050 Laptop GPU' [SUPPORTED]
Compute Capability: 8.6
PCI Device ID: 0
PCI Bus ID: 1
UUID: GPU-20ddd68b-ed17-f2c9-1d1b-f008c98bc677
Watchdog: Enabled
FP32/FP64 Performance Ratio: 32
Summary:
1/1 devices are supported
True
Code – Another way to check the GPU (CuPy)
import cupy as cp
print(cp.cuda.runtime.getDeviceCount())
# Output should be
1
Code – Check whether TensorFlow can use the GPU
import tensorflow as tf
print("Is TensorFlow built with CUDA?", tf.test.is_built_with_cuda())
print("Available GPUs:", tf.config.list_physical_devices('GPU'))
Output –
Is TensorFlow built with CUDA? True
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Real-world example – comparing execution times on CPU and GPU
The dataset comes from a real-world Kaggle competition organized by ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers): ASHRAE – Great Energy Predictor III.
URL – https://www.kaggle.com/competitions/ashrae-energy-prediction/overview
Why this dataset?
Large scale (well suited to GPU testing)
The training set contains over 20 million rows, which makes it ideal for measuring the performance difference between CPU and GPU.
- Its size is realistic and large enough to justify GPU acceleration.
- It is well suited to demonstrating models such as XGBoost.
As a gradient boosting algorithm, XGBoost can take advantage of:
- Temporal features → capture time-dependent energy consumption trends.
- Structural features → improve predictions using building-specific attributes.
- Feature engineering → lag variables, rolling averages, and categorical encodings that enhance model accuracy.
XGBoost also handles missing values natively, performs implicit feature selection through its tree splits, and supports GPU acceleration, all of which help optimize predictions.
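The lag and rolling-average features mentioned above can be built per building with pandas `groupby`. This is a sketch, not the author's feature pipeline: `add_lag_and_rolling` is a hypothetical helper, and it assumes a frame with `building_id` and `timestamp` columns (as in the ASHRAE files) where consecutive rows per building are consecutive time steps.

```python
import pandas as pd


def add_lag_and_rolling(df, col, lags=(1,), window=3):
    """Add per-building lag and rolling-mean columns for `col`.

    Assumes 'building_id' and 'timestamp' columns; rows are ordered by
    time within each building before the features are computed.
    """
    df = df.sort_values(["building_id", "timestamp"])
    grouped = df.groupby("building_id")[col]
    for lag in lags:
        # Value from `lag` rows earlier for the same building (NaN at the start)
        df[f"{col}_lag{lag}"] = grouped.shift(lag)
    # Mean of the current row and up to window-1 previous rows, per building
    df[f"{col}_roll{window}"] = grouped.transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return df
```

Grouping by building keeps one building's history from leaking into another's lags; with hourly data and no gaps, `lag1` is the previous hour.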
import pandas as pd
import xgboost as xgb
import time
from sklearn.model_selection import train_test_split
# Step 1: Load the dataset
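absolute_path = "/home/sharad/csvfiles/energy/"
train = pd.read_csv(absolute_path + 'train.csv')
building = pd.read_csv(absolute_path + 'building_metadata.csv')
weather = pd.read_csv(absolute_path + 'weather_train.csv')
# Steps 2-4: assumed minimal preprocessing, based on the public ASHRAE column names
# Step 2: Merge meter readings with building metadata and weather
df = train.merge(building, on="building_id", how="left")
df = df.merge(weather, on=["site_id", "timestamp"], how="left")
# Step 3: Temporal features and a categorical encoding
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month
df["primary_use"] = df["primary_use"].astype("category").cat.codes
# Step 4: Features (X) and target (y); XGBoost handles remaining NaNs natively
X = df.drop(columns=["meter_reading", "timestamp"])
y = df["meter_reading"]
# Step 5: Split and prepare DMatrix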
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Common XGBoost parameters (hard-coded)
params = {
"objective": "reg:squarederror",
"max_depth": 8,
"eta": 0.1,
"subsample": 0.8,
"colsample_bytree": 0.8
}
# Step 6: CPU Training
params_cpu = params.copy()
params_cpu["device"] = "cpu"
start_cpu = time.time()
model_cpu = xgb.train(params_cpu, dtrain, num_boost_round=100)
cpu_time = time.time() - start_cpu
# Step 7: GPU Training (the first GPU call also includes one-off CUDA initialization)
params_gpu = params.copy()
params_gpu["device"] = "cuda"
start_gpu = time.time()
model_gpu = xgb.train(params_gpu, dtrain, num_boost_round=100)
gpu_time = time.time() - start_gpu
# Step 8: Print results
print(f"CPU Training Time: {cpu_time:.4f} seconds")
print(f"GPU Training Time: {gpu_time:.4f} seconds")
Output –
CPU Training Time: 10.0848 seconds
GPU Training Time: 1.1647 seconds
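These timings come from a single run each, measured with `time.time()`, so the GPU figure also absorbs one-off setup such as CUDA context creation. A slightly more careful measurement does warm-up runs and repeats the timing with `time.perf_counter()`. A minimal sketch; `benchmark` is a hypothetical helper, not part of XGBoost:

```python
import statistics
import time


def benchmark(fn, repeats=3, warmup=1):
    """Time fn() several times after warm-up runs; return (best, median) seconds.

    Warm-up runs absorb one-off costs (e.g. CUDA context creation on the
    first GPU call) so they do not inflate the measured times.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)
```

With the script above, usage could look like `benchmark(lambda: xgb.train(params_gpu, dtrain, num_boost_round=100))`, and likewise for `params_cpu`; comparing the median times gives a steadier CPU vs GPU ratio than a single run.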