Being Hands-on – Performance Stats

Here we configure WSL2 with the Debian distribution and step through the setup command by command, then run an ML model to compare CPU and GPU performance. We will use XGBoost, a powerful gradient boosting library designed for machine learning tasks. It is optimized for speed and efficiency and can use multiple CPU cores or a GPU for faster training.

Step 1 – Configure CUDA and cuDNN on WSL2 with the Debian distribution.

Step 2 – Once the required libraries are installed and configured, we will run Python code that shows the training-time difference between CPU and GPU.

The output below is for the ASHRAE – Great Energy Predictor III dataset:

CPU Training Time: 10.0848 seconds
GPU Training Time: 1.1647 seconds
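
To put the two timings in perspective, the speedup is simply the CPU time divided by the GPU time. The following is a minimal sketch using only the figures reported above; the variable names are illustrative.

```python
# Timings reported above for the ASHRAE dataset (seconds)
cpu_time = 10.0848
gpu_time = 1.1647

# Speedup: how many times faster the GPU run finished
speedup = cpu_time / gpu_time

# Share of the CPU wall-clock time saved by the GPU run
time_saved_pct = (1 - gpu_time / cpu_time) * 100

print(f"GPU speedup: {speedup:.2f}x")
print(f"Training time reduced by {time_saved_pct:.1f}%")
```

On this hardware the GPU run is therefore between eight and nine times faster; the ratio will differ on other CPUs, GPUs, and datasets.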

Command – Check which Linux distribution is installed.

cat /etc/os-release

Output –

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
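
The NVIDIA repository URLs used later are specific to debian12, so it can be worth confirming the distribution from a script as well. The sketch below parses text in the os-release format; the function name parse_os_release and the embedded sample are illustrative assumptions, not part of any library.

```python
def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


# Sample taken from the output shown above
sample = '''PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
ID=debian
VERSION_ID="12"
VERSION_CODENAME=bookworm
'''

info = parse_os_release(sample)
is_debian_12 = info.get("ID") == "debian" and info.get("VERSION_ID") == "12"
print(info["PRETTY_NAME"], "| Debian 12:", is_debian_12)
```

On a real system, the same function can be fed the contents of /etc/os-release, for example parse_os_release(open("/etc/os-release").read()).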

Command – Update and upgrade the Linux distribution

sudo apt-get update && sudo apt-get upgrade -y

Output – If there are no upgrades available

Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://ftp.debian.org/debian bookworm-backports InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Hit:4 http://security.debian.org/debian-security bookworm-security InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Command – Install Python and check that it is installed correctly

apt list --installed | grep python
# If there is no output, Python is not installed; proceed to install it
sudo apt install python3 -y
python3 --version

# Output should look like

Python 3.11.2

apt list --installed | grep python

libpython3-stdlib/stable,now 3.11.2-1+b1 amd64 [installed,automatic]
libpython3.11-minimal/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
libpython3.11-stdlib/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3-minimal/stable,now 3.11.2-1+b1 amd64 [installed,automatic]
python3.11-minimal/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3.11/stable,now 3.11.2-6+deb12u6 amd64 [installed,automatic]
python3/stable,now 3.11.2-1+b1 amd64 [installed]

Command – Try to install CUDA (NVIDIA only)

# Try installing CUDA from the repositories that are already configured.
# If apt cannot locate the package, the NVIDIA repository still has to be added (see the following steps).
sudo apt-get install -y cuda

Output – If CUDA is not available in the default repositories, you should see the following message

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package cuda

Command – Download and add the GPG key for the NVIDIA CUDA repository.

# Check whether the repository is already present (no output means it is not)
grep -r "developer.download.nvidia.com" /etc/apt/sources.list /etc/apt/sources.list.d/

sudo apt install gnupg2 -y
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub

Output –

Executing: /tmp/apt-key-gpghome.i4IxIlSRWV/gpg.1.sh --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub
gpg: requesting key from 'https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub'
gpg: key A4B469963BF863CC: public key "cudatools <cudatools@nvidia.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Command – Add the NVIDIA CUDA repository to your system

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /"

Command – If add-apt-repository is not found

# If you see the error below, install software-properties-common and rerun the "sudo add-apt-repository" command
# sudo: add-apt-repository: command not found
sudo apt install software-properties-common

Output – The deprecation warning on the last line can be ignored

Repository: 'deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /'
Description:
Archive for codename: / components:
More info: https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/
Adding repository.
Press [ENTER] to continue or Ctrl-c to cancel.
Adding deb entry to /etc/apt/sources.list.d/archive_uri-https_developer_download_nvidia_com_compute_cuda_repos_debian12_x86_64_-bookworm.list
Adding disabled deb-src entry to /etc/apt/sources.list.d/archive_uri-https_developer_download_nvidia_com_compute_cuda_repos_debian12_x86_64_-bookworm.list
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm-updates InRelease
Hit:3 http://security.debian.org/debian-security bookworm-security InRelease
Get:4 http://deb.debian.org/debian bookworm/main amd64 DEP-11 Metadata [4,492 kB]
Hit:5 http://ftp.debian.org/debian bookworm-backports InRelease
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64  InRelease [1,581 B]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64  Packages [885 kB]
Fetched 5,379 kB in 5s (1,140 kB/s)
Reading package lists... Done
W: https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.

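Command – Install CUDA. This takes a while; it installs the CUDA toolkit and driver packages, providing the CUDA runtime, compiler, and libraries. Note that NVIDIA's CUDA on WSL guide advises installing only the toolkit (the cuda-toolkit meta-package) under WSL2, because the GPU driver is supplied by the Windows host.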

sudo apt-get update && sudo apt-get install -y cuda

Output – The last few lines

Setting up default-jre-headless (2:1.17-74) ...
Setting up openjdk-17-jre:amd64 (17.0.15+6-1~deb12u1) ...
Setting up default-jre (2:1.17-74) ...
Setting up cuda-nvvp-12-9 (12.9.19-1) ...
Setting up cuda-nsight-12-9 (12.9.19-1) ...
Setting up cuda-visual-tools-12-9 (12.9.0-1) ...
Setting up cuda-tools-12-9 (12.9.0-1) ...
Setting up cuda-toolkit-12-9 (12.9.0-1) ...
Setting up cuda-12-9 (12.9.0-1) ...
Setting up cuda (12.9.0-1) ...
Processing triggers for libc-bin (2.36-9+deb12u10) ...
Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.10+dfsg-1+deb12u1) ...

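Command – Create a Python virtual environment and upgrade pip and setuptools. Setuptools is a Python package that helps developers build, package, and distribute Python projects

# Make "python" point to python3
sudo ln -s /usr/bin/python3 /usr/bin/python
sudo apt install pip
sudo apt install python3-venv  # if not already installed
# Create and activate a virtual environment named myenv
python3 -m venv ~/myenv
source ~/myenv/bin/activate
# Upgrade pip and setuptools inside the virtual environment
pip install -U pip setuptools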

Output –

(myenv) sharad@SharadDellPC:~$ pip install -U pip setuptools
Requirement already satisfied: pip in ./myenv/lib/python3.11/site-packages (23.0.1)
Collecting pip
  Downloading pip-25.1.1-py3-none-any.whl (1.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 13.3 MB/s eta 0:00:00
Requirement already satisfied: setuptools in ./myenv/lib/python3.11/site-packages (66.1.1)
Collecting setuptools
  Downloading setuptools-80.8.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 15.3 MB/s eta 0:00:00
Installing collected packages: setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 66.1.1
    Uninstalling setuptools-66.1.1:
      Successfully uninstalled setuptools-66.1.1
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-25.1.1 setuptools-80.8.0

Command – Installs CuPy, a Python library for GPU computing

pip install cupy-cuda12x

Output –

(myenv) sharad@SharadDellPC:~$ pip install cupy-cuda12x
Collecting cupy-cuda12x
  Downloading cupy_cuda12x-13.4.1-cp311-cp311-manylinux2014_x86_64.whl.metadata (2.6 kB)
Collecting numpy<2.3,>=1.22 (from cupy-cuda12x)
  Downloading numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Collecting fastrlock>=0.5 (from cupy-cuda12x)
  Downloading fastrlock-0.8.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading cupy_cuda12x-13.4.1-cp311-cp311-manylinux2014_x86_64.whl (105.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.4/105.4 MB 13.2 MB/s eta 0:00:00
Downloading numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.8/16.8 MB 12.7 MB/s eta 0:00:00
Downloading fastrlock-0.8.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (54 kB)
Installing collected packages: fastrlock, numpy, cupy-cuda12x
Successfully installed cupy-cuda12x-13.4.1 fastrlock-0.8.3 numpy-2.2.6

Command – Install or upgrade Jupyter and IPython to their latest versions, then register the virtual environment as a Jupyter kernel. IPython enhances Python's interactive shell with features such as syntax highlighting, tab completion, and better debugging

pip install --upgrade jupyter ipython
# --display-name "Python (myenv)" sets how the kernel appears in Jupyter Notebook
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

Command – cuML is a GPU-accelerated machine learning library that provides scikit-learn-like APIs but runs computations on NVIDIA GPUs using CUDA.

pip install cuml-cu12 --extra-index-url=https://pypi.nvidia.com

# CuPy provides GPU-accelerated numerical computing with NumPy-like arrays, while cuML is designed for machine learning tasks on GPUs. CuPy is useful for array and tensor operations, whereas cuML is ideal for traditional ML algorithms.

Command – Add the following lines to ~/.bashrc, then reload it with "source ~/.bashrc" or open a new shell

export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.9/bin:$PATH
export LD_LIBRARY_PATH="/usr/lib/wsl/lib/":$LD_LIBRARY_PATH
export NUMBA_CUDA_DRIVER="/usr/lib/wsl/lib/libcuda.so.1"
export CUDA_HOME=/usr/local/cuda-12.9
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda-12.9/targets/x86_64-linux/lib
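
After reloading the shell, a quick sanity check can confirm that the variables above are actually visible to Python. This is a minimal sketch: the helper check_cuda_env is a hypothetical name, and the expected values are copied from the export lines above, so they should be adjusted if a different CUDA version is installed.

```python
import os

# Values copied from the .bashrc lines above (assumes CUDA 12.9)
EXPECTED_CUDA_HOME = "/usr/local/cuda-12.9"
EXPECTED_LIB_DIRS = ["/usr/local/cuda-12.9/lib64", "/usr/lib/wsl/lib"]


def check_cuda_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    if env.get("CUDA_HOME") != EXPECTED_CUDA_HOME:
        problems.append("CUDA_HOME is not set to " + EXPECTED_CUDA_HOME)

    # Strip trailing slashes so "/usr/lib/wsl/lib/" matches "/usr/lib/wsl/lib"
    lib_dirs = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    for directory in EXPECTED_LIB_DIRS:
        if directory not in lib_dirs:
            problems.append("LD_LIBRARY_PATH is missing " + directory)

    path_dirs = [p.rstrip("/") for p in env.get("PATH", "").split(":") if p]
    if EXPECTED_CUDA_HOME + "/bin" not in path_dirs:
        problems.append("PATH is missing " + EXPECTED_CUDA_HOME + "/bin")
    return problems


# Check the environment of the current process
for message in check_cuda_env(os.environ) or ["CUDA environment looks consistent"]:
    print(message)
```

The check only inspects environment variables; it does not verify that the directories exist or that the GPU is reachable.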

Command – Install TensorFlow

pip install tensorflow

Command – Install cuDNN. The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks

# Visit https://developer.nvidia.com/cudnn to download the cuDNN package that matches your setup

wget https://developer.download.nvidia.com/compute/cudnn/9.10.1/local_installers/cudnn-local-repo-debian12-9.10.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-debian12-9.10.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-debian12-9.10.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12

Command – Install Jupyter Notebook extensions such as spell check, code folding, table of contents, execution-time display, and more

pip install jupyter_contrib_nbextensions
pip install jupyterlab-execute-time

Command – Install XGBoost

pip install xgboost

Output –

(myenv) sharad@SharadDellPC:~$ pip install xgboost
Collecting xgboost
  Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Requirement already satisfied: numpy in ./myenv/lib/python3.11/site-packages (from xgboost) (2.0.2)
Collecting nvidia-nccl-cu12 (from xgboost)
  Downloading nvidia_nccl_cu12-2.26.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Requirement already satisfied: scipy in ./myenv/lib/python3.11/site-packages (from xgboost) (1.15.3)
Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl (253.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.9/253.9 MB 12.8 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.26.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (318.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 312.5/318.1 MB 12.7 MB/s eta 0:00:01
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 318.1/318.1 MB 12.4 MB/s eta 0:00:00
Installing collected packages: nvidia-nccl-cu12, xgboost
Successfully installed nvidia-nccl-cu12-2.26.5 xgboost-3.0.2

Code – Check whether the GPU is detected

from numba import cuda
print(cuda.detect())

Output – If successful, you should see the following message

Found 1 CUDA devices
id 0    b'NVIDIA GeForce RTX 3050 Laptop GPU'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 1
                                    UUID: GPU-20ddd68b-ed17-f2c9-1d1b-f008c98bc677
                                Watchdog: Enabled
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported
True

Code – Another way to check the GPU

import cupy as cp
print(cp.cuda.runtime.getDeviceCount())

# Output should be

1

Code – Check whether TensorFlow can use the GPU

import tensorflow as tf

print("Is TensorFlow built with CUDA?", tf.test.is_built_with_cuda())
print("Available GPUs:", tf.config.list_physical_devices('GPU'))

Output –

Is TensorFlow built with CUDA? True
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
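
Each of the checks above assumes that the library imports cleanly. If one of them fails with ModuleNotFoundError, a common cause is that the virtual environment is not active. The sketch below reports which of the libraries from the earlier steps are visible to the current interpreter without importing them, so it does not touch the GPU; the helper name find_installed is an illustrative assumption.

```python
import importlib.util

# Top-level module names of the libraries installed in the steps above
GPU_LIBRARIES = ["numba", "cupy", "tensorflow", "xgboost", "cuml"]


def find_installed(names):
    """Map each module name to True if it can be found in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = find_installed(GPU_LIBRARIES)
for name, installed in status.items():
    print(f"{name:<12} {'installed' if installed else 'MISSING'}")
```

A library reported as installed can still fail to see the GPU, so this complements the device checks above rather than replacing them.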

Real-world example – Showing the execution time difference between CPUs and GPUs

The dataset is from a real-world competition organized by ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers), namely ASHRAE – Great Energy Predictor III.

URL – https://www.kaggle.com/competitions/ashrae-energy-prediction/overview

Why did we choose this dataset?

Large scale (perfect for GPU testing)

It contains over 20 million rows, ideal for testing the performance difference between CPU and GPU.
The size is realistic and substantial enough to justify GPU acceleration.

Great for demonstrating models like XGBoost

Since XGBoost is a gradient boosting algorithm, it benefits from:

Temporal features → Help model time-dependent energy consumption trends.
Structural features → Improve predictions by considering building-specific attributes.
Feature engineering → Creating lag variables, rolling averages, and categorical encodings enhances model accuracy.

XGBoost also benefits from its built-in handling of missing values, feature selection, and GPU acceleration to optimize predictions.
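
The feature engineering ideas listed above (temporal features, lag variables, rolling averages, and categorical encodings) can be sketched with pandas. This is an illustrative example on a tiny synthetic frame, not the code used for the timings in this post; the column names building_id, timestamp, primary_use, and meter_reading are assumed to mirror the ASHRAE files.

```python
import pandas as pd

# Tiny synthetic frame; column names are assumed to mirror the ASHRAE files
df = pd.DataFrame({
    "building_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "timestamp": pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00",
         "2016-01-01 02:00", "2016-01-01 03:00"] * 2
    ),
    "primary_use": ["Office"] * 4 + ["Education"] * 4,
    "meter_reading": [10.0, 12.0, 11.0, 13.0, 5.0, 6.0, 7.0, 8.0],
})
df = df.sort_values(["building_id", "timestamp"]).reset_index(drop=True)

# Temporal features derived from the timestamp
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek

# Lag variable: the previous reading within each building
grouped = df.groupby("building_id")["meter_reading"]
df["lag_1"] = grouped.shift(1)

# Rolling average of the two previous readings (shifted to exclude the current row)
df["roll_mean_2"] = grouped.transform(lambda s: s.shift(1).rolling(2).mean())

# Categorical encoding: map each building type to an integer code
df["primary_use_code"] = df["primary_use"].astype("category").cat.codes

print(df)
```

The first rows of each building have no history, so the lag and rolling columns contain NaN there; XGBoost accepts such missing values directly. Lags of the target are shown only to illustrate the technique and would need care in a real forecasting setup, where future readings are not available.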

import time

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Step 1: Load the dataset
absolute_path = "/home/sharad/csvfiles/energy/"
train = pd.read_csv(absolute_path + 'train.csv')
building = pd.read_csv(absolute_path + 'building_metadata.csv')
weather = pd.read_csv(absolute_path + 'weather_train.csv')

# Steps 2-4 (merging the tables and building the feature matrix X and the target y) are not shown in this listing

# Step 5: Split and prepare DMatrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Common XGBoost parameters shared by the CPU and GPU runs
params = {
    "objective": "reg:squarederror",
    "max_depth": 8,
    "eta": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8
}

# Step 6: CPU Training
params_cpu = params.copy()
params_cpu["device"] = "cpu"  # the "device" parameter requires XGBoost 2.0 or later
start_cpu = time.perf_counter()
model_cpu = xgb.train(params_cpu, dtrain, num_boost_round=100)
cpu_time = time.perf_counter() - start_cpu

# Step 7: GPU Training
params_gpu = params.copy()
params_gpu["device"] = "cuda"
start_gpu = time.perf_counter()
model_gpu = xgb.train(params_gpu, dtrain, num_boost_round=100)
gpu_time = time.perf_counter() - start_gpu

# Step 8: Print results
print(f"CPU Training Time: {cpu_time:.4f} seconds")
print(f"GPU Training Time: {gpu_time:.4f} seconds")

Output –

CPU Training Time: 10.0848 seconds
GPU Training Time: 1.1647 seconds
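
The listing above goes from Step 1 directly to Step 5, so the feature matrix X and the target y are never defined in it. The following is one possible sketch of the missing steps, not the code that produced the timings above. It assumes the column names of the Kaggle files (building_id, site_id, timestamp, primary_use, meter_reading); the function build_features is a hypothetical helper, and the small frames stand in for the real CSV files.

```python
import pandas as pd


def build_features(train, building, weather):
    """Hypothetical Steps 2-4: merge the three tables and return (X, y)."""
    # Step 2: attach building metadata, then weather by site and timestamp
    df = train.merge(building, on="building_id", how="left")
    df = df.merge(weather, on=["site_id", "timestamp"], how="left")

    # Step 3: simple temporal features and an integer encoding of the building type
    ts = pd.to_datetime(df["timestamp"])
    df["hour"] = ts.dt.hour
    df["dayofweek"] = ts.dt.dayofweek
    df["month"] = ts.dt.month
    df["primary_use"] = df["primary_use"].astype("category").cat.codes

    # Step 4: separate the target from the numeric feature columns
    y = df["meter_reading"]
    X = df.drop(columns=["meter_reading", "timestamp"])
    return X, y


# Small synthetic stand-ins for train.csv, building_metadata.csv and weather_train.csv
train = pd.DataFrame({
    "building_id": [0, 1],
    "meter": [0, 0],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 00:00:00"],
    "meter_reading": [0.0, 53.2],
})
building = pd.DataFrame({
    "site_id": [0, 0],
    "building_id": [0, 1],
    "primary_use": ["Education", "Office"],
    "square_feet": [7432, 2720],
})
weather = pd.DataFrame({
    "site_id": [0],
    "timestamp": ["2016-01-01 00:00:00"],
    "air_temperature": [25.0],
})

X, y = build_features(train, building, weather)
print(X.columns.tolist())
```

With the real files loaded in Step 1, calling build_features(train, building, weather) would provide the X and y that Step 5 expects. Training times depend on the number and type of features, so a different feature set will not reproduce the exact timings shown above.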