The AI Journey

Never Ending Journey of Learning

Machine Learning on Windows 11 with WSL2

WSL (Windows Subsystem for Linux) provides a Linux-native environment on Windows, enabling seamless use of modern machine learning frameworks with full GPU acceleration via NVIDIA CUDA and cuDNN, ROCm (for AMD GPUs), and OpenCL (where supported).

This is critical because many ML libraries have limited or deprecated support on native Windows, especially for training with GPUs.

Using WSL over native Windows for machine learning ensures full compatibility with modern GPU-accelerated libraries like TensorFlow and JAX. Native Windows no longer supports TensorFlow GPU beyond version 2.10, and JAX lacks official Windows support for GPUs.

In contrast, WSL offers a Linux-native CUDA environment, enabling seamless installation, better performance, and full GPU utilization with the latest ML frameworks.
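
A quick way to confirm this on your own setup is a minimal TensorFlow check inside WSL2. This is only a sketch, assuming TensorFlow ≥ 2.11 and the NVIDIA CUDA driver for WSL2 are already installed:

```python
# Minimal sketch: verify that TensorFlow sees the GPU from inside WSL2.
# Assumes TensorFlow >= 2.11 and the NVIDIA CUDA driver for WSL2 are installed.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

if gpus:
    # Run a small matrix multiplication on the first GPU to confirm it is usable.
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Computed on:", c.device)
```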

GPU API Comparison: Windows 11 Native vs WSL2

| API / Framework | Windows 11 (Native) | WSL2 | Supported On | Key Points |
|---|---|---|---|---|
| CUDA (NVIDIA) | Supported via Windows CUDA Toolkit; PyTorch only. TensorFlow GPU support dropped after v2.10 | Fully supported via NVIDIA’s WSL2 CUDA Toolkit | NVIDIA GPUs only | Use WSL2 for TensorFlow ≥ v2.11 |
| cuDNN (NVIDIA Deep Learning Library) | Supported | Supported | NVIDIA GPUs only | Requires manual download/registration |
| DirectML | Native Windows ML API; works with TensorFlow-DirectML and ONNX Runtime | Not supported on Linux or WSL | All modern GPUs (NVIDIA, AMD, Intel) | Ideal for Windows + AMD/Intel GPUs |
| OpenCL | Vendor-supported (NVIDIA, Intel, AMD) | Partially supported depending on drivers | Cross-vendor GPUs (Intel, AMD, NVIDIA) | Less optimized for DL frameworks |
| Vulkan (with Vulkan Compute) | Supported via Vulkan SDK; can run compute shaders | Supported via mesa-vulkan-drivers or vendor SDKs | Cross-vendor API | Low-level, not common for DL |
| ROCm (AMD’s GPU platform) | Not supported (no ROCm on Windows) | Not officially supported in WSL2; use native Linux distributions only | AMD GPUs (Linux only) | WSL2 doesn’t support ROCm yet |
| TensorRT (NVIDIA inference engine) | Supported; convert ONNX/PyTorch models to deploy fast inference | Fully supported via TensorRT engine in Ubuntu or WSL2 | NVIDIA GPUs | Inference-only |
| ONNX Runtime GPU | Supported with DirectML or CUDA backend | Supported with CUDA or TensorRT backend | Cross-vendor GPUs | Hardware-agnostic ONNX inference |
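
To confirm the CUDA and cuDNN rows above on a given machine, a small PyTorch check inside WSL2 is enough. This is a sketch, assuming a CUDA-enabled PyTorch build is installed in the WSL2 environment:

```python
# Sketch: confirm that CUDA and cuDNN are usable from inside WSL2 via PyTorch.
# Assumes a CUDA-enabled torch build (e.g. installed via pip) in the WSL2 environment.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("cuDNN enabled:", torch.backends.cudnn.enabled)
    print("cuDNN version:", torch.backends.cudnn.version())
```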

Key Points:

  • TensorFlow GPU on Windows is no longer supported beyond version 2.10. For 2.11+, use WSL2.
  • WSL2 allows TensorFlow to use CUDA and cuDNN directly, ensuring full GPU acceleration.
  • cuDNN is still available for Windows, but mostly used by PyTorch. TensorFlow no longer leverages it on Windows.
  • ONNX Runtime GPU is a deployment-focused runtime, not used for training.
  • Vulkan Compute is very low-level and not practical for most ML workflows unless you’re on embedded/mobile.
  • WSL2 offers a full Linux environment, making it the preferred development path if you are staying on Windows.
  • JAX has no official support on native Windows.
  • Under WSL2, JAX can leverage CUDA and XLA for high-performance GPU computation (see the sketch after this list).
  • DirectML is a Windows-only GPU backend used for lightweight ML tasks.
  • DirectML is not supported in WSL and lacks the performance and ecosystem compatibility of CUDA or ROCm.
  • DirectML is suitable for inference, not for training large models or using advanced ML libraries.
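
The JAX point can be verified with a short script inside WSL2. This is a minimal sketch, assuming a CUDA-enabled JAX installation (e.g. via pip) is present:

```python
# Sketch: check that JAX inside WSL2 can see the GPU and dispatch work to it.
# Assumes a CUDA-enabled JAX installation in the WSL2 environment.
import jax
import jax.numpy as jnp

# On a working WSL2 + CUDA setup this should list a CUDA device, not only the CPU.
print("JAX devices:", jax.devices())

# A small matmul; XLA compiles and runs it on the default (GPU) device when available.
x = jnp.ones((2048, 2048))
y = jnp.dot(x, x).block_until_ready()
print("Result checksum:", float(y.sum()))
```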

Additional Inference and Acceleration APIs

| API / Framework | Windows 11 (Native) | WSL2 (Ubuntu) | Supported On | Key Notes |
|---|---|---|---|---|
| OpenVINO (Intel) | Supported | Fully supported (via APT/PIP, or Docker) | Intel CPUs, iGPUs, VPUs (e.g., Myriad X) | Optimized for Intel hardware; supports ONNX, TensorFlow |
| Intel Extension for TensorFlow / PyTorch | Supported | Supported | Intel CPUs (AVX-512), GPUs (Arc/Xe) | Adds fused ops, quantization, etc. |
| Graphcore IPU (Poplar SDK) | Not supported | Requires native Linux or Docker on WSL | Graphcore IPUs | Used for LLMs, GNNs, Transformer-style models |
| Habana Gaudi | Not supported | Not supported in WSL2; native Linux only | AWS EC2 DL1, Intel’s Habana chips | Requires custom framework support |
| Apple Core ML / MPS | Not supported | Not supported | Apple Silicon (M1, M2, M3, M4) | macOS-only, optimized for iPhones/iPads |

A slight divergence from the main topic to cover some important related concepts: “inference engines” and “ONNX”.

Inference Engine

An inference engine is a software component or system that executes a trained machine learning model to make predictions on new (unseen) data — i.e., it performs inference.

Common Inference Engines –

| Inference Engine | Compatible Formats | Optimized For | Platform |
|---|---|---|---|
| TensorRT | TensorFlow, ONNX | NVIDIA GPUs (high-speed inference) | Linux, WSL, Windows |
| ONNX Runtime | ONNX | Cross-platform, hardware-agnostic | Windows, Linux, Mobile |
| OpenVINO | ONNX, TensorFlow | Intel CPUs, VPUs, GPUs | Linux, Windows |
| TFLite | TensorFlow Lite | Edge/mobile inference | Android, iOS |
| DirectML | ONNX, TensorFlow-DirectML | Windows GPU acceleration (NVIDIA, AMD, Intel) | Windows only |
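
Since ONNX Runtime appears in both tables, one handy sanity check is to ask the installed build which execution providers it actually exposes. A minimal sketch, assuming the onnxruntime (or onnxruntime-gpu) package is installed:

```python
# Sketch: list the execution providers (CPU, CUDA, TensorRT, DirectML, ...)
# exposed by the locally installed ONNX Runtime build.
import onnxruntime as ort

print("ONNX Runtime version:", ort.__version__)
print("Available providers:", ort.get_available_providers())
```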

What does an inference engine do –

  1. Loads the model – Reads the model file (.onnx, .pb, .pt, etc.) into memory.
  2. Optimizes the model graph – Fuses layers and removes redundancies for speed.
  3. Handles data preprocessing – May include image resizing and normalization.
  4. Executes compute efficiently – Uses GPU/CPU acceleration with low latency.
  5. Supports quantization – Converts float32 to int8/float16 for smaller, faster models (see the sketch after this list).
  6. Returns outputs – Provides predicted classes, bounding boxes, scores, etc.
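
For step 5, a rough illustration of post-training dynamic quantization with ONNX Runtime’s quantization tools follows; the file names are placeholders, not part of any particular project:

```python
# Sketch: dynamic (post-training) quantization of an ONNX model using
# ONNX Runtime's quantization utilities. File names are illustrative only.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model_fp32.onnx",   # original float32 model (assumed to exist)
    model_output="model_int8.onnx",  # quantized output: smaller, often faster on CPU
    weight_type=QuantType.QInt8,     # store weights as int8
)
```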

Example – Let’s say you trained a YOLOv5 object detection model using PyTorch. You can then –

  1. Convert it to ONNX.
  2. Deploy it using ONNX Runtime with TensorRT backend on an NVIDIA GPU.
  3. The inference engine handles image input and returns bounding boxes + labels in real time.
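
Step 2 might look roughly like the sketch below. The model file name (yolov5s.onnx), the 640×640 input shape, and the provider order are assumptions for illustration; real YOLOv5 preprocessing and postprocessing are more involved:

```python
# Sketch: run an exported ONNX model with ONNX Runtime, preferring the TensorRT
# provider and falling back to CUDA, then CPU. Model name and input shape are
# illustrative assumptions, not a fixed recipe.
import numpy as np
import onnxruntime as ort

providers = [
    "TensorrtExecutionProvider",  # fastest path on NVIDIA GPUs when TensorRT is installed
    "CUDAExecutionProvider",      # plain CUDA fallback
    "CPUExecutionProvider",       # final fallback
]
session = ort.InferenceSession("yolov5s.onnx", providers=providers)

# Dummy image batch: 1 x 3 x 640 x 640, float32 values in [0, 1].
dummy_input = np.random.rand(1, 3, 640, 640).astype(np.float32)
input_name = session.get_inputs()[0].name

outputs = session.run(None, {input_name: dummy_input})
print("Output tensor shapes:", [o.shape for o in outputs])
```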

Why Use Specialized Inference Engines?

  1. Faster than generic training frameworks (such as PyTorch or TensorFlow)
  2. Optimized for specific hardware (e.g., TensorRT for NVIDIA GPUs, OpenVINO for Intel)
  3. Smaller memory footprint
  4. Supports deployment across platforms (cloud, mobile, edge)

Remember that specialized inference engines (like TensorRT, ONNX Runtime, and OpenVINO) are optimized to run trained models faster and more efficiently than general-purpose training frameworks (like TensorFlow and PyTorch) when used for inference only. Most people use frameworks like TensorFlow or PyTorch to both train and test models. These frameworks are:

  • Flexible
  • Dynamic
  • Designed for experimentation

But these training frameworks are not optimized for low-latency, high-throughput inference.
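
The usual bridge from the training framework to an inference engine is a one-time export of the trained model. Below is a minimal sketch using PyTorch’s ONNX exporter; the torchvision ResNet-18, the input shape, and the output file name are placeholders for illustration:

```python
# Sketch: export a trained PyTorch model to ONNX so an inference engine
# (ONNX Runtime, TensorRT, OpenVINO) can take over. The model, input shape,
# and output file name are placeholders for illustration.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,                # trained model in eval mode
    dummy_input,          # example input defining the exported graph
    "resnet18.onnx",      # output file consumed by the inference engine
    input_names=["images"],
    output_names=["logits"],
    opset_version=17,
)
print("Exported resnet18.onnx")
```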