The common wisdom in 2026 is that if you want to run powerful Large Language Models (LLMs) locally, you need a brand-new NVIDIA card and a massive budget.

The common wisdom is wrong.
I recently managed to get Qwen2.5-Coder (7B and 14B) running on my AMD Radeon RX 590. It wasn't a "plug-and-play" experience; I hit container errors, network problems, and driver incompatibilities before cracking the code. Here is exactly how I did it so that you can do it too.
The Hardware Challenge
The AMD RX 590 is a classic "Polaris" architecture card with 8GB of VRAM. While it’s a legend for 1080p gaming, modern AI software (ROCm) has officially dropped support for it. If you try to run AI out of the box, your system will likely ignore your GPU and crawl along on your CPU.
The Goal: Force the software to recognize the GPU and offload the heavy math to those 2,304 stream processors.
Step 1: The Docker Foundation
We used Ollama via Docker for this setup. It keeps the environment clean and allows us to easily "spoof" our hardware settings.
First, we had to clear the path. If you have a native version of Ollama running, it will fight Docker for port 11434.
sudo systemctl disable ollama
sudo systemctl stop ollama
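Before launching the container, it's worth confirming the port is actually free. A quick check, assuming `ss` is available (it is on most modern distros):

```shell
# Verify nothing is still listening on Ollama's default port.
# No match from grep means the port is free for the container.
ss -ltn | grep ':11434' || echo "port 11434 is free"
```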
Step 2: The Secret Sauce (Vulkan & Spoofing)
Because the standard AMD drivers (ROCm) are picky about older cards, we utilized Vulkan. Vulkan is much more forgiving with "legacy" hardware.
We also used a "spoofing" variable to tell the driver to treat our Polaris card like a supported device. Here is the exact command that finally worked:
docker run -d \
--device /dev/dri \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
--restart always \
-e OLLAMA_VULKAN=1 \
-e HSA_OVERRIDE_GFX_VERSION=8.0.3 \
ollama/ollama
Why this works:
- --device /dev/dri: Gives Docker direct access to the graphics hardware.
- OLLAMA_VULKAN=1: Switches the engine from ROCm to Vulkan.
- HSA_OVERRIDE_GFX_VERSION=8.0.3: Tricks the system into seeing the RX 590 (gfx803) as a compatible architecture.
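Once the container is up, two quick sanity checks help confirm it worked. The exact log wording varies between Ollama versions, so the grep here is deliberately loose:

```shell
# The API should answer on the mapped port with a version string
curl -s http://localhost:11434/api/version

# The startup logs should mention the Vulkan backend
docker logs ollama 2>&1 | grep -i vulkan
```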
Step 3: Picking the Right Model (7B vs 14B)
This was the biggest lesson of the journey.
- The 14B Model: At ~9GB, this model is slightly larger than the RX 590’s 8GB VRAM. This causes "spillover" into system RAM, resulting in a 98-second response time for simple tasks.
- The 7B Model: At ~4.7GB, it fits entirely within the GPU’s high-speed memory.
The result? The 7B model finished the same task in 18 seconds. That is a 5x speed increase just by picking a model that matches your VRAM.
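To try this yourself, pull the 7B build inside the container and give it a prompt. The qwen2.5-coder:7b tag is the ~4.7GB quantized build on the Ollama registry:

```shell
# Pull the 7B model into the container's volume
docker exec -it ollama ollama pull qwen2.5-coder:7b

# Run a one-off prompt directly from the CLI
docker exec -it ollama ollama run qwen2.5-coder:7b \
  "Write a Python function that reverses a string."
```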
Step 4: Troubleshooting the "Network Unreachable" Error
During setup, we hit a wall where the container couldn't download models (DNS issues). If your container has internet access but can't "see" the model registry, try adding an explicit DNS flag to your docker run command:
--dns 8.8.8.8
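A quick way to see which resolver the container is actually using before recreating it:

```shell
# Inspect the resolver configuration Docker handed the container
docker exec ollama cat /etc/resolv.conf

# If it points at an unreachable resolver, recreate the container
# with --dns added to the docker run command from Step 2:
#   docker rm -f ollama
#   docker run -d --dns 8.8.8.8 ... (same flags as before)
```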
The Result
I now have a senior-level coding assistant running 100% locally on a mid-range card from 2018.
- Model: Qwen2.5-Coder-7B
- Speed: ~18 seconds for complex Python scripts.
- Privacy: 100%. No data leaves my machine.
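Because the container exposes Ollama's standard HTTP API on port 11434, editors and scripts can talk to it directly. A minimal request sketch, with "stream": false so the response comes back as a single JSON object rather than a token stream:

```shell
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that checks if a number is prime.",
  "stream": false
}'
```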
Final Tips for Fellow Tweakers:
- Check your logs: Use docker logs ollama | grep -i "vulkan" to ensure your VRAM isn't showing as "0 B".
- Don't chase parameters: A 7B model running on a GPU will almost always provide a better experience than a 14B model crawling on a CPU.
- Disable Native Services: Ensure your Linux background services aren't hogging the ports or the GPU before you start Docker.
Happy Hacking! AI isn't just for the 4090 owners; it belongs to anyone with the patience to tweak the config.