Optimizing QwQ-32B (by Qwen): AMD MI300X vs. NVIDIA H200

1. Introduction
In the world of large language models (LLMs), most benchmarks center on Llama or DeepSeek derivatives. We decided to diversify by adding the Qwen2 architecture, using our Paiton framework. This 32-billion-parameter model pushes GPU resources to their limit, making it ideal for comparing NVIDIA’s new H200 with our AMD MI300X, which leverages Paiton for advanced concurrency and custom kernel compilation.…

Eliovp Featured on AMD “Tech Talk” Podcast

We’re excited to share that Eliovp was recently featured on AMD’s “Tech Talk” podcast! In this episode, our CEO, Elio Van Puyvelde, sits down with Jim Greene to talk about the origins of Eliovp, the passion and expertise that brought the company to life, and the innovative full end-to-end solutions we offer today. From our humble beginnings to our current…

Further Optimizing AMD-Powered Inference with Paiton

Executive Summary
If you’ve followed our journey so far, you’ll know that Paiton is laser-focused on AMD-centric inference optimization. Our latest work takes DeepSeek R1 Distill Llama 8B to the next level, delivering 10–15% higher throughput, improved time-to-first-token (TTFT), and more stable performance at lower batch sizes, an area that had previously lagged. In short, Paiton further cements its ability to exploit AMD hardware’s raw power, bridging the performance gap in a…

News & Updates