Stop Overpaying: Paiton MI300X MoE Beats H200/B200 on $/1M Tokens

Short summary: We benchmarked Paiton with our new MoE support on Qwen/Qwen3-30B-A3B-Instruct-2507 to compare inference performance across several setups. Each configuration was run five times per batch size, and we report the mean across runs. Why this benchmark: Most published numbers use synthetic prompts or toy datasets. We focused on realistic conversational workloads (we always do) to highlight true latency…

Agentic AI, But Make It Local: From Inbox to Insight to Action

We’ve built production-ready, local-first agentic AI that plugs into your existing email stack, auto-creates tickets, classifies messages, extracts multi-question threads, reads PDFs, spots invoices/quotes, analyzes images (yes, damage detection), and pushes structured reports into your systems, with no dependency on OpenAI, Google, or Microsoft unless you want it. Tailor-made models trained on your data, on your hardware, inside your…

MI300X FP8 Data‑Parallel Benchmarks (8–64 GPUs): H200 Left Behind, B200 Within Reach

At ElioVP, we’re all about pushing AI inference past its limits and packaging every squeeze of performance into a plug‑and‑play runtime. Remember our last blog post, where Paiton’s FP8 pipeline on AMD’s MI300X completely outclassed NVIDIA’s H200? Well, buckle up, because we’ve gone back to the drawing board. This time, we’re loading Llama-3.1-8B-Instruct-FP8-KV, the leaner, meaner FP8‑quantized Llama variant, into not…

News & Updates