Examining AI agents in the medical field: AI that speaks DICOM
At Eliovp, we’re constantly keeping up with the newest AI trends. Consequently, we have been looking into AI agents and have created a medical agent designed to seamlessly interact with DICOM servers inside hospitals. This isn’t just another chatbot or AI tool. This is an intelligent assistant that understands the language of radiology and is built to support the people at the heart of healthcare: radiologists, technicians, and hospital staff.
What can the medical agent do?
Our agent connects directly to your hospital’s DICOM infrastructure, which means it can:
- Look up patients by name, ID, or metadata
- Retrieve study information, including modalities, timestamps, and diagnostic notes
- Analyze medical images for abnormalities, pathologies, and injuries. The agent can retrieve the relevant images directly from the DICOM server and give an analysis solely based on the images. No reports needed!
With this agent, radiologists don’t need to spend hours each day looking up patients, searching through all studies and series, and analysing images manually. They can just ask the agent in plain language and it will do the rest.
Seamless workflow, no extra tools and secure connections
What makes this agent truly powerful is how naturally it integrates into existing hospital workflows. There’s no need to learn new software or navigate complex menus. Just ask and get the information you need.
Many questions may arise about security concerns. Don’t worry, because the agent will never share sensitive data to the outside world as it can run on your own servers with self-hosted models. Built-in limitations in its tools also form a second protection layer for the hospital’s data. So, data leakage will never occur!
See it in action
We’ve recorded a few demos to show the agent in action. They range from fetching patient records to detecting pathologies in diagnostic images. These videos demonstrate real-world tasks that hospital staff perform every day, now made faster and easier with agentic AI support.
These demos were performed in a test environment and the execution speed can still be optimized. The agent always takes a step-by-step approach in gathering the most relevant information, so multiple tool calls like querying patients, studies and series, are performed sequentially.
All patient information has also been anonymized.
Technical details
Here’s a small summary of the technical side of the agent:
- Agent LLM: Qwen2.5-32B-Instruct served with vLLM on 1 AMD MI300x GPU
- DICOM server in the demo: Self-hosted Orthanc
- Vision language model: several state-of-the-art VLMs were compared in analyzing the images and forming a diagnosis, and are listed below. All models were served with vLLM on respectively 4, 1, 2 and 1 AMD MI300x GPUs. The model used in the last demo video above is Qwen/Qwen2.5-VL-72B-Instruct.
Vision language model comparison
By examining the performance on an MRI scan of a knee, we highlight the variability in diagnostic conclusions and the clinical relevance of leading vision language models. Below, the outputs from these models are set out and you can compare them yourself with the radiologist’s report. Clearly, Llama-4-Scout-17B-16E-Instruct and Qwen2.5-VL-32B-Instruct are not able to give a correct diagnosis. Qwen2.5-VL-72B-Instruct and gemma-3-27b-it perform better on the other hand. They are able to diagnose the patient correctly to a great degree.
For your convenience we have created a small scoring system to determine whether the models performed well or poorly. From the radiologist’s report we extract four key aspects in the diagnosis: ACL tear (1), limited Segond fracture and elongation of the anterolateral ligament (2), bone edema (3) and presumably no meniscus tear (4). If the VLM mentions one of these key aspects in its diagnosis, it gets one point. If it mentions two aspects it gets two points and so on.
Model | Score (points) | Score (percentage) |
meta-llama/Llama-4-Scout-17B-16E-Instruct | 0/4 | 0% |
google/gemma-3-27b-it | 3/4 | 75% |
Qwen/Qwen2.5-VL-72B-Instruct | 3/4 | 75% |
Qwen/Qwen2.5-VL-32B-Instruct | 1/4 | 25% |
Disclaimer: We are no medical professionals and have based the scoring solely on the text output.
The output from the models and the radiologist’s conclusion can be found below.
Input message: Can you analyze the images in series pd_tse_sag from patient ID 0000850361?
Radiologist
Complete tear of the anterior cruciate ligament; associated limited Segond fracture with some elongation of the anterolateral ligament.
Significant bone edema primarily lateral tibial, somewhat less medial tibial, and even less medial femoral, with inset bone fissures primarily lateral tibial.
Presumably no tear of the posterior horn of the lateral meniscus see description.
No tear of the medial meniscus.
meta-llama/Llama-4-Scout-17B-16E-Instruct
The analysis of the MRI images from the series ‘pd_tse_sag’ has been completed. The conclusion is as follows:
Diagnosis:
Based on the provided MRI image of the knee, the diagnosis appears to be a Posterior Cruciate Ligament (PCL) injury, specifically a PCL tear.
Reasoning Steps:
Visible Disruption: The PCL appears disrupted or torn, which is a common injury in sports and trauma.
Clinical Context: PCL injuries often result from a direct blow to the front of the knee or from a fall onto a flexed knee. Symptoms may include pain, instability, and swelling.
Imaging Features: The MRI shows a discontinuity or abnormal signal in the PCL, indicative of a tear.
Conclusion:
The diagnosis of a PCL injury is critical for guiding treatment, which may include conservative management or surgical intervention, depending on the severity and the patient’s activity level.
google/gemma-3-27b-it
The analysis of the knee MRI series “pd_tse_sag” from patient ID 0000850361 suggests the following:
Key Findings:
ACL Tear: The anterior cruciate ligament (ACL) appears to be torn as it is frayed and discontinuous.
Bone Bruising: There is bone marrow edema in the lateral femoral condyle and the posterior aspect of the lateral tibial plateau, indicating impact forces at the time of injury.
Possible Meniscal Tear: The posterior horn of the medial meniscus appears irregular and possibly disrupted.
Joint Effusion: There is some fluid within the joint space, which is common after an acute knee injury.
Posterior Translation of Tibia: The tibia appears to be slightly translated posteriorly relative to the femur, a sign of ACL insufficiency.
Likely Diagnosis: An acute ACL tear with associated bone contusions and a possible medial meniscal tear.
However, I am an AI and my analysis should not replace professional medical advice. Please consult with a qualified healthcare professional for an accurate diagnosis and treatment plan.
Qwen/Qwen2.5-VL-72B-Instruct
The MRI image series ‘pde_tse_sag’ from patient ID 0000850361 shows a sagittal view of the left knee joint. Based on the image analysis, here are some observations and potential diagnoses:
ACL (Anterior Cruciate Ligament) Tear: The ACL appears to be disrupted or torn. The normal continuity of the ligament is not visible, which is a strong indicator of an ACL tear.
Bone Bruise: There may be evidence of bone bruising (bone marrow edema) in the femur and tibia, which is often associated with ACL injuries.
Meniscus Tear: The meniscus appears to be intact based on this image, but a more detailed analysis would be needed to rule out any tears.
Joint effusion: There is a significant amount of fluid in the joint space, which is consistent with an acute injury.
Other Soft Tissue Injuries: There may be additional soft tissue injuries, such as damage to the collateral ligaments or other structures, but this would require further evaluation.
Recommendations:
Clinical Correlation: The findings should be correlated with the patient’s clinical history and physical examination.
Further Imaging: Additional MRI sequences (e.g., T1-weighted, PD-weighted) may be needed for a more comprehensive evaluation.
Referral to Orthopedic Specialist: Given the potential for an ACL tear, referral to an orthopedic surgeon for further management and possible surgical intervention is recommended.
Please consult with a medical professional for a definitive diagnosis and treatment plan.
Qwen/Qwen2.5-VL-32B-Instruct
Based on the analysis of MRI images from the series “pd_tse_sag” for the patient with ID 0000850361, here are some potential findings:
The joint space appears intact, suggesting no significant degenerative changes.
There is a focal area of increased signal intensity in the medial meniscus, which could indicate a tear or degeneration.
The cruciate ligaments (ACL and PCL) appear intact, but subtle tears or sprains might not be visible without specific sequences.
There is a region of increased signal intensity in the bone marrow of the tibial plateau, which could indicate bone marrow edema (BME). This is often associated with trauma, stress fractures, or inflammatory conditions.
The surrounding soft tissues do not show significant abnormalities, but there may be some fluid accumulation around the joint.
However, it’s important to note that these findings are purely observational and cannot replace a proper medical diagnosis. Consulting with a healthcare professional who can consider the patient’s symptoms, medical history, and conduct a physical examination would be necessary for a definitive diagnosis and treatment plan.
Conclusion
Humanity is one step closer to achieving a simplified and automated diagnosis tool. However, it’s important to tread carefully. Oversimplification can lead to overly broad or inaccurate diagnoses, potentially overlooking critical details and increasing the risk of worst-case outcomes. Striking the right balance between usability and clinical precision is key.
While general-purpose Vision-Language Models (VLMs) demonstrate impressive capabilities, their outputs can be significantly improved through fine-tuning on domain-specific datasets. In the context of healthcare, this means training the model with high-quality medical images, radiology reports, pathology notes, or other clinical documentation. Fine-tuning allows the model to learn medical terminology, visual patterns specific to diseases, and nuanced diagnostic cues, resulting in more accurate and context-aware outputs. This domain adaptation is crucial when deploying AI tools in sensitive environments like hospitals, where precision and reliability are paramount.