Mistral
Pixtral 12B(2409)
mistralai/pixtral-12b-2409
Pixtral 12B is a state-of-the-art multimodal AI model developed by Mistral AI. It combines strong visual understanding capabilities with excellent text processing, making it a versatile tool for various multimodal tasks. Key features include:
- Natively multimodal architecture, trained on interleaved image and text data
- 400M parameter vision encoder and 12B parameter multimodal decoder based on Mistral Nemo Support for variable image sizes and multiple images within a 128k token context window
- Top-tier performance on multimodal benchmarks like MMMU (52.5%), outperforming many larger models
- Maintained excellence in text-only tasks, unlike some other multimodal models
Pixtral excels in tasks such as chart understanding, document question-answering, and multimodal reasoning. It's particularly strong in instruction following for both multimodal and text-only scenarios. The model can process images at their native resolution and aspect ratio, offering flexibility in token usage for image processing.
Capability
Vision Support