Mistral

Pixtral 12B (2409)

mistralai/pixtral-12b-2409

Pixtral 12B is a state-of-the-art multimodal AI model developed by Mistral AI. It combines strong visual understanding capabilities with excellent text processing, making it a versatile tool for various multimodal tasks. Key features include:

  • Natively multimodal architecture, trained on interleaved image and text data
  • 400M parameter vision encoder and 12B parameter multimodal decoder based on Mistral Nemo
  • Support for variable image sizes and multiple images within a 128k token context window
  • Top-tier performance on multimodal benchmarks like MMMU (52.5%), outperforming many larger models
  • Maintained excellence in text-only tasks, unlike some other multimodal models

Pixtral excels in tasks such as chart understanding, document question-answering, and multimodal reasoning. It's particularly strong in instruction following for both multimodal and text-only scenarios. The model can process images at their native resolution and aspect ratio, offering flexibility in token usage for image processing.
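To make the link between native-resolution processing and token usage concrete, here is a rough sketch of how an image's size might translate into a token count. The patch size of 16 pixels and the one-separator-token-per-row layout are assumptions that this listing does not state, and `estimate_image_tokens` is a hypothetical helper, not part of any Mistral or Hyperbolic API.

```python
import math

# Assumption: the vision encoder emits one token per 16x16 pixel patch.
# This value is not given in the listing; adjust it if the real one differs.
PATCH_SIZE = 16


def estimate_image_tokens(width: int, height: int, patch_size: int = PATCH_SIZE) -> int:
    """Roughly estimate the tokens used by one image at its native resolution."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    cols = math.ceil(width / patch_size)   # patches per row
    rows = math.ceil(height / patch_size)  # number of patch rows
    # Assumption: each row of patch tokens is closed by one separator token
    # (a row break, or an end-of-image marker after the last row).
    return rows * (cols + 1)


if __name__ == "__main__":
    for size in [(512, 512), (1024, 256)]:
        print(size, estimate_image_tokens(*size))
```

Under these assumptions a smaller or more elongated image costs fewer tokens than a large square one, which is the flexibility the description refers to.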

Capability

Vision Support

Provider      Input Token Price       Output Token Price
Hyperbolic    $0.20/Million Tokens    $0.20/Million Tokens
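As a quick illustration of the per-million-token pricing listed for Hyperbolic, the following minimal sketch estimates the cost of a request from its token counts. The function name `estimate_cost_usd` and the `PRICE_PER_MILLION_USD` mapping are hypothetical; only the $0.20 rates come from the listing.

```python
# Hyperbolic rates from the listing, in USD per one million tokens.
PRICE_PER_MILLION_USD = {"input": 0.20, "output": 0.20}


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      prices: dict = PRICE_PER_MILLION_USD) -> float:
    """Estimate the USD cost of a request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # 1,000,000 input tokens and 500,000 output tokens
    print(f"${estimate_cost_usd(1_000_000, 500_000):.2f}")
```

For example, 1,000,000 input tokens plus 500,000 output tokens come to about $0.30 at these rates. Image tokens count toward the input total.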