Mistral
Pixtral 12B(2409)
mistralai/pixtral-12b-2409
Pixtral 12B is a state-of-the-art multimodal AI model developed by Mistral AI. It combines strong visual understanding capabilities with excellent text processing, making it a versatile tool for various multimodal tasks. Key features include:
- Natively multimodal architecture, trained on interleaved image and text data
- 400M parameter vision encoder and 12B parameter multimodal decoder based on Mistral Nemo Support for variable image sizes and multiple images within a 128k token context window
- Top-tier performance on multimodal benchmarks like MMMU (52.5%), outperforming many larger models
- Maintained excellence in text-only tasks, unlike some other multimodal models
Pixtral excels in tasks such as chart understanding, document question-answering, and multimodal reasoning. It's particularly strong in instruction following for both multimodal and text-only scenarios. The model can process images at their native resolution and aspect ratio, offering flexibility in token usage for image processing.
Capability
Vision Support
Using Pixtral 12B(2409) with Python API
Using Pixtral 12B(2409) with OpenAI compatible API
import openai
client = openai.Client(
api_key= '{your_api_key}',
base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="mistralai/pixtral-12b-2409",
messages: [
{
role: 'user',
content:
'introduce your self',
},
]
)
print(response)