Google
Gemini Pro Vision 1.0
google/gemini-pro-vision
Google's flagship multimodal model. It accepts image and video input in text or chat prompts and returns a text or code response.
See the benchmarks and prompting guidelines from DeepMind.
Usage of Gemini is subject to Google's Gemini Terms of Use.
#multimodal
Capability: Vision Support
Context Window: 45,875 tokens
Max Output Tokens: 2,048 tokens
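The two limits above can be used to budget a request before sending it. The sketch below is a minimal illustration. It assumes that the prompt (text plus encoded images or video) and the completion share the 45,875-token context window, and that the client caps the completion with a requested output length no larger than 2,048. Neither assumption is stated on this card, and the helper names `max_prompt_tokens` and `fits` are hypothetical, not part of any Gemini or provider API.

```python
# Limits as listed on this model card, in tokens.
CONTEXT_WINDOW = 45_875
MAX_OUTPUT_TOKENS = 2_048


def max_prompt_tokens(requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Largest prompt that still leaves room for `requested_output` tokens.

    ASSUMPTION: prompt and completion share one context window.
    """
    if not 0 < requested_output <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT_TOKENS}")
    return CONTEXT_WINDOW - requested_output


def fits(prompt_tokens: int, requested_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Hypothetical pre-flight check: does this prompt fit alongside the reply?"""
    return 0 <= prompt_tokens <= max_prompt_tokens(requested_output)


if __name__ == "__main__":
    print(max_prompt_tokens())   # 43827 when reserving the full 2,048-token reply
    print(fits(43_827))          # True
    print(fits(43_828))          # False
```

Reserving less output (for example `max_prompt_tokens(1000)`) frees the difference for the prompt. Counting prompt tokens themselves, especially for images and video, depends on the provider's tokenizer and is not covered here.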