Models | ModelBox

Google

Gemini 2.5 Flash Preview

Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a "thinking" process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.

Vision

Google

Gemini 2.5 Pro

Gemini 2.5 Pro Experimental is Google's most advanced coding model yet and is state-of-the-art across a range of benchmarks requiring enhanced reasoning. 2.5 models are thinking models, capable of reasoning through thoughts before responding. The result is enhanced performance and improved accuracy. This means Gemini 2.5 can handle more complex problems in coding, science and math, and support more context-aware agents.

Vision

Google

Gemini 2.0 Pro(Experiment)

Gemini 2.0 Pro is Google's most advanced AI model to date, designed to excel in complex tasks such as coding and handling intricate prompts. It features a substantial context window of 2 million tokens, enabling comprehensive analysis of extensive information. The model also integrates seamlessly with tools like Google Search and code execution environments, enhancing its utility for developers. Currently available in experimental form through Google AI Studio and Vertex AI, as well as to Gemini Advanced users, Gemini 2.0 Pro represents a significant leap forward in AI capabilities. citeturn0search1

Vision

Google

Gemini 2.0 Flash

The Gemini 2.0 Flash model builds upon the achievements of its predecessor, 1.5 Flash, which was widely regarded as a favorite among developers for its impressive performance and rapid response times. Notably, Gemini 2.0 Flash surpasses the 1.5 Pro model on key benchmarks while operating at double the speed. This upgraded version introduces several new capabilities. In addition to handling multimodal inputs such as images, video, and audio, it now supports multimodal outputs, including natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio. Furthermore, Gemini 2.0 Flash can seamlessly integrate with external tools like Google Search, perform code execution, and leverage third-party user-defined functions.

Vision

Google

Gemini 2.0 Flash Lite

The Gemini 2.0 Flash model builds upon the achievements of its predecessor, 1.5 Flash, which was widely regarded as a favorite among developers for its impressive performance and rapid response times. Notably, Gemini 2.0 Flash surpasses the 1.5 Pro model on key benchmarks while operating at double the speed. This upgraded version introduces several new capabilities. In addition to handling multimodal inputs such as images, video, and audio, it now supports multimodal outputs, including natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio. Furthermore, Gemini 2.0 Flash can seamlessly integrate with external tools like Google Search, perform code execution, and leverage third-party user-defined functions.

Vision

Google

Gemini 2.0 Flash Thinking Mode

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model.

Vision

Google

Gemini 2.0 Flash (experimental)

The Gemini 2.0 Flash model builds upon the achievements of its predecessor, 1.5 Flash, which was widely regarded as a favorite among developers for its impressive performance and rapid response times. Notably, Gemini 2.0 Flash surpasses the 1.5 Pro model on key benchmarks while operating at double the speed. This upgraded version introduces several new capabilities. In addition to handling multimodal inputs such as images, video, and audio, it now supports multimodal outputs, including natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio. Furthermore, Gemini 2.0 Flash can seamlessly integrate with external tools like Google Search, perform code execution, and leverage third-party user-defined functions.

Vision

Google

Gemini 2.0 Pro 0205(Experiment)

Gemini 2.0 Pro is Google's most advanced AI model to date, designed to excel in complex tasks such as coding and handling intricate prompts. It features a substantial context window of 2 million tokens, enabling comprehensive analysis of extensive information. The model also integrates seamlessly with tools like Google Search and code execution environments, enhancing its utility for developers. Currently available in experimental form through Google AI Studio and Vertex AI, as well as to Gemini Advanced users, Gemini 2.0 Pro represents a significant leap forward in AI capabilities. citeturn0search1

Vision

Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

Google

Gemini Flash 1.5 (preview)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. #multimodal

Vision

Google

Gemini Pro 1.5 (preview)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.* #multimodal

Google

Gemini Pro 1.0

Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Google

Gemini Pro Vision 1.0

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Vision

Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

Vision

Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

Vision