Models
DeepSeek Reasoner(r1)
The first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, were introduced to advance reasoning capabilities. DeepSeek-R1-Zero, developed using large-scale reinforcement learning (RL) without prior supervised fine-tuning (SFT), displayed impressive reasoning performance. Through RL, it naturally acquired a range of powerful and intriguing reasoning behaviors. However, DeepSeek-R1-Zero faced challenges such as repetitive outputs, poor readability, and language mixing. To address these limitations and further improve reasoning capabilities, DeepSeek-R1 was developed, incorporating cold-start data before RL. DeepSeek-R1 demonstrated performance on par with OpenAI-o1 across tasks involving mathematics, coding, and reasoning. To foster progress within the research community, DeepSeek-R1-Zero, DeepSeek-R1, and six distilled dense models based on Llama and Qwen were open-sourced. Among them, DeepSeek-R1-Distill-Qwen-32B surpassed OpenAI-o1-mini on various benchmarks, setting new performance standards for dense models.
Open Source
DeepSeek Chat(V3)
DeepSeek V3, developed by DeepSeek, is a cutting-edge large language model with 685 billion parameters, making it one of the largest in the world. Its 687.9 GB size reflects its vast knowledge base and complexity. The model uses a Mixture of Experts (MoE) architecture, featuring 256 experts, with 8 experts activated per token. This design enables efficient resource allocation, providing high scalability without sacrificing performance. In early benchmarks, DeepSeek V3 secured second place on the Aider Polyglot leaderboard with a score of 48.4%, surpassing models like Claude-3-5 and Gemini-EXP. This highlights its strength in multilingual and contextual reasoning tasks. Currently, DeepSeek V3 is accessible through chat.deepseek.com and the DeepSeek API, as part of a staged rollout. Its scale and innovation surpass even Meta AI’s Llama 3.1 (405B parameters), setting a new standard for large-scale AI models. With its robust performance and innovative architecture, DeepSeek V3 is poised to redefine expectations for efficiency and accuracy in AI-powered applications.
Open Source
Deepseek Coder(V3)
DeepSeek V3, developed by DeepSeek, is a cutting-edge large language model with 685 billion parameters, making it one of the largest in the world. Its 687.9 GB size reflects its vast knowledge base and complexity. The model uses a Mixture of Experts (MoE) architecture, featuring 256 experts, with 8 experts activated per token. This design enables efficient resource allocation, providing high scalability without sacrificing performance. In early benchmarks, DeepSeek V3 secured second place on the Aider Polyglot leaderboard with a score of 48.4%, surpassing models like Claude-3-5 and Gemini-EXP. This highlights its strength in multilingual and contextual reasoning tasks. Currently, DeepSeek V3 is accessible through chat.deepseek.com and the DeepSeek API, as part of a staged rollout. Its scale and innovation surpass even Meta AI’s Llama 3.1 (405B parameters), setting a new standard for large-scale AI models. With its robust performance and innovative architecture, DeepSeek V3 is poised to redefine expectations for efficiency and accuracy in AI-powered applications.
Open Source
DeepSeek Chat(V2.5)
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for more information. DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:
Open Source
Deepseek Coder(V2.5)
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for more information. DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:
Open Source
Gemini 2.0 Flash Thinking Mode
Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model.
Vision
O1
The OpenAI o1 models are designed to spend more time thinking before responding, improving their ability to reason through complex tasks in science, coding, and math. The first model of this series is now available in ChatGPT and the API, with regular updates expected.
Vision
Gemini 2.0 Flash (experimental)
The Gemini 2.0 Flash model builds upon the achievements of its predecessor, 1.5 Flash, which was widely regarded as a favorite among developers for its impressive performance and rapid response times. Notably, Gemini 2.0 Flash surpasses the 1.5 Pro model on key benchmarks while operating at double the speed. This upgraded version introduces several new capabilities. In addition to handling multimodal inputs such as images, video, and audio, it now supports multimodal outputs, including natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio. Furthermore, Gemini 2.0 Flash can seamlessly integrate with external tools like Google Search, perform code execution, and leverage third-party user-defined functions.
Vision
Llama 3.3 70B Instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
Open Source
Qwen2 VL 72B
### What's New in Qwen2-VL? #### [](https://huggingface.co/Qwen/Qwen2-VL-72B#key-enhancements)Key Enhancements: - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
Vision
Open Source
GPT-4o
This version’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
Claude 3.5 Haiku
Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models—including the original Claude 3.5 Sonnet and GPT-4o.
Vision
Claude 3.5 Sonnet (new)
Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
Vision
ChatGPT-4o Latest
ChatGPT-4o contains latest improvements for chat use cases, expected for testing/evaluation purpose. ChatGPT-4o also supports structured outputs, with up to 16k max output tokens GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
QwQ 32B Preview
QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations: 1. Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity. 2. Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer. 3. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it. 4. Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
Open Source
Qwen2.5 Coder 32B Instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: * Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. * A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Open Source
Qwen2.5 7B Instruct
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. This repo contains the instruction-tuned 7B Qwen2.5 model, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Paramaters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to [this section](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
Open Source
Qwen2.5 Turbo (1M Context)
Following the release of Qwen2.5, the team responded to the community's demand for handling longer contexts. Over the past few months, significant optimizations have been made to enhance the model's capabilities and inference performance for extremely long contexts. Now, the team is proud to introduce the new **Qwen2.5-Turbo** model, featuring the following advancements: - **Extended Context Support**: The context length has been increased from 128k to 1M tokens, equivalent to approximately 1 million English words or 1.5 million Chinese characters. This capacity corresponds to 10 full-length novels, 150 hours of speech transcripts, or 30,000 lines of code. Qwen2.5-Turbo achieves 100% accuracy in the 1M-token Passkey Retrieval task and scores 93.1 on the RULER long-text evaluation benchmark, outperforming GPT-4 (91.6) and GLM4-9B-1M (89.9). Moreover, the model retains strong performance in short sequence tasks, comparable to GPT-4o-mini. - **Faster Inference Speed**: Leveraging sparse attention mechanisms, the time to generate the first token for a 1M-token context has been reduced from 4.9 minutes to just 68 seconds, representing a 4.3x speed improvement. - **Cost Efficiency**: The pricing remains unchanged at $0.05 per 1M tokens. At this rate, Qwen2.5-Turbo processes 3.6 times more tokens than GPT-4o-mini for the same cost.
Grok Beta
Grok-2 Beta introduces two advanced language models, with superior performance in reasoning, coding, and understanding tasks compared to prior models. Grok-2 outperforms competitors like GPT-4 Turbo on key benchmarks and includes state-of-the-art capabilities in multimodal and real-time applications. It's accessible via the platform for Premium users and will soon be available through a secure, low-latency enterprise API. These updates mark significant advancements in xAI's pursuit of cutting-edge AI development.
Grok Beta
Grok-2 Beta introduces two advanced language models, with superior performance in reasoning, coding, and understanding tasks compared to prior models. Grok-2 outperforms competitors like GPT-4 Turbo on key benchmarks and includes state-of-the-art capabilities in multimodal and real-time applications. It's accessible via the platform for Premium users and will soon be available through a secure, low-latency enterprise API. These updates mark significant advancements in xAI's pursuit of cutting-edge AI development.
Vision
GPT 4o Mini
GPT 4o Mini ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
O1 Preview
The OpenAI o1 Preview models are designed to spend more time thinking before responding, improving their ability to reason through complex tasks in science, coding, and math. The first model of this series is now available in ChatGPT and the API, with regular updates expected.
O1 Mini
The OpenAI o1-mini is a newly released smaller version of the o1 model, designed to optimize reasoning tasks, particularly in coding. It provides advanced reasoning capabilities similar to its larger counterpart, making it well-suited for generating and debugging complex code. However, it is 80% cheaper and faster, making it a cost-effective solution for developers who need reasoning power but don’t require broad world knowledge.
Claude 3.5 Haiku 20241022
Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models—including the original Claude 3.5 Sonnet and GPT-4o.
Vision
Claude 3.5 Sonnet 20241022 (new)
Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
Vision
Llama 3.2 90B Instruct
Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.
Vision
Open Source
Llama 3.2 11B Instruct
Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.
Vision
Open Source
Llama 3.2 3B Instruct
Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.
Open Source
GPT-4o
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
Qwen2.5 72B Instruct
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: * Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. * Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. * Long-context Support up to 128K tokens and can generate up to 8K tokens. * Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Qwen2 VL 72B Instruct
Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.
Vision
Qwen2 VL 7B Instruct
Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.
Vision
Pixtral 12B(2409)
Pixtral 12B is a state-of-the-art multimodal AI model developed by Mistral AI. It combines strong visual understanding capabilities with excellent text processing, making it a versatile tool for various multimodal tasks. Key features include: * Natively multimodal architecture, trained on interleaved image and text data * 400M parameter vision encoder and 12B parameter multimodal decoder based on Mistral Nemo Support for variable image sizes and multiple images within a 128k token context window * Top-tier performance on multimodal benchmarks like MMMU (52.5%), outperforming many larger models * Maintained excellence in text-only tasks, unlike some other multimodal models Pixtral excels in tasks such as chart understanding, document question-answering, and multimodal reasoning. It's particularly strong in instruction following for both multimodal and text-only scenarios. The model can process images at their native resolution and aspect ratio, offering flexibility in token usage for image processing.
Vision
Gemini Flash 1.5 0827 (experiment)
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Vision
Gemini Pro 1.5 0827 (experiment)
Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*
Reflection Llama-3.1 70B
Reflection Llama-3.1 70B is (currently) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course.
Open Source
Llama 3.1 405B Instruct
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
Open Source
Llama 3.1 70B Instruct
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
Open Source
Llama 3.1 8B Instruct
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
Open Source
GPT-4o 2024-08-06
GPT-4o with structured outputs, with up to 16k max output tokens GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
Qwen2 Math 7B Instruct
Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).
Open Source
Qwen2 Math 1.5B Instruct
Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).
Open Source
Qwen2 Math 72B Instruct
Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).
Open Source
Qwen2 Audio 7B Instruct
Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. We introduce two distinct audio interaction modes: * voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input; * audio analysis: users could provide audio and text instructions for analysis during the interaction;
Open Source
Gemma2 2B Instruct
Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].
Open Source
GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).
Vision
GPT-4 Vision
Ability to understand images, in addition to all other [GPT-4 Turbo capabilties](/models/openai/gpt-4-turbo). Training data: up to Apr 2023. **Note:** heavily rate limited by OpenAI while in preview. #multimodal
Vision
Claude 3 Haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal
Vision
Gemma 2B
Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
Open Source
Claude 3 Opus
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal
Vision
Claude 3 Sonnet
Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal
Vision
Gemini Flash 1.5 (preview)
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. #multimodal
Vision
Gemini Pro 1.5 (preview)
Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.* #multimodal
Llama 3 70B Instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
Open Source
Llama 3 70B
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 70B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
Open Source
Llama 3 8B
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 8B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
Open Source
Qwen 2 72B Chat
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model. Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.
Open Source
Claude 3.5 Sonnet(20240620)
Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
Vision
GPT-4o-2024-05-13
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
GPT-3.5 Turbo
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.
GPT-4o 64k(alpha test version)
An experimental version of GPT-4o with a maximum of 64K output tokens per request. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Vision
GPT-3.5 Turbo 16k
The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021. This version has a higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.
GPT-3.5 Turbo (older v0301)
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.
GPT-3.5 Turbo (older v0613)
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.
GPT-3.5 Turbo 16k
The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.
GPT-3.5 Turbo 16k
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021.
GPT-3.5 Turbo Instruct
Similar capabilities as GPT-3 era models. Compatible with legacy Completions endpoint and not Chat Completions.
GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).
GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).
GPT-4 Turbo Vision Preview(older v1106)
GPT-4 model with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. This is a preview model, we recommend developers to now use gpt-4-turbo which includes vision capabilities.
Vision
GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).
Vision
GPT-4 0613
OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021.
Claude 3 Haiku (20240307)
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal
Vision
Claude 3 Opus(20240229)
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal
Vision
Claude 3 Sonnet(20240229)
Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal
Vision
GPT-3.5 Turbo 16k
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021.
Llama 3 8B Instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
Open Source
Mistral: Mixtral 8x22B (base)
Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts, each 22 billion parameters, with each token using 2 experts at a time. It was released via [X](https://twitter.com/MistralAI/status/1777869263778291896). #moe
Open Source
Mistral: Mixtral 8x22B Instruct
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe
Open Source
Mixtral 8x7B (base)
A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixtral 8x7B Instruct](/models/mistralai/mixtral-8x7b-instruct) for an instruct-tuned model. #moe
Open Source
Mixtral 8x7B Instruct
A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe
Open Source
GPT-4
OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021.
GPT-4 (older v0314)
GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021.
GPT-4 Turbo (older v1106)
The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Apr 2023. **Note:** heavily rate limited by OpenAI while in preview.
GPT-4 32k
GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.
GPT-4 32k (older v0314)
GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.
Gemini Pro 1.0
Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
Gemini Pro Vision 1.0
Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal
Vision
Gemma2 27B Instruct
Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].
Open Source
Gemma2 9B Instruct
Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].
Open Source
CodeLlama 34B Instruct
Code Llama is built upon Llama 2 and excels at filling in code, handling extensive input contexts, and folling programming instructions without prior training for various programming tasks.
Open Source
Llama v2 13B Chat
A 13 billion parameter language model from Meta, fine tuned for chat completions
Open Source
Llama v2 70B Chat
The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.
Open Source
LlamaGuard 2 8B
This safeguard model has 8B parameters and is based on the Llama 3 family. Just like is predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
Open Source
Mistral Large
This is Mistral AI's closed-source, flagship model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large/). It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, and its 32K tokens context window allows precise information recall from large documents.
Open Source
Mistral Medium
This is Mistral AI's closed-source, medium-sided model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.
Open Source
Mistral Small
This model is currently powered by Mixtral-8X7B-v0.1, a sparse mixture of experts model with 12B active parameters. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish. #moe
Open Source
Mistral Tiny
This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
Open Source
WizardLM-2 7B
WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models It is a finetune of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/models/microsoft/wizardlm-2-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe
Open Source
WizardLM-2 8x22B
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe
Open Source
Meta: CodeLlama 70B Instruct
Code Llama is a family of large language models for code. This one is based on [Llama 2 70B](/models/meta-llama/llama-2-70b-chat) and provides zero-shot instruction-following ability for programming tasks.
Cohere: Command
Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
Open Source
Cohere: Command R
Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Read the launch post [here](https://txt.cohere.com/command-r/). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
Open Source
Cohere: Command R+
Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer usecases, and Retrieval Augmented Generation (RAG). It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
Yi 34B (base)
The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/).
Open Source
Yi 34B Chat
The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This version is instruct-tuned to work better for chat.
Open Source
Yi 6B (base)
The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/).
Open Source
Qwen 1.5 110B Chat
Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 14B Chat
Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 32B Chat
Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 4B Chat
Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 72B Chat
Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 7B Chat
Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Databricks: DBRX 132B Instruct
DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and [Mixtral-8x7b](/models/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic. It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm). #moe
Open Source
FireLLaVA 13B
A blazing fast vision-language model, FireLLaVA quickly understands both text and images. It achieves impressive chat skills in tests, and was designed to mimic multimodal GPT-4. The first commercially permissive open source LLaVA model, trained entirely on open source LLM generated instruction following data.
OpenChat 3.5
OpenChat is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.
Open Source
Perplexity: Llama3 Sonar 70B
Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-large-32k-online) of this model has Internet access.
Perplexity: Llama3 Sonar 70B Online
Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-large-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
Perplexity: Llama3 Sonar 8B
Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-small-32k-online) of this model has Internet access.
Perplexity: Llama3 Sonar 8B Online
Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-small-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
Phind: CodeLlama 34B v2
A fine-tune of CodeLlama-34B on an internal dataset that helps it exceed GPT-4 on some benchmarks, including HumanEval.
Open Source
Snowflake: Arctic Instruct
Arctic is a dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP resulting in 480B total and 17B active parameters chosen using a top-2 gating. To read more about this model's release, [click here](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/).
Phi-3 Medium Instruct
Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
Open Source
Mistral-7B-Instruct-v0.3
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling
Open Source
Qwen 1.5 1.8B Chat
Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Qwen 1.5 110B
Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 14B
Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 32B
Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 4B
Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 72B
Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 7B
Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Open Source
Qwen 1.5 1.8B
Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Mistral-7B-Instruct-v0.1
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling
Open Source
Mistral-7B-Instruct-v0.2
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling
Open Source
Mistral-7B-Instruct-v0.3
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling
Open Source
Mistral-7B-v0.1
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling
Open Source
Mistral: Mixtral 8x22B Instruct v0.1
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe
Open Source
Mixtral 8x7B Instruct v0.1
A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe
Open Source
Llama v2 7B Chat
The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.
Open Source
Dolphin
This model is based on Mixtral-8x7b The base model has 32k context, I finetuned it with 16k. This Dolphin is really good at coding, I trained with a lot of coding data. It is very obedient but it is not DPO tuned - so you still might need to encourage it in the system prompt as I show in the below examples.
Open Source
Gemma 7B Instruct
Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
Open Source
Gemma2 27B
Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].
Open Source
Codestral Mamba
Codestral Mamba is a newly released language model specialized in code generation, developed by Mistral AI. It boasts linear time inference, allowing it to efficiently handle sequences of infinite length, making it ideal for code productivity tasks. The model was trained with advanced code and reasoning capabilities, performing comparably to state-of-the-art transformer models. It supports extensive in-context retrieval up to 256k tokens. Codestral Mamba is freely available under the Apache 2.0 license and can be deployed via the mistral-inference SDK or TensorRT-LLM. For more details, visit the [original article](https://mistral.ai/news/codestral-mamba/).
Open Source
Qwen 2 7B Chat
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model. Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.
Open Source
Gemma 9B
Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
Open Source
Mistral Nemo
Mistral AI and NVIDIA have collaborated to develop Mistral NeMo, a new 12B language model that represents a significant advancement in AI technology. This model boasts a large context window of up to 128k tokens and delivers state-of-the-art performance in reasoning, world knowledge, and coding accuracy for its size category. Mistral NeMo utilizes a standard architecture, making it easily adaptable and a straightforward replacement for systems currently using Mistral 7B. In a move to promote widespread adoption, both pre-trained base and instruction-tuned checkpoints have been released under the Apache 2.0 license.
Open Source
Mistral Large 2
Mistral AI's latest offering, Mistral Large 2, represents a significant advancement in language model technology. With 123 billion parameters and a 128k context window, it supports numerous languages and coding languages. The model sets a new benchmark in performance-to-cost ratio, achieving 84.0% accuracy on MMLU. It excels in code generation, reasoning, and multilingual tasks, competing with top-tier models like GPT-4 and Claude 3 Opus. Key improvements include enhanced instruction-following, reduced hallucination, and better handling of multi-turn conversations. The model's multilingual proficiency and advanced function calling capabilities make it particularly suitable for diverse business applications. Mistral Large 2 is designed for single-node inference and long-context applications, balancing performance with practical usability.
Open Source
ShieldGemma 2B
ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.
Open Source
ShieldGemma 9B
ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.
Open Source
ShieldGemma 27B
ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.
Open Source
FLUX.1 Dev
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro]. 2. Competitive prompt following, matching the performance of closed source alternatives . Trained using guidance distillation, making FLUX.1 [dev] more efficient. 3. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 4. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [flux-1-dev-non-commercial-license](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
FLUX.1 Schnell
FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key Features: 1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. 2. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps. 4. Released under the apache-2.0 licence, the model can be used for personal, scientific, and commercial purposes.
FLUX.1 Pro
FLUX.1 [pro] is the best of FLUX.1, offering state-of-the-art performance image generation with top of the line prompt following, visual quality, image detail and output diversity. All FLUX.1 model variants support a diverse range of aspect ratios and resolutions in 0.1 and 2.0 megapixels
Dall-E 3
DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images.
O1 Perview 2024-09-12
The OpenAI o1 Preview models are designed to spend more time thinking before responding, improving their ability to reason through complex tasks in science, coding, and math. The first model of this series is now available in ChatGPT and the API, with regular updates expected.
O1 Mini 2024-09-12
The OpenAI o1-mini is a newly released smaller version of the o1 model, designed to optimize reasoning tasks, particularly in coding. It provides advanced reasoning capabilities similar to its larger counterpart, making it well-suited for generating and debugging complex code. However, it is 80% cheaper and faster, making it a cost-effective solution for developers who need reasoning power but don’t require broad world knowledge.
Gemini Flash 1.5 0827 (experiment)
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Vision
Gemini Pro 1.5 0827 (experiment)
Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*
Vision
Gemini Flash 1.5 0827 (experiment)
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Vision
Gemini Pro 1.5 0827 (experiment)
Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*
Vision