DeepSeek Reasoner (R1)
DeepSeek's first-generation reasoning models are DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained with large-scale reinforcement learning (RL) without prior supervised fine-tuning (SFT), demonstrated impressive reasoning performance, naturally acquiring a range of powerful and intriguing reasoning behaviors through RL alone. However, it suffered from repetitive outputs, poor readability, and language mixing. To address these limitations and further improve reasoning capability, DeepSeek-R1 incorporates cold-start data before RL and achieves performance on par with OpenAI-o1 across mathematics, coding, and reasoning tasks. To support the research community, DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 onto Llama and Qwen bases were open-sourced. Among them, DeepSeek-R1-Distill-Qwen-32B surpasses OpenAI-o1-mini on various benchmarks, setting new performance standards for dense models.
Community: Open Source
Context Window: 128,000 tokens
Max Output Tokens: 8,000
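
For reference, here is a minimal sketch of querying the model within the limits above. It assumes DeepSeek's OpenAI-compatible API; the base URL https://api.deepseek.com, the model id deepseek-reasoner, and the DEEPSEEK_API_KEY environment variable name are all assumptions that may differ for your deployment.

```python
# Minimal sketch: call DeepSeek-R1 via an OpenAI-compatible client,
# capping the response at the 8,000-token output limit listed above.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var name
    base_url="https://api.deepseek.com",     # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",               # assumed model id for R1
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=8000,                         # stay within the max output tokens
)

message = response.choices[0].message
# The reasoner may expose its chain of thought in a separate field;
# fall back gracefully if it is absent (an assumption, not guaranteed).
print(getattr(message, "reasoning_content", None))
print(message.content)
```

Requests that set max_tokens above 8,000, or whose prompt plus output would exceed the 128,000-token context window, should be expected to be rejected or truncated by the API.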