DeepSeek Reasoner (R1)
DeepSeek's first-generation reasoning models are DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained with large-scale reinforcement learning (RL) without prior supervised fine-tuning (SFT), demonstrated impressive reasoning performance, naturally acquiring a range of powerful and intriguing reasoning behaviors through RL alone. However, it suffered from repetitive outputs, poor readability, and language mixing. To address these limitations and further improve reasoning capability, DeepSeek-R1 incorporates cold-start data before RL and achieves performance on par with OpenAI-o1 across mathematics, coding, and reasoning tasks. To support the research community, DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 onto Llama and Qwen bases were open-sourced. Among them, DeepSeek-R1-Distill-Qwen-32B surpasses OpenAI-o1-mini on various benchmarks, setting new performance standards for dense models.
Community: Open Source
Context Window: 128,000 tokens
Max Output Tokens: 8,000
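
For reference, here is a minimal sketch of querying the model within the limits above. It assumes DeepSeek's OpenAI-compatible API; the base URL https://api.deepseek.com, the model id deepseek-reasoner, and the DEEPSEEK_API_KEY environment variable name are all assumptions that may differ for your deployment.

```python
# Minimal sketch: call DeepSeek-R1 via an OpenAI-compatible client,
# capping the response at the 8,000-token output limit listed above.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var name
    base_url="https://api.deepseek.com",     # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",               # assumed model id for R1
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=8000,                         # stay within the max output tokens
)

message = response.choices[0].message
# The reasoner may expose its chain of thought in a separate field;
# fall back gracefully if it is absent (an assumption, not guaranteed).
print(getattr(message, "reasoning_content", None))
print(message.content)
```

Requests that set max_tokens above 8,000, or whose prompt plus output would exceed the 128,000-token context window, should be expected to be rejected or truncated by the API.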