Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
Abigail Waugh edited this page 2025-02-10 02:29:09 +07:00


Including reasoning "chains of thought" (CoT) in the model output substantially improves its quality, but it also increases inference cost.

  1. A human expert's chain of thought.
  2. The final answer.

We expanded this dataset by adding:

- Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
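A minimal sketch of the augmented record, assuming each example also stores the question it belongs to. The `question` field, the `Example` class, and the `teacher` callable are hypothetical; the dataset schema and the way DeepSeek R1 is queried are not specified here, so the teacher is passed in as a plain function and the sketch runs without any API client.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Example:
    """One training example; field names are illustrative, not a published schema."""
    question: str                 # hypothetical: the problem statement
    human_cot: str                # a human expert's chain of thought
    final_answer: str             # the final answer
    r1_cot: Optional[str] = None  # synthetic CoT from the teacher, filled in later


def add_synthetic_cot(examples: List[Example],
                      teacher: Callable[[str], str]) -> List[Example]:
    """Return copies of `examples` with `r1_cot` filled in by `teacher`.

    `teacher` stands in for a call to DeepSeek R1 that maps a question to a
    reasoning chain; the originals are left unchanged (the dataclass is frozen).
    """
    return [replace(ex, r1_cot=teacher(ex.question)) for ex in examples]


# Usage with a stub teacher in place of a real model call.
data = [Example("What is 2 + 3?", "2 plus 3 is 5.", "5")]
augmented = add_synthetic_cot(data, teacher=lambda q: f"Let me think about: {q}")
print(augmented[0].r1_cot)  # Let me think about: What is 2 + 3?
```

Keeping the human CoT and the synthetic CoT side by side on the same record makes it easy to derive every training variant from one dataset.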

Then, we fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

- Direct Answer Only: Generate the final answer without revealing any reasoning.
- Human Expert CoT: Generate the final answer along with a reasoning chain resembling the human expert's.
- Synthetic R1 CoT: Generate the final answer together with DeepSeek R1's synthetic reasoning chain.

The table below summarizes average accuracy and reasoning length:

- Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
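The three training targets differ only in the reasoning the student is asked to produce before its answer. A minimal sketch of building the target string for each variant, assuming a hypothetical `Reasoning:`/`Answer:` template; the prompt and target format used in the experiment are not given here, and `Variant` and `build_target` are illustrative names.

```python
from enum import Enum


class Variant(Enum):
    DIRECT_ANSWER = "direct_answer"  # final answer only, no reasoning
    HUMAN_COT = "human_cot"          # human expert reasoning, then the answer
    R1_COT = "r1_cot"                # DeepSeek R1 synthetic reasoning, then the answer


def build_target(variant: Variant, human_cot: str, r1_cot: str, final_answer: str) -> str:
    """Build the supervised target string for one example.

    The "Reasoning:/Answer:" layout is a hypothetical template, not the
    format used in the study.
    """
    if variant is Variant.DIRECT_ANSWER:
        return f"Answer: {final_answer}"
    cot = human_cot if variant is Variant.HUMAN_COT else r1_cot
    return f"Reasoning: {cot}\nAnswer: {final_answer}"


# Usage: print the target each variant would be trained on for one example.
for v in Variant:
    print(v.value, "->", repr(build_target(v, "2 + 3 = 5.", "First, 2 + 3 = 5.", "5")))
```

Because the final answer is identical across variants, any accuracy difference between them is meant to come from the reasoning supervision alone.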

From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at boosting performance, albeit at a higher inference cost due to their greater length.
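The two quantities in this comparison, accuracy and reasoning length, can be tallied per variant roughly as follows. This is a sketch under stated assumptions: `EvalRecord`, `summarize`, and `relative_output_cost` are hypothetical helpers, reasoning length is counted in tokens, and inference cost is approximated as proportional to generated tokens, which ignores prompt tokens and per-request overhead. The records in the usage lines are toy values, not the study's results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class EvalRecord:
    correct: bool          # did the predicted final answer match the reference?
    reasoning_tokens: int  # number of tokens in the generated reasoning chain


def summarize(records: List[EvalRecord]) -> Tuple[float, float]:
    """Return (accuracy, average reasoning length in tokens) for one variant."""
    accuracy = sum(r.correct for r in records) / len(records)
    avg_len = mean(r.reasoning_tokens for r in records)
    return accuracy, avg_len


def relative_output_cost(avg_len: float, baseline_len: float, answer_tokens: int) -> float:
    """Ratio of generated tokens versus a baseline variant.

    A rough proxy for inference cost, assuming cost scales linearly with
    output tokens (a simplification).
    """
    return (avg_len + answer_tokens) / (baseline_len + answer_tokens)


# Toy records for illustration only; these are not the study's results.
human = [EvalRecord(True, 40), EvalRecord(False, 60)]
r1 = [EvalRecord(True, 300), EvalRecord(True, 500)]

h_acc, h_len = summarize(human)  # (0.5, 50)
r_acc, r_len = summarize(r1)     # (1.0, 400)
print(relative_output_cost(r_len, h_len, answer_tokens=10))
```

Under this linear-cost assumption, a variant whose reasoning chains are several times longer costs several times more to serve, which is the trade-off against higher accuracy described above.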

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please get in touch to explore options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can dramatically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine might just out-teach the human.