
And by a prudent flight and cunning save A life which valour could not, from the grave. A better buckler I can soon regain, But who can get another life again? Archilochus

Monday, January 27, 2025

On deepseek R1 - We So Meta!

"Necessity is the mother of invention"

Key Concepts from the video above

Chain-of-thought reasoning, reinforcement learning, and model distillation

Distillation:

The third important technique the DeepSeek researchers used with their R1 model is model distillation. The full DeepSeek model has 671 billion parameters, and to run it you pretty much need a couple of thousand GPUs at least, plus a very expensive computer to host the full model. So, to make it more accessible, they take the larger LLM and use it to teach a smaller LLM how it reasons and how it answers questions, so that the smaller LLM can perform at nearly the same level as the bigger one at a fraction of the parameter count, such as 7 billion parameters. In the paper, the DeepSeek researchers distilled from their DeepSeek model into Llama 3 as well as Qwen.

The idea is that the teacher uses chain-of-thought reasoning to generate many examples of itself answering questions, and those examples are given directly to the student as part of its prompt. The student is then expected to answer questions with accuracy similar to the larger model's. This makes the whole LLM ecosystem much more accessible to people who don't have as many resources.

The key insight from the paper is that the student model, during reinforcement learning training, actually outperforms the teacher model by a small margin, while requiring only a small fraction of the memory and storage. In the paper's experiments, the researchers found that these smaller distilled DeepSeek models outperform larger models like GPT-4o and Claude 3.5 Sonnet on math, coding, and scientific reasoning tasks.
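The teacher-to-student handoff described above can be sketched in a few lines. This is a toy illustration only: the teacher and student here are stand-in Python functions, not real LLMs, and the question and reasoning strings are made up. In practice the teacher would be a large model like DeepSeek-R1 and the student a small checkpoint (e.g. 7B Llama or Qwen) trained on the teacher's outputs.

```python
# Toy sketch of distillation: the teacher generates chain-of-thought
# examples, and those (question, reasoned answer) pairs become the
# material handed to the student. All names and strings here are
# illustrative stand-ins, not a real training pipeline.

def teacher_answer(question: str) -> str:
    """Stand-in for the large teacher model: answers a question with
    its chain-of-thought reasoning spelled out."""
    if question == "What is 12 * 11?":
        return "Reasoning: 12 * 11 = 12 * 10 + 12 = 120 + 12 = 132. Answer: 132"
    return "Reasoning: unknown. Answer: unknown"

def build_distillation_set(questions):
    """Collect teacher-generated examples; these are what the student
    sees (as prompt context or fine-tuning data)."""
    return [(q, teacher_answer(q)) for q in questions]

dataset = build_distillation_set(["What is 12 * 11?"])
for question, reasoned_answer in dataset:
    print(question, "->", reasoned_answer)
```

The point of the sketch is the direction of data flow: the expensive model does the reasoning once, and the cheap model learns from the written-out traces rather than from raw question/answer pairs.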

A meta-analysis is a statistical method that combines data from multiple studies to draw conclusions about a common research question. The results of a meta-analysis are often stronger than the results of any individual study.
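The "stronger than any individual study" claim has a concrete mechanism. In a simple fixed-effect meta-analysis, each study's effect estimate is weighted by the inverse of its variance, and the pooled estimate's variance ends up smaller than any single study's. A minimal sketch, with made-up study numbers:

```python
# Fixed-effect (inverse-variance) meta-analysis in miniature.
# The three studies below are hypothetical, purely for illustration.

def pooled_effect(effects, variances):
    """Combine per-study effect estimates by inverse-variance weighting.
    Returns (pooled estimate, pooled variance)."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    variance = 1.0 / sum(weights)  # always <= the smallest input variance
    return estimate, variance

effects = [0.30, 0.50, 0.40]      # hypothetical per-study effect sizes
variances = [0.04, 0.02, 0.08]    # hypothetical per-study variances

est, var = pooled_effect(effects, variances)
print(f"pooled estimate={est:.3f}, pooled variance={var:.4f}")
```

Here the pooled variance (about 0.011) is smaller than the smallest individual study variance (0.02), which is the formal sense in which combined results are "stronger" than any one study.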

So what's a meta-AI model? Harvesting the output of a "community" of AI agents?
