
I agree with this.

FWIW, many of the improvements in DeepSeek were already in other 'can run on your personal computer' AIs such as Meta's Llama. DeepSeek is actually very similar to Llama in efficiency, and people were already running Llama on home computers with M3s.

A couple of examples: Meta's multi-token prediction was specifically implemented as a huge efficiency improvement, and it was taken up by DeepSeek. REcurrent ADaptation (READ) was another big win by Meta that DeepSeek utilized. Multi-head Latent Attention is another technique, not pioneered by Meta but used by both DeepSeek and Llama.

Anyway, DeepSeek isn't some independent revolution out of nowhere. It's actually very, very similar to the existing state of the art and just bundles a whole lot of efficiency gains in one model. There's no secret sauce here. It's much better than what OpenAI has, but that's because OpenAI seem to have forgotten 'The Bitter Lesson'. They have been going at things in an extremely brute force way.

Anyway, why do I point out that DeepSeek is very similar to something like Llama? Because Meta is spending hundreds of billions on chips to run it. It's pretty damn efficient, especially compared to OpenAI, but Meta is still spending billions on datacenter build-outs.



> OpenAI seem to have forgotten 'The Bitter Lesson'. They have been going at things in an extremely brute force way.

Isn't the point of 'The Bitter Lesson' precisely that brute force wins in the end, and that hand-crafted optimizations like the ones you say Llama and DeepSeek use are bound to lose?


IMHO the TL;DR is that the wins always come from 'scaling search and learning'.

Any customisations that aren't related to the above are destined to be overtaken by someone who can improve the scaling of compute. OpenAI do not seem to be doing as much to improve the scaling of compute in software terms (they are doing a lot in hardware terms, admittedly). They have models at the top of the charts for various benchmarks right now, but it feels like a temporary win that comes from chasing those benchmarks rather than from focusing on scaling compute.



