
Meta is in full panic last I heard. They have amassed a collection of pseudo experts there to collect their checks. Yet, Zuck wants to keep burning money on mediocrity. I’ve yet to see anything of value in terms of products out of Meta.


DeepSeek was built on the foundations of public research, a major part of which is the Llama family of models. Prior to Llama, open-weights LLMs were considerably less performant; without Llama we might not have gotten Mistral, Qwen, or DeepSeek. This isn't meant to diminish DeepSeek's contributions, however: they've been doing great work on mixture-of-experts models and really pushing the community forward on that front. And, obviously, they've achieved incredible performance.

Llama models are also still best in class for specific tasks that require local data processing. They also maintain positions in the top 25 of the lmarena leaderboard (for what that's worth these days with suspected gaming of the platform), which places them in competition with some of the best models in the world.

But, going back to my first point, Llama set the stage for almost all open weights models after. They spent millions on training runs whose artifacts will never see the light of day, testing theories that are too expensive for smaller players to contemplate exploring.

Pegging Llama as mediocre, or a waste of money (as implied elsewhere), feels incredibly myopic.


As far as I know, Llama's architecture has always been quite conservative: it has not changed that much since LLaMA. Most of their recent gains have been in post-training.

That's not to say their work is unimpressive or not worthy - as you say, they've facilitated much of the open-source ecosystem and have been an enabling factor for many - but it's more that that work has been in making it accessible, not necessarily pushing the frontier of what's actually possible, and DeepSeek has shown us what's possible when you do the latter.


So Zuck had at least one good idea, useful for all of us!


I never said Llama is mediocre. I said the teams they put together are full of people chasing money. And the billions Meta is burning are going straight to mediocrity. They’re bloated. And we know exactly why Meta is doing this and it’s not because they have some grand scheme to build up AI. It’s to keep these people away from their competition. Same with billions in GPU spend. They want to suck up resources away from competition. That’s their entire plan. Do you really think Zuck has any clue about AI? He was never serious and instead built wonky VR prototypes.


> And we know exactly why Meta is doing this and it’s not because they have some grand scheme to build up AI. It’s to keep these people away from their competition

I don't see how you can confidently say this when AI researchers and engineers are remunerated very well across the board and people are moving across companies all the time. If the plan is as you described it, it is clearly not working.

Zuckerberg seems confident they'll have an AI equivalent of a mid-level engineer later this year. Can you imagine how much money Meta can save by replacing a fraction of its (well-paid) engineers with fixed capex + an electric bill?


this is the same magical thinking Uber had when they were gonna have self driving cars replace their drivers


> I said the teams they put together are full of people chasing money.

Does it mean they are mediocre? It's not like OpenAI or Anthropic pay their engineers peanuts. Competition to attract top talent is fierce.


In contrast to the social media industry (or word processors or mobile phones), the market for AI solutions seems not to have an inherent moat or network effects that keep users stuck with the market leader.

Rather, with AI, capitalism seems to be working at its best, with competitors to OpenAI building solutions which take market share and improve products. Zuck can try monopoly plays all day, but I don't think this will work this time.


I guess all that leetcoding and stack ranking didn't in fact produce "the cream of the crop"...


There's an interesting tweet here from someone who used to work at DeepSeek, which describes their hiring process and culture. No mention of LeetCoding for sure!

https://x.com/wzihanw/status/1872826641518395587


they almost certainly ask coding/technical questions. the people doing this work are far beyond being gatekept by leetcode

leetcode is like HN’s “DEI” - something they want to blame everything on


they recruit from top Computer Science programs, the top of the class MS and PhD students


what is leetcode


a style of coding challenges asked in interviews for software engineers, generally focused on algorithmic thinking


It’s also known for being not reflective of the actual work that most companies do, especially the companies that use it.


I recently finished an internship for my bachelor's at the Italian Research Council, where I had to deal with federated learning, and it was hard for my research supervisors as well. However, I sort of did a good job. I'm fairly sure I wouldn't be able to solve many leetcode exercises, since it's something that I've never had to deal with aside from university tasks... And I made a few side projects for myself as well.


leetcode.com - If you interview at Meta, these are the questions they'll ask you


Did you read the tweet? It doesn't sound that way to me. They hire specialized talent (note especially the "Know-It-All" part)


The DeepSeek team is mostly quants, from my understanding, which explains why they were able to pull this off. Some of the best coders I’ve met have been quants.


the real bloat is in managers, Sr. Managers, Directors, Sr. Directors, and VPs, not the engineers.

At least engineers have some code to show for it, unlike the managerial class...


It produces the cream of the leetcoding stack ranking crop.


You get what you measure.


You sound extremely satisfied by that. I'm glad you found a way to validate your preconceived notions on this beautiful day. I hope your joy is enduring.


>They have amassed a collection of pseudo experts there to collect their checks

LLaMA was huge, Byte Latent Transformer looks promising... absolutely no idea where you got this idea from.


The issue with Meta is that the LLaMA team doesn't incorporate any of the research the other teams produce.


I would think Meta - who open source their model - would be less freaked out than those others that do not.


The criticism seems to mostly be that Meta maintains a very expensive cost structure and a fat organisation in AI. While Meta can afford to do this, if smaller orgs can produce better results it means Meta is paying a lot for nothing. Meta shareholders now need to ask how many non-productive people Meta is employing and whether Zuck is in control of the costs.


That makes sense. I never could see the real benefit for Meta in paying a lot to produce these open source models (I know the typical arguments: attracting talent, goodwill, etc). I wonder how much of it is simply that LeCun is interested in advancing the science and convinced Zuck this is good for the company.


LeCun doesn't run their AI team - he's not in LLaMA's management chain at all. He's just especially public.


Yep - Meta's FAIR (Facebook AI Research) and GenAI (LLaMA) groups are separate, and LeCun is part of FAIR. The head of FAIR is Joelle Pineau.


Meta’s AI org does a heck of a lot more than produce LLMs. R&D on ads targeting and ranking more than pays for itself.


It is great to see that this is the result of spending a lot in hardware while cutting costs in software development :) Well deserved.


They got momentarily leap-frogged, which is how competition is supposed to work!


What I don't understand is why Meta needs so many VPs and directors. Shouldn't the model R&D be organized holacratically? The key is to experiment with as many ideas as possible anyway. Those who can't experiment or code should be kept to a minimum in such a fast-paced area.


bloated PyTorch general purpose tooling aimed at data-scientists now needs a rethink. Throwing more compute at the problem was never a solution to anything. The silo’ing of the cs and ml engineers resulted in bloating of the frameworks and tools, and inefficient use of hw.

DeepSeek shows impressive e2e engineering from the ground up and under constraints, squeezing every ounce of hardware and network performance.


> I’ve yet to see anything of value in terms of products out of Meta.

Quest, PyTorch?



