Meta is in full panic last I heard. They have amassed a collection of pseudo exp...

popinman322 · on Jan 25, 2025

DeepSeek was built on the foundations of public research, a major part of which is the Llama family of models. Prior to Llama, open-weights LLMs were considerably less performant; without Llama we might not have gotten Mistral, Qwen, or DeepSeek. This isn't meant to diminish DeepSeek's contributions, however: they've been doing great work on mixture-of-experts models and really pushing the community forward on that front. And, obviously, they've achieved incredible performance.

Llama models are also still best in class for specific tasks that require local data processing. They also maintain positions in the top 25 of the lmarena leaderboard (for what that's worth these days with suspected gaming of the platform), which places them in competition with some of the best models in the world.

But, going back to my first point, Llama set the stage for almost all open weights models after. They spent millions on training runs whose artifacts will never see the light of day, testing theories that are too expensive for smaller players to contemplate exploring.

Pegging Llama as mediocre, or a waste of money (as implied elsewhere), feels incredibly myopic.

Philpax · on Jan 25, 2025

As far as I know, Llama's architecture has always been quite conservative: it has not changed that much since LLaMA. Most of their recent gains have been in post-training.

That's not to say their work is unimpressive or not worthy - as you say, they've facilitated much of the open-source ecosystem and have been an enabling factor for many - but it's more that that work has been in making it accessible, not necessarily pushing the frontier of what's actually possible, and DeepSeek has shown us what's possible when you do the latter.

wiz21c · on Jan 27, 2025

So Zuck had at least one good idea, useful for all of us!

lvl155 · on Jan 26, 2025

I never said Llama is mediocre. I said the teams they put together are full of people chasing money. And the billions Meta is burning are going straight to mediocrity. They’re bloated. And we know exactly why Meta is doing this and it’s not because they have some grand scheme to build up AI. It’s to keep these people away from their competition. Same with billions in GPU spend. They want to suck up resources away from competition. That’s their entire plan. Do you really think Zuck has any clue about AI? He was never serious and instead built wonky VR prototypes.

sangnoir · on Jan 26, 2025

> And we know exactly why Meta is doing this and it’s not because they have some grand scheme to build up AI. It’s to keep these people away from their competition

I don't see how you can confidently say this when AI researchers and engineers are remunerated very well across the board and people are moving across companies all the time. If the plan is as you described it, it is clearly not working.

Zuckerberg seems confident they'll have an AI equivalent of a mid-level engineer later this year. Can you imagine how much money Meta can save by replacing a fraction of its (well-paid) engineers with fixed capex plus an electric bill?

wonnage · on Jan 26, 2025

this is the same magical thinking Uber had when they were gonna have self-driving cars replace their drivers

yodsanklai · on Jan 26, 2025

> I said the teams they put together are full of people chasing money.

Does it mean they are mediocre? It's not like OpenAI or Anthropic pay their engineers peanuts. Competition to attract top talent is fierce.

oezi · on Jan 26, 2025

In contrast to the social media industry (or word processors or mobile phones), the market for AI solutions seems not to have much of an inherent moat or network effects that keep users stuck with the market leader.

Rather, with AI, capitalism seems to be working at its best, with competitors to OpenAI building solutions that take market share and improve products. Zuck can try monopoly plays all day, but I don't think this will work this time.

corimaith · on Jan 25, 2025

I guess all that leetcoding and stack ranking didn't in fact produce "the cream of the crop"...

HarHarVeryFunny · on Jan 25, 2025

There's an interesting tweet here from someone who used to work at DeepSeek, which describes their hiring process and culture. No mention of LeetCoding for sure!

https://x.com/wzihanw/status/1872826641518395587

whimsicalism · on Jan 25, 2025

they almost certainly ask coding/technical questions. the people doing this work are far beyond being gatekept by leetcode

leetcode is like HN’s “DEI” - something they want to blame everything on

slt2021 · on Jan 25, 2025

they recruit from top Computer Science programs, the top of the class MS and PhD students

dmix · on Jan 26, 2025

what is leetcode

whimsicalism · on Jan 26, 2025

a style of coding challenges asked in interviews for software engineers, generally focused on algorithmic thinking

angoragoats · on Jan 26, 2025

It’s also known for being not reflective of the actual work that most companies do, especially the companies that use it.

amarcheschi · on Jan 26, 2025

I recently finished an internship for my bachelor's at the Italian Research Council, where I had to deal with federated learning, and it was hard for my research supervisors as well. However, I sort of did a good job. I'm fairly sure I wouldn't be able to solve many leetcode exercises, since it's something that I've never had to deal with aside from university tasks... And I made a few side projects for myself as well

strictnein · on Jan 26, 2025

leetcode.com - If you interview at Meta, these are the questions they'll ask you

tempaccount420 · on Jan 26, 2025

Did you read the tweet? It doesn't sound that way to me. They hire specialized talent (note especially the "Know-It-All" part)

lvl155 · on Jan 25, 2025

The DeepSeek team is mostly quants, from my understanding, which explains why they were able to pull this off. Some of the best coders I’ve met have been quants.

slt2021 · on Jan 25, 2025

the real bloat is in managers, Sr. Managers, Directors, Sr. Directors, and VPs, not the engineers.

At least engineers have some code to show for it, unlike the managerial class...

omgwtfbyobbq · on Jan 25, 2025

It produces the cream of the leetcoding stack ranking crop.

brookst · on Jan 25, 2025

You get what you measure.

rockemsockem · on Jan 25, 2025

You sound extremely satisfied by that. I'm glad you found a way to validate your preconceived notions on this beautiful day. I hope your joy is enduring.

fngjdflmdflg · on Jan 26, 2025

>They have amassed a collection of pseudo experts there to collect their checks

LLaMA was huge, and Byte Latent Transformer looks promising. Absolutely no idea where you got this idea from.

astrange · on Jan 26, 2025

The issue with Meta is that the LLaMA team doesn't incorporate any of the research the other teams produce.

ks2048 · on Jan 25, 2025

I would think Meta - who open source their model - would be less freaked out than those others that do not.

miohtama · on Jan 25, 2025

The criticism seems to be mostly that Meta maintains a very expensive cost structure and a fat organisation in AI. While Meta can afford to do this, if smaller orgs can produce better results, it means Meta is paying a lot for nothing. Meta shareholders now need to ask how many non-productive people Meta is employing and whether Zuck is in control of the costs.

ks2048 · on Jan 25, 2025

That makes sense. I never could see the real benefit for Meta in paying a lot to produce these open source models (I know the typical arguments: attracting talent, goodwill, etc). I wonder how much of it is simply that LeCun is interested in advancing the science and convinced Zuck this is good for the company.

astrange · on Jan 26, 2025

LeCun doesn't run their AI team - he's not in LLaMA's management chain at all. He's just especially public.

HarHarVeryFunny · on Jan 26, 2025

Yep - Meta's FAIR (Facebook AI Research) and GenAI (LLaMA) groups are separate, and LeCun is part of FAIR. The head of FAIR is Joelle Pineau.

kevinventullo · on Jan 26, 2025

Meta’s AI org does a heck of a lot more than produce LLMs. R&D on ads targeting and ranking more than pays for itself.

meiraleal · on Jan 25, 2025

It is great to see that this is the result of spending a lot on hardware while cutting costs in software development :) Well deserved.

jiggawatts · on Jan 25, 2025

They got momentarily leap-frogged, which is how competition is supposed to work!

hintymad · on Jan 25, 2025

What I don't understand is why Meta needs so many VPs and directors. Shouldn't the model R&D be organized holacratically? The key is to experiment with as many ideas as possible anyway. The number of people who can't experiment or code should be kept minimal in such a fast-paced area.

bwfan123 · on Jan 26, 2025

Bloated, general-purpose PyTorch tooling aimed at data scientists now needs a rethink. Throwing more compute at the problem was never a solution to anything. The siloing of CS and ML engineers resulted in bloated frameworks and tools, and inefficient use of hardware.

DeepSeek shows impressive end-to-end engineering from the ground up and under constraints, squeezing every ounce of hardware and network performance.

amelius · on Jan 26, 2025

> I’ve yet to see anything of value in terms of products out of Meta.

Quest, PyTorch?