I have to wonder why the authors skipped the potential solution of removing cont...

commonsearch · on Dec 26, 2021

It's not that it was made complicated. It's trading one type of complexity for another. I think you're underestimating the costs of having a one-off team run their own service and hardware. There is also an opportunity cost for those people wasting time running hardware, and for the support teams involved in a unique service. They could save a couple million dollars, or they could work on projects that enable much more growth. Twitter has $1.2b in revenue in a quarter.

marcinzm · on Dec 26, 2021

Isn't the problem then that each host would be underutilized on average by a lot? It has X CPUs and the service can never use more than X CPUs. If a service has any spiky loads then it'd need to be overprovisioned on CPU to handle them at good latency.

That seems significantly more expensive at scale.
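To put rough numbers on that, here is a minimal sketch of the arithmetic, assuming a dedicated host has to be sized for the service's peak. The demand figures and the `dedicated_host_utilization` helper are made up for illustration and do not come from the article.

```python
import math


def dedicated_host_utilization(demand_cpus, headroom=1.0):
    """Return (provisioned CPUs, average utilization) for a peak-sized host.

    demand_cpus: CPU demand samples over time (hypothetical values).
    headroom: multiplier applied to the peak before rounding up.
    """
    provisioned = math.ceil(max(demand_cpus) * headroom)
    average = sum(demand_cpus) / len(demand_cpus)
    return provisioned, average / provisioned


# Hypothetical service: mostly 4 CPUs, with one spike to 16.
demand = [4, 4, 4, 4, 4, 4, 4, 16]
provisioned, utilization = dedicated_host_utilization(demand)
print(f"provisioned={provisioned} CPUs, average utilization={utilization:.0%}")
```

With these invented samples the host is sized at 16 CPUs but averages about a third of that, which is the gap the comment is pointing at.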

aetimmes · on Dec 27, 2021

Because then you have a snowflake service with a non-standard environment and still haven't solved the problem for all the other services that are still on Mesos.

toast0 · on Dec 26, 2021

I suspect it's the temptation of oversubscription. If service A and service B each use 50% of a server, it's so tempting to put them both on one server to maximize efficiency. Even if sometimes you need 4 servers running A and B to serve the load that can be managed with one server each of A and B.

Or if you've broken things up into small pieces that aren't big enough to use a whole server, that can feel inefficient as well.

xorcist · on Dec 26, 2021

> that would add to ops costs for this service,

Wouldn't fewer moving parts mean lower operational costs?

Kalium · on Dec 26, 2021

Only to the extent that cost is a function of complexity. This isn't always the case. In a case like this, going to bare metal likely brings with it significant drawbacks in organizational complexity, orchestrational complexity, and more while allowing for much better utilization of memory and cpu resources.

Telling someone whose car is making some funny noises that it's simpler to go back to horse-and-buggy times would both increase costs and decrease the number of user-serviceable moving parts. There's some significant overhead attached.

xorcist · on Dec 26, 2021

Bare metal has nothing to do with this. It isn't even touched upon in the article. It discusses a scheduler, and the parent post suggests exempting these kinds of jobs from the scheduler in question, which they obviously aren't a very good product fit for.

Should you wish to really stretch that car analogy, maybe a bit more appropriate than a horse would be: If you aren't happy that your travel agency isn't booking your taxi trips in time, try booking with the taxi company directly.

mabbo · on Dec 26, 2021

Yes and no.

It would lower the operations costs of hardware, hopefully (that's the entire goal of this article), but you'd need more people resources to manage it, I would guess. Mesos and containers automate a lot of thinking work.

Kalium · on Dec 26, 2021

Once you move to hosts dedicated to specific services, as seems to be the suggestion here, you also might increase the overall hardware cost across your set of services. The cost for some of the services might decrease, though.
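One way to see why the total can go up: with dedicated hosts each service is sized for its own peak, while a shared pool only has to cover the peak of the combined load. The sketch below uses invented demand numbers and a hypothetical `cores_needed` helper, and it deliberately ignores any interference between co-located services, so treat it as an illustration of the arithmetic rather than a capacity model.

```python
def cores_needed(demands):
    """Compare dedicated vs shared sizing for per-service demand series.

    demands: dict mapping service name to equal-length lists of CPU demand.
    Returns (dedicated, shared), where dedicated is the sum of each
    service's peak and shared is the peak of the summed demand.
    """
    dedicated = sum(max(series) for series in demands.values())
    combined = [sum(values) for values in zip(*demands.values())]
    shared = max(combined)
    return dedicated, shared


# Hypothetical services whose spikes happen at different times.
demands = {
    "service_a": [2, 8, 2, 2],
    "service_b": [2, 2, 8, 2],
    "service_c": [2, 2, 2, 8],
}
dedicated, shared = cores_needed(demands)
print(f"dedicated={dedicated} cores, shared={shared} cores")
```

If the spikes all landed in the same interval, the two numbers would be equal, so the saving depends entirely on how correlated the services' loads are.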