
I have to wonder why the authors skipped the potential solution of removing containers and mesos from the equation entirely.

If you gave this service a dedicated, non-co-located fleet, running the JVM directly on the OS, and ran basic autoscaling of the number of hosts, you'd eliminate a huge number of the moving parts of the system that are causing these issues.

Yes, that would add to ops costs (edit: human ops costs) for this service, but when you're spending 8 figures per year on it, clearly the budget is available.

To quote the great philosopher Avril Lavigne: "Why'd you have to go and make things so complicated?"



It's not that it was made complicated. It's trading one type of complexity for another. I think you're underestimating the costs of having a one-off team run their own service and hardware. There is also an opportunity cost for the people spending time running hardware, and for the support teams involved in a unique service. They could save a couple of million dollars, or they could work on projects that enable much more growth. Twitter has $1.2b in revenue in a quarter.


Isn't the problem then that each host would be underutilized on average by a lot? It has X CPUs and the service can never use more than X CPUs. If a service has any spiky loads, then it'd need to be overprovisioned on CPU to handle them at good latency.

That seems significantly more expensive at scale.
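
To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (average demand, peak demand, cores per host) is made up for illustration and does not come from the article; the function names are hypothetical too.

```python
import math

def hosts_needed(peak_cores: float, cores_per_host: int) -> int:
    """Hosts a dedicated fleet needs so that peak demand still fits."""
    return math.ceil(peak_cores / cores_per_host)

def average_utilization(avg_cores: float, peak_cores: float, cores_per_host: int) -> float:
    """Fraction of the fleet's cores that are busy on average when sized for peak."""
    fleet_cores = hosts_needed(peak_cores, cores_per_host) * cores_per_host
    return avg_cores / fleet_cores

# Assumed numbers: 200 cores of average demand, spikes to 800 cores, 32-core hosts.
hosts = hosts_needed(peak_cores=800, cores_per_host=32)
util = average_utilization(avg_cores=200, peak_cores=800, cores_per_host=32)
print(f"{hosts} hosts, {util:.0%} average utilization")
```

With those assumed numbers the fleet sits three-quarters idle on average. Autoscaling narrows the gap only as far as new hosts can come up faster than the spikes arrive.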


Because then you have a snowflake service with a non-standard environment and still haven't solved the problem for all the other services that are still on Mesos.


I suspect it's the temptation of oversubscription. If service A and service B each use 50% of a server, it's so tempting to put them both on one server to maximize efficiency. Even if you sometimes end up needing 4 servers running both A and B to serve a load that one server each of A and B could manage.

Or if you've broken things up into small pieces that aren't big enough to use a whole server, that can feel inefficient as well.
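
A minimal sketch of why the temptation exists, using invented load traces (not data from the article). It only compares the capacity needed when each service is sized for its own peak against the capacity needed for the combined peak. It deliberately ignores any interference between co-located services, which is exactly the catch described above.

```python
# Hypothetical CPU demand (in cores) per time interval for two services.
load_a = [10, 40, 10, 10]
load_b = [10, 10, 40, 10]

def dedicated_cores(*loads):
    """Cores needed if every service gets its own fleet sized for its own peak."""
    return sum(max(load) for load in loads)

def shared_cores(*loads):
    """Cores needed if all services share one pool sized for the combined peak."""
    return max(sum(point) for point in zip(*loads))

print(dedicated_cores(load_a, load_b))  # 80
print(shared_cores(load_a, load_b))     # 50
```

The saving comes entirely from the peaks not lining up. If both services spike in the same interval, the two numbers are equal and sharing buys nothing.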


> that would add to ops costs for this service,

Wouldn't fewer moving parts mean lower operational costs?


Only to the extent that cost is a function of complexity. This isn't always the case. In a case like this, going to bare metal likely brings with it significant drawbacks in organizational complexity, orchestrational complexity, and more while allowing for much better utilization of memory and cpu resources.

Telling someone whose car is making some funny noises that it's simpler to go back to horse-and-buggy times would both increase costs and decrease the number of user-serviceable moving parts. There's some significant overhead attached.


Bare metal has nothing to do with this. It isn't even touched upon in the article. It discusses a scheduler, and the parent post suggests exempting these kinds of jobs from the scheduler in question, which they obviously aren't a very good product fit for.

Should you wish to really stretch that car analogy, maybe a bit more appropriate than a horse would be: If you aren't happy that your travel agency isn't booking your taxi trips in time, try booking with the taxi company directly.


Yes and no.

It would lower the operations costs of hardware, hopefully (that's the entire goal of this article) but you'd need more people resources to manage it, I would guess. Mesos and containers automate a lot of thinking work.


Once you move to hosts dedicated to specific services, as seems to be the suggestion here, you also might increase the overall hardware cost across your set of services. The cost for some of the services might decrease, though.



