Oct 20

Job description

Mission

To reinforce our technical prowess, we are looking to grow our operations team. If you’re looking for an exciting, high-growth opportunity with an award-winning, cutting-edge company, this could be just the job for you.

Platform.sh (https://platform.sh) is looking for an Operations and Service Reliability Engineer for its PaaS solution: someone with a taste for Python and Go, a strong understanding of Linux systems, and a real hunger for the challenges of building robust, distributed systems.

Platform.sh is a PaaS shrouded in a lot of black magic (we can consistently clone a whole running cluster, with its state, databases, and indexes, in a matter of seconds). We want to get this down to hundreds of milliseconds. Interested? There is more...

Our external API is pure Hypermedia REST + OAuth on top of Pyramid. It mechanizes the Git layer and needs more features.

From the same manifest, we can consistently generate a Docker container, an LXC container, or VM disk images (AWS, Azure, OpenStack). We want more targets.

We probably have the highest container density in the industry. We need to get it higher.

We support Python, Ruby, Node.js, PHP, Java, and .NET. Time to roll out Elixir, of course (and Rust. We need Rust).

We need more auto-healing on the high-availability clusters. We need more performance out of our multi-protocol SSH proxy. We need work on our Ceph implementation. We need to get the Debian package generation streamlined and faster. We need… great ideas on how to make Platform.sh even better.

Directly reporting to our VP of Infrastructure and in close interaction with our Engineering and Customer Support teams, you will be responsible for:

Note: We don't like stress, so we build everything to be robust and resilient, but stuff does break. This is a role with on-call duties. If pager duty fills you with dread... well, this might not be a fit.