Sketch is where great design begins for designers and their teams around the world. Would you like to join us and help take the infrastructure that supports us to the next level? We're looking to expand our team with full-time Site Reliability Engineers.
As a Site Reliability Engineer at Sketch, you will focus on shaping our cloud infrastructure and making sure all the pieces work well together: development environments, metrics processing and observability, security policies, network design, deployment strategies, high availability, etc.
You will work closely with backend, frontend and Mac developers to guarantee platform stability, and actively participate in the architecture and design of new projects.
At Sketch, we work with a unique technology blend: a deeply interconnected platform consisting of a Linux-based cloud platform and our award-winning macOS application.
Our cloud stack backend is based on containerised and serverless services, built on Elixir and Go and exposing GraphQL and REST APIs, with most pieces deployed on AWS and automated through Terraform. Our backend services persist data in PostgreSQL databases and other minor services.
We use Chef for configuration management on the rare occasions when we need to configure instances for non-cloud services, and Python or Go for small programs or scripts (e.g. to migrate data, run recurring jobs or automate operations).
Our monitoring, metrics and alerting stack includes ELK and Datadog, and we use CircleCI and GitHub Actions for CI/CD and testing.
Due to our unique technology blend, you will find plenty of interesting challenges when working as an SRE at Sketch, such as:
Managing and helping autoscale our Renderfarm that currently processes more than 100k documents daily, using M1 Mac Minis.
Helping to design, maintain and battle-test our real-time collaboration features, making sure we offer an incredible experience for our customers by extracting every ounce of performance from all layers of the stack: from HTTP requests, caching and WebSockets, to backend autoscaling and database performance.
Improving our continuous deployment pipeline by designing and setting up fully automated ephemeral test environments comprising all the different application and cloud pieces.
Setting up, debugging and owning enterprise-oriented features such as Single Sign-On and full-featured Sketch document embedding.
Working towards achieving full platform observability through curated metrics and actionable alerts, using the best open-source tools for the job.
You care about security, code quality, scalability, performance, and simplicity. Above all, you seek operational excellence and apply the best engineering practices possible. Not everything that you or your team do can be perfect, but you make sure that you always know the trade-offs. You back your decisions with arguments. You don't care for hype and always try to find the best solution and technology for the job and its context.
Sketch is a 100% remote company, and your colleagues are distributed around the globe. Being remote adds great flexibility, and helps us build a more diverse team. We put respect for each other above everything else.
Besides being remote, we work asynchronously as often as we can. This means that our team communicates mostly using Slack and GitHub. When we need to, we also have video calls.
Our Technology team has more than 50 people today, split between Mac, Backend, Frontend, Infrastructure and QA. In particular, the Infrastructure team has 5 members. We work in multidisciplinary squads: people from different roles, including members of the Product team, work together on solving problems and delivering functionality for our customers.
Professional experience managing Linux-based and cloud-native distributed systems
Experience coding with high-level programming languages like Python for technical operations tasks and services automation
Experience with Infrastructure as Code tools such as Terraform, and configuration management tools to automate manual operations
A good understanding of the HTTP protocol and the behavior of production web services
Excellent communication skills and good written and spoken English
Based in the EU or the UK
Full-time employment, with a flexible schedule
As many vacation days as you need
Whatever training you need to develop in your job
Private healthcare and gym reimbursements
The laptop you need
Paid family leave
An annual company meetup
Even if you're not able to tick all of these boxes, we would still love to hear from you.