Senior Site Reliability Engineer | Orkes · Teeming.ai
Orkes
Scale your distributed applications, modernize your workflows for durability, and protect against software failures and downtimes with Orkes, the leading orchestration platform for developers.
What they do: Managed orchestration platform (Conductor-compatible) for workflows and microservices with AI/agentic features
Founded by: Creators of Netflix Conductor (Jeu George, Viren Baraiya, Boney Sekh, Dilip Lukose)
Funding: At least $29.3M (Seed $9.3M; Series A $20M)
Employees: Approximately 74
Company Overview
Problem Domain
Workflow and microservices orchestration; durability and observability of distributed workflows; AI/agentic workflow orchestration.
Industry
Software Development
Funding Track Record
Seed (2022-02-28): $9,300,000. Announced exit from stealth with $9.3M.
Series A (2024-02-21): $20,000,000. Participation from Battery Ventures and Vertex Ventures U.S.
Investor Signal
“Backed by institutional venture investors including Nexus Venture Partners, Battery Ventures, and Vertex Ventures U.S.”
Join the Team
Senior Site Reliability Engineer
Remote • Remote (US), US
Related Companies
Company | HQ | Industry | Total Funding
Temporal Technologies | 🇺🇸 US | Information Technology, Internet Services, Software | $706M
LiveKit | 🌍 Remote | Data and Analytics, DeepTech, Hardware, Information Technology, Internet Services, Software | -
Baseten | 🇺🇸 US | Software | $585M
Kestra | 🇫🇷 FR | - | -
Distyl | 🇺🇸 US | Data and Analytics, DeepTech, Information Technology, Software | $202M
Who you are
5+ years of SRE/DevOps experience with production systems at scale
Ability to code in a statically compiled language, preferably Java. By definition, an SRE is a software engineer, not a sysadmin
Deep understanding of distributed systems, microservices, and cloud-native architecture
Experience with REST, gRPC, and asynchronous messaging systems (Kafka, RabbitMQ, etc.)
Strong problem-solving skills and a passion for learning new technologies
A collaborative mindset—we win as a team
Expert-level knowledge of at least one cloud provider (AWS, GCP, or Azure), including its core services, networking, and security models, plus exposure to the other two
Strong programming skills in Java, Go, or similar languages – you're a software engineer first
Deep understanding of containerization (Docker, Kubernetes), orchestration across cloud environments, and the corresponding security hardening
Experience with Infrastructure as Code (Terraform, Pulumi) for multi-cloud deployments
Proficiency with monitoring tools (Prometheus, Grafana, DataDog, New Relic) and log aggregation systems
Solid grasp of networking concepts including load balancing, CDNs, DNS, and security best practices
Experience with service mesh technologies (Istio, Linkerd, Consul Connect)
Knowledge of database administration across cloud-native and traditional systems
Familiarity with chaos engineering and disaster recovery testing
Understanding of compliance frameworks (SOC2, PCI DSS, HIPAA) in multi-cloud environments
Experience with orchestration engines or workflow systems (e.g., Conductor, Camunda, Temporal)
Familiarity with Kubernetes and cloud-native environments (AWS/GCP/Azure)
What the job involves
Benefits
Fully Distributed: We believe in a remote-first, truly distributed workforce
Stock Options: Our compensation packages can include stock options
Medical, Dental, Vision: Comprehensive health insurance, for you and your loved ones
Flexible PTO: Take time off to rest, relax, and recharge with flexible PTO
Flexible Work Environment: We offer flexible working hours and other arrangements to match your needs
We’re hiring a Senior Site Reliability Engineer to help us evolve and scale our platform. If you’re passionate about clean, efficient architecture, love solving tough distributed systems challenges, and thrive in an environment where your ideas actually shape the product, then we’d love to talk.
We're building the next generation of resilient, scalable infrastructure that powers millions of users worldwide. Our platform operates across multiple cloud environments, ensuring 99.99% uptime while handling massive traffic spikes and complex distributed workloads.
We believe that reliability isn't just about keeping the lights on; it's about creating systems so robust and well-designed that our engineering teams can innovate fearlessly, knowing their applications will scale seamlessly and recover gracefully from any failure.
Write clean, maintainable code and champion engineering best practices for infrastructure automation
Design and implement multi-cloud infrastructure spanning AWS, GCP, Azure, and hybrid environments
Solve complex problems in distributed computing and event-driven systems, including designing high availability architectures
Develop sophisticated monitoring and alerting systems that provide deep visibility into distributed applications
Collaborate closely with product, design, and engineering teammates to ship high-impact features
Automate everything – from infrastructure provisioning to incident response and capacity planning
Lead incident response and conduct thorough post-mortems to continuously improve system reliability
Collaborate with engineering teams to embed reliability principles into application design from day one
Optimize costs across multi-cloud deployments while maintaining performance and reliability standards
Drive technical discussions and decisions that influence the future of the platform