Site Reliability Engineer | Runloop AI · Teeming.ai
Runloop AI
Runloop AI is a platform designed to build, test, and scale AI-powered software engineering tools by providing secure, scalable development environments, advanced code understanding tools, and AI…
What they do: Enterprise-grade platform to build, benchmark, and deploy AI-powered software-engineering agents (Devboxes, Axons, Blueprints, Snapshots, Benchmarks).
Founded / CEO: Jonathan Wall — founder and CEO.
Funding: $7.0M seed announced July 2025 (led by The General Partnership, participation from Blank Ventures).
Compliance & security: Positions platform for SOC2, GDPR, HIPAA compliance and VPC deployment for sandboxed execution of AI-generated code.
Team size (approx.): About 14 employees.
Company Overview
Problem Domain
Productionizing AI agents for software engineering—secure execution, reproducible benchmarking, and performance monitoring of AI-generated code.
Industry
Developer tooling / AI infrastructure
Funding Track Record
Seed · 2025-07-30
$7.0M
Round included participation from Blank Ventures.
Investor Signal
“Backed by early-stage investors (The General Partnership lead, Blank Ventures participating) indicating seed-stage VC support for infrastructure-focused AI startups.”
Founders
What we do
Join the Team
Site Reliability Engineer
Remote • US
About Runloop
Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform enables teams to experiment, iterate, and deploy their projects without the friction of environment setup and dependencies. We are a small but mighty team dedicated to building a rock-solid platform that empowers innovation.
The Role
We're looking for a skilled and passionate Site Reliability Engineer to join our team. As an SRE, you'll be responsible for the reliability, observability, performance, and security of our core platform—the very foundation on which our users build their futures. You'll work closely with our engineering team to develop and maintain the systems that power our code sandboxes, ensuring a seamless and stable experience for our customers. This is a critical role that blends a deep understanding of operations with a software engineering mindset.
Responsibilities
Qualifications
Proven experience as an SRE, DevOps Engineer, or similar role.
Strong programming skills in languages like Python or Go.
Deep expertise in containerization technologies such as Docker and Kubernetes.
Experience with cloud infrastructure and tools like Terraform and/or Pulumi.
Familiarity with monitoring and alerting tools like Prometheus, Grafana, or Datadog.
Bonus Points
Experience with chaos engineering techniques, front-end observability tools (e.g., Sentry, RUM, synthetic monitoring), or building CI/CD pipelines for front-end delivery.
Benefits
Competitive salary and equity.
Comprehensive health, dental, and vision insurance for you and your dependents.
Opportunity to work on cutting-edge AI technology and make a real impact on the future of software engineering.
Free lunch and snacks.
Location:
In office 4 days a week in San Francisco, optional 1 day a week WFH
Join Us
If you're excited about shaping the future of AI-driven software engineering and empowering developers to build the next generation of coding tools, we want to hear from you. Join Runloop and be at the forefront of the AI revolution in software development.
Runloop is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation, gender identity or any other characteristic protected by law.
Responsibilities
Design and maintain our production infrastructure on cloud platforms like AWS, GCP, or Azure.
Monitor and respond to system alerts and incidents using Grafana and Prometheus, ensuring high availability and a secure environment for our users' code.
Collaborate with developers to ensure new features and services are designed with scalability and reliability in mind.
Troubleshoot and resolve complex issues related to our infrastructure, networking, and the sandbox environment.
Participate in an on-call rotation to support our production systems.
Define and track SLIs/SLOs, manage error budgets, and proactively monitor distributed systems with logging and tracing.
Automate deployments, scaling, provisioning, and recovery tasks to reduce toil and build self-healing systems.
Lead incident response, conduct root-cause analysis, and facilitate blameless post-mortems to drive continual improvement.
Collaborate cross-functionally with product, engineering, and developer relations to ensure reliable releases and an outstanding developer experience.
Plan for capacity growth, forecast system usage, and contribute to safe release and change management processes.
Mentor and support front-end developers in building reliable distributed front-end systems (CDNs, caching, client-side observability).
Qualifications (continued)
A solid understanding of networking, security, and Linux systems administration.
Experience designing, scaling, and maintaining distributed systems (backend platforms, APIs, or front-end infrastructure).
Proficiency in implementing observability frameworks (metrics, logging, tracing) and aligning reliability goals with developer velocity.
Hands-on experience managing incidents, running on-call operations, and producing actionable post-mortems.
Ability to mentor engineers and influence reliability practices across teams, especially for front-end infrastructure and performance.