Senior Site Reliability Engineer | Parallel Domain · Teeming.ai
Parallel Domain
Parallel Domain provides synthetic sensor data and simulation tools to train, test, and validate perception systems for autonomous vehicles and robots. The platform programmatically generates labeled…
Headquarters / offices: San Francisco Bay Area and Vancouver, BC
Product: Synthetic sensor data and PD Replica digital-twin / simulation platform
Use cases: Training, testing, and validation of perception for vehicles, drones, robots, agriculture, warehouse, security
Latest known funding: Series B $30M (announced 2022-11-16)
Company Overview
Problem Domain
Synthetic data generation and simulation for machine perception (autonomous vehicles, drones, trucks, robots).
Founded
2017
Industry
Software Development
Tech Stack
API
SDK
Web tools
Funding Track Record
Series B, 2022-11-16
$30,000,000
Series B announced at $30M with participation from returning investors
Investor Signal
“Backed by venture and strategic investors including March Capital, Costanoa Ventures, Foundry Group, Calibrate Ventures, Ubiquity Ventures, and Toyota Ventures”
Join the Team
Senior Site Reliability Engineer
On-Site • Remote - Pacific Northwest Area, ES
Related Companies
| Company | HQ | Industry | Total Funding |
| SpAItial AI | 🇬🇧 London, GB | Information Technology, Software | $13M |
| Waabi | 🇨🇦 Toronto, CA | DeepTech, Transportation | $283M |
| Black Forest Labs | 🌍 Remote | Data and Analytics, DeepTech, Education, Information Technology, Media, Software | - |
| Synergeticon GmbH | 🇩🇪 Hamburg, DE | Data and Analytics, DeepTech, Information Technology, Manufacturing, Software | - |
| Batch Robotics GmbH | 🇩🇪 Munich, DE | DeepTech, Manufacturing | - |
We’re hiring a Senior Site Reliability Engineer to help build and operate that infrastructure. This role sits at the core of how we run large-scale, distributed simulation workloads for autonomous-systems testing and validation.
You’ll work across multi-region AWS infrastructure, operate Kubernetes at scale, and contribute directly to the reliability, security, and deployment systems that the rest of the engineering org depends on.
This is a hands-on role with the broad ownership typical of a startup. You’ll partner closely with platform, simulation, and ML teams to keep the system running smoothly and evolving.
We’re growing the team (two of these roles are open), and the work is substantive: multi-region GPU scheduling, Windows workloads on Kubernetes, large-scale batch simulation, and an enterprise product direction that will require rethinking parts of how we deploy and operate.
Infrastructure ownership and cloud operations
Design, build, and maintain multi-region AWS infrastructure using Terraform
Operate and scale EKS clusters across production regions: autoscaling, node lifecycle, workload health
Manage networking across environments: VPC design, DNS, load balancing, and cross-region connectivity
Support infrastructure changes, migrations, and expansions into new regions
Contribute to and improve GitOps-based deployment workflows using GitHub Actions, Helm, and Kustomize
Reliability engineering and incident response
Help build and run incident management processes: severity definitions, escalation paths, on-call practices
Lead incident response, debugging, and root-cause analysis
Write postmortems and drive systemic reliability improvements from what they surface
Improve observability across metrics, logging, tracing, and dashboards
Support GPU and batch workloads running on Kubernetes
Security and access management
Provide security-conscious feedback on platform architecture decisions
Own cloud IAM governance: roles, policies, and access boundaries across accounts and services
Lead compliance-adjacent work including audit-readiness, partner certification requirements, and supporting responses to customer security questionnaires
Platform tooling and developer experience
Improve CI/CD pipelines and infrastructure validation
Support engineers with infrastructure debugging, environment setup, and performance issues
Contribute to tooling and automation in Python and Bash
Take on adjacent responsibilities as needed in a startup environment
Flexibility to work from our office in the San Francisco Bay Area or your home office
Competitive compensation
Employer-paid supplemental medical, mental health, dental, and vision benefits
401(k)
Paid vacation and sick time, winter shutdown, and 11 stat holidays each year
Paid parental leave
New hire equipment + accessories budget to optimize your setup
$1,500 annual learning and development allowance
AWS depth. Solid experience across VPC, IAM, EKS, S3, and CloudWatch
Experience. 5+ years in SRE, DevOps, or infrastructure engineering roles, with a track record of operating production systems across multiple regions
Observability. Experience with tooling such as Prometheus and Grafana
Kubernetes expertise. Cluster operations, autoscaling, RBAC, and Helm
Terraform. Modules, state management, and multi-environment patterns
Scripting. Comfort with Python and Bash for tooling and automation
Networking fundamentals. CIDR, DNS, load balancing, VPN, and cross-region connectivity
Pragmatism and ownership. Comfortable in a fast-moving startup with evolving priorities. You take ownership of systems while collaborating closely with other teams, and you’re pragmatic about tradeoffs between speed, reliability, and complexity
CI/CD and GitOps. Experience with GitHub Actions, ArgoCD, or similar workflows
Cross-platform familiarity. Working knowledge of both Linux and Windows environments. Operational experience supporting Windows-based workloads is a meaningful advantage
Windows on Kubernetes. Experience with Windows node pools, Windows AMIs, and GPU-adjacent components on K8s
Cost optimization. Cloud cost optimization at scale
Service mesh. Familiarity with service proxy or service mesh patterns
Container OS. Experience with container-optimized OS images (e.g., Bottlerocket) and image-build tooling (e.g., Packer)
Domain workloads. Experience supporting simulation, ML, or rendering workloads in cloud infrastructure
GPU scheduling. Familiarity with GPU scheduling on Kubernetes, including NVIDIA device plugin configuration
AWS extras. Exposure to AWS Storage Gateway, Active Directory integrations, or AWS Transfer Family
You think in failure modes and proactively surface issues
You hold a principled view on security and push back constructively when designs introduce unnecessary risk
You communicate clearly across engineering, product, and customer-facing teams, flagging issues with urgency proportional to customer impact
You take end-to-end ownership of complex efforts and know when to push for the clean solution versus the pragmatic one