Help build the cloud platform that powers a next-generation radiology workflow system. As NewVue.ai’s Platform/SRE Lead, you’ll own the AWS foundations, CI/CD, observability, and security that let our product teams ship fast, safely, and repeatedly. If you like turning chaos into paved roads and greenfield into reliable, auditable infrastructure, this is your role.
About NewVue.ai
NewVue.ai is a rapidly growing healthcare technology company focused on modernizing radiology workflow. Our platform helps imaging groups and hospitals streamline their operations by connecting disparate systems, like PACS, dictation tools, and EHRs, into a unified, efficient radiologist cockpit. We work closely with radiologists, IT teams, and clinical leaders to improve turnaround times, reduce errors, and deliver better patient outcomes. We’re a nimble team that values autonomy, flexibility, creativity, and a collaborative spirit.
Job Overview
The Platform/SRE Lead owns NewVue.ai’s cloud platform and delivery pipeline so product teams can ship fast and safely. You will design and run our AWS foundations (IaC, networking, security), standardize CI/CD (blue-green/canary, rollback), instrument observability, and embed security/compliance gates for HIPAA/SOC 2. You’ll turn our orchestration platform into a paved road for products, so features launch in weeks, not quarters.
Key Responsibilities
Platform Engineering & AWS
- Build and own AWS foundations: VPC design, subnets, routing, security groups, WAF, KMS, IAM, backups, and DR (RTO/RPO).
- Implement Infrastructure as Code (Terraform and/or CDK) with reusable modules, environment strategy, drift detection, and change control.
- Operate core services (e.g., ECS/EKS, EC2, RDS/Aurora, S3, SQS/SNS/EventBridge, API Gateway/ALB/NLB) supporting multi-tenant microservices and data flows (HL7, FHIR, S3/SFTP).
CI/CD & Release Engineering
- Standardize pipelines for all services (build, test, scan, deploy) with artifact/versioning, progressive delivery (blue-green/canary), and automated rollback.
- Implement test gates (unit/API/contract/smoke), schema and API compatibility checks, and zero-downtime migrations.
Observability & Reliability
- Establish logs/metrics/traces and golden dashboards; define SLIs/SLOs and error budgets per product.
- Lead on-call readiness: runbooks, incident response/postmortems, MTTR improvements, capacity and performance tuning, and cost optimization (tagging/FinOps, Savings Plans).
- Stand up monitoring/alerting with Datadog/Dynatrace (or similar) and manage vendor integrations.
Security & Compliance by Design
- Embed SAST/DAST, dependency and container scans, IaC policy-as-code, SBOMs, and secrets management in the pipeline.
- Partner with the CTO on HIPAA/SOC 2/HITRUST controls; document and evidence controls within CI/CD and cloud.
Developer Experience (“Paved Paths”)
- Provide service templates, adapter scaffolds, local dev tooling, and reference repos that enforce best practices by default.
- Coach engineers on cloud/runtime patterns (resiliency, retries/fan-out/fan-in, idempotency, backpressure).
Cross-Functional Delivery
- Work with the Head of Engineering to align capacity and keep the bi-weekly release train on time.
- Partner with the CTO/Lead Architect on platform guardrails and reusable “pipes” for product teams (Ingress/Egress APIs only).
- Support the COO/TPM and Implementation on onboarding tiers and reusable adapters for customer go-lives.
Required Qualifications
- 7+ years in DevOps/SRE/Platform Engineering; 3+ years leading or acting as primary owner for cloud platforms.
- Deep AWS expertise (networking, IAM/KMS, containers/orchestration, RDS/Aurora, eventing), with IaC at scale (Terraform and/or CDK).
- Proven CI/CD design and operation (progressive delivery, automated rollback, artifacts, environment promotion).
- Strong observability practice (logs/metrics/traces), SLO/SLI design, and incident management (on-call, postmortems).
- Security-minded: secrets management, least privilege, image/dependency/IaC scanning; experience supporting HIPAA/SOC 2 or similar.
- Proficiency in a scripting language (Python, TypeScript, or Bash) and solid Git workflows.
Nice to Have
- Healthcare/radiology data flows (HL7, FHIR), PHI handling patterns, and regulated-environment experience.
- Experience with contract testing, schema registries, and message replay/poison queue handling.
- Datadog or Dynatrace administration; FinOps practices.
What Success Looks Like (90 Days)
- One standardized CI/CD pipeline pattern adopted by product teams; blue-green/canary live for at least one service.
- Baseline observability with golden dashboards; documented SLOs for Reporting and Cockpit services.
- Terraform/CDK modules in place for core infrastructure; backups/DR tested; secrets centralized.
- Incident/runbook library started; MTTR trending down; first cost dashboard live.
Growth Path
- Build a small Platform/SRE function (SRE + QA automation partner).
- Evolve paved-path tooling, expand policy-as-code, and scale multi-tenant controls as customer volume grows.
Benefits
- Comprehensive health, dental, and vision insurance
- Unlimited PTO – with the expectation of responsible use and alignment with team needs
- Fully remote work environment – work from anywhere in the U.S.