
UptimeAI is a company that provides AI-driven operational excellence solutions for heavy manufacturing industries, focusing on optimizing maintenance, reliability, and process efficiency. Their product combines artificial intelligence with over 200 years of combined industry experience to deliver real-time predictions, root cause analysis, and prescriptive recommendations for plant operations. The AI system integrates with existing plant data systems and continuously learns from new data and user feedback to improve accuracy and reduce false alarms. UptimeAI's solution covers the full balance of plant, including over 120 equipment types and 500 failure modes, enabling comprehensive monitoring and diagnosis to prevent equipment failures, reduce maintenance costs, and increase productivity and efficiency. Their clients include major utilities, oil marketing companies, and chemical producers globally, demonstrating strong traction and market acceptance. The company operates on a SaaS business model, delivering rapid deployment and scalable AI solutions tailored for process industries.

**About the Company:**
UptimeAI is leading the way in predictive analytics and AI-driven solutions to optimize operational uptime and reduce downtime for industrial and enterprise clients. Our innovative platform harnesses cutting-edge data science to deliver actionable insights, ensuring maximum efficiency and reliability. UptimeAI uniquely combines Artificial Intelligence with Subject Matter Knowledge from 200+ years of cumulative experience to explain interrelations across upstream and downstream equipment, adapt to changes, identify problems, and provide prescriptive diagnoses as a human expert would.
**About the Role:**
We are a fast-growing, AI-first SaaS startup backed by top-tier investors and operating across India and the US. Our platform helps enterprises optimize critical business functions using cutting-edge AI and automation. As we scale, we’re looking for a hands-on DevOps Engineer who thrives in startup environments and can take ownership of cloud infrastructure, deployment, and CI/CD workflows.
**Responsibilities:**
- Design, implement, and manage cloud infrastructure across Azure for both internal platforms and customer-specific deployments
- Configure and maintain VPCs, VPNs, and peering to enable secure, scalable, and isolated environments
- Build and automate CI/CD pipelines for application and ML workloads
- Manage multi-tenant and single-tenant deployments based on customer requirements
- Implement monitoring, alerting, logging, and disaster recovery strategies
- Work closely with engineering to ensure seamless Dev→Prod flows and secure release management
- Set up and manage infrastructure as code (e.g., Terraform, Pulumi, Bicep, CloudFormation)
- Optimize costs, performance, and availability for both internal and customer-facing cloud workloads
- Enforce security best practices, access control, and compliance across infrastructure
**Qualifications:**
- 3-8 years of experience as a DevOps/SRE/Cloud Engineer in high-growth SaaS or product startups
- AWS Certified (at least Solutions Architect - Associate) and Azure Certified (e.g., AZ-104 or higher)
- Strong experience with Azure networking, including VPCs, VPNs, subnets, route tables, security groups, NAT gateways, and site-to-site VPN setups for enterprise customers
- Proven experience deploying applications to customer-controlled cloud environments (BYOC) and company-controlled SaaS environments
- Expertise with tools such as:
  - CI/CD: GitHub Actions, GitLab CI, Azure Pipelines
  - IaC: Terraform, Bicep, or Pulumi
  - Containers: Docker, Kubernetes (EKS/AKS preferred)
- Familiarity with Secrets Management, IAM, Role-based Access Control, and SSO/SAML integration
- Strong scripting skills in Bash, Python, or PowerShell
- Comfortable working in a fast-paced, ambiguous startup environment
**Required Skills:**
- Experience with AI/ML pipeline deployment or GPU workloads
- Exposure to SOC2, ISO27001, or GDPR compliance in a cloud environment
- Familiarity with tools like Prometheus, Grafana, Datadog, ELK, or Azure Monitor
**Pay range and compensation package:**
Not specified in the provided job description.
**Equal Opportunity Statement:**
UptimeAI is committed to diversity and inclusivity in the workplace.
**Why join UptimeAI:**
- Impact Industry-Wide Change: Contribute to transformative solutions that significantly improve operational efficiency and reliability for global clients.
- Collaborative and Growth-Oriented Environment: Join a talented, passionate team that values innovation, continuous learning, and professional growth.
- Opportunities for Leadership and Innovation: Lead pioneering projects, influence product development, and shape the future of industrial AI solutions.