Hydrolix is a streaming data lake platform designed to manage high-volume streaming log data, transforming the economics of data management. It combines decoupled storage, indexed search, and stream…
Big Data, Cost Reduction, Data Management, Log Data, Observability, Petabyte Scale, Real-time Analytics, Streaming Data Lake • hydrolix.io
Hydrolix
HQ: Portland, US
Team Size: 221
Open Jobs: Unknown
Total Funding: $145M
Latest Fundraise: last year
TL;DR
What they do: Streaming data lake optimized for high-volume log and observability analytics with real-time and historical queries
Founded / HQ: Founded 2018; headquartered in Portland, Oregon
Recent funding: Raised an $80M Series C (April 2025); prior rounds include $35M Series B and $10M seed
Problem addressed: High cost and complexity of storing, querying, and retaining large-scale log and observability data
Founded: 2018
Industry: Software Development
Funding Track Record
Seed, 2021-02-24: $10M (announced Feb 24, 2021)
Series B, 2024-05-22: $35M (company reported total raised of $68M after this round)
Series C, 2025-04-03: $80M (announced Apr 3, 2025)
Investor Signal
“Backed by multiple institutional investors including QED Investors, Blumberg Capital, Frontline Ventures, Pruven Capital, Sozo Ventures, S3 Ventures and others”
Join the Team
Principal SRE
Remote • IN
We are looking for a Principal Site Reliability Engineer to join our dynamic Services team. In this role, you will contribute to the reliability and scalability of our cutting-edge platform, ensuring exceptional solutions tailored to our customers’ unique needs. This is a highly technical, hands-on role that requires deep expertise in system reliability and automation.
Key Responsibilities:
Reliability Engineering: Design and build automated systems that ensure the reliability and scalability of our Kubernetes clusters and Hydrolix deployments across multiple cloud platforms, eliminating manual operational tasks.
Automation and Efficiency: Identify, quantify, and systematically eliminate repetitive manual work through automation and improved tooling, freeing the team to focus on high-value work.
Observability Infrastructure: Build and enhance comprehensive observability systems that provide deep visibility into system behavior, enable debugging and troubleshooting, and support data-driven reliability decisions.
CI/CD and Deployment Automation: Design and build robust CI/CD pipelines and deployment automation that enable safe, frequent releases with minimal human intervention.
Infrastructure Reliability: Deploy, maintain, and ensure a highly reliable fleet of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.
Service Optimization: Design, implement, and maintain systems and processes to enhance the reliability, availability, and performance of our services.
Root Cause Analysis: Conduct comprehensive root cause analyses for system failures, implementing long-term preventive measures.
Collaboration and Customer Engagement
Cross-Functional Teamwork: Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into every stage of the development lifecycle.
Knowledge Sharing: Document systems, create runbooks, and share knowledge across the organization to build collective expertise in reliability engineering.
Reliability Advocacy: Champion SRE best practices and foster a culture of operational excellence across the organization.
Reliability Systems: Build and maintain centralized reliability platforms, tools, and services that empower all engineering teams to operate their systems effectively.
Global Team Collaboration: Collaborate with a distributed team of engineers worldwide to provide round-the-clock support and continuous improvement of our reliability posture.
Customer-Facing Reliability: Work with customers to understand reliability requirements and ensure our platform meets their operational needs.
Qualifications and Skills:
SRE Expertise:
A minimum of 10 years of proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, supporting large-scale, complex distributed systems in production.
Demonstrated ability to operate at a principal level by setting reliability direction, defining standards, and influencing system design across multiple teams.
Architecture, Performance & Scalability
Deep experience designing and evolving system architectures with reliability, scalability, and operability as first-class concerns.
In-depth experience in application and infrastructure performance tuning and scaling to handle heavy workloads under varying traffic patterns and failure scenarios.
Ability to identify systemic bottlenecks, capacity risks, and inefficiencies, and drive long-term architectural improvements.
Automation, Platform & Infrastructure Engineering
Exceptional track record of eliminating toil through automation, including building internal platforms or frameworks that enable safe, scalable self-service.
In-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Pulumi, and Ansible for provisioning and managing infrastructure consistently across environments.
Observability & Reliability Engineering
Deep expertise in observability tools and practices, with the ability to design end-to-end monitoring strategies aligned with business outcomes.
Strong understanding of core reliability concepts, including SLIs, SLOs, SLAs, error budgets, golden signals, and quality gates.
Hands-on experience with distributed tracing, synthetic monitoring, end-user monitoring, performance testing, and chaos engineering.
Proven experience driving blameless postmortems and ensuring learnings result in measurable reliability improvements.
Kubernetes & Distributed Systems
Deep understanding of Kubernetes architecture, operations, failure modes, and ecosystem tooling.
Experience designing and operating multi-cluster and/or multi-region Kubernetes platforms at scale.
Cloud & Multi-Cloud Expertise
Demonstrated proficiency in at least one major cloud platform (AWS, GCP, Azure, or Linode), with experience building cloud-native systems.
Familiarity with multi-cloud or hybrid architectures and the operational trade-offs involved.
Networking, Security & Traffic Management
Experience with network load balancing, traffic management, and capacity planning at scale.
Strong understanding of security technology stacks, Transport Layer Security (TLS), certificate management, and standard networking protocols and configurations.
Data & Storage Systems
Experience working with SQL databases; familiarity with PostgreSQL is a plus.
Ability to reason about performance, availability, and scaling characteristics of data-intensive systems.
Programming & Systems Engineering
Strong programming ability in Go, Python, or Rust, with a proven ability to build and maintain production-quality tools, services, and automation.
Comfortable reviewing, shaping, and influencing code across multiple teams and services.
Linux & Infrastructure Fundamentals
Deep experience with Linux systems, including performance tuning, capacity planning, and low-level system troubleshooting.
Incident Management & Operational Excellence
Extensive experience leading high-severity incidents, managing cross-team response, and driving post-incident reviews.
Ability to translate incident learnings into systemic fixes, architectural changes, and improved operational standards.
We look forward to seeing how you can make an impact at Hydrolix.