
Founded by data industry veterans and backed by LinkedIn, DataHub enables organizations to deploy AI in production through an enterprise-grade metadata platform with 3M+ monthly PyPI downloads. Leveraging our extensible metadata graph architecture with lineage-driven compliance and API-first design, we've built a unified system for technical teams requiring production-grade discovery, observability, and governance. Our dual solutions, open-source DataHub Core and fully managed DataHub Cloud, provide what enterprises need for continuous AI & data asset management at scale.

DataHub is a unique solution in this space with the following key differentiators:

* Scalability: DataHub offers best-in-class enterprise-grade scalability in connecting to over 80 data sources, offering an embeddable connector framework, and ingesting large volumes and high velocity of metadata.
* Extensibility: DataHub’s highly extensible metadata model offers easy flexibility in adapting to an organization’s unique data landscape, entities, relationships, ownership, and custom metadata descriptors.
* Completeness: DataHub Cloud’s unified platform adds AI-based enhancements and automations for discovery & understanding, quality management, and collaborative governance, allowing users to confidently use and manage data and AI assets.
* Ease of Adoption: Customers of DataHub benefit from the joint innovation, peer support, and growing skill base of an energized community of over 13,000 DataHub practitioners. Its user-friendly interface has a powerful and intuitive design, making it easier for users to navigate and utilize its features without extensive training. The managed service of DataHub Cloud offers dedicated support, improved performance and availability, and secure deployment options to ease adoption across an enterprise.

For engineering teams deploying AI in production, DataHub delivers unified metadata infrastructure across all AI & data assets with enterprise-grade performance.

Product: Open-source metadata platform (DataHub Core) and managed SaaS (DataHub Cloud) for metadata, discovery, lineage, observability, and governance
Differentiators: Scalability, extensibility, completeness, and ease of adoption
Founders: Shirshanka Das and Swaroop Jagadish
Funding: Raised seed ($9M) and later rounds including $21M (2023) and Series B activity reported
Software Development / Metadata & Data Catalog
Metadata management, data discovery, lineage, observability, and governance for AI and data workflows
Software Development
$9M seed round: participation included LinkedIn and Insight Partners
$21M round (2023): raised to grow the enterprise data catalog platform
DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises, including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context of AI and data assets with best-in-class scalability and extensibility.
The company's enterprise SaaS offering, DataHub Cloud, delivers a fully managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance, enabling AI & data to work together and bring order to data chaos.
Role Overview: As a Forward Deploy Engineer in our Customer Success team, you will partner with leading organizations to solve complex data challenges. You'll work with data engineers, data scientists, and analysts to help them adopt DataHub’s innovative solutions and best practices, empowering them to achieve their data goals.
This role will shape the Forward Deploy Engineer function and contribute to the Customer Success and Engineering teams to help solve critical technological challenges. You'll work with a supportive, tight-knit team and interact with customers across the U.S., Europe, and the Middle East.
Large-Scale Financial Data Experience: Proven track record of partnering with Fortune 500 financial institutions to architect and implement enterprise-grade data management solutions handling petabyte-scale datasets across trading systems, risk management platforms, and regulatory reporting infrastructure. Experience navigating complex financial data ecosystems involving real-time market data feeds, transactional databases, and compliance-critical data warehouses while ensuring strict adherence to regulatory frameworks including SOX, GDPR, and financial services data governance standards. Demonstrated ability to work directly with quantitative analysts, risk managers, and compliance teams to understand their data lineage requirements and translate business needs into scalable technical solutions that support mission-critical financial operations and regulatory auditing processes.
Large-Scale Software Company Experience: Extensive experience working with Fortune 500 software companies as a Solution Architect or Forward Deploy Engineer, designing and implementing enterprise data management solutions at scale. Proven ability to navigate complex technical landscapes in high-growth technology organizations, understanding their unique challenges around rapid data growth, multi-tenant architectures, and the need for real-time data insights to drive product development and business intelligence initiatives.
Thought Leadership & Customer Advisory: Demonstrated expertise in providing strategic guidance and best practices to enterprise customers, with a proven track record of developing and delivering customer-facing white papers, technical case studies, and solution frameworks. Experience translating complex technical concepts into business value propositions and creating scalable, repeatable solution-driven frameworks that can be adapted across multiple enterprise implementations. Strong ability to establish trusted advisor relationships with C-level executives and technical leadership teams, influencing strategic technology decisions through authoritative thought leadership content.
The Employee shall perform the following duties and responsibilities:
Technical Advisory: Serve as a trusted technical advisor, guiding customers on data ingestion, governance, and best practices.
Customer Collaboration: Collaborate with customers to understand their business needs, developing solutions that align with their objectives.
Technical Implementation: Build custom integrations and resolve technical issues during customer onboarding.
Python Development: Advanced proficiency in Python is essential for building data integration solutions, automation scripts, and custom connectors within the DataHub ecosystem.
Data Platform Expertise
Data Lineage Implementation
Financial Services Domain Knowledge
Enterprise Integration Skills
Data Discovery & Cataloging
Apache Kafka
Elasticsearch: Knowledge of search indexing and optimization for DataHub's discovery features and performance tuning of metadata search capabilities.
Container Orchestration
Infrastructure Deployment: Build custom installs for self-hosted solutions in Azure, AWS, GCP with Terraform, Helm Charts, Ansible, and other deployment tools.
Product Development Partnership: Partner with product and engineering teams to identify opportunities for feature improvements and innovation.
Documentation and Knowledge Sharing: Create and maintain technical documentation and implementation guides, and share knowledge within the team and the community.
Large-Scale Company Experience: Experience as a Solutions Architect or Forward Deploy Engineer with Fortune 100-500 companies is highly desirable.
Benefits and Perks: We invest in people so they can do their best work and enjoy doing it. Our benefits reflect the way we build: practical, thoughtful, and designed to support long-term growth.
Competitive Compensation: We offer salaries that reflect your skills, experience, and the impact you make. You bring value, and we make sure you're recognized for it.
Equity for Everyone: Every team member receives an ownership stake in the company. When we grow, you grow with us.
Remote Work: All roles are remote unless otherwise specified in the job description. Review the job description to confirm if the role you are interested in is remote or hybrid.
Location Flexibility: Home office, coworking space, or something in between? We support your ideal setup. You’ll receive a monthly coworking stipend to use whenever you need a change of pace or in-person collaboration time.
Comprehensive Health Coverage: Your well-being matters. We cover 99% of medical, dental, and vision premiums for employees, and 65% for dependents.
Flexible Savings Accounts: We offer FSAs to help cover planned or unexpected healthcare costs. You can also opt into a Dependent Care FSA to support family needs.
Support for Every Path to Parenthood: Through Carrot Fertility, we provide inclusive fertility benefits and family-forming support. All U.S. employees have access, regardless of age, gender identity, or family structure.
Time Off That Works for You: We trust you to take the time you need. Our unlimited PTO and sick leave policy is designed for flexibility, rest, and real life.
Why Join Us: DataHub is at a rare inflection point: we’ve achieved product-market fit, earned the trust of leading enterprises, and secured backing from top-tier investors like Bessemer Venture Partners and 8VC. The context platform market is expected to grow from $1B to $9B in the next five years, and we’re leading the way.
By joining our team, you’ll:
If you're passionate about technology, enjoy working with customers, and want to be part of a fast-growing company changing the industry, we want to hear from you!