
Protege is an AI training data platform that connects AI developers with data holders. For AI developers, Protege offers a large collection of high-quality training data across numerous modalities and verticals, with a quick procurement process that reduces procurement time by 90% or more. The data is thoughtfully sourced and ethically managed. For data holders, Protege provides access to AI developers ranging from startups to large tech companies, governance control over privacy and IP, and a low-friction platform for sharing data. The platform supports fast data exchange, with expertise in determining data value and ensuring fair compensation. Protege has a network of AI tech companies using its platform and emphasizes a data-source-centric approach backed by expert support.

What they do: AI training-data platform that connects AI developers with data holders and curates rights-protected multimodal datasets
Founded: 2024
Headquarters / Focus: New York City; initial vertical focus includes healthcare and media
Recent funding: Multiple rounds including $10M seed (Sep 2024) and a $25M Series A (Aug 2025); a later $30M Series A extension announced Jan 2026
Data infrastructure for AI training — sourcing, curating, and transacting high-quality, rights-cleared training data across verticals (notably healthcare and media).
Category: Data Infrastructure and Analytics
Seed round: $10,000,000
Participants included SV Angel, Liquid 2 Ventures, Bloomberg Beta, Flex Capital, Adam D'Angelo and others
Series A: $25,000,000
Series A extension: $30,000,000
Described as an extension expanding the prior Series A
Includes participation from CRV, Footwork, Andreessen Horowitz (a16z), Bloomberg Beta, Flex Capital, SV Angel, Liquid 2 Ventures, Adam D'Angelo, Shaper Capital, Travis May, and others
Company Overview:
We are building Protege to solve the biggest unmet need in AI — getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.
Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.
We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.
We’re looking for a Founding Product Manager to own product development in Protege’s media business. With a handful of media customers today (including some of the biggest foundation models) and a full pipeline, we’re looking for someone to take the reins from our Head of Product and deliver on maturing the product suite serving our media vertical. Product opportunities span data ingestion and delivery systems — the backbone of how we scale our media data catalog for AI training — as well as the development of UIs and possibly data products. It will be up to you to find opportunities, work with stakeholders to prioritize them, design solutions with engineering, and oversee their build and rollout.
Up first, you’ll bridge engineering, product, and partnerships to ensure that millions of hours of content flow seamlessly through Protege’s ingestion, normalization, and metadata enrichment pipelines. You’ll work cross-functionally to build a product roadmap that will help the media business scale and translate those opportunities into clear requirements for our engineering team, managing the development, QA, and launch cycles for those solutions.
What You’ll Do
Define, prioritize, and execute the roadmap for Protege’s media business, starting with projects in data ingestion/processing and content visibility/searchability.
Work with external partners and internal teams to understand pain points and prioritize solutions.
Translate abstract problems (e.g., “partner ingestion bottlenecks”) into actionable, measurable engineering work.
Manage backlog, tradeoffs, and stakeholder communication with precision and transparency.
Understand how AI developers use media embeddings and metadata downstream, ensuring data quality aligns with use case needs.
As a founding team member: contribute to product culture, hiring, and process definition.
Who You Are
Bonus if you have these experiences:
Why Product at Protege
