Protege

Protege is an AI training data platform that connects AI developers with data holders. For AI developers, Protege offers a vast collection of high-quality training data across numerous modalities and…

AI Developers, AI Training Data, Data Governance, Data Holders, Data Platform, Data Procurement, Ethical Sourcing, Machine Learning | withprotege.ai

HQ: US

Team Size: 48

Open Jobs: 19

Total Funding: $35M

Latest Fundraise: 8 months ago

TL;DR

What they do: AI training-data platform that connects AI developers with data holders and curates rights-protected multimodal datasets

Founded: 2024

Headquarters / Focus: New York City; initial vertical focus includes healthcare and media

Recent funding: Multiple rounds including $10M seed (Sep 2024) and a $25M Series A (Aug 2025); a later $30M Series A extension announced Jan 2026

Company Overview

Problem Domain

Data infrastructure for AI training — sourcing, curating, and transacting high-quality, rights-cleared training data across verticals (notably healthcare and media).

Founded

2024

Industry

Data Infrastructure and Analytics

Tech Stack

Cloudflare

Cloudflare CDN

Content Delivery Network

DMARC

DNSSEC

DoubleClick.Net

Google Analytics

HSTS

IPv6

US Privacy User Signal Mechanism

Funding Track Record

Seed, 2024-09-10: $10,000,000

Participants included SV Angel, Liquid 2 Ventures, Bloomberg Beta, Flex Capital, Adam D'Angelo and others

Series A, 2025-08-13: $25,000,000

Series A extension, 2026-01-07: $30,000,000

Described as an extension expanding the prior Series A

Investor Signal

“Includes participation from CRV, Footwork, Andreessen Horowitz (a16z), Bloomberg Beta, Flex Capital, SV Angel, Liquid 2 Ventures, Adam D'Angelo, Shaper Capital, Travis May, and others”

Join the Team

Product Manager, Data Lab

Remote • US

Related Companies

Company	HQ	Industry	Total Funding
Pareto.AI	🇺🇸 US	HR and Recruiting, Information Technology, Software	-
Spotlab	🇪🇸 Madrid, ES	Biotechnology, Data and Analytics, DeepTech, Health, Software	-
SynMax	🇺🇸 Houston, US	Data and Analytics, DeepTech, Information Technology, Manufacturing	$19M
ADVANCE®AI	🇬🇧 GB	Biotechnology, Data and Analytics, DeepTech	-

Company Overview

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

The Opportunity

We’re hiring a Product Manager, Data Lab to sit at the center of Protege’s research and innovation engine.

This role exists to translate cutting-edge AI research and experimentation into scalable product capabilities — ensuring that the tools, workflows, and systems our Data Lab uses are aligned with how modern AI models are actually trained, evaluated, and deployed.

You will work closely with research scientists, applied ML engineers, and product teams to:

accelerate experimentation
improve reproducibility and iteration velocity
and decide which research outputs should become real, durable product features

This is a role for someone who understands frontier AI deeply, but chooses to apply that understanding through product judgment rather than research authorship.

What You’ll Do

Productize Frontier AI Workflows

Partner closely with Data Lab scientists to understand how models are being trained, evaluated, and iterated today
Translate experimental workflows (data curation, labeling, evaluation, fine-tuning, feedback loops) into scalable product and platform capabilities
Identify patterns across experiments that are worth standardizing versus those that should remain bespoke

Build Tools That Reflect How AI Is Actually Built

Lead product discovery and execution for internal tools that support modern AI development: dataset versioning, evaluation pipelines, annotation and human-in-the-loop workflows, and experiment tracking and reproducibility
Ensure tooling reflects real-world frontier practices, not academic abstractions

Be a Bridge Between Research and Product

Serve as the primary product interface for the Data Lab
Translate research intuition into product requirements engineers can build against
Help researchers reason about tradeoffs between novelty, robustness, and scalability
Collaborate with Platform and Vertical PMs to ensure new capabilities integrate cleanly into customer-facing products

Exercise Strong Product Judgment

Decide when an experimental capability is ready to move from “research mode” to “product mode”
Apply an 80/20 mindset without undermining scientific rigor
Sunset or deprioritize tools and ideas that do not meaningfully advance AI development velocity or data quality

Measure Impact, Not Activity

Define success metrics tied to experiment cycle time, researcher productivity, adoption of internal tools, and downstream impact on customer data products
Use qualitative and quantitative feedback to continuously iterate

Who You Are

Deeply Fluent in Modern AI

You have hands-on or adjacent experience with how frontier AI models are built today — including large-scale training, fine-tuning, evaluation, and data iteration
You understand concepts like training data quality vs quantity tradeoffs and evaluation benchmarks vs real-world performance

A Product Thinker, Not a Researcher

You don’t need to publish papers — but you need to understand them
You excel at turning complex technical systems into clear product decisions
You enjoy asking: “What problem does this actually solve, and at what scale?”

Experienced Product Manager

5+ years of product management experience, ideally in AI/ML platforms, developer tools, data infrastructure, or internal research tooling
Strong experience working with highly technical stakeholders

Collaborative and High-Agency

Excellent communicator across research, engineering, and product
Comfortable influencing without authority
Bias toward shipping, learning, and iterating

Nice to Have

Prior experience working with or adjacent to frontier model builders
Experience with multimodal AI systems (text, audio, video, healthcare data)
Background in ML engineering, data science, or applied research before PM

Why Protege

Work directly on the infrastructure powering frontier AI development
Partner with world-class researchers and product leaders
Shape how experimental AI capabilities become scalable, real-world products
Competitive compensation, equity, and benefits
