
Protege is an AI training data platform that connects AI developers with data holders. For AI developers, Protege offers a large collection of high-quality training data across many modalities and verticals, with a quick procurement process that reduces procurement time by 90% or more. The data is thoughtfully sourced and ethically managed. For data holders, Protege provides access to AI developers ranging from startups to large tech companies, governance control over privacy and IP, and a low-friction platform for sharing data. The platform supports quick data exchange and brings expertise in determining data value and ensuring fair compensation. A network of AI tech companies uses the platform, and Protege emphasizes a data-source-centric approach backed by expert support.

What they do: AI training-data platform that connects AI developers with data holders and curates rights-protected multimodal datasets
Founded: 2024
Headquarters / Focus: New York City; initial vertical focus includes healthcare and media
Recent funding: $10M seed (Sep 2024), $25M Series A (Aug 2025), and a $30M Series A extension announced Jan 2026
| Company | Protege |
|---|---|
| Description | Data infrastructure for AI training: sourcing, curating, and transacting high-quality, rights-cleared training data across verticals (notably healthcare and media). |
| Founded | 2024 |
| Category | Data Infrastructure and Analytics |
| Seed round | $10,000,000. Participants included SV Angel, Liquid 2 Ventures, Bloomberg Beta, Flex Capital, Adam D'Angelo and others. |
| Series A | $25,000,000 |
| Series A extension | $30,000,000. Described as an extension expanding the prior Series A. Includes participation from CRV, Footwork, Andreessen Horowitz (a16z), Bloomberg Beta, Flex Capital, SV Angel, Liquid 2 Ventures, Adam D'Angelo, Shaper Capital, Travis May, and others. |