New research institution brings rigor and standards to the AI data layer, launching with participation from five of the world’s leading AI companies
Protege, an AI data platform providing trusted, real-world data at scale, announced DataLab at Protege, a new research institution advancing the science of AI data. Built to support leading AI labs and global technology companies operating at the frontier of AI, DataLab helps AI researchers and pioneers navigate challenges and ambiguity in data quality, selection, representation, complexity, methodology, and safety for AI research.
DataLab’s team of in-house experts and researchers innovates to produce, repackage, and surface novel training and evaluation datasets from data produced in the real world. At launch, a majority of the “Magnificent 7” AI companies and major frontier AI labs are collaborating with DataLab across various AI training and evaluation data projects.
DataLab launches at a time when AI development is increasingly shaped by data limitations. As models grow more advanced, progress depends not only on model size and compute, but also on access to high-quality, well-curated training data. Built with the same scientific ambition as a frontier model lab, DataLab brings discipline to dataset design, construction, and evaluation, establishing clear quality standards and reproducible methodologies that translate into more reliable systems and measurable performance gains.
“We understand the three core pillars driving AI: models, chips, and data. We are convinced that with the right datasets—the third, underdeveloped pillar—you can push the entire frontier forward,” said Bobby Samuels, CEO of Protege. “We created DataLab to treat data as infrastructure, not exhaust. If we want more capable, reliable systems, we need standards, reproducibility, and real scientific discipline at the data layer.”
DataLab operates across three core areas:
- Scientific partnerships: DataLab engages directly with leading AI researchers to navigate frontier-level technical discussions and identify commercially viable pathways.
- Building high-value datasets and data products: Through deep methodological discipline, exposure to commercial data applications, and rigorous processes, DataLab develops new product opportunities that originate from the lab.
- Leading AI data research: DataLab maintains an active presence within the broader academic community by publishing cutting-edge data research, designing evaluations and benchmarks, and identifying gaps in today’s training and evaluation data.
Led by Engy Ziedan, Co-Founder and Chief Scientific Officer at Protege, DataLab brings together machine learning researchers, economists, and domain experts with deep experience in evaluation, dataset design, and applied AI systems. Built to operate alongside both frontier AI research institutions and the world’s leading technology companies, the team combines academic rigor with applied expertise to raise the standards of the AI data layer.
“The strength of DataLab is its ability to integrate perspectives that are often siloed,” said Ziedan. “Advancing AI requires more than larger models or more data alone. It requires thinking at the margin, where we weigh the marginal value of a datapoint on learning and the opportunity cost of choosing the wrong dataset. This requires disciplined dataset design, careful evaluation, and a deep understanding of real-world complexity. Our team is structured to deliver exactly that.”
Since its launch, DataLab has released multimodal healthcare benchmark datasets designed to reflect diagnostic ambiguity and longitudinal clinical context and has co-designed MedScribe and Medcode, two multimodal benchmarks for healthcare. It is also collaborating with frontier AI organizations on high-stakes data challenges ranging from advanced cancers and agentic task selection to audio de-identification and international healthcare representation.
“Data quality has become the defining constraint in frontier AI development, yet investment and innovation have lagged,” said Nikhil Basu Trivedi, Co-Founder and General Partner at Footwork. “That changes with DataLab at Protege, which brings the same level of rigor and expertise to AI data that we have for AI chips and models. DataLab experts are doing the essential AI data infrastructure work and research that moves the AI frontier forward.”
As AI systems move from research environments into high-stakes, real-world use cases, the strength of the data foundation becomes decisive. DataLab is inviting collaboration from frontier labs, academic researchers, and domain experts committed to raising the standards for how AI data is built, validated, and measured.










