DefinedCrowd launches DefinedData and welcomes Balderton as a new investor

This product launch follows the recent closing of a US$50.5M Series B funding round and the addition of Balderton as a new investor.

DefinedCrowd is the leader in AI training data, combining its market-leading workflow automation, machine learning and human intelligence to provide the highest quality training data to teams at the cutting edge of AI.

The company, founded in 2015 by CEO Daniela Braga, now offers vision, speech and NLP data in over 50 languages spanning 70 countries, with its platform already serving a blue-chip global client base that includes Mastercard, BMW, Yahoo Japan and many more. DefinedCrowd's market-leading position fuelled 600%+ growth last year.  

Daniela Braga, founder and CEO of DefinedCrowd

Sourcing high-quality training data continues to be a time-consuming and expensive process, with data scientists often spending upwards of 80% of their time on data acquisition. This pain point and the demand for training data will only increase. Gartner estimates that “by 2022, 35% of large organizations will be either sellers or buyers of data via formal online data marketplaces, up from 25% in 2020.”

A range of players has emerged to solve for discrete parts of the data supply chain, from sourcing data (both human and synthetic) to automating annotation. Some have gone after specific verticals, such as vision or speech. However, DefinedCrowd is unique in having a full-stack solution that seamlessly integrates with customer workflows and can efficiently deliver high-quality data at scale.

More exciting still, today DefinedCrowd launches DefinedData: a new marketplace for AI training data. The new on-demand service will help companies to speed up the release of AI products to market using high-quality, off-the-shelf datasets from DefinedCrowd.

These pre-collected datasets, annotated and validated by a global crowd, can be used to train baseline models or evaluate and benchmark current models. 

"The value of easily accessible data is not to be underestimated. However, our offering also centres itself around quality. We know the success of our customers’ AI models greatly depends on the quality of the data used to fuel them."

Daniela Braga, founder and CEO of DefinedCrowd.

DefinedCrowd's new DefinedData Offering

DefinedData will start with speech data, allowing customers to source high-quality datasets in multiple languages and domains on a self-serve basis, either via one-time purchase or subscription. By May 2021, the library is expected to grow to include over 25,000 hours of speech and natural language data.

"As the appetite for high-quality data continues to grow, the market for training data will become increasingly modularised. High-quality training data libraries and marketplaces will be a key feature of the value chain, allowing teams both to monetise existing datasets and to source new data time- and cost-effectively. We are incredibly excited to be joining Daniela and DefinedCrowd on their journey as they pave the way in this space."

Laura Connell, Principal, Balderton
