Data Provenance Platform Launched to Address Data Transparency Crisis in AI

10/26/2023

Researchers from MIT, Cohere for AI, and 11 other institutions have launched the Data Provenance Platform to tackle the data transparency crisis in AI. The platform addresses the lack of information about the origin and licensing of widely used AI datasets.

The team audited and traced nearly 2,000 of the most popular fine-tuning datasets, which have collectively been downloaded millions of times and have played a crucial role in numerous breakthroughs in natural language processing (NLP). The project's authors, Shayne Longpre of MIT Media Lab and Sara Hooker of Cohere for AI, describe the initiative as the largest audit of AI datasets to date.

In their announcement, Longpre and Hooker stated, “For the first time, these datasets include tags to the original data sources, numerous re-licensings, creators, and other data properties.” This information is now available through an interactive platform, the Data Provenance Explorer.

The Data Provenance Explorer lets developers track and filter thousands of datasets according to legal and ethical considerations. It also lets scholars and journalists explore the composition and data lineage of popular AI datasets.
