The massive open-source AI dataset LAION-5B has come under scrutiny for its inclusion of child sexual abuse material, according to a new report published by the Stanford Internet Observatory. The dataset, which has been used to train popular AI text-to-image generators, contains at least 1,008 instances of such material, with additional instances suspected. The report warns that AI products built on this data could generate new and potentially realistic child abuse content. LAION, the organization behind the dataset, has responded by temporarily taking its datasets offline to verify their safety before republishing them.
LAION’s image datasets have faced criticism before. In a paper published in October 2021, cognitive scientist Abeba Birhane highlighted problematic and explicit content in an earlier LAION image dataset, LAION-400M. That dataset contained image-text pairs depicting rape, pornography, malign stereotypes, racist and ethnic slurs, and other highly problematic content.
Furthermore, private medical photographs, taken by a doctor in 2013, have been found to be referenced in the LAION-5B dataset. The artist Lapine discovered these photos through the Have I Been Trained website, which lets individuals search popular AI training datasets for their own work.
A class-action lawsuit, Andersen et al. v. Stability AI LTD et al., was filed by visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz against Stability AI, Midjourney, and DeviantArt. Although LAION was not directly sued, it was named in the complaint. The suit alleges that Stability AI downloaded or acquired copies of billions of copyrighted images from the internet without permission, including through LAION’s dataset.
LAION-5B has also raised concerns about privacy and intellectual property rights. Speaking at a virtual panel organized by the FTC, Karla Ortiz, an award-winning artist who has worked for prominent film studios, described the dataset’s controversial contents, which include private medical records, non-consensual pornography, images of children, and even social media pictures of individuals’ faces.