An investigation by Knowing Machines and Der SPIEGEL into the LAION-5B dataset, a collection of over 5 billion image-text pairs. These are the data assets produced during the investigation:
- https://knowingmachines.org/models-all-the-way
- https://www.spiegel.de/netzwelt/web/laion-5b-so-entsteht-das-weltbild-einer-kuenstlichen-intelligenz-a-dea0157b-d364-4622-8d77-3a1cb6c4afd1
laion1B-nolang-domains.csv
laion1B-nolang-domains.tld.csv
laion2B-en-domains.csv
laion2B-en-domains.tld.csv
laion2B-multi-domains.csv
laion2B-multi-domains.tld.csv
laion1B-nolang-license-counts.csv
laion1B-nolang-nsfw-counts.csv
laion2B-en-license-counts.csv
laion2B-en-nsfw-counts.csv
laion2B-multi-license-counts.csv
laion2B-multi-nsfw-counts.csv
laion1B-nolang-classified-top-500-tld.csv
laion2B-en-classified-top-500-tld.csv
laion2B-multi-classified-top-500-tld.csv
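
This readme does not document the column layout of the files above, so a reasonable first step is to look at the header and the first few rows before doing any analysis. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-delimited, UTF-8 encoded, and start with a header row; none of that is confirmed here. The helper name `peek_csv` is hypothetical and not part of this repo.

```python
import csv
from itertools import islice
from pathlib import Path


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file.

    Assumption: the first line is a header row. If a file has no header,
    the first data row is returned as the header instead.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # File name taken from the list above; adjust the path to your local checkout.
    target = Path("laion2B-en-domains.csv")
    if target.exists():
        header, rows = peek_csv(target)
        print(header)
        for row in rows:
            print(row)
```

Reading only a handful of rows keeps this cheap even if a file turns out to be large. Once the actual columns are known, the same file can be handed to whichever analysis tool you prefer.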
The Engelberg Center and Knowing Machines do not claim any rights in the data assets included in this repo. Therefore, the CC0 license attached to the repo only applies to the extent that there are new rights in this specific compilation, and to the text of this readme.