The Pushshift Reddit Dataset
- 1. Pushshift.io
- 2. Max Planck Institute
- 3. University of Colorado Boulder
- 4. Elon University
- 5. Binghamton University
Description
We provide a small sample of the Pushshift Reddit dataset. The sample consists of two files:
RS_2019-04.zst: All Reddit submissions that were posted during April 2019.
RC_2019-04.zst: All Reddit comments that were posted during April 2019.
The full dataset can be downloaded from https://files.pushshift.io/reddit/submissions/ (submissions) and https://files.pushshift.io/reddit/comments/ (comments). On the website, you can find one file for each month of our data collection. Each file is a newline-delimited JSON (ndjson) file, in which each line contains the JSON object of a single submission or comment.
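Since each line is a standalone JSON object, a file can be processed one record at a time without loading it into memory. The following is a minimal sketch, not an official loader: `iter_ndjson` and `open_zst_text` are hypothetical helper names, the decompression step assumes the third-party `zstandard` package is installed, and the large `max_window_size` is an assumption that may not be needed for every dump. The records in the demo are made up for illustration.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an ndjson stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def open_zst_text(path: str) -> TextIO:
    """Open a .zst file as a UTF-8 text stream (hypothetical helper).

    Assumes the third-party `zstandard` package; it is imported lazily so
    the ndjson parsing above works without it.
    """
    import zstandard

    fh = open(path, "rb")
    # Assumption: a large window limit, in case the dump was compressed
    # with a long-distance window.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    return io.TextIOWrapper(dctx.stream_reader(fh), encoding="utf-8")


# In-memory stand-in for a decompressed dump (made-up records).
sample = io.StringIO('{"id": "a1", "score": 3}\n\n{"id": "b2", "score": 7}\n')
records = list(iter_ndjson(sample))
print(len(records), records[0]["id"])
```

With a downloaded file, the same generator could be fed from `open_zst_text("RS_2019-04.zst")`, assuming the file sits in the working directory.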
Files (21.1 GB)
Checksum | Size |
---|---|
md5:5651d5fc9ab9577a56be33e8f52c2bdf | 15.5 GB |
md5:e24ecb20e08751f0bf3b9189860d7ac9 | 5.6 GB |
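The MD5 checksums listed here can be used to confirm that a multi-gigabyte download completed intact. The following is a minimal sketch using only the Python standard library; `md5_of_file` is a hypothetical helper name, and the demo hashes a small temporary file rather than an actual dataset file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files never need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file standing in for a downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_digest = md5_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)
```

For a downloaded file, the returned string would be compared against the value after the `md5:` prefix listed for that file.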