Releases: hczhu/TickerTick-API
TickerTick stock news dataset 2025-05-26
Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 16 million news stories. The dataset file stores one news story per line as a JSON object, in reverse chronological order (newest first). An example news story, prettified into multi-line JSON, is shown below:
{
  "title": "Tech giants Nvidia, OpenAI and others join forces for massive UAE Stargate AI data center",
  "url": "https://qz.com/american-tech-partners-with-uae-for-new-ai-data-center-1851781991",
  "unix_timestamp": 1747936200,
  "id": "-3744939139222479336",
  "tickers_direct": [
    ".openai",
    "orcl",
    "nvda"
  ],
  "tickers_indirect": [
    "csco"
  ],
  "description": "A group of global tech giants gathered in Abu Dhabi to pose for a photo as anAI supergroup, including OpenAI's Sam Altman, Oracle's (ORCL) Larry Ellison, Nvidia's (NVDA) Jensen Huang, and Chuck Robbins of Cisco (CSCO), along with their new UAE partners. Read more..."
}
The fields of the JSON object are explained below. Most of the fields have the same semantics as the corresponding fields in the TickerTick API response.
Field name | Meaning | Optional field? (If yes, this field can be missing) |
---|---|---|
title | The title of this news story | No |
url | The original URL for the full news story | No |
unix_timestamp | The UNIX timestamp when the news was reported | No |
id | A unique string ID of this news story | No |
description | A short description of this news story | Yes |
tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
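
As a rough illustration of the table above, the following Python sketch parses one line of the dataset file into a typed record and substitutes defaults for the fields marked optional. The names `NewsStory` and `parse_story` are hypothetical and are not part of the dataset or the TickerTick API; using an empty list for a missing ticker field is an assumption made for convenience.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NewsStory:
    # Required fields according to the table above.
    title: str
    url: str
    unix_timestamp: int
    id: str
    # Optional fields: a record may omit any of these.
    description: Optional[str] = None
    tickers_direct: List[str] = field(default_factory=list)
    tickers_indirect: List[str] = field(default_factory=list)


def parse_story(line: str) -> NewsStory:
    """Parse one line of the dataset file into a NewsStory (hypothetical helper)."""
    record = json.loads(line)
    return NewsStory(
        title=record["title"],
        url=record["url"],
        unix_timestamp=int(record["unix_timestamp"]),
        id=record["id"],
        # dict.get returns the fallback when an optional field is missing.
        description=record.get("description"),
        tickers_direct=record.get("tickers_direct", []),
        tickers_indirect=record.get("tickers_indirect", []),
    )
```

Required fields are read with `record[...]` so that a malformed line raises a `KeyError` instead of silently producing an incomplete record.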
Note that many well-known pre-IPO startups (e.g., ByteDance, the parent company of TikTok) have made-up tickers like `.openai` and `.databricks`.
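
If made-up tickers need to be handled separately from exchange-listed ones, a small helper along these lines might do. It assumes that a leading dot is what marks a made-up ticker, which matches the examples in these release notes but is not stated as a rule; `split_tickers` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def split_tickers(tickers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split tickers into (listed, made_up), preserving input order.

    Assumption: made-up tickers for pre-IPO companies start with ".".
    """
    listed: List[str] = []
    made_up: List[str] = []
    for ticker in tickers:
        if ticker.startswith("."):
            made_up.append(ticker)
        else:
            listed.append(ticker)
    return listed, made_up


# Using the tickers_direct list from the example story above.
listed, made_up = split_tickers([".openai", "orcl", "nvda"])
```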
TickerTick stock news dataset 2023-11-23
Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 8 million news stories. The dataset file stores one news story per line as a JSON object, in reverse chronological order (newest first). An example news story, prettified into multi-line JSON, is shown below:
{
  "title": "Europe gives Meta, TikTok six days to share information on response to Israel-Hamas conflict",
  "url": "https://www.cnbc.com/2023/10/19/israel-hamas-eu-gives-meta-tiktok-six-days-to-provide-information.html",
  "unix_timestamp": 1697727889,
  "id": "3341850707742811898",
  "tickers_direct": [
    "meta",
    "fb"
  ],
  "tickers_indirect": [
    ".bytedance"
  ],
  "description": "The EU said it would like Meta and TikTok to hand over information on how they're tackling misinformation about the Israel-Hamas war."
}
The fields of the JSON object are explained below. Most of the fields have the same semantics as the corresponding fields in the TickerTick API response.
Field name | Meaning | Optional field? (If yes, this field can be missing) |
---|---|---|
title | The title of this news story | No |
url | The original URL for the full news story | No |
unix_timestamp | The UNIX timestamp when the news was reported | No |
id | A unique string ID of this news story | No |
description | A short description of this news story | Yes |
tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
Note that many well-known pre-IPO startups (e.g., ByteDance, the parent company of TikTok) have made-up tickers like `.bytedance` and `.databricks`.
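
Because the file holds one JSON object per line with the newest stories first, it can be streamed and abandoned early when only recent stories are needed. The sketch below is one possible approach, not an official reader: `stories_since` and the added `reported_at` key are hypothetical, and it assumes that `unix_timestamp` is in seconds and that the ordering holds across the whole file.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator


def stories_since(lines: Iterable[str], cutoff: datetime) -> Iterator[dict]:
    """Yield stories reported at or after `cutoff`, stopping at the first older one.

    `cutoff` should be timezone-aware; a naive datetime would be
    interpreted in the local time zone by datetime.timestamp().
    """
    cutoff_ts = cutoff.timestamp()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        story = json.loads(line)
        if story["unix_timestamp"] < cutoff_ts:
            # Newest-first ordering means every later line is older still.
            break
        # Convenience key added by this sketch; not part of the dataset.
        story["reported_at"] = datetime.fromtimestamp(
            story["unix_timestamp"], tz=timezone.utc
        )
        yield story
```

A file object can be passed directly as `lines`, for example `stories_since(open(path, encoding="utf-8"), cutoff)`, where `path` points at the downloaded dataset file, so the full file is never loaded into memory.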