AI News Aggregator

A modular TypeScript-based news aggregator that collects, enriches, and analyzes AI-related content from multiple sources.

Features

  • Multiple Data Sources

    • Discord channel raw message data (including users and reactions)
    • GitHub repository statistics
    • Solana token analytics (Codex)
    • CoinGecko market data
    • Twitter (support may require maintenance due to API changes)
  • Processing & Analysis

    • AI-powered structured summaries of Discord channel activity (using OpenAI/OpenRouter)
    • Raw data export
    • Topic extraction (optional, configurable)
  • Storage & Deployment

    • SQLite database for persistent storage (with optional encryption via GitHub Actions)
    • Daily summary generation (JSON & Markdown)
    • Deployment to GitHub Pages

Prerequisites

  • Node.js ≥ 18 (the GitHub Actions workflows use v23)
  • TypeScript
  • SQLite3 (the command-line tool is needed for integrity checks in the workflows)
  • npm

Installation

# Clone the repository
git clone https://github.com/bozp-pzob/ai-news.git

# Install dependencies
cd ai-news
npm install

# Create a .env file and add your credentials for local runs
cp example.env .env

Configuration

Local runs use a .env file. GitHub Actions workflows use repository secrets.

Local .env File

Create a .env file in the project root:

# OpenAI / OpenRouter
OPENAI_API_KEY=           # Your OpenRouter API key (or OpenAI if not using OpenRouter)
# OPENAI_DIRECT_KEY=        # Optional: Direct OpenAI key if needed for specific features
USE_OPENROUTER=true      # Set to true to use OpenRouter
SITE_URL=your_site.com    # Your site URL for OpenRouter attribution
SITE_NAME=YourAppName     # Your app name for OpenRouter attribution

# Discord
DISCORD_TOKEN=            # Your Discord Bot Token
DISCORD_GUILD_ID=         # The ID of the Discord server you are monitoring
# DISCORD_APP_ID=          # Likely not needed unless using slash commands

# Crypto Analytics
CODEX_API_KEY=            # Your Codex API key

# Optional: Twitter (Requires careful cookie handling)
# TWITTER_USERNAME=
# TWITTER_PASSWORD=
# TWITTER_EMAIL=
# TWITTER_COOKIES='[]' # JSON string of cookies

GitHub Actions Secrets (ENV_SECRETS)

For running via GitHub Actions, create a single repository secret named ENV_SECRETS containing a JSON object with your credentials. You also need a secret named SQLITE_ENCRYPTION_KEY for database encryption.

  1. Navigate to your GitHub repository.
  2. Go to "Settings" > "Secrets and variables" > "Actions".
  3. Click "New repository secret".
  4. Name: ENV_SECRETS. Value: Copy and paste the JSON below, filling in your values.
  5. Click "New repository secret" again.
  6. Name: SQLITE_ENCRYPTION_KEY. Value: Enter a strong password for encrypting the database.

ENV_SECRETS JSON Structure:

{
  "OPENAI_API_KEY": "sk-or-....",
  "USE_OPENROUTER": "true",
  "SITE_URL": "your_site.com",
  "SITE_NAME": "YourAppName",
  "DISCORD_APP_ID": "YOUR_DISCORD_APP_ID",
  "DISCORD_TOKEN": "YOUR_DISCORD_BOT_TOKEN",
  "DISCORD_GUILD_ID": "YOUR_DISCORD_SERVER_ID",
  "CODEX_API_KEY": "YOUR_CODEX_KEY",
  "TWITTER_USERNAME": "",
  "TWITTER_PASSWORD": "",
  "TWITTER_EMAIL": "",
  "TWITTER_COOKIES": "[]"
}
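
For reference, here is a minimal TypeScript sketch of how such a JSON blob could be expanded into individual environment variables at runtime. This is only an illustration; the actual workflows may unpack ENV_SECRETS differently (for example, in a shell step), and loadEnvSecrets is a hypothetical helper name.

// Hypothetical helper: expand a JSON secret blob into process.env
function loadEnvSecrets(json: string): void {
  const secrets: Record<string, string> = JSON.parse(json);
  for (const [key, value] of Object.entries(secrets)) {
    // Leave any variable that is already set alone, so explicit
    // environment settings win over the blob.
    if (process.env[key] === undefined) {
      process.env[key] = value;
    }
  }
}

// In a workflow run, ENV_SECRETS would be exposed as an environment variable:
if (process.env.ENV_SECRETS) {
  loadEnvSecrets(process.env.ENV_SECRETS);
}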

Running the Application

Configuration files (e.g., discord-raw.json, elizaos-dev.json) define which sources and generators run.

# Build the project
npm run build

# Run the main continuous process (using config/sources.json by default)
npm start

# Run using a specific config file
npm start -- --source=discord-raw.json

# --- Historical Data Fetching & Processing --- #

# Run historical script using a specific config (processes yesterday by default)
npm run historical -- --source=discord-raw.json --output=./output/discord

# Run historical for a specific date
npm run historical -- --source=elizaos-dev.json --date=2024-01-15 --output=./output/elizaos-dev

# Run historical for a date range
npm run historical -- --source=hyperfy-discord.json --after=2024-01-10 --before=2024-01-16 --output=./output/hyperfy

# Run historical for dates after a specific date
npm run historical -- --source=discord-raw.json --after=2024-01-15 --output=./output/discord

# Run historical for dates before a specific date
npm run historical -- --source=discord-raw.json --before=2024-01-10 --output=./output/discord

# Run historical with specific Twitter fetch mode
npm run historical -- --source=elizaos.json --date=2025-04-26 --fetchMode=timeline
# (--fetchMode can be 'search' or 'timeline'. 'search' is default for historical runs; 'timeline' is better for retweets but slower.)

Project Structure

config/                 # JSON configuration files for different pipelines
data/                   # SQLite databases (encrypted in repo, decrypted by Actions)
output/                 # Generated raw data exports and summaries
src/
├── aggregator/         # Core aggregation logic (ContentAggregator, HistoricalAggregator)
├── plugins/
│   ├── ai/             # AI provider implementations (OpenAIProvider)
│   ├── enrichers/      # Content enrichment plugins (e.g., AiTopicsEnricher - optional)
│   ├── generators/     # Output generation (RawDataExporter, DiscordSummaryGenerator)
│   ├── sources/        # Data source implementations (DiscordRawDataSource, etc.)
│   └── storage/        # Database storage handlers (SQLiteStorage)
├── helpers/            # Utility functions (config, date, files, etc.)
├── types.ts            # TypeScript type definitions
├── index.ts            # Main application entry point (continuous)
└── historical.ts       # Entry point for historical data processing
# ... other config and project files

Twitter Data Fetching Notes

When fetching historical Twitter data using npm run historical, you can specify a fetch mode using the --fetchMode flag:

  • --fetchMode=search (Default for historical script):

    • Uses Twitter's search API for the specified date(s).
    • Generally faster and more efficient for fetching tweets from a precise date or range.
    • May be less reliable at comprehensively capturing retweets or all activity for some accounts.
    • Recommended if your primary goal is original tweets from specific dates and speed is a priority.
  • --fetchMode=timeline:

    • Scans user timelines by fetching recent tweets and then filters by date.
    • More comprehensive for capturing all tweet types, including retweets.
    • Can be slower as it might process more tweets than strictly necessary for the target date, especially for very active users.
    • Recommended for initial large historical data fetches where completeness of retweets is important, or if the search mode is not yielding desired results for specific accounts/dates.

For continuous operation (npm start), TwitterSource defaults to the timeline mode to ensure better capture of all tweet types, including retweets, over time.
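
To make the distinction concrete, here is a rough TypeScript sketch of the timeline-style filtering described above. The Tweet shape and the fetchRecentTimeline function are hypothetical stand-ins, not the project's actual API:

// Hypothetical tweet shape for illustration.
interface Tweet {
  id: string;
  createdAt: Date;
  isRetweet: boolean;
}

// Timeline mode, roughly: over-fetch recent tweets, then keep only those
// posted on the target day. Retweets survive this filter, which a
// date-bounded search can miss.
async function filterTimelineByDate(
  fetchRecentTimeline: (user: string, limit: number) => Promise<Tweet[]>,
  user: string,
  targetDate: string, // "YYYY-MM-DD"
): Promise<Tweet[]> {
  const tweets = await fetchRecentTimeline(user, 500);
  return tweets.filter(
    (t) => t.createdAt.toISOString().slice(0, 10) === targetDate,
  );
}

The over-fetching is what makes timeline mode slower for active accounts, and the date filter is why it still lands on the requested day.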

Adding New Sources

  1. Create a new class in src/plugins/sources/ that implements ContentSource (and, optionally, fetchHistorical).
  2. Define the necessary parameters and logic within the class.
  3. Add a configuration block for your new source in the relevant JSON config file(s) under the sources array (a sketch of both the class and the config block follows the interface below).

Example ContentSource Interface:

import { ContentItem } from "../../types";

export interface ContentSource {
  name: string;
  fetchItems(): Promise<ContentItem[]>;
  fetchHistorical?(date: string): Promise<ContentItem[]>;
  // Other methods like init() if needed
}
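
As a rough illustration, here is a minimal source implementing this interface. The class name, feed URL, item type, and the "./ContentSource" import path are hypothetical, invented for this sketch rather than taken from the project:

import { ContentItem } from "../../types";
import { ContentSource } from "./ContentSource"; // hypothetical path

// Hypothetical source that pulls items from a JSON feed.
export class ExampleJsonFeedSource implements ContentSource {
  public name: string;
  private url: string;

  constructor(config: { name: string; url: string }) {
    this.name = config.name;
    this.url = config.url;
  }

  async fetchItems(): Promise<ContentItem[]> {
    const res = await fetch(this.url);
    const posts: { id: string; title: string; body: string; link: string }[] =
      await res.json();
    // Map the feed's records into the shared ContentItem shape.
    return posts.map((p) => ({
      cid: p.id,
      type: "exampleFeedData",
      source: this.name,
      title: p.title,
      text: p.body,
      link: p.link,
      date: Math.floor(Date.now() / 1000), // epoch seconds, per ContentItem
    }));
  }
}

A matching sources entry in a config file might then look like the block below; the exact key names depend on the project's config loader, so treat this as a sketch as well:

{
  "sources": [
    {
      "type": "ExampleJsonFeedSource",
      "name": "exampleFeed",
      "params": { "url": "https://example.com/feed.json" }
    }
  ]
}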

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

MIT License

Core Data Structures

ContentItem

Represents a unit of data stored in the items table. The exact content varies by type.

interface ContentItem {
  id?: number;          // Assigned by storage
  cid: string;          // Unique Content ID from source, or generated
  type: string;         // e.g., "discordRawData", "codexAnalyticsData", etc.
  source: string;       // Name of the source plugin instance (e.g., "hyperfyDiscordRaw")
  title?: string;       // Optional title
  text?: string;        // Main content (e.g., JSON string for discordRawData)
  link?: string;        // URL to original content (if applicable)
  topics?: string[];    // AI-generated topics (if enricher is used)
  date?: number;        // Creation/publication timestamp (epoch seconds)
  metadata?: Record<string, any>; // Additional source-specific data (e.g., channelId, guildName)
}
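
For instance, a discordRawData item might look like the following; every value here is illustrative, not taken from a real database:

import { ContentItem } from "./types"; // path is illustrative

const example: ContentItem = {
  cid: "discordRaw-12345-2024-01-15", // unique per channel and day
  type: "discordRawData",
  source: "hyperfyDiscordRaw",
  text: JSON.stringify({ channel: { id: "12345" } }), // truncated DiscordRawData payload, see below
  date: 1705276800, // 2024-01-15T00:00:00Z as epoch seconds
  metadata: { channelId: "12345", guildName: "Hyperfy" },
};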

DiscordRawData (Stored as JSON string in ContentItem.text for type: 'discordRawData')

interface DiscordRawData {
  channel: {
    id: string;
    name: string;
    topic: string | null;
    category: string | null;
  };
  date: string; // ISO date string for the day fetched
  users: { [userId: string]: { name: string; nickname: string | null; roles?: string[]; isBot?: boolean; } };
  messages: { /* ... message details ... */ }[];
}

SummaryItem (Stored in summary table)

Represents a generated summary.

interface SummaryItem {
  id?: number;          // Assigned by storage
  type: string;         // e.g., "hyperfyDiscordSummary", "elizaosDevSummary"
  title?: string;       // e.g., "Hyperfy Discord - 2024-01-15"
  categories?: string;  // JSON string containing detailed stats and channel summaries
  markdown?: string;    // Full Markdown content of the summary
  date?: number;        // Timestamp for the summary period (epoch seconds)
}

Example Summary JSON Output (YYYY-MM-DD.json)

This structure is derived from the SummaryItem.categories field.

{
  "server": "Server Name",
  "title": "Server Name Discord - YYYY-MM-DD",
  "date": 1705363200, // Example epoch timestamp
  "stats": {
    "totalMessages": 150,
    "totalUsers": 25
  },
  "categories": [
    {
      "channelId": "12345",
      "channelName": "general",
      "summary": "Brief AI summary of the general channel...",
      "messageCount": 100,
      "userCount": 20
    },
    {
      "channelId": "67890",
      "channelName": "development",
      "summary": "Brief AI summary of the development channel...",
      "messageCount": 50,
      "userCount": 15
    }
    // ... more channels
  ]
}

Supported Source Types (Examples)

Discord (DiscordRawDataSource)

  • Fetches raw messages, user details, reactions, edits, and replies for the specified channels daily.
  • Data is stored as discordRawData items.
  • Subsequent generators (DiscordSummaryGenerator, RawDataExporter) process these items.

GitHub Stats (GitHubStatsSource)

  • Fetches repository statistics (issues, PRs, commits, contributors).
  • Stores data as specific ContentItem types.

Cryptocurrency Analytics (CodexAnalyticsSource)

  • Fetches token data (price, volume, etc.) from Codex.so.
  • Stores data as codexAnalyticsData items.

Scheduled Tasks (GitHub Actions)

The application uses GitHub Actions workflows (.github/workflows/) for scheduled data fetching and processing. Examples:

  • discord-raw.yml: Fetches raw Discord data, generates summaries, exports raw data, deploys encrypted DB and outputs to GitHub Pages.
  • elizaos-dev.yml: Similar process for ElizaOS Dev Discord data.
  • hyperfy.yml: Similar process for Hyperfy Discord data.
  • Schedules typically run daily.

Storage

The application uses SQLite. Databases in the data/ directory are encrypted when stored in the repository or on the gh-pages branch, and decrypted during workflow runs.

items Table

Stores fetched content from various sources.

CREATE TABLE IF NOT EXISTS items (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  cid TEXT UNIQUE,          -- Content ID (can be null initially)
  type TEXT NOT NULL,     -- Type of content (e.g., discordRawData)
  source TEXT NOT NULL,   -- Name of the source instance
  title TEXT,
  text TEXT,              -- Main content (often JSON for raw data)
  link TEXT,
  topics TEXT,            -- JSON array of strings
  date INTEGER,           -- Epoch timestamp (seconds)
  metadata TEXT           -- JSON object for extra info
);
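
A sketch of how a ContentItem maps onto this table, shown with the better-sqlite3 driver for brevity (the project's SQLiteStorage plugin may use a different driver or API):

import Database from "better-sqlite3";
import { ContentItem } from "./types"; // path is illustrative

const db = new Database("data/example.sqlite");

function saveItem(item: ContentItem): void {
  // INSERT OR IGNORE relies on the UNIQUE constraint on cid,
  // making repeated fetches of the same content idempotent.
  db.prepare(
    `INSERT OR IGNORE INTO items
       (cid, type, source, title, text, link, topics, date, metadata)
     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
  ).run(
    item.cid,
    item.type,
    item.source,
    item.title ?? null,
    item.text ?? null,
    item.link ?? null,
    item.topics ? JSON.stringify(item.topics) : null, // stored as JSON array text
    item.date ?? null,
    item.metadata ? JSON.stringify(item.metadata) : null,
  );
}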

summary Table

Stores generated summaries.

CREATE TABLE IF NOT EXISTS summary (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  type TEXT NOT NULL,     -- Type of summary (e.g., elizaosDevSummary)
  title TEXT,
  categories TEXT,        -- JSON object with detailed structure
  markdown TEXT,          -- Full markdown content
  date INTEGER            -- Epoch timestamp (seconds) for the summary period
);

cursor Table

Stores the last processed message ID for certain sources.

CREATE TABLE IF NOT EXISTS cursor (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  cid TEXT NOT NULL UNIQUE, -- Key identifying the source/channel (e.g., "discordRaw-12345")
  message_id TEXT NOT NULL  -- Last fetched Discord message snowflake ID
);
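
The incremental-fetch pattern this table supports, again sketched with better-sqlite3 for illustration: read the last message ID seen for a key, fetch only newer messages, then advance the cursor.

import Database from "better-sqlite3";

const db = new Database("data/example.sqlite");

// Read the last fetched message snowflake for a source/channel key.
function getCursor(cid: string): string | undefined {
  const row = db
    .prepare("SELECT message_id FROM cursor WHERE cid = ?")
    .get(cid) as { message_id: string } | undefined;
  return row?.message_id;
}

// Upsert the cursor after a successful fetch.
function setCursor(cid: string, messageId: string): void {
  db.prepare(
    `INSERT INTO cursor (cid, message_id) VALUES (?, ?)
     ON CONFLICT(cid) DO UPDATE SET message_id = excluded.message_id`
  ).run(cid, messageId);
}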
