A powerful command-line interface (CLI) tool designed to quickly scan a project directory, generate a clean, structured report of its contents (folder tree + text file content), and optionally pass this report to an LLM for analysis, rendering the result in a local web page.
- Project Structure: Generates a visual tree representation of the project directory.
- Text File Contents: Includes the full content of identifiable text files within the project.
- Intelligent Filtering: Automatically ignores common directories (`node_modules`, `.git`, `dist`, build/cache folders, virtual environments, etc.) and specific noisy files (`package-lock.json`, `.env`, lock files, etc.).
- Binary/Non-Text Exclusion: Skips binary files, images, archives, media, and other non-text formats (unless specific parsers are available, as for PDF and Word documents).
- PDF Scanning: Extracts text from PDF files using the `pdf-parse` Node.js library.
- Word Document Scanning (.docx): Extracts text from modern Microsoft Word documents (`.docx`) using the `mammoth` library.
- YouTube Transcript Fetching: Automatically detects YouTube links in `.txt` files, fetches the video transcript (without timestamps), and includes it in the summary directly after the link (see the detection sketch after this list).
- Optional LLM Integration: Pass the generated summary directly to an OpenAI-compatible LLM API for automated analysis using the `--llm` flag.
- Customizable Prompting: Use a template file (`--prompt`) to control the instructions given to the LLM, injecting the project summary using a special tag (`{{SUMMARY}}`).
- Configurable LLM Settings: Easily adjust the LLM `model` and `temperature` via command-line options.
- Secure API Key Handling: Loads your OpenAI API key securely from a `.env` file.
- Rich Web Rendering: When using LLM integration, the Markdown response from the model is beautifully rendered in a local web page.
- Automatic Browser Opening: The generated web page is automatically opened in your default browser.
- Clipboard Integration: Copies the generated report to your clipboard (default behavior when not using `--llm`, or explicitly with `--copy`).
- Modular Design: New functionalities (LLM processing, web rendering) are kept in separate files for better organization.
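For the YouTube transcript feature mentioned above, the first step is simply spotting YouTube URLs in a text file. A minimal detection sketch (illustrative only; the tool's actual pattern and transcript-fetching code may differ):

```js
// Illustrative sketch: find YouTube video IDs in a .txt file's content.
// The regex below is an assumption, not the tool's actual pattern.
const YOUTUBE_URL_RE =
  /https?:\/\/(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]{11})/g;

function findYouTubeLinks(text) {
  // Return the list of 11-character video IDs found in the text
  return [...text.matchAll(YOUTUBE_URL_RE)].map((m) => m[1]);
}

console.log(findYouTubeLinks('See https://youtu.be/dQw4w9WgXcQ for details.'));
// -> [ 'dQw4w9WgXcQ' ]
```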
When working with large language models (LLMs) for tasks like code explanation, refactoring, debugging, or generating documentation, providing the necessary context about your codebase is crucial. Copying individual files and explaining the structure manually is tedious and often incomplete.
This tool simplifies that process significantly:
- Comprehensive Context: The generated report gives the LLM (or a human reviewer) both the "map" (folder structure) and the "details" (file contents) in one place.
- Reduced Noise: By intelligently ignoring irrelevant files and directories, the report focuses only on the relevant parts, reducing token usage for LLMs and improving context clarity.
- Structured Format: The output is formatted with clear separators, making it easier for models (and humans) to parse.
- Direct LLM Integration: The `--llm` flag automates the process of sending this context to an LLM, bypassing manual copy/paste steps and immediately providing the LLM's analysis in an easy-to-read format.
- Customizable Workflow: Tailor the LLM's task using a specific prompt template.
It's also highly useful for:
- Onboarding new team members by quickly sharing a project overview.
- Generating documentation outlines.
- Getting a bird's-eye view of an unfamiliar codebase.
- Preparing code for sharing or review.
This tool is a Node.js CLI application. You will need Node.js installed on your system to run it.
- Install Node.js: If you don't have Node.js installed, download and install it from the official website: nodejs.org. We recommend Node.js v18.0.0 or later for compatibility with newer features and libraries.

  You can verify your installation by opening a terminal and running:

  ```bash
  node -v
  npm -v
  ```

  Make sure the Node.js version is 18.0.0 or higher.
- Install the CLI Tool: Once Node.js and npm (Node Package Manager) are installed, you can install the package globally. This makes the `summarize` command available in your terminal from any directory.

  Option A: Install from local directory (recommended): If you have cloned or downloaded this repository, navigate to the project directory and run:

  ```bash
  # First, install dependencies
  npm install

  # Then, install the package globally
  npm install -g .
  # OR use npm link for development
  npm link
  ```

  Option B: Install from npm registry: If the package has been published to npm (not available yet), you can install it directly:

  ```bash
  npm install -g summarize-code-base
  ```

  Note for macOS/Linux users: You might need to use `sudo` if you encounter permission errors:

  ```bash
  sudo npm install -g .
  # OR
  sudo npm link
  ```

  Troubleshooting: If you encounter a "Cannot find module" error when running the `summarize` command, make sure you've installed the dependencies first with `npm install` before installing the package globally.
- Setup for LLM Integration (Optional): If you plan to use the `--llm` functionality with OpenAI, you need an API key.
  - Get your OpenAI API key from the OpenAI Platform API Keys page.
  - In the directory where you installed the `summarize-code-base` code (if you cloned it), or in your project's root directory where you might run the command from, create a file named `.env`.
  - Add your API key to this file like this:

    ```
    OPENAI_API_KEY=YOUR_ACTUAL_OPENAI_API_KEY_HERE
    ```

    Important: Replace `YOUR_ACTUAL_OPENAI_API_KEY_HERE` with your actual secret key.
  - Security: Ensure you do not commit your `.env` file to version control (e.g., add `.env` to your `.gitignore`).
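To confirm the key is visible to Node the same way the tool loads it (via `dotenv`), you can run a quick sanity check from the project directory; this hypothetical snippet is just a one-off check, not part of the tool:

```js
// check_key.js — run with `node check_key.js` in the directory containing .env
require('dotenv').config(); // loads variables from .env into process.env
console.log(process.env.OPENAI_API_KEY ? 'API key loaded' : 'API key missing');
```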
Navigate to the directory you want to summarize, or run the command specifying the target directory path.
The basic command requires the path to the project directory:
```bash
# Summarize the current directory (default behavior: console output + clipboard)
summarize .

# Summarize a different directory (default behavior: console output + clipboard)
summarize /path/to/your/project
```
When not using the `--llm` flag (the default), the generated report is printed to your console and automatically copied to your clipboard. Example output:
```text
Project Code Summarizer for 'my-project' starts...

--- Section 1: Folder Structure ---
my-project
├── documents
│   ├── report.docx
│   └── manual.pdf
├── mobile_app
│   └── main_view.swift
├── notes.txt
├── package-lock.json  # Ignored file name, but shown in structure
├── package.json
├── project_summary.js
├── scripts
│   └── data_processor.py
└── src
    ├── components
    │   └── Button.js
    └── utils
        └── helpers.js

--- Section 2: File Contents (8 files) ---

--- File: documents/report.docx ---
This is the content of the Word document.
It might contain various text elements.
--- End of File: documents/report.docx ---

--- File: documents/manual.pdf ---
This is the extracted text from the PDF.
PDFs can have complex layouts, but we get the text.
--- End of File: documents/manual.pdf ---

--- File: mobile_app/main_view.swift ---
import SwiftUI

struct MainView: View {
    var body: some View {
        Text("Hello, Swift!")
    }
}
--- End of File: mobile_app/main_view.swift ---

--- File: notes.txt ---
This is a simple text file.
It contains some notes for the project.
--- End of File: notes.txt ---

--- File: package.json ---
{
  "name": "my-project",
  "version": "1.0.0",
  "description": "An example project",
  ...
}
--- End of File: package.json ---

--- File: project_summary.js ---
#!/usr/bin/env node
const fs = require('fs').promises;
...
--- End of File: project_summary.js ---

--- File: scripts/data_processor.py ---
def process_data(data):
    # Imagine complex data processing here
    return data.upper()

print(process_data("sample input"))
--- End of File: scripts/data_processor.py ---

--- File: src/components/Button.js ---
import React from 'react';

const Button = ({ children }) => {
  return <button>{children}</button>;
};

export default Button;
--- End of File: src/components/Button.js ---

Project Code Summarizer for 'my-project' ends.

✅ Summary copied to clipboard!
```
Use the `--llm` flag to send the summary to the LLM for analysis and render the response in a browser.
```bash
# Summarize current directory and send to LLM (requires .env with OPENAI_API_KEY)
summarize . --llm

# Summarize a different directory and send to LLM
summarize /path/to/your/project --llm
```
When using `--llm`:

- The extensive summary is not printed to the console.
- The summary is injected into the prompt template.
- The prompt is sent to the OpenAI API.
- The LLM's response (expected Markdown) is converted to HTML.
- A simple local web server starts temporarily.
- The HTML report is automatically opened in your default browser.
- The raw summary is not copied to the clipboard by default (use `--copy` to force it).
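The last three steps can be pictured with a short sketch (illustrative only, not the actual `web_renderer.js` source; it assumes `marked` for Markdown-to-HTML conversion and the CommonJS build of the `open` package, both of which are described later in this README):

```js
// Sketch of a render-and-serve step: convert Markdown to HTML, serve it
// on a random free port, and open the default browser.
const http = require('http');
const { marked } = require('marked');
const open = require('open'); // CommonJS build (v8); newer versions are ESM-only

function renderAndServe(markdownResponse) {
  const html = `<!DOCTYPE html><html><body>${marked.parse(markdownResponse)}</body></html>`;
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.end(html);
  });
  server.listen(0, async () => {
    const { port } = server.address(); // port 0 = any available port
    await open(`http://localhost:${port}`);
  });
}
```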
Here is an example of applying this project summarizer to this very GitHub repo (recursive!) and displaying the summary website with diagrams and tables:
You can customize the LLM processing using the following options with the `--llm` flag:
- `--prompt <path>` (Alias: `-p`): Specify the path to a custom prompt template file. Defaults to `prompt_template.txt` in the current directory. The template should contain the placeholder `{{SUMMARY}}` where the project summary will be injected (see the example template after this list).

  ```bash
  summarize . --llm --prompt ./my-prompts/analysis-template.txt
  ```

- `--model <model_name>` (Alias: `-m`): Specify the OpenAI model to use. Defaults to `gpt-4o` (or `gpt-3.5-turbo` if `gpt-4o` is not available or preferred).

  ```bash
  summarize . --llm --model gpt-3.5-turbo
  ```

- `--temperature <value>` (Alias: `-t`): Set the temperature for the LLM response (a number between 0.0 and 2.0). Defaults to `0.7`.

  ```bash
  summarize . --llm --temperature 1.0
  ```

- `--copy` (Alias: `-c`): Force copying the raw generated summary to the clipboard even when using the `--llm` flag. By default, `--copy` is true when `--llm` is false, and false when `--llm` is true.

  ```bash
  summarize . --llm --copy   # Use LLM AND copy the raw summary to clipboard
  summarize . --no-copy      # Don't copy to clipboard (only print to console)
  ```
You can combine these options:
```bash
summarize /path/to/project --llm --model gpt-4o --temperature 0.5 --prompt my_template.txt --copy
```
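For reference, a template file passed via `--prompt` can be as simple as the following (an illustrative example, not the contents of the bundled `prompt_template.txt`):

```text
You are a senior engineer reviewing a codebase.
Analyze the project below, describe its architecture, and point out risks.

{{SUMMARY}}

Respond in Markdown, using headings and tables where helpful.
```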
- Entry Point (`index.js`): This is the main script executed. It uses `yargs` to parse all command-line arguments (`directory`, `--llm`, `--prompt`, etc.). It also loads environment variables from `.env` using `dotenv`.
- Summary Generation (`project_summary.js`): The `index.js` script calls the `generateProjectSummary` function from `project_summary.js`. This function traverses the specified directory, applies the ignore rules, collects text file content, and formats the output into a single large summary string. The function returns the string; it no longer prints or copies it itself.
- Conditional Output: Based on the presence of the `--llm` flag (see the flow sketch after this list):
  - If `--llm` is NOT used: The `index.js` script prints the generated summary string to the console and, if `clipboardy` is available and `--copy` is enabled, copies it to the clipboard (replicating the original behavior).
  - If `--llm` IS used:
    - `index.js` retrieves the `OPENAI_API_KEY` from environment variables.
    - `index.js` calls the `processWithLLM` function from `llm_processor.js`, passing the summary string and the LLM configuration options (prompt path, model, temperature, API key).
    - LLM Processing (`llm_processor.js`): This module reads the specified prompt template, replaces the `{{SUMMARY}}` placeholder with the generated summary, initializes the OpenAI client, makes a request to the OpenAI API, and returns the LLM's text response.
    - `index.js` receives the LLM response and calls the `renderAndServe` function from `web_renderer.js`.
    - Web Rendering (`web_renderer.js`): This module takes the LLM's Markdown response, converts it into HTML using `marked`, embeds it in a simple HTML template with basic styling, starts a temporary local HTTP server to serve this HTML content on an available port, and uses the `open` package to automatically open the server's URL in the user's default browser.
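Putting the pieces together, the top-level control flow can be sketched roughly like this (simplified; the function signatures are assumptions based on the description above, not the actual source):

```js
// Simplified sketch of index.js's control flow (argument parsing and
// error handling omitted).
require('dotenv').config();
const { generateProjectSummary } = require('./project_summary');
const { processWithLLM } = require('./llm_processor');
const { renderAndServe } = require('./web_renderer');

async function main(argv) {
  const summary = await generateProjectSummary(argv.directory);

  if (!argv.llm) {
    // Default path: print, and copy to clipboard unless --no-copy was given
    console.log(summary);
    if (argv.copy) {
      const { default: clipboardy } = await import('clipboardy'); // ESM-only package
      await clipboardy.write(summary);
    }
    return;
  }

  // LLM path: inject the summary into the prompt, query the API, render the result
  const response = await processWithLLM(summary, {
    promptPath: argv.prompt,
    model: argv.model,
    temperature: argv.temperature,
    apiKey: process.env.OPENAI_API_KEY,
  });
  await renderAndServe(response);
}
```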
The tool comes with built-in lists of common directories and files to ignore, defined within `project_summary.js`. These are designed to focus the summary on relevant codebase files.
- `IGNORED_DIRS`: Contains directories like `node_modules`, `.git`, `dist`, build/cache folders for various languages/frameworks, virtual environments (`venv`, `env`), etc.
- `IGNORED_FILES`: Contains specific file names like `package-lock.json`, `.env`, `.env.local`, and various lock files (`poetry.lock`, `yarn.lock`, `composer.lock`, etc.).
- `NON_TEXT_EXTENSIONS`: Contains file extensions for binary files, images, archives, media, databases, fonts, etc. (Note: PDF files are processed using the `pdf-parse` library.)
These lists are quite comprehensive and cover many typical project setups.
Note: Currently, the tool does not support custom ignore patterns via command-line arguments or configuration files. The built-in lists are used.
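Conceptually, the filtering amounts to simple set lookups during directory traversal. A sketch with abbreviated lists (the real lists in `project_summary.js` are much longer):

```js
// Illustrative filtering logic with abbreviated lists (not the actual source).
const path = require('path');

const IGNORED_DIRS = new Set(['node_modules', '.git', 'dist', 'venv', 'env']);
const IGNORED_FILES = new Set(['package-lock.json', '.env', 'yarn.lock']);
const NON_TEXT_EXTENSIONS = new Set(['.png', '.zip', '.mp4', '.sqlite', '.woff']);

// Skip entire directories so their contents are never visited
function shouldDescendInto(dirName) {
  return !IGNORED_DIRS.has(dirName);
}

// Skip noisy files and anything with a known binary extension
function shouldIncludeFile(fileName) {
  if (IGNORED_FILES.has(fileName)) return false;
  return !NON_TEXT_EXTENSIONS.has(path.extname(fileName).toLowerCase());
}
```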
Contributions are welcome! If you have suggestions for improvements, bug fixes, or want to add more file/directory patterns to the ignore lists, feel free to open an issue or submit a pull request.
- Fork the repository.
- Clone your fork:
  ```bash
  git clone https://github.com/TomHuynhSG/code-base-summarizer-llm.git
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Link the package for local testing:
  ```bash
  npm link
  ```
  (You can now use `summarize` in your terminal from any directory, pointing to your local code.)
- Make your changes.
- Test thoroughly.
- Commit your changes and push to your fork.
- Create a pull request to the original repository.
The tool processes PDF files using the `pdf-parse` Node.js library. This library is included as a project dependency and installed automatically when you run `npm install`. No separate Python or external tool installation is required for PDF handling.

When a PDF file is encountered, `pdf-parse` attempts to extract its text content. The extracted text is included in the summary report.
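In general, `pdf-parse` usage looks like the following minimal sketch (how the extraction step works in principle, not a copy of this tool's code):

```js
// Minimal pdf-parse usage: read a PDF from disk and extract its text.
const fs = require('fs').promises;
const pdfParse = require('pdf-parse');

async function extractPdfText(filePath) {
  const buffer = await fs.readFile(filePath);
  const { text } = await pdfParse(buffer); // also returns page count, metadata, etc.
  return text;
}
```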
The tool supports scanning modern Microsoft Word documents (`.docx` files). Text is extracted using the `mammoth` Node.js library. This library is included as a project dependency and installed automatically when you run `npm install`. No separate external tool installation is required for `.docx` handling.
Legacy `.doc` files are no longer supported.
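The equivalent `mammoth` call is similarly small (again a minimal sketch, not the tool's actual code):

```js
// Minimal mammoth usage: extract the raw text of a .docx file.
const mammoth = require('mammoth');

async function extractDocxText(filePath) {
  const { value } = await mammoth.extractRawText({ path: filePath });
  return value; // plain text; formatting is discarded
}
```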
Here are some planned features and potential future directions for the `summarize-code-base` tool:

- Custom Ignore Patterns: Allow users to specify additional files, directories, or patterns to ignore via command-line arguments or a configuration file (e.g., `.summarizerc`).
- Support for Other LLMs/APIs: Extend LLM integration to support models from providers other than OpenAI.
- Multiple Output Formats: Add options to output the summary or LLM response in different formats (e.g., JSON, pure Markdown file).
- Output to File: Implement an option to save the generated report or LLM response directly to a specified file.
- Integrate with Local LLM (Concept): Explore integration with local Large Language Models (LLMs).
- Enhance PDF Processing: Add more options for PDF processing, such as controlling the level of detail or focusing on specific parts of PDFs.
- Integrate with Vision LLM for Images (Concept): Investigate using local Vision-Language Models (VLMs) to analyze image files (currently ignored) and generate text descriptions.
- Progress Indicator: For large projects, add a visual indicator to show the scanning progress.
- Huynh Nguyen Minh Thong (Tom Huynh) - tomhuynhsg@gmail.com