Coder is a Python-based tool designed to convert quantitative finance research articles into actionable trading algorithms compatible with QuantConnect. By utilizing Natural Language Processing (NLP) and OpenAI's language models, this tool automates the extraction of trading strategies and risk management techniques from PDF articles, summarizes the findings, and generates ready-to-use QuantConnect Python code with proper syntax highlighting.
- Introduction and limitations
- Features
- Installation
- Usage
- Configuration
- Dependencies
- Contributing
- License
- Acknowledgements
- Contact
This script streamlines the process of transforming quantitative finance research into executable trading algorithms. By automating text extraction, preprocessing, and analysis of PDF documents, the tool speeds up the development of trading strategies in the QuantConnect environment. This automation reduces manual effort, limits transcription errors, and accelerates the implementation of complex financial models. However, generated code is not guaranteed to be error-free. It may need minor corrections, either with MIA (QuantConnect's built-in AI assistant) or through manual debugging. LLM pair-coding is an evolving field, and performance should improve over time.
- PDF Text Extraction: Uses pdfplumber to accurately extract text from complex PDF structures.
- Text Preprocessing: Cleans extracted text by removing URLs, headers, footers, and irrelevant content.
- Heading Detection: Identifies section headings using spaCy's NLP capabilities to organize the content by structure.
- Keyword Analysis: Categorizes sentences as trading signals or risk management based on predefined keywords.
- Article Summarization: Generates concise summaries of the extracted strategies and risk management techniques using OpenAI's GPT-4.
- QuantConnect Code Generation: Automatically generates QuantConnect-compatible Python algorithms from the extracted data.
- GUI Display: Presents the article summary and generated code in separate Tkinter windows, with syntax highlighting powered by Pygments.
- Error Handling & Validation: Checks generated code for syntax errors and automatically refines it if necessary.
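The preprocessing and keyword-analysis steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the tool's implementation: the keyword lists, the page-number heuristic used for headers/footers, and the function names (`preprocess`, `split_sentences`, `categorize`) are hypothetical, and a regex sentence splitter stands in for spaCy's sentence segmentation.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists; the tool's real predefined keywords may differ.
TRADING_KEYWORDS = ("buy", "sell", "signal", "momentum", "moving average", "entry", "exit")
RISK_KEYWORDS = ("stop loss", "drawdown", "volatility", "position size", "risk")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Lines that are only a page number ("12", "Page 3", "3 of 10") are treated as header/footer debris.
PAGE_NUMBER_RE = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def preprocess(text: str) -> str:
    """Strip URLs and bare page-number lines, then collapse leftover whitespace."""
    text = URL_RE.sub("", text)
    kept = [line for line in text.splitlines() if not PAGE_NUMBER_RE.match(line)]
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in kept if line.strip())


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def categorize(sentences: List[str]) -> Dict[str, List[str]]:
    """Add each sentence to every category whose keywords it contains (substring match)."""
    result: Dict[str, List[str]] = {"trading_signal": [], "risk_management": []}
    for sentence in sentences:
        lowered = sentence.lower()
        if any(keyword in lowered for keyword in TRADING_KEYWORDS):
            result["trading_signal"].append(sentence)
        if any(keyword in lowered for keyword in RISK_KEYWORDS):
            result["risk_management"].append(sentence)
    return result
```

Substring matching is deliberately simple here; a spaCy-based version would more likely match on tokens or lemmas to avoid false hits.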
- Python 3.8 or Higher: Ensure Python is installed on your system. Download it from https://www.python.org/downloads/.
- OpenAI API Key: Obtain an API key from OpenAI to enable the AI-driven features.
- Clone the Repository
git clone https://github.com/SL-Mar/Article_to_Code.git
cd Article_to_Code
- Create a Virtual Environment
It's recommended to use a virtual environment to manage dependencies.
python -m venv venv
- Activate the Virtual Environment
For macOS/Linux:
source venv/bin/activate
For Windows:
venv\Scripts\activate
- Install Dependencies
pip install -r requirements.txt
- Download spaCy Model
python -m spacy download en_core_web_sm
- Configure Environment Variables
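Create a .env file in the root directory containing your OpenAI API key and the logging level. The commands below work in macOS/Linux shells; on Windows, you can create the file in a text editor instead: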
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
echo "LOG_LEVEL=INFO" >> .env
Execute the main script with the path to your PDF article as an argument:
python article_to_code.py path/to/your/article.pdf
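A minimal sketch of how the script's single positional argument could be parsed and validated with `argparse`. The function name `parse_args` and the `.pdf` extension check are illustrative assumptions; the actual `article_to_code.py` may handle its arguments differently.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the one positional argument: the path to the PDF article (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Convert a quantitative finance article (PDF) into QuantConnect code."
    )
    parser.add_argument("pdf_path", type=Path, help="Path to the PDF article")
    args = parser.parse_args(argv)
    # Fail early with a clear message; parser.error() prints usage and exits with status 2.
    if args.pdf_path.suffix.lower() != ".pdf":
        parser.error(f"expected a .pdf file, got: {args.pdf_path}")
    return args


if __name__ == "__main__":
    print(f"Processing {parse_args().pdf_path}")
```

Rejecting non-PDF paths up front surfaces the problem before any extraction work or API calls start.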
- Load PDF: The application processes the specified PDF, extracting and analyzing its content.
- View Summary: A concise summary of the trading strategy and risk management sections is displayed.
- Review Generated Code: The corresponding QuantConnect Python code is displayed with syntax highlighting.
- Copy and Save: Use the provided buttons to copy the summary or code to your clipboard or save the code to a file.
- OPENAI_API_KEY: Your OpenAI API key, required for the GPT-4 summarization and code-generation calls.
- LOG_LEVEL: The logging level (e.g., DEBUG, INFO, WARNING, ERROR). Defaults to INFO.
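Once the .env file has been loaded into the process environment (python-dotenv's `load_dotenv()` does this), the two variables can be read with the standard library. The sketch below is an assumption about how they might be consumed: `configure_logging` and `require_api_key` are hypothetical helper names, and unrecognized `LOG_LEVEL` values fall back to the documented default of INFO.

```python
import logging
import os
from typing import Mapping, Optional

# Accepted LOG_LEVEL names mapped to the standard logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Read LOG_LEVEL (case-insensitive, default INFO), apply it, and return the numeric level."""
    env = os.environ if env is None else env
    level = LEVELS.get(env.get("LOG_LEVEL", "INFO").strip().upper(), logging.INFO)
    logging.basicConfig(level=level)  # no-op if the root logger is already configured
    return level


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise a clear error if it is missing or empty."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
    return key
```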
By default, the application attempts to refine the generated code up to 3 times if syntax errors are detected. You can adjust this by modifying the max_refine_attempts parameter in the ArticleProcessor class.
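The validate-and-refine loop can be sketched with Python's `ast` module, which raises `SyntaxError` on unparsable code without executing it. This is a hypothetical outline: `validate_syntax`, `refine_until_valid`, and the `refine` callback (a stand-in for a follow-up request to the language model) are illustrative names; only the `max_refine_attempts` parameter and its default of 3 come from the description above.

```python
import ast
from typing import Callable, Optional


def validate_syntax(code: str) -> Optional[str]:
    """Return None if the code parses, otherwise a short description of the SyntaxError."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def refine_until_valid(
    code: str,
    refine: Callable[[str, str], str],
    max_refine_attempts: int = 3,
) -> Optional[str]:
    """Call refine(code, error) until the code parses or the attempts run out.

    Returns the first syntactically valid version, or None if none was produced.
    """
    for _ in range(max_refine_attempts):
        error = validate_syntax(code)
        if error is None:
            return code
        code = refine(code, error)  # e.g. send the code and error back to the model
    # Check the result of the final refinement attempt as well.
    return code if validate_syntax(code) is None else None
```

Note that `ast.parse` only catches syntax errors: code that parses can still fail in a QuantConnect backtest, which is why the manual review mentioned above remains necessary.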
The project relies on several external libraries. All dependencies are listed in the requirements.txt file.
- pdfplumber
- spaCy
- openai
- python-dotenv
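- tkinter (bundled with most Python installations; on some Linux distributions it is a separate system package, e.g. python3-tk on Debian/Ubuntu)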
- pygments
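After installation, one way to confirm the environment is complete is to look up each module with `importlib.util.find_spec`, which locates a module without importing it. This is a hypothetical helper, not part of the project. Note that import names can differ from pip package names (python-dotenv is imported as `dotenv`), and the spaCy model downloaded earlier is importable as `en_core_web_sm`.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names (not pip package names): python-dotenv -> dotenv,
# the spaCy model package -> en_core_web_sm.
REQUIRED_MODULES = (
    "pdfplumber", "spacy", "en_core_web_sm", "openai", "dotenv", "tkinter", "pygments",
)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the import names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        sys.exit("Python 3.8 or higher is required.")
    missing = missing_modules()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```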
Contributions are welcome! Please follow these steps to contribute:
- Fork the Repository
- Create a New Branch
git checkout -b feature/YourFeatureName
- Commit Your Changes
git commit -m "Add feature: YourFeatureName"
- Push to the Branch
git push origin feature/YourFeatureName
- Open a Pull Request
Provide a clear description of the changes and the problem they address.
This project is licensed under the MIT License. You are free to use, modify, and distribute this software. See the LICENSE file for more details.
- QuantConnect as the backtesting and trading platform.
- pdfplumber for efficient PDF text extraction.
- spaCy for powerful natural language processing capabilities.
- OpenAI for providing the GPT-4 model used in AI-driven summarization and code generation.
- Tkinter for the graphical user interface framework.
- Pygments for syntax highlighting in the GUI.