RAG Web Scraper

A Retrieval-Augmented Generation (RAG) web scraper built with Streamlit, LangChain, and Ollama that allows you to:

Enter any URL to crawl a webpage
Extract content and convert it to markdown
Split text into chunks and create embeddings
Ask questions about the webpage content
Get AI-generated answers based on the relevant context

Features

Simple web interface built with Streamlit
Converts HTML to Markdown for better processing
Uses LangChain with Ollama for local LLM integration
In-memory vector store for quick retrieval
Chat interface for questions and answers
Context-aware responses from the AI
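
To picture the HTML-to-Markdown feature, here is a minimal sketch using only the standard library's html.parser module. It is an illustration, not the converter this project uses: the class MarkdownSketch and the function html_to_markdown are hypothetical names, and only headings, paragraphs, list items, and links are handled.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical, for illustration only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []        # collected Markdown fragments
        self.href = None       # target of the link currently being read
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Drop script and style bodies; keep all other text as-is.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment to Markdown-like text."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Markdown keeps headings and links while dropping layout markup, which tends to give the text splitter cleaner boundaries than raw HTML.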

Requirements

Python 3.11+
Ollama running locally with the llama3.2 model installed
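
Before launching the app, it can help to confirm that Ollama is reachable and the model is installed. The sketch below is not part of this project. It assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, which lists locally installed models; the helper names has_model and ollama_has_model are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def has_model(tags_payload: dict, name: str) -> bool:
    """Return True if `name` matches an installed model, with or without a tag."""
    for model in tags_payload.get("models", []):
        installed = model.get("name", "")
        # Ollama reports names such as "llama3.2:latest"; also compare the untagged part.
        if installed == name or installed.split(":")[0] == name:
            return True
    return False


def ollama_has_model(name: str = "llama3.2", base_url: str = OLLAMA_URL) -> bool:
    """Ask the local Ollama server for its models; return False if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return has_model(json.load(resp), name)
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers an unparsable reply.
        return False
```

Calling ollama_has_model() returns False when the server is down or the model is missing. In the second case, running ollama pull llama3.2 should install it.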

Installation

# Install dependencies with Poetry
poetry install

# Or with pip
pip install -r requirements.txt

Usage

Make sure Ollama is running locally with the llama3.2 model
Run the application:

poetry run streamlit run main.py

Enter a URL to crawl
Ask questions about the content
Clear the chat history and reset the index as needed

How It Works

The application loads a webpage and converts HTML to markdown
Text is split into smaller chunks for processing
Chunks are embedded and stored in an in-memory vector store
When you ask a question, the system retrieves relevant chunks
The LLM generates an answer based on the retrieved context
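
The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the application's code. A word-count vector stands in for the embeddings that would come from Ollama, and the names split_text, embed, ToyVectorStore, and build_prompt are hypothetical. The real app relies on LangChain components for these steps.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (chunk, vector) pairs in a list; nothing is persisted to disk."""

    def __init__(self) -> None:
        self.items = []

    def add(self, chunks: list[str]) -> None:
        self.items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

Overlapping chunks reduce the chance that a sentence relevant to the question is cut in half at a chunk boundary. Because the store lives in memory, resetting the index amounts to discarding the list of stored items.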

Credits

Created by Sascha Corti (sascha@corti.com)

TechPreacher/rag_web_scraper
