AskSmart: A RAG-Powered Intelligent Query System

AskSmart is a retrieval-augmented generation (RAG) document query system. Upload documents in PDF, DOCX, JSON, or TXT format, and the system retrieves the most relevant passages and generates context-aware answers to your queries.

Supported Formats: PDF, DOCX, JSON, TXT


Local Environment Setup

  1. Clone the repository

     ```bash
     git clone https://github.com/Rishi-Jain2602/AskSmart.git
     cd AskSmart
     ```

  2. Create a virtual environment

     ```bash
     cd backend
     virtualenv venv
     venv\Scripts\activate     # Windows
     source venv/bin/activate  # macOS/Linux
     ```

  3. Install the project dependencies

     3.1 In the backend directory, install the Python dependencies:

     ```bash
     pip install -r requirements.txt
     ```

     3.2 Navigate to the frontend directory and install the Node.js dependencies:

     ```bash
     cd ../frontend
     npm install
     ```

  4. Run the React app

     Start the React app with the following command:

     ```bash
     cd frontend
     npm start
     ```

  5. Run the backend (FastAPI app)

     Open a new terminal and run the backend:

     ```bash
     cd backend
     uvicorn main:app --host 0.0.0.0 --port 8000 --reload
     ```

     The server will be running at http://127.0.0.1:8000.

Ingestion Pipeline

This code is responsible for processing and storing documents that are uploaded by users, preparing them for retrieval and generation of context-based responses. Here's how the pipeline works:

  1. File upload and processing based on format: When a user uploads a document (PDF, DOCX, JSON, or TXT), the system selects a loader based on the file extension:

     ```python
     if file_path.endswith(".pdf"):
         loader = PyPDFLoader(file_path)
     elif file_path.endswith(".docx"):
         loader = Docx2txtLoader(file_path)
     elif file_path.endswith(".json"):
         loader = JSONLoader(file_path, jq_schema=".", text_content=False, json_lines=False)
     elif file_path.endswith(".txt"):
         loader = TextLoader(file_path)
     else:
         raise ValueError("Unsupported file format. Please upload a PDF, DOCX, JSON, or TXT file.")

     # Load the document content
     documents = loader.load()
     ```
  2. Document processing: The loaded content is split into smaller, overlapping text chunks so that relevant passages can be retrieved efficiently at query time.

     ```python
     text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
     docs = text_splitter.split_documents(documents)
     ```
  3. Handling duplicate file names: The system checks whether a collection with the same name already exists. If it does, the old data is deleted and replaced with the new data, preventing conflicts when a file is re-uploaded under the same name.

     ```python
     if client.collections.exists(valid_name):
         client.collections.delete(valid_name)
     ```
  4. Data storage: Each chunk is wrapped in a Weaviate data object with metadata such as the document title, chunk index, and document ID.

     ```python
     data_properties = {
         "Document_title": str(doct_name),
         "chunk": str(chunk),
         "chunk_index": i,
         "DoctID": str(documentID)
     }
     data_object = wvc.data.DataObject(properties=data_properties)
     chunks_list.append(data_object)
     ```
  5. Chunk insertion: The data objects are inserted into Weaviate in batches to reduce round trips and improve performance.

     ```python
     for i in range(0, len(chunks_list), batch_size):
         batch = chunks_list[i:i + batch_size]
         chunks.data.insert_many(batch)
     ```
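The splitting (step 2) and batching (step 5) above are independent of any particular library. A minimal pure-Python sketch of the same mechanics, overlapping character chunks followed by fixed-size batches, looks like this (`chunk_text` and `batched` are illustrative helpers, not functions from this repository):

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

def batched(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# 4500 characters with chunk_size=2000 and overlap=200 yields
# spans 0-2000, 1800-3800, 3600-4500: three chunks.
text = "a" * 4500
chunks = chunk_text(text)
assert [len(c) for c in chunks] == [2000, 2000, 900]
assert [len(b) for b in batched(chunks, 2)] == [2, 1]
```

The overlap ensures a sentence falling on a chunk boundary still appears intact in at least one chunk, which is why the pipeline above uses `chunk_overlap=200` rather than disjoint slices.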

API Endpoint Implementation

  1. Document Upload API:

    • Endpoint: POST https://asksmart.onrender.com/Rag/upload
    • Description: Upload a document in PDF, DOCX, JSON, or TXT format. The document is processed and stored in a Weaviate collection.

    Curl command:

    ```bash
    curl -X POST "https://asksmart.onrender.com/Rag/upload" -F "file=@/path/to/your/file.pdf"
    ```

    • Replace /path/to/your/file.pdf with the actual path of the document.
    • Response:

    ```json
    {
      "doct_id": "Generated_collection_name",
      "filename": "yourfile.pdf",
      "message": "Upload and processing complete"
    }
    ```
  2. Chatting API:

    • Endpoint: POST https://asksmart.onrender.com/Rag/chat
    • Description: Query an uploaded document and receive a context-based response.

    Curl command:

    ```bash
    curl -X POST "https://asksmart.onrender.com/Rag/chat" \
      -H "Content-Type: application/json" \
      -d '{
            "user_id": "SESSION_ID_VALUE",
            "query": "CURRENT_INPUT_VALUE",
            "doctID": "STORED_DOC_ID_VALUE"
          }'
    ```

    • Replace SESSION_ID_VALUE, CURRENT_INPUT_VALUE, and STORED_DOC_ID_VALUE with actual values.
    • Response:

    ```json
    {
      "response": "Generated response based on document"
    }
    ```
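The two curl commands above can also be driven from Python. The following is a minimal client sketch using the `requests` library; `upload_document`, `build_chat_payload`, and `chat` are illustrative helper names, not part of this repository:

```python
import requests

BASE_URL = "https://asksmart.onrender.com"

def upload_document(path):
    """POST a file to the upload endpoint; returns the parsed JSON response."""
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/Rag/upload", files={"file": f})
    resp.raise_for_status()
    return resp.json()  # contains doct_id, filename, and message

def build_chat_payload(user_id, query, doct_id):
    """Build the JSON body expected by the chat endpoint."""
    return {"user_id": user_id, "query": query, "doctID": doct_id}

def chat(user_id, query, doct_id):
    """POST a query against a previously uploaded document."""
    resp = requests.post(f"{BASE_URL}/Rag/chat",
                         json=build_chat_payload(user_id, query, doct_id))
    resp.raise_for_status()
    return resp.json()["response"]
```

A typical flow is to call `upload_document("report.pdf")` once, keep the returned `doct_id`, and then call `chat(session_id, question, doct_id)` for each follow-up query.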

Backend Deployment on Render:

  • The backend of this project is deployed on Render.

  • You can automate deployment with a render.yaml file to define the environment settings for Render or follow these manual steps:

    • Create a New Web Service:

      • Log into Render and create a new web service for the backend.
      • Connect it to your GitHub repository.
    • Build Command:

      • In the service settings, set the build command to install dependencies and run the FastAPI server:

        ```bash
        cd backend
        pip install -r requirements.txt
        uvicorn main:app --host 0.0.0.0 --port 8000
        ```
    • Environment Variables:

      • Add the necessary environment variables for Weaviate, OpenAI, and other integrations under the "Environment" tab.
    • Deploy:

      • Once configured, deploy the service and obtain the live URL for your backend (e.g., https://asksmart.onrender.com).
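The render.yaml mentioned above could look roughly like the following. This is a sketch under assumptions: the service name, root directory, and environment variable names are illustrative, so consult Render's Blueprint documentation for the exact fields your setup needs.

```yaml
services:
  - type: web
    name: asksmart-backend          # illustrative service name
    runtime: python
    rootDir: backend                # assumes the FastAPI app lives in backend/
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn main:app --host 0.0.0.0 --port 8000
    envVars:
      - key: WEAVIATE_URL           # assumed variable names; set values
        sync: false                 # in the Render dashboard, not in git
      - key: OPENAI_API_KEY
        sync: false
```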

Frontend Deployment on Vercel:

  • The frontend of this project is deployed on Vercel.

  • You can automate deployment via Vercel or follow these manual steps:

    • Create a New Project:

      • Log into Vercel and create a new project.
      • Select your GitHub repository and choose the frontend folder to deploy as a separate Vercel project.
    • Build Command:

      • In the project settings, set the build command to install dependencies and build the React app:
        cd frontend
        npm install
        npm run build
    • Output Directory:

      • Ensure that the output directory is set to build (as Vercel automatically looks for this folder in React projects).
    • Environment Variables:

      • Set any necessary environment variables, such as API URLs pointing to the backend (e.g., the Render URL).
    • Deploy:

      • Deploy the frontend, and Vercel will provide a live URL (e.g., https://asksmart-frontend.vercel.app).

Note

  1. Make sure you have Python 3.x and npm 10.x installed.
  2. It is recommended to use a virtual environment for the backend to avoid conflicts with other projects.
  3. If you encounter any issues during installation or usage, please contact rishijainai262003@gmail.com or rj1016743@gmail.com.
