AskSmart is a powerful document retrieval system that allows you to upload and process various formats such as PDF, DOCX, JSON, and TXT. Our advanced AI technology retrieves relevant information and generates context-aware responses to your queries.
Supported Formats: PDF, DOCX, JSON, TXT
- Clone the Repository

  ```bash
  git clone https://github.com/Rishi-Jain2602/AskSmart.git
  ```

- Create a Virtual Environment

  ```bash
  cd backend
  virtualenv venv
  venv\Scripts\activate       # On Windows
  source venv/bin/activate    # On macOS/Linux
  ```
- Install the Project Dependencies
  - 3.1 Navigate to the Backend Directory and install the Python dependencies:

    ```bash
    cd backend
    pip install -r requirements.txt
    ```

  - 3.2 Navigate to the Frontend Directory and install the Node.js dependencies:

    ```bash
    cd frontend
    npm install
    ```
- Run the React App

  Start the React app with the following commands:

  ```bash
  cd frontend
  npm start
  ```

- Run the Backend (FastAPI App)

  Open a new terminal and run the backend:

  ```bash
  cd backend
  uvicorn main:app --host 0.0.0.0 --port 8000 --reload
  ```

  The server will be running at http://127.0.0.1:8000.
This code processes and stores documents uploaded by users, preparing them for retrieval and for generating context-based responses. Here's how the pipeline works:
- File Upload and Processing Based on Format: When a user uploads a document (PDF, DOCX, JSON, or TXT), the system processes it with the appropriate loader for the file type. Here's the logic for handling the different formats:

  ```python
  if file_path.endswith(".pdf"):
      loader = PyPDFLoader(file)
  elif file_path.endswith(".docx"):
      loader = Docx2txtLoader(file)
  elif file_path.endswith(".json"):
      loader = JSONLoader(file, jq_schema=".", text_content=False, json_lines=False)
  elif file_path.endswith(".txt"):
      loader = TextLoader(file)
  else:
      raise ValueError("Unsupported file format. Please upload a PDF, DOCX, JSON, or TXT file.")

  # Load the document content
  documents = loader.load()
  ```
- Document Processing: After uploading, the file is processed with a loader specific to its file type (e.g., `PyPDFLoader` for PDFs, `Docx2txtLoader` for DOCX). The document content is then split into smaller text chunks to enable more efficient querying later:

  ```python
  text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
  docs = text_splitter.split_documents(documents)
  ```
- Handling Duplicate File Names: The system checks whether a collection with the same name already exists. If it does, the old data is deleted and replaced with the new data, which prevents conflicts when a file with the same name is uploaded again.

  ```python
  if client.collections.exists(valid_name):
      client.collections.delete(valid_name)
  ```
- Data Storage: The chunks are then added to a Weaviate collection along with relevant metadata such as the document title, chunk index, and document ID.

  ```python
  data_properties = {
      "Document_title": str(doct_name),
      "chunk": str(chunk),
      "chunk_index": i,
      "DoctID": str(docuemtID)
  }
  data_object = wvc.data.DataObject(properties=data_properties)
  chunks_list.append(data_object)
  ```
- Chunk Insertion: The chunks are inserted into Weaviate in batches to optimize performance.

  ```python
  for i in range(0, len(chunks_list), batch_size):
      batch = chunks_list[i:i + batch_size]
      chunks.data.insert_many(batch)
  ```
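The retrieval side of the pipeline is not shown above. As a rough illustration, the stored chunks could be queried with the Weaviate v4 client's `near_text` search. This is a minimal sketch, assuming the collection was configured with a vectorizer; the connection helper, collection name, and question text are placeholders, not the repository's actual code:

```python
import weaviate

# Connect to Weaviate (use the appropriate cloud connection helper for a hosted cluster)
client = weaviate.connect_to_local()

# Placeholder: the collection name created during upload (the doct_id from the upload response)
valid_name = "Generated_collection_name"
chunks = client.collections.get(valid_name)

# Fetch the chunks most similar to the user's question
results = chunks.query.near_text(query="What is this document about?", limit=3)
context = "\n".join(obj.properties["chunk"] for obj in results.objects)
print(context)

client.close()
```

The retrieved `context` is what the backend can hand to the language model when generating a response for the chatting API.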
- Document Upload API:
  - Endpoint: `POST https://asksmart.onrender.com/Rag/upload`
  - Description: Upload documents in formats like PDF, DOCX, JSON, and TXT. The document is processed and stored in a Weaviate collection.
  - Curl Command:

    ```bash
    curl -X POST "https://asksmart.onrender.com/Rag/upload" -F "file=@/path/to/your/file.pdf"
    ```

    Replace `/path/to/your/file.pdf` with the actual path of the document.
  - Response:

    ```json
    {
      "doct_id": "Generated_collection_name",
      "filename": "yourfile.pdf",
      "message": "Upload and processing complete"
    }
    ```
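If you prefer to call the upload endpoint from Python instead of curl, a minimal sketch using the `requests` library might look like this (the endpoint, form field, and response key are taken from the curl example above; error handling is kept to a single check):

```python
import requests

# Upload a local document to the AskSmart backend
with open("/path/to/your/file.pdf", "rb") as f:
    resp = requests.post(
        "https://asksmart.onrender.com/Rag/upload",
        files={"file": ("file.pdf", f, "application/pdf")},
    )

resp.raise_for_status()
doct_id = resp.json()["doct_id"]  # keep this ID for the chatting API
print(doct_id)
```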
- Chatting API:
  - Endpoint: `POST https://asksmart.onrender.com/Rag/chat`
  - Description: Query the uploaded document to get context-based responses.
  - Curl Command:

    ```bash
    curl -X POST "https://asksmart.onrender.com/Rag/chat" \
      -H "Content-Type: application/json" \
      -d '{
        "user_id": "SESSION_ID_VALUE",
        "query": "CURRENT_INPUT_VALUE",
        "doctID": "STORED_DOC_ID_VALUE"
      }'
    ```

    Replace `SESSION_ID_VALUE`, `CURRENT_INPUT_VALUE`, and `STORED_DOC_ID_VALUE` with actual values.
  - Response:

    ```json
    { "response": "Generated response based on document" }
    ```
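The chatting endpoint can be called from Python in the same way. A minimal sketch with `requests`, where the JSON field names mirror the curl example and `doctID` is the `doct_id` returned by the upload API:

```python
import requests

# Ask a question about a previously uploaded document
payload = {
    "user_id": "SESSION_ID_VALUE",       # your session identifier
    "query": "Summarize the document.",  # the user's question
    "doctID": "STORED_DOC_ID_VALUE",     # doct_id returned by the upload API
}
resp = requests.post("https://asksmart.onrender.com/Rag/chat", json=payload)
resp.raise_for_status()
print(resp.json()["response"])
```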
- The backend of this project is deployed on Render.
- You can automate deployment with a `render.yaml` file that defines the environment settings for Render (a minimal sketch is shown after these steps), or follow these manual steps:
  - Create a New Web Service:
    - Log into Render and create a new web service for the backend.
    - Connect it to your GitHub repository.
  - Build Command:
    - In the service settings, set the build command to install dependencies and run the FastAPI server:

      ```bash
      cd backend
      pip install -r requirements.txt
      uvicorn main:app --host 0.0.0.0 --port 8000
      ```
  - Environment Variables:
    - Add the necessary environment variables for Weaviate, OpenAI, and other integrations under the "Environment" tab.
  - Deploy:
    - Once configured, deploy the service and obtain the live URL for your backend (e.g., https://asksmart.onrender.com).
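For the automated route, a `render.yaml` along these lines could work. This is a minimal sketch, not taken from the repository: the service name, root directory, and environment variable keys are assumptions you would adapt to your own setup.

```yaml
# Hypothetical render.yaml for the FastAPI backend (adjust names, keys, and plan to your setup)
services:
  - type: web
    name: asksmart-backend            # assumed service name
    runtime: python
    rootDir: backend                  # run build/start commands from the backend folder
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn main:app --host 0.0.0.0 --port $PORT
    envVars:
      - key: OPENAI_API_KEY           # assumed variable names; set real values in the Render dashboard
        sync: false
      - key: WEAVIATE_URL
        sync: false
```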
- The frontend of this project is deployed on Vercel.
- You can automate deployment via Vercel or follow these manual steps:
  - Create a New Project:
    - Log into Vercel and create a new project.
    - Select your GitHub repository and choose the frontend folder to deploy as a separate Vercel project.
  - Build Command:
    - In the project settings, set the build command to install dependencies and build the React app:

      ```bash
      cd frontend
      npm install
      npm run build
      ```
  - Output Directory:
    - Ensure that the output directory is set to `build` (Vercel automatically looks for this folder in React projects).
  - Environment Variables:
    - Set any necessary environment variables, such as API URLs pointing to the backend (e.g., the Render URL); see the example after these steps.
  - Deploy:
    - Deploy the frontend, and Vercel will provide a live URL (e.g., https://asksmart-frontend.vercel.app).
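As a hypothetical example, if the React code reads the backend URL from an environment variable, that variable has to be defined in the Vercel project settings before the build runs. Create React App only exposes variables prefixed with `REACT_APP_` at build time; the actual variable name depends on the frontend code.

```
REACT_APP_API_URL=https://asksmart.onrender.com
```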
- Make sure you have Python 3.x and npm 10.x installed.
- It is recommended to use a virtual environment for the backend to avoid conflicts with other projects.
- If you encounter any issues during installation or usage, please contact rishijainai262003@gmail.com or rj1016743@gmail.com.