-
Notifications
You must be signed in to change notification settings - Fork 837
Refactor Code for Improved Logging, Tokenization, Regex Efficiency, and Error Handling #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…nd Error Handling
Thanks for your great contribution. |
from enum import Enum | ||
import re | ||
from typing import List, Set, Union, Optional, Dict, Tuple, Any | ||
import tiktoken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to reorder the imports according to PEP8 code style conventions: https://peps.python.org/pep-0008/#imports
I'd suggest split a single big refactor PR to multiple smaller PRs if a refactor is needed. Currently it seems there are multiple different refactor packed in one PR. Would be nice to create smaller PRs and that would be much easier to review and less likely break the existing functionalities. |
agree! |
Hi @RajeshBasnet-dev 👋, Thank you so much for your contribution to this PR! Your work is really appreciated. If you haven’t already, feel free to join our Discord community here: Discord Invite Link — it's a great place to connect with our team and other contributors, share ideas, and stay up to date with the project! You can find me as @scarlettt_moon there! Looking forward to connecting! 😊 |
Logging Misuse:
Some logging statements were using logging.info where logging.error was more appropriate. The log levels have been corrected for clarity.
Redundant Tokenization:
The code had redundant tokenization logic. I have refactored the code to avoid re-tokenizing the same data.
Regex Optimization:
Multiple regex substitutions were combined into a single efficient regex to improve performance.
Improved Error Handling:
I added better error handling and exception management for functions that were lacking proper fallback mechanisms.
How It Was Solved:
Imports: The missing dependencies tiktoken and langchain were installed and the imports were fixed.
Logging: I updated the log levels for error messages to use logging.error instead of logging.info.
Tokenization: I refactored the code to centralize the tokenization logic in one place to avoid redundancy.
Regex: Optimized multiple regex patterns into a single, more efficient pattern.
Error Handling: Introduced proper error handling with try-except blocks to handle exceptions more gracefully.
Testing:
The changes have been tested locally to ensure the imports are resolved, and the functions work as expected without errors.
Additional Notes:
The changes will improve the overall efficiency and readability of the code. Let me know if any adjustments are needed!