Welcome to my GitHub repository! Here, you will find a collection of projects that showcase my journey in data science and machine learning, demonstrating a strong foundation in data mining processes, model development, and deployment.
I am a passionate data scientist with hands-on experience in the end-to-end data mining process, including Exploratory Data Analysis (EDA), data preprocessing, model training, validation, and analysis. My expertise extends to deploying models into production using Azure Web Apps, FastAPI, and Docker, with an automated deployment pipeline via GitHub.
Although aligning projects with direct business impact has been a challenge, I have consistently worked to bridge the gap between technical AI model development and practical business applications. I am keen on projects that require a deep understanding of, and close alignment with, production team needs, so that my contributions reach beyond R&D.
Objective: Predict next week's shipping volume for the Busan-Qingdao route.
Model & Tools: Linear regression with feature selection and feature engineering.
Insights: Overcame data scarcity with effective feature selection, achieving an RMSE of 56.9 and RMSPE of 10%.
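For readers unfamiliar with the two metrics, here is a minimal sketch of how RMSE and RMSPE are usually computed. It is an illustration only, not the project's code: the function names and the weekly volumes are hypothetical, and RMSPE assumes no actual value is zero.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rmspe(actual, predicted):
    """Root mean squared percentage error; every actual value must be non-zero."""
    squared = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical weekly volumes, made up for illustration only.
actual = [500.0, 620.0, 580.0, 710.0]
predicted = [450.0, 650.0, 600.0, 640.0]
print(f"RMSE:  {rmse(actual, predicted):.1f}")
print(f"RMSPE: {rmspe(actual, predicted):.1%}")
```

RMSE is reported in volume units, while RMSPE is scale-free, which is why the two numbers are quoted together.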
Objective: Classify shipping items into 96 categories using the Harmonized System (HS).
Model & Tools: DistilBERT and T5 models, PyTorch, data preprocessing.
Insights: Achieved 94% accuracy despite challenges in data acquisition and label consistency.
Objective: Distinguish between dangerous and non-dangerous shipping cargo.
Model & Tools: DistilBERT, data sampling, text preprocessing.
Insights: Attained 85% recall and 93% precision after balancing class representation in the dataset.
Objective: Optimize shipping volume with linear programming and reinforcement learning.
Model & Tools: Reinforcement Learning with DDQN.
Insights: Devised a complex reward system and environment for route optimization.
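Assuming DDQN here refers to Double DQN, the core idea is that the online network picks the next action while the target network scores it. The sketch below shows only that target computation with NumPy; the function name, argument names, and default discount factor are hypothetical, and the project's reward system and environment are not reproduced.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    q_online_next and q_target_next hold Q-values for the next states,
    shaped (batch, n_actions), from the online and target networks.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Action selection uses the online network...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...while action evaluation uses the target network.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done == 1) receive no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * next_values
```

Separating selection from evaluation is what reduces the overestimation bias of plain DQN.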
Objective: Recover missing data in sparse shipping volume arrays.
Model & Tools: SVD method, creation of a rating matrix.
Insights: Utilized item classification to generate a meaningful rating matrix for SVD application.
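One common way to apply SVD to a matrix with gaps is iterative low-rank imputation: fill the gaps, take a truncated SVD, overwrite only the missing cells with the reconstruction, and repeat. The sketch below assumes that approach; it is not necessarily the variant used in the project, and `svd_impute`, its parameters, and the toy matrix are hypothetical.

```python
import numpy as np

def svd_impute(matrix, rank=2, n_iter=50):
    """Fill NaN entries using an iterative rank-`rank` SVD approximation."""
    matrix = np.asarray(matrix, dtype=float)
    missing = np.isnan(matrix)
    # Start by filling the gaps with the mean of the observed entries.
    filled = np.where(missing, np.nanmean(matrix), matrix)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed values fixed; only the missing cells are updated.
        filled = np.where(missing, low_rank, matrix)
    return filled

# Toy rating matrix (rows and columns are placeholders) with one missing cell.
ratings = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
ratings[1, 2] = np.nan
print(svd_impute(ratings, rank=1, n_iter=200).round(2))
```

The chosen rank controls how much structure is assumed; a rank that is too high simply reproduces the initial fill values.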
Objective: Extract data from shipping invoices using prompt engineering.
Model & Tools: Azure ChatGPT, OCR, Python.
Insights: Developed prompt chaining techniques to process and interpret invoice data efficiently.
I am currently delving into ChatGPT Prompt Engineering, leveraging the power of AI to streamline data extraction from complex documents.
This work involves sophisticated prompt design, integration of OCR technologies, and strategic API usage to enhance data processing workflows.
I am also very interested in retrieval-augmented generation (RAG) with vector databases, and in function calling.
I am eager to collaborate on projects that not only challenge my technical skills but also have a tangible impact on business processes and efficiency.
My goal is to develop solutions that are in perfect harmony with the actual needs of production teams, driving meaningful advancements in the field.
Feel free to explore my projects and reach out for collaborations or discussions. You can contact me at kaizinpark@gmail.com.
Thank you for visiting my repository!