This repository contains a set of practice assignments for Generative AI for Natural Language Processing. Each assignment covers a specific topic in generative AI for NLP and includes an accompanying dataset for analysis when one is required.
If you are looking for classical machine learning exercises, please check out the aimlpractice repo.
- Generative AI Foundations
- The Pathway to Generative AI
- Text Preprocessing
- Introduction to Generative AI and Prompt Engineering
- Generative AI Solutions for Natural Language Processing
- Transformers for Natural Language Processing
- How to Use
Goals: Explain the framework and components of a typical modern-day Generative AI solution
Skills & Tools Covered:
- Text Manipulation and Processing
- Model Utilization and Evaluation
- Utilizing OpenAI and YouTube APIs
- 🤗 Hub for model download
- 🤗 (Hugging Face) and 🦜🔗 (LangChain) Libraries
Notebooks & Datasets:
- Using the OpenAI API and 🤗 for YouTube Transcription, Transcript Summarization & Transcript Quiz Generation (a minimal API sketch follows this list)
- Prompt Engineering w/ Llama 2 for NLP Tasks
- 🦜🔗 Document Q&A System
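As a taste of what the first notebook covers, here is a minimal sketch of transcript summarization with the OpenAI API. It assumes the `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment; the model name is illustrative, not necessarily the one used in the notebook.

```python
# Minimal sketch: summarizing a transcript with the OpenAI API.
# Assumption: the model name is illustrative; swap in whichever model you use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "..."  # YouTube transcript text fetched elsewhere

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Summarize the transcript in 3 bullet points."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```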
Goals: Understand and apply NLP text preprocessing techniques. Develop a hybrid machine learning model that integrates text data with tabular metadata for improved classification accuracy.
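A minimal sketch of such a hybrid model, assuming scikit-learn and illustrative column names (`text`, `age`, `label`) rather than the assignment's actual data:

```python
# Minimal sketch of a hybrid model that combines text with tabular metadata.
# File and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("reviews.csv")  # hypothetical file with 'text', 'age', 'label'

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "text"),  # vectorize raw text
    ("meta", "passthrough", ["age"]),                      # keep numeric metadata
])

model = Pipeline([
    ("features", preprocess),
    ("stack", StackingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        final_estimator=LogisticRegression(),
    )),
])

model.fit(df[["text", "age"]], df["label"])
```

The `ColumnTransformer` vectorizes the raw text while passing the numeric metadata through, so the stacked classifiers see both feature sets side by side.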
Skills & Tools Covered:
- Text Preprocessing
- Vectorization
- Ensemble and Stacking Techniques
- Model Evaluation
Notebooks & Datasets:
Goals: Utilize vectorization techniques such as Bag of Words (BoW) and TF-IDF, and effectively apply these methods within an ML pipeline for NLP tasks.
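To make the difference between the two vectorizers concrete, here is a minimal comparison on a toy corpus (illustrative data, using scikit-learn):

```python
# Minimal illustration of BoW vs. TF-IDF on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the flight was great", "the flight was late", "great service"]

bow = CountVectorizer().fit_transform(corpus)    # raw term counts
tfidf = TfidfVectorizer().fit_transform(corpus)  # counts reweighted by rarity

print(bow.toarray())    # integer counts per term
print(tfidf.toarray())  # floats; common terms like "the" are down-weighted
```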
Skills & Tools Covered:
- Bag of Words
- TF-IDF
- LSTMs
- ML Pipelines
Notebooks & Datasets:
- Embeddings With Word2Vec (see the gensim sketch after this list)
- Dataset: 3.2+new_reviews.csv
- Twitter Airline Sentiment Analysis
- Dataset: Tweets.csv
- Text Generation Using LSTMs
- Dataset: medium_data.csv
- Machine Translation Using LSTMs
- Dataset: fra.txt
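The Word2Vec sketch referenced above, using gensim; the toy sentences stand in for the new_reviews.csv data used in the notebook:

```python
# Minimal Word2Vec sketch with gensim on illustrative toy sentences.
from gensim.models import Word2Vec

sentences = [["the", "coffee", "was", "great"],
             ["the", "tea", "was", "great"],
             ["terrible", "service"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["coffee"].shape)         # (50,) dense embedding
print(model.wv.most_similar("coffee"))  # nearest neighbors in vector space
```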
Goals: Design, optimize, and apply prompts effectively to harness the full potential of advanced language models for a variety of tasks.
Skills & Tools Covered:
- Prompt Engineering
- Self-Consistency (a minimal sketch follows this list)
- Tree-of-Thought
- Rephrase and Respond
- Chain-of-Verification (CoVe)
- Llama 2 via llama.cpp
- Mistral
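A minimal sketch of the self-consistency idea: sample several reasoning chains at nonzero temperature and majority-vote the final answers. The `generate` function is a hypothetical stand-in for whichever completion API you use:

```python
# Self-consistency sketch: diverse chains, then a majority vote.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical LLM call; replace with your model's API."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"{question}\nThink step by step, then end with 'Answer: <value>'."
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.8)  # diverse reasoning chains
        answers.append(completion.rsplit("Answer:", 1)[-1].strip())
    return Counter(answers).most_common(1)[0][0]  # majority vote
```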
Notebooks & Datasets:
- 4.01 Prompt Engineering With Llama2 CPP (see the llama.cpp sketch after this list)
- 4.02 Prompt Engineering Use Cases
- 4.03 Self-Consistency and Tree-of-Thought Prompting
- 4.04 Rephrase and Respond & CoVe Prompting
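For the Llama 2 CPP notebook, a local run with the llama-cpp-python bindings looks roughly like this; the GGUF path is a placeholder for a quantized model you have downloaded (e.g., from the 🤗 Hub):

```python
# Minimal sketch of running a quantized Llama 2 chat model locally.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)  # placeholder path

out = llm(
    "[INST] Classify the sentiment of: 'The flight was delayed again.' [/INST]",
    max_tokens=64,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```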
Goals: Utilize LangChain to build AI-driven chatbots, and understand and implement large multimodal models (LMMs), leveraging datasets that range from textual content to visual data.
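A minimal sketch of the LangChain pattern these notebooks build on, assuming the post-0.1 package split (`langchain-openai`, `langchain-core`) and an OpenAI key in the environment:

```python
# Minimal LangChain chat sketch using the LCEL pipe syntax.
# The model name is illustrative, not necessarily the notebook's choice.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant for NLP questions."),
    ("human", "{question}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = prompt | llm  # prompt formatting piped into the model
print(chain.invoke({"question": "What is tokenization?"}).content)
```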
Skills & Tools Covered:
- Chatbot Development
- Multimodal Learning
Notebooks & Datasets:
- LangChain Essentials
- 5.02 LangChain Agent Chatbot Prototype
- 5.03 LMMs Large Multimodal Models
- Dataset: oat_drink.png, milk_shelf.png
Goals:
- Understand Sequential Deep Learning: Gain a comprehensive understanding of deep learning models designed for sequential data, such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs).
- Master Transformer Models: Acquire knowledge about Transformer models, a revolutionary architecture for sequence processing, widely used in natural language processing (NLP) tasks.
- Grasp Attention Mechanisms: Understand the concept of attention mechanisms, which allow models to focus on specific parts of input sequences when making predictions (a minimal sketch follows this list).
- Explore Popular Transformer Architectures: Familiarize yourself with popular Transformer architectures such as BERT, GPT (Generative Pre-trained Transformer), and T5 (Text-To-Text Transfer Transformer).
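Since attention is the heart of the Transformer, here is the minimal scaled dot-product self-attention sketch referenced above, written in PyTorch (shapes are illustrative):

```python
# Scaled dot-product self-attention over a batch of token embeddings.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (batch, seq, seq) similarities
    weights = F.softmax(scores, dim=-1)          # attention distribution per token
    return weights @ v                           # weighted sum of value vectors

x = torch.randn(1, 5, 64)  # batch of 5 tokens, 64-dim embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)           # torch.Size([1, 5, 64])
```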
Skills & Tools Covered:
- 🤗 Transformers Library
Notebooks & Datasets:
- Transformer Based LLM Use Cases
- Sarcasm Detection with Transformers
- Building the Transformer in PyTorch
- Self-Attention
- Fine-Tuning
- Semantic Search with Transformer Embeddings (sketched below)
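The semantic-search sketch referenced above, assuming the sentence-transformers library and a common public checkpoint rather than the notebook's exact model:

```python
# Minimal semantic search: embed documents and rank them against a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # common public checkpoint

docs = ["How do I reset my password?",
        "Shipping usually takes 3-5 days.",
        "Refunds are processed within a week."]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("When will my order arrive?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]  # cosine similarity to each doc
print(docs[int(scores.argmax())])             # best match: the shipping doc
```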
Goals:
- Working With LangChain Agents: Understand how to create LangChain Pandas DataFrame agents and how they are used to build applications such as Data Science Assistants.
- Fine-Tuning with QLoRA: Learn to fine-tune open-source LLMs using QLoRA (Quantized Low-Rank Adaptation) on the Llama 2 7B chat model (sketched below).
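The QLoRA sketch referenced above: 4-bit quantization via bitsandbytes plus a LoRA adapter from peft. The hyperparameters are illustrative, not the notebook's exact configuration, and the Llama 2 checkpoint is gated on the 🤗 Hub:

```python
# Minimal QLoRA setup sketch: 4-bit base model + small trainable LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", quantization_config=bnb_config
)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")  # illustrative LoRA hyperparameters
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # only the LoRA weights train
```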
Skills & Tools Covered:
- Integration of techniques such as NLP for database querying and RAG for contextual question answering (a minimal RAG sketch follows this list)
- Leveraging external resources like the OpenAI API to enhance project capabilities
- Identifying potential biases, privacy considerations, and implications for diverse stakeholders
- Mitigating risks and upholding ethical principles
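The deliberately tiny RAG sketch referenced above, showing the shape of the technique: retrieve the most relevant snippet, then condition the answer on it. The keyword-overlap retriever is a stand-in for a real vector store, and `generate` is a hypothetical LLM call:

```python
# Minimal RAG sketch: naive retrieval + answer grounded in the retrieved context.
def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your model's API."""
    raise NotImplementedError

snippets = [
    "Invoices are due within 30 days of receipt.",
    "Support is available Monday through Friday.",
]

def retrieve(question: str) -> str:
    overlap = lambda s: len(set(question.lower().split()) & set(s.lower().split()))
    return max(snippets, key=overlap)  # best snippet by shared words

def answer(question: str) -> str:
    context = retrieve(question)
    return generate(f"Context: {context}\n\n"
                    f"Answer using only the context.\nQuestion: {question}")
```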
Notebooks & Datasets:
- Developing AI Assistants with LangChain
- Fine-tuning Llama 2 for Dialog Summarization with QLoRA
- LangChain Data Science Assistant
For more fun with LangChain, please check out the Austin LangChain Group repo.
To use the datasets in this repository, please follow the steps below:
1. Copy the dataset to your own Google Drive account by clicking on the dataset link in the assignment description above.
2. Open the dataset in your Google Drive account and click the "Add shortcut to Drive" button in the upper right corner.
3. In the popup window, select the folder in your Google Drive where you want to add the dataset shortcut, then click the "Add shortcut" button.
4. In your Google Colab notebook, mount your Google Drive with the following code:

   ```python
   from google.colab import drive

   drive.mount('/content/drive')
   ```

5. Access the dataset using the file path of the dataset shortcut in your Google Drive, like this:

   ```python
   import pandas as pd

   df = pd.read_csv('/content/drive/My Drive/<folder>/<dataset>')
   ```

Replace `<folder>` with the name of the folder where you added the dataset shortcut in step 3, and `<dataset>` with the name of the dataset file.
Thank you for checking out my repository. I hope that these assignments will help you gain more knowledge and expertise in AI/ML. If you have any suggestions or feedback, please feel free to contribute or contact me.