This repository contains a set of practice assignments for Generative AI for Natural Language Processing. Each assignment covers a specific topic in generative AI for NLP and includes an accompanying dataset for analysis when one is required.
If you are looking for classical machine learning exercises, please check out the aimlpractice repo.
- Generative AI Foundations
- The Pathway to Generative AI
- Text Preprocessing
- Introduction to Generative AI and Prompt Engineering
- Generative AI Solutions for Natural Language Processing
- Transformers for Natural Language Processing
- How to Use
Goals: Explain the framework and components of a typical modern-day Generative AI solution
Skills & Tools Covered:
- Text Manipulation and Processing
- Model Utilization and Evaluation
- Utilizing OpenAI and YouTube APIs
- 🤗 Hub for model download
- 🤗 (Hugging Face) and 🦜🔗 (LangChain) Libraries
Notebooks & Datasets:
- Using the OpenAI API and 🤗 for YouTube Transcription, Transcript Summarization & Transcript Quiz Generation (a minimal API sketch follows this list)
- Prompt Engineering w/ Llama 2 for NLP Tasks
- 🦜🔗 Document Q&A System
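As a taste of what the first notebook covers, here is a minimal sketch of transcript summarization with the OpenAI API. It assumes the `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment; the model name is illustrative, not necessarily the one used in the notebook.

```python
# Minimal sketch: summarizing a transcript with the OpenAI API.
# Assumption: the model name is illustrative; swap in whichever model you use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "..."  # YouTube transcript text fetched elsewhere

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Summarize the transcript in 3 bullet points."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```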
Goals: Understand and apply NLP text preprocessing techniques. Develop a hybrid machine learning model that integrates text data with tabular metadata for improved classification accuracy.
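A minimal sketch of such a hybrid model, assuming scikit-learn and illustrative column names (`text`, `age`, `label`) rather than the assignment's actual data:

```python
# Minimal sketch of a hybrid model that combines text with tabular metadata.
# File and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("reviews.csv")  # hypothetical file with 'text', 'age', 'label'

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "text"),  # vectorize raw text
    ("meta", "passthrough", ["age"]),                      # keep numeric metadata
])

model = Pipeline([
    ("features", preprocess),
    ("stack", StackingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        final_estimator=LogisticRegression(),
    )),
])

model.fit(df[["text", "age"]], df["label"])
```

The `ColumnTransformer` vectorizes the raw text while passing the numeric metadata through, so the stacked classifiers see both feature sets side by side.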
Skills & Tools Covered:
- Text Preprocessing
- Vectorization
- Ensemble and Stacking Techniques
- Model Evaluation
Notebooks & Datasets:
Goals: Utilize vectorization techniques such as Bag of Words (BoW) and TF-IDF, and effectively apply these methods within an ML pipeline for NLP tasks.
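To make the difference between the two vectorizers concrete, here is a minimal comparison on a toy corpus (illustrative data, using scikit-learn):

```python
# Minimal illustration of BoW vs. TF-IDF on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the flight was great", "the flight was late", "great service"]

bow = CountVectorizer().fit_transform(corpus)    # raw term counts
tfidf = TfidfVectorizer().fit_transform(corpus)  # counts reweighted by rarity

print(bow.toarray())    # integer counts per term
print(tfidf.toarray())  # floats; common terms like "the" are down-weighted
```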
Skills & Tools Covered:
- Bag of Words
- TF-IDF
- LSTMs
- ML Pipelines
Notebooks & Datasets:
- Embeddings With Word2Vec (see the gensim sketch after this list)
- Dataset: 3.2+new_reviews.csv
- Twitter Airline Sentiment Analysis
- Dataset: Tweets.csv
- Text Generation Using LSTMs
- Dataset: medium_data.csv
- Machine Translation Using LSTMs
- Dataset: fra.txt
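The Word2Vec sketch referenced above, using gensim; the toy sentences stand in for the new_reviews.csv data used in the notebook:

```python
# Minimal Word2Vec sketch with gensim on illustrative toy sentences.
from gensim.models import Word2Vec

sentences = [["the", "coffee", "was", "great"],
             ["the", "tea", "was", "great"],
             ["terrible", "service"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["coffee"].shape)         # (50,) dense embedding
print(model.wv.most_similar("coffee"))  # nearest neighbors in vector space
```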
Goals: Design, optimize, and apply prompts effectively to harness the full potential of advanced language models for a variety of tasks.
Skills & Tools Covered:
- Prompt Engineering
- Self-Consistency (a minimal sketch follows this list)
- Tree-of-Thought
- Rephrase and Respond
- Chain-of-Verification (CoVe)
- Llama 2 via llama.cpp
- Mistral
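A minimal sketch of the self-consistency idea: sample several reasoning chains at nonzero temperature and majority-vote the final answers. The `generate` function is a hypothetical stand-in for whichever completion API you use:

```python
# Self-consistency sketch: diverse chains, then a majority vote.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical LLM call; replace with your model's API."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"{question}\nThink step by step, then end with 'Answer: <value>'."
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.8)  # diverse reasoning chains
        answers.append(completion.rsplit("Answer:", 1)[-1].strip())
    return Counter(answers).most_common(1)[0][0]  # majority vote
```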
Notebooks & Datasets:
- 4.01 Prompt Engineering With Llama2 CPP (see the llama.cpp sketch after this list)
- 4.02 Prompt Engineering Use Cases
- 4.03 Self-Consistency and Tree-of-Thought Prompting
- 4.04 Rephrase and Respond & CoVe Prompting
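For the Llama 2 CPP notebook, a local run with the llama-cpp-python bindings looks roughly like this; the GGUF path is a placeholder for a quantized model you have downloaded (e.g., from the 🤗 Hub):

```python
# Minimal sketch of running a quantized Llama 2 chat model locally.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)  # placeholder path

out = llm(
    "[INST] Classify the sentiment of: 'The flight was delayed again.' [/INST]",
    max_tokens=64,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```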
Goals: Utilize LangChain to build AI-driven chatbots, and understand and implement large multimodal models (LMMs), leveraging datasets that range from textual content to visual data.
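A minimal sketch of the LangChain pattern these notebooks build on, assuming the post-0.1 package split (`langchain-openai`, `langchain-core`) and an OpenAI key in the environment:

```python
# Minimal LangChain chat sketch using the LCEL pipe syntax.
# The model name is illustrative, not necessarily the notebook's choice.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant for NLP questions."),
    ("human", "{question}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = prompt | llm  # prompt formatting piped into the model
print(chain.invoke({"question": "What is tokenization?"}).content)
```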
Skills & Tools Covered:
- Chatbot Development
- Multimodal Learning
Notebooks & Datasets:
- LangChain Essentials
- 5.02 LangChain Agent Chatbot Prototype
- 5.03 LMMs Large Multimodal Models
- Dataset: oat_drink.png, milk_shelf.png
Goals:
- Understand Sequential Deep Learning: Gain a comprehensive understanding of deep learning models designed for sequential data, such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs).
- Master Transformer Models: Acquire knowledge about Transformer models, a revolutionary architecture for sequence processing, widely used in natural language processing (NLP) tasks.
- Grasp Attention Mechanisms: Understand the concept of attention mechanisms, which allow models to focus on specific parts of input sequences when making predictions (a minimal sketch follows this list).
- Explore Popular Transformer Architectures: Familiarize yourself with popular Transformer architectures such as BERT, GPT (Generative Pre-trained Transformer), and T5 (Text-To-Text Transfer Transformer).
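Since attention is the heart of the Transformer, here is the minimal scaled dot-product self-attention sketch referenced above, written in PyTorch (shapes are illustrative):

```python
# Scaled dot-product self-attention over a batch of token embeddings.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (batch, seq, seq) similarities
    weights = F.softmax(scores, dim=-1)          # attention distribution per token
    return weights @ v                           # weighted sum of value vectors

x = torch.randn(1, 5, 64)  # batch of 5 tokens, 64-dim embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)           # torch.Size([1, 5, 64])
```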
Skills & Tools Covered:
- 🤗 Transformers Library
Notebooks & Datasets:
- Transformer Based LLM Use Cases
- Sarcasm Detection with Transformers
- Building the Transformer in PyTorch
- Self-Attention
- Fine-Tuning
- Semantic Search with Transformer Embeddings (sketched below)
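The semantic-search sketch referenced above, assuming the sentence-transformers library and a common public checkpoint rather than the notebook's exact model:

```python
# Minimal semantic search: embed documents and rank them against a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # common public checkpoint

docs = ["How do I reset my password?",
        "Shipping usually takes 3-5 days.",
        "Refunds are processed within a week."]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("When will my order arrive?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]  # cosine similarity to each doc
print(docs[int(scores.argmax())])             # best match: the shipping doc
```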
Goals:
- Working With LangChain Agents: Understand how to create LangChain Pandas DataFrame agents and how they are used to build applications such as Data Science Assistants.
- Fine-Tuning with QLoRA: Learn to fine-tune open-source LLMs using QLoRA (Quantized Low-Rank Adaptation) on the Llama 2 7B chat model (sketched below).
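The QLoRA sketch referenced above: 4-bit quantization via bitsandbytes plus a LoRA adapter from peft. The hyperparameters are illustrative, not the notebook's exact configuration, and the Llama 2 checkpoint is gated on the 🤗 Hub:

```python
# Minimal QLoRA setup sketch: 4-bit base model + small trainable LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", quantization_config=bnb_config
)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")  # illustrative LoRA hyperparameters
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # only the LoRA weights train
```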
Skills & Tools Covered:
- Integration of techniques such as NLP for database querying and RAG for contextual question answering (a minimal RAG sketch follows this list)
- Leveraging external resources like the OpenAI API to enhance project capabilities
- Identifying potential biases, privacy considerations, and implications for diverse stakeholders
- Mitigating risks and upholding ethical principles
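The deliberately tiny RAG sketch referenced above, showing the shape of the technique: retrieve the most relevant snippet, then condition the answer on it. The keyword-overlap retriever is a stand-in for a real vector store, and `generate` is a hypothetical LLM call:

```python
# Minimal RAG sketch: naive retrieval + answer grounded in the retrieved context.
def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your model's API."""
    raise NotImplementedError

snippets = [
    "Invoices are due within 30 days of receipt.",
    "Support is available Monday through Friday.",
]

def retrieve(question: str) -> str:
    overlap = lambda s: len(set(question.lower().split()) & set(s.lower().split()))
    return max(snippets, key=overlap)  # best snippet by shared words

def answer(question: str) -> str:
    context = retrieve(question)
    return generate(f"Context: {context}\n\n"
                    f"Answer using only the context.\nQuestion: {question}")
```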
Notebooks & Datasets:
- Developing AI Assistants with LangChain
- Fine-tuning Llama 2 for Dialog Summarization with QLoRA
- LangChain Data Science Assistant
For more fun with LangChain, please check out the Austin LangChain Group repo.
To use the datasets in this repository, please follow the steps below:
1. Copy the dataset to your own Google Drive account by clicking on the dataset link in the assignment description above.
2. Open the dataset in your Google Drive account and click the "Add shortcut to Drive" button in the upper right corner.
3. In the popup window, select the folder in your Google Drive where you want to add the dataset shortcut, then click the "Add shortcut" button.
4. In your Google Colab notebook, mount your Google Drive with the following code:

   ```python
   from google.colab import drive

   drive.mount('/content/drive')
   ```

5. Access the dataset using the file path of the dataset shortcut in your Google Drive, like this:

   ```python
   import pandas as pd

   df = pd.read_csv('/content/drive/My Drive/<folder>/<dataset>')
   ```

Replace `<folder>` with the name of the folder where you added the dataset shortcut in step 3, and `<dataset>` with the name of the dataset file.
Thank you for checking out my repository. I hope that these assignments will help you gain more knowledge and expertise in AI/ML. If you have any suggestions or feedback, please feel free to contribute or contact me.