
Generative AI Practice Exercises for Natural Language Processing


weprintmoney/LLMPractice


Welcome to the LLM Practice Repository

This repository contains a set of practice assignments for Generative AI for Natural Language Processing. Each assignment covers a specific topic in Gen AI for NLP and has an accompanying dataset for analysis when required.

If you are looking for classical machine learning exercises, please check out the aimlpractice repo.

Table of Contents

Generative AI Foundations

Goals: Explain the framework and components of a typical modern-day Generative AI solution.

Skills & Tools Covered:

  • Text Manipulation and Processing
  • Model Utilization and Evaluation
  • Utilizing OpenAI and YouTube APIs
  • 🤗 Hub for model download
  • 🤗 Hugging Face and 🦜🔗 LangChain libraries

Notebooks & Datasets:

The Pathway to Generative AI

Goals: Understand and apply NLP text preprocessing techniques. Develop a hybrid machine learning model that integrates text data with tabular metadata for improved classification accuracy.

Skills & Tools Covered:

  • Text Preprocessing
  • Vectorization
  • Ensemble and Stacking Techniques
  • Model Evaluation

Notebooks & Datasets:

Text Preprocessing

Goals: Use vectorization techniques such as Bag of Words (BoW) and TF-IDF, and apply them effectively within an ML pipeline for NLP tasks.

Skills & Tools Covered:

  • Bag of Words
  • TF-IDF
  • LSTMs
  • ML Pipelines
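
To make the two vectorization schemes concrete, here is a minimal, dependency-free sketch. It is not taken from the notebooks: the tokenizer, the function names, and the three toy sentences are illustrative assumptions, and the IDF is the plain textbook form `log(N / df)`. Libraries such as scikit-learn use a smoothed and normalized variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each raw count by log(N / df), the unsmoothed textbook IDF."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted


# Toy corpus (illustrative only).
docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)

print(vocab)   # ['cat', 'dog', 'ran', 'sat', 'the']
print(bow[0])  # [1, 0, 0, 1, 1]
print([round(w, 3) for w in weights[0]])  # [0.405, 0.0, 0.0, 0.405, 0.0]
```

Note how "the", which appears in every document, gets a TF-IDF weight of zero, while BoW counts it the same as any other word.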

Notebooks & Datasets:

Introduction to Generative AI and Prompt Engineering

Goals: Design, optimize, and apply prompts effectively to harness the full potential of advanced language models for a variety of tasks.

Skills & Tools Covered:

  • Prompt Engineering
  • Self Consistency
  • Tree-of-Thought
  • Rephrase and Respond
  • Chain-of-Verification (CoVe)
  • Llama 2 CPP
  • Mistral
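
As an illustration of one technique from this list, the sketch below shows the voting step of Self Consistency: sample several answers to the same prompt and keep the most frequent one. It is a rough sketch under stated assumptions, not the notebook's implementation. `fake_sampler` is a hypothetical stand-in for a real model call made with a non-zero temperature, and the prompt and answer strings are made up.

```python
import random
from collections import Counter


def fake_sampler(prompt, rng):
    """Hypothetical stand-in for a sampled LLM call (temperature > 0).

    Returns the final answer extracted from one reasoning path. Here it is
    simply drawn at random, with the correct answer being the most likely.
    """
    return rng.choices(["42", "41", "24"], weights=[0.6, 0.2, 0.2])[0]


def self_consistency(prompt, sampler, n_samples=15, seed=0):
    """Sample several answers and return the majority answer with its vote share."""
    rng = random.Random(seed)
    answers = [sampler(prompt, rng) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples


answer, share = self_consistency("What is 6 * 7? Think step by step.", fake_sampler)
print(answer, share)
```

To try this with a real model, replace `fake_sampler` with a function that calls the model and parses the final answer out of its response. The vote share gives a rough signal of how much the sampled reasoning paths agree.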

Notebooks & Datasets:

Generative AI Solutions for Natural Language Processing

Goals: Use LangChain to build AI-driven chatbots, and understand and implement large multimodal models (LMMs) using datasets that range from text to images.

Skills & Tools Covered:

  • Chatbot Development
  • Multimodal Learning

Notebooks & Datasets:

Transformers for Natural Language Processing

Goals:

  • Understand Sequential Deep Learning: Gain a comprehensive understanding of deep learning models designed for sequential data, such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs).
  • Master Transformer Models: Acquire knowledge about Transformer models, a revolutionary architecture for sequence processing, widely used in natural language processing (NLP) tasks.
  • Grasp Attention Mechanisms: Understand the concept of attention mechanisms, which allow models to focus on specific parts of input sequences when making predictions.
  • Explore Popular Transformer Architectures: Familiarize yourself with popular Transformer architectures such as BERT, GPT (Generative Pre-trained Transformer), and T5 (Text-To-Text Transfer Transformer).
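
The attention mechanism mentioned above can be sketched in a few lines. This is a plain-Python illustration of scaled dot-product attention, `softmax(Q Kᵀ / sqrt(d_k)) V`, for a single head with no masking or learned projections. The toy vectors are assumptions chosen so the result is easy to read, and real implementations use tensor libraries rather than nested lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: the query points in the same direction as the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print(weights[0])  # most of the weight falls on the first key
print(outputs[0])  # so the output is close to the first value vector
```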

Skills & Tools Covered:

  • 🤗 Transformers library

Notebooks & Datasets:

Generative AI Applications with LangChain

Goals:

  • Working with LangChain Agents: Understand how to create LangChain Pandas DataFrame agents and how they are used in applications such as data science assistants.
  • Fine-tuning with QLoRA: Learn how to fine-tune open-source LLMs with QLoRA (Quantized Low-Rank Adaptation), using the Llama 2 7B chat model.
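
For a sense of why low-rank adaptation is cheap, the arithmetic below compares the trainable parameters of one dense weight matrix with those of a low-rank adapter pair. It is a back-of-the-envelope sketch, not code from the notebook: the rank of 8 is an illustrative assumption, and a single square projection at the Llama 2 7B hidden size of 4096 stands in for the whole model. QLoRA additionally stores the frozen base weights in 4-bit precision, which this sketch does not model.

```python
def full_params(d_out, d_in):
    """Trainable parameters when a dense d_out x d_in weight matrix is updated directly."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for the low-rank pair B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


d = 4096   # hidden size of Llama 2 7B
rank = 8   # illustrative adapter rank (assumption, not from the repository)

full = full_params(d, d)
lora = lora_params(d, d, rank)
print(full, lora, f"{lora / full:.4%}")  # 16777216 65536 0.3906%
```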

Skills & Tools Covered:

  • Integration of techniques such as NLP for database querying and RAG for contextual question answering
  • Leveraging external resources like the OpenAI API to enhance project capabilities
  • Identifying potential biases, privacy considerations, and implications for diverse stakeholders
  • Mitigating risks and upholding ethical principles

Notebooks & Datasets:

For more fun with LangChain, please check out the Austin LangChain Group repo.

How to Use

To use the datasets in this repository, please follow the steps below:

  1. Copy the dataset to your own Google Drive account by clicking on the dataset link in the assignment description above.
  2. Open the dataset in your Google Drive account and click on the "Add shortcut to Drive" button in the upper right corner.
  3. In the popup window, select the folder in your Google Drive where you want to add the dataset shortcut, and then click on the "Add shortcut" button.
  4. In your Google Colab notebook, mount your Google Drive using the following code (the google.colab module is only available inside Colab):
from google.colab import drive
drive.mount('/content/drive')
  5. Access the dataset using the file path of the dataset shortcut in your Google Drive, like this:
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/<folder>/<dataset>')

Replace <folder> with the name of the folder where you added the dataset shortcut in step 3, and <dataset> with the name of the dataset file.
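
If the path is wrong or Drive is not mounted, `pd.read_csv` fails with a fairly terse error. The optional helper below is a small sketch, not part of the repository. `dataset_path` and `DRIVE_ROOT` are hypothetical names. The helper builds the path from the same pieces as above and fails early with a more descriptive message.

```python
from pathlib import Path

# Default Colab mount location used in the snippet above; adjust if you mounted elsewhere.
DRIVE_ROOT = Path("/content/drive/My Drive")


def dataset_path(folder, dataset, root=DRIVE_ROOT):
    """Build the path to a dataset shortcut and fail early with a readable message."""
    path = Path(root) / folder / dataset
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found. Check that Drive is mounted and that the "
            f"shortcut was added to '{folder}'."
        )
    return path
```

You would then load the data with `df = pd.read_csv(dataset_path('<folder>', '<dataset>'))`, using the same placeholders as above.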

Thank you for checking out my repository. I hope that these assignments will help you gain more knowledge and expertise in AI/ML. If you have any suggestions or feedback, please feel free to contribute or contact me.
