Sample podcast audio generated with this workflow: https://peregrine-results.s3.amazonaws.com/pigeon/Rg3RFytruIL15cOV3x_0.mp3
This is a guided series of tutorials/notebooks that can be used as a reference or course for building a PDF-to-Podcast workflow with voice-cloning capabilities.
You will learn from experiments with text-to-speech models and voice-cloning technology.
It assumes zero knowledge of LLMs, prompting, audio models, and voice cloning; everything is covered in the respective notebooks.
Here is step by step thought (pun intended) for the task:
- Step 1: Pre-process PDF: Use `Llama-3.2-1B-Instruct` to pre-process the PDF and save it in a `.txt` file.
- Step 2: Transcript Writer: Use the `Llama-3.1-70B-Instruct` model to write a podcast transcript from the text.
- Step 3: Dramatic Re-Writer: Use the `Llama-3.1-8B-Instruct` model to make the transcript more dramatic.
- Step 4: Text-To-Speech Workflow: Use `parler-tts/parler-tts-mini-v1`, `bark/suno`, or PlayHT to generate a conversational podcast with optional voice cloning.
Note 1: In Step 1, we prompt the 1B model not to modify or summarize the text, but strictly to clean up extra or garbage characters that might get picked up due to encoding from the PDF. Please see the prompt in Notebook 1 for more details.
Note 2: For Step 2, you can also use the `Llama-3.1-8B-Instruct` model; we recommend experimenting to see if you notice any differences. The 70B model was used here because it gave slightly more creative podcast transcripts for the tested examples.
Note 3: For Step 4, please try to extend the approach with other models. These models were chosen based on a sample prompt and worked best; newer models might sound better. Please see the Notes for some of the sample tests.
Note 4: For voice cloning capabilities, we've integrated PlayHT, which allows you to either use preset voices or clone your own voice for more personalized audio generation. The voice cloning feature requires a PlayHT account and API credentials.
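If you want a quick feel for the voice-cloning path before opening the notebooks, here is a minimal sketch assuming the `pyht` Python SDK; the voice ID, environment variable names, and output handling are placeholders, so check PlayHT's API docs (linked in the resources at the end) for the current interface.

```python
# Minimal sketch of PlayHT TTS with a cloned voice, assuming the pyht SDK.
# The voice ID below is a placeholder; replace it with the ID you get back
# after uploading your voice sample via the PlayHT dashboard or API.
import os

from pyht import Client
from pyht.client import TTSOptions

client = Client(
    user_id=os.environ["PLAY_HT_USER_ID"],  # from your PlayHT account
    api_key=os.environ["PLAY_HT_API_KEY"],
)

options = TTSOptions(voice="s3://voice-cloning-zero-shot/<your-cloned-voice-id>/manifest.json")

# client.tts streams audio chunks; write them out as they arrive.
# The exact audio format depends on TTSOptions; see the SDK docs.
with open("cloned_line.wav", "wb") as f:
    for chunk in client.tts("Welcome to today's episode!", options):
        f.write(chunk)
```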
Requirements:
- GPU server or an API provider for using 70B, 8B and 1B Llama models
- For running the 70B model, you will need a GPU with aggregated memory around 140GB to infer in bfloat-16 precision
- PlayHT account and API credentials (for voice cloning)
- Audio sample for voice cloning (optional)
Note: For our GPU-poor friends, you can also use the 8B and lower models for the entire pipeline. There is no strong recommendation; the pipeline below is simply what worked best in the first few tests. You should try and see what works best for you!
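To make the ~140GB figure concrete, loading the 70B checkpoint in bfloat16 with Hugging Face transformers and sharding it across all visible GPUs looks roughly like this (the model repo name is illustrative; use the exact checkpoint you have access to):

```python
# Rough sketch: load the 70B model in bfloat16 and shard it across the
# available GPUs with device_map="auto" (~2 bytes/parameter -> ~140GB total).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # illustrative repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 precision, as noted above
    device_map="auto",           # spread layers across all visible GPUs
)
```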
Before getting started:
- Login using the `huggingface-cli` and launch your Jupyter notebook server
- Get your Hugging Face access token from here
- Run `huggingface-cli login` and paste your access token
- Set up your PlayHT credentials if using voice cloning
Installation:
```shell
git clone https://github.com/meta-llama/llama-recipes
cd llama-recipes/recipes/quickstart/NotebookLlama/
pip install -r requirements.txt
```
Notebook 1 (PDF Pre-processing): This notebook processes the PDF using the new Feather light model into a `.txt` file.
- Update the first cell with your PDF link
- Experiment with the `Llama-3.2-1B-Instruct` model prompts
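A minimal sketch of this flow, assuming PyPDF2 for extraction and the transformers chat pipeline (the cleanup instruction below is a paraphrase of the idea in Note 1, not the notebook's exact prompt):

```python
# Sketch of Notebook 1: extract raw text from the PDF, then ask the 1B model
# to clean it up without modifying or summarizing the content.
import torch
from PyPDF2 import PdfReader
from transformers import pipeline

reader = PdfReader("input.pdf")
raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)

cleaner = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": (
        "Clean up stray or garbage characters introduced by PDF encoding. "
        "Do not modify, rewrite, or summarize the content."
    )},
    {"role": "user", "content": raw_text[:4000]},  # chunk long PDFs in practice
]

cleaned = cleaner(messages, max_new_tokens=2048)[0]["generated_text"][-1]["content"]

with open("clean_extracted_text.txt", "w") as f:
    f.write(cleaned)
```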
Notebook 2 (Transcript Writer): Takes the processed output and creates a podcast transcript using `Llama-3.1-70B-Instruct`.
- Try the 405B model if you are GPU-rich
- Experiment with the system prompts
- Compare with the 8B model results
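The core of this step is a single chat call over the cleaned text; a sketch under the same assumptions as above (the system prompt wording is illustrative, not the notebook's exact prompt):

```python
# Sketch of Notebook 2: turn the cleaned text into a two-speaker podcast transcript.
import torch
from transformers import pipeline

writer = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",  # or the 8B model on smaller GPUs
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

system_prompt = (
    "You are a world-class podcast writer. Turn the provided text into an "
    "engaging dialogue between Speaker 1 (the host) and Speaker 2 (the guest)."
)

with open("clean_extracted_text.txt") as f:
    source_text = f.read()

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": source_text},
]

transcript = writer(messages, max_new_tokens=4096)[0]["generated_text"][-1]["content"]

with open("podcast_transcript.txt", "w") as f:
    f.write(transcript)
```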
Notebook 3 (Dramatic Re-Writer): Adds dramatization using `Llama-3.1-8B-Instruct`.
- Creates conversation tuples for easier processing
- Customizes speaker-specific prompts
- Test with the 3B and 1B models
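The "conversation tuples" idea is simply to have the model return the dramatized transcript as a parsable Python list of (speaker, line) pairs that the TTS step can iterate over; a sketch (prompt wording and parsing are illustrative, and real output may need validation before `ast.literal_eval`):

```python
# Sketch of Notebook 3: dramatize the transcript and return it as a list of
# ("Speaker 1" / "Speaker 2", line) tuples for the TTS workflow.
import ast

import torch
from transformers import pipeline

rewriter = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

system_prompt = (
    "Rewrite the transcript to be more dramatic and return ONLY a Python list "
    'of tuples like [("Speaker 1", "..."), ("Speaker 2", "...")].'
)

with open("podcast_transcript.txt") as f:
    transcript = f.read()

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": transcript},
]

raw = rewriter(messages, max_new_tokens=4096)[0]["generated_text"][-1]["content"]

# Parse the model output into (speaker, line) tuples.
conversation = ast.literal_eval(raw)
for speaker, line in conversation[:3]:
    print(speaker, ":", line[:60])
```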
Notebook 4 (Text-To-Speech Workflow): Converts the transcript to podcast audio using multiple options:
- Traditional approach: `parler-tts/parler-tts-mini-v1` and `bark/suno` models
- Voice cloning approach (PlayHT):
  - Upload a voice sample (30s - 1min recommended)
  - Use the generated voice ID or preset voices
  - Generate high-quality TTS with the cloned voice
Note: Parler requires transformers 4.43.3 or earlier, while steps 1-3 need the latest version.
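For the traditional route, here is a minimal Parler-TTS sketch for a single line of dialogue, following the model card's usage pattern (the voice description string is illustrative, and note the transformers version constraint above):

```python
# Minimal Parler-TTS sketch for one line of the conversation (transformers <= 4.43.3).
# Tune the voice description per speaker; it controls how the line is read.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

description = "A female speaker with an expressive, clear voice and a very close recording."
text = "Welcome to the show! Today we're diving into the paper."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("speaker1_line.wav", audio, model.config.sampling_rate)
```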
Next improvements / fun ideas:
- Speech Model experimentation with advanced voice cloning
- LLM vs LLM Debate feature
- Testing 405B for transcripts
- Better prompting strategies
- Support for website, audio, YouTube inputs
- Enhanced voice cloning capabilities
- Multi-speaker voice cloning support
Resources for further learning:
- https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a
- https://colab.research.google.com/drive/1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing
- https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing#scrollTo=NyYQ--3YksJY
- https://replicate.com/suno-ai/bark?prediction=zh8j6yddxxrge0cjp9asgzd534
- https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
- https://docs.play.ht/reference/api-getting-started
- https://play.ht/voice-cloning
This project welcomes community contributions and PRs for any improvements!