How to Turn Notebook LM Podcasts into Realistic Talking Head Videos

Introduction

This tutorial walks you through turning a Notebook LM-generated podcast into a realistic, side-by-side talking head video that matches the original speech and expressions and can even use your own face and voice. It covers how to split speakers, create digital avatars, sync audio and video, and change voices, step by step.

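Prerequisites

  • A Notebook LM account to generate the podcast audio.
  • Access to Speaker Split for separating the speakers.
  • A HeyGen account for creating avatars and generating the videos.
  • CapCut or another video editor for the side-by-side layout.
  • Optional: an elevenlabs.io account (paid Creator plan recommended) for voice cloning.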

Step 1: Generate Your Podcast with Notebook LM

  1. Go to Notebook LM and log in.
  2. Click Try Notebook LM, then Create new.
  3. Paste the URLs of your source materials (for example, articles about AI automation vs AI agents).
  4. After processing, set the length of the podcast using the menu (default: ~20 mins; set shorter for testing).
  5. Click Generate and wait 5–10 minutes. When ready, listen and download the audio overview.

Notebook LM automatically creates multi-voice, podcast-style audio from your chosen sources. This audio file is the input for every later step.

Step 2: Separate Speakers with Speaker Split

  1. Visit Speaker Split and upload your Notebook LM audio file.
  2. Select the number of speakers (typically 2 for a podcast format) and click Process Audio.
  3. Download the separated tracks (Speaker A and Speaker B), plus the transcript.

Each separated track keeps its speaker's original timing, including the silences, which is crucial for keeping the two talking head videos synchronized.
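Since the later steps rely on both tracks sharing one timeline, it can help to confirm that the two downloads have the same length before generating any video. The following is a minimal sketch, assuming the tracks are available as uncompressed PCM WAV files (the export format of Speaker Split is not stated here, so a conversion may be needed first). The function names and the 0.05-second tolerance are illustrative choices, not part of any tool mentioned in this tutorial.

```javascript
// Sketch: read the duration of a PCM WAV file from its header bytes.
// Assumes a standard RIFF/WAVE layout with a "fmt " chunk before the "data" chunk.
function wavDurationSeconds(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);

  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let byteRate = null;
  let offset = 12; // the first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (id === 'fmt ') {
      byteRate = view.getUint32(offset + 16, true); // bytes of audio per second
    } else if (id === 'data') {
      if (!byteRate) throw new Error('"data" chunk found before "fmt " chunk');
      return size / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No "data" chunk found');
}

// Returns true when the two tracks differ in length by at most `toleranceSeconds`.
function tracksAligned(bytesA, bytesB, toleranceSeconds = 0.05) {
  const difference = Math.abs(wavDurationSeconds(bytesA) - wavDurationSeconds(bytesB));
  return difference <= toleranceSeconds;
}
```

In Node.js, the byte arrays could come from reading each file with `fs.readFileSync`. If `tracksAligned` returns false, the tracks probably do not share a timeline and the videos would drift apart in Step 5.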

Step 3: Create a Digital Twin Avatar with HeyGen

  1. Go to HeyGen and sign up.
  2. Click Create Your Avatar.
  3. Record or upload a short, well-lit video of yourself (HeyGen provides guidance).
  4. Follow the prompts for verification (may need webcam/microphone access).
  5. Wait for avatar processing (a few minutes). You are notified when the avatar is ready.

Step 4: Sync Avatar to Audio and Generate Video

  1. In HeyGen’s AI Studio, select Landscape mode and load your avatar.
  2. Upload the separated Speaker A audio.
  3. Ensure Voice from recording is selected.
  4. Click Generate, give the video a title, and submit.
  5. Repeat for Speaker B, using either another digital avatar or a library avatar. Choose avatars with natural, minimal gestures.

Step 5: Edit Avatar Videos Side-by-Side

  1. Download both avatar videos from HeyGen.
  2. Open CapCut (or your preferred video editor) and create a new project.
  3. Import the two avatar videos.
  4. Place them on separate video tracks. Use the crop/reposition tool to align them side by side.
  5. Export the synchronized talking head podcast video.

Placing both videos in one frame presents the recording as a natural conversation between two hosts.
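If your preferred editor is the command line, the same layout can be approximated with ffmpeg. The sketch below only builds the argument list. It assumes both avatar videos have the same resolution and start at the same moment, and it crops each one to its centered half before joining them. The function name and file names are placeholders.

```javascript
// Sketch: build ffmpeg arguments that place two videos side by side.
// Assumes both inputs share the same height; ffmpeg must be installed separately.
function buildSideBySideArgs(leftVideo, rightVideo, outputVideo) {
  const filterGraph = [
    '[0:v]crop=iw/2:ih[left]',                     // centered half of the first video
    '[1:v]crop=iw/2:ih[right]',                    // centered half of the second video
    '[left][right]hstack=inputs=2[v]',             // join the two halves horizontally
    '[0:a][1:a]amix=inputs=2:duration=longest[a]', // mix both speakers' audio
  ].join(';');

  return [
    '-i', leftVideo,
    '-i', rightVideo,
    '-filter_complex', filterGraph,
    '-map', '[v]',
    '-map', '[a]',
    outputVideo,
  ];
}
```

The returned array could be passed to `child_process.spawn('ffmpeg', args)` in Node.js. Note that `amix` rescales its inputs by default, so the mixed audio may come out quieter than the originals and might need a volume adjustment afterwards.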

Step 6: Advanced – Voice Cloning with ElevenLabs

  1. Sign up at elevenlabs.io (paid Creator plan recommended for high-quality clones).
  2. Go to the Voices tab → Create/Clone a Voice, select Professional Clone.
  3. Upload at least 30 minutes (up to 2 hours) of your own speech as sample data.
  4. Verify and wait for processing.
  5. Once your voice clone is ready, test it in the Text-to-Speech section.
  6. For dubbed podcasts: Open the elevenlabs.io Dubbing Studio, upload the speaker audio, set both the source and target languages to English, and process.
  7. Assign your professional clone to the target audio track. Regenerate, review, and export audio.
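
// Example: using the ElevenLabs Dubbing Studio to synchronize a cloned voice.
// Illustrative pseudocode only: `elevenLabs.dub` and its options are hypothetical,
// not a documented SDK call. They mirror the Dubbing Studio settings in items 6 and 7.
const audioOutput = elevenLabs.dub({
  inputAudio: 'notebook-lm-speakerA.wav', // separated Speaker A track from Step 2
  targetVoice: 'myClone',                 // the professional clone created above
  language: 'en',                         // source and target are both English
  preserveTiming: true,                   // keep the original pauses so the tracks stay in sync
});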
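
Before uploading samples for a professional clone, it may be worth totaling the length of your recordings against the 30-minute to 2-hour range given in item 3. This is a small sketch with illustrative names. The durations are assumed to be known in seconds, for example from your audio editor.

```javascript
// Sketch: check a set of recordings against the 30-minute to 2-hour range.
const MIN_SAMPLE_SECONDS = 30 * 60;
const MAX_SAMPLE_SECONDS = 2 * 60 * 60;

function summarizeSamples(durationsSeconds) {
  const totalSeconds = durationsSeconds.reduce((sum, d) => sum + d, 0);
  return {
    totalMinutes: totalSeconds / 60,
    enough: totalSeconds >= MIN_SAMPLE_SECONDS,      // at least 30 minutes
    withinLimit: totalSeconds <= MAX_SAMPLE_SECONDS, // no more than 2 hours
  };
}
```

For example, two recordings of 10 and 15 minutes total 25 minutes, so `enough` would be false and more speech would be needed.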

Step 7: Gender/Voice Changing Options

  1. Use HeyGen’s “voice mirroring” feature to apply a different voice (e.g., swap gender) to a library avatar.
  2. Upload the original speaker audio, enable “mirroring,” and select your desired voice.
  3. If your audio exceeds the 5-minute cap on mirroring, process it in segments and stitch the results together in your video editor.
  4. Alternatively, use elevenlabs.io to dub with a different voice or accent.
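
The segmenting in item 3 can be planned ahead of time. The sketch below splits a track of a given length into equal segments that each stay within the 5-minute cap. The function name is illustrative, and the cut points are purely arithmetic: in practice you would probably nudge each one to the nearest pause so that no word is cut in half.

```javascript
// Sketch: plan equal-length segments that each fit under a maximum duration.
// Times are in seconds; the default cap of 300 s is the 5-minute limit.
function planSegments(totalSeconds, maxSegmentSeconds = 300) {
  if (totalSeconds <= 0 || maxSegmentSeconds <= 0) {
    throw new RangeError('Durations must be positive');
  }
  const count = Math.ceil(totalSeconds / maxSegmentSeconds);
  const length = totalSeconds / count; // equal segments, each at most the cap
  const segments = [];
  for (let i = 0; i < count; i++) {
    segments.push({
      start: i * length,
      end: Math.min((i + 1) * length, totalSeconds),
    });
  }
  return segments;
}
```

A 20-minute track (1200 seconds) yields four segments of 300 seconds each, while a 4-minute track yields a single segment and needs no splitting.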

Conclusion

With this guide, you can transform any Notebook LM-generated podcast into a fully animated, realistic side-by-side talking head video that uses your own face and voice or any other, with control over synchronization and appearance. The workflow suits training material, content creation, and repurposing existing podcasts.

  • Key takeaway: The crucial steps are accurate speaker separation, natural-looking avatars, and precise audio synchronization, especially when you use your own cloned voice.
  • Next step: Experiment with different avatar looks, voice settings, or podcast topics to create engaging content for your audience!