Add groundingMetadata to Gemini Multimodal Live Service by getchannel · Pull Request #1932 · pipecat-ai/pipecat · GitHub


Open · wants to merge 12 commits into main
Conversation

@getchannel getchannel commented May 30, 2025

This Pull Request introduces support for grounding metadata from the Google Gemini Multimodal Live API, enabling client applications (e.g., iOS apps) to display Google Search grounding links and related information.

Changes Implemented:

  1. Extended Event Models (src/pipecat/services/gemini_multimodal_live/events.py):

    • Added new Pydantic models to represent the structure of grounding metadata as received from the Gemini Live API:
      • SearchEntryPoint
      • WebSource
      • GroundingChunk
      • GroundingSegment
      • GroundingSupport
      • GroundingMetadata
    • Updated ServerContent to include an optional groundingMetadata field.
  2. Enhanced Gemini Service Logic (src/pipecat/services/gemini_multimodal_live/gemini.py):

    • Introduced _search_result_buffer and _accumulated_grounding_metadata to track relevant data across streamed events.
    • Added a new event handler _handle_evt_grounding_metadata to specifically process serverContent messages that only contain groundingMetadata.
    • Modified existing event handlers (_handle_evt_model_turn, _handle_evt_output_transcription, _handle_evt_turn_complete) to capture and store groundingMetadata if present.
    • Implemented _process_grounding_metadata():
      • This method converts the received events.GroundingMetadata into Pipecat's standard LLMSearchResponseFrame (from pipecat.services.google.frames).
      • It populates the search_result (accumulated text), origins (including site_uri with Vertex AI Search redirect links and site_title), and rendered_content (HTML for search suggestions).
      • The populated LLMSearchResponseFrame is then pushed down the pipeline.
  3. Logging Refinements:

    • Adjusted logging levels across gemini.py and events.py to be more PR-friendly. Please trim the logging if it is excessive or not in line with best practices for the main Pipecat repo.
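The event shapes in item 1 and the conversion performed by _process_grounding_metadata() in item 2 can be sketched roughly as follows. This is an illustrative, simplified version, not the PR's code: the PR defines Pydantic models and pushes a real LLMSearchResponseFrame, while this sketch uses stdlib dataclasses and a plain SearchResponse stand-in so it is self-contained. Field names follow the Gemini Live API's camelCase JSON.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WebSource:
    uri: Optional[str] = None    # Vertex AI Search redirect link
    title: Optional[str] = None  # source site title


@dataclass
class GroundingChunk:
    web: Optional[WebSource] = None


@dataclass
class SearchEntryPoint:
    renderedContent: Optional[str] = None  # HTML for search suggestion chips


@dataclass
class GroundingMetadata:
    searchEntryPoint: Optional[SearchEntryPoint] = None
    groundingChunks: List[GroundingChunk] = field(default_factory=list)


@dataclass
class SearchOrigin:
    site_title: str
    site_uri: str


@dataclass
class SearchResponse:
    """Stand-in for LLMSearchResponseFrame (pipecat.services.google.frames)."""
    search_result: str
    rendered_content: str
    origins: List[SearchOrigin]


def process_grounding_metadata(metadata: GroundingMetadata,
                               accumulated_text: str) -> SearchResponse:
    """Map accumulated grounding metadata onto a frame-like response object."""
    origins = [
        SearchOrigin(site_title=chunk.web.title or "",
                     site_uri=chunk.web.uri or "")
        for chunk in metadata.groundingChunks
        if chunk.web is not None
    ]
    rendered = ""
    if metadata.searchEntryPoint and metadata.searchEntryPoint.renderedContent:
        rendered = metadata.searchEntryPoint.renderedContent
    return SearchResponse(search_result=accumulated_text,
                          rendered_content=rendered,
                          origins=origins)
```

In the actual service, accumulated_text corresponds to the text gathered in _search_result_buffer across streamed events before the turn completes.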

How to Test:

  1. Use the examples/foundational/26g-gemini-multimodal-live-grounding.py example.
  2. Ensure the bot is configured with google_search as a tool and a system instruction that encourages its use (e.g., asking for current events).
  3. Observe the logs for LLMSearchResponseFrame being emitted.
  4. A client application (or a custom processor in the pipeline) can then consume this frame to access:
    • frame.search_result (the LLM's textual answer)
    • frame.rendered_content (HTML for search suggestions/chips)
    • frame.origins (list of sources, each with site_title, site_uri, and results snippets)
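A downstream consumer along the lines of step 4 might look like this sketch. The Origin and SearchResponseFrame stand-ins mirror the fields listed above (search_result, rendered_content, origins with site_title/site_uri); the real frame class lives in pipecat.services.google.frames, and the link-formatting helper here is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Origin:
    site_title: str
    site_uri: str


@dataclass
class SearchResponseFrame:
    search_result: str                              # the LLM's textual answer
    rendered_content: str = ""                      # HTML for suggestion chips
    origins: List[Origin] = field(default_factory=list)


def format_grounding_links(frame: SearchResponseFrame) -> List[str]:
    """Turn each origin into a markdown-style link a client UI could render."""
    return [f"[{o.site_title}]({o.site_uri})" for o in frame.origins]
```

A client app would typically show frame.search_result as the answer, render frame.rendered_content in a web view for the suggestion chips, and list the formatted origin links as sources.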

Key Files Changed:

  • src/pipecat/services/gemini_multimodal_live/gemini.py
  • src/pipecat/services/gemini_multimodal_live/events.py

Added Foundational Example 26g for testing:

  • examples/foundational/26g-gemini-multimodal-live-groundingMetadata.py
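The example above relies on the bot being configured with google_search as a tool (step 2 of "How to Test"). The tool payload itself follows the Gemini Live API's documented shape; the variable names and the sample system instruction below are illustrative, not taken from the example file.

```python
# google_search is enabled by including it in the session's tools list;
# an empty object is the documented way to switch the built-in tool on.
setup_tools = [{"google_search": {}}]

# A system instruction that nudges the model toward queries needing fresh
# data, so grounding metadata is actually produced during the test.
system_instruction = (
    "You are a helpful assistant. When asked about current events, "
    "use Google Search to ground your answer and cite your sources."
)
```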

Let me know if you'd like any part of this adjusted!

@getchannel (Author)

@markbackman Let me know your thoughts on this, and whether you want it on main. I'm waiting for this to merge so I can launch my product. (I know it's busy at the conference this week.)
