Add groundingMetadata to Gemini Multimodal Live Service by getchannel · Pull Request #1932 · pipecat-ai/pipecat · GitHub


Open · wants to merge 12 commits into main
Conversation

@getchannel getchannel commented May 30, 2025

This Pull Request introduces support for grounding metadata from the Google Gemini Multimodal Live API, enabling client applications (e.g., iOS apps) to display Google Search grounding links and related information.

Changes Implemented:

  1. Extended Event Models (src/pipecat/services/gemini_multimodal_live/events.py):

    • Added new Pydantic models to represent the structure of grounding metadata as received from the Gemini Live API:
      • SearchEntryPoint
      • WebSource
      • GroundingChunk
      • GroundingSegment
      • GroundingSupport
      • GroundingMetadata
    • Updated ServerContent to include an optional groundingMetadata field.
  2. Enhanced Gemini Service Logic (src/pipecat/services/gemini_multimodal_live/gemini.py):

    • Introduced _search_result_buffer and _accumulated_grounding_metadata to track relevant data across streamed events.
    • Added a new event handler _handle_evt_grounding_metadata to specifically process serverContent messages that only contain groundingMetadata.
    • Modified existing event handlers (_handle_evt_model_turn, _handle_evt_output_transcription, _handle_evt_turn_complete) to capture and store groundingMetadata if present.
    • Implemented _process_grounding_metadata():
      • This method converts the received events.GroundingMetadata into Pipecat's standard LLMSearchResponseFrame (from pipecat.services.google.frames).
      • It populates the search_result (accumulated text), origins (including site_uri with Vertex AI Search redirect links and site_title), and rendered_content (HTML for search suggestions).
      • The populated LLMSearchResponseFrame is then pushed down the pipeline.
  3. Logging Refinements:

    • Adjusted logging levels across gemini.py and events.py to be more PR-friendly. Please trim the logging if it is excessive or not in line with best practices for the main Pipecat repo.
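The event shapes in item 1 and the conversion performed by _process_grounding_metadata() in item 2 can be sketched roughly as follows. This is an illustrative, simplified version, not the PR's code: the PR defines Pydantic models and pushes a real LLMSearchResponseFrame, while this sketch uses stdlib dataclasses and a plain SearchResponse stand-in so it is self-contained. Field names follow the Gemini Live API's camelCase JSON.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WebSource:
    uri: Optional[str] = None    # Vertex AI Search redirect link
    title: Optional[str] = None  # source site title


@dataclass
class GroundingChunk:
    web: Optional[WebSource] = None


@dataclass
class SearchEntryPoint:
    renderedContent: Optional[str] = None  # HTML for search suggestion chips


@dataclass
class GroundingMetadata:
    searchEntryPoint: Optional[SearchEntryPoint] = None
    groundingChunks: List[GroundingChunk] = field(default_factory=list)


@dataclass
class SearchOrigin:
    site_title: str
    site_uri: str


@dataclass
class SearchResponse:
    """Stand-in for LLMSearchResponseFrame (pipecat.services.google.frames)."""
    search_result: str
    rendered_content: str
    origins: List[SearchOrigin]


def process_grounding_metadata(metadata: GroundingMetadata,
                               accumulated_text: str) -> SearchResponse:
    """Map accumulated grounding metadata onto a frame-like response object."""
    origins = [
        SearchOrigin(site_title=chunk.web.title or "",
                     site_uri=chunk.web.uri or "")
        for chunk in metadata.groundingChunks
        if chunk.web is not None
    ]
    rendered = ""
    if metadata.searchEntryPoint and metadata.searchEntryPoint.renderedContent:
        rendered = metadata.searchEntryPoint.renderedContent
    return SearchResponse(search_result=accumulated_text,
                          rendered_content=rendered,
                          origins=origins)
```

In the actual service, accumulated_text corresponds to the text gathered in _search_result_buffer across streamed events before the turn completes.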

How to Test:

  1. Use the examples/foundational/26g-gemini-multimodal-live-grounding.py example.
  2. Ensure the bot is configured with google_search as a tool and a system instruction that encourages its use (e.g., asking for current events).
  3. Observe the logs for LLMSearchResponseFrame being emitted.
  4. A client application (or a custom processor in the pipeline) can then consume this frame to access:
    • frame.search_result (the LLM's textual answer)
    • frame.rendered_content (HTML for search suggestions/chips)
    • frame.origins (list of sources, each with site_title, site_uri, and results snippets)
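A downstream consumer along the lines of step 4 might look like this sketch. The Origin and SearchResponseFrame stand-ins mirror the fields listed above (search_result, rendered_content, origins with site_title/site_uri); the real frame class lives in pipecat.services.google.frames, and the link-formatting helper here is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Origin:
    site_title: str
    site_uri: str


@dataclass
class SearchResponseFrame:
    search_result: str                              # the LLM's textual answer
    rendered_content: str = ""                      # HTML for suggestion chips
    origins: List[Origin] = field(default_factory=list)


def format_grounding_links(frame: SearchResponseFrame) -> List[str]:
    """Turn each origin into a markdown-style link a client UI could render."""
    return [f"[{o.site_title}]({o.site_uri})" for o in frame.origins]
```

A client app would typically show frame.search_result as the answer, render frame.rendered_content in a web view for the suggestion chips, and list the formatted origin links as sources.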

Key Files Changed:

  • src/pipecat/services/gemini_multimodal_live/gemini.py
  • src/pipecat/services/gemini_multimodal_live/events.py

Added Foundational Example 26g for testing:

  • examples/foundational/26g-gemini-multimodal-live-groundingMetadata.py
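The example above relies on the bot being configured with google_search as a tool (step 2 of "How to Test"). The tool payload itself follows the Gemini Live API's documented shape; the variable names and the sample system instruction below are illustrative, not taken from the example file.

```python
# google_search is enabled by including it in the session's tools list;
# an empty object is the documented way to switch the built-in tool on.
setup_tools = [{"google_search": {}}]

# A system instruction that nudges the model toward queries needing fresh
# data, so grounding metadata is actually produced during the test.
system_instruction = (
    "You are a helpful assistant. When asked about current events, "
    "use Google Search to ground your answer and cite your sources."
)
```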

Let me know if you'd like any part of this adjusted!

@getchannel (Author)

@markbackman Let me know your thoughts on this, and whether you want it on main. I'm waiting for this to merge so I can launch my product. (I know it's busy at the conference this week.)
