8000 Chatbot assistant on AWS by hammerhead · Pull Request #989 · crate/cratedb-examples · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Chatbot assistant on AWS #989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Chatbot assistant on AWS #989

wants to merge 2 commits into from

Conversation

hammerhead
Copy link
Member
@hammerhead hammerhead commented Jun 17, 2025

About

Start using the cratedb-mcp package within a Jupyter Notebook, accompanied by language models served by Amazon Bedrock.

Preview

https://colab.research.google.com/github/crate/cratedb-examples/blob/hammerhead/chatbot-aws/topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb

Status

In testing ahead of the workshop on July 2nd.

Copy link
coderabbitai bot commented Jun 17, 2025

Walkthrough

A new Jupyter notebook has been introduced that demonstrates querying and analyzing synthetic industrial telemetry data using CrateDB and AWS Bedrock LLMs. The notebook covers data generation, schema setup, SQL querying via natural language, visualization, and introduces an agentic approach for intelligent, context-aware diagnostics by integrating LLMs with CrateDB metadata and machine manuals.

Changes

File(s) Change Summary
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb New notebook implementing an end-to-end workflow for synthetic telemetry data analysis, LLM-based SQL generation, visualization, and agentic reasoning using CrateDB and AWS Bedrock. Adds utility functions for schema fetching, LLM prompting, SQL extraction, agent orchestration, and connection string conversion.
topic/multi-model/multi-model-offshore-wind-farms.ipynb Changed dependency installation from shell to Jupyter magic command; enhanced database connection code with pandas, explicit connection string, try-except block, test query with output, and improved error handling; minor cleanup of redundant imports and execution count reset.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Notebook
    participant CrateDB
    participant AWS_Bedrock_LLM
    participant InlineAgent

    User->>Notebook: Ask question in natural language
    Notebook->>CrateDB: Fetch table schema
    Notebook->>AWS_Bedrock_LLM: Send prompt with schema and question
    AWS_Bedrock_LLM-->>Notebook: Return SQL query (and reasoning)
    Notebook->>CrateDB: Execute SQL query
    CrateDB-->>Notebook: Return results
    Notebook->>User: Display answer and visualization

    Note over Notebook,InlineAgent: For advanced agentic queries
    User->>InlineAgent: Ask complex question
    InlineAgent->>CrateDB: Retrieve schema/manuals as tools
    InlineAgent->>AWS_Bedrock_LLM: Multi-step reasoning & SQL generation
    InlineAgent->>CrateDB: Execute queries as needed
    InlineAgent-->>User: Return synthesized answer
Loading

Possibly related PRs

  • tag-chat #902: Adds standalone synthetic timeseries data generation and storage scripts, which relate directly to the data generation steps in the new notebook.
  • Chatbot: Latest changes for workshop notebook #921: Updates telemetry diagnostics notebook with improved error handling and output formatting, related to telemetry/manual diagnostics workflows but distinct from this new integrated workshop notebook.

Poem

In a notebook’s warren, data flows anew,
Machines hum numbers, manuals in view.
CrateDB stores it, LLMs translate,
Questions to queries, results that elate.
With agents so clever and charts full of cheer,
The rabbit hops forward—insight is near!
🐰📊✨

✨ Finishing Touches
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Post Copyable Unit Tests in Comment
  • Commit Unit Tests in branch hammerhead/chatbot-aws

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hammerhead hammerhead force-pushed the hammerhead/chatbot-aws branch from c0f5e44 to 17a42f1 Compare June 17, 2025 07:44
@hammerhead hammerhead marked this pull request as ready for review June 17, 2025 07:44
@hammerhead hammerhead force-pushed the hammerhead/chatbot-aws branch from 17a42f1 to 159dcae Compare June 17, 2025 07:50
@hammerhead
Copy link
Member Author

@coderabbitai full review

Copy link
coderabbitai bot commented Jun 17, 2025
✅ Actions performed

Full review triggered.

@hammerhead

This comment was marked as off-topic.

This comment was marked as off-topic.

@hammerhead hammerhead force-pushed the hammerhead/chatbot-aws branch from 159dcae to ca72287 Compare June 17, 2025 08:05
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

745-751: Runtime brittleness – hard dependency on VIRTUAL_ENV & clear-text creds

os.environ['VIRTUAL_ENV'] may be unset (e.g., system Python, docker) leading to KeyError. Also, embedding the password in the resulting HTTP URL logs it in process lists.

  • Guard against missing env var:
    venv_bin = Path(sys.executable).parent  # robust detector
  • Use parse.quote_plus(parsed.password or "") and consider retrieving credentials from a secure store instead of echoing them into env vars.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c868086 and ca72287.

📒 Files selected for processing (1)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Python: 3.10 CrateDB: nightly on ubuntu-latest
  • GitHub Check: Python: 3.13 CrateDB: nightly on ubuntu-latest

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (3)

84-86: Hard-coded local wheel path still breaks portability

The pip install points to a developer-specific wheel on the local filesystem, so the notebook fails everywhere except your machine or CI worker with that exact path.


265-267: if_exists="replace" silently drops the schema you just defined

Using replace re-creates the table with inferred generic types, wiping the PRIMARY KEY that was set earlier. Use append (or COPY) after creating the table explicitly.


521-525: Fragile regex will raise AttributeError on unexpected LLM output

re.search("(?<=sql)([^]*)(?=)", …).group(1)assumes the LLM always emits fenced ```sql blocks with lower-casesql`. Any deviation breaks extraction. Introduce a tolerant fallback as suggested previously.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca72287 and 62b0121.

📒 Files selected for processing (1)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Python: 3.10 CrateDB: nightly on ubuntu-latest
  • GitHub Check: Python: 3.13 CrateDB: nightly on ubuntu-latest
🔇 Additional comments (2)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (2)

396-416: Good job switching to bind parameters

The query now uses SQLAlchemy’s bindparams, eliminating the positional ? placeholder and reducing injection risk.


730-735: VIRTUAL_ENV environment variable may be undefined

os.environ['VIRTUAL_ENV'] raises KeyError outside a virtual-env context (e.g., in Docker or CI). Guard with os.getenv or allow an override.

-            command=f"{os.environ['VIRTUAL_ENV']}/bin/cratedb-mcp",
+            command=f"{os.getenv('VIRTUAL_ENV', '/usr/local')}/bin/cratedb-mcp",

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (3)

265-267: if_exists="replace" still drops the schema – previous advice not applied
Re-loading via replace recreates motor_readings without the PRIMARY KEY you defined earlier, undoing constraints & indexes. Use append (or COPY) instead.


521-523: Fragile regex for SQL extraction remains
re.search("(?<=\sql)([^]*)(?=\)", …)will raiseAttributeError` whenever the LLM deviates from the exact fence pattern. A tolerant fallback (first ``` block or raw response) is still recommended.


717-724: sqlalchemy_to_http crashes when password is absent
parse.quote(parsed.password) dereferences None if the connection string omits a password (common for local clusters). Guard against missing username/password as suggested earlier.

🧹 Nitpick comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

82-87: Pin dependency versions to ensure notebook reproducibility

Running !pip install -U … with un-pinned versions means every execution can silently pull newer (possibly incompatible) releases, breaking the workshop. Capture exact versions in a requirements.txt (or lock-file) and install once.

-!pip install -U \
-    pandas matplotlib ipython-sql tqdm \
-    crate cratedb-mcp==0.0.0 sqlalchemy-cratedb \
-    "inlineagent @ git+https://github.com/awslabs/amazon-bedrock-agent-samples@4a5d72a#subdirectory=src/InlineAgent"
+# requirements.txt (excerpt)
+pandas==2.2.2
+matplotlib==3.9.0
+ipython-sql==0.5.2
+tqdm==4.66.4
+crate==0.32.1
+sqlalchemy-cratedb==0.7.1
+cratedb-mcp==0.0.0
+inlineagent @ git+https://github.com/awslabs/amazon-bedrock-agent-samples@4a5d72a#subdirectory=src/InlineAgent
+
+!pip install -r requirements.txt
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62b0121 and 750f63f.

📒 Files selected for processing (1)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Python: 3.10 CrateDB: nightly on ubuntu-latest
  • GitHub Check: Python: 3.13 CrateDB: nightly on ubuntu-latest

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (3)

510-515: Fragile regex to extract SQL remains unfixed

Previous review already covered this; risk of AttributeError if the LLM deviates from the exact sql … fence.


729-736: sqlalchemy_to_http() still crashes when user/password are absent

Same concern as in the earlier review: parse.quote(parsed.password) will raise when the password is None.


750-753: Hard-coded VIRTUAL_ENV lookup can raise KeyError on non-venv setups

The comment mirrors the outstanding issue from the previous review.

🧹 Nitpick comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

238-258: Consider vectorising the synthetic-data generator

The nested Python loop generates ~4 million rows for the default settings and will be noticeably slow in Jupyter.
A NumPy / pandas-native approach (e.g. pre-allocating arrays and using np.random.* on the full array) will cut runtime by orders of magnitude.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 750f63f and 1b9cdd3.

📒 Files selected for processing (1)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Python: 3.10 CrateDB: nightly on ubuntu-latest
  • GitHub Check: Python: 3.13 CrateDB: nightly on ubuntu-latest
🔇 Additional comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

265-266: 👍 Switched to if_exists="append" – schema now preserved

The earlier replace would silently drop PK / index definitions; append avoids that.
Looks good.

Copy link
Member
@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do an extensive review, but I guess it will be good to go?
When applicable, @hlcianfagna, @surister or @kneth may have an additional review on it?

Excellent stuff. Approving, thanks!

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

511-515: Fragile regex pattern still present - duplicate issue.

The SQL extraction still uses the fragile regex re.search("(?<=sql)([^]*)(?=)", response)` that will crash on unexpected LLM output. This was previously identified as a critical issue but hasn't been addressed.

🧹 Nitpick comments (2)
topic/multi-model/multi-model-offshore-wind-farms.ipynb (1)

344-344: Improve consistency in database connection usage.

The notebook inconsistently uses CONNECTION_STRING directly with pandas in some places and the engine/connection objects in others. For better connection management and consistency, consider using the engine object throughout.

-df = pd.read_sql(query, CONNECTION_STRING)
+df = pd.read_sql(query, engine)

Apply similar changes to all pandas read_sql calls for consistency.

Also applies to: 484-484

topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

1-1092: Overall excellent workshop notebook with most issues addressed.

This comprehensive notebook successfully demonstrates LLM integration with CrateDB for timeseries analysis. Most previous critical issues have been properly addressed:

✅ Modern git+https package installation
✅ Preserved table schema with append mode
✅ Secure parameterized queries
✅ Portable command execution

The content is well-structured for educational purposes and provides a complete end-to-end example of AI agents working with structured data.

Consider adding a brief troubleshooting section in the markdown to help users if the LLM returns unexpected output formats, especially given the remaining regex fragility issue.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e758c01 and 62d36e1.

📒 Files selected for processing (2)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
  • topic/multi-model/multi-model-offshore-wind-farms.ipynb (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Python: 3.10 CrateDB: nightly on ubuntu-latest
🔇 Additional comments (9)
topic/multi-model/multi-model-offshore-wind-farms.ipynb (4)

30-30: Excellent improvement using Jupyter magic command.

Changing from ! pip install to %pip install is a best practice as it ensures packages are installed in the correct Python environment that the kernel is using.


54-54: Good addition of required import.

The pandas import is necessary for the connection validation logic added below.


56-57: Improved documentation and clarity.

The connection string examples and comments are now more explicit and user-friendly, making it easier for users to understand how to configure their database connections.

Also applies to: 60-60, 63-64


254-254: Good notebook hygiene.

Resetting the execution count to null is good practice for committing notebooks, removing execution artifacts.

topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (5)

82-86: LGTM! Package installation properly addressed.

The installation now uses the modern git+https syntax with commit hash pinning, which properly addresses the previous hard-coded wheel path issue and ensures reproducibility across environments.


266-266: Good fix! Using append preserves the schema.

The change from if_exists="replace" to if_exists="append" properly addresses the previous issue where the explicitly defined table schema (including PRIMARY KEY) would be dropped during data loading.


355-365: Excellent security improvement with parameterized queries.

The schema fetching function now properly uses SQLAlchemy's text() and bindparams() for safe query construction, which addresses the previous SQL injection risk from unsafe parameter handling.


752-752: Good fix! More portable command execution.

Using "cratedb-mcp" directly instead of the hard-coded VIRTUAL_ENV path properly addresses the previous portability issue and makes the notebook work across different environments.


735-735: ```shell
#!/bin/bash
python3 - << 'EOF'
import json, sys

notebook_path = "topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb"
try:
with open(notebook_path, 'r', encoding='utf-8') as f:
nb = json.load(f)
except FileNotFoundError:
sys.exit(f"ERROR: {notebook_path} not found")

for cell in nb.get('cells', []):
if cell.get('cell_type') == 'code':
src = cell.get('source', [])
for i, line in enumerate(src):
if 'def sqlalchemy_to_http' in line:
print("".join(src[i:i+50]))
sys.exit(0)
print("Function sqlalchemy_to_http not found in code cells.")
EOF


</details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +66 to +77
"# Try to connect\n",
"try:\n",
" engine = sa.create_engine(CONNECTION_STRING)\n",
" connection = engine.connect()\n",
"\n",
" # Run a simple query to validate connection\n",
" result = pd.read_sql(\"SELECT mountain FROM sys.summits LIMIT 1\", con=engine)\n",
" print(\"✅ Successfully connected to CrateDB!\")\n",
" print(\"Sample query result from sys.summits:\", result.iloc[0][\"mountain\"])\n",
"except Exception as e:\n",
" print(\"❌ Failed to connect to CrateDB. Please check your connection string.\")\n",
" print(\"Error:\", e)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Excellent connection validation with a scope concern.

The try-except block significantly improves user experience with proper error handling and clear feedback. However, if the connection fails, the connection variable won't be defined but is used throughout the rest of the notebook (e.g., line 102).

Consider either exiting early on connection failure or ensuring the connection variable is always defined.

-try:
-    engine = sa.create_engine(CONNECTION_STRING)
-    connection = engine.connect()
-
-    # Run a simple query to validate connection
-    result = pd.read_sql("SELECT mountain FROM sys.summits LIMIT 1", con=engine)
-    print("✅ Successfully connected to CrateDB!")
-    print("Sample query result from sys.summits:", result.iloc[0]["mountain"])
-except Exception as e:
-    print("❌ Failed to connect to CrateDB. Please check your connection string.")
-    print("Error:", e)
+try:
+    engine = sa.create_engine(CONNECTION_STRING)
+    connection = engine.connect()
+
+    # Run a simple query to validate connection
+    result = pd.read_sql("SELECT mountain FROM sys.summits LIMIT 1", con=engine)
+    print("✅ Successfully connected to CrateDB!")
+    print("Sample query result from sys.summits:", result.iloc[0]["mountain"])
+except Exception as e:
+    print("❌ Failed to connect to CrateDB. Please check your connection string.")
+    print("Error:", e)
+    print("Please fix the connection string and restart the kernel.")
+    raise  # Stop execution if connection fails
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"# Try to connect\n",
"try:\n",
" engine = sa.create_engine(CONNECTION_STRING)\n",
" connection = engine.connect()\n",
"\n",
" # Run a simple query to validate connection\n",
" result = pd.read_sql(\"SELECT mountain FROM sys.summits LIMIT 1\", con=engine)\n",
" print(\"✅ Successfully connected to CrateDB!\")\n",
" print(\"Sample query result from sys.summits:\", result.iloc[0][\"mountain\"])\n",
"except Exception as e:\n",
" print(\"❌ Failed to connect to CrateDB. Please check your connection string.\")\n",
" print(\"Error:\", e)"
# Try to connect
try:
engine = sa.create_engine(CONNECTION_STRING)
connection = engine.connect()
# Run a simple query to validate connection
result = pd.read_sql("SELECT mountain FROM sys.summits LIMIT 1", con=engine)
print("✅ Successfully connected to CrateDB!")
print("Sample query result from sys.summits:", result.iloc[0]["mountain"])
except Exception as e:
print("❌ Failed to connect to CrateDB. Please check your connection string.")
print("Error:", e)
print("Please fix the connection string and restart the kernel.")
raise # Stop execution if connection fails
🤖 Prompt for AI Agents
In topic/multi-model/multi-model-offshore-wind-farms.ipynb around lines 66 to
77, the try-except block handles connection errors but leaves the connection
variable undefined on failure, causing issues later in the notebook. To fix
this, either exit or stop execution immediately after a failed connection
attempt inside the except block, or define the connection variable as None or a
safe default before the try block to ensure it is always defined regardless of
success or failure.

Smart assistant with MCP server integration and inline agent definition
@hammerhead hammerhead force-pushed the hammerhead/chatbot-aws branch from 62d36e1 to 66cf891 Compare June 24, 2025 06:32
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

511-515: Fragile regex pattern still present.

The regex re.search("(?<=sql)([^]*)(?=)", response)` remains fragile and will fail if the LLM output format varies. As noted in the previous review, this needs tolerant parsing with fallback handling.

🧹 Nitpick comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

748-758: Improved command resolution but could be more robust.

Using command="cratedb-mcp" is better than the previous VIRTUAL_ENV approach, but consider adding fallback path resolution to handle cases where the command isn't in PATH.

-            command="cratedb-mcp",
+            command=shutil.which("cratedb-mcp") or "cratedb-mcp",

Add import for shutil at the top of the cell if implementing this suggestion.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62d36e1 and 66cf891.

📒 Files selected for processing (2)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
  • topic/multi-model/multi-model-offshore-wind-farms.ipynb (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • topic/multi-model/multi-model-offshore-wind-farms.ipynb
🔇 Additional comments (8)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (8)

82-86: Package installation approach looks good.

The updated installation using git+https://github.com/awslabs/amazon-bedrock-agent-samples@4a5d72a#subdirectory=src/InlineAgent addresses the previous concern about hard-coded local wheel paths and provides proper version pinning with the commit hash.


136-162: Robust connection setup with proper error handling.

The connection validation approach using a try-catch block with a test query against sys.summits is well-implemented and provides clear feedback on connection status.


183-268: Table creation and data loading properly implemented.

The explicit table schema definition followed by if_exists="append" correctly preserves the schema structure including the PRIMARY KEY, addressing the previous concern about schema dropping.


346-378: Schema fetching function properly secured.

The use of text() with bindparams(tbl=table_name) correctly addresses the previous SQL injection concern by using proper parameter binding instead of string interpolation.


610-669: Well-implemented visualization logic.

The conditional plotting based on timestamp column presence and proper handling of different timestamp formats (epoch milliseconds vs datetime) makes this visualization code robust and user-friendly.


846-935: Well-designed manual generation for workshop purposes.

The fictional manual generation creates realistic maintenance documentation with varied operational limits and procedures. The inclusion of REFRESH TABLE ensures immediate data availability in CrateDB.


950-976: Clean manual display implementation.

The HTML-formatted scrollable display with monospace font provides excellent readability for the machine manual content in the Jupyter environment.


722-736: Inconsistency with past review resolution.

The sqlalchemy_to_http function still contains parse.quote(parsed.password) which will crash when password is None, despite the past review comment being marked as addressed. The fix for handling None passwords appears to be missing.

Please verify if the fix for handling None passwords was properly applied:

#!/bin/bash
# Check if there are any recent changes to this function that handle None passwords
rg -A 10 -B 5 "sqlalchemy_to_http|parse\.quote.*password"

Copy link
Contributor
@hlcianfagna hlcianfagna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great content

"print(f\"✅ Generated {len(df_ts)} rows of synthetic timeseries data.\")\n",
"\n",
"# --- Load to CrateDB ---\n",
"df_ts.to_sql(\"motor_readings\", con=engine, if_exists=\"append\", index=False)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method=insert_bulk,
chunksize=5000,

?

Copy link
Contributor
@kneth kneth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. It's a good example on how to use AI with CrateDB.

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (2)

511-515: Fragile regex pattern remains unchanged - potential runtime failure.

The regex "(?<=sql)([^]*)(?=)"still assumes exact LLM output formatting and will crash withAttributeError` if the pattern doesn't match. While noted as acceptable for workshop purposes in past reviews, this remains a runtime risk.


731-737: Password handling still vulnerable to None values.

Despite past comments indicating this was addressed, the current code still uses parse.quote(parsed.password) directly without None checking, which will crash if the connection string lacks a password.

Apply this fix to handle None values safely:

-    return f"{protocol}://{parsed.username}:{parse.quote(parsed.password)}@{parsed.hostname}:4200"
+    user = parsed.username or ""
+    pwd = parse.quote(parsed.password) if parsed.password else ""
+    creds = f"{user}:{pwd}@" if user or pwd else ""
+    return f"{protocol}://{creds}{parsed.hostname}:4200"
🧹 Nitpick comments (1)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1)

970-979: Consider HTML escaping for defensive programming.

While the synthetic manual content is safe, directly embedding text in HTML without escaping could be risky if this pattern is reused with user-generated content.

For better defensive programming:

+import html
 display(
     HTML(
         f"""
 <h4>📘 Manual for Machine ID: {machine_id}</h4>
 <div style="border:1px solid #ccc; padding:10px; max-height:400px; overflow:auto; font-family:monospace; white-space:pre-wrap;">
-{manual_text}
+{html.escape(manual_text)}
 </div>
 """
     )
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66cf891 and 7f1bc0f.

📒 Files selected for processing (2)
  • topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (1 hunks)
  • topic/multi-model/multi-model-offshore-wind-farms.ipynb (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • topic/multi-model/multi-model-offshore-wind-farms.ipynb
🔇 Additional comments (7)
topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb (7)

82-86: LGTM! Package installation follows best practices.

The git+https syntax with commit hash pinning provides good reproducibility while avoiding the previous local wheel file dependency issue.


136-162: Excellent database connection setup with proper validation.

The combination of environment variable configuration, exception handling, and connection testing with a simple query demonstrates solid database connectivity practices.


261-267: Correct use of if_exists="append" preserves table schema.

Good practice using "append" mode which maintains the explicitly defined table schema and constraints from the previous step.


352-377: Schema fetching function properly implements secure parameter binding.

The use of SQLAlchemy's bindparams() approach addresses previous security concerns and provides better compatibility across different database APIs.


636-671: Well-structured visualization logic with proper format handling.

The conditional plotting based on timestamp presence, automatic format detection for epoch vs datetime, and graceful error handling make this a robust visualization implementation.


750-761: MCP server configuration properly avoids hard-coded paths.

Good improvement using just the command name "cratedb-mcp" rather than hard-coded virtual environment paths, making this more portable across different environments.


870-938: Comprehensive manual generation with appropriate data handling.

The synthetic manual generation creates realistic content with configurable parameters, and the data storage approach correctly uses append mode with proper CrateDB refresh handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0