feat: live repo indexing + CI integration · Issue #37 · cased/kit · GitHub
feat: live repo indexing + CI integration #37
Open
@tnm

Description

Currently, kit's indexing capabilities (e.g., DocstringIndexer, VectorSearcher) are primarily designed for local, on-demand use. To make kit more useful for teams and automated workflows, this issue proposes adding "live" (continuous) repository indexing, integrated with CI/CD pipelines.

Goals

  1. Automated Index Updates: Enable kit indexes (docstring summaries, semantic vector indexes) for specified repositories to be updated automatically as the codebase evolves.
  2. CI/CD Integration: Leverage CI/CD workflows (e.g., GitHub Actions) to trigger and manage these indexing processes.
  3. Shared Index Access: Ensure that the updated indexes are stored in a location accessible to relevant services or users (e.g., for a shared semantic search tool, an AI-powered Q&A bot over the codebase, etc.).

Potential Approaches for "Live" Indexing

The following approaches will be explored:

  • Webhook-Triggered: Indexing is initiated by repository events (e.g., push to main, merge of a PR)
  • Periodic/Scheduled: Indexing runs at regular intervals (e.g., nightly)
  • Incremental Updates: Focus on efficiently updating indexes based on changes (diffs) rather than full re-indexes where possible
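
As a rough illustration of the incremental approach, the sketch below keeps a manifest of content hashes and compares two manifests to decide which files need re-indexing. It uses only the standard library and does not call any kit API; `build_manifest` and `diff_manifests` are hypothetical helper names, and restricting to `.py` files is an assumption made for brevity.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, suffixes: tuple[str, ...] = (".py",)) -> dict[str, str]:
    """Map each matching file (path relative to root) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to re-index, paths to drop from the index)."""
    # A path needs re-indexing if it is new or its content hash changed.
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    # A path present before but missing now should be removed from the index.
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

A CI job could persist the manifest next to the index and re-index only the `changed` paths on the next run. A diff taken from git between two commits would serve the same purpose; hashing is shown here only because it needs no repository history.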

Key Challenges & Considerations

Index Storage & Accessibility

  • Where should shared indexes be stored (e.g., dedicated ChromaDB server, cloud object storage, etc.)?
  • How will different parts of kit (or tools built with kit) access these shared indexes?
  • This will likely involve using configurable backends like RedisCacheBackend for shared caching and a persistent, network-accessible solution for VectorDBBackend
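
One way to think about shared access is a small storage interface that CI jobs write to and consumers read from. The sketch below is an assumption for discussion only: `IndexStore`, `LocalDirStore`, and `index_key` are hypothetical names, not part of kit, and a real deployment would likely swap the local directory for object storage or a database server.

```python
from pathlib import Path
from typing import Optional, Protocol


class IndexStore(Protocol):
    """Hypothetical interface for shared index artifacts (not a kit API)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class LocalDirStore:
    """Stores each artifact as a file under a base directory."""

    def __init__(self, base: Path) -> None:
        self.base = base

    def _path(self, key: str) -> Path:
        # Reject keys that would resolve outside the base directory.
        path = (self.base / key).resolve()
        if not path.is_relative_to(self.base.resolve()):
            raise ValueError(f"key escapes store: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        # Rename into place so readers are unlikely to see a partial file.
        tmp.replace(path)

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.exists() else None


def index_key(repo: str, commit_sha: str, kind: str) -> str:
    """Build a storage key that ties an artifact to one repo and commit."""
    return f"{repo}/{commit_sha}/{kind}"
```

Keying artifacts by commit SHA is a design choice in this sketch: it lets readers pin to a known commit while a newer index is still being written.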

Scalability & Performance

  • Indexing large repositories or frequent updates can be resource-intensive
  • Optimizing indexing speed (e.g., effective caching, parallel processing, incremental updates) will be crucial
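
To make the caching and parallel-processing ideas concrete, here is a minimal sketch that summarizes only sources whose content hash is not already cached, fanning the misses out over a thread pool. The `summarize` callable stands in for whatever expensive step (LLM call, embedding) applies; `index_sources` and the plain-dict cache are assumptions, not kit APIs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def index_sources(
    sources: dict[str, str],
    summarize: Callable[[str], str],
    cache: dict[str, str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Return {path: summary}, calling `summarize` only on cache misses."""
    digests = {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in sources.items()
    }
    # Key pending work by digest so identical files are summarized once.
    pending = {d: sources[p] for p, d in digests.items() if d not in cache}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, matching pending's key order.
        results = pool.map(summarize, pending.values())
        cache.update(zip(pending.keys(), results))
    return {path: cache[d] for path, d in digests.items()}
```

Threads suit this sketch because the expensive step is assumed to be I/O-bound (network calls). A CPU-bound step would call for a process pool instead.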

Configuration Management

  • How will users configure which repositories are indexed, how often, and with what kit settings (LLM models, embedding functions, etc.)?
  • How will credentials (e.g., Git tokens, LLM API keys, database credentials) for CI jobs be managed securely?
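
One possible shape for job configuration is to read settings from environment variables, which is how CI systems typically expose secrets to jobs. In the sketch below the variable names (`KIT_INDEX_REPO_URL`, `KIT_INDEX_BRANCH`, `KIT_INDEX_LLM_API_KEY`) and the `IndexJobConfig` class are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class IndexJobConfig:
    repo_url: str
    branch: str
    # repr=False keeps the secret out of logs and tracebacks.
    llm_api_key: str = field(repr=False)


def load_config(env: Mapping[str, str] = os.environ) -> IndexJobConfig:
    """Build a config from environment variables, failing fast if any are missing."""
    required = ("KIT_INDEX_REPO_URL", "KIT_INDEX_LLM_API_KEY")  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return IndexJobConfig(
        repo_url=env["KIT_INDEX_REPO_URL"],
        branch=env.get("KIT_INDEX_BRANCH", "main"),
        llm_api_key=env["KIT_INDEX_LLM_API_KEY"],
    )
```

Failing fast on missing settings means a misconfigured job stops before it clones or indexes anything.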

Error Handling & Monitoring

  • Robust error handling for indexing jobs
  • Monitoring for indexing status and health
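
A minimal sketch of what error handling plus a status record might look like: the job is retried with exponential backoff, and the outcome is captured in a small structure that a monitoring step could log or publish. `JobStatus` and `run_with_retries` are hypothetical names, and the retry policy is an assumption rather than a proposal from this issue.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    ok: bool
    attempts: int
    duration_s: float
    error: Optional[str] = None


def run_with_retries(
    job: Callable[[], None],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobStatus:
    """Run `job`, retrying on any exception with exponential backoff."""
    start = time.monotonic()
    error = None
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return JobStatus(True, attempt, time.monotonic() - start)
        except Exception as exc:  # broad on purpose: any failure counts as a failed attempt
            error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                # Delay doubles each time: base, 2*base, 4*base, ...
                sleep(base_delay_s * 2 ** (attempt - 1))
    return JobStatus(False, max_attempts, time.monotonic() - start, error)
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting.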

Resource Management for CI

  • Managing the cost and execution time of indexing jobs within CI/CD systems
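
One hedged way to bound execution time is to give each run a time budget and defer whatever does not fit to the next run. In this sketch `index_within_budget` and `index_one` are hypothetical names; the per-file indexing step is left abstract.

```python
import time
from typing import Callable, Iterable


def index_within_budget(
    paths: Iterable[str],
    index_one: Callable[[str], None],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[list[str], list[str]]:
    """Return (indexed, deferred) without starting new work past the deadline."""
    deadline = clock() + budget_s
    indexed: list[str] = []
    deferred: list[str] = []
    for path in paths:
        if clock() >= deadline:
            # Out of time: leave this path for the next scheduled run.
            deferred.append(path)
            continue
        index_one(path)
        indexed.append(path)
    return indexed, deferred
```

The check happens only before each item starts, so a single slow item can still overrun the budget. Enforcing a hard limit would need a timeout inside `index_one` or at the CI job level.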

Use Cases

  • Powering a constantly up-to-date semantic search service for a team's codebase
  • Providing fresh context to LLM-based developer tools (Q&A bots, code assistants) that operate on evolving repositories
  • Automatically generating code summaries or documentation artifacts as the code changes
