Additional bindings logic by shabani1 · Pull Request #12 · lexy-ai/lexy · GitHub

Additional bindings logic #12


Merged

shabani1 merged 2 commits into main from additional-bindings-logic on Oct 18, 2023

Conversation

shabani1
Contributor

What

This PR adds logic for processing a new binding.

  • When a new binding is added, tasks are generated for each applicable document (sketched below).
  • Task outputs are saved using a DB task that does not hard-code index field names.
  • This PR does not address the issue of serialization for embeddings.
    • There are currently two DB tasks that need to be combined into one:
      • save_result_to_index works only for embedding fields
      • save_records_to_index works only for non-embedding fields
    • Will address this issue in the next PR.
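A minimal sketch of the flow described above, with hypothetical names throughout (`process_new_binding`, `get_documents`, `matches_filters`, and `run_transformer` are illustrative assumptions, not the repo's actual API; only `save_records_to_index` is named in this PR):

```python
# Hypothetical sketch of the new-binding flow; helper names are assumed.
def process_new_binding(binding):
    """Generate one transformer task per applicable document."""
    documents = get_documents(collection_id=binding.collection_id)
    applicable = [d for d in documents if matches_filters(d, binding.filters)]
    tasks = []
    for doc in applicable:
        # Run the binding's transformer on the document, then chain a DB
        # task that saves the result to the binding's index table.
        task = run_transformer.apply_async(
            args=[doc.content],
            kwargs=binding.transformer_params,
            link=save_records_to_index.s(index_id=binding.index_id),
        )
        tasks.append(task)
    return tasks
```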

Test plan

  • Create a new transformer with the following payload (a hypothetical sketch of this transformer appears after the test plan):
```json
{
  "transformer_id": "text.counter.word_counter",
  "path": "lexy.transformers.counter.word_counter",
  "description": "Returns count of words and the longest word"
}
```
  • Create a new index with the following payload:
```json
{
  "index_id": "word_counts",
  "description": "Word counts",
  "index_table_schema": {},
  "index_fields": {
      "word_count": {"type": "int"},
      "longest_word": {"type": "string", "optional": true}
  }
}
```
  • Ensure that the index table `zzidx__word_counts` has been created
    • This is a manual step right now; run `lexy.core.events.create_new_index_table('word_counts')`
  • Create a new binding with the following payload:
```json
{
  "collection_id": "default",
  "transformer_id": "text.counter.word_counter",
  "index_id": "word_counts",
  "description": "New binding for word counts",
  "execution_params": {},
  "transformer_params": {},
  "filters": {}
}
```

The test should produce tasks for each document in the default collection using the `word_counter` transformer, and save the results in the index table `zzidx__word_counts`.
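For reference, a minimal sketch of what the `word_counter` transformer might look like, based only on its description ("Returns count of words and the longest word"); this is an illustration, not the actual code at `lexy.transformers.counter.word_counter`:

```python
def word_counter(text: str) -> tuple[int, str]:
    """Hypothetical sketch: return the word count and the longest word."""
    words = text.split()
    if not words:
        return 0, ""
    return len(words), max(words, key=len)
```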

shabani1 requested a review from jnnnthnn on October 17, 2023
shabani1 merged commit 873a123 into main on Oct 18, 2023
shabani1 deleted the additional-bindings-logic branch on October 18, 2023
shabani1 added a commit that referenced this pull request on Oct 18, 2023
# What

This is a follow-up to #12, allowing the use of `save_results_to_index` instead of having to use two different DB tasks to save embeddings and non-embeddings.

- It uses a very inefficient conversion of NumPy arrays to lists.
    - Will update this when updating to Pydantic 2.0.
- It requires the use of `text_embedding_transformer` instead of `text_embedding`; the former simply returns the result of the latter as `{'embedding': result}` (sketched below).
- Need to create a decorator to wrap output with column names as part of a future PR.
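A minimal sketch of the wrapping and serialization described above (hypothetical; it assumes `text_embedding` returns a NumPy array, and the actual repo code may differ):

```python
import numpy as np

def text_embedding_transformer(text: str) -> dict:
    """Hypothetical sketch: wrap text_embedding's output under an
    'embedding' key, converting NumPy arrays to plain lists so the
    result can be serialized by a single DB task."""
    result = text_embedding(text)  # assumed to return np.ndarray
    if isinstance(result, np.ndarray):
        result = result.tolist()   # the "very inefficient" conversion
    return {"embedding": result}
```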

# Test plan

Following the test plan of #12, adding a new document to the default
collection should now generate two tasks, one for text embedding and one
for word counts.
shabani1 added a commit that referenced this pull request on Oct 22, 2023
# What

This PR adds the `lexy_transformer` decorator. The decorator can be
imported and applied as follows.

```python
import torch

from lexy.transformers import lexy_transformer

@lexy_transformer(name="text.embeddings.minilm")
def text_embeddings(sentences: list[str]) -> torch.Tensor:
    ...
```

When applied to a function, the decorator will:
* Register the function as a celery shared task with the name `lexy.transformers.{name}`
* Add the kwarg `lexy_index_fields` to the function signature:
    * When run without the argument, the decorated function behaves as it normally would.
    * With the argument, the decorated function returns its output with the labels specified in `lexy_index_fields` (see example below).

```python
@lexy_transformer(name="add_and_subtract")
def add_and_subtract(a, b):
    return a + b, a - b

add_and_subtract(5, 3)
# returns (8, 2)

add_and_subtract(5, 3, lexy_index_fields=["sum", "difference"])
# returns [{'sum': 8, 'difference': 2}]
```
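A minimal sketch of how such a decorator could be implemented (an illustration of the behavior described above, not the repo's actual implementation; the tuple-zipping logic is inferred from the `add_and_subtract` example):

```python
import functools
from celery import shared_task

def lexy_transformer(name: str):
    """Hypothetical sketch of the decorator."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, lexy_index_fields=None, **kwargs):
            result = fn(*args, **kwargs)
            if lexy_index_fields is None:
                # Without the kwarg, behave exactly like the original function
                return result
            # With the kwarg, pair each returned value with its field label
            values = result if isinstance(result, tuple) else (result,)
            return [dict(zip(lexy_index_fields, values))]
        # Register under the namespaced celery task name
        return shared_task(name=f"lexy.transformers.{name}")(wrapper)
    return decorator
```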

# Test plan

Similar to the test plan for #12, though bindings need the additional
keyword argument `lexy_index_fields`.

- Create a new transformer with the following payload:
```json
{
  "transformer_id": "text.counter.word_counter",
  "path": "lexy.transformers.counter.word_counter",
  "description": "Returns count of words and the longest word"
}
```
- Create a new index with the following payload:
```json
{
  "index_id": "word_counts",
  "description": "Word counts",
  "index_table_schema": {},
  "index_fields": {
      "word_count": {"type": "int"}, 
      "longest_word": {"type": "string", "optional": true}
  }
}
```
- Ensure that the index table `zzidx__word_counts` has been created
    - This is a manual step right now; run `lexy.core.events.create_new_index_table('word_counts')`
- Create a new binding with the following payload:
```json
{
  "collection_id": "default",
  "transformer_id": "text.counter.word_counter",
  "index_id": "word_counts",
  "description": "New binding for word counts",
  "execution_params": {},
  "transformer_params": {
    "lexy_index_fields": ["word_count", "longest_word"]
  },
  "filters": {}
}
```

The test should produce tasks for each document in the default
collection using the `word_counter` transformer, and save the results in
the index table `zzidx__word_counts`.

- Finally, the existing binding for `default_text_embeddings` (i.e.,
binding with `id=1`) should be patched with the following payload:
```json
{
  "transformer_params": {
    "lexy_index_fields": ["embedding"]
  }
}
```

Now, any new documents added to the default collection should trigger
two jobs, one for each binding.