Flattened dot-separated collection fields not present in highlights when path matches nested object #2071

Open
tharropoulos opened this issue Nov 18, 2024 · 1 comment

Comments

@tharropoulos
Contributor

Description

If a collection schema declares nested object fields along a path:

"fields": [
  {"name": "obj", "type": "object"},
  {"name": "obj.nested", "type": "object"},
  {"name": "obj.nested.normal", "type": "string"}
],

And there's also a field whose dot-separated name extends the final path of that nested schema:

{"name": "obj.nested.normal.flattened", "type": "string"}

Searching by the flattened field then returns no highlights.
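For illustration, the overlap can be checked mechanically: every dotted prefix of obj.nested.normal.flattened is itself a declared field in the schema above, so the name is ambiguous between a flat top-level key and a path into obj. The helper below is a hypothetical C++ sketch (declared_prefixes is not Typesense code) that lists those prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of Typesense: returns every proper dotted
// prefix of `name` (split at each '.') that is also a declared schema field.
std::vector<std::string> declared_prefixes(const std::string& name,
                                           const std::set<std::string>& schema_fields) {
    std::vector<std::string> prefixes;
    for (std::size_t pos = name.find('.'); pos != std::string::npos;
         pos = name.find('.', pos + 1)) {
        const std::string prefix = name.substr(0, pos);
        if (schema_fields.count(prefix) != 0) {
            prefixes.push_back(prefix);
        }
    }
    return prefixes;
}
```

With the four field names from the schema as schema_fields, declared_prefixes("obj.nested.normal.flattened", schema_fields) yields obj, obj.nested and obj.nested.normal, i.e. the whole nested path is declared.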

This request

❯ curl -X GET "http://localhost:8108/collections/collection/documents/search?q=value&query_by=obj.nested.normal.flattened" \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

Nets this result:

{
  "facet_counts": [],
  "found": 1,
  "hits": [
    {
      "document": {
        "id": "1",
        "obj": { "nested": { "normal": "nested value" } },
        "obj.nested.normal.flattened": "normal value"
      },
      "highlight": {}, // No highlights
      "highlights": [],
      "text_match": 578730123365187705,
      "text_match_info": {
        "best_field_score": "1108091338752",
        "best_field_weight": 15,
        "fields_matched": 1,
        "num_tokens_dropped": 0,
        "score": "578730123365187705",
        "tokens_matched": 1,
        "typo_prefix_score": 0
      }
    }
  ],
  "out_of": 2,
  "page": 1,
  "request_params": {
    "collection_name": "collection",
    "first_q": "value",
    "per_page": 10,
    "q": "value"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}

Steps to reproduce

Create the underlying collection:

❯ curl -X POST http://localhost:8108/collections \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -d '{
           "name": "collection",
           "fields": [
             {"name": "obj", "type": "object"},
             {"name": "obj.nested", "type": "object"},
             {"name": "obj.nested.normal", "type": "string"},
             {"name": "obj.nested.normal.flattened", "type": "string"}
           ],
           "enable_nested_fields": true
         }'

Create a document:

❯ curl -X POST http://localhost:8108/collections/collection/documents \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -d '
         {
           "obj": {
             "nested": {
               "normal": "nested value"
             }
           },
           "obj.nested.normal.flattened": "normal value"
         }
         '

Search by the normal nested field:

❯ curl -X GET "http://localhost:8108/collections/collection/documents/search?q=value&query_by=obj.nested.normal" \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

This returns a highlight for the nested field:

{
  "facet_counts": [],
  "found": 1,
  "hits": [
    {
      "document": {
        "id": "1",
        "obj": { "nested": { "normal": "nested value" } },
        "obj.nested.normal.flattened": "normal value"
      },
      "highlight": {
        "obj": {
          "nested": {
            "normal": { "matched_tokens": ["value"], "snippet": "nested <mark>value</mark>" }
          }
        }
      },
      "highlights": [],
      "text_match": 578730123365187706,
      "text_match_info": {
        "best_field_score": "1108091338752",
        "best_field_weight": 15,
        "fields_matched": 2,
        "num_tokens_dropped": 0,
        "score": "578730123365187706",
        "tokens_matched": 1,
        "typo_prefix_score": 0
      }
    }
  ],
  "out_of": 2,
  "page": 1,
  "request_params": {
    "collection_name": "collection",
    "first_q": "value",
    "per_page": 10,
    "q": "value"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}

And then search by the flattened field:

❯ curl -X GET "http://localhost:8108/collections/collection/documents/search?q=value&query_by=obj.nested.normal.flattened" \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

Expected Behavior

A highlight for the flattened field, as would be returned if its path did not overlap a nested object:

      "highlight": {
        "obj.nested.normal.flattened": {
          "matched_tokens": ["value"],
          "snippet": "normal <mark>value</mark>"
        }
      }

Actual Behavior

The response contains an empty highlight object (and an empty highlights array).

Metadata

Typesense Version: 27.1

OS: Docker on EndeavourOS x86_64 Linux 6.11.5-arch1-1

@tharropoulos
Contributor Author

It may have to do with this (:2623):

            if(!highlight_items.empty()) {
                copy_highlight_doc(highlight_items, enable_nested_fields, document, highlight_res);
                remove_flat_fields(highlight_res);
                remove_reference_helper_fields(highlight_res);
                highlight_res.erase("id");
            }

and:

void Collection::copy_highlight_doc(std::vector<highlight_field_t>& hightlight_items,
                                    const bool nested_fields_enabled,
                                    const nlohmann::json& src, nlohmann::json& dst) {
    for(const auto& hightlight_item: hightlight_items) {
        if(!nested_fields_enabled && src.count(hightlight_item.name) != 0) {
            dst[hightlight_item.name] = src[hightlight_item.name];
            continue;
        }

        std::string root_field_name;

        for(size_t i = 0; i < hightlight_item.name.size(); i++) {
            if(hightlight_item.name[i] == '.') {
                break;
            }

            root_field_name += hightlight_item.name[i];
        }

        if(dst.count(root_field_name) != 0) {
            // skip if parent "foo" has already has been copied over in e.g. foo.bar, foo.baz
            continue;
        }

        // root field name might not exist if object has primitive field values with "."s in the name
        if(src.count(root_field_name) != 0) {
            // copy whole sub-object
            dst[root_field_name] = src[root_field_name];
        } else if(src.count(hightlight_item.name) != 0) {
            dst[hightlight_item.name] = src[hightlight_item.name];
        }
    }
}

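copy_highlight_doc skips the first check because nested_fields_enabled is true in this case. It then computes the root field name "obj", finds it in the document, and copies the whole obj sub-object. The literal "obj.nested.normal.flattened" key is only copied when the root field is absent, so it is never copied over and its highlight is lost.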
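A stripped-down C++ model of the key-selection branches above (nested fields enabled) makes the outcome easier to see. It is a sketch under stated assumptions, not Typesense code: documents are modelled as a hypothetical FlatDoc map from top-level key to a serialized value instead of nlohmann::json, and copy_highlight_keys is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumption: a document is reduced to its top-level keys, each mapped to a
// serialized value. Typesense itself operates on nlohmann::json.
using FlatDoc = std::map<std::string, std::string>;

// Everything before the first '.', like the character loop in copy_highlight_doc.
std::string root_field_name(const std::string& name) {
    return name.substr(0, name.find('.'));
}

// Hypothetical model of copy_highlight_doc's branches when nested fields are enabled.
void copy_highlight_keys(const std::vector<std::string>& highlight_fields,
                         const FlatDoc& src, FlatDoc& dst) {
    for (const auto& field : highlight_fields) {
        const std::string root = root_field_name(field);
        if (dst.count(root) != 0) {
            continue;  // parent already copied for a sibling such as foo.bar
        }
        if (src.count(root) != 0) {
            dst[root] = src.at(root);    // copy the whole root sub-object
        } else if (src.count(field) != 0) {
            dst[field] = src.at(field);  // literal dotted key, only when root is absent
        }
    }
}
```

Fed the document from the reproduction steps and the single highlight field obj.nested.normal.flattened, dst ends up holding only obj; the literal flattened key would only be copied for a document with no top-level obj.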
