Searching Chinese text doesn't return results, even in simple cases #2085

Open
dannylin108 opened this issue Dec 1, 2024 · 1 comment

Comments

@dannylin108

Description

Searching for a single Chinese character returns no hits, even with this minimal configuration:

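```ts
import Typesense from "typesense";
import type { CollectionCreateSchema } from "typesense/lib/Typesense/Collections";

// Placeholder connection settings for a local Typesense server.
const typesenseClient = new Typesense.Client({
    nodes: [{ host: "localhost", port: 8108, protocol: "http" }],
    apiKey: "xyz",
});

const testData = [
    {
        lvl0: "Zhuang Zi",
        lvl1: "Chapter 2",
        content: "昔者莊周夢為胡蝶,栩栩然胡蝶也。自喻適志與,不知周也。俄然覺,則蘧蘧然周也,不知周之夢為胡蝶與,胡蝶之夢為周與?周與胡蝶,則必有分矣。此之謂物化。",
        url: "/zhuangzi-2#p269",
        url_without_anchor: "/zhuangzi-2",
        item_priority: 100,
        comment: false,
    },
];

const indexName = "zhuangzi-2";

const schema: CollectionCreateSchema = {
    name: indexName,
    fields: [
        { name: "lvl0", type: "string" },
        { name: "lvl1", type: "string" },
        // The Chinese locale is set only on the field that holds Chinese text.
        { name: "content", type: "string", locale: "zh" },
        { name: "url", type: "string" },
        { name: "url_without_anchor", type: "string" },
        { name: "item_priority", type: "int32" },
        { name: "comment", type: "bool" },
    ],
};

async function main() {
    // Create the collection and import the single test document.
    await typesenseClient.collections().create(schema);
    await typesenseClient.collections(indexName).documents().import(testData);

    // Search for 蝶, which occurs several times in `content`.
    const searchResult = await typesenseClient
        .collections(indexName)
        .documents()
        .search({
            q: "蝶",
            query_by: "content",
        });
    console.log(searchResult.hits);
}

main().catch((err) => {
    console.error(err);
    process.exit(1);
});
```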

Steps to reproduce

Repository that reproduces the issue:

https://github.com/dannylin108/typesense-bug-report

Expected Behavior

The search should return the only document in the collection, since its `content` field does contain the queried character 蝶.
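As a sanity check on the expected result, the sketch below runs a naive substring scan over the same test document, entirely outside Typesense. `TestDoc` and `naiveSubstringSearch` are hypothetical names introduced for illustration; this is not how Typesense searches, it only confirms which document a correct search ought to return.

```typescript
// Hypothetical helper types and data mirroring the issue's testData (subset of fields).
interface TestDoc {
  lvl0: string;
  lvl1: string;
  content: string;
  url: string;
}

const docs: TestDoc[] = [
  {
    lvl0: "Zhuang Zi",
    lvl1: "Chapter 2",
    content:
      "昔者莊周夢為胡蝶,栩栩然胡蝶也。自喻適志與,不知周也。俄然覺,則蘧蘧然周也,不知周之夢為胡蝶與,胡蝶之夢為周與?周與胡蝶,則必有分矣。此之謂物化。",
    url: "/zhuangzi-2#p269",
  },
];

// Naive check: keep every document whose content contains the query as a substring.
function naiveSubstringSearch(documents: TestDoc[], query: string): TestDoc[] {
  return documents.filter((doc) => doc.content.includes(query));
}

console.log(naiveSubstringSearch(docs, "蝶").map((d) => d.url)); // [ '/zhuangzi-2#p269' ]
```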

Actual Behavior

`searchResult.hits` is an empty array: `[]`.
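One possible explanation, not confirmed by the maintainers in this thread: if the `zh` locale segments the text into words, and query tokens are then matched against whole indexed tokens (or token prefixes, since Typesense's `prefix` search is on by default), a character that only appears at the end of a word such as 胡蝶 would never match. The sketch below illustrates that difference with a hand-written, hypothetical segmentation; it does not call Typesense and does not reproduce its real tokenizer.

```typescript
// HYPOTHETICAL segmentation of the start of the passage; Typesense's actual
// "zh" tokenizer may split the text differently.
const tokens: string[] = ["昔者", "莊周", "夢", "為", "胡蝶", "栩栩然", "胡蝶", "也"];

// Token-level prefix matching: a token matches if it starts with the query.
function tokenPrefixMatch(docTokens: string[], query: string): boolean {
  return docTokens.some((t) => t.startsWith(query));
}

// Substring matching over the raw text: the behavior the issue expected.
function substringMatch(text: string, query: string): boolean {
  return text.includes(query);
}

const rawText = tokens.join("");
console.log(tokenPrefixMatch(tokens, "蝶")); // false: 蝶 only ends the token 胡蝶
console.log(substringMatch(rawText, "蝶")); // true
console.log(tokenPrefixMatch(tokens, "胡")); // true: 胡 starts the token 胡蝶
```

If this hypothesis holds, a query for the full word 胡蝶 would be expected to match while 蝶 alone does not; that is a guess to test against a running server, not documented behavior.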

Metadata

Typesense Version: 0.27.1

OS: Linux

@19920716

I have run into the same problem; search is not friendly to Chinese text. `text_search` is OK. The embedding model I use is jina-embedding-v3.
