[EN] Match articles "a" and "an" for <the> · Pull Request #3014 · home-assistant/intents · GitHub



Open: tannisroot wants to merge 1 commit into main


tannisroot (Contributor) commented Feb 25, 2025

For me, Whisper always seems to insert the article "a" after the vacuum commands "start" and "return".
Someone might actually say it this way too, so let's handle that.

tannisroot marked this pull request as draft February 25, 2025 07:02
tannisroot marked this pull request as ready for review February 25, 2025 07:10
tetele (Contributor) commented Feb 25, 2025

First of all, I don't think it's a good idea to prefix <name> (which can contain <the>) with an indefinite article. You would be able to say "start a the roborock", which doesn't make grammatical sense.

Second of all, if we're doing this, I don't see why the "an" form wouldn't be in there as well.

Third, I don't see why this would apply strictly to vacuums and not every other entity.

Fourth, to counter the issues above (and create new ones), why not add a[n] to <the>?

Finally, like I said numerous times before, I don't think it's wise to add incorrect sentences just to please Whisper or any other STT. The proper solution here would be to fix Whisper.

I'd like to hear the other language leaders' comments on this.

tannisroot (Contributor, Author) commented Feb 25, 2025


Oh, I agree with most of this, and I can change the PR to add a[n] everywhere instead of just for the vacuum. I just thought it would be an issue to have a change that affects other commands. I would be more than happy to add the change directly to <the> if that's what would be preferred.
And yes, in my case this is about fixing Whisper, but since Whisper, like LLMs, is a statistical model, it presumably means that in the data it was trained on, people often said it with "a", or at least in a way that sounds like "a". I am certainly guilty of mushing the "the" in such sentences so that it sounds almost like "a".
It's not grammatically correct, sure, but the point of intents is to understand all people, not just those with good grammar, good pronunciation, or the "right" accent or dialect, and this change seems harmless enough, no?

tannisroot (Contributor, Author)

Also, as much as I would love to make Whisper follow grammar, there is only so much you can do to influence its output. OpenAI probably trained it in massive datacenters on all the speech data they scraped off the internet; it's not realistic to fix every edge case like this without the expertise and resources they had.
And yes, speech-to-phrase exists as an alternative, but in my testing the bigger Whisper models are far, far better at understanding the noisy, imperfect audio you typically get from Assist satellites, and at telling "turn off" and "turn on" apart. So I'm afraid Whisper is here to stay for those who need local STT.

tannisroot changed the title from "[EN] Add articles to the vacuum intents" to "[EN] Match articles "a" and "an" for <the>" on Feb 26, 2025
tannisroot (Contributor, Author)

I went ahead and added "a[n]" directly to <the>, since adding it to the intents themselves leads to wonky matching.
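To make the trade-off in this thread concrete, here is a minimal Python sketch that enumerates the sentences a template allows. It is a toy, not hassil or the real home-assistant/intents matcher: it supports only `[optional]`, `(a|b)` alternatives and `<rule>` references. The rule definitions used below (`<the>` as `[the ]` before the change, and a hypothetical `<name>` rule of `<the>roborock`) are assumptions for illustration, not copied from the repository.

```python
from __future__ import annotations

# Toy expander for a small, HYPOTHETICAL subset of hassil-style template
# syntax: [optional], (alt|alt) and <rule>. It is not hassil's matcher; it
# only enumerates every sentence a template can produce.


def expand(template: str, rules: dict[str, str]) -> set[str]:
    """Return all sentences `template` can produce, with spaces normalized."""

    def cross(prefixes: list[str], suffixes: list[str]) -> list[str]:
        # Every prefix followed by every suffix.
        return [p + s for p in prefixes for s in suffixes]

    def parse(text: str, i: int, closer: str | None) -> tuple[list[str], int]:
        """Parse text[i:] up to `closer`; return (expansions, index after it)."""
        finished: list[str] = []  # alternatives already closed by '|'
        current = [""]            # expansions of the alternative being built
        while i < len(text):
            ch = text[i]
            if ch == closer:
                return finished + current, i + 1
            if ch == "|":
                finished += current
                current = [""]
                i += 1
            elif ch == "[":
                inner, i = parse(text, i + 1, "]")
                current = cross(current, inner + [""])  # optional: may be empty
            elif ch == "(":
                inner, i = parse(text, i + 1, ")")
                current = cross(current, inner)
            elif ch == "<":
                end = text.index(">", i)
                inner, _ = parse(rules[text[i + 1:end]], 0, None)  # expand rule
                current = cross(current, inner)
                i = end + 1
            else:
                current = [c + ch for c in current]  # literal character
                i += 1
        if closer is not None:
            raise ValueError(f"missing {closer!r} in {text!r}")
        return finished + current, i

    sentences, _ = parse(template, 0, None)
    return {" ".join(s.split()) for s in sentences}


# Approach 1 (hypothetical): optional "a" in the vacuum sentence itself.
vacuum_only = expand("start [a ]<name>", {"the": "[the ]", "name": "<the>roborock"})
print(sorted(vacuum_only))
# ['start a roborock', 'start a the roborock', 'start roborock', 'start the roborock']

# Approach 2 (hypothetical rule text): fold a[n] into <the>.
in_the_rule = expand("start <name>", {"the": "[(the|a[n]) ]", "name": "<the>roborock"})
print(sorted(in_the_rule))
# ['start a roborock', 'start an roborock', 'start roborock', 'start the roborock']
```

Under these assumptions, putting "a" in front of a name that can already start with <the> admits "start a the roborock", the problem tetele raised, while folding a[n] into <the> avoids it. The cost is that the template also accepts "start an roborock", since nothing in it chooses between "a" and "an".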
