
djalilhebal/namaene

Namaene

Namaene: IPA vocalizer. Pronouncing names and stuff.

A web app that helps you tell the world how to pronounce your name or whatever.

Main page Counter history

Features

Functionalities or possible improvements.

  • IPA pronunciation using TTS.

  • Shareable URLs (saving state in the URL).

  • Caching.

    • Cache-Control: public, immutable, with a max-age of one year (effectively forever).
    • ETag for conditional requests (revalidation) to /api/speak.
    • Last-Modified for conditional requests to /api/voices.
      • You know, in case the list of supported voices changes during the lifetime of the current deployment.
      • For now, we are letting Next handle this: Caching the first response. The route is not dynamic by default.
      • It returns something with Cache-Control: public, max-age=0, must-revalidate and a strong ETag.
  • Analytics (Vercel's).

Usage

Getting Started

Assuming the Azure Speech service's environment variables (SPEECH_REGION and SPEECH_KEY) are defined, run the development server:

npm run dev
# List supported voices
curl -v http://localhost:3000/api/voices

# Before speaking (counter)
curl -v http://localhost:3000/api/counter

# Speaking
ffplay "http://localhost:3000/api/speak?voice=en-US-JennyNeural&ipa=ˈraɪzli"

# Speaking: Scorpion, 3a9rab
ffplay "http://localhost:3000/api/speak?voice=ar-DZ-IsmaelNeural&ipa=ʕaq.rab"

# After speaking (counter)
curl -v http://localhost:3000/api/counter
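
The IPA strings in the commands above contain non-ASCII characters, so a client that builds these URLs itself has to percent-encode them. A minimal sketch of that, using the standard URL API; `buildSpeakUrl` is a hypothetical helper name, not something taken from the project:

```typescript
// Hypothetical helper: builds the /api/speak URL used in the commands above.
function buildSpeakUrl(base: string, voice: string, ipa: string): string {
  const url = new URL("/api/speak", base);
  // searchParams.set percent-encodes non-ASCII characters as UTF-8 bytes.
  url.searchParams.set("voice", voice);
  url.searchParams.set("ipa", ipa);
  return url.toString();
}

const speakUrl = buildSpeakUrl("http://localhost:3000", "en-US-JennyNeural", "ˈraɪzli");
console.log(speakUrl);
// Reading the parameter back yields the original IPA string.
console.log(new URL(speakUrl).searchParams.get("ipa"));
```

The same approach would apply to shareable page URLs, since the state stored in them also includes IPA text.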

REST API

GET /speak?voice={voice}&ipa={ipa}
    returns binary data
    Content-Type: audio/ogg
    Cache-Control: public, immutable, max-age=31536000
    ETag: strong hash of the request params (IPA and voice)
    Codes: 200, 400

GET /voices
    returns Array<{name, locale, ipaSymbols: null | string[]}>
    Codes: 200

GET /counter
    returns {currentCount: number}
    Codes: 200

GET /counter-history
    returns Array<{[YearMonth]: counterAsString}>
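
For client code, the response shapes listed above could be written down as TypeScript types. This is a sketch inferred from the endpoint list only; the sample values and the `totalFromHistory` helper are made up for illustration.

```typescript
// Shapes inferred from the endpoint list above.
interface Voice {
  name: string;
  locale: string;
  ipaSymbols: string[] | null;
}

interface CounterResponse {
  currentCount: number;
}

// One entry per month, keyed by "YYYY-MM", with the count serialized as a string.
type CounterHistory = Array<Record<string, string>>;

// Hypothetical helper: sums all monthly counts in a history response.
function totalFromHistory(history: CounterHistory): number {
  let total = 0;
  for (const entry of history) {
    for (const month of Object.keys(entry)) {
      total += Number(entry[month]);
    }
  }
  return total;
}

// Illustrative values only, not real API output.
const sampleVoice: Voice = { name: "en-US-JennyNeural", locale: "en-US", ipaSymbols: null };
const sampleCounter: CounterResponse = { currentCount: 200 };
const sampleHistory: CounterHistory = [{ "2023-10": "120" }, { "2023-11": "80" }];
console.log(sampleVoice.name, sampleCounter.currentCount, totalFromHistory(sampleHistory));
```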

Technologies used

Name

  • Namaene = the pronunciation or sound of a name.
    • Namae means "name".
    • Ne means "sound" as in Miku Hatsune (first-sound) or Len Kagamine (mirror-sound; mirrored sound as in echo?).

The project's name is a nod to Vocaloid.

  • NaNe: Simplified version, uses Na as in Kimi no Na wa.
    • The nane subdomain is not available on Vercel.

Choosing a TTS provider

Requirements:

  • Impresses employers.
  • SSML support.
  • Free enough. (Requires credit card? Fine. Limited resources? Fine. Automatically charged after exceeding the free quota? Sucks, but may be manageable.)

Options:

Fallback

Use some fallback if the API is unusable for whatever reason?

  • Local. Use Web Speech API. It should work.

The text may be provided as plain text, or a well-formed SSML document. The SSML tags will be stripped away by devices that don't support SSML.

-- https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/text

However, when testing with Chrome/Edge v119 on Windows 10 Pro using various built-in engines (local and online), it did not seem to work. Either the text is spoken as if the tags were stripped (ignoring emphasis and IPA phonemes), or the XML is read out literally. As of 2023-11-XX, it's buggy, unpredictable, and ultimately unreliable. See:

Consider using some standalone TTS engine.

TTS engines

Design

Constraints

  • Limited quota. We want to optimize and cache whenever possible.

  • No money to spend. Meaning, we should not use Google Cloud's Apigee (non-free) or Billing API (it can and probably will pause our usage after we exceed the quota; check the docs).

Assumptions

  • Only this project uses the TTS resource.

    • Note: Can be improved, but let's keep it simple for now.
  • The client respects cache control directives.

    • Not required, but nice.
    • Browsers respect them (unless overridden).
    • Other clients like curl and ffplay do not.

How it should work

Fail fast.

  • Assert that the request is valid; otherwise, return a client error (400 Bad Request).

  • Let cacheKey = the checksum of the request params (ipaText, voice, etc.), maybe using canonical JSON.

  • If the request is conditional and its ETag equals cacheKey, return a response indicating that nothing changed (304 Not Modified). Also include the headers that tell the client to cache it forever.

  • If the quota is reached, return a server error (503 Service Unavailable).

  • Call the TTS API, which returns some audio data. Return it as the body, along with the Content-Type header.
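
The steps above can be sketched as a pure decision function that runs before any TTS call is made. This is a rough sketch, not the project's actual handler: `decideSpeakResponse`, the `Decision` shape, the use of SHA-256, and the fixed key order standing in for canonical JSON are all assumptions.

```typescript
import { createHash } from "node:crypto";

interface SpeakParams {
  voice?: string;
  ipa?: string;
}

interface Decision {
  status: 200 | 304 | 400 | 503;
  headers: Record<string, string>;
  callTts: boolean; // true only when the TTS API should actually be called
}

const CACHE_FOREVER = "public, immutable, max-age=31536000";

// Assumption: a fixed key order stands in for canonical JSON.
function cacheKeyOf(voice: string, ipa: string): string {
  const canonical = JSON.stringify([["ipa", ipa], ["voice", voice]]);
  return createHash("sha256").update(canonical).digest("hex");
}

function decideSpeakResponse(
  params: SpeakParams,
  ifNoneMatch: string | undefined,
  quotaReached: boolean
): Decision {
  // 1. Fail fast on invalid requests.
  if (!params.voice || !params.ipa) {
    return { status: 400, headers: {}, callTts: false };
  }
  // 2. Derive the cache key and use it as a strong ETag.
  const etag = `"${cacheKeyOf(params.voice, params.ipa)}"`;
  const cacheHeaders = { "Cache-Control": CACHE_FOREVER, ETag: etag };
  // 3. Conditional request with a matching ETag: nothing changed.
  if (ifNoneMatch === etag) {
    return { status: 304, headers: cacheHeaders, callTts: false };
  }
  // 4. Quota reached: refuse before spending anything.
  if (quotaReached) {
    return { status: 503, headers: {}, callTts: false };
  }
  // 5. Otherwise the caller synthesizes audio and returns it as the body.
  return {
    status: 200,
    headers: { ...cacheHeaders, "Content-Type": "audio/ogg" },
    callTts: true,
  };
}

const decision = decideSpeakResponse({ voice: "en-US-JennyNeural", ipa: "ˈraɪzli" }, undefined, false);
console.log(decision.status, decision.headers["ETag"]);
```

Checking the ETag before the quota means a revalidation still succeeds after the quota is exhausted, since a 304 costs no TTS characters.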

System arch overview

Handling quota

Using Redis.

What to count

  • Google says <mark> tags are stripped, but that doesn't matter. We don't care since we are only using a single <phoneme> tag.
  • We can just use TextEncoder and count bytes. Again, better be safe than sorry.
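
For reference, a document with a single <phoneme> tag might be built as follows. This is a minimal sketch based on the standard SSML `phoneme` element (`alphabet` and `ph` attributes); `buildPhonemeSsml` and `escapeXml` are hypothetical names, and provider-specific wrappers (such as version, namespace, or voice elements) are deliberately left out.

```typescript
// Escapes the characters that would break an XML attribute or text node.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps the IPA string in one <phoneme> element.
// fallbackText is what an engine may speak if it ignores the ph attribute.
function buildPhonemeSsml(ipa: string, fallbackText: string): string {
  return (
    `<speak><phoneme alphabet="ipa" ph="${escapeXml(ipa)}">` +
    `${escapeXml(fallbackText)}</phoneme></speak>`
  );
}

console.log(buildPhonemeSsml("ˈraɪzli", "name"));
```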

Note: For WaveNet and Standard voices, the number of characters will be equal to or less than the number of bytes represented by the text. This includes alphanumeric characters, punctuation, and white spaces. Some character sets use more than one byte for a character. For example, Japanese (ja-JP) characters in UTF-8 typically require more than one byte each. In this case, you are only charged for one character, not multiple bytes.

-- Pricing | Cloud Text-to-Speech | Google Cloud

For example, the string その名前をさあ、言ってごらん このぼくの名前を!

  • 24 characters (if we count UTF-16 code units, which is what JavaScript uses to encode strings).
  • 70 bytes (if we encode the string in UTF-8 and then count).

IPA texts won't be this "extreme" though.
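
The difference between the two counts can be checked with a short sketch. `countForQuota` is a hypothetical name; it uses `String.length` for UTF-16 code units and `TextEncoder` for UTF-8 bytes.

```typescript
// Hypothetical helper: reports both ways of measuring a string.
function countForQuota(text: string): { codeUnits: number; utf8Bytes: number } {
  return {
    codeUnits: text.length, // UTF-16 code units, as JavaScript sees the string
    utf8Bytes: new TextEncoder().encode(text).length, // the conservative count
  };
}

// An IPA string: most symbols take 1 byte, a few (ˈ and ɪ here) take 2.
console.log(countForQuota("ˈraɪzli")); // { codeUnits: 7, utf8Bytes: 9 }
// Japanese characters take 3 bytes each in UTF-8.
console.log(countForQuota("名前")); // { codeUnits: 2, utf8Bytes: 6 }
```

Counting bytes overestimates what is billed, which errs on the safe side of the quota.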

Counting

Whenever a request arrives, count how many characters it will use (charsInRequest), then INCRBY charactersUsed charsInRequest. If the total charactersUsed is >= CHARACTERS_MAX, we must not make a call to Google Cloud's TTS API.
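
A rough sketch of this increment-then-check rule, with an in-memory class standing in for Redis. `FakeCounterStore` and `mayCallTts` are hypothetical names, only the INCRBY semantics are modeled, and the limit of 1000 is an example value.

```typescript
// In-memory stand-in for the Redis counter; only INCRBY is modeled.
class FakeCounterStore {
  private values = new Map<string, number>();

  // Like Redis INCRBY: adds `amount` and returns the value after incrementing.
  incrBy(key: string, amount: number): number {
    const next = (this.values.get(key) ?? 0) + amount;
    this.values.set(key, next);
    return next;
  }
}

const CHARACTERS_MAX = 1000; // example limit, for illustration only

// Returns true if the TTS API may be called for this request.
function mayCallTts(store: FakeCounterStore, charsInRequest: number): boolean {
  const charactersUsed = store.incrBy("charactersUsed", charsInRequest);
  return charactersUsed < CHARACTERS_MAX;
}

const store = new FakeCounterStore();
console.log(mayCallTts(store, 900)); // true  (total is 900)
console.log(mayCallTts(store, 100)); // false (total is 1000, which is >= the limit)
```

Note that a rejected request still increments the counter in this sketch, so the stored total can overshoot the limit.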

Resetting

Thoughts:

  • Google Cloud quotas reset monthly.
  • So we can either use a cron job (supported by Vercel) to reset the counter at the start of each month, or store the current month in Redis and run a transaction that does something like the following pseudocode:
MULTI

Get current month as YYYY-MM
Get stored month
If current month > stored month
    set stored month = current month
    set charactersUsed = 0

EXEC

Where and when to run it though? Again, TBD.
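
The reset rule from the pseudocode can be modeled as a pure function over the stored state. This is only a sketch of the logic; `QuotaState` and `resetIfNewMonth` are hypothetical names. In real Redis, commands queued inside MULTI cannot branch on values read in the same transaction, so an actual implementation would likely need WATCH or a Lua script, which is not shown here.

```typescript
interface QuotaState {
  storedMonth: string; // "YYYY-MM"
  charactersUsed: number;
}

// Formats a date as "YYYY-MM" in UTC.
function yearMonthOf(date: Date): string {
  return date.toISOString().slice(0, 7);
}

// Returns a fresh state when a new month has started, else the same state.
function resetIfNewMonth(state: QuotaState, now: Date): QuotaState {
  const currentMonth = yearMonthOf(now);
  // "YYYY-MM" strings compare chronologically as plain strings.
  if (currentMonth > state.storedMonth) {
    return { storedMonth: currentMonth, charactersUsed: 0 };
  }
  return state;
}

const october: QuotaState = { storedMonth: "2023-10", charactersUsed: 900 };
console.log(resetIfNewMonth(october, new Date(Date.UTC(2023, 10, 1))));
// { storedMonth: "2023-11", charactersUsed: 0 }
```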

Notes:

  • "KV databases owned by users on a Hobby plan will be deleted after 30 days of being idle." - Vercel KV Limits

    • This is fine. Google Cloud's quota will reset, assuming only this project uses it.
  • I am aware that INCRBY returns the value after incrementing it.

    • This is fine. Better be safe than sorry. Suppose the limit is 1000 chars. We already used 900, but the request will only use 100 chars. We could make it, but, eh, might as well stop at N-1. One last request is not important.

Is using INCRBY counter:yyyy-mm a waste of storage (space quota)? Not really. We can think of it as an opportunity to implement another feature: maintaining a historical record of monthly counts.
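
A sketch of the per-month key idea, assuming keys of the form counter:YYYY-MM in UTC. `counterKeyFor` and `toHistory` are hypothetical names, and a Map stands in for Redis. With one key per month, no reset step is needed, and the history is just the list of keys.

```typescript
const KEY_PREFIX = "counter:";

// Builds the key for the month that `date` falls in (UTC), e.g. "counter:2023-11".
function counterKeyFor(date: Date): string {
  return KEY_PREFIX + date.toISOString().slice(0, 7);
}

// Turns stored counters into the history shape: one {"YYYY-MM": "count"} per month.
function toHistory(counters: Map<string, number>): Array<Record<string, string>> {
  const keys = Array.from(counters.keys())
    .filter((key) => key.startsWith(KEY_PREFIX))
    .sort();
  return keys.map((key) => ({
    [key.slice(KEY_PREFIX.length)]: String(counters.get(key)),
  }));
}

// Illustrative values only.
const counters = new Map<string, number>([
  [counterKeyFor(new Date(Date.UTC(2023, 10, 5))), 80],
  [counterKeyFor(new Date(Date.UTC(2023, 9, 28))), 120],
]);
console.log(JSON.stringify(toHistory(counters)));
// [{"2023-10":"120"},{"2023-11":"80"}]
```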

Gotchas

You may need to find a language or voice that pronounces the IPA properly. Not every TTS/voice has all sounds.

We are doing something similar to this:

Details

Related projects

Further reading

Credits

License

CC BY 4.0 © Abdeldjalil HEBAL
