Namaene

Namaene: IPA vocalizer. Pronouncing names and stuff.

A web app that helps you tell the world how to pronounce your name or whatever.


Main page	Counter history

Features

Functionalities or possible improvements.

IPA pronunciation using TTS.
Shareable URLs (saving state in the URL).
Caching.
- Cache-Control public immutable ~~forever~~ for one year.
- ETag for conditional requests (revalidation) to /api/speak.
- Last-Modified for conditional requests to /api/voices.
  - You know, in case the list of supported voices changes during the lifetime of the current deployment.
  - For now, we are letting Next handle this: Caching the first response. The route is not dynamic by default.
  - It returns something with Cache-Control: public, max-age=0, must-revalidate and a strong ETag.
Analytics (Vercel's).

Usage

Getting Started

Assuming Azure Speech service's environment variables (SPEECH_REGION and SPEECH_KEY) defined...

Running the development server:

npm run dev

# List supported voices
curl -v http://localhost:3000/api/voices

# Before speaking (counter)
curl -v http://localhost:3000/api/counter

# Speaking
ffplay "http://localhost:3000/api/speak?voice=en-US-JennyNeural&ipa=ˈraɪzli"

# Speaking: Scorpion, 3a9rab
ffplay "http://localhost:3000/api/speak?voice=ar-DZ-IsmaelNeural&ipa=ʕaq.rab"

# After speaking (counter)
curl -v http://localhost:3000/api/counter

REST API

GET /speak?voice={voice}&ipa={ipa}
    returns binary data
    Content-Type: audio/ogg
    Cache-Control: public, immutable, max-age=31536000
    ETag: strong hash of the request params (IPA and voice)
    Codes: 200, 400

GET /voices
    returns Array<{name, locale, ipaSymbols: null | string[]}>
    Codes: 200

GET /counter
    returns {currentCount: number}
    Codes: 200

GET /counter-history
    returns Array<{[YearMonth]: counterAsString}>

Technologies used

Name

Namaene = the pronounciation or sound of a name.
- Namae means "name".
- Ne means "sound" as in Miku Hatsune (first-sound) or Len Kagamine (mirror-sound; mirrored sound as in echo?).

The project's name is a nod to Vocaloid.

~~NaNe~~: Simplified version, uses Na as in Kimi no Na wa.
- The nane subdomain is not available on Vercel.

Choosing a TTS provider

Requirements:

Impresses employers.
SSML support.
Free enough. (Requires credit card? Fine. Limited resources? Fine. Automatically charged after exceeding the free quota? Sucks, but may be manageable.)

Options:

Amazon Web Service (AWS)
- SSML? Yes.
  - Generating Speech from SSML Documents - Amazon Polly
- Free? Not really. "12 months free". Not "free forever".
Microsoft Azure
- SSML? Yes.
  - Speech Synthesis Markup Language (SSML) overview - Speech service - Azure AI services | Microsoft Learn
  - Pronunciation with Speech Synthesis Markup Language (SSML) - Speech service - Azure AI services | Microsoft Learn
- Free? Yes, but is it enough?
  - Text to Speech / 0.5 million characters free per month. Speech Services Pricing
Google Cloud
- SSML? Yes.
- Free? Yes, enough. Requires enabling billing, but you gotta ensure you don't go above the free monthly quota!
  - Free per month / Standard voices / 0 to 4 million characters" (https://cloud.google.com/text-to-speech/pricing)
  - Also, "new customers get $300 in free credits to spend on Text-to-Speech." (https://cloud.google.com/text-to-speech/)

Fallback

Use some fallback if the API is unusable for whatever reason?

Local. Use Web Speech API. It should work.

The text may be provided as plain text, or a well-formed SSML document. The SSML tags will be stripped away by devices that don't support SSML.

-- https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/text

However, when testing with Chrome/Edge v119 on Windows 10 Pro using various builtin engines (local and online), it did not seem to work. Either the text is spoken as if tags were stripped (like not considering emphasis and IPA phonemes) or it reads the XML file literally. As of 2023-11-XX, it's buggy, unpredictable, and ultimately unreliable. See:

Consider using some standalone TTS engine.

TTS engines

Design

Constraints

Limited quota. We want to optimize and cache whenever possible.
No money to spend. Meaning, we should not Google Cloud's Apigee (non-free) or Billing API (it can and prob will pause our usage after we exceed the quota, check the docs).

Assumptions

Only this project uses the TTS resource.
- Note: Can be improved, but let's keep it simple for now.
The client respects cache control directives.
- Not required, but nice.
- Browsers respect them (unless overriden).
- Other clients like curl and ffplay do not.

How it should work

Fail fast.

Assert that the request is valid, otherwise return a client error (400 Bad Request)
Let cacheKey = the checksum of the request params (ipaText, voice, etc.) using cannonical JSON maybe.
If request is conditional, and if its ETag equals cacheKey, return a response indicating it's not changed (304 Not Modified). Also, include whatever headers that tell the client to cache it forever.
If reached quota, return a server error (503 Service Unavailable).
Call the TTS API, it returns some audio data. Return it as body plus the content-type header.

Handling quota

Using Redis.

What to count

Google says <mark> are stripped, but that d 9210 oesn't matter. We don't care since we are only using a single <phoneme> tag.
We can just use TextEncoder and count bytes. Again, better be safe than sorry.

Note: For WaveNet and Standard voices, the number of characters will be equal to or less than the number of bytes represented by the text. This includes alphanumeric characters, punctuation, and white spaces. Some character sets use more than one byte for a character. For example, Japanese (ja-JP) characters in UTF-8 typically require more than one byte each. In this case, you are only charged for one character, not multiple bytes.

-- Pricing | Cloud Text-to-Speech | Google Cloud

For example, the string その名前をさあ、言ってごらんこのぼくの名前を！

24 characters (if we count using UTF-16, which is what JavaScript uses to encode string).
70 bytes (if we encode the string in UTF-8 and then count).

IPA texts won't be this "extreme" though.

Counting

Whenever a request arrives, count how characters it will use (charsInRequest) then INCRBY charactersUsed charsInRequest. If the total charactersUsed is >= CHARACTERS_MAX, we ~~should~~ must not make a call to Google Cloud's TTS API.

Resetting

Thoughts:

Google Cloud quotas reset monthy.
So we can either use a cron job (supported by Vercel) to reset it at the start of each month. Or we could store the current month in Redis and run a transaction to do something like the following pseudocode:

MULTI

Get current month as YYYY-MM
Get stored month
If current month > stored month
    set stored month = current month
    set charactersUsed = 0

EXEC

Where and when to run it though? Again, TBD.

Notes:

"KV databases owned by users on a Hobby plan will be deleted after 30 days of being idle." - Vercel KV Limits
- This is fine. Google Cloud's quota will reset, assuming only this project uses it.
I am aware that INCRBY returns the value after incrementing it.
- This is fine. Better be safe than sorry. Suppose the limit is 1000 chars. We already used 900, but the request will only use 100 chars. We could make it, but, eh, might as well stop at N-1. One last request is not important.

Is using INCRBY counter:yyyy-mm a waste of storage (space quota)? Not really, we can think of it as an opportunity to implement another feature: Maintaining a historical record of monthly counts

Gotchas

You may need to find a language or voice that pronounces the IPA properly. Not every TTS/voice has all sounds.

We are doing something similar to this:

Details

Related projects

IPA Reader by Katie
- http://ipa-reader.xyz/
- Pronouncing Things with Amazon's Polly - Cuttlesoft, Custom Software Developers
- https://gist.github.com/katie7r/f6e51f345e4c46738514cdeba31fb167
- Notes:
  - Uses AWS (Polly, S3, and idk).
  - Written in Python and uses Amazon AWS SDK for Python (Boto3).
  - They pay for it.
- Others:
  - Excellent IPA reader website I found: : r/conlangs
  - About IPA-Reader.xyz, phoneme-synthesis, and Pink Trombone https://www.reddit.com/r/linguistics/comments/w8mxuc/ipa_reader/
phoneme-synthesis https://github.com/itinerarium/phoneme-synthesis
- "A browser-based tool to convert International Phonetic Alpha (IPA) phonetic notation to speech using the meSpeak.js package"
- https://itinerarium.github.io/phoneme-synthesis/
  - "The IPA phonetic notation is translated into phonemes understood by eSpeak using correspondences and logic found in lexconvert. The translated phonemes (e.g., [[mUm'baI]] for /mʊmˈbaɪ/) are then provided to meSpeak.js, a revised Emscripten'd version of eSpeak for output."
  - KAITO: Doesn't eSpeak support IPA?
- RECHECK: Feedback wanted: convert phonetic notation text to speech : linguistics https://www.reddit.com/r/linguistics/comments/5lwuc7/
IPA to Speech (IPA Reader) by Anton Vasetenkov
- https://www.antvaset.com/ipa-to-speech
- 2023-12-11 Visited.
- Not open source, uses React I guess.
- "The tool uses the Google Cloud Text-to-Speech API, Amazon Polly, and Microsoft Azure Azure Cognitive Services Speech Service to generate the speech audio from the phonetic input."
- Liked the intro.

Credits

Developed with Next.js, bootstrapped with create-next-app.
Speech by Deivid Sáenz from Noun Project (CC BY 3.0)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
docs		docs
public		public
src/app		src/app
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE.txt		LICENSE.txt
README.md		README.md
Ramblings.md		Ramblings.md
TODO.md		TODO.md
next.config.js		next.config.js
package.json		package.json
post.md		post.md
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Namaene

Features

Usage

Getting Started

REST API

Technologies used

Name

Choosing a TTS provider

Fallback

Design

Constraints

Assumptions

How it should work

Handling quota

Gotchas

Details

Related projects

Further reading

Credits

License

About

Releases

Packages

Languages

License

djalilhebal/namaene

Folders and files

Latest commit

History

Repository files navigation

Namaene

Features

Usage

Getting Started

REST API

Technologies used

Name

Choosing a TTS provider

Fallback

Design

Constraints

Assumptions

How it should work

Handling quota

Gotchas

Details

Related projects

Further reading

Credits

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages