
bluejay

an action-voice engine by James Mayr




INTRODUCTION

Welcome to bluejay, a voice assistant that isn't always listening, that doesn't compile huge data troves of your speech and history, and that doesn't have any idea who you are. The trade-off is that you need to have some familiarity with coding to customize it to your needs - including creating developer accounts with any APIs you intend to use. But, at least in theory, bluejay is infinitely expandable - think of this as an engine that can power your use cases... with a ton of built-in functionality already.

Why did I build this? I wanted a voice assistant to perform various platform-specific searches, to read me the news, to control my smart home devices, etc. There are, of course, existing solutions to this... but I just don't trust the big players. I imagine they're using the audio recordings and transcripts for their own purposes, whether that's market research or personalizing ads or building new AI products. I wanted a system that provides more control over the flow of information and allows me to interact with APIs without creating a unified record of my preferences and history. But more than that, I built this because it seemed like fun.




TECHNOLOGIES

nodeJS

A simple nodeJS application running locally is used for serving front-end files and proxying requests to external APIs.

JS (ES6)

The front-end is a vanilla JavaScript application (using some ES6 syntax). The core engine lives in a single script, with an additional script containing the library of recognized phrases and executable actions.

LocalStorage API

Since there are no user accounts, all settings are stored locally in the browser's LocalStorage cache. This includes credentials to external APIs - nothing is stored in the code.

WebAudio API

The native WebAudio API powers a simple pitch detection, which only stores frequency data - not words.

SpeechRecognition API

Another native API, SpeechRecognition kicks in once a whistle is detected, and converts speech into text. Note that Chrome technically sends this audio to Google servers for (presumably anonymous) processing.

SpeechSynthesis API

This leverages the speech synthesis functionality of the device to transform response text into spoken words.




SET-UP

back-end: localhost

  1. Download and install the latest version of nodeJS.
  2. Download the bluejay .zip, open it, and move the files to the folder of your choice.
  3. Navigate to this location in Terminal, then to the folder localhost.
  4. npm start

back-end: Firebase

  1. Alternately, create a new project on Google Firebase.
  2. This may incur minimal costs (pennies). If you want to use Firebase to call external APIs, such as to fetch information or control smart devices, Google requires a pay-as-you-go plan.
  3. Download and install the latest version of Firebase.
  4. Download the bluejay .zip, open it, and move the files to the folder of your choice.
  5. Navigate to this location in Terminal, then to the folder firebase.
  6. firebase init
  7. Follow the prompts to set up your Firebase project.
  8. firebase deploy
  9. When the project is deployed, the logs will indicate the url of your Firebase project.

front-end

  1. Download and install the latest version of Google Chrome. Note: Safari and Firefox do not support the SpeechRecognition API.
  2. If you have deployed using Firebase, simply navigate to the url of your project. Otherwise, read on...
  3. Option 1: easy, then annoying
    1. In index.js, in getEnvironment, set ssl to false.
    2. Open http://localhost:3000 in your browser. Note: since this is not https, the webpage will constantly ask you to grant audio permissions.
  4. Option 2: annoying, then easy
    1. On chrome://flags/#unsafely-treat-insecure-origin-as-secure, add localhost in the text field and set the select to Enabled.
    2. On chrome://flags/#allow-insecure-localhost, switch to Enabled.
    3. Create a security certificate for localhost. (optional, to remove the red warning) You can do that with this command, from https://letsencrypt.org/docs/certificates-for-localhost/
      openssl req -x509 -out localhost.crt -keyout localhost.key \
        -newkey rsa:2048 -nodes -sha256 \
        -subj '/CN=localhost' -extensions EXT -config <( \
         printf "[dn]\nCN=localhost\n[req]\ndistinguished_name = dn\n[EXT]\nsubjectAltName=DNS:localhost\nkeyUsage=digitalSignature\nextendedKeyUsage=serverAuth")

      Then use your computer's Keychain Access or equivalent to Always Trust this certificate.
    4. Open https://localhost:3000 in your browser. Note: on startup, Chrome may show you a reminder that you are treating localhost as https.




HOW TO USE THIS

This is an overview of the user experience - how to interact with bluejay. Subsequent sections will explain how it works, how to set up developer accounts with APIs, and how to add your own functionality.

Input

There are 4 input methods:
  • Type a phrase into the text box.
  • Click the circular button and speak a phrase.
  • Whistle any interval (that is, two distinct notes), up or down, within an octave, to get bluejay's attention; then speak your phrase.
  • After responding, most actions will cause bluejay to follow up and listen again.

Settings

The interface also provides a few settings, which you can access by clicking the "hamburger" icon.
  • upload: Use this to import a simple JSON file of key/value pairs of configurations (such as api keys or favorite websites). Note that this information is stored in LocalStorage only.
  • on: Uncheck this to turn off whistle detection. (While unchecked, input method #3 is unavailable.)
  • listen (sec): How long should bluejay listen for words before processing them?
  • volume: How loud should the speech synthesis response be?
  • voice: Which voice (provided by the device) should the speech synthesis use?
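
As a rough illustration of the upload setting, the sketch below merges the key/value pairs from an uploaded JSON file into an existing configuration object. The function name mergeConfiguration, the lowercasing of keys, and the placeholder values are assumptions for this example, not bluejay's actual code; the key names follow the ones mentioned later in this README.

```javascript
// Example contents of an uploaded file (placeholder values, not real credentials).
const uploadedText = JSON.stringify({
  "google api key": "YOUR-GOOGLE-KEY",
  "reddit key": "YOUR-REDDIT-KEY"
})

// Hypothetical helper: merge uploaded key/value pairs into a copy of the
// existing configuration, normalizing keys to trimmed lowercase.
function mergeConfiguration(existing, text) {
  const uploaded = JSON.parse(text)
  const merged = Object.assign({}, existing)
  for (const key of Object.keys(uploaded)) {
    merged[key.trim().toLowerCase()] = uploaded[key]
  }
  return merged
}

// In the browser, the merged object could then be saved with
// localStorage.setItem(someKey, JSON.stringify(merged)); the storage key
// bluejay uses is not shown here.
```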

Response Structure

Each response should include the following components:
  1. visual
    • phrase: the exact user-input phrase detected
    • icon: an emoji representing the action that the phrase matched to
    • action: the name of the action that the phrase matched to; all actions are in the form of infinitive verb + details, such as "get + the time"
    • timestamp: the hh:mm:ss time the response was generated
    • responseHTML: the output generated by the action function, including:
      • an <h2> with the primary output (often wrapped in an <a>)
      • additional unstructured information
      • additional structured information, such as a <ul>, <table>, <svg>, <img>, or <iframe>
  2. auditory
    • message: the spoken text that is fed to the SpeechSynthesis API
  3. contextual
    • (Note: These items can add information to the CONTEXT_LIBRARY or change the format of the response.)
    • followup: set this to false to prevent bluejay from automatically triggering SpeechRecognition again
    • auto: makes the action bar blue
    • error: makes the action bar red, saves this action as incomplete and follows up
    • number: the main number of the response, if applicable
    • word: the main word or short phrase of the response, if applicable
    • time: the main date/timestamp of the response, if applicable
    • url: the main url of the response, if applicable
    • video: the id of the <iframe> or <video> of the response, if applicable
    • language: the alternate language for the spoken response, if applicable
    • results: a list of additional responses, if applicable, such as for a search action
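
For orientation, here is what a response object passed to the callback might look like for a time-related action. All values are made up for illustration; only the property names (icon, message, html, followup, time) come from the list above, and the phrase, action, and timestamp are presumably added when the history block is created.

```javascript
// A hypothetical response object; every value here is illustrative.
const sampleResponse = {
  icon: "&#x1f552;",                // emoji shown next to the action
  message: "It is 3:45 PM.",        // text handed to speech synthesis
  html: "<h2>3:45 PM</h2>",         // primary output for the history block
  followup: false,                  // do not listen again afterwards
  time: new Date(2020, 0, 1, 15, 45).getTime() // main timestamp of the response
}
```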




HOW IT WORKS

Whistling

  1. Clicking the large logo button on load creates a new AudioContext() which connects to the device microphone.
  2. A setInterval runs a function every X ms which analyzes the latest microphone data. It attempts to understand the complexity, frequency, and energy (volume) of input sound.
  3. If certain criteria are met across these categories (simple sine wave, frequency in the whistling range, sufficient loudness) then the frequency is converted to a pitch and stored. Only the last 10 iterations are saved.
  4. If an interval (that is, a difference between two pitches) of at least a minor second and at most a perfect fourth is detected, speech recognition is triggered.
  5. It also plays a chirp sound in an <audio> element. (Note: this is actually an edited recording of bluejays!)
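
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not bluejay's implementation: the function names, the use of MIDI-style semitone numbers (A4 = 440 Hz = 69), and the choice to compare every pair of stored pitches are all assumptions made for this example.

```javascript
// Illustrative thresholds, in semitones.
const MINOR_SECOND = 1
const PERFECT_FOURTH = 5
const HISTORY_LENGTH = 10

// Convert a frequency in Hz to the nearest semitone number (A4 = 440 Hz = 69).
function frequencyToPitch(frequency) {
  return Math.round(69 + 12 * Math.log2(frequency / 440))
}

// Store the newest pitch, keeping only the last HISTORY_LENGTH entries.
function storePitch(history, pitch) {
  history.push(pitch)
  while (history.length > HISTORY_LENGTH) { history.shift() }
  return history
}

// Return true if any two stored pitches differ by an allowed interval.
function detectInterval(history) {
  for (let i = 0; i < history.length; i++) {
    for (let j = i + 1; j < history.length; j++) {
      const interval = Math.abs(history[i] - history[j])
      if (interval >= MINOR_SECOND && interval <= PERFECT_FOURTH) { return true }
    }
  }
  return false
}
```

Working in semitones rather than raw frequencies keeps the interval check independent of how high or low the whistle is.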

Speech Recognition

  1. As indicated above, this can be triggered by whistling, clicking the button, or a followup command from a previous action.
  2. The SpeechRecognition API listens for X seconds or until it detects silence.
  3. The audio is technically processed through a Google API, but this is all built-in to Chrome.
  4. The resultant object contains multiple possible text matches; only the first is used.
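
Step 4 might look something like the helper below, which reads the top transcript from a SpeechRecognition result event. The function name getFirstTranscript is hypothetical; the event.results[0][0].transcript shape is the standard one for this API (in Chrome the constructor is exposed as webkitSpeechRecognition).

```javascript
// Hypothetical helper: pull the first alternative of the first result
// out of a SpeechRecognition "result" event, or "" if there is none.
function getFirstTranscript(event) {
  const result = event.results && event.results[0]
  const alternative = result && result[0]
  return alternative ? alternative.transcript.trim() : ""
}
```

In the browser this would typically be called from the recognition object's onresult handler.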

Pre-Matching

  1. A phrase is entered, either through speech recognition or by being manually typed in the text box.
  2. If the phrase exactly matches one of the elements in ERROR_LIBRARY["stop-phrases"], the system will set the action to stop. This will flush out any previously incomplete action or flow, as well as stop any playing video or speech synthesis. The system will not follow up.
  3. If the last 6 characters of the phrase are cancel (case-insensitive), the system will abort this entirely and produce no output. The system will not follow up.
  4. If a flow was set (that is, a multi-step action), then the entire phrase will be fed into that action. (Note that only a stop command will exit a flow prematurely.)
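
The pre-matching checks above could be sketched like this. The stop phrases, the function name preMatch, and the lowercasing of the phrase before comparing against stop phrases are assumptions for illustration; the real list lives in ERROR_LIBRARY["stop-phrases"].

```javascript
// Stand-in for ERROR_LIBRARY["stop-phrases"]; these entries are made up.
const STOP_PHRASES = ["stop", "quit", "never mind"]

// Classify a phrase before matching.
// Returns "stop", "cancel", "flow", or "match" (continue to normal matching).
function preMatch(phrase, currentFlow) {
  const cleaned = phrase.trim().toLowerCase()
  if (STOP_PHRASES.includes(cleaned)) { return "stop" }    // flush everything
  if (cleaned.slice(-6) === "cancel") { return "cancel" }  // abort silently
  if (currentFlow) { return "flow" }                       // feed phrase to the flow
  return "match"
}
```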

Matching

  1. Otherwise, the full phrase (trimmed and cleaned up a bit) is matched against every key in the PHRASE_LIBRARY. If there are no matches, the last word (broken at spaces) is moved to a remainder string, and the preceding phrase is matched against the PHRASE_LIBRARY. This continues until a match is found.
  2. If a match is found, the remainder is fed into the action function indicated by the PHRASE_LIBRARY.
  3. If no match was found, but there was a previously incomplete action, the entire phrase is fed into that action function (similar to a flow above).
  4. If no match was found and there was no previously incomplete action, the system selects an error from ERROR_LIBRARY["noaction-responses"] and follows up to try again.
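
The word-by-word matching in step 1 can be illustrated with a short sketch. The sample phrases and the name matchPhraseSketch are assumptions for this example; it is not the matchPhrase function from FUNCTION_LIBRARY, only a plausible reading of the steps above.

```javascript
// Tiny stand-in for PHRASE_LIBRARY; the entries are illustrative only.
const SAMPLE_PHRASES = {
  "what time is it": "get the time",
  "search google for": "search google"
}

// Peel words off the end of the phrase until the front part matches a key.
function matchPhraseSketch(phrase, phrases) {
  const words = phrase.trim().toLowerCase().split(/\s+/)
  const remainder = []
  while (words.length) {
    const candidate = words.join(" ")
    if (Object.prototype.hasOwnProperty.call(phrases, candidate)) {
      return {action: phrases[candidate], remainder: remainder.join(" ")}
    }
    remainder.unshift(words.pop()) // move the last word to the remainder
  }
  return {action: null, remainder: phrase.trim()}
}
```

Because the longest candidate is tried first, a longer key always wins over a shorter key that is a prefix of it.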

Action

  1. If there was an action, the remainder (or null) will be fed into the function of that name in the ACTION_LIBRARY.
  2. Generally, these actions will either transform the remainder, use it to search for information on an API, or send it to an API and return the result.

History

  1. The phrase, action, remainder, and response object are all fed into createHistory.
  2. This function updates the CONTEXT_LIBRARY with the latest phrase, action, and remainder, as well as the response icon, message, html, and any additional parameters.
  3. A history block is created (see Response Structure above).
  4. The SpeechSynthesis API will speak the response message (see Response Structure above).
  5. Unless there was an explicit command not to, the system will then follow up for another phrase.

Libraries

  • PHRASE_LIBRARY: the object populated by library.js containing all key-value pairs of spoken phrase to action name.
  • ACTION_LIBRARY: the object populated by library.js containing all the actions that can be invoked by the user.
  • FUNCTION_LIBRARY: the list of helper functions used throughout the front-end, such as getAverage and sortRandom; all of the form handlers, such as changeWhistleOn and changeRecognitionDuration; all of the initialize functions for the other libraries, such as initializeAudio and initializeRecognition; all of the functions that communicate externally, such as proxyRequest and sendPost; and all of the functions described in this flow, such as matchPhrase and createHistory. Basically, this is all functions except the ones users invoke in the ACTION_LIBRARY.
  • ELEMENT_LIBRARY: an object to more easily access DOM elements, such as the settings inputs.
  • AUDIO_LIBRARY: the object containing all data and functions powering the whistle detection.
  • SOUND_LIBRARY: the object containing all information powering the chirp sound.
  • RECOGNITION_LIBRARY: the object containing all data and functions powering the SpeechRecognition.
  • VOICE_LIBRARY: the object containing all data and functions powering the SpeechSynthesis.
  • ERROR_LIBRARY: an object of arrays of stop-phrases, noaction-responses, and error-responses.
  • CONTEXT_LIBRARY: an object to store temporary values pertaining to the last phrase, action, response, etc.; this also contains information about the current flow and any current alarms.
  • CONFIGURATION_LIBRARY: an object saved in localStorage that contains settings related to the whistling, speech recognition, and speech synthesis components, as well as any user-provided API credentials, favorite websites and RSS feeds, and location information.
  • NUMBER_WORD_LIBRARY: a key-value mapping of number words to digits, such as "one": 1.
  • LETTER_WORD_LIBRARY: a key-value mapping of letter words to letters, such as "sea": "c".
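
To show how the two word-mapping libraries might be applied, here is a small sketch that swaps recognized words for their mapped values. The helper name replaceWords and every entry other than "one": 1 and "sea": "c" are assumptions; bluejay's own helpers (such as getDigits) may work differently.

```javascript
// Small stand-ins for NUMBER_WORD_LIBRARY and LETTER_WORD_LIBRARY.
const NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
const LETTER_WORDS = {"sea": "c", "bee": "b"}

// Hypothetical helper: replace each word found in the library with its value.
function replaceWords(phrase, library) {
  return phrase.split(" ").map(function (word) {
    const key = word.toLowerCase()
    return Object.prototype.hasOwnProperty.call(library, key) ? String(library[key]) : word
  }).join(" ")
}
```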




CUSTOMIZING FUNCTIONALITY

You can easily add your own functions to library.js:

Add phrases to the PHRASE_LIBRARY

  • The key represents part of the user's spoken phrase, matched from the beginning.
  • The value represents the name of the function to run.

Add action to the ACTION_LIBRARY

  • The name of the function should be all lowercase, only alphabetical, an imperative command from bluejay's point of view (ex: "get the time" or "search google").
  • All action functions take two inputs:
    • The remainder is the rest of the user's spoken phrase, everything that did not match in the PHRASE_LIBRARY, through to the end of the string.
    • The callback is the function to run with the output; this is generally going to be FUNCTION_LIBRARY.createHistory.
  • Wrap the contents of the function in a try { } catch (error) { }.
    • The default error handler is: callback({icon: icon, error: true, message: "I was unable to " + arguments.callee.name + ".", html: "<h2>Unknown error in <b>" + arguments.callee.name + "</b>:</h2>" + error})
  • Identify the icon to display in the response.
    • This should be in the format &#x1f426; where the characters between x and ; are the hexadecimal code point of the emoji character.
  • Format the remainder as necessary, such as .trim() or using a .replace(/some regex/gi, "").
  • Perform the action logic, including accessing external APIs.
    • Generally, use an "on error, callback and return" structure. For example, check for a required configuration before attempting to use it, and callback with an error message if it's missing.
    • Use FUNCTION_LIBRARY functions wherever possible, such as proxyRequest to have the server make an API request, or getDigits to transform number words ("one") into numerals (1).
    • If the action requires multiple steps, set CONTEXT_LIBRARY.flow to the name of this function, and store all temporary information within a new object at CONTEXT_LIBRARY[name of this function]. Make sure to delete this and unset the flow at the end.
  • Format the output. This will usually be: callback({icon: icon, message: message, html: responseHTML}) (See the Response Structure section for more options.)
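
Putting these guidelines together, a custom action might look roughly like the sketch below. The phrase "repeat after me" and the action "repeat the phrase" are hypothetical, and the two library objects are declared locally only so the example runs on its own; in bluejay they are populated by library.js. Keying ACTION_LIBRARY by a name with spaces is also an assumption based on the naming examples above.

```javascript
// Local stand-ins so the sketch is self-contained.
const PHRASE_LIBRARY = {}
const ACTION_LIBRARY = {}

// Map the start of a spoken phrase to the (hypothetical) action name.
PHRASE_LIBRARY["repeat after me"] = "repeat the phrase"

// Hypothetical action: speak the remainder back to the user.
ACTION_LIBRARY["repeat the phrase"] = function (remainder, callback) {
  const icon = "&#x1f426;"
  try {
    // Format the remainder.
    const text = (remainder || "").trim()

    // On error, callback and return.
    if (!text) {
      callback({icon: icon, error: true, message: "What should I repeat?", html: "<h2>Nothing to repeat</h2>"})
      return
    }

    // Format the output.
    callback({icon: icon, message: text, html: "<h2>" + text + "</h2>", followup: false})
  }
  catch (error) {
    callback({icon: icon, error: true, message: "I was unable to repeat the phrase.", html: "<h2>Unknown error in <b>repeat the phrase</b>:</h2>" + error})
  }
}
```

In the real application, the callback would generally be FUNCTION_LIBRARY.createHistory, as noted above.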




ACCESSING EXTERNAL DATA

Many of bluejay's actions require fetching information from external data sources.

HTML/XML/RSS

Some actions can get information from a publicly accessible HTML page, or an XML or RSS feed:

Etymonline

The Free Dictionary

Snapple Facts

Various RSS Feeds

  • "get the latest post" → RSS feeds
  • "get a random post" → RSS feeds
  • "get all posts" → RSS feeds
  • "get the headlines" → RSS feeds

Open APIs

Other actions involve fetching information from an external API. Several of these APIs require no authentication, and will therefore work right out of the box:

DataMuse

PoetryDB

ICanHazDadJoke

Forismatic

Yerkee

Complimentr

EvilInsult

Trivia

Wikipedia

Open Library

OpenTrivia Database

Random Word API

Mad Libz

Sunrise-Sunset

MBTA API

JamesMayr.com


APIs Requiring an Account

Other APIs will require you to create a developer account and add your credentials into LocalStorage, either using the "change configuration" action or by uploading a JSON file through the interface.

IFTTT


Open Weather


Spoonacular


Alphavantage


Online Movie Database


Stands4


Google APIs


Google Custom Search

  • actions:
    • "search google" → {custom search url}&key={key}&q={search}
  • setup:
    1. Use your existing Google account or create one here: https://www.google.com/
    2. Create a new Custom Search Engine here: https://cse.google.com/cse/all
    3. Add a new search engine to search any random website, such as https://www.example.com.
    4. Edit the search engine to remove this website from "Sites to search".
    5. Instead, set "Search the entire web" to ON.
    6. Save your "Public URL" as google custom search.
    7. You will also need the google api key from above.

Creating an API with Google Apps Script

If you're anything like me, you have a lot of information within Google applications, such as Docs, Sheets, Gmail, Calendar, and Contacts. For that, you can use Google Apps Script to publish a script to the web to serve as an API endpoint into your account.
  • actions:
    • "edit wish list" → {url}&action=listWish&item={name}&cost={cost}&type={type}
    • "get balance" → {url}&action=getBalance&account={name}
    • "log purchase" → {url}&action=logPurchase&category={name}&description={description}&amount={cost}
    • "fetch calendar" → {url}&action=fetchEvents&startDate={date}&endDate={date}
    • "find event" → {url}&action=findEvent&name={name}
    • "add event" → {url}&action=addEvent&title={name}&startDate={startDate}&startTime={startTime}&endDate={endDate}&endTime={endTime}&location={location}
    • "get a list" → {url}&action=getList&list={name}
    • "add an item to a list" → {url}&action=addTask&list={name}&task={description}
    • "get contacts" → {url}&action=getContacts&name={name}
    • "get birthday" → {url}&action=getContacts&name={name}
    • "get phone number" → {url}&action=getContacts&name={name}
    • "get email" → {url}&action=getContacts&name={name}
    • "get address" → {url}&action=getContacts&name={name}
    • "draft email" → {url}&action=draftEmail&recipient={name}&subject={subject}&body={body}
    • "log gratitude" → {url}&action=logGratitude&text={text}
  • setup:
    1. Use your existing Google account or create one here: https://www.google.com/
    2. Go to https://script.google.com/home/my to create a new script.
    3. See below for an example script that would fetch your Google Tasks. Note that this involves generating a secret key to send along with each request, ensuring only you can access this.
    4. Save the script and select Publish > Deploy as a web app...
    5. Set Execute the app as: to yourself and Who has access to the app: to Anyone, even anonymous. For Project version:, select New and add a commit message, then click Update.
    6. Approve whatever new access the application needs. Your browser may warn you that this is unsafe, because it's an unknown developer... but that "unknown developer" is you. So proceed anyway.
    7. For each script, save the public url as google apps script or google apps script {#}. (I use one script url, with ?action= as a query parameter.)
function doGet(event) {
  if (event && event.parameter && event.parameter.key == {secret key}) {
    // query parameters arrive on event.parameter, just like the key above
    if (event.parameter.action == "getList") {
      return ContentService.createTextOutput(JSON.stringify(getList(event)))
    }
    return ContentService.createTextOutput(JSON.stringify({
      success: false,
      message: "Unknown action."
    }))
  }
  return ContentService.createTextOutput(JSON.stringify({
    success: false,
    message: "Unauthorized."
  }))
}

function getList(event) {
  var allLists = Tasks.Tasklists.list()
  var listName = (event.parameter.list || "Default List").toLowerCase().trim()
      
  for (var i in allLists.items) {
    if (allLists.items[i].title.toLowerCase().trim() == listName) {
      var list = allLists.items[i]
      break
    }
  }
      
  if (list) {
    var listContents = Tasks.Tasks.list(list.id) || {items: []}
    return {success: true, listName: list.title, list: listContents.items}
  }
  return {success: false, message: "Unknown list."}
}


APIs Requiring Oauth

Finally, some APIs require Oauth, because you could be writing information to a user's account (yours, presumably). Broadly speaking, Oauth involves sending users to another platform's website where they authenticate and are then redirected back to your site. Of course, since bluejay lives at localhost, that doesn't work. Here's the bizarre workaround I engineered:
  1. Parse the user response or look in the CONFIGURATION_LIBRARY for the platform's key, secret, and redirect.
  2. Create a popup window of the platform's auth screen; the state query param will include the bluejay url and platform API key and/or secret, encrypted as required for later.
  3. The user (also you) clicks through and completes the auth flow in the popup window.
  4. The external platform then redirects the popup to the redirect parameter, which must match the one on file in your developer settings. Believe it or not, I'm using a Google Apps Script for this. The platform will send the authorization code as a query parameter.
  5. I have a Google Apps Script that captures and logs this request. It splits the state parameter into the bluejay url and the platform API secret, and uses the authorization code parameter sent from the platform.
  6. The Google Apps Script page returns a tiny HTML page with a <script> that automatically redirects to bluejay's /authorization endpoint, with an embeddedPost parameter.
  7. This page uses proxyRequest to send the embeddedPost to the bluejay server, which sends an API request to the external platform with the authorization code received earlier.
  8. The platform finally responds with the actual access_token, refresh_token, and expiration.
  9. The bluejay server sends these results back to the /authorization page, which immediately stores them in localStorage.
  10. The main bluejay window has actually been checking localStorage this whole time, and now that there is a value set for this data, the page finally saves these values to CONFIGURATION_LIBRARY.
  11. It also announces that it was a success, and automatically closes the popup window from before.
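
Two pieces of this flow can be sketched in isolation: packing values into the state parameter, and carrying the embeddedPost payload in a link. The ";;;" separator and the authorization?embeddedPost= format come from the Apps Script examples in this section; the helper names and the idea of reading the payload back with URL.searchParams are assumptions for illustration, not bluejay's actual code.

```javascript
const STATE_SEPARATOR = ";;;" // same separator the Apps Script examples split on

// Pack the bluejay url and the platform authorization value into one state string.
function buildState(bluejayUrl, authorization) {
  return bluejayUrl + STATE_SEPARATOR + authorization
}

// Reverse of buildState, tolerant of a missing state.
function parseState(state) {
  const parts = (state || "").split(STATE_SEPARATOR)
  return {bluejayUrl: parts[0] || "", authorization: parts[1] || ""}
}

// Build the link that the redirect page sends the popup to.
function buildAuthorizationLink(bluejayUrl, data) {
  return bluejayUrl + "authorization?embeddedPost=" + encodeURIComponent(JSON.stringify(data))
}

// Hypothetical counterpart on the /authorization page: decode the payload.
function readEmbeddedPost(link) {
  const query = new URL(link).searchParams
  return JSON.parse(query.get("embeddedPost"))
}
```

Encoding the payload with encodeURIComponent keeps the JSON intact when it travels as a single query parameter.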

Wink

function doGet(event) { return authorize(event) }
function doPost(event) { return authorize(event) }
function authorize(event) {
  try {  
    var code = event.parameter.code || ""
    var state = (event.parameter.state || "").split(";;;") || []
    var bluejayUrl = state[0].replace("http://","https://")
    var authorization = state[1]
  
    var postUrl = "https://api.wink.com/oauth2/token"
    var data = {
      "method": "post",
      "url": postUrl,
      "Content-Type": "application/json",
      "body": {
        "grant_type": "authorization_code",
        "code": code,
        "client_secret": authorization
      }
    }
   
    var link = bluejayUrl + "authorization?embeddedPost=" +
      encodeURIComponent(JSON.stringify(data))
    var response = HtmlService.createHtmlOutput("Redirecting to<br>" +
      "<a href='" + link + "'>" + link + "</a>" +
      "<script>window.onload = function() { " +
      "window.location = '" + link + "'" +
      " }</script>")
    response.setXFrameOptionsMode(HtmlService.XFrameOptionsMode.ALLOWALL)
    return response
  }
  catch (error) {
    return ContentService.createTextOutput(error)
  }
}

Sonos

function doGet(event) { return authorize(event) }
function doPost(event) { return authorize(event) }
function authorize(event) {
  try {  
    var code = event.parameter.code || ""
    var state = (event.parameter.state || "").split(";;;") || []
    var bluejayUrl = state[0].replace("http://","https://")
    var authorization = state[1]
  
    var redirectURI = encodeURIComponent({this Google Apps Script's public url})
    var postUrl = "https://api.sonos.com/login/v3/oauth/access?grant_type=authorization_code&code=" +
      code + "&redirect_uri=" + redirectURI
    var data = {
      "method": "post",
      "url": postUrl,
      "Authorization": "Basic " + authorization,
      "Content-Type": "application/x-www-form-urlencoded;charset=utf-8"
    }
   
    var link = bluejayUrl + "authorization?embeddedPost=" +
      encodeURIComponent(JSON.stringify(data))
    var response = HtmlService.createHtmlOutput("Redirecting to<br>" +
      "<a href='" + link + "'>" + link + "</a>" +
      "<script>window.onload = function() { " +
      "window.location = '" + link + "'" +
      " }</script>")
    response.setXFrameOptionsMode(HtmlService.XFrameOptionsMode.ALLOWALL)
    return response
  }
  catch (error) {
    return ContentService.createTextOutput(error)
  }
}

Reddit

  • actions:
  • setup:
    1. Create a Google Apps Script to handle Oauth redirects (see below).
    2. Go to https://reddit.com/ and create an account.
    3. Go to https://www.reddit.com/prefs/apps/ and click "create app...".
    4. Save the Google Apps Script url to redirect uri on the developer page.
    5. Save the random string below the application name as reddit key, secret as reddit secret, and this Google Apps Script url as reddit redirect.
    6. Say "authorize reddit" and follow the Oauth flow as a user.
function doGet(event) { return authorize(event) }
function doPost(event) { return authorize(event) }
function authorize(event) {
  try {
    var code = event.parameter.code || ""
    var state = (event.parameter.state || "").split(";;;") || []
    var bluejayUrl = state[0].replace("http://","https://")
    var authorization = state[1]
    
    var redirectURI = encodeURIComponent({this Google Apps Script's public url}) 
    var postUrl = "https://www.reddit.com/api/v1/access_token?grant_type=authorization_code&code=" +
      code + "&redirect_uri=" + redirectURI
    var data = {
      "method": "post",
      "url": postUrl,
      "Authorization": "Basic " + authorization,
      "Content-Type": "application/x-www-form-urlencoded;charset=utf-8"
    }
   
    var link = bluejayUrl + "authorization?embeddedPost=" +
      encodeURIComponent(JSON.stringify(data))
    var response = HtmlService.createHtmlOutput("Redirecting to<br>" +
      "<a href='" + link + "'>" + link + "</a>" +
      "<script>window.onload = function() { " +
      "window.location = '" + link + "'" +
      " }</script>")
    response.setXFrameOptionsMode(HtmlService.XFrameOptionsMode.ALLOWALL)
    return response
  }
  catch (error) {
    return ContentService.createTextOutput(error)
  }
}
