Speech-To-Text API
In a previous post, I explained the Text-to-Speech feature of the Web Speech API. In this post, I will cover the Speech-To-Text feature of the same API.
We will create a demo Lightning component that takes a voice command and opens the matching Salesforce object record.
Voice commands can be integrated using several APIs, such as the Google Speech-To-Text API or the browser's own speech recognition.
We will use the Web Speech API for speech recognition. This API uses the browser's audio stream to convert speech into text.
SpeechRecognition: This is the speech recognition interface and it is available on the browser's window object. It is exposed as SpeechRecognition in Firefox and as webkitSpeechRecognition in Chrome.
The code below sets the recognition interface to SpeechRecognition:
window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
After setting the recognition interface, create a SpeechRecognition object using new window.SpeechRecognition():
const recognition = new window.SpeechRecognition();
This recognition object has many properties, methods and event handlers.
Methods (a short usage sketch follows this list):
abort() | Stops the speech recognition service from listening to incoming audio without returning a result. |
start() | Starts the speech recognition service listening to incoming audio, with intent to recognize grammars associated with the current SpeechRecognition. |
stop() | Stops the speech recognition service from listening to incoming audio and attempts to return a result from the audio captured so far. |
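A short sketch of how these methods can be used on the recognition object created above (the 5-second stop is only for illustration):
// Sketch: basic use of start(), stop() and abort()
recognition.start();                               // begin listening to the microphone
setTimeout(() => recognition.stop(), 5000);        // stop after 5 seconds and let the final result come back
// recognition.abort();                            // abort() would stop listening without returning a result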
Properties (a configuration sketch follows this list):
grammars | Gets and sets a collection of SpeechGrammar objects that represent the grammars understood by the current SpeechRecognition. |
lang | Gets and sets the language of the current SpeechRecognition. |
continuous | Controls whether recognition keeps listening and returns a result for each phrase (true) or stops after a single result (false). The default value is false. |
interimResults | Controls whether interim results are returned (true) or not (false). The default value is false. |
maxAlternatives | Sets the maximum number of SpeechRecognitionAlternative objects provided per result. The default value is 1. |
serviceURI | Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition. |
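For example, a typical configuration of the recognition object looks like the sketch below (the values are only illustrative):
// Sketch: configuring the recognition object created above
recognition.lang = 'en-IN';            // language of the speech to recognize
recognition.continuous = true;         // keep listening for multiple results
recognition.interimResults = false;    // return only final results
recognition.maxAlternatives = 1;       // one alternative per result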
Events (a wiring sketch follows this list):
onstart | Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition |
onaudiostart | Fired when the user agent has started to capture audio |
onaudioend | Fired when the user agent has finished capturing audio |
onend | Fired when the speech recognition service has disconnected |
onerror | Fired when a speech recognition error occurs |
onnomatch | Fired when the speech recognition service returns a final result with no significant recognition |
onresult | Fired when the speech recognition service returns a result — a word or phrase has been positively recognized and this has been communicated back to the app |
onsoundstart | Fired when any sound — recognisable speech or not — has been detected |
onsoundend | Fired when any sound — recognisable speech or not — has stopped being detected |
onspeechstart | Fired when sound that is recognised by the speech recognition service as speech has been detected |
onspeechend | Fired when speech recognised by the speech recognition service has stopped being detected |
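The sketch below wires the most commonly used handlers (onresult, onerror and onend) on the recognition object created earlier:
// Sketch: handling recognition events
recognition.onresult = (event) => {
    // the latest result is the last entry in the results list
    const transcript = event.results[event.results.length - 1][0].transcript;
    console.log('Heard: ' + transcript);
};
recognition.onerror = (event) => {
    console.error('Recognition error: ' + event.error);
};
recognition.onend = () => {
    console.log('Recognition service disconnected');
};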
Let us create a voice app which opens a record page based on what we speak, using the above methods, properties and events.
voiceCommand.cmp
<aura:component controller="VoiceCommandController"
                implements="force:appHostable,flexipage:availableForAllPageTypes,flexipage:availableForRecordHome,force:hasRecordId,forceCommunity:availableForAllPageTypes,force:lightningQuickAction"
                access="global">
    <aura:attribute name="value" type="string" default=""/>
    <lightning:card title="Speech-to-Text">
        <lightning:textarea aura:id="speechText1" value="{!v.value}"/>
        <lightning:button variant="success" label="Click to Speak" title="Speak" onclick="{! c.handleSpeechText }"/>
    </lightning:card>
</aura:component>
voiceCommandController.js
({
    handleSpeechText: function(component, event, helper) {
        // Fall back to the vendor-prefixed constructor when the standard one is not available
        window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
        if (window.SpeechRecognition) {
            console.log('supported speech');
        } else {
            console.error('speech not supported');
        }
        const recognition = new window.SpeechRecognition();
        recognition.lang = 'en-IN';
        recognition.continuous = true;
        recognition.onresult = (event) => {
            // The latest transcript is the last entry in the results list
            var commandText = event.results[event.results.length - 1][0].transcript;
            component.set("v.value", commandText);
            // Expected command format: "open {object} of/for {name}"
            var commands = commandText.split(' ');
            if (commands.length >= 4) {
                var obj = commands[1];
                var condition = commands[3];
                helper.getRecords(component, obj, condition);
            }
        };
        recognition.start();
    }
});
event.results[event.results.length - 1][0].transcript gives the latest voice transcript. We then split that transcript into terms that tell the component which Salesforce record to open.
We expect the command in the format "open {object} of/for {name}", for example "open account of dhanik" or "open lead for dhanik". This transcript contains two important terms: the object name and the data value, such as account and dhanik. We split the transcript to get the object name and the data value using commandText.split(' ');.
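As a plain JavaScript sketch, the parsing step looks like this (the command format described above is assumed):
// Sketch: splitting a command transcript into object name and data value
const commandText = 'open account of dhanik';   // example transcript
const commands = commandText.split(' ');
if (commands.length >= 4) {
    const objectName = commands[1];   // "account"
    const recordName = commands[3];   // "dhanik"
    console.log('Open ' + objectName + ' record matching "' + recordName + '"');
}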
voiceCommandHelper.js
An Apex method is called to get the Id of the matching record. If multiple records match, the first record is returned. Based on that record Id, the application navigates to the record.
getRecords : function(cmp, objName, name) {
    // Create a one-time use instance of the getRecord action
    // in the server-side controller
    var action = cmp.get("c.getRecord");
    action.setParams({
        objectName: objName,
        names: name
    });
    // Create a callback that is executed after
    // the server-side action returns
    action.setCallback(this, function(response) {
        var state = response.getState();
        if (state === "SUCCESS") {
            var data = response.getReturnValue();
            // Navigate to the matching record
            var navEvt = $A.get("e.force:navigateToSObject");
            navEvt.setParams({
                "recordId": data.Id
            });
            navEvt.fire();
        } else if (state === "INCOMPLETE") {
            // do something
        } else if (state === "ERROR") {
            var errors = response.getError();
            if (errors) {
                if (errors[0] && errors[0].message) {
                    console.log("Error message: " + errors[0].message);
                }
            } else {
                console.log("Unknown error");
            }
        }
    });
    $A.enqueueAction(action);
}
The helper above calls an Apex method to get the record Id of the first matching record. This can be changed based on your requirements.
public class VoiceCommandController {
    @AuraEnabled
    public static sObject getRecord(String objectName, String names) {
        String name = '%' + names + '%';
        // Dynamic SOQL with a bind variable for the name filter
        String soql = 'SELECT Id FROM ' + objectName + ' WHERE Name LIKE :name';
        List<sObject> sobjList = Database.query(soql);
        if (sobjList.size() > 0) {
            return sobjList[0];
        }
        return null;
    }
}
Demo Time
12 comments
Hello Dhanik,
I am facing API version issues while saving the code because of the Web API in the Lightning component. Can you please let me know how to implement it in a Visualforce page and a Lightning component using the correct version?
You can retrieve the Lightning component code from your org into VS Code, change the API version there, and push the change. It will work.
Hi Dhanik,
I want to use Einstein Voice for something similar. Can you guide me on how I should approach this?
Thanks,
Sucharita, we have implemented that using Alexa with Einstein Bots. You can also try that.
Hi Dhanik,
I am getting the below error when I try to use similar code in LWC:
window.SpeechRecognition || window.webkitSpeechrecognition) is not a constructor
Can you guide me on this.
Thanks
You cannot use window this way in LWC. In LWC, it is better to use the Google Speech-To-Text API or another API mentioned in the post.
How can I trigger the functionality with a keyword like “Hey Einstein” instead of manually clicking the button to activate it?
Hey Nevin,
Run the handleSpeechText method on component load. Use an init handler for this.
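A minimal sketch of such an init handler in the component markup (the action assumes the handleSpeechText controller method from this post):
<aura:handler name="init" value="{!this}" action="{!c.handleSpeechText}"/>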
Thank You,
Dhanik Sahni
http://salesforcecodex.com/
Hi,
I’m getting Invalid SecureWindow API, webkitSpeechRecognition was blacklisted in LC.
Any idea?
Thanks,
Jayant
Have you changed the LC version to 39? This API only works up to API version 39; after that it is blocked by Locker Service.
Please check with LC version 39.
Thank You,
Dhanik
I have changed the LC API to 39 but it says,
This page has an error. You might just need to refresh it.
Action failed: c:helloWorld$controller$handleClick [window.SpeechRecognition is not a constructor]
Failing descriptor: {c:helloWorld$controller$handleClick}
Hello Harshal,
Please confirm once again whether your LC is updated to version 39. This error is only thrown when the API version is not set to 39.
Thank You,
Dhanik