    Running Salesforce App using Voice command – Speech-To-Text API

    By Dhanik Lal Sahni | August 12, 2019 (Updated: June 30, 2023) | 12 Comments | 5 Mins Read

    Speech-To-Text API

    In the previous post, I explained the Text-to-Speech feature of the Web Speech API. In this post, I will cover the Speech-To-Text feature of the same API.

    We will create a demo Lightning component. This component will take a voice command and open the matching Salesforce object record.
    Voice commands can be integrated using many APIs. Below are some important APIs that can be used for speech recognition.

    • Web Speech API
    • Google Speech-To-Text
    • Microsoft Cognitive Services
    • Dialogflow
    • IBM Watson
    • Speechmatics

    We will use the Web Speech API for speech recognition. This API uses the browser’s audio stream to convert speech into text.

    SpeechRecognition: This is the speech recognition interface, available on the browser’s window object. It is exposed as SpeechRecognition in Firefox and as webkitSpeechRecognition in Chrome.

    The code below sets the recognition constructor on window.SpeechRecognition:

    window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;

    After setting the constructor, create a SpeechRecognition object using window.SpeechRecognition():

    const recognition = new window.SpeechRecognition();
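    The constructor lookup above can be sketched as a small helper. The function name and the mock window objects below are purely illustrative, so the pattern can be shown outside a browser:

    ```javascript
    // Sketch: resolve the SpeechRecognition constructor from a window-like
    // object, preferring the prefixed Chrome name (as the post's code does).
    // In a real component, `win` would be the browser's `window`.
    function resolveSpeechRecognition(win) {
        return win.webkitSpeechRecognition || win.SpeechRecognition || null;
    }

    // Hypothetical constructors standing in for the real browser APIs:
    function StandardCtor() {}
    function PrefixedCtor() {}

    const firefoxLike = { SpeechRecognition: StandardCtor };
    const chromeLike = { webkitSpeechRecognition: PrefixedCtor };
    const unsupported = {};

    console.log(resolveSpeechRecognition(firefoxLike) === StandardCtor); // true
    console.log(resolveSpeechRecognition(chromeLike) === PrefixedCtor);  // true
    console.log(resolveSpeechRecognition(unsupported) === null);         // true
    ```

    Returning null for the unsupported case lets the caller bail out cleanly instead of calling `new` on undefined.
    
    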

    This recognition object has many properties, methods and event handlers.
    Methods:

    • abort() - Stops the speech recognition service from listening to incoming audio.
    • start() - Starts the speech recognition service listening to incoming audio, with intent to recognize grammars associated with the current SpeechRecognition.
    • stop() - Stops the speech recognition service from listening to incoming audio.

    Properties:

    • grammars - Gets and sets a collection of SpeechGrammar objects that represent the grammars understood by the current SpeechRecognition.
    • lang - Gets and sets the language of the current SpeechRecognition.
    • interimResults - Controls whether interim results should be returned (true) or not (false). The default value is false.
    • maxAlternatives - Sets the maximum number of SpeechRecognitionAlternatives provided per result. The default value is 1.
    • serviceURI - Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition.

    Events:

    • onstart - Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.
    • onaudiostart - Fired when the user agent has started to capture audio.
    • onaudioend - Fired when the user agent has finished capturing audio.
    • onend - Fired when the speech recognition service has disconnected.
    • onerror - Fired when a speech recognition error occurs.
    • onnomatch - Fired when the speech recognition service returns a final result with no significant recognition.
    • onresult - Fired when the speech recognition service returns a result: a word or phrase has been positively recognized and communicated back to the app.
    • onsoundstart - Fired when any sound (recognisable speech or not) has been detected.
    • onsoundend - Fired when any sound (recognisable speech or not) has stopped being detected.
    • onspeechstart - Fired when sound recognised by the speech recognition service as speech has been detected.
    • onspeechend - Fired when speech recognised by the speech recognition service has stopped being detected.
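    The handlers above can be wired up in a small framework-free sketch. The function name, the mock recognition object, and the fake event shape below are my own stand-ins so the flow can run outside a browser; in the component, a real SpeechRecognition instance takes their place:

    ```javascript
    // Sketch: attach the handlers this post uses and forward the latest
    // transcript to a callback.
    function wireRecognition(recognition, onTranscript, onFailure) {
        recognition.onresult = (event) => {
            // The newest result is last in the list; take its top alternative.
            const latest = event.results[event.results.length - 1][0];
            onTranscript(latest.transcript);
        };
        recognition.onerror = (event) => onFailure(event.error);
        recognition.onend = () => { /* service disconnected; restart if needed */ };
    }

    // Mock recognition object driven by a fake result event (illustrative only):
    const mockRecognition = {};
    const heard = [];
    wireRecognition(mockRecognition, (t) => heard.push(t), (e) => heard.push('error:' + e));

    mockRecognition.onresult({
        results: [[{ transcript: 'open account of dhanik' }]]
    });
    console.log(heard[0]); // "open account of dhanik"
    ```

    Keeping the wiring in one function makes it easy to unit-test the transcript handling with a mock, without a microphone.
    
    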

    Let us create a voice app that opens a record page based on what we speak, using the above methods, properties and events.

    voiceCommand.cmp

    <aura:component controller="VoiceCommandController" implements="force:appHostable,flexipage:availableForAllPageTypes,flexipage:availableForRecordHome,force:hasRecordId,forceCommunity:availableForAllPageTypes,force:lightningQuickAction" access="global" >
        <aura:attribute name="value" type="string" default=""></aura:attribute> 
        <lightning:card title="Speech-to-Text">
            <lightning:textarea aura:id="speechText1" value="{!v.value}"></lightning:textarea>
            <lightning:button variant="success" label="Click to Speak" title="Speak" onclick="{! c.handleSpeechText }"/>
        </lightning:card>   
    </aura:component>

    voiceCommandController.js

    ({
        handleSpeechText: function(component, event, helper) {
            // Chrome exposes the prefixed constructor, Firefox the standard one
            window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;

            // If neither constructor exists, the property is undefined: bail out
            if (!window.SpeechRecognition) {
                console.error('speech not supported');
                return;
            }
            console.log('supported speech');

            const recognition = new window.SpeechRecognition();
            recognition.lang = 'en-IN';
            recognition.continuous = true;
            recognition.onresult = (event) => {
                var commandText = event.results[event.results.length - 1][0].transcript;
                component.set("v.value", commandText);
                var commands = commandText.split(' ');
                // Expect "open {object} for/of {name}", i.e. at least four terms
                if (commands.length >= 4) {
                    var obj = commands[1];
                    var condition = commands[3];
                    helper.getRecords(component, obj, condition);
                }
            };
            recognition.start();
        }
    });

    event.results[event.results.length - 1][0].transcript gives the voice transcript. We then split that transcript into terms that tell Salesforce which record to open.

    We will give commands in the format “open {object} for/of {name}”, for example “open account of dhanik” or “open lead for dhanik”. Each transcript has two important terms: the object name and the data value (here, account and dhanik). We split the transcript to get them using commandText.split(' ');.
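    This “open {object} for/of {name}” convention can be isolated into a small, testable parser. The function name and the null-on-mismatch behaviour below are my own choices for illustration, not part of the original component:

    ```javascript
    // Sketch: parse commands of the form "open {object} for/of {name}".
    // Returns null when the transcript does not match the expected shape.
    function parseCommand(transcript) {
        const commands = transcript.trim().toLowerCase().split(/\s+/);
        // Expect at least: "open", object, connector ("for"/"of"), name.
        if (commands.length < 4 || commands[0] !== 'open') {
            return null;
        }
        return {
            objectName: commands[1],
            // Everything after the connector is the record name,
            // so multi-word names like "acme corp" survive.
            recordName: commands.slice(3).join(' ')
        };
    }

    console.log(parseCommand('open account of dhanik'));
    // → { objectName: 'account', recordName: 'dhanik' }
    console.log(parseCommand('hello world')); // → null
    ```

    Pulling the parsing out of the onresult handler lets you unit-test it directly, without any speech input.
    
    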

    voiceCommandHelper.js

    The Apex method is called to get a specific record id. If multiple records match, it returns the first one. The application then navigates to that record.

    ({
        getRecords : function(cmp, objName, name) {
            // create a one-time use instance of the server-side action
            var action = cmp.get("c.getRecord");
            action.setParams({
                objectName: objName,
                names: name
            });

            // Create a callback that is executed after
            // the server-side action returns
            action.setCallback(this, function(response) {
                var state = response.getState();
                if (state === "SUCCESS") {
                    var data = response.getReturnValue();
                    if (data) {
                        var navEvt = $A.get("e.force:navigateToSObject");
                        navEvt.setParams({
                            "recordId": data.Id
                        });
                        navEvt.fire();
                    } else {
                        console.log("No matching record found");
                    }
                } else if (state === "INCOMPLETE") {
                    // offline or the server did not respond
                } else if (state === "ERROR") {
                    var errors = response.getError();
                    if (errors && errors[0] && errors[0].message) {
                        console.log("Error message: " + errors[0].message);
                    } else {
                        console.log("Unknown error");
                    }
                }
            });
            $A.enqueueAction(action);
        }
    })
    

    The above helper calls an Apex method to get the record id of the first matching record. This can be changed based on your requirement.

    public class VoiceCommandController {
        @AuraEnabled
        public static sObject getRecord(String objectName, String names) {
            String name = '%' + String.escapeSingleQuotes(names) + '%';
            // objectName comes from user speech; escape it before building dynamic SOQL
            String soql = 'SELECT Id FROM ' + String.escapeSingleQuotes(objectName) + ' WHERE Name LIKE :name';
            List<sObject> sobjList = Database.query(soql);
            if (sobjList.size() > 0) {
                return sobjList[0];
            }
            return null;
        }
    }

    Demo Time

    The Speech Recognition API can be useful for data entry, record navigation and other voice commands. We can also create applications that capture instant transcripts.

    12 Comments

    1. Gopal Giri on October 23, 2019 5:31 pm

      Hello DHANIK ,
      I am Facing API version issues while saving code because of web api in lightning component design .
      can you please let me know how to implement it in visual force page and lightning component using correct version

      • Dhanik Lal Sahni on October 31, 2019 2:36 am

        You can get lightning code in Visual Code from org. Change API version in Visual code and push change.It will work.

    2. SUCHARITA MONDAL on November 9, 2019 9:40 pm

      Hi Dhanik,
      I want to use Einstein Voice for something similar. Can you guide me on this that how should I approach to this.

      Thanks,

      • Dhanik Lal Sahni on November 11, 2019 9:22 pm

        Sucharita, We have implemented that using Alexa with Einstein Bots. You can also try with that.

    3. Harikesh on February 4, 2020 10:48 am

      Hi Dhanak,

      I am getting the below error if I try to use the similar code in lwc
      window.SpeechRecognition || window.webkitSpeechrecognition) is not a constructor
      Can you guide me on this.

      Thanks

      • Dhanik Lal Sahni on February 4, 2020 2:37 pm

        You can not use window in LWC. In LWC, better you use Google Speech-To-Text api or other API mentioned in post.

    4. Nevin on May 8, 2020 1:03 pm

      How to trigger the functionality on a keyword like “Hey Einstein” instead of manually clicking button to activate?

      • Dhanik Lal Sahni on May 8, 2020 5:15 pm

        Hey Nevin,

        Run method handleSpeechText in component load. Use init handler for this.

        Thank You,
        Dhanik Sahni
        http://salesforcecodex.com/

    5. Jayant on November 25, 2020 11:08 am

      Hi,
      I’m getting Invalid SecureWindow API, webkitSpeechRecognition was blacklisted in LC.
      Any idea?
      Thanks,
      Jayant

      • Dhanik Lal Sahni on November 25, 2020 1:26 pm

        Have you changed LC version to 39? This API only work till 39 after that it is blocked by locker service.

        Please check with LC version 39.

        Thank You,
        Dhanik

    6. Harshal on September 14, 2021 6:58 pm

      I have changed the LC API to 39 but it says,
      This page has an error. You might just need to refresh it.
      Action failed: c:helloWorld$controller$handleClick [window.SpeechRecognition is not a constructor]
      Failing descriptor: {c:helloWorld$controller$handleClick}

      • Dhanik Lal Sahni on September 15, 2021 12:06 pm

        Hello Harshal,

        Please confirm once again, your LC is updated to version 39 or not. This error will only throw when API is not set to version 39.

        Thank You,
        Dhanik
