Running Salesforce App using Voice Command – Speech-To-Text API

    Speech-To-Text API

    In a previous post, I explained the Text-to-Speech feature of the Web Speech API. In this post, I will cover the Speech-To-Text feature of the same API.

    We will create a demo Lightning component. This component will take a voice command and open the matching Salesforce object record.
    Voice commands can be integrated using many APIs; several services offer speech recognition capabilities.

    We will use the Web Speech API for speech recognition. This API uses the browser's audio stream to convert speech into text.

    SpeechRecognition : This is the speech recognition interface, available on the browser's window object. It is exposed as SpeechRecognition in Firefox and as webkitSpeechRecognition in Chrome.

    The code below normalizes the interface to window.SpeechRecognition so the same name works in both browsers.

    window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;

    After setting recognition, create a SpeechRecognition object using the new operator.

    const recognition = new window.SpeechRecognition();

    This recognition object has many methods, properties and event handlers. The important ones are listed below.

    Methods:

    abort() : Stops the speech recognition service from listening to incoming audio, without attempting to return a result.
    start() : Starts the speech recognition service listening to incoming audio, with intent to recognize grammars associated with the current SpeechRecognition.
    stop() : Stops the speech recognition service from listening to incoming audio, attempting to return a result using the audio captured so far.
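
    As a minimal, illustrative sketch (plain browser JavaScript, assuming a browser such as Chrome that exposes the prefixed webkitSpeechRecognition), these methods can be used like this:

    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();

    recognition.start();   // begin listening to microphone audio
    // ... later, end listening and let captured audio produce a final result:
    recognition.stop();
    // or cut off immediately without returning a result:
    // recognition.abort();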


    Properties:

    grammars : Gets and sets a collection of SpeechGrammar objects that represent the grammars that will be understood by the current SpeechRecognition.
    lang : Gets and sets the language of the current SpeechRecognition.
    continuous : Controls whether continuous results are returned for each recognition (true) or only a single result (false). The default value is false.
    interimResults : Controls whether interim results should be returned (true) or not (false). The default value is false.
    maxAlternatives : Sets the maximum number of SpeechRecognitionAlternatives provided per result. The default value is 1.
    serviceURI : Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition.
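
    A short sketch of setting these properties before starting recognition (the values below are only examples):

    const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

    recognition.lang = 'en-IN';          // recognize Indian English
    recognition.continuous = true;       // keep listening across multiple phrases
    recognition.interimResults = true;   // surface partial transcripts while the user is speaking
    recognition.maxAlternatives = 3;     // return up to 3 candidate transcripts per result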


    Event handlers:

    onstart : Fired when the speech recognition service has begun listening to incoming audio, with intent to recognize grammars associated with the current SpeechRecognition.
    onaudiostart : Fired when the user agent has started to capture audio.
    onaudioend : Fired when the user agent has finished capturing audio.
    onend : Fired when the speech recognition service has disconnected.
    onerror : Fired when a speech recognition error occurs.
    onnomatch : Fired when the speech recognition service returns a final result with no significant recognition.
    onresult : Fired when the speech recognition service returns a result: a word or phrase has been positively recognized and communicated back to the app.
    onsoundstart : Fired when any sound (recognisable speech or not) has been detected.
    onsoundend : Fired when any sound (recognisable speech or not) has stopped being detected.
    onspeechstart : Fired when sound that is recognised by the speech recognition service as speech has been detected.
    onspeechend : Fired when speech recognised by the speech recognition service has stopped being detected.
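
    Putting the handlers together, a minimal sketch that logs the recognition lifecycle and prints each recognized phrase might look like this:

    const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

    recognition.onstart = () => console.log('listening...');
    recognition.onresult = (event) => {
        // event.results grows as phrases are recognized; the last entry is the newest
        const transcript = event.results[event.results.length - 1][0].transcript;
        console.log('heard: ' + transcript);
    };
    recognition.onerror = (event) => console.error('recognition error: ' + event.error);
    recognition.onend = () => console.log('stopped listening');

    recognition.start();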

    Let us create a voice app which opens a record page based on what we speak, using the above methods, properties and events. First, the Lightning component markup:


    <aura:component controller="VoiceCommandController" implements="force:appHostable,flexipage:availableForAllPageTypes,flexipage:availableForRecordHome,force:hasRecordId,forceCommunity:availableForAllPageTypes,force:lightningQuickAction" access="global" >
        <aura:attribute name="value" type="String" default=""/>
        <lightning:card title="Speech-to-Text">
            <lightning:textarea aura:id="speechText1" label="Transcript" value="{!v.value}"/>
            <lightning:button variant="success" label="Click to Speak" title="Speak" onclick="{! c.handleSpeechText }"/>
        </lightning:card>
    </aura:component>

    The component's client-side controller wires up speech recognition:


    ({
        handleSpeechText: function(component, event, helper) {
            window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
            if (window.SpeechRecognition) {
                console.log('supported speech');
            } else {
                console.error('speech not supported');
                return;
            }
            const recognition = new window.SpeechRecognition();
            recognition.lang = 'en-IN';
            recognition.continuous = true;
            recognition.onresult = (event) => {
                component.set("v.value", event.results[event.results.length - 1][0].transcript);
                var commandText = event.results[event.results.length - 1][0].transcript;
                // Expected command format: "open {object} for/of {name}"
                var commands = commandText.split(' ');
                var obj = commands[1];
                var condition = commands[3];
                helper.getRecords(component, obj, condition);
            };
            recognition.start();
        }
    })

    event.results[event.results.length - 1][0].transcript gives the voice transcript. After that, we split the transcript into the terms needed to open records in Salesforce.

    We will get commands in the format “open {object} for/of {name}”, for example “open account of dhanik” or “open lead for dhanik”. In this transcript, we have two important terms: the object name and the data value, like account and dhanik. We split the transcript to get the object name and data value using commandText.split(' ');.
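
    For example, splitting the transcript “open account of dhanik” works out as below (an illustrative sketch):

    // "open account of dhanik".split(' ')  =>  ["open", "account", "of", "dhanik"]
    var commands = "open account of dhanik".split(' ');
    var obj = commands[1];        // "account" -> sObject API name
    var condition = commands[3];  // "dhanik"  -> record name to search for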


    The Apex method is called to get a specific record id. If multiple records match, it returns the first record. Based on the record id, the application navigates to the record.

    ({
        getRecords : function(cmp, objName, name) {
            // Create a one-time use instance of the getRecord action
            // in the server-side controller
            var action = cmp.get("c.getRecord");
            action.setParams({
                objectName : objName,
                names : name
            });
            // Create a callback that is executed after
            // the server-side action returns
            action.setCallback(this, function(response) {
                var state = response.getState();
                if (state === "SUCCESS") {
                    var data = response.getReturnValue();
                    // Navigate to the record that was found
                    var navEvt = $A.get("e.force:navigateToSObject");
                    navEvt.setParams({
                        "recordId": data.Id
                    });
                    navEvt.fire();
                } else if (state === "INCOMPLETE") {
                    // do something
                } else if (state === "ERROR") {
                    var errors = response.getError();
                    if (errors) {
                        if (errors[0] && errors[0].message) {
                            console.log("Error message: " + errors[0].message);
                        }
                    } else {
                        console.log("Unknown error");
                    }
                }
            });
            $A.enqueueAction(action);
        }
    })
    The above helper method calls the Apex method to get the record id of the first matching record. This can be changed based on requirements.

    public class VoiceCommandController {
        @AuraEnabled
        public static sObject getRecord(String objectName, String names) {
            String name = '%' + names + '%';
            String soql = 'select id from ' + objectName + ' where name like :name';
            List<sObject> sobjList = Database.query(soql);
            if (!sobjList.isEmpty()) {
                return sobjList[0];
            }
            return null;
        }
    }

    Demo Time

    The Speech Recognition API can be useful for data entry, record navigation and other voice commands. We can also create applications that capture instant transcripts.