Create OCR App using Salesforce Einstein OCR API

By Dhanik Lal Sahni | June 26, 2020 | Updated: June 11, 2023

Many organizations are automating repetitive work such as manually entering records from printed forms. Typical sources include application forms, insurance forms, doctor's prescriptions, examination forms, and business cards. This post explains how to build an OCR app with Salesforce Einstein that extracts text from images and populates it into Salesforce objects.

Many API services can extract text from images, with accuracy of almost 95%. Refer to my other post related to this kind of service.

Salesforce announced the Einstein OCR (Optical Character Recognition) service in April 2019. The API is now available for use.

Einstein Optical Character Recognition (OCR) leverages computer vision to analyze documents and extract relevant information, making repetitive tasks like data entry more efficient.

Let us integrate Einstein OCR into Salesforce to extract form data. The integration takes the following steps:

Create an account in Einstein Platform Services
Create a Private Key and Generate Token
Call Einstein OCR API from Apex
Extract image data in the Case object

1. Create an account in Einstein Platform Services

To consume the Einstein OCR API, first create an API account at https://api.einstein.ai/signup. A confirmation email will be sent to the address you provide. Confirm it to start working with OCR.

During registration, you will be asked to download a key file. Download it; it will be used to generate tokens. The file is saved as einstein_platform.pem. Upload this file to Salesforce Files. Below is a screenshot of the file record for the key file.

You can also follow the steps described in Einstein Vision and Language.

2. Create a Private Key and Generate Token

To call an external API from Apex, we need an API token to authenticate requests. The Einstein OCR API requires a valid JWT token, which is generated from the key file mentioned above.

A token can also be generated online at https://api.einstein.ai/token, but that does not work when the API is used from Salesforce, where tokens must be generated at runtime before calling the OCR API. We will use the https://api.einstein.ai/v2/oauth2/token endpoint to generate tokens from Apex.

Apex Class for generating Token

Call EinsteinController.getAccessToken() from Apex to generate a token before calling the API.
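
The original class is not reproduced here, so the following is only a minimal sketch of how such a token request could look. It assumes the usual JWT bearer flow: an RS256-signed JWT whose sub is the Einstein account email, whose aud is the token URL, and which carries an exp timestamp. The class name EinsteinTokenSketch, its helper methods, and the access_token response field are assumptions for illustration, not the article's EinsteinController.

```apex
public with sharing class EinsteinTokenSketch {
    private static final String TOKEN_URL = 'https://api.einstein.ai/v2/oauth2/token';

    public class TokenException extends Exception {}

    // Base64url-encode without padding, as the JWT format requires.
    public static String base64Url(Blob input) {
        return EncodingUtil.base64Encode(input)
            .replace('+', '-')
            .replace('/', '_')
            .replace('=', '');
    }

    // Remove the PEM header, footer, and line breaks so the key can be base64-decoded.
    public static String stripPem(String pem) {
        return pem.replaceAll('-----(BEGIN|END)[A-Z ]*PRIVATE KEY-----', '')
                  .replaceAll('\\s', '');
    }

    // Build and sign the JWT assertion (assumed claims: sub, aud, exp).
    public static String buildJwt(String pem, String email, Integer ttlSeconds) {
        String header = base64Url(Blob.valueOf('{"alg":"RS256","typ":"JWT"}'));
        Long exp = DateTime.now().getTime() / 1000 + ttlSeconds;
        Map<String, Object> claims = new Map<String, Object>{
            'sub' => email, 'aud' => TOKEN_URL, 'exp' => exp
        };
        String payload = base64Url(Blob.valueOf(JSON.serialize(claims)));
        String signingInput = header + '.' + payload;
        Blob signature = Crypto.sign('RSA-SHA256', Blob.valueOf(signingInput),
                                     EncodingUtil.base64Decode(stripPem(pem)));
        return signingInput + '.' + base64Url(signature);
    }

    // Exchange the signed JWT for an access token.
    public static String getAccessToken(String pem, String email) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(TOKEN_URL);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/x-www-form-urlencoded');
        req.setBody('grant_type='
            + EncodingUtil.urlEncode('urn:ietf:params:oauth:grant-type:jwt-bearer', 'UTF-8')
            + '&assertion=' + buildJwt(pem, email, 3600));
        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new TokenException('Token request failed: ' + res.getStatus());
        }
        Map<String, Object> body = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
        return (String) body.get('access_token'); // assumed field name
    }
}
```

The PEM text could be read from the uploaded key file, for example by querying ContentVersion for its VersionData and calling toString() on the blob. The email passed in must be the one used to sign up for Einstein Platform Services.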

3. Call Einstein OCR API from Apex

We now have an API token and the API URL https://api.einstein.ai/v2/vision/ocr for extracting text from images. Let us call the API from Apex with the required request data.

Request Details:

sampleLocation: The image URL. We can get a downloadable URL for an uploaded image. Refer to Extract License Plate Number from Image In Salesforce for creating a downloadable URL for any uploaded image.
modelId: Defines which type of text needs to be extracted from the image, such as tabular data or a business card. The value can be OCRModel (for unstructured data) or tabulatev2 (for tabular data).

The Einstein OCR API is called with multipart/form-data, and the request parameters are passed in the body as a blob.

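// form64 is the base64-encoded multipart form body
Blob formBlob = EncodingUtil.base64Decode(form64);
String contentLength = String.valueOf(formBlob.size());
req.setHeader('Content-Length', contentLength);
req.setBodyAsBlob(formBlob);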

Apex Code for calling API

Note: Add the API URL (https://api.einstein.ai) to Remote Site Settings, or use a Named Credential instead.
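
The calling code itself is not reproduced here, so the following is a hedged sketch of what the request could look like. Because sampleLocation and modelId are both plain text fields, this sketch builds the multipart body as a string rather than a base64-decoded blob; that is a simplification swapped in for illustration, not the blob-based approach described above. The class name EinsteinOcrSketch, the boundary value, and the error handling are assumptions.

```apex
public with sharing class EinsteinOcrSketch {
    private static final String OCR_URL = 'https://api.einstein.ai/v2/vision/ocr';
    private static final String BOUNDARY = '----EinsteinOcrBoundary'; // arbitrary marker

    public class OcrException extends Exception {}

    // One text field of a multipart/form-data body.
    private static String part(String name, String value) {
        return '--' + BOUNDARY + '\r\n'
            + 'Content-Disposition: form-data; name="' + name + '"\r\n\r\n'
            + value + '\r\n';
    }

    // Full body: both fields followed by the closing boundary.
    public static String buildBody(String sampleLocation, String modelId) {
        return part('sampleLocation', sampleLocation)
            + part('modelId', modelId)
            + '--' + BOUNDARY + '--\r\n';
    }

    // Send the image URL to the OCR endpoint and return the raw JSON response.
    public static String extractText(String sampleLocation, String token, String modelId) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(OCR_URL);
        req.setMethod('POST');
        req.setHeader('Authorization', 'Bearer ' + token);
        req.setHeader('Content-Type', 'multipart/form-data; boundary=' + BOUNDARY);
        req.setTimeout(120000); // OCR can be slow; 120 s is the Apex maximum
        req.setBody(buildBody(sampleLocation, modelId));
        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new OcrException('OCR call failed: ' + res.getStatus() + ' ' + res.getBody());
        }
        return res.getBody();
    }
}
```

A call would then look like EinsteinOcrSketch.extractText(imageUrl, token, 'OCRModel'), where imageUrl is the public download URL of the uploaded image.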

4. Extract image data in the Case object

We are now ready to consume the API to extract text from images. For this post, I created a sample image of a form containing some field information. Using the Einstein OCR API described above, we will extract data from the image and store it in the Case object.

[Image: sample form, OCR App using Salesforce Einstein]

We need to extract the field data from the image above. Other forms or business cards can be handled the same way. To extract information from the image, we map the position of each field in a mapping object.

Object Creation:

Create one custom object OCRTemplateMapping__c with the below fields.

Field Name	Data Type	Size
Name	Text	100
MinX__c	Number	5
MaxX__c	Number	5
MinY__c	Number	5
MaxY__c	Number	5

Below are sample records for the image above.

If you use a different form, set these X and Y coordinates accordingly. You can check X and Y coordinates at https://yangcha.github.io/iview/iview.html. For my sample image, the X and Y coordinates are as shown in the image below.

Since we will store the extracted image data in the Case object, add the following fields to Case.

Field Name	Data Type	Size
FirstName__c	Text	100
LastName__c	Text	100
Email__c	Text	100
Mobile__c	Text	12

Now we are ready with object creation. Let us write Apex code that extracts the relevant image data and stores it in a Case record.
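
As a rough illustration of the mapping step, the sketch below assigns each recognized text fragment to a field when the center of its bounding box falls inside that field's rectangle. It assumes the OCR response is JSON with a probabilities array whose entries carry a label (the text) and a boundingBox with minX, minY, maxX, and maxY; it also assumes each mapping's Name holds the target Case field name. FieldBox is a hypothetical plain class standing in for an OCRTemplateMapping__c record so the logic stays self-contained.

```apex
public with sharing class OcrFieldMapperSketch {
    // Plain stand-in for one OCRTemplateMapping__c row.
    public class FieldBox {
        public String name;
        public Integer minX, maxX, minY, maxY;
        public FieldBox(String name, Integer minX, Integer maxX, Integer minY, Integer maxY) {
            this.name = name;
            this.minX = minX; this.maxX = maxX;
            this.minY = minY; this.maxY = maxY;
        }
        public Boolean contains(Integer x, Integer y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    // JSON numbers may arrive as Integer or Decimal, so convert via String.
    private static Integer toInt(Object n) {
        return Decimal.valueOf(String.valueOf(n)).intValue();
    }

    // Returns field name -> extracted text, joining fragments with a space.
    public static Map<String, String> mapFields(String ocrJson, List<FieldBox> boxes) {
        Map<String, String> result = new Map<String, String>();
        Map<String, Object> root = (Map<String, Object>) JSON.deserializeUntyped(ocrJson);
        List<Object> items = (List<Object>) root.get('probabilities');
        if (items == null) {
            return result;
        }
        for (Object o : items) {
            Map<String, Object> item = (Map<String, Object>) o;
            Map<String, Object> bb = (Map<String, Object>) item.get('boundingBox');
            if (bb == null) {
                continue;
            }
            Integer cx = (toInt(bb.get('minX')) + toInt(bb.get('maxX'))) / 2;
            Integer cy = (toInt(bb.get('minY')) + toInt(bb.get('maxY'))) / 2;
            for (FieldBox box : boxes) {
                if (box.contains(cx, cy)) {
                    String label = (String) item.get('label');
                    String existing = result.get(box.name);
                    result.put(box.name, existing == null ? label : existing + ' ' + label);
                    break; // first matching rectangle wins
                }
            }
        }
        return result;
    }
}
```

The returned map can then be copied onto a Case with put(fieldName, value) for each entry, followed by a single update. Fragments are joined in the order the response lists them, which may not match reading order.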

Flow to call above Apex Action:

Create a flow that calls the Apex method above. Refer to the post Extract License Plate Number from Image In Salesforce for flow and content URL creation.
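
For a flow to call Apex, the method has to be exposed as an invocable action. The sketch below shows one possible shape, under assumptions: the class name CaseOcrActionSketch, the Request inputs, and the OcrClient interface are all hypothetical, with OcrClient acting as a seam behind which the token and OCR calls would sit. It performs every callout first and issues a single update afterwards, since Salesforce rejects a callout made after uncommitted DML in the same transaction.

```apex
public with sharing class CaseOcrActionSketch {
    // Hypothetical seam: an implementation would fetch a token, call the OCR API,
    // and return Case field name -> extracted value.
    public interface OcrClient {
        Map<String, String> extractFields(String imageUrl);
    }

    // Must be assigned before the action runs.
    public static OcrClient client;

    public class Request {
        @InvocableVariable(label='Case Id' required=true)
        public Id caseId;
        @InvocableVariable(label='Image URL' required=true)
        public String imageUrl;
    }

    // Runs all OCR calls and builds the Case records in memory; no DML here.
    public static List<Case> buildUpdates(List<Request> requests) {
        List<Case> updates = new List<Case>();
        for (Request r : requests) {
            Map<String, String> values = client.extractFields(r.imageUrl);
            Case c = new Case(Id = r.caseId);
            for (String fieldName : values.keySet()) {
                c.put(fieldName, values.get(fieldName));
            }
            updates.add(c);
        }
        return updates;
    }

    @InvocableMethod(label='Extract Case Data From Image' callout=true)
    public static void run(List<Request> requests) {
        // One DML statement, issued only after every callout has finished.
        update buildUpdates(requests);
    }
}
```

In the flow, the action would receive the Case record Id and the public image URL as inputs.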

Button to call Flow:

Add an action button on the Case object that calls the flow above.

Demo App:

References:

Einstein Vision and Language

https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_intro_what_is_apex.htm

63 Comments

Dhananjay Kumar on August 1, 2020 6:11 pm

Amazing write-up!
Very very Useful post to follow and learn as new concept.

Mr Dhanik is also very approachable on call/email, if you have any doubt or stuck in between implementation.

Jarosław Kołodziejczyk on September 7, 2020 8:21 pm

Is there any way of testing this on a sandbox environment?

- Dhanik Lal Sahni on September 8, 2020 2:03 pm
  
  Yes, we can test that in the sandbox. The steps mentioned will work. Let me know if you face any difficulty in implementing it.
  
  Thank You,
  Dhanik
  
Shayeedha on September 9, 2020 3:58 pm

Getting this error on click of Action in record Level.

An unhandled fault has occurred in this flow
An unhandled fault has occurred while processing the flow. Please contact your system administrator for more information.

When checked on the Dev Console, this is the Exception I’m getting.
implementation restriction: ContentDocumentLink requires a filter by a single Id on Content Document or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.

- Dhanik Lal Sahni on September 9, 2020 4:23 pm
  
  Hello Shayeedha,
  You have to use the WHERE clause in the SOQL query, as in the code below; otherwise you will get that error.
  List<ContentDocumentLink> links = [SELECT ContentDocumentId, LinkedEntityId FROM ContentDocumentLink WHERE LinkedEntityId = :recodId];
  
  Thank You,
  Dhanik
  
  - Shayeedha on September 9, 2020 4:35 pm
    
    Yes, I have already used the where class and added the same condition as you have stated above.
    
    - Dhanik Lal Sahni on September 9, 2020 6:22 pm
      
      Let us connect to resolve your issue. Check your email.
      
      Thank You,
      Dhanik
      
    - Dhanik Lal Sahni on September 18, 2020 6:51 pm
      
      This issue is resolved. The button was not placed in the proper location, so the record Id was not being fetched. An alternative solution is to use a Lightning component instead of a flow to extract the required information.
      
    - harish on February 17, 2023 8:44 pm
      
      Can you tell me how this issue is resolved.. Same error, Query is right may be button placement is wrong. Please help. Thank you.
      
      - Dhanik Lal Sahni on February 26, 2023 7:25 pm
        
        Hello Harish,
        Please check the recordid has an associated attachment. If you are not able to check the issue, let us connect over LinkedIn.
        
        Thank You,
        Dhanik
Sandhya on October 12, 2020 4:14 pm

I am uploading 2 attachments with different mappings. It gives an error

- Dhanik Lal Sahni on October 12, 2020 5:38 pm
  
  Hello Sandhya,
  
  What error are you getting? Please share a screenshot of it. We can connect to resolve your issue.
  
  Thank You,
  Dhanik
  
  - rihan on August 28, 2023 12:56 pm
    
    its throwing this error when we upload two documents ” FLOW_ELEMENT_ERROR An Apex error occurred: System.CalloutException: You have uncommitted work pending. Please commit or rollback before calling out”
    
    - Dhanik Lal Sahni on September 3, 2023 5:12 pm
      
      Hello Rihan,
      
      It might give this error in getImageText when you have multiple files. The line EinsteinOCR.extractText(imageUrl, token, 'OCRModel') calls the API, and then we update the response detail in the object. With multiple files this process repeats, so the next API call runs after an update. Try updating the response details once at the end, after all file content has been received. That way all API calls finish first, and a single update statement at the end saves all the response details.
      Try this; if you are unable to do it, we can connect to resolve your issue.
      
      Thank You,
      Dhanik
      
smriti on December 6, 2020 9:57 pm

ContentDistribution is not fetching any value. Can you please guide.

- smriti on December 6, 2020 10:25 pm
  
  @dhaniksahni thanks a lot for quick help.
  
  - Dhanik Lal Sahni on December 7, 2020 6:15 pm
    
    Glad to help you, Smriti.
    
    Thank You,
    Dhanik
    
- Dhanik Lal Sahni on December 7, 2020 6:14 pm
  
  Hey Smriti,
  
  Please check this link
  
  Thank You,
  Dhanik
  
Christian on December 30, 2020 12:01 am

Hi Dhanik,

Thank you for the write-up. I’m having challenges with the extracted values populating correctly on the case. I think that it’s a problem with my flow, but I’m not sure where I have gone wrong. Would you be able to help me?

- Dhanik Lal Sahni on December 31, 2020 4:21 pm
  
  As per our email communication, you need to change the email in EinsteinController. Instead of salesforcecodex@gmail.com, use the email you used for setting up Einstein.
  
  Thank You,
  Dhanik
  
arshad on January 11, 2021 11:41 am

is it possible to create a model that can predict labels in invoices/bills or bank statements in einstein ocr?

- Dhanik Lal Sahni on January 11, 2021 3:48 pm
  
  Yes, it can predict if we specify correct index of label.
  
  Thank You,
  Dhanik
  
Sunil on February 4, 2021 12:43 pm

Hi Dhanik,

ContentDistribution is not fetching any value. Checked the Line provided above to Smriti. That Feature is Enabled in the Org and all options are checked. What am I missing? Please guide.

- Sunil on February 4, 2021 12:57 pm
  
  It worked! Thank you! I had to Enable public link.
  
Raksha H R on June 1, 2021 12:38 pm

Dear Dhanik,
How to get einstein RSA private key for a sandbox?
I get below error:

“An error occurred while serving your request
It looks like this org doesn’t allow access to the Einstein.ai connected app. Contact your Salesforce admin to allow access, or sign up with a different org.”

It says oauth error. Please help on this

- Dhanik Lal Sahni on June 5, 2021 7:28 pm
  
  Hello Raksha,
  
  It should work in all orgs. Please check the possible solutions at https://metamind.readme.io/page/troubleshooting
  
  If that does not work, please ping me on LinkedIn or in the Telegram group. We can connect and resolve the issue.
  
  Thank You,
  Dhanik
  
Raksha H R on June 1, 2021 6:15 pm

Dear Dhanik,
Thanks for explaining in detail. How to establish the connection with sandbox and generate RSA private key?

Regards,
Raksha

Viresh Patnaik on June 30, 2021 6:18 pm

Hi Dhanik,

I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
Please help me out.

Regards.

- Dhanik Lal Sahni on July 3, 2021 1:33 pm
  
  Hello Viresh,
  
  Have you checked that you are getting recodId to filter the query? Check how many records that SOQL query returns.
  
  Thank You,
  Dhanik
  
  - Viresh on July 5, 2021 4:00 pm
    
    Hi Dhanik,
    
    I had issue with button placement, got it resolved, data is populated but the issue now i am facing is ; if the name is VIRESH PATNAIK, it is populating in the field as PATNAIKVIRESH.
    
    Regards,
    Viresh
    
    - Dhanik Lal Sahni on July 8, 2021 5:05 pm
      
      Hello Viresh, instead of merging here, add separate columns and then create a formula field.
      
      Thank You,
      Dhanik
      
- Harish on February 17, 2023 8:17 pm
  
  Hi Dhanik,
  
  I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
  Please help me out.
  
  Regards.
  Same error as Viresh’s. If its resolved for him, Can you explain how to resolve the error.
  
  - Dhanik Lal Sahni on February 26, 2023 7:23 pm
    
    Hello Harish,
    This issue appears because the SOQL query cannot find the attachment record. Please check that you are passing the correct record Id to get the attachment.
    
    Thank You,
    Dhanik
    
Christian on September 7, 2021 11:39 pm

Dhanik,

Thank you again for the write-up. Any guidance on the test classes that I’ll need in order to push this to production?

- Dhanik Lal Sahni on September 8, 2021 10:08 pm
  
  Hey Christian,
  
  What kind of guidance or support you need to test class?
  
  Thank You,
  Dhanik
  
  - Dhanik Lal Sahni on November 6, 2021 5:16 am
    
    Hello Matheus,
    Have you tried using mock test class instead of using @isTest(SeeAllData=true)? It should work in this situation.
    
    Thank You,
    Dhanik
    
ram on March 24, 2022 6:42 pm

@Dhanik , can you update with test classes to this solution. I set up solution and working properly extracting PDf Data.
But got validation proleems with no test data

- Dhanik Lal Sahni on March 25, 2022 8:55 pm
  
  Hello Ram,
  
  Can you share your test code that is not working so that I can help you?
  
  Thank You,
  Dhanik
  
Raja on April 12, 2022 12:35 pm

Hi Dhanik,

This is helpful for document data extraction . As Salesforce introduced the “Intelligent Form Reader” for document data extraction, which one is the best appraoch? And do you have any samples for “Intelligent Form Reader”

- Dhanik Lal Sahni on April 18, 2022 11:15 am
  
  Hello Raja,
  
  Intelligent Form Reader is used especially for medical documents in Health Cloud. If your requirement is related to that, you can go ahead with Intelligent Form Reader; otherwise, you can proceed with the Einstein OCR API or any other AppExchange app.
  
  Thank You
  Dhanik
  
Sai on November 28, 2022 11:02 am

Hi Dhanik,
whenever we extracting the data from image to Text into Case , Iam getting the error
1. Image Url == Null
2. ContentDocument Id ={}

- Dhanik Lal Sahni on November 29, 2022 7:41 pm
  
  Hello Sai,
  
  Please check you are getting the image url that you are using to extract.
  
  Thank You,
  Dhanik
  
Anamika Shinde on February 27, 2023 4:11 pm

Hello Dhanik

we are not receiving a token in response it return “403” error. can you please help me here.

Thanks
Anamika

- Dhanik Lal Sahni on February 28, 2023 12:14 pm
  
  Hello Anamika,
  
  It looks like you are not passing valid request information. Please check that you are passing your correct API details. If it is still not resolved, please ping me on LinkedIn.
  
  Thank You,
  Dhanik
  
Abishek Datta Porandla on April 4, 2023 12:42 pm

Hi Dhanik,

We tried to implement it on the Partner Community site and it is throwing an error. When we checked the debug logs we observe that the user has no access to the einstein platform file which is causing the error. Can you please guide us here?

Regards & Thanks,
Abishek.

- Dhanik Lal Sahni on April 13, 2023 6:40 pm
  
  Hello Abhishek,
  
  If your issue is not resolved, let us connect on linked in.
  
  Thank You,
  Dhanik
  
Ashish Sakhare on May 25, 2023 1:28 pm

hii dhanik ,
do we have to upload those jpg images in notes and attachment of case object?? and my another question is .. if instead of image we have to go with pdf.. what changes will it needed..

- Dhanik Lal Sahni on May 27, 2023 10:16 am
  
  Hello Ashish,
  
  You can also use notes and attachments for this. The only thing to take care of is generating a public URL for the API. Yes, you can use PDF for this as well. Please refer to the documentation for this.
  
  Thank You,
  Dhanik
  
Ashish Sakhare on May 26, 2023 6:58 am

Hii Dhanik,
u gave us reference for generating image url.. but its of license plate recognition.. so can we use License plate recognition api for this example also.. or we have to search for any other api on rapidapi.com..

- Dhanik Lal Sahni on May 27, 2023 10:10 am
  
  Hello Ashish,
  
  You need to take the code for the triggers ContentVersionExternalLink and ContentDocumentLinkTrigger and the trigger handler ContentTriggerHandler to generate a public URL for any uploaded image. We then pass the public URL to the OCR API for processing.
  
  Thank You,
  Dhanik
  
  - Ashish Sakhare on May 28, 2023 8:17 am
    
    thanks dhanik .. it worked for image..
    
Ashish Sakhare on May 28, 2023 8:13 am

hii dhanik ,,
i want to use this ocr for pdf also.. what changes shall i need to make

- Dhanik Lal Sahni on June 12, 2023 11:31 am
  
  Hello Aashish,
  
  Please check Detect Text in PDFs with Einstein OCR (Generally Available)
  
  Thank You,
  Dhanik
  
Ashish Sakhare on May 28, 2023 8:24 am

Hii Dhanik..
I want to perform this functionality for pdf.. but when i upload pdf, i get downloadable pdf url and json data but fields in case objects are not updated.. So can u tell me what changes shall i need to make so that it work for pdf as well… Since i got requirement for pdf and storing data in case object..

thanks
Ashish Sakhare

- Dhanik Lal Sahni on September 3, 2023 5:38 pm
  
  Hello Ashish,
  What issue are you facing with the update? Are you getting values for the fields? Let us connect if you still want to discuss this issue.
  
  Thank You,
  Dhanik
  
namratha on July 21, 2023 1:17 pm

Hi I am unable to get the contentDocument ID tried the solutions mentioned above but didn’t work

- Dhanik Lal Sahni on July 23, 2023 9:56 pm
  
  Hello Namratha,
  Please check our other post, Generate Public Link for Salesforce file, to resolve this issue.
  
  Thank You,
  Dhanik
  
Dharmendra Bhuva on October 14, 2024 9:31 am

Hi Dhanik,

Thank you for this wonderful article. Somehow we are not able to access the link to signup in einstien ai website. Can you help us to figure out what could be the reason? we are getting below error,

Access to api.einstein.ai was denied
You don’t have authorization to view this page.
HTTP ERROR 403

- Dhanik Lal Sahni on October 27, 2024 9:51 pm
  
  Hello Dharmendra,
  403 is a forbidden access error. Please check your credentials for API access.
  
  Thank You,
  Dhanik
  
