Create OCR App using Salesforce Einstein OCR API

by Dhanik Lal Sahni

Many organizations are automating repetitive work such as manually entering records from printed forms. Typical sources include application forms, insurance forms, doctors' prescription forms, examination forms, and business cards. This post explains how to build an OCR app with Salesforce Einstein that extracts text from images and populates Salesforce objects with it.

Many API services are available for extracting text from images, and they give almost 95% accuracy. Refer to my other post related to this service.

Salesforce announced the Einstein OCR (Optical Character Recognition) service in April 2019. The API is now available for use.

Einstein Optical Character Recognition (OCR) leverages computer vision to analyze documents and extract relevant information, making repetitive tasks like data entry more efficient.

Let us integrate Einstein OCR into Salesforce to extract form data. The integration takes the following steps:

  1. Create an account in Einstein Platform Services
  2. Create a Private Key and Generate Token
  3. Call Einstein OCR API from Apex
  4. Extract image data in the Case object

1. Create an account in Einstein Platform Services

To consume the Einstein OCR API, first create an API account at https://api.einstein.ai/signup. A confirmation email is sent to the address you provide. Confirm it to start working with OCR.

During registration you are asked to download a key file. Download it, because it is used to generate tokens. The file is saved as einstein_platform.pem. Upload this file to Salesforce as a File. Below is a screenshot of the key file's record.

Salesforce Code OCR

You can also follow the steps described in the Einstein Vision and Language documentation.

2. Create a Private Key and Generate Token

To call an external API from Apex, we need an API token to authenticate requests. The Einstein OCR API requires a valid JWT token, which is generated from the key file mentioned above.

A token can also be generated online at https://api.einstein.ai/token, but that does not help when the API is called from Salesforce, where tokens must be generated at runtime before calling the OCR API. We will use the endpoint https://api.einstein.ai/v2/oauth2/token to generate tokens from Apex.

Apex Class for generating Token

Call EinsteinController.getAccessToken() from Apex to generate a token before calling the API.
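A minimal sketch of what such a token helper might look like is shown below. It is not the EinsteinController class itself: the class name EinsteinTokenSketch, the EinsteinException type, and the keyFileTitle parameter are hypothetical, and it assumes the key file was uploaded as a File and that Crypto.sign accepts the decoded key. The JWT carries the account email as sub, the token URL as aud, and an expiry time as exp.

```apex
public with sharing class EinsteinTokenSketch {
    public class EinsteinException extends Exception {}

    private static final String TOKEN_URL = 'https://api.einstein.ai/v2/oauth2/token';

    // Base64url encoding for JWT segments: URL-safe alphabet, no padding.
    public static String base64Url(Blob input) {
        return EncodingUtil.base64Encode(input)
            .replace('+', '-')
            .replace('/', '_')
            .replace('=', '');
    }

    // Builds the unsigned "header.claims" portion of the JWT.
    public static String buildSigningInput(String email, Long expiresAtEpochSeconds) {
        String header = '{"alg":"RS256","typ":"JWT"}';
        String claims = JSON.serialize(new Map<String, Object>{
            'sub' => email,
            'aud' => TOKEN_URL,
            'exp' => expiresAtEpochSeconds
        });
        return base64Url(Blob.valueOf(header)) + '.' + base64Url(Blob.valueOf(claims));
    }

    public static String getAccessToken(String email, String keyFileTitle) {
        // Assumption: the key file is stored as a File with this title.
        ContentVersion keyFile = [
            SELECT VersionData FROM ContentVersion
            WHERE Title = :keyFileTitle AND IsLatest = true
            LIMIT 1
        ];
        // Strip the PEM header, footer, and line breaks before decoding.
        String pem = keyFile.VersionData.toString()
            .replace('-----BEGIN RSA PRIVATE KEY-----', '')
            .replace('-----END RSA PRIVATE KEY-----', '')
            .replaceAll('\\s', '');
        Blob privateKey = EncodingUtil.base64Decode(pem);

        // Token valid for one hour from now.
        Long expiresAt = DateTime.now().getTime() / 1000 + 3600;
        String signingInput = buildSigningInput(email, expiresAt);
        Blob signature = Crypto.sign('RSA-SHA256', Blob.valueOf(signingInput), privateKey);
        String assertion = signingInput + '.' + base64Url(signature);

        HttpRequest req = new HttpRequest();
        req.setEndpoint(TOKEN_URL);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/x-www-form-urlencoded');
        req.setBody('grant_type='
            + EncodingUtil.urlEncode('urn:ietf:params:oauth:grant-type:jwt-bearer', 'UTF-8')
            + '&assertion=' + assertion);

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new EinsteinException('Token request failed: ' + res.getStatus());
        }
        Map<String, Object> payload =
            (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
        return (String) payload.get('access_token');
    }
}
```

The email passed in must be the one used to sign up for Einstein Platform Services, otherwise the token request is rejected.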

3. Call Einstein OCR API from Apex

We now have an API token and the API URL https://api.einstein.ai/v2/vision/ocr for extracting text from images. Let us call the API from Apex with the required request data.

Request Details:

  1. sampleLocation : The URL of the image. We can get a downloadable URL for an uploaded image. Refer to Extract License Plate Number from Image In Salesforce for creating a downloadable URL for any uploaded image.
  2. modelId : Defines which type of text should be extracted from the image, such as tabular data or a business card. The value can be OCRModel (for unstructured data) or tabulatev2 (for tabular data).
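One way to obtain a downloadable URL for sampleLocation is a ContentDistribution record. The sketch below is an assumption-laden illustration, not the code from the referenced post: the class and method names are hypothetical, and it assumes public links (content deliveries) are enabled in the org.

```apex
public with sharing class PublicImageUrlSketch {
    // Creates a public distribution for the latest version of a file
    // and returns its direct download URL.
    public static String createDownloadUrl(Id contentDocumentId) {
        ContentVersion version = [
            SELECT Id, Title FROM ContentVersion
            WHERE ContentDocumentId = :contentDocumentId AND IsLatest = true
            LIMIT 1
        ];
        ContentDistribution dist = new ContentDistribution(
            Name = version.Title,
            ContentVersionId = version.Id,
            PreferencesAllowViewInBrowser = true,
            PreferencesNotifyOnVisit = false
        );
        insert dist;
        // ContentDownloadUrl is filled in by the platform, so re-query it.
        dist = [SELECT ContentDownloadUrl FROM ContentDistribution WHERE Id = :dist.Id];
        return dist.ContentDownloadUrl;
    }
}
```

Because this method performs DML, it should run in a separate transaction from the OCR callout (for example, when the file is uploaded). A callout that follows uncommitted DML in the same transaction fails.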

The Einstein OCR API is called with a multipart/form-data request, and the request parameters are passed in the body as a Blob.

// form64 holds the Base64-encoded multipart form body
Blob formBlob = EncodingUtil.base64Decode(form64);
String contentLength = String.valueOf(formBlob.size());
req.setBodyAsBlob(formBlob);
req.setHeader('Content-Length', contentLength);
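Since both request fields are plain text, the multipart body can also be assembled directly as a string instead of through Base64. The following is a minimal sketch of that alternative. The class name OcrFormSketch and the boundary value are hypothetical choices for illustration.

```apex
public class OcrFormSketch {
    // Any string that does not occur in the field values works as a boundary.
    public static final String BOUNDARY = '----EinsteinOcrSketchBoundary';

    // Formats one text field as a multipart/form-data part.
    private static String textPart(String name, String value) {
        return '--' + BOUNDARY + '\r\n'
            + 'Content-Disposition: form-data; name="' + name + '"\r\n\r\n'
            + value + '\r\n';
    }

    // Builds the complete body with both fields and the closing boundary.
    public static Blob buildBody(String sampleLocation, String modelId) {
        String body = textPart('sampleLocation', sampleLocation)
            + textPart('modelId', modelId)
            + '--' + BOUNDARY + '--\r\n';
        return Blob.valueOf(body);
    }
}
```

The request then needs the matching header: req.setHeader('Content-Type', 'multipart/form-data; boundary=' + OcrFormSketch.BOUNDARY).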

Apex Code for calling API

Note: Add the API URL (https://api.einstein.ai) to Remote Site Settings, or use a named credential instead.
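A minimal sketch of the callout itself might look like the following. The class name, the OcrException type, and the method signature are hypothetical. It assumes the caller has already built the multipart body and obtained a token, and it returns the raw JSON response for later parsing.

```apex
public class EinsteinOcrCalloutSketch {
    public class OcrException extends Exception {}

    private static final String OCR_URL = 'https://api.einstein.ai/v2/vision/ocr';

    // Sends a prepared multipart/form-data body to the OCR endpoint
    // and returns the JSON response body.
    public static String extractText(Blob multipartBody, String boundary, String token) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(OCR_URL);
        req.setMethod('POST');
        req.setHeader('Authorization', 'Bearer ' + token);
        req.setHeader('Content-Type', 'multipart/form-data; boundary=' + boundary);
        req.setHeader('Cache-Control', 'no-cache');
        req.setBodyAsBlob(multipartBody);
        // OCR can be slow on large images, so allow the maximum callout timeout.
        req.setTimeout(120000);

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new OcrException('OCR request failed: '
                + res.getStatusCode() + ' ' + res.getBody());
        }
        return res.getBody();
    }
}
```

Including the status code and response body in the exception makes failures such as an invalid token or an unreachable image URL easier to diagnose from the debug log.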

4. Extract image data in the Case object

We are now ready to consume the API to extract text from images. For this post, I have created a sample form image that contains some field information. Using the Einstein OCR API described above, we will extract data from the image and put it in the Case object.

OCR App using Salesforce Einstein - SalesforceCodex

We need to extract field data from the image above. Other forms or business cards can be handled the same way. To extract the information, we map the position of each field on the image in a mapping object.

Object Creation:

Create a custom object OCRTemplateMapping__c with the fields below.

Field Name    Data Type    Size
Name          Text         100
MinX__c       Number       5
MaxX__c       Number       5
MinY__c       Number       5
MaxY__c       Number       5

Below are sample records for the image above.

OCR App using Salesforce Einstein - SalesforceCodex

If you use a different form, set these X and Y coordinates accordingly. You can read the X and Y coordinates of an image at https://yangcha.github.io/iview/iview.html. For my sample image, the X and Y coordinates look like the image below.

OCR App using Salesforce Einstein - SalesforceCodex

Since the extracted image data will be stored on the Case object, add the fields below to Case.

Field Name      Data Type    Size
FirstName__c    Text         100
LastName__c     Text         100
Email__c        Text         100
Mobile__c       Text         12

We are now ready with object creation. Let us write Apex code that extracts the relevant image data and puts it in the case record.
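The core of that extraction step is matching each recognized word to a mapped rectangle. The sketch below illustrates the idea under stated assumptions: the OCR response is taken to contain a probabilities array whose entries have a label and a boundingBox with minX, minY, maxX, and maxY, and the mapping record's Name is taken to hold the target Case field name. The class OcrRegionMapperSketch and its Region type are hypothetical.

```apex
public class OcrRegionMapperSketch {
    // Mirrors one OCRTemplateMapping__c record: a named rectangle on the form.
    public class Region {
        public String name;
        public Decimal minX;
        public Decimal maxX;
        public Decimal minY;
        public Decimal maxY;
        public Region(String name, Decimal minX, Decimal maxX, Decimal minY, Decimal maxY) {
            this.name = name;
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }
    }

    // Untyped JSON numbers may arrive as Integer or Decimal, so normalise via String.
    private static Decimal num(Object value) {
        return Decimal.valueOf(String.valueOf(value));
    }

    // Returns region name -> text found inside that region, in response order.
    public static Map<String, String> mapText(String ocrJson, List<Region> regions) {
        Map<String, String> result = new Map<String, String>();
        Map<String, Object> root = (Map<String, Object>) JSON.deserializeUntyped(ocrJson);
        List<Object> items = (List<Object>) root.get('probabilities');
        if (items == null) {
            return result;
        }
        for (Object item : items) {
            Map<String, Object> entry = (Map<String, Object>) item;
            Map<String, Object> box = (Map<String, Object>) entry.get('boundingBox');
            for (Region r : regions) {
                // A word belongs to a region when its box lies fully inside it.
                Boolean inside = num(box.get('minX')) >= r.minX
                    && num(box.get('maxX')) <= r.maxX
                    && num(box.get('minY')) >= r.minY
                    && num(box.get('maxY')) <= r.maxY;
                if (inside) {
                    String label = (String) entry.get('label');
                    String soFar = result.get(r.name);
                    result.put(r.name, soFar == null ? label : soFar + ' ' + label);
                }
            }
        }
        return result;
    }
}
```

Regions can be loaded from OCRTemplateMapping__c records, and each map entry can then be written to the case with put(fieldName, value) followed by a single update. Words are appended in response order, which may not match reading order. Sorting by minX before joining is one option if multi-word fields come out reversed.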

Flow to call the above Apex action:

Create a flow that calls the above Apex method. Refer to the post Extract License Plate Number from Image In Salesforce for flow and content URL creation.

OCR App using Salesforce Einstein - SalesforceCodex

Button to call Flow:

Add an action button to the Case object that calls the above flow.

OCR App using Salesforce Einstein - SalesforceCodex

Demo App:

References:

Einstein Vision and Language

https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_intro_what_is_apex.htm


61 comments

Dhananjay Kumar August 1, 2020 - 6:11 pm

Amazing write-up!
Very very Useful post to follow and learn as new concept.

Mr Dhanik is also very approachable on call/email, if you have any doubt or stuck in between implementation.

Reply
Jarosław Kołodziejczyk September 7, 2020 - 8:21 pm

Is there any way of testing this on a sandbox environment?

Reply
Dhanik Lal Sahni September 8, 2020 - 2:03 pm

Yes, we can test that in the sandbox. The steps mentioned will work. Let me know if you face any difficulty in implementing it.

Thank You,
Dhanik

Reply
Shayeedha September 9, 2020 - 3:58 pm

Getting this error on click of Action in record Level.

An unhandled fault has occurred in this flow
An unhandled fault has occurred while processing the flow. Please contact your system administrator for more information.

When checked on the Dev Console, this is the Exception I’m getting.
implementation restriction: ContentDocumentLink requires a filter by a single Id on Content Document or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.

Reply
Dhanik Lal Sahni September 9, 2020 - 4:23 pm

Hello Shayeedha,
You have to use the where clause in the SOQL query in the code below, otherwise you will get that error.
List<ContentDocumentLink> links = [SELECT ContentDocumentId, LinkedEntityId FROM ContentDocumentLink WHERE LinkedEntityId = :recodId];

Thank You,
Dhanik

Reply
Shayeedha September 9, 2020 - 4:35 pm

Yes, I have already used the where clause and added the same condition as you have stated above.

Reply
Dhanik Lal Sahni September 9, 2020 - 6:22 pm

Let us connect to resolve your issue. Check your email.

Thank You,
Dhanik

Reply
Dhanik Lal Sahni September 18, 2020 - 6:51 pm

This issue is resolved. The button was not placed in the proper location, so the record Id was not being fetched. An alternative solution is to use a Lightning component instead of a flow to extract the required information.

Reply
harish February 17, 2023 - 8:44 pm

Can you tell me how this issue was resolved? I get the same error. The query is right, so maybe the button placement is wrong. Please help. Thank you.

Reply
Dhanik Lal Sahni February 26, 2023 - 7:25 pm

Hello Harish,
Please check that the record Id has an associated attachment. If you are not able to find the issue, let us connect over LinkedIn.

Thank You,
Dhanik

Sandhya October 12, 2020 - 4:14 pm

I am uploading 2 attachments with different mappings. It gives an error

Reply
Dhanik Lal Sahni October 12, 2020 - 5:38 pm

Hello Sandhya,

What error are you getting? Please share a screenshot of it. We can connect to resolve your issue.

Thank You,
Dhanik

Reply
rihan August 28, 2023 - 12:56 pm

It throws this error when we upload two documents: “FLOW_ELEMENT_ERROR An Apex error occurred: System.CalloutException: You have uncommitted work pending. Please commit or rollback before calling out”

Reply
Dhanik Lal Sahni September 3, 2023 - 5:12 pm

Hello Rihan,

This error can occur in getImageText when there are multiple files. The line EinsteinOCR.extractText(imageUrl, token, 'OCRModel') calls the API, and then we update the object with the response details. With multiple files this process repeats, so the next callout runs after an uncommitted update. Try updating the response details once, at the end, after all file content has been received. That way all API calls finish first, and a single update statement writes all the response details.
Try this. If you are unable to do it, we can connect to resolve your issue.

Thank You,
Dhanik


Reply
smriti December 6, 2020 - 9:57 pm

ContentDistribution is not fetching any value. Can you please guide.

Reply
smriti December 6, 2020 - 10:25 pm

@dhaniksahni thanks a lot for quick help.

Reply
Dhanik Lal Sahni December 7, 2020 - 6:15 pm

Glad to help you, Smriti.

Thank You,
Dhanik

Reply
Dhanik Lal Sahni December 7, 2020 - 6:14 pm

Hey Smriti,

Please check this link

Thank You,
Dhanik

Reply
Christian December 30, 2020 - 12:01 am

Hi Dhanik,

Thank you for the write-up. I’m having challenges with the extracted values populating correctly on the case. I think that it’s a problem with my flow, but I’m not sure where I have gone wrong. Would you be able to help me?

Reply
Dhanik Lal Sahni December 31, 2020 - 4:21 pm

As per our email communication, you need to change the email in EinsteinController. Instead of salesforcecodex@gmail.com, use the email you used to set up Einstein.

Thank You,
Dhanik

Reply
arshad January 11, 2021 - 11:41 am

is it possible to create a model that can predict labels in invoices/bills or bank statements in einstein ocr?

Reply
Dhanik Lal Sahni January 11, 2021 - 3:48 pm

Yes, it can predict them if we specify the correct index of the label.

Thank You,
Dhanik

Reply
Sunil February 4, 2021 - 12:43 pm

Hi Dhanik,

ContentDistribution is not fetching any value. Checked the Line provided above to Smriti. That Feature is Enabled in the Org and all options are checked. What am I missing? Please guide.

Reply
Sunil February 4, 2021 - 12:57 pm

It worked! Thank you! I had to Enable public link.

Reply
Raksha H R June 1, 2021 - 12:38 pm

Dear Dhanik,
How to get einstein RSA private key for a sandbox?
I get below error:

“An error occurred while serving your request
It looks like this org doesn’t allow access to the Einstein.ai connected app. Contact your Salesforce admin to allow access, or sign up with a different org.”

It says oauth error. Please help on this

Reply
Dhanik Lal Sahni June 5, 2021 - 7:28 pm

Hello Raksha,

It should work in all orgs. Please check the possible solutions at https://metamind.readme.io/page/troubleshooting

If that does not work, please ping me on LinkedIn or in the Telegram group. We can connect and resolve the issue.

Thank You,
Dhanik

Reply
Raksha H R June 1, 2021 - 6:15 pm

Dear Dhanik,
Thanks for explaining in detail. How to establish the connection with sandbox and generate RSA private key?

Regards,
Raksha

Reply
Viresh Patnaik June 30, 2021 - 6:18 pm

Hi Dhanik,

I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
Please help me out.

Regards.

Reply
Dhanik Lal Sahni July 3, 2021 - 1:33 pm

Hello Viresh,

Have you checked that you are getting a recodId to filter the query with? Check how many records that SOQL query returns.

Thank You,
Dhanik

Reply
Viresh July 5, 2021 - 4:00 pm

Hi Dhanik,

I had issue with button placement, got it resolved, data is populated but the issue now i am facing is ; if the name is VIRESH PATNAIK, it is populating in the field as PATNAIKVIRESH.

Regards,
Viresh

Reply
Dhanik Lal Sahni July 8, 2021 - 5:05 pm

Hello Viresh, instead of merging here, add separate columns and then create a formula field.

Thank You,
Dhanik

Reply
Harish February 17, 2023 - 8:17 pm

Hi Dhanik,

I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
Please help me out.

Regards.
Same error as Viresh’s. If its resolved for him, Can you explain how to resolve the error.

Reply
Dhanik Lal Sahni February 26, 2023 - 7:23 pm

Hello Harish,
This issue appears because the SOQL query cannot find the attachment record. Please check that you are passing the correct record Id to get the attachment.

Thank You,
Dhanik

Reply
Christian September 7, 2021 - 11:39 pm

Dhanik,

Thank you again for the write-up. Any guidance on the test classes that I’ll need in order to push this to production?

Reply
Dhanik Lal Sahni September 8, 2021 - 10:08 pm

Hey Christian,

What kind of guidance or support do you need for the test class?

Thank You,
Dhanik

Reply
Dhanik Lal Sahni November 6, 2021 - 5:16 am

Hello Matheus,
Have you tried using a mock test class instead of @isTest(SeeAllData=true)? It should work in this situation.

Thank You,
Dhanik

Reply
ram March 24, 2022 - 6:42 pm

@Dhanik, can you update this solution with test classes? I set up the solution and it is working properly, extracting PDF data.
But I got validation problems because there is no test data.

Reply
Dhanik Lal Sahni March 25, 2022 - 8:55 pm

Hello Ram,

Can you share your test code that is not working so that I can help you?

Thank You,
Dhanik

Reply
Raja April 12, 2022 - 12:35 pm

Hi Dhanik,

This is helpful for document data extraction. As Salesforce introduced the “Intelligent Form Reader” for document data extraction, which one is the best approach? And do you have any samples for “Intelligent Form Reader”?

Reply
Dhanik Lal Sahni April 18, 2022 - 11:15 am

Hello Raja,

Intelligent Form Reader is used especially for medical documents in Health Cloud. If your requirement is related to that, you can go ahead with Intelligent Form Reader. Otherwise, you can proceed with the Einstein OCR API or any other AppExchange app.

Thank You
Dhanik

Reply
Sai November 28, 2022 - 11:02 am

Hi Dhanik,
Whenever we extract the data from an image into text on the Case, I am getting these errors:
1. Image Url == Null
2. ContentDocument Id ={}

Reply
Dhanik Lal Sahni November 29, 2022 - 7:41 pm

Hello Sai,

Please check that you are getting the image URL that you are using for extraction.

Thank You,
Dhanik

Reply
Anamika Shinde February 27, 2023 - 4:11 pm

Hello Dhanik

We are not receiving a token in the response. It returns a “403” error. Can you please help me here?

Thanks
Anamika

Reply
Dhanik Lal Sahni February 28, 2023 - 12:14 pm

Hello Anamika,

It looks like you are not passing valid request information. Please check that you are passing your correct API details. If it is still not resolved, please ping me on LinkedIn.

Thank You,
Dhanik

Reply
Abishek Datta Porandla April 4, 2023 - 12:42 pm

Hi Dhanik,

We tried to implement it on the Partner Community site and it is throwing an error. When we checked the debug logs we observe that the user has no access to the einstein platform file which is causing the error. Can you please guide us here?

Regards & Thanks,
Abishek.

Reply
Dhanik Lal Sahni April 13, 2023 - 6:40 pm

Hello Abhishek,

If your issue is not resolved, let us connect on LinkedIn.

Thank You,
Dhanik

Reply
Ashish Sakhare May 25, 2023 - 1:28 pm

hii dhanik ,
do we have to upload those jpg images in notes and attachment of case object?? and my another question is .. if instead of image we have to go with pdf.. what changes will it needed..

Reply
Dhanik Lal Sahni May 27, 2023 - 10:16 am

Hello Ashish,

You can use notes/attachments for this as well. The only thing you have to take care of is generating a public URL for the API. Yes, you can use a PDF for this too. Please refer to the documentation for this.

Thank You,
Dhanik

Reply
Ashish Sakhare May 26, 2023 - 6:58 am

Hii Dhanik,
u gave us reference for generating image url.. but its of license plate recognition.. so can we use License plate recognition api for this example also.. or we have to search for any other api on rapidapi.com..

Reply
Dhanik Lal Sahni May 27, 2023 - 10:10 am

Hello Ashish,

You need to take the code for the triggers ContentVersionExternalLink and ContentDocumentLinkTrigger and the trigger handler ContentTriggerHandler, which generate a public URL for any uploaded image. We then pass the public URL to the OCR API for processing.

Thank You,
Dhanik

Reply
Ashish Sakhare May 28, 2023 - 8:17 am

thanks dhanik .. it worked for image..

Reply
Ashish Sakhare May 28, 2023 - 8:13 am

hii dhanik ,,
i want to use this ocr for pdf also.. what changes shall i need to make

Reply
Dhanik Lal Sahni June 12, 2023 - 11:31 am

Hello Aashish,

Please check Detect Text in PDFs with Einstein OCR (Generally Available)

Thank You,
Dhanik

Reply
Ashish Sakhare May 28, 2023 - 8:24 am

Hii Dhanik..
I want to perform this functionality for pdf.. but when i upload pdf, i get downloadable pdf url and json data but fields in case objects are not updated.. So can u tell me what changes shall i need to make so that it work for pdf as well… Since i got requirement for pdf and storing data in case object..

thanks
Ashish Sakhare

Reply
Dhanik Lal Sahni September 3, 2023 - 5:38 pm

Hello Ashish,
What issue are you facing with the update? Are you getting values for the fields? Let us connect if you still want to discuss this issue.

Thank You,
Dhanik

Reply
namratha July 21, 2023 - 1:17 pm

Hi I am unable to get the contentDocument ID tried the solutions mentioned above but didn’t work

Reply
Dhanik Lal Sahni July 23, 2023 - 9:56 pm

Hello Namratha,
Please check our other post Generate Public Link for Salesforce file to resolve this issue.

Thank You,
Dhanik

Reply