Organizations increasingly automate repetitive work such as manually entering records from printed forms. Typical candidates are application forms, insurance forms, doctors' prescription forms, examination forms, business cards, and many more. This post explains how to build an OCR app using Salesforce Einstein that extracts text from images and populates it into Salesforce objects.
Many API services are available for extracting text from images, and they can reach almost 95% accuracy. See my other posts on such services:
- Extract License Plate Number from Image In Salesforce
- Extract Text From Image using Google Cloud Vision
Salesforce announced the Einstein OCR (Optical Character Recognition) service in April 2019. The API is now available for use.
Einstein Optical Character Recognition (OCR) leverages computer vision to analyze documents and extract relevant information, making repetitive tasks like data entry more efficient.
Let us integrate Einstein OCR into Salesforce to extract form data. The integration requires the following steps:
- Create an account in Einstein Platform Services
- Create a Private Key and Generate Token
- Call Einstein OCR API from Apex
- Extract image data in the Case object
1. Create an account in Einstein Platform Services
To consume the Einstein OCR API, first create an API account at https://api.einstein.ai/signup. A confirmation email is sent to the address you provide. Confirm it to start working with OCR.
During registration you are asked to download a key file. Download it, because it is used to generate tokens. The file is saved as einstein_platform.pem. Upload this file to Salesforce Files. Below is a screenshot of the file record for the key file.
You can also follow the steps described at Einstein Vision and Language.
2. Create a Private Key and Generate Token
To call an external API from Apex, we need an API token that authenticates each request. The Einstein OCR API requires a valid JWT token, which is generated from the key file mentioned above.
A token can also be generated online at https://api.einstein.ai/token, but that does not work when we call the API from Salesforce, where the token has to be generated at runtime before calling the OCR API. We will use the https://api.einstein.ai/v2/oauth2/token endpoint to generate tokens from Apex.
Apex Class for generating Token
Call EinsteinController.getAccessToken() from Apex to generate a token before calling the API.
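As a rough illustration of what token generation involves, here is a minimal sketch, not the EinsteinController class from this post. It assumes the key file was uploaded as a File titled einstein_platform, that the JWT carries the sub, aud and exp claims, and that Crypto.sign accepts the decoded key body from the .pem file. The class name EinsteinTokenSketch, the TokenException type and the accountEmail parameter are hypothetical names chosen for this example.

```apex
public with sharing class EinsteinTokenSketch {
    private static final String TOKEN_URL = 'https://api.einstein.ai/v2/oauth2/token';

    public class TokenException extends Exception {}

    // Base64url encoding used by JWT: '+' becomes '-', '/' becomes '_', padding is dropped.
    public static String base64Url(Blob input) {
        return EncodingUtil.base64Encode(input)
            .replace('+', '-')
            .replace('/', '_')
            .replace('=', '');
    }

    // accountEmail: the email address used when signing up for Einstein Platform Services.
    public static String getAccessToken(String accountEmail) {
        // Assumption: the key file is stored as a File titled 'einstein_platform'.
        ContentVersion keyFile = [
            SELECT VersionData
            FROM ContentVersion
            WHERE Title = 'einstein_platform' AND IsLatest = true
            LIMIT 1
        ];
        // Strip the PEM header, footer and whitespace to get the Base64 key body.
        String keyBody = keyFile.VersionData.toString()
            .replace('-----BEGIN RSA PRIVATE KEY-----', '')
            .replace('-----END RSA PRIVATE KEY-----', '')
            .replaceAll('\\s', '');

        // Build the unsigned JWT: header.claims
        String header = '{"alg":"RS256","typ":"JWT"}';
        Long expiry = DateTime.now().getTime() / 1000 + 3600; // valid for one hour
        String claims = JSON.serialize(new Map<String, Object>{
            'sub' => accountEmail,
            'aud' => TOKEN_URL,
            'exp' => expiry
        });
        String signingInput = base64Url(Blob.valueOf(header)) + '.' + base64Url(Blob.valueOf(claims));

        // Sign with the private key and append the signature.
        Blob signature = Crypto.sign(
            'RSA-SHA256',
            Blob.valueOf(signingInput),
            EncodingUtil.base64Decode(keyBody)
        );
        String assertion = signingInput + '.' + base64Url(signature);

        // Exchange the signed JWT for an access token.
        HttpRequest req = new HttpRequest();
        req.setEndpoint(TOKEN_URL);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/x-www-form-urlencoded');
        req.setBody('grant_type='
            + EncodingUtil.urlEncode('urn:ietf:params:oauth:grant-type:jwt-bearer', 'UTF-8')
            + '&assertion=' + assertion);
        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new TokenException('Token request failed: ' + res.getStatus());
        }
        Map<String, Object> body = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
        return (String) body.get('access_token');
    }
}
```

The email passed in must be the one used to set up the Einstein account. A token signed for a different address is likely to be rejected.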
3. Call Einstein OCR API from Apex
We now have an API token and the API URL https://api.einstein.ai/v2/vision/ocr for extracting text from images. Let us call the API from Apex with the required request data.
Request Details:
- sampleLocation: The image URL. We can get a downloadable URL for our uploaded image. Refer to Extract License Plate Number from Image In Salesforce for creating a downloadable URL for any uploaded image.
- modelId: Defines which type of text should be extracted from the image, such as tabular data or a business card. The value can be OCRModel (for unstructured data) or tabulatev2 (for tabular data).
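For the sampleLocation value, one possible way to obtain a publicly reachable URL for an uploaded file is a ContentDistribution record. The sketch below illustrates the idea and is not the trigger code from the referenced post. The class name PublicLinkSketch and the link name are assumptions, and the org must allow public links for files.

```apex
public with sharing class PublicLinkSketch {
    // Creates a public link for one file version and returns its download URL.
    public static String createDownloadUrl(Id contentVersionId) {
        ContentDistribution dist = new ContentDistribution(
            Name = 'OCR public link',              // hypothetical label for the link
            ContentVersionId = contentVersionId,
            PreferencesAllowViewInBrowser = true,
            PreferencesNotifyOnVisit = false
        );
        insert dist;
        // ContentDownloadUrl is filled in by the platform, so query it back.
        dist = [SELECT ContentDownloadUrl FROM ContentDistribution WHERE Id = :dist.Id];
        return dist.ContentDownloadUrl;
    }
}
```

Because this method performs DML, it is safer to run it in an earlier transaction than the OCR callout, for example when the file is uploaded. A callout that follows uncommitted DML in the same transaction fails.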
The Einstein OCR API is called using multipart/form-data, and the request parameters are passed in the body as a blob.
Blob formBlob = EncodingUtil.base64Decode(form64);
String contentLength = String.valueOf(formBlob.size());
req.setBodyAsBlob(formBlob);
Apex Code for calling API
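The following is a minimal sketch of such a call, assuming both request parameters are plain text fields. It builds the multipart body as a string and converts it to a blob, which is a simplification of the Base64-based form building hinted at above. The class name EinsteinOcrSketch, the boundary string and the OcrException type are hypothetical.

```apex
public with sharing class EinsteinOcrSketch {
    private static final String OCR_URL = 'https://api.einstein.ai/v2/vision/ocr';
    private static final String BOUNDARY = '----EinsteinOcrSketchBoundary';

    public class OcrException extends Exception {}

    // Builds one text part of a multipart/form-data body.
    public static String formField(String name, String value) {
        return '--' + BOUNDARY + '\r\n'
            + 'Content-Disposition: form-data; name="' + name + '"\r\n\r\n'
            + value + '\r\n';
    }

    // Full body: the two request parameters followed by the closing boundary.
    public static String buildBody(String imageUrl, String modelId) {
        return formField('sampleLocation', imageUrl)
            + formField('modelId', modelId)
            + '--' + BOUNDARY + '--\r\n';
    }

    // Sends the image URL to the OCR endpoint and returns the raw JSON response.
    public static String extractText(String imageUrl, String token, String modelId) {
        Blob formBlob = Blob.valueOf(buildBody(imageUrl, modelId));

        HttpRequest req = new HttpRequest();
        req.setEndpoint(OCR_URL);
        req.setMethod('POST');
        req.setHeader('Authorization', 'Bearer ' + token);
        req.setHeader('Content-Type', 'multipart/form-data; boundary=' + BOUNDARY);
        req.setHeader('Content-Length', String.valueOf(formBlob.size()));
        req.setBodyAsBlob(formBlob);
        req.setTimeout(120000); // OCR can be slow; this is the maximum Apex allows

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new OcrException('OCR request failed: ' + res.getStatus());
        }
        return res.getBody();
    }
}
```

A call might look like EinsteinOcrSketch.extractText(imageUrl, token, 'OCRModel'), where token is a valid access token for the Einstein account.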
Note: Add the API URL (https://api.einstein.ai) to Remote Site Settings, or use a named credential instead.
4. Extract image data in the Case object
We are now ready to consume the API to extract text from images. For this post, I have created a sample form image that contains some field information. Using the Einstein OCR API above, we will extract data from the image and store it in the Case object.
We need to extract the field data from the image mentioned above. Other forms or business cards can be handled the same way. To extract the information, we record where each field sits on the image in a mapping object.
Object Creation:
Create one custom object OCRTemplateMapping__c with the below fields.
Field Name | Data Type | Size |
Name | Text | 100 |
MinX__c | Number | 5 |
MaxX__c | Number | 5 |
MinY__c | Number | 5 |
MaxY__c | Number | 5 |
Below are sample records for the image mentioned above.
To use another form, set these X, Y coordinates accordingly. You can check X and Y coordinates at https://yangcha.github.io/iview/iview.html. For my sample image, the X, Y coordinates are as shown in the image below.
Since we extract the image data into the Case object, add the following fields to the Case object.
Field Name | Data Type | Size |
FirstName__c | Text | 100 |
LastName__c | Text | 100 |
Email__c | Text | 100 |
Mobile__c | Text | 12 |
Now we are ready with object creation. Let us write Apex code that extracts the relevant image data and stores it in the Case record.
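To make the mapping idea concrete, here is a minimal sketch of how an OCR response could be matched against coordinate ranges. It assumes the response JSON has a probabilities array whose entries carry a label and a boundingBox with minX, minY, maxX and maxY. The Region class is an in-memory stand-in for an OCRTemplateMapping__c record, and the rule that a word belongs to a field when its box lies fully inside the range is an assumption of this example, as is ordering words from left to right.

```apex
public with sharing class OcrFieldMapperSketch {
    // In-memory stand-in for one OCRTemplateMapping__c record.
    public class Region {
        public String fieldName;
        public Decimal minX;
        public Decimal maxX;
        public Decimal minY;
        public Decimal maxY;

        public Region(String fieldName, Decimal minX, Decimal maxX, Decimal minY, Decimal maxY) {
            this.fieldName = fieldName;
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        // True when the word's bounding box lies fully inside this region.
        public Boolean contains(Map<String, Object> box) {
            return num(box.get('minX')) >= minX && num(box.get('maxX')) <= maxX
                && num(box.get('minY')) >= minY && num(box.get('maxY')) <= maxY;
        }
    }

    // JSON numbers may arrive as Integer or Decimal; normalize to Decimal.
    private static Decimal num(Object value) {
        return Decimal.valueOf(String.valueOf(value));
    }

    // Returns field name -> text found in that field's region, words ordered left to right.
    public static Map<String, String> mapFields(String responseJson, List<Region> regions) {
        Map<String, Object> root = (Map<String, Object>) JSON.deserializeUntyped(responseJson);
        List<Object> items = (List<Object>) root.get('probabilities');
        if (items == null) {
            items = new List<Object>();
        }
        Map<String, String> result = new Map<String, String>();
        for (Region r : regions) {
            List<Decimal> xs = new List<Decimal>();
            List<String> words = new List<String>();
            for (Object item : items) {
                Map<String, Object> entry = (Map<String, Object>) item;
                Map<String, Object> box = (Map<String, Object>) entry.get('boundingBox');
                if (box == null || !r.contains(box)) {
                    continue;
                }
                // Insert the word so that the list stays sorted by minX.
                Decimal x = num(box.get('minX'));
                Integer pos = 0;
                while (pos < xs.size() && xs[pos] <= x) {
                    pos++;
                }
                if (pos == xs.size()) {
                    xs.add(x);
                    words.add((String) entry.get('label'));
                } else {
                    xs.add(pos, x);
                    words.add(pos, (String) entry.get('label'));
                }
            }
            result.put(r.fieldName, String.join(words, ' '));
        }
        return result;
    }
}
```

The resulting map can then be copied onto the Case, for example by reading result.get('FirstName__c') when the mapping records are named after the Case fields. That naming is itself an assumption.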
Flow to call above Apex Action:
Create a flow that calls the Apex method above. You can refer to the post Extract License Plate Number from Image In Salesforce for creating the flow and the content URL.
Button to call Flow:
Add an action button to the Case object that calls the flow above.
Demo App:
63 Comments
Amazing write-up!
Very very Useful post to follow and learn as new concept.
Mr Dhanik is also very approachable on call/email, if you have any doubt or stuck in between implementation.
Is there any way of testing this on a sandbox environment?
Yes, we can test that in the sandbox. The steps mentioned above will work. Let me know if you face any difficulty in implementing it.
Thank You,
Dhanik
Getting this error on click of Action in record Level.
An unhandled fault has occurred in this flow
An unhandled fault has occurred while processing the flow. Please contact your system administrator for more information.
When checked on the Dev Console, this is the Exception I’m getting.
implementation restriction: ContentDocumentLink requires a filter by a single Id on Content Document or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.
Hello Shayeedha,
You have to use the WHERE clause in the SOQL in the code below, otherwise you will get that error.
List<ContentDocumentLink> links = [SELECT ContentDocumentId, LinkedEntityId FROM ContentDocumentLink WHERE LinkedEntityId = :recodId];
Thank You,
Dhanik
Yes, I have already used the WHERE clause and added the same condition as you have stated above.
Let us connect to resolve your issue. Check your email.
Thank You,
Dhanik
This issue is resolved. The button was not placed in the proper location, so the record ID was not being fetched. An alternate solution was to use a Lightning component instead of the flow to extract the required information.
Can you tell me how this issue was resolved? I get the same error. The query is right, so maybe the button placement is wrong. Please help. Thank you.
Hello Harish,
Please check that the record ID has an associated attachment. If you are not able to find the issue, let us connect over LinkedIn.
Thank You,
Dhanik
I am uploading 2 attachments with different mappings. It gives an error
Hello Sandhya,
What error are you getting? Please share a screenshot of it. We can connect to resolve your issue.
Thank You,
Dhanik
It is throwing this error when we upload two documents: "FLOW_ELEMENT_ERROR An Apex error occurred: System.CalloutException: You have uncommitted work pending. Please commit or rollback before calling out"
Hello Rihan,
You can get this error in getImageText when there are multiple files. The code line
EinsteinOCR.extractText(imageUrl, token, 'OCRModel')
calls the API, and then we update the object with the response details. With multiple files this sequence repeats, so a callout follows an uncommitted update. Try updating the response details only once, at the end, after the content of all files has been received. That way all API calls are made first and a single update statement at the end saves all response details. If you are unable to do this, we can connect to resolve your issue.
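A minimal sketch of that ordering is shown below. The TextExtractor interface is a hypothetical seam that stands in for the EinsteinOCR.extractText call, and storing the combined responses in the standard Case Description field is an assumption made only to keep the example small.

```apex
public with sharing class OcrBatchSketch {
    // Hypothetical seam for the callout, so the ordering can be shown in isolation.
    public interface TextExtractor {
        String extract(String imageUrl);
    }

    // Phase 1: callouts only, no DML.
    public static List<String> collectResponses(List<String> imageUrls, TextExtractor extractor) {
        List<String> responses = new List<String>();
        for (String imageUrl : imageUrls) {
            responses.add(extractor.extract(imageUrl));
        }
        return responses;
    }

    // Phase 2: a single update, issued only after every callout has finished.
    public static void saveResponses(Id caseId, List<String> responses) {
        update new Case(Id = caseId, Description = String.join(responses, '\n'));
    }
}
```

Calling collectResponses for all files and then saveResponses once keeps every callout ahead of the first DML statement in the transaction.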
Thank You,
Dhanik
ContentDistribution is not fetching any value. Can you please guide.
@dhaniksahni thanks a lot for quick help.
Glad to help you, Smriti.
Thank You,
Dhanik
Hey Smriti,
Please check this link
Thank You,
Dhanik
Hi Dhanik,
Thank you for the write-up. I’m having challenges with the extracted values populating correctly on the case. I think that it’s a problem with my flow, but I’m not sure where I have gone wrong. Would you be able to help me?
As per our email communication, you need to change the email in EinsteinController. Instead of salesforcecodex@gmail.com, use the email address that you used for setting up Einstein.
Thank You,
Dhanik
is it possible to create a model that can predict labels in invoices/bills or bank statements in einstein ocr?
Yes, it can predict them if we specify the correct index of the label.
Thank You,
Dhanik
Hi Dhanik,
ContentDistribution is not fetching any value. Checked the Line provided above to Smriti. That Feature is Enabled in the Org and all options are checked. What am I missing? Please guide.
It worked! Thank you! I had to Enable public link.
Dear Dhanik,
How to get einstein RSA private key for a sandbox?
I get below error:
“An error occurred while serving your request
It looks like this org doesn’t allow access to the Einstein.ai connected app. Contact your Salesforce admin to allow access, or sign up with a different org.”
It says oauth error. Please help on this
Hello Raksha,
It should work in all orgs. Please check the possible solutions at https://metamind.readme.io/page/troubleshooting
If that does not work, please ping me on LinkedIn or in the Telegram group. We can connect and resolve the issue.
Thank You,
Dhanik
Dear Dhanik,
Thanks for explaining in detail. How to establish the connection with sandbox and generate RSA private key?
Regards,
Raksha
Hi Dhanik,
I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
Please help me out.
Regards.
Hello Vinesh,
Have you checked that you are getting recodId to filter using query? Check that SOQL, how many records are being returned.
Thank You,
Dhanik
Hi Dhanik,
I had issue with button placement, got it resolved, data is populated but the issue now i am facing is ; if the name is VIRESH PATNAIK, it is populating in the field as PATNAIKVIRESH.
Regards,
Viresh
Hello Viresh, instead of merging here, add separate columns and then create a formula field.
Thank You,
Dhanik
Hi Dhanik,
I am getting the following error–> “Error Occurred: An Apex error occurred: System.QueryException: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id’s using the IN operator.”
Please help me out.
Regards.
Same error as Viresh’s. If its resolved for him, Can you explain how to resolve the error.
Hello Harish,
This issue is showing because SOQL cannot find the attachment record. Please check, you are passing the correct recordid to get the attachment.
Thank You,
Dhanik
Dhanik,
Thank you again for the write-up. Any guidance on the test classes that I’ll need in order to push this to production?
Hey Christian,
What kind of guidance or support do you need for the test class?
Thank You,
Dhanik
Hello Matheus,
Have you tried using mock test class instead of using @isTest(SeeAllData=true)? It should work in this situation.
Thank You,
Dhanik
@Dhanik, can you update this solution with test classes? I set up the solution and it is working properly, extracting PDF data.
But I got validation problems because there is no test data.
Hello Ram,
Can you share your test code that is not working so that I can help you?
Thank You,
Dhanik
Hi Dhanik,
This is helpful for document data extraction. Since Salesforce introduced the "Intelligent Form Reader" for document data extraction, which one is the best approach? And do you have any samples for "Intelligent Form Reader"?
Hello Raja,
Intelligent Form Reader is used especially for medical documents in Health Cloud. If your requirement is related to that, you can go ahead with Intelligent Form Reader. Otherwise, you can proceed with the Einstein OCR API or any other AppExchange app.
Thank You
Dhanik
Hi Dhanik,
Whenever we extract the data from the image as text into the Case, I am getting these errors:
1. Image Url == Null
2. ContentDocument Id ={}
Hello Sai,
Please check that you are actually getting the image URL that you pass for extraction.
Thank You,
Dhanik
Hello Dhanik
We are not receiving a token in the response. It returns a "403" error. Can you please help me here?
Thanks
Anamika
Hello Anamika,
It looks like you are not passing valid request information. Please check that you are passing your correct API details. If that does not resolve it, please ping me on LinkedIn.
Thank You,
Dhanik
Hi Dhanik,
We tried to implement it on the Partner Community site and it is throwing an error. When we checked the debug logs we observe that the user has no access to the einstein platform file which is causing the error. Can you please guide us here?
Regards & Thanks,
Abishek.
Hello Abhishek,
If your issue is not resolved, let us connect on LinkedIn.
Thank You,
Dhanik
hii dhanik ,
do we have to upload those jpg images in notes and attachment of case object?? and my another question is .. if instead of image we have to go with pdf.. what changes will it needed..
Hello Ashish,
You can also use Notes and Attachments for this. The only thing you have to take care of is generating a public URL for the API. Yes, you can use PDF for this as well. Please refer to the documentation for this.
Thank You,
Dhanik
Hii Dhanik,
u gave us reference for generating image url.. but its of license plate recognition.. so can we use License plate recognition api for this example also.. or we have to search for any other api on rapidapi.com..
Hello Ashish,
You need to take the code for the triggers ContentVersionExternalLink and ContentDocumentLinkTrigger and the trigger handler ContentTriggerHandler, which generate a public URL for any uploaded image. We then pass the public URL to the OCR API for processing.
Thank You,
Dhanik
thanks dhanik .. it worked for image..
hii dhanik ,,
i want to use this ocr for pdf also.. what changes shall i need to make
Hello Aashish,
Please check Detect Text in PDFs with Einstein OCR (Generally Available)
Thank You,
Dhanik
Hii Dhanik..
I want to perform this functionality for pdf.. but when i upload pdf, i get downloadable pdf url and json data but fields in case objects are not updated.. So can u tell me what changes shall i need to make so that it work for pdf as well… Since i got requirement for pdf and storing data in case object..
thanks
Ashish Sakhare
Hello Ashish,
What issue are you facing in the update? Are you getting values for the fields? Let us connect if you still want to discuss this issue.
Thank You,
Dhanik
Hi I am unable to get the contentDocument ID tried the solutions mentioned above but didn’t work
Hello Namratha,
Please check our other post, Generate Public Link for Salesforce file, to resolve this issue.
Thank You,
Dhanik
Hi Dhanik,
Thank you for this wonderful article. Somehow we are not able to access the signup link on the Einstein AI website. Can you help us figure out what the reason could be? We are getting the error below:
Access to api.einstein.ai was denied
You don’t have authorization to view this page.
HTTP ERROR 403
Hello Dharmendra,
403 is the forbidden access error. Please check your credentials for API access.
Thank You,
Dhanik