Recognize Text in Images with Firebase ML on Android, with Text-to-Speech
If you're like me, chances are that at some point in your life you wanted to do this badly for one of your projects. I tried many things to achieve it, but doing it on mobile is the greatest superpower you can have.
Before starting, do take a look at the official documentation for up-to-date content. One thing to keep in mind is that I'm talking about cloud text recognition here, not on-device text recognition. Also note that this method is for images in general, not for images of documents.
You should have some prior knowledge of Android Studio to follow along. The project is on GitHub.
First create an empty project with an Empty Activity.
Set the minimum SDK level to 21. You can enable legacy libraries if you want, but I don't think it is a must.
Once the project is created, open activity_main.xml and create an ImageView, a TextView and three Buttons, one for each action (capture image, convert to text, text-to-speech). I used RelativeLayout, but you can use ConstraintLayout. Use properties like the following.
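As a sketch of that layout — the view IDs (imageView, textView, captureBtn, detectBtn, speakBtn) and the exact attributes are my own naming, not preserved from the article — activity_main.xml might look like this:

```xml
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:padding="16dp">

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="250dp"
        android:layout_alignParentTop="true"
        android:contentDescription="Captured image" />

    <TextView
        android:id="@+id/textView"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@id/imageView"
        android:layout_marginTop="16dp" />

    <Button
        android:id="@+id/captureBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_alignParentStart="true"
        android:text="Capture" />

    <Button
        android:id="@+id/detectBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:text="Detect" />

    <Button
        android:id="@+id/speakBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_alignParentEnd="true"
        android:text="Speak" />
</RelativeLayout>
```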
Since we're accessing the camera, we need to declare permissions for that too. Open AndroidManifest.xml and add the camera permission and feature declarations inside the manifest element, before the application tag.
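The exact lines from the original article were not preserved, but a typical camera declaration looks like this:

```xml
<!-- Inside <manifest>, before <application> -->
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="true" />
```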
Now open up MainActivity.java and we'll start coding.
First create variables for the screen controls and get their references.
Now create on-click listeners for the three buttons: after initializing the variables in the onCreate function, set a listener on each button.
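Under those assumptions, the skeleton could look like this. The view IDs and the speakText() helper are my own naming; dispatchTakePictureIntent() and detectTextFromImage() are the names the article uses later.

```java
import android.graphics.Bitmap;
import android.os.Bundle;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity {

    private ImageView imageView;
    private TextView textView;
    private Button captureBtn, detectBtn, speakBtn;
    private Bitmap imageBitmap;  // the captured photo lives here

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Get references to the screen controls
        imageView = findViewById(R.id.imageView);
        textView = findViewById(R.id.textView);
        captureBtn = findViewById(R.id.captureBtn);
        detectBtn = findViewById(R.id.detectBtn);
        speakBtn = findViewById(R.id.speakBtn);

        // One listener per action
        captureBtn.setOnClickListener(v -> dispatchTakePictureIntent());
        detectBtn.setOnClickListener(v -> detectTextFromImage());
        speakBtn.setOnClickListener(v -> speakText());
    }
}
```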
Now, before moving forward, we need to add Firebase to the project.
Go to https://console.firebase.google.com and click on ‘Add Project’.
Provide a project name.
Select default account for analytics.
After the project is created, you will be sent to the project page. Here you need to add a new app to the project; click the Android icon to add an Android app.
Add the package name (you can find it in the app-level build.gradle file, as applicationId). Follow the guide, download the google-services.json file, and copy it to the specified location. In the next step, modify the Gradle files as advised.
But instead of implementation ‘com.google.firebase:firebase-analytics:17.2.2’, we will add the following two lines.
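The article's exact two lines were not preserved; for its era (early 2020), the dependencies would likely have been something like the following. The versions are my assumption, and the firebase-ml-vision library has since been deprecated, so check the current Firebase docs before copying.

```groovy
// app-level build.gradle
dependencies {
    implementation 'com.google.firebase:firebase-core:17.2.2'
    implementation 'com.google.firebase:firebase-ml-vision:24.0.1'
}
```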
After that is done, we can move forward.
When the user clicks the ‘Capture’ button, we take a picture using the device camera and place it in the ImageView (the dispatchTakePictureIntent() function). Then the user clicks ‘Detect’, which uses Firebase ML to convert the image to text; the output is stored in the TextView (the detectTextFromImage() function). Finally, the ‘Speak’ button reads it out loud. You will notice that we have set the text-to-speech language to English. Now we will implement those functions and some other helper functions.
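For the ‘Speak’ part, the text-to-speech setup could be sketched as below. The field and method names are mine; the only detail from the article is that the language is set to English.

```java
import android.speech.tts.TextToSpeech;
import java.util.Locale;

private TextToSpeech tts;

// Call this from onCreate(); initialization is asynchronous
private void initTextToSpeech() {
    tts = new TextToSpeech(this, status -> {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.ENGLISH);  // the article uses English
        }
    });
}

private void speakText() {
    String text = textView.getText().toString();
    // QUEUE_FLUSH drops any utterance already in progress
    tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "utteranceId");
}

@Override
protected void onDestroy() {
    // Release the TTS engine when the activity goes away
    if (tts != null) {
        tts.stop();
        tts.shutdown();
    }
    super.onDestroy();
}
```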
After the onCreate function, add the following four functions. The first two take a picture using the device camera.
The Android way of delegating actions to other applications is to invoke an Intent that describes what you want done. This process involves three pieces: the Intent itself, a call to start the external Activity, and some code to handle the image data when focus returns to your activity.
If the simple feat of taking a photo is not the culmination of your app’s ambition, then you probably want to get the image back from the camera application and do something with it.
The Android Camera application encodes the photo in the return Intent delivered to onActivityResult() as a small Bitmap in the extras, under the key "data". The following code retrieves this image and displays it in an ImageView.
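A sketch of those two functions, following the standard camera-intent pattern. The REQUEST_IMAGE_CAPTURE constant and field names are my assumptions; dispatchTakePictureIntent() is the name the article uses.

```java
import android.content.Intent;
import android.graphics.Bitmap;
import android.os.Bundle;
import android.provider.MediaStore;

static final int REQUEST_IMAGE_CAPTURE = 1;

private void dispatchTakePictureIntent() {
    Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    // Only launch if a camera app exists to handle the intent
    if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
        startActivityForResult(takePictureIntent, REQUEST_IMAGE_CAPTURE);
    }
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQUEST_IMAGE_CAPTURE && resultCode == RESULT_OK) {
        Bundle extras = data.getExtras();
        imageBitmap = (Bitmap) extras.get("data");  // small thumbnail Bitmap
        imageView.setImageBitmap(imageBitmap);
    }
}
```

Note that the "data" extra is only a thumbnail; a full-size photo requires saving to a file URI instead.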
The last two functions detect text from the image we now have as a Bitmap. Here we create a FirebaseVisionImage object from the Bitmap and use a FirebaseVisionTextDetector to get the text in it.
If the text recognition operation succeeds, a FirebaseVisionText object will be passed to the success listener. A FirebaseVisionText object contains the full text recognized in the image and zero or more TextBlock objects. Each TextBlock represents a rectangular block of text, which contains zero or more Line objects. Each Line object contains zero or more Element objects, which represent words and word-like entities (dates, numbers, and so on). For each TextBlock, Line, and Element object, you can get the text recognized in the region and the bounding coordinates of the region.
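Putting that together, the detection step might look like the sketch below. The article names FirebaseVisionTextDetector; later versions of the (now-deprecated) Firebase ML Kit library replaced it with FirebaseVisionTextRecognizer, which is what I use here, so treat the exact class names as era-dependent.

```java
import android.util.Log;
import android.widget.Toast;
import com.google.firebase.ml.vision.FirebaseVision;
import com.google.firebase.ml.vision.common.FirebaseVisionImage;
import com.google.firebase.ml.vision.text.FirebaseVisionText;
import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer;

private void detectTextFromImage() {
    // Wrap the captured Bitmap for Firebase ML
    FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(imageBitmap);
    // Cloud recognizer, since the article covers cloud text recognition
    FirebaseVisionTextRecognizer recognizer =
            FirebaseVision.getInstance().getCloudTextRecognizer();

    recognizer.processImage(image)
            .addOnSuccessListener(result -> {
                // Full recognized text in one string
                textView.setText(result.getText());
                // Or walk the hierarchy: TextBlock -> Line -> Element
                for (FirebaseVisionText.TextBlock block : result.getTextBlocks()) {
                    for (FirebaseVisionText.Line line : block.getLines()) {
                        for (FirebaseVisionText.Element element : line.getElements()) {
                            Log.d("OCR", element.getText());
                        }
                    }
                }
            })
            .addOnFailureListener(e ->
                    Toast.makeText(this, e.getMessage(), Toast.LENGTH_SHORT).show());
}
```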
That's it. I hope you learned something new! 😎
Edit: I made an automated text recognition app for blind people based on this. Feel free to go through the code and see what I did. I used an old version of CameraX library and Firebase with some basic UI automation to do this. [APKs]