Recognize Text in Images with Firebase ML on Android with Text-to-Speech.
Cloud text recognition
Chances are that at some point in your life you've wanted to do this badly for one of your projects, if you're like me. I tried many things to achieve it, but doing it on mobile is the greatest superpower you can have.
Before starting, do take a look at the official documentation for updated content. One thing to keep in mind is that here I'm talking about cloud text recognition, not on-device text recognition. Also note that this method is for images, not for images of documents.
You should have some prior knowledge of Android Studio to follow along. The project is on GitHub.
First create an empty project with an Empty Activity.
Set the minimum SDK level to 21. If you want, you can enable legacy libraries, but I don't think it's a must.
Once the project is created, open activity_main.xml and create an ImageView, a TextView, and three Buttons, one for each action (capture image, convert to text, text-to-speech). I used a RelativeLayout, but you can use a ConstraintLayout. Use the following properties.
Since we're accessing the camera, we need to declare that in the manifest too. Open AndroidManifest.xml and put the following line just before the <application> tag.
<uses-feature android:name="android.hardware.camera"
android:required="true" />
..and the following inside the <application> tag as well.
<meta-data
android:name="com.google.firebase.ml.vision.DEPENDENCIES"
android:value="ocr" />
Now open up MainActivity.java and we'll start coding.
First create variables for the screen controls and get their references.
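As a rough sketch, assuming view and ID names of my own choosing (match them to whatever IDs you used in activity_main.xml), the fields and references might look like this:

private ImageView imageView;          // shows the captured photo
private TextView textView;            // shows the recognized text
private Button captureBtn, detectBtn, speakBtn;
private Bitmap imageBitmap;           // the photo returned by the camera
private TextToSpeech textToSpeech;    // used by the 'Speak' button

// Inside onCreate(), after setContentView(R.layout.activity_main):
imageView  = findViewById(R.id.imageView);
textView   = findViewById(R.id.textView);
captureBtn = findViewById(R.id.captureBtn);
detectBtn  = findViewById(R.id.detectBtn);
speakBtn   = findViewById(R.id.speakBtn);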
Now create click listeners for the three buttons. After initializing the textView inside the onCreate function, add the following.
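A minimal sketch of that wiring, assuming the field names from above and Java 8 lambdas (use anonymous OnClickListener classes if your project doesn't enable Java 8); dispatchTakePictureIntent() and detectTextFromImage() are implemented later in the article:

// Inside onCreate(), after the views have been initialized.
textToSpeech = new TextToSpeech(this, status -> {
    if (status == TextToSpeech.SUCCESS) {
        // English, as mentioned later in the article.
        textToSpeech.setLanguage(Locale.ENGLISH);
    }
});

captureBtn.setOnClickListener(v -> dispatchTakePictureIntent());

detectBtn.setOnClickListener(v -> detectTextFromImage());

speakBtn.setOnClickListener(v -> {
    String text = textView.getText().toString();
    // The 4-argument speak() overload requires API 21, which is our minimum SDK.
    textToSpeech.speak(text, TextToSpeech.QUEUE_FLUSH, null, null);
});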
Now, before moving forward, we need to add Firebase to the project.
Go to https://console.firebase.google.com and click 'Add Project'.
Provide a project name.
Enable analytics.
Select the default account for analytics.
After the project is created, you will be taken to the project page. Here you need to add a new app to the project. Click the Android icon to add an Android app.
Add the package name (you can find it as the applicationId in your app-level build.gradle file).
Follow the guide, download the google-services.json file, and copy it to the specified location.
In the next step, modify the Gradle files as advised. But instead of implementation 'com.google.firebase:firebase-analytics:17.2.2', we will add the following two lines.
implementation 'com.google.firebase:firebase-core:15.0.2'
implementation 'com.google.firebase:firebase-ml-vision:15.0.0'
Afterwards, click Sync Now.
After that is done, we can move forward.
When the user clicks the 'Capture' button, we take a picture using the device camera and place it in the imageView (the dispatchTakePictureIntent() function). Then the user clicks 'Detect', which uses Firebase ML to convert the image to text; the output is stored in the textView (the detectTextFromImage() function). Finally, the 'Speak' button reads it out loud. You will notice that we have set the text-to-speech language to English. Now we will implement those functions and some other helper functions.
After the onCreate function, add the following four functions.
The first two functions take a picture using the device camera.
The Android way of delegating actions to other applications is to invoke an Intent that describes what you want done. This process involves three pieces: the Intent itself, a call to start the external Activity, and some code to handle the image data when focus returns to your activity.
If the simple feat of taking a photo is not the culmination of your app’s ambition, then you probably want to get the image back from the camera application and do something with it.
The Android Camera application encodes the photo in the return Intent delivered to onActivityResult() as a small Bitmap in the extras, under the key "data". The following code retrieves this image and displays it in an ImageView.
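Put together, the first two functions can look roughly like this. It follows the standard pattern from the Android documentation quoted above; REQUEST_IMAGE_CAPTURE and the field names are placeholders of my choosing:

static final int REQUEST_IMAGE_CAPTURE = 1;

private void dispatchTakePictureIntent() {
    // Ask a camera app on the device to take the photo for us.
    Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
        startActivityForResult(takePictureIntent, REQUEST_IMAGE_CAPTURE);
    }
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQUEST_IMAGE_CAPTURE && resultCode == RESULT_OK) {
        // The camera app returns a small thumbnail Bitmap under the "data" key.
        Bundle extras = data.getExtras();
        imageBitmap = (Bitmap) extras.get("data");
        imageView.setImageBitmap(imageBitmap);
    }
}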
The last two functions are used to detect text from the image we now have as a Bitmap. Here we create a FirebaseVisionImage object from the Bitmap and use a FirebaseVisionTextDetector to get the text in it.
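As a sketch, the detection function can look like the following. The exact method names changed across firebase-ml-vision releases, so treat this as an approximation for the 15.x dependency added earlier; displayText() is a hypothetical helper shown further down:

private void detectTextFromImage() {
    // Wrap the captured Bitmap so Firebase ML Vision can work with it.
    FirebaseVisionImage visionImage = FirebaseVisionImage.fromBitmap(imageBitmap);
    FirebaseVisionTextDetector detector = FirebaseVision.getInstance().getVisionTextDetector();

    detector.detectInImage(visionImage)
            .addOnSuccessListener(firebaseVisionText -> displayText(firebaseVisionText))
            .addOnFailureListener(e ->
                    Toast.makeText(this, "Error: " + e.getMessage(), Toast.LENGTH_SHORT).show());
}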
If the text recognition operation succeeds, a FirebaseVisionText object will be passed to the success listener. A FirebaseVisionText object contains the full text recognized in the image and zero or more TextBlock objects.
Each TextBlock represents a rectangular block of text, which contains zero or more Line objects. Each Line object contains zero or more Element objects, which represent words and word-like entities (dates, numbers, and so on).
For each TextBlock, Line, and Element object, you can get the text recognized in the region and the bounding coordinates of the region.
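For illustration, a hypothetical helper that walks this structure might look like the sketch below. Note that the nested class names differ between releases; the early 15.x artifact exposes them as FirebaseVisionText.Block, Line, and Element:

private void displayText(FirebaseVisionText result) {
    StringBuilder builder = new StringBuilder();
    for (FirebaseVisionText.Block block : result.getBlocks()) {
        Rect blockFrame = block.getBoundingBox();   // bounding coordinates of the block
        for (FirebaseVisionText.Line line : block.getLines()) {
            for (FirebaseVisionText.Element element : line.getElements()) {
                // Each element is a word or word-like entity.
                builder.append(element.getText()).append(' ');
            }
            builder.append('\n');
        }
    }
    textView.setText(builder.toString());
}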
That's it. I hope you learned something new! 😎
Edit: I made an automated text recognition app for blind people based on this. Feel free to go through the code and see what I did. I used an old version of the CameraX library and Firebase with some basic UI automation to do this. [APKs]