Extract text from images using keras-ocr in Python

This post explains how to extract text from images using keras-ocr. keras-ocr ships pretrained models for text detection (CRAFT) and text recognition (CRNN), and also provides an end-to-end training pipeline for building new OCR models.

See also: How to convert PDF file to image using Python

See also: Extract text from images using pytesseract

Extracting text with keras-ocr

Let's build a keras-ocr pipeline to extract text from two sample images.

1. Install keras-ocr
   
pip install keras-ocr
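
Before downloading any model weights, it can help to confirm that the package is importable. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# The pip package is "keras-ocr", but the importable module is "keras_ocr".
print("keras_ocr installed:", is_installed("keras_ocr"))
```

If this prints `False`, the install most likely went into a different Python environment than the one running the script.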
     
2. Import keras-ocr and download pretrained weights for the detector and recognizer
  
import keras_ocr 
pipeline = keras_ocr.pipeline.Pipeline()
     
3. Read the images from URLs into image objects
   
images = [
    keras_ocr.tools.read(url) for url in [
        'https://storage.googleapis.com/gcptutorials.com/examples/keras-ocr-img-1.jpg',        
        'https://storage.googleapis.com/gcptutorials.com/examples/keras-ocr-img-2.png'
    ]
]
     

4. Inspect the image objects (each one is a NumPy array of pixel values)
   
print(images[0])
print(images[1])
     
5. Run the pipeline's recognizer on the images. It returns one list of (text, box) predictions per image
   
prediction_groups = pipeline.recognize(images)
     
6. Extract text from the first image
   
predicted_image_1 = prediction_groups[0]
for text, box in predicted_image_1:
    print(text)    
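
The loop above prints one word per prediction, and the words may not come back in reading order. The sketch below is one possible way to group them into lines using the box coordinates. It assumes each box is a sequence of four (x, y) corner points; the helper `to_lines`, its `line_tolerance` parameter, and the sample data are hypothetical and not part of keras-ocr.

```python
def to_lines(predictions, line_tolerance=0.5):
    """Group (text, box) pairs into lines, top to bottom and left to right.

    Each box is assumed to be four (x, y) corner points.
    """
    words = []
    for text, box in predictions:
        xs = [float(p[0]) for p in box]
        ys = [float(p[1]) for p in box]
        # Keep the text, left edge, vertical center, and height of each word.
        words.append((text, min(xs), (min(ys) + max(ys)) / 2, max(ys) - min(ys)))

    # Sort by vertical center so words on the same line end up adjacent.
    words.sort(key=lambda w: w[2])

    lines = []
    for text, left, center_y, height in words:
        # A word joins the current line if its center is close to that line's
        # first word; otherwise it starts a new line.
        if lines and abs(center_y - lines[-1]["center_y"]) <= line_tolerance * height:
            lines[-1]["words"].append((left, text))
        else:
            lines.append({"center_y": center_y, "words": [(left, text)]})

    # Within each line, order the words by their left edge.
    return [" ".join(t for _, t in sorted(line["words"])) for line in lines]


# Made-up predictions standing in for prediction_groups[0].
sample_predictions = [
    ("world", [(60, 10), (110, 10), (110, 30), (60, 30)]),
    ("hello", [(5, 12), (55, 12), (55, 32), (5, 32)]),
    ("again", [(5, 50), (55, 50), (55, 70), (5, 70)]),
]
print(to_lines(sample_predictions))  # ['hello world', 'again']
```

With real output, `to_lines(prediction_groups[0])` would be the equivalent call. The tolerance is a rough heuristic and may need tuning for skewed or multi-column images.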
     
7. Extract text from the second image
   
predicted_image_2 = prediction_groups[1]
for text, box in predicted_image_2:
    print(text)
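
Handling each image with its own block of code gets repetitive with more than a couple of images. As a rough sketch, the same extraction can be written once for any number of images. The helper `words_per_source` and the sample data below are hypothetical; only the (text, box) structure comes from the steps above.

```python
def words_per_source(sources, prediction_groups):
    """Map each image source (URL or path) to its list of recognized words."""
    return {
        source: [text for text, _box in predictions]
        for source, predictions in zip(sources, prediction_groups)
    }


# Made-up stand-ins for the URL list and for pipeline.recognize(images).
example_sources = ["img-1.jpg", "img-2.png"]
example_groups = [
    [("hello", [(5, 12), (55, 12), (55, 32), (5, 32)])],
    [("keras", [(0, 0), (40, 0), (40, 20), (0, 20)]),
     ("ocr", [(45, 0), (70, 0), (70, 20), (45, 20)])],
]

for source, words in words_per_source(example_sources, example_groups).items():
    print(source, words)
```

This relies on `prediction_groups` being in the same order as the images passed to the recognizer.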
     

8. Complete code snippet to extract text with keras-ocr in Python
   
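import keras_ocr

# Creating the pipeline downloads the pretrained detector and recognizer weights
pipeline = keras_ocr.pipeline.Pipeline()

# Read each image from its URL into an image object
images = [
    keras_ocr.tools.read(url) for url in [
        'https://storage.googleapis.com/gcptutorials.com/examples/keras-ocr-img-1.jpg',
        'https://storage.googleapis.com/gcptutorials.com/examples/keras-ocr-img-2.png'
    ]
]

# Inspect the image objects
print(images[0])
print(images[1])

# One list of (text, box) predictions per image
prediction_groups = pipeline.recognize(images)

# Text found in the first image
predicted_image_1 = prediction_groups[0]
for text, box in predicted_image_1:
    print(text)

# Text found in the second image
predicted_image_2 = prediction_groups[1]
for text, box in predicted_image_2:
    print(text)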
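
To keep the results rather than only print them, the predictions can be turned into plain Python data and written out as JSON. This is a minimal sketch, assuming each box is a sequence of four (x, y) points; `predictions_to_records` and the sample data are illustrative names, not part of keras-ocr.

```python
import json


def predictions_to_records(predictions):
    """Convert (text, box) pairs into plain dicts that json.dumps can handle."""
    records = []
    for text, box in predictions:
        # float() turns NumPy scalars into plain Python numbers.
        corners = [[float(x), float(y)] for x, y in box]
        records.append({"text": text, "box": corners})
    return records


# Made-up predictions standing in for one entry of prediction_groups.
json_sample = [("hello", [(5, 12), (55, 12), (55, 32), (5, 32)])]
print(json.dumps(predictions_to_records(json_sample), indent=2))
```

Storing the box corners with the text makes it possible to draw or filter the detections later without running the models again.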
    
     

Category: TensorFlow