(Originally published elsewhere in 2018), AWS Rekognition Text detection limited to 50 words.
AWS Rekognition Text detection limited to 50 words
I quickly built an API that uses AWS Rekognition, just to test its text detection capability.
Approach
I created a Flask API and used boto3.
import boto3
rek = boto3.client('rekognition', region_name="us-east-1")
After that I passed the uploaded image bytes directly to a detect_text call, which was not too tough.
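As a rough sketch of that call (not the exact code from my API), the helper below sends raw image bytes to Rekognition and returns the detections. The function name detect_text_in_image is made up for illustration; the `rek` argument is assumed to be the boto3 client created above.

```python
def detect_text_in_image(rek, image_bytes):
    """Send raw image bytes to Rekognition and return its list of detections.

    `rek` is assumed to be a boto3 Rekognition client; the helper name is
    hypothetical and only used for this example.
    """
    # detect_text accepts the image either as raw bytes or as an S3 object;
    # here the bytes are passed straight through.
    response = rek.detect_text(Image={"Bytes": image_bytes})
    # Every detected line and word is an entry in TextDetections.
    return response["TextDetections"]
```

In a Flask view the bytes could come from something like request.files["image"].read(), where the form field name "image" is an assumption on my part.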
Detect text result
I uploaded an image with a small number of words and was pleased with the result. However, when I uploaded an image containing a paragraph, I found that only a subset of the words was returned.
The limit is 50 words:
“DetectText can detect up to 50 words in an image.”[0]
Text result response
The response tags each detected item with a Type, either “LINE” or “WORD”, and a WORD item also carries a ParentId pointing at the line it belongs to, so I filtered just the lines like this:
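# response is assumed to hold the result of the detect_text call
lines = [label['DetectedText'] for label in response['TextDetections'] if label['Type'] == "LINE"]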
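To illustrate how the Type and ParentId fields fit together, here is a minimal sketch that pairs each line with the words belonging to it. The function name lines_with_words and the sample data are invented for the example; only the field names (Id, ParentId, Type, DetectedText) come from the Rekognition response.

```python
from collections import defaultdict


def lines_with_words(detections):
    """Pair each LINE's text with the WORD texts whose ParentId points at it."""
    words_by_parent = defaultdict(list)
    for d in detections:
        if d["Type"] == "WORD":
            words_by_parent[d["ParentId"]].append(d["DetectedText"])
    return [
        (d["DetectedText"], words_by_parent[d["Id"]])
        for d in detections
        if d["Type"] == "LINE"
    ]


# Illustrative data shaped like a TextDetections list (not real API output).
sample = [
    {"Id": 0, "Type": "LINE", "DetectedText": "Hello world"},
    {"Id": 1, "Type": "WORD", "ParentId": 0, "DetectedText": "Hello"},
    {"Id": 2, "Type": "WORD", "ParentId": 0, "DetectedText": "world"},
]
pairs = lines_with_words(sample)
```

Filtering on LINE alone is enough when you only want readable text; the grouping is only useful if you need per-word details such as confidence or position.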
It works and the result is great, but for images with a larger number of words I am inclined to just run them through Tesseract OCR instead.
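One way that fallback might look, as an untested sketch: check whether Rekognition returned as many words as the documented limit, and if so hand the image to Tesseract. The helper names and the WORD_LIMIT constant are my own; the Tesseract part assumes the pytesseract and Pillow packages plus the tesseract binary are installed.

```python
WORD_LIMIT = 50  # the limit quoted from the Rekognition docs above


def hit_word_limit(detections, limit=WORD_LIMIT):
    """Return True when the response contains at least `limit` WORD items,
    which suggests some text in the image may have been cut off."""
    words = [d for d in detections if d.get("Type") == "WORD"]
    return len(words) >= limit


def tesseract_text(image_bytes):
    """OCR the image locally with Tesseract (assumed to be installed)."""
    # Imported here so the limit check above works without these packages.
    import io
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))
```

The idea would be to call tesseract_text only when hit_word_limit returns True, so short images still get the Rekognition result.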
[0] - https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html (Last paragraph)
Originally published elsewhere on 08 July, 2019.