Extract text from an image

The Extract Text transformation extracts the text present in an image. You can choose either to detect text only or to detect and extract it.

This transformation supports the following file types: png, jpeg, jpg, webp, cr2, nef, rw2, dng, orf, raw, heic, heif, avif, tiff, and tif.

You can use the extract text option to:

  • Detect text only: Returns only the top-left coordinates and dimensions of each detected text region.
  • Detect and get the text (default): Returns the detected text in addition to the coordinates and dimensions.

Instead of delivering the result through a traditional CDN URL, this transformation returns a JSON output from the Context API. The CDN URL returns the original image without any modifications.


Detect text (detect_only)

Detect Text detects all the text in an image and returns the position of each text region.

The default value is false.

The JSON keys are explained in the table below. Which properties are present depends on the parameter value.

Key                 Description
data                An array with one entry per detected text, each holding bbox, text and confidence values.
data[].bbox         The text's coordinates (top, left) and dimensions (height, width) in pixels.
data[].text         The actual text at the given coordinates (bbox).
data[].confidence   The confidence level of the text detection (out of 1).
text                All the text detected in the image.
bboxes              A list of the coordinates (top, left) and dimensions (height, width), in pixels, of every detected text.

In the Context API Response (default) tab, the JSON response gives the coordinates and dimensions in the bbox property: the text is 41 px from the top and 57 px from the left, and is 188 px wide and 84 px high. The detected text is "OPEN", and the confidence level of the detection is 0.825 out of 1.
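To make the default response easier to picture, here is a minimal Python sketch that parses a payload built from the values in this example. The payload shape is an assumption: the key names come from the table of JSON keys, but the exact nesting of bbox (shown here as an object with top, left, width and height) is not confirmed on this page. TextRegion and parse_regions are hypothetical helper names, not part of any SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class TextRegion:
    """One detected text with its position, size and confidence."""
    top: int
    left: int
    width: int
    height: int
    text: str
    confidence: float


# Hypothetical payload assembled from the values described in this example.
# The real response may nest or name the bbox fields differently.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "bbox": {"top": 41, "left": 57, "width": 188, "height": 84},
      "text": "OPEN",
      "confidence": 0.825
    }
  ],
  "text": "OPEN"
}
"""


def parse_regions(payload: str) -> list[TextRegion]:
    """Convert the assumed default response into TextRegion objects."""
    body = json.loads(payload)
    regions = []
    for item in body.get("data", []):
        bbox = item["bbox"]
        regions.append(
            TextRegion(
                top=bbox["top"],
                left=bbox["left"],
                width=bbox["width"],
                height=bbox["height"],
                text=item["text"],
                confidence=item["confidence"],
            )
        )
    return regions


regions = parse_regions(SAMPLE_RESPONSE)
for region in regions:
    print(
        f"{region.text!r} at top={region.top}, left={region.left}, "
        f"{region.width}x{region.height} px, confidence {region.confidence}"
    )
```

Keeping the parsing in one small function means only that function needs to change if the actual bbox layout turns out to differ from the assumption.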

The Context API Response (detect_only=true) tab contains the JSON response obtained when the detect_only parameter is set to true. It returns only the coordinates and dimensions.
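Since the two modes return different keys, client code may want to read bounding boxes without caring which mode produced the response. The sketch below is one way to do that in Python. It assumes that a detect_only response carries a top-level bboxes list and that a default response carries data[].bbox, as the table of JSON keys suggests; the bbox field layout and the extract_boxes helper are hypothetical.

```python
from typing import Any


def extract_boxes(body: dict[str, Any]) -> list[dict[str, int]]:
    """Return bounding boxes from either assumed response shape."""
    if "bboxes" in body:
        # Assumed detect_only=true shape: positions only, no text.
        return list(body["bboxes"])
    # Assumed default shape: one bbox per entry in data.
    return [item["bbox"] for item in body.get("data", [])]


# Hypothetical payloads; field layout inside each bbox is an assumption.
detect_only_body = {
    "bboxes": [{"top": 41, "left": 57, "width": 188, "height": 84}],
}
default_body = {
    "data": [
        {
            "bbox": {"top": 41, "left": 57, "width": 188, "height": 84},
            "text": "OPEN",
            "confidence": 0.825,
        }
    ],
    "text": "OPEN",
}

print(extract_boxes(detect_only_body))
print(extract_boxes(default_body))
```

Both calls yield the same list of boxes, so downstream drawing or cropping code does not need to branch on the mode.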

Example with multiple texts in a single image

The Context API Response tab contains a JSON response with multiple detected texts, each with its own bbox. The detected texts are outlined with yellow boxes in the image below. Both the overall text and the individual texts detected in the image are highlighted in the JSON response.
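When an image contains several texts, it can help to drop low-confidence detections and order the rest roughly top to bottom, left to right. The Python sketch below shows one possible approach. The sample entries are invented for illustration and are not taken from the image on this page; the bbox layout, the 0.5 threshold and the confident_texts helper are all assumptions.

```python
from typing import Any


def confident_texts(
    data: list[dict[str, Any]], min_confidence: float = 0.5
) -> list[str]:
    """Keep detections at or above the threshold, ordered by top then left."""
    kept = [item for item in data if item["confidence"] >= min_confidence]
    kept.sort(key=lambda item: (item["bbox"]["top"], item["bbox"]["left"]))
    return [item["text"] for item in kept]


# Invented sample entries in the assumed data[] shape.
sample_data = [
    {"bbox": {"top": 120, "left": 30, "width": 90, "height": 40},
     "text": "SALE", "confidence": 0.91},
    {"bbox": {"top": 20, "left": 200, "width": 60, "height": 30},
     "text": "50%", "confidence": 0.42},
    {"bbox": {"top": 20, "left": 30, "width": 150, "height": 30},
     "text": "SUMMER", "confidence": 0.88},
]

print(confident_texts(sample_data))  # ['SUMMER', 'SALE']
```

Sorting on the exact top value is a simplification: texts on the same visual line rarely share an identical top coordinate, so real layouts may need the values grouped into rows first.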
