
Extract text from an image

The Extract Text transformation lets you extract text from an image.

You can use the extract text option to:

  • Detect text only: returns only the top-left coordinates and dimensions of each detected text.
  • Detect and get the text (default): returns the detected text in addition to its coordinates and dimensions.
Note: Unlike a typical transformation, Extract Text does not deliver its output through the CDN URL. The result is a JSON response from the Context API, while the CDN URL returns the original image without any modifications.


Params

Detect text (detect_only)

When detect_only is set to true, the transformation detects all the text in an image and returns only the positions of the detected text.

The default value is false.

The JSON keys are explained below. Which properties are present in the response depends on the value of the detect_only parameter.

  • data: An array of objects, one per detected text, each containing bbox, text, and confidence values.
  • data[].bbox: The text's coordinates (top, left) and dimensions (height, width), in pixels.
  • data[].text: The text found at the coordinates given in bbox.
  • data[].confidence: The confidence level of the detection, on a scale from 0 to 1.
  • text: All the text detected in the image.
  • bboxes: A list of the coordinates (top, left) and dimensions (height, width), in pixels, of every detected text.
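The following minimal Python sketch shows one way to read these keys into typed objects. It assumes that each bbox is a JSON object with top, left, width, and height keys, and that a position-only response carries bboxes instead of data; verify both assumptions against an actual Context API response. The class and function names are illustrative and are not part of the API.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BBox:
    top: int     # distance from the top edge of the image, in px
    left: int    # distance from the left edge of the image, in px
    width: int   # width of the text region, in px
    height: int  # height of the text region, in px


@dataclass
class DetectedText:
    bbox: BBox
    text: Optional[str]          # None when only positions were returned
    confidence: Optional[float]  # 0 to 1; None when not present


def _to_bbox(raw: dict) -> BBox:
    # ASSUMPTION: bbox is an object with top/left/width/height keys.
    return BBox(top=raw["top"], left=raw["left"],
                width=raw["width"], height=raw["height"])


def parse_extract_text(payload: str) -> List[DetectedText]:
    """Parse an Extract Text JSON body into a flat list of DetectedText."""
    body = json.loads(payload)
    if "data" in body:
        # Default mode: each entry carries bbox, text, and confidence.
        return [
            DetectedText(_to_bbox(item["bbox"]),
                         item.get("text"), item.get("confidence"))
            for item in body["data"]
        ]
    # ASSUMPTION: position-only output is a plain list under "bboxes".
    return [DetectedText(_to_bbox(b), None, None)
            for b in body.get("bboxes", [])]
```

Returning the same DetectedText type for both modes lets calling code handle either response shape, with text and confidence left as None when they are absent.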

In this example, the Context API Response (default) tab shows a JSON response whose bbox property places the detected text 41 px from the top and 57 px from the left, with a width of 188 px and a height of 84 px. The detected text is "OPEN", and the confidence level of the detection is 0.825 out of 1.
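Because bbox gives the top-left corner plus a size, the bottom-right corner is simply (left + width, top + height). The Python sketch below works through this for the "OPEN" example: (57 + 188, 41 + 84) = (245, 125). The dict layout and the 0.8 confidence threshold are assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def corners(bbox: Dict[str, int]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((x1, y1), (x2, y2)): top-left and bottom-right corners in px."""
    x1, y1 = bbox["left"], bbox["top"]
    return (x1, y1), (x1 + bbox["width"], y1 + bbox["height"])


# Values from the "OPEN" example; the dict layout itself is an ASSUMPTION.
item = {"bbox": {"top": 41, "left": 57, "width": 188, "height": 84},
        "text": "OPEN", "confidence": 0.825}

top_left, bottom_right = corners(item["bbox"])
MIN_CONFIDENCE = 0.8  # hypothetical threshold chosen by the caller
if item["confidence"] >= MIN_CONFIDENCE:
    print(item["text"], top_left, bottom_right)  # OPEN (57, 41) (245, 125)
```

Filtering on confidence like this is optional; it is one simple way to discard detections the service is less sure about.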

The Context API Response (detect_only=true) tab shows the JSON response returned when detect_only is set to true. It contains only the coordinates and dimensions of the detected text.
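A position-only response is useful when you want to crop or mask text regions yourself. The Python sketch below converts top/left/width/height boxes into (left, upper, right, lower) tuples, the box order that Pillow's Image.crop accepts, and clips each box to the image bounds. It assumes each entry in bboxes is an object with top, left, width, and height keys, and the 300 x 200 image size in the usage line is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in px


def to_crop_boxes(bboxes: List[Dict[str, int]], img_w: int, img_h: int) -> List[Box]:
    """Convert top/left/width/height bboxes to (left, upper, right, lower)
    boxes, clipped to the image so every box is safe to crop with."""
    boxes = []
    for b in bboxes:
        left = max(0, b["left"])
        upper = max(0, b["top"])
        right = min(img_w, b["left"] + b["width"])
        lower = min(img_h, b["top"] + b["height"])
        if right > left and lower > upper:  # skip boxes entirely outside the image
            boxes.append((left, upper, right, lower))
    return boxes


# Usage with the "OPEN" example box and a hypothetical 300 x 200 image:
print(to_crop_boxes([{"top": 41, "left": 57, "width": 188, "height": 84}], 300, 200))
# [(57, 41, 245, 125)]
```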


Example with multiple texts in a single image

The Context API Response tab shows a JSON response containing multiple detected texts, each with its own bbox; the example image marks each detected text with a yellow box. The JSON response includes both the overall text detected in the image and the individual texts.
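When an image contains several texts, you may want them in reading order. The Python sketch below groups data entries into lines whose top edges lie within a tolerance of each other, then orders each line left to right. This line-grouping heuristic, the 10 px default tolerance, and the sample words are all hypothetical and are not part of the Context API.

```python
from typing import Dict, List


def reading_order(data: List[Dict], line_tolerance: int = 10) -> List[str]:
    """Return detected texts ordered top-to-bottom, then left-to-right.

    Boxes whose top edge is within `line_tolerance` px of the first box
    on the current line are treated as the same line (simple heuristic).
    """
    items = sorted(data, key=lambda d: d["bbox"]["top"])
    lines: List[List[Dict]] = []
    for item in items:
        if lines and item["bbox"]["top"] - lines[-1][0]["bbox"]["top"] <= line_tolerance:
            lines[-1].append(item)   # close enough vertically: same line
        else:
            lines.append([item])     # start a new line
    ordered: List[str] = []
    for line in lines:
        line.sort(key=lambda d: d["bbox"]["left"])  # left to right within a line
        ordered.extend(d["text"] for d in line)
    return ordered


# Hypothetical sample entries (bbox layout is an ASSUMPTION):
sample = [
    {"bbox": {"top": 100, "left": 200, "width": 80, "height": 30}, "text": "SALE"},
    {"bbox": {"top": 104, "left": 20, "width": 60, "height": 30}, "text": "BIG"},
    {"bbox": {"top": 200, "left": 30, "width": 90, "height": 30}, "text": "TODAY"},
]
print(reading_order(sample))  # ['BIG', 'SALE', 'TODAY']
```

Grouping by line before sorting horizontally avoids mis-ordering words whose top edges differ by a few pixels, which a plain sort on (top, left) would do.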
