Extract text from an image
The Extract Text transformation detects and extracts the text in an image.
You can use the extract text option to:
- Detect text only: Returns only the top-left coordinates and dimensions of each detected text.
- Detect and get the text (default): Returns the detected text in addition to its coordinates and dimensions.
This transformation does not deliver its output through the usual CDN URL. The result is returned as JSON from the Context API, while the CDN URL returns the original image without any modifications.
Params
Detect text (detect_only)
Detect Text detects all the text in an image and returns its positions. The default value is false.
- Input Image
- Context API Response (default)
- Context API Response (detect_only=true)
- URL
- React
<PixelBinImage url="https://cdn.pixelbin.io/v2/dummy-cloudname/FIDrmb/original/images/transformation/ocr_text_detect.jpeg" />
{
"output": {
"data": [
{
"bbox": {
"top": 41,
"left": 57,
"width": 188,
"height": 84
},
"text": "OPEN",
"confidence": 0.825
}
],
"text": "OPEN"
}
}
{
"output": {
"bboxes": [
{
"top": 41,
"left": 57,
"width": 188,
"height": 84
}
]
}
}
The table below explains the JSON keys. Which properties are present depends on the value of the detect_only parameter.
Property | Description |
---|---|
data | An array of detected text entries, each with bbox, text, and confidence values. |
data[].bbox | The text's coordinates (top, left) and dimensions (width, height) in pixels. |
data[].text | The text detected at the given coordinates (bbox). |
data[].confidence | The confidence level of the detected text (out of 1). |
text | All the text detected in the image. |
bboxes | A list of the coordinates (top, left) and dimensions (width, height), in pixels, of every detected text. |
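As a rough illustration of how these properties fit together, the Python sketch below reads a default response and keeps only the entries whose confidence meets a threshold. The helper name `confident_texts` and the threshold values are assumptions made for this example and are not part of the API. The sample payload is the default response for the "OPEN" image.

```python
import json

# Default Context API response for the "OPEN" example image.
RESPONSE = json.loads("""
{
  "output": {
    "data": [
      {
        "bbox": {"top": 41, "left": 57, "width": 188, "height": 84},
        "text": "OPEN",
        "confidence": 0.825
      }
    ],
    "text": "OPEN"
  }
}
""")


def confident_texts(response: dict, min_confidence: float) -> list[str]:
    """Return data[].text for entries with confidence >= min_confidence.

    Hypothetical helper: the name and the thresholding are illustrative only.
    """
    entries = response.get("output", {}).get("data", [])
    return [e["text"] for e in entries if e["confidence"] >= min_confidence]


print(confident_texts(RESPONSE, 0.8))  # ['OPEN']
print(confident_texts(RESPONSE, 0.9))  # []
```

A response obtained with detect_only=true has no data array, so this helper would return an empty list for it.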
In the Context API Response (default) tab, the bbox property gives the position and size of the detected text: 41 px from the top, 57 px from the left, 188 px wide, and 84 px high. The text detected in the image is "OPEN", and the confidence level of the detection is 0.825 out of 1.
The Context API Response (detect_only=true) tab shows the JSON response returned when the detect_only parameter is set to true. It contains only the coordinates and dimensions.
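Because the two modes return differently shaped JSON, client code may want to read the boxes the same way in both cases. The following Python sketch shows one possible approach. The function name `extract_boxes` is hypothetical, and the two sample dictionaries mirror the responses for the "OPEN" image.

```python
def extract_boxes(response: dict) -> list[dict]:
    """Return the bounding boxes from either response shape.

    Hypothetical helper, not part of any SDK:
    - detect_only=true  -> boxes are listed under output.bboxes
    - default           -> boxes are nested under output.data[].bbox
    """
    output = response.get("output", {})
    if "bboxes" in output:
        return list(output["bboxes"])
    return [entry["bbox"] for entry in output.get("data", [])]


# Sample payloads copied from the "OPEN" example.
DEFAULT_RESPONSE = {
    "output": {
        "data": [
            {
                "bbox": {"top": 41, "left": 57, "width": 188, "height": 84},
                "text": "OPEN",
                "confidence": 0.825,
            }
        ],
        "text": "OPEN",
    }
}
DETECT_ONLY_RESPONSE = {
    "output": {"bboxes": [{"top": 41, "left": 57, "width": 188, "height": 84}]}
}

# Both shapes yield the same list of boxes for this image.
print(extract_boxes(DEFAULT_RESPONSE) == extract_boxes(DETECT_ONLY_RESPONSE))
```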
Example with multiple texts in a single image
- Input Image
- Context API Response (default)
- Context API Response (detect_only=true)
- URL
- React
<PixelBinImage url="https://cdn.pixelbin.io/v2/dummy-cloudname/FIDrmb/original/images/transformation/multiple-extract-text.png" />
{
"output": {
"data": [
{
"bbox": {
"top": 221,
"left": 463,
"width": 364,
"height": 134
},
"text": "KEEP",
"confidence": 1
},
{
"bbox": {
"top": 377,
"left": 425,
"width": 433,
"height": 144
},
"text": "CALM",
"confidence": 1
},
{
"bbox": {
"top": 537,
"left": 563,
"width": 158,
"height": 66
},
"text": "AND",
"confidence": 1
},
{
"bbox": {
"top": 621,
"left": 449,
"width": 392,
"height": 134
},
"text": "LOVE",
"confidence": 1
},
{
"bbox": {
"top": 783,
"left": 189,
"width": 490,
"height": 138
},
"text": "BLACK",
"confidence": 1
},
{
"bbox": {
"top": 787,
"left": 715,
"width": 386,
"height": 134
},
"text": "CATS",
"confidence": 1
}
],
"text": "KEEP CALM AND LOVE BLACK CATS"
}
}
{
"output": {
"bboxes": [
{
"top": 221,
"left": 463,
"width": 364,
"height": 134
},
{
"top": 377,
"left": 425,
"width": 433,
"height": 144
},
{
"top": 537,
"left": 563,
"width": 158,
"height": 66
},
{
"top": 621,
"left": 449,
"width": 392,
"height": 134
},
{
"top": 783,
"left": 189,
"width": 490,
"height": 138
},
{
"top": 787,
"left": 715,
"width": 386,
"height": 134
}
]
}
}
The Context API Response (default) tab shows a JSON response with a separate bbox for each detected text, and the detected texts are marked with yellow boxes in the image below. The JSON response contains both the individual texts (data[].text) and the overall text (text).
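To draw boxes like these yourself, each bbox can be converted from (top, left, width, height) into corner coordinates. The Python sketch below does that for the six entries of this example and also joins the individual texts. The helper name `to_corners` is hypothetical, the confidence values are omitted for brevity, and the fact that joining the words in array order reproduces the overall text is an observation about this sample only, not a documented guarantee.

```python
def to_corners(bbox: dict) -> tuple[int, int, int, int]:
    """Convert a bbox into (left, top, right, bottom) pixel coordinates."""
    left, top = bbox["left"], bbox["top"]
    return (left, top, left + bbox["width"], top + bbox["height"])


# data[] entries from the multi-text example (confidence omitted).
DATA = [
    {"bbox": {"top": 221, "left": 463, "width": 364, "height": 134}, "text": "KEEP"},
    {"bbox": {"top": 377, "left": 425, "width": 433, "height": 144}, "text": "CALM"},
    {"bbox": {"top": 537, "left": 563, "width": 158, "height": 66}, "text": "AND"},
    {"bbox": {"top": 621, "left": 449, "width": 392, "height": 134}, "text": "LOVE"},
    {"bbox": {"top": 783, "left": 189, "width": 490, "height": 138}, "text": "BLACK"},
    {"bbox": {"top": 787, "left": 715, "width": 386, "height": 134}, "text": "CATS"},
]

# One rectangle per detected text, ready to hand to a drawing routine.
rectangles = [to_corners(entry["bbox"]) for entry in DATA]

# Joining the words in array order matches output.text for this sample.
joined = " ".join(entry["text"] for entry in DATA)

print(rectangles[0])  # (463, 221, 827, 355)
print(joined)         # KEEP CALM AND LOVE BLACK CATS
```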