OCR
- class agentlego.tools.OCR(lang='en', line_group_tolerance=-1, device=True, toolmeta=None, **read_args)
A tool that recognizes the text in an image using optical character recognition (OCR).
- Parameters:
lang (str | Sequence[str]) – The language or languages to recognize. Defaults to 'en'.
line_group_tolerance (int) – The line group tolerance threshold. Defaults to -1, which disables line grouping.
device (str | bool) – The device on which to load the model. Defaults to True, which selects a device automatically.
**read_args – Additional keyword arguments for reading text. See the EasyOCR docs for the available options.
toolmeta (None | dict | ToolMeta) – The additional info of the tool. Defaults to None.
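To make the line_group_tolerance parameter more concrete, here is a minimal sketch of one way such a threshold could work: detections whose top edges differ by at most the tolerance are merged into a single line. The function group_into_lines and the (box, text) item layout are hypothetical names chosen for illustration; this is not taken from the AgentLego source, so the actual grouping rule may differ.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Item = Tuple[Box, str]           # hypothetical (box, text) pair


def group_into_lines(items: List[Item], tolerance: int) -> List[Item]:
    """Merge items whose top edges (y1) lie within `tolerance` pixels."""
    if tolerance < 0:
        # Mirrors the documented convention that -1 disables grouping.
        return list(items)

    lines: List[List[Item]] = []
    for item in sorted(items, key=lambda it: it[0][1]):  # top to bottom
        if lines and abs(item[0][1] - lines[-1][-1][0][1]) <= tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])

    merged: List[Item] = []
    for line in lines:
        line.sort(key=lambda it: it[0][0])  # left to right within a line
        xs1, ys1, xs2, ys2 = zip(*(box for box, _ in line))
        box = (min(xs1), min(ys1), max(xs2), max(ys2))
        merged.append((box, ' '.join(text for _, text in line)))
    return merged
```

Under this assumed rule, a larger tolerance merges text that sits on slightly different baselines (for example, a label and a price on a skewed receipt), while a negative value leaves every detection separate.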
Default Tool Meta
name: OCR
description: This tool can recognize all text on the input image.
inputs:
image (ImageIO)
outputs:
str: OCR results, including each bounding box in (x1, y1, x2, y2) format and the recognized text.
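Because the tool returns a single string, downstream code usually needs to split it back into boxes and text. The sketch below assumes that each detection is on its own line in the form "(x1, y1, x2, y2) text"; that exact layout is an assumption, not something this page specifies, so inspect the real output and adjust the pattern before relying on it. The helper name parse_ocr_result is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed line shape: "(x1, y1, x2, y2) text". Not confirmed by this page.
_LINE = re.compile(r'^\((-?\d+),\s*(-?\d+),\s*(-?\d+),\s*(-?\d+)\)\s*(.*)$')


def parse_ocr_result(result: str) -> List[Tuple[Tuple[int, ...], str]]:
    """Split an OCR result string into (bbox, text) pairs."""
    parsed = []
    for line in result.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed shape
        *coords, text = match.groups()
        parsed.append((tuple(int(c) for c in coords), text))
    return parsed
```

Lines that do not match the assumed pattern are skipped rather than raising, which keeps the helper tolerant of a format that differs slightly from the assumption.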
Examples
Download the demo resource
wget https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/demo_kie.jpeg
Use the tool directly (without agent)
from agentlego.apis import load_tool
# load tool
tool = load_tool('OCR', device='cuda', lang='en', x_ths=3., line_group_tolerance=30)
# apply tool
res = tool('demo_kie.jpeg')
For bilingual Chinese and English OCR, set lang to ['en', 'ch_sim']. The EasyOCR documentation lists all supported language codes.
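A bilingual setup might look like the following sketch. It reuses the load_tool call from the example above and only changes the lang argument; the wrapper function load_bilingual_ocr is a hypothetical helper, and the import is kept inside it so the function can be defined even where agentlego is not installed.

```python
def load_bilingual_ocr(device=True):
    """Load the OCR tool for English and Simplified Chinese text."""
    # Imported lazily so defining this helper does not require agentlego.
    from agentlego.apis import load_tool

    # device=True follows the documented default (automatic device selection).
    return load_tool('OCR', device=device, lang=['en', 'ch_sim'])


# Example usage (requires agentlego, easyocr, and the demo image):
# tool = load_bilingual_ocr(device='cuda')
# res = tool('demo_kie.jpeg')
```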
With Lagent
from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool
# load tools and build agent
# set the `OPENAI_API_KEY` environment variable before running.
tool = load_tool('OCR', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))
# run the agent with the tool.
ret = agent.chat('Here is a receipt image `demo_kie.jpeg`, please tell me the total cost.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])
Set up
Before using the tool, install the required dependencies with the command below.
pip install easyocr
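To confirm the dependency is importable before loading the tool, a small check like the one below can help. It uses only the standard library; the function name easyocr_available is a hypothetical helper, not part of AgentLego or EasyOCR.

```python
import importlib.util


def easyocr_available() -> bool:
    """Return True if the easyocr package can be found in this environment."""
    return importlib.util.find_spec('easyocr') is not None


if not easyocr_available():
    print('easyocr is not installed; run `pip install easyocr` first.')
```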
Reference
The default implementation of the OCR tool uses EasyOCR.