OCR

class agentlego.tools.OCR(lang='en', line_group_tolerance=-1, device=True, toolmeta=None, **read_args)

A tool that recognizes the text in an image using optical character recognition (OCR).

Parameters:
  • lang (str | Sequence[str]) – The language or languages to recognize. Defaults to 'en'.

  • line_group_tolerance (int) – The tolerance threshold for grouping recognized text boxes into lines. Defaults to -1, which disables line grouping.

  • device (str | bool) – The device on which to load the model. Defaults to True, which selects a device automatically.

  • **read_args – Other keyword arguments used when reading text. See the EasyOCR documentation for details.

  • toolmeta (None | dict | ToolMeta) – Additional information about the tool. Defaults to None.
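To make the effect of line_group_tolerance more concrete, here is a minimal, self-contained sketch of one way such grouping could work. It is an illustration under stated assumptions and not the tool's actual implementation: the function name group_lines, the (x1, y1, x2, y2) tuple layout, and the choice to compare the top edges of boxes in pixels are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)
Item = Tuple[Box, str]           # a box and its recognized text


def group_lines(items: List[Item], tolerance: int) -> List[Item]:
    """Merge boxes whose top edges lie within `tolerance` of each other."""
    if tolerance < 0 or not items:
        # A negative tolerance disables grouping, like the default of -1.
        return list(items)

    # Sort top to bottom so list neighbours are also vertical neighbours.
    ordered = sorted(items, key=lambda it: it[0][1])
    groups: List[List[Item]] = [[ordered[0]]]
    for item in ordered[1:]:
        prev_top = groups[-1][-1][0][1]
        if abs(item[0][1] - prev_top) <= tolerance:
            groups[-1].append(item)   # same line as the previous box
        else:
            groups.append([item])     # start a new line

    merged: List[Item] = []
    for group in groups:
        line = sorted(group, key=lambda it: it[0][0])  # left to right
        boxes = [box for box, _ in line]
        union = (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
        merged.append((union, ' '.join(text for _, text in line)))
    return merged


sample = [
    ((120, 12, 200, 40), 'COST'),
    ((10, 10, 100, 40), 'TOTAL'),
    ((10, 80, 90, 110), '42.00'),
]
grouped = group_lines(sample, tolerance=30)
```

With a tolerance of 30, the two boxes near the top of the sample are merged into one line and the third stays separate; with a negative tolerance the input is returned unchanged.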

Default Tool Meta

  • name: OCR

  • description: This tool can recognize all text on the input image.

  • inputs:

    • image (ImageIO)

  • outputs:

    • str: The OCR results, including each bounding box in (x1, y1, x2, y2) format and the recognized text.
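Because the output is a single string, downstream code may want to turn it back into structured data. The sketch below assumes, purely for illustration, that each line of the result looks like `(x1, y1, x2, y2) text`; the exact layout is not specified here, so check the real output before relying on this pattern. The helper name parse_ocr_output is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed line layout: "(x1, y1, x2, y2) recognized text"
LINE_RE = re.compile(r'^\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)\s*(.*)$')


def parse_ocr_output(result: str) -> List[Tuple[Tuple[int, ...], str]]:
    """Split an OCR result string into (bbox, text) pairs."""
    parsed = []
    for line in result.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        *coords, text = match.groups()
        parsed.append((tuple(int(c) for c in coords), text))
    return parsed


example = "(10, 10, 200, 40) TOTAL COST\n(10, 80, 90, 110) 42.00"
pairs = parse_ocr_output(example)
```

Lines that do not match the assumed pattern are skipped rather than raising an error, which keeps the helper tolerant of formatting differences.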

Examples

Download the demo resource

wget https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/demo_kie.jpeg

Use the tool directly (without agent)

from agentlego.apis import load_tool

# load tool
tool = load_tool('OCR', device='cuda', lang='en', x_ths=3., line_group_tolerance=30)

# apply tool
res = tool('demo_kie.jpeg')

For bilingual Chinese and English OCR, set lang to ['en', 'ch_sim']. See the EasyOCR documentation for the full list of supported language codes.

With Lagent

from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool

# load the tool and build the agent
# set `OPENAI_API_KEY` in your environment variables first
tool = load_tool('OCR', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))

# run the agent with the tool
ret = agent.chat('Here is a receipt image `demo_kie.jpeg`, please tell me the total cost.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])

Set up

Before using the tool, make sure the required dependency is installed by running the command below.

pip install easyocr

Reference

The default implementation of the OCR tool uses EasyOCR.