TextToBbox

class agentlego.tools.TextToBbox(model='glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365', device='cuda', toolmeta=None)

A tool to detect objects that match a given text description.

Parameters:

model (str) – The name of the detection model. Available model names can be found in the MMDetection repository. Defaults to glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.
device (str) – The device to load the model on. Defaults to ‘cuda’.
toolmeta (None | dict | ToolMeta) – The additional info of the tool. Defaults to None.
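The constructor signature defaults to `device='cuda'`, so on a machine without a GPU the device has to be passed explicitly. A minimal sketch of choosing it at runtime, assuming the model runs on PyTorch (which MMDetection is built on); `pick_device` is a hypothetical helper, not part of AgentLego:

```python
import importlib.util


def pick_device() -> str:
    """Hypothetical helper: 'cuda' if PyTorch sees a GPU, otherwise 'cpu'."""
    # If PyTorch is not installed at all, fall back to the CPU device string.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The result can then be passed straight through, e.g. `load_tool('TextToBbox', device=pick_device())`.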

Default Tool Meta

name: TextToBbox
description: The tool can detect the object location according to description.
inputs:
- image (ImageIO)
- text (str): The object description in English.
- top1 (bool): If true, return the object with highest score. If false, return all detected objects.
outputs:
- str: Detected objects, include bbox in (x1, y1, x2, y2) format, and detection score.
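The tool returns a plain string rather than structured data. Its exact layout is not specified here, so the sketch below assumes one detected object per line, with the four bbox coordinates first and the detection score last, for example `(120, 85, 410, 300), score 87`. `parse_detections` and `best_detection` are hypothetical helpers; check them against real tool output before relying on them.

```python
import re
from typing import List, Tuple

# ((x1, y1, x2, y2), score)
Detection = Tuple[Tuple[float, float, float, float], float]

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_detections(output: str) -> List[Detection]:
    """Parse one detection per line. ASSUMED format: 4 bbox numbers, then the score."""
    detections: List[Detection] = []
    for line in output.splitlines():
        numbers = [float(n) for n in _NUMBER.findall(line)]
        if len(numbers) < 5:
            continue  # skip lines that do not look like a detection
        x1, y1, x2, y2 = numbers[:4]
        detections.append(((x1, y1, x2, y2), numbers[-1]))
    return detections


def best_detection(detections: List[Detection]) -> Detection:
    """Keep the highest-scoring detection, as `top1=True` is documented to do.

    Raises ValueError if `detections` is empty.
    """
    return max(detections, key=lambda det: det[1])


sample = "(120, 85, 410, 300), score 87\n(15, 40, 90, 120), score 42"
dets = parse_detections(sample)
print(dets)
print(best_detection(dets))
```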

Examples

Download the demo resource

wget http://download.openmmlab.com/agentlego/road.jpg
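If `wget` is not available (for example on Windows), the same file can be fetched with the Python standard library. A minimal sketch using `urllib.request.urlretrieve`; `download_demo` is a hypothetical helper name:

```python
import os
from urllib.request import urlretrieve

# URL copied from the wget command above.
DEMO_URL = "http://download.openmmlab.com/agentlego/road.jpg"


def download_demo(url: str = DEMO_URL, dest: str = "road.jpg") -> str:
    """Fetch `url` into `dest` unless `dest` already exists; return `dest`."""
    if not os.path.exists(dest):
        urlretrieve(url, dest)
    return dest
```

Calling `download_demo()` writes `road.jpg` to the current directory, matching the `wget` command; an existing file is reused instead of being downloaded again.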

Use the tool directly (without an agent)

from agentlego.apis import load_tool

# load tool
tool = load_tool('TextToBbox', device='cuda')

# apply tool; it returns a string describing the detected objects
result = tool('road.jpg', 'The largest white truck')
print(result)
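Because `result` is a string, any geometry has to be computed after extracting the numbers from it. Once a box is available in the documented (x1, y1, x2, y2) corner format, two common follow-up steps are clamping it to the image and converting it to (x, y, width, height), the layout COCO-style annotations use. A minimal sketch; the helper names and the 1280x720 image size are illustrative assumptions:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def clip_box(box: Box, img_w: float, img_h: float) -> Box:
    """Clamp (x1, y1, x2, y2) corners so the box lies inside an img_w x img_h image."""
    x1, y1, x2, y2 = box
    return (
        min(max(x1, 0.0), img_w),
        min(max(y1, 0.0), img_h),
        min(max(x2, 0.0), img_w),
        min(max(y2, 0.0), img_h),
    )


def xyxy_to_xywh(box: Box) -> Box:
    """Convert corner format (x1, y1, x2, y2) to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


box = clip_box((120.0, 85.0, 1300.0, 300.0), img_w=1280, img_h=720)
print(box)
print(xyxy_to_xywh(box))
```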

With Lagent

from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool

# load tools and build agent
# Please set the `OPENAI_API_KEY` environment variable first.
tool = load_tool('TextToBbox', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))

# agent running with the tool.
ret = agent.chat('Please detect the largest white truck in the image `road.jpg`.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])

Set up

Before using the tool, make sure the required dependencies are installed with the following commands:

pip install openmim
mim install mmdet
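To confirm the installation from Python, a minimal sketch that checks whether the packages can be located; `missing_modules` is a hypothetical helper. It only tests that a package is present, not that importing it succeeds (a broken MMCV build, for example, would still pass):

```python
import importlib.util
from typing import List


def missing_modules(*names: str) -> List[str]:
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules("agentlego", "mmdet")
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All TextToBbox dependencies found.")
```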

Reference

This tool uses a GLIP model. See the following paper for details.

@inproceedings{li2021grounded,
      title={Grounded Language-Image Pre-training},
      author={Liunian Harold Li* and Pengchuan Zhang* and Haotian Zhang* and Jianwei Yang and Chunyuan Li and Yiwu Zhong and Lijuan Wang and Lu Yuan and Lei Zhang and Jenq-Neng Hwang and Kai-Wei Chang and Jianfeng Gao},
      year={2022},
      booktitle={CVPR},
}