TextToBbox
- class agentlego.tools.TextToBbox(model='glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365', device='cuda', toolmeta=None)
A tool to detect the object described by the given text.
- Parameters:
model (str) – The model name used for detection, which can be found in the MMDetection repository. Defaults to glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.
device (str) – The device to load the model on. Defaults to ‘cuda’.
toolmeta (None | dict | ToolMeta) – The additional info of the tool. Defaults to None.
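Because `device` is a plain string, a caller may want to fall back to the CPU when no GPU is present. The following is a minimal sketch, assuming PyTorch is the underlying backend; `pick_device` is a hypothetical helper and not part of AgentLego.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if it is usable, otherwise fall back to 'cpu'.

    Hypothetical helper for illustration; not part of AgentLego.
    """
    try:
        import torch  # assumption: PyTorch is the backend in use
    except ImportError:
        # Without PyTorch there is no way to query CUDA, so stay on the CPU.
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


device = pick_device()
print(device)
# The result could then be passed on, e.g.:
# tool = load_tool('TextToBbox', device=device)
```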
Default Tool Meta
name: TextToBbox
description: The tool can detect the object location according to description.
inputs:
image (ImageIO)
text (str): The object description in English.
top1 (bool): If true, return the object with highest score. If false, return all detected objects.
outputs:
str: Detected objects, include bbox in (x1, y1, x2, y2) format, and detection score.
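The (x1, y1, x2, y2) format gives the top-left and bottom-right corners of each box. As a rough illustration of working with such boxes once they are available as numbers, the sketch below computes a box's size and area and picks the largest one. The helper names and the coordinates are hypothetical; the exact layout of the tool's output string is not specified here, so no parsing is attempted.

```python
from typing import Iterable, Tuple

# Corner format described in the tool meta: (x1, y1, x2, y2).
BBox = Tuple[float, float, float, float]


def bbox_size(bbox: BBox) -> Tuple[float, float]:
    """Return (width, height) of a corner-format box."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


def bbox_area(bbox: BBox) -> float:
    """Return the box area, treating inverted boxes as empty."""
    width, height = bbox_size(bbox)
    return max(width, 0.0) * max(height, 0.0)


def largest(bboxes: Iterable[BBox]) -> BBox:
    """Return the box with the greatest area."""
    return max(bboxes, key=bbox_area)


# Made-up coordinates, for illustration only.
boxes = [(10, 20, 110, 70), (0, 0, 30, 30)]
print(bbox_size(boxes[0]))
print(largest(boxes))
```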
Examples
Download the demo resource
wget http://download.openmmlab.com/agentlego/road.jpg
Use the tool directly (without agent)
from agentlego.apis import load_tool
# load tool
tool = load_tool('TextToBbox', device='cuda')
# apply tool
# the tool returns a single string (see the tool meta outputs above)
result = tool('road.jpg', 'The largest white truck')
print(result)
With Lagent
from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool
# load tools and build agent
# please set `OPENAI_API_KEY` in your environment variable.
tool = load_tool('TextToBbox', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))
# agent running with the tool.
ret = agent.chat('Please detect the largest white truck in the image `road.jpg`.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])
Set up
Before using the tool, make sure the required dependencies are installed by running the commands below.
pip install openmim
mim install mmdet
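To confirm that the installation succeeded before loading the tool, the packages can be looked up from Python. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper, and it assumes `mim` is the import name of the `openmim` package.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# `mmdet` comes from `mim install mmdet`; `mim` is assumed to be the
# import name of `openmim`.
missing = missing_packages(["mmdet", "mim"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```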
Reference
This tool uses a GLIP model. See the following paper for details.
@inproceedings{li2021grounded,
title={Grounded Language-Image Pre-training},
author={Liunian Harold Li* and Pengchuan Zhang* and Haotian Zhang* and Jianwei Yang and Chunyuan Li and Yiwu Zhong and Lijuan Wang and Lu Yuan and Lei Zhang and Jenq-Neng Hwang and Kai-Wei Chang and Jianfeng Gao},
year={2022},
booktitle={CVPR},
}