ObjectDetection

class agentlego.tools.ObjectDetection(model='rtmdet_l_8xb32-300e_coco', device='cuda', toolmeta=None)

A tool that detects all objects belonging to the 80 COCO classes.

Parameters:
  • model (str) – The name of the model used to detect objects. Available model names can be found in the MMDetection repository. Defaults to rtmdet_l_8xb32-300e_coco.

  • device (str) – The device on which to load the model. Defaults to 'cuda'.

  • toolmeta (None | dict | ToolMeta) – Additional meta information for the tool. Defaults to None.
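
Because device defaults to 'cuda', it can be convenient to choose the device at runtime on machines that may not have a GPU. The following is a minimal sketch, assuming the model runs on PyTorch (as MMDetection models do) and that 'cpu' is accepted as a device string. The helper name pick_device is hypothetical and is not part of AgentLego.

```python
def pick_device(preferred: str = 'cuda') -> str:
    """Return `preferred` when it is usable, otherwise fall back to 'cpu'.

    Hypothetical helper for illustration; not part of AgentLego.
    """
    # Non-CUDA requests (for example 'cpu') are passed through unchanged.
    if not preferred.startswith('cuda'):
        return preferred
    try:
        import torch  # assumption: PyTorch is the backend used by the model
    except ImportError:
        # Without PyTorch there is no way to query CUDA, so fall back to CPU.
        return 'cpu'
    return preferred if torch.cuda.is_available() else 'cpu'


# Intended usage (requires agentlego and its dependencies):
# from agentlego.apis import load_tool
# tool = load_tool('ObjectDetection', device=pick_device())
```

The import of torch sits inside the function so that the helper itself can be imported on a machine where PyTorch is not installed.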

Default Tool Meta

  • name: ObjectDetection

  • description: The tool can detect all common objects in the picture.

  • inputs:

    • image (ImageIO)

  • outputs:

    • str: All detected objects, including the object name, the bbox in (x1, y1, x2, y2) format, and the detection score.
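
Since the output is a plain string, downstream code may want to turn it back into structured records. The sketch below is only an illustration: the exact text layout is not specified here, so it ASSUMES one object per line in the form `<name> (<x1>, <y1>, <x2>, <y2>), score <score>`. The names Detection and parse_detections are hypothetical, and the regular expression should be adjusted to the string the tool actually returns.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    name: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float


_NUM = r'-?\d+(?:\.\d+)?'

# ASSUMED line layout: "<name> (<x1>, <y1>, <x2>, <y2>), score <score>"
_LINE = re.compile(
    r'^\s*(?P<name>.+?)\s*'
    rf'\(\s*(?P<x1>{_NUM})\s*,\s*(?P<y1>{_NUM})\s*,'
    rf'\s*(?P<x2>{_NUM})\s*,\s*(?P<y2>{_NUM})\s*\)'
    rf'\s*,\s*score\s+(?P<score>{_NUM})\s*$'
)


def parse_detections(text: str) -> List[Detection]:
    """Parse the tool's string output; lines that do not match are skipped."""
    detections = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue
        bbox = tuple(float(match[key]) for key in ('x1', 'y1', 'x2', 'y2'))
        detections.append(Detection(match['name'], bbox, float(match['score'])))
    return detections


# Made-up sample text in the assumed layout, not real tool output.
sample = 'car (10, 20, 110, 220), score 91\ntraffic light (5, 6, 7, 8), score 55'
for det in parse_detections(sample):
    print(det.name, det.bbox, det.score)
```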

Examples

Download the demo resource

wget http://download.openmmlab.com/agentlego/road.jpg

Use the tool directly (without agent)

from agentlego.apis import load_tool

# load tool
tool = load_tool('ObjectDetection', device='cuda')

# apply tool; the result is a string describing the detected objects
detections = tool('road.jpg')
print(detections)

With Lagent

from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool

# load the tool and build the agent
# set `OPENAI_API_KEY` in your environment variables beforehand.
tool = load_tool('ObjectDetection', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))

# run the agent with the tool.
ret = agent.chat('Please detect all objects in the image `road.jpg`.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])

Set up

Before using the tool, make sure the required dependencies are installed with the commands below.

pip install openmim
mim install mmdet

Reference

This tool uses an RTMDet model by default. See the following paper for details.

@misc{lyu2022rtmdet,
      title={RTMDet: An Empirical Study of Designing Real-Time Object Detectors},
      author={Chengqi Lyu and Wenwei Zhang and Haian Huang and Yue Zhou and Yudong Wang and Yanyi Liu and Shilong Zhang and Kai Chen},
      year={2022},
      eprint={2212.07784},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}