SegmentObject
class agentlego.tools.SegmentObject(sam_model='sam_vit_h_4b8939.pth', grounding_model='glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365', device='cuda', toolmeta=None)
A tool to segment all objects of a specified kind in an image.
Parameters:
- sam_model (str) – The name of the Segment Anything model used for inference. Available models are listed in the segment_anything repository. Defaults to sam_vit_h_4b8939.pth.
- grounding_model (str) – The name of the model used for grounding. Available models are listed in the MMDetection repository. Defaults to glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.
- device (str) – The device to load the model on. Defaults to 'cuda'.
- toolmeta (None | dict | ToolMeta) – Additional information about the tool. Defaults to None.
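The signature above defaults `device` to 'cuda', which may not be available on every machine. The following is a minimal sketch of one way to choose a fallback before loading the tool. The helper `pick_device` is hypothetical and not part of AgentLego; it assumes PyTorch is the backend and relies only on `torch.cuda.is_available()`.

```python
def pick_device(preferred="cuda"):
    """Return `preferred` if it can plausibly be used, otherwise 'cpu'.

    Hypothetical helper, not part of AgentLego.
    """
    # Non-CUDA requests (e.g. 'cpu') are passed through unchanged.
    if not preferred.startswith("cuda"):
        return preferred
    try:
        import torch  # assumed backend; may not be installed
    except ImportError:
        return "cpu"
    # Fall back to CPU when no CUDA device is visible to PyTorch.
    return preferred if torch.cuda.is_available() else "cpu"
```

The result can then be passed as the `device` argument, for example `load_tool('SegmentObject', device=pick_device())`. Running the models on CPU is likely to be much slower.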
Default Tool Meta
name: SegmentObject
description: This tool can segment the specified kind of objects in the input image, and return the segmentation result image.
inputs:
- image (ImageIO)
- text (str): The object to segment.
outputs:
- ImageIO: The segmentation result image.
Examples
Download the demo resource
wget http://download.openmmlab.com/agentlego/cups.png
Use the tool directly (without agent)
from agentlego.apis import load_tool
# load tool
tool = load_tool('SegmentObject', device='cuda')
# apply tool
segmentation = tool('cups.png', 'water cup')
With Lagent
from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool
# load tools and build agent
# please set `OPENAI_API_KEY` in your environment variable.
tool = load_tool('SegmentObject', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))
# agent running with the tool.
ret = agent.chat('Please segment all water cups in the image `cups.png`.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])
Set up
Before using the tool, install the required dependencies with the following commands.
pip install openmim segment_anything
mim install mmdet
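To confirm that the installation succeeded, the presence of the packages can be checked from Python without importing them. This is a minimal sketch using only the standard library. The import names are assumptions based on the packages installed above (openmim as `mim`, segment_anything as `segment_anything`, mmdet as `mmdet`), and `missing_modules` is a hypothetical helper, not part of AgentLego.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed above.
REQUIRED_MODULES = ("mim", "segment_anything", "mmdet")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only verifies that the modules can be located. It does not check versions or whether the model checkpoints have been downloaded.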
Reference
This tool uses a Segment Anything model and a GLIP model. See the following papers for details.
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@inproceedings{li2021grounded,
title={Grounded Language-Image Pre-training},
author={Liunian Harold Li* and Pengchuan Zhang* and Haotian Zhang* and Jianwei Yang and Chunyuan Li and Yiwu Zhong and Lijuan Wang and Lu Yuan and Lei Zhang and Jenq-Neng Hwang and Kai-Wei Chang and Jianfeng Gao},
year={2022},
booktitle={CVPR},
}