ObjectReplace
- class agentlego.tools.ObjectReplace(sam_model='sam_vit_h_4b8939.pth', grounding_model='glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365', device='cuda', toolmeta=None)
A tool to replace specified objects in an image.
- Parameters:
sam_model (str) – The name of the SAM model used for inference, which can be found in the segment_anything repository. Defaults to sam_vit_h_4b8939.pth.
grounding_model (str) – The name of the grounding model used for inference, which can be found in the MMDetection repository. Defaults to glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.
device (str) – The device to load the model on. Defaults to 'cuda'.
toolmeta (None | dict | ToolMeta) – Additional information about the tool. Defaults to None.
Default tool information
Name: ObjectReplace
Description: This tool can replace the specified object in the input image with another object, like replacing a cat in an image with a dog.
Inputs:
image (ImageIO): The image to edit.
text1 (str): The object to be replaced.
text2 (str): The object to replace with.
Outputs:
ImageIO
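The three inputs suggest a detect, segment, inpaint flow. The sketch below is only an assumption about how the default models might fit together (a grounding model locating `text1`, SAM turning the boxes into a mask, and an inpainting model drawing `text2`); it is not AgentLego's actual implementation. The callables `detect`, `segment`, and `inpaint` are hypothetical placeholders supplied by the caller, and returning the image unchanged when nothing is detected is a choice made for this sketch.

```python
from typing import Any, Callable, Sequence


def replace_object(
    image: Any,
    text1: str,
    text2: str,
    detect: Callable[[Any, str], Sequence[Any]],
    segment: Callable[[Any, Sequence[Any]], Any],
    inpaint: Callable[[Any, Any, str], Any],
) -> Any:
    """Conceptual flow only; all three callables are hypothetical stand-ins."""
    # 1. Find boxes for the object described by `text1`.
    boxes = detect(image, text1)
    if not boxes:
        # Assumption for this sketch: leave the image untouched if
        # the object to be replaced was not found.
        return image
    # 2. Turn the boxes into a pixel mask of the object.
    mask = segment(image, boxes)
    # 3. Fill the masked region with content described by `text2`.
    return inpaint(image, mask, text2)
```

Passing the stages in as arguments keeps the sketch independent of any particular model library, so each stage can be swapped or stubbed out when experimenting.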
Example
Use the tool directly (without agent)
from agentlego.apis import load_tool
# load tool
tool = load_tool('ObjectReplace', device='cuda')
# apply tool
image = tool('examples/demo.png', 'cat', 'a white dog')
print(image)
With Lagent
from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool
# load the tool and build the agent
# set `OPENAI_API_KEY` in your environment variables first.
tool = load_tool('ObjectReplace', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))
# run the agent with the tool.
img_path = 'examples/demo.png'
ret = agent.chat(f'According to the image `{img_path}`, replace the cat with a white dog in the image.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])
Set up
Before using this tool, make sure the required dependencies are installed by running the following commands.
pip install -U diffusers
pip install -U segment_anything
pip install -U openmim
mim install -U mmdet
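After installing, it can help to confirm that the packages are importable before loading the tool. The following is a minimal sketch using only the standard library; the helper `missing_modules` is hypothetical, and the import names `diffusers`, `segment_anything`, and `mmdet` are assumed to match the packages installed above.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip/mim packages listed above.
REQUIRED_MODULES = ("diffusers", "segment_anything", "mmdet")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All ObjectReplace dependencies are importable.")
```

This only checks that the modules can be located; it does not verify versions or that model weights can be downloaded.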
Reference
By default, this tool uses SAM, Stable Diffusion, and GLIP. See the following papers for details.
@article{kirillov2023segany,
title={Segment Anything},
author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
journal={arXiv:2304.02643},
year={2023}
}
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
@article{zhang2022glipv2,
title={GLIPv2: Unifying Localization and Vision-Language Understanding},
author={Zhang, Haotian* and Zhang, Pengchuan* and Hu, Xiaowei and Chen, Yen-Chun and Li, Liunian Harold and Dai, Xiyang and Wang, Lijuan and Yuan, Lu and Hwang, Jenq-Neng and Gao, Jianfeng},
journal={arXiv preprint arXiv:2206.05836},
year={2022}
}