ImageDescription

class agentlego.tools.ImageDescription(model='blip-base_3rdparty_caption', device='cuda', toolmeta=None)

A tool to describe an image.

Parameters:

model (str) – The name of the model used for inference. Available model names are listed in the MMPreTrain repository. Defaults to blip-base_3rdparty_caption.
device (str) – The device on which to load the model. Defaults to 'cuda'.
toolmeta (None | dict | ToolMeta) – Additional information about the tool. Defaults to None.
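
The parameters above can be assembled before the tool is loaded. The following is a minimal sketch, not part of AgentLego: `pick_device` and `build_tool_kwargs` are hypothetical helpers that fall back to the CPU when CUDA does not appear usable. Passing `model=` through `load_tool` is an assumption based on the constructor signature, so the call itself is left commented out.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if it looks usable, otherwise fall back to 'cpu'."""
    if preferred.startswith("cuda"):
        # Without PyTorch there is no way to use a CUDA device.
        if importlib.util.find_spec("torch") is None:
            return "cpu"
        import torch

        if not torch.cuda.is_available():
            return "cpu"
    return preferred


def build_tool_kwargs(model: str = "blip-base_3rdparty_caption", device=None) -> dict:
    """Collect constructor arguments using the documented defaults."""
    return {"model": model, "device": pick_device(device or "cuda")}


if __name__ == "__main__":
    kwargs = build_tool_kwargs()
    print(kwargs)
    # Assumption: load_tool forwards extra keyword arguments to the tool.
    # from agentlego.apis import load_tool
    # tool = load_tool('ImageDescription', **kwargs)
```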

Default Tool Information

Name: ImageDescription
Description: A useful tool that returns a brief description of the input image.
Inputs:
- image (ImageIO)
Outputs:
- str

Examples

Use the tool directly (without an agent)

from agentlego.apis import load_tool

# load tool
tool = load_tool('ImageDescription', device='cuda')

# apply tool
caption = tool('examples/demo.png')
print(caption)
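
Since the tool is called with an image path and returns a string, captioning several images is a simple loop. The sketch below assumes only that calling convention. `caption_images` is a hypothetical helper, and `fake_tool` is a stand-in used here so the example runs without a model; in practice the object returned by `load_tool` would be passed instead.

```python
from typing import Callable, Dict, Iterable, Tuple


def caption_images(
    tool: Callable[[str], str], paths: Iterable[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Caption each path, collecting failures instead of stopping at the first."""
    captions: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            captions[path] = tool(path)
        except Exception as exc:  # one unreadable image should not abort the batch
            errors[path] = repr(exc)
    return captions, errors


def fake_tool(path: str) -> str:
    """Stand-in for the real tool; raises for one path to show error handling."""
    if path == "bad.png":
        raise FileNotFoundError(path)
    return f"a caption for {path}"


if __name__ == "__main__":
    captions, errors = caption_images(fake_tool, ["examples/demo.png", "bad.png"])
    print(captions)
    print(errors)
```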

With Lagent

from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool

# load tools and build agent
# set the `OPENAI_API_KEY` environment variable before running.
tool = load_tool('ImageDescription', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))

# run the agent with the tool.
img_path = 'examples/demo.png'
ret = agent.chat(f'Describe the image `{img_path}`.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])

With Transformers Agent

from transformers import HfAgent
from agentlego.apis import load_tool
from PIL import Image

# load tools and build transformers agent
tool = load_tool('ImageDescription', device='cuda').to_transformers_agent()
agent = HfAgent('https://api-inference.huggingface.co/models/bigcode/starcoder', additional_tools=[tool])

# run the agent with the tool (for this demo, the tool name is specified directly)
caption = agent.run(f'Use the tool `{tool.name}` to describe the image.', image=Image.open('examples/demo.png'))
print(caption)

Set up

Before using the tool, make sure the required dependencies are installed by running the following commands.

pip install -U openmim
mim install -U mmpretrain
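
To confirm the installation from Python, a quick check along these lines may help. It is only a sketch: `missing_packages` is a hypothetical helper, and it assumes the importable package names are `agentlego` and `mmpretrain`.

```python
import importlib.util

# Assumed import names of the packages this tool relies on.
REQUIRED = ("agentlego", "mmpretrain")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```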

Reference

With its default settings, this tool uses a BLIP model. See the following paper for details.

@inproceedings{li2022blip,
      title={BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
      author={Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
      year={2022},
      booktitle={ICML},
}