VisionAgent

OWLv2 Image

A tool that detects and counts multiple objects in an image, given a text prompt such as category names or referring expressions. Categories in the text prompt are separated by commas. It returns a list of bounding boxes with normalized coordinates, label names, and associated probability scores.
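As a rough sketch of what normalized coordinates mean in practice, the helper below scales one box to pixel units. It assumes each box is given as [x_min, y_min, x_max, y_max] with values between 0 and 1. That ordering and the function name to_pixels are assumptions made for illustration, not taken from the API reference, so check them against an actual response.

```python
def to_pixels(box, width, height):
    """Scale a normalized box to integer pixel coordinates.

    Assumes box = [x_min, y_min, x_max, y_max] with values in [0, 1]
    (an assumed ordering, not confirmed by the API description).
    """
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Example: a box covering the lower-middle part of a 640x480 image
print(to_pixels([0.25, 0.5, 0.75, 1.0], width=640, height=480))
# → (160, 240, 480, 480)
```

Pixel coordinates are usually what drawing or cropping routines expect, while the normalized form stays valid if the image is resized.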

Output

python
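import requests

url = "https://api.landing.ai/v1/tools/text-to-object-detection"

# Text prompts to detect and the detection model to use
data = {
  "prompts": ["{{prompt1}}", "{{prompt2}}"],
  "model": "owlv2"
}

# Open the image in a context manager so the file handle is closed after the upload
with open("{{path_to_image}}", "rb") as image_file:
  files = {"image": image_file}
  response = requests.post(url, files=files, data=data)

# Fail loudly on an HTTP error status instead of parsing an error body as a result
response.raise_for_status()
print(response.json())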
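
Since the tool is described as detecting and counting objects, a common next step is to tally detections per label. The sketch below does that for a list of detection dictionaries. The field names "label", "score", and "bbox", the helper name count_by_label, and the 0.5 threshold are all hypothetical; the description only says the response contains bounding boxes, label names, and probability scores, so adapt the keys to the actual JSON.

```python
from collections import Counter


def count_by_label(detections, min_score=0.5):
    """Count detections per label, skipping those scored below min_score.

    The "label" and "score" keys are assumed names, not confirmed API fields.
    """
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


# Hypothetical detections shaped like the description (not real API output)
sample = [
    {"label": "car", "score": 0.91, "bbox": [0.1, 0.2, 0.4, 0.6]},
    {"label": "car", "score": 0.32, "bbox": [0.5, 0.2, 0.7, 0.5]},
    {"label": "person", "score": 0.78, "bbox": [0.6, 0.1, 0.8, 0.9]},
]

print(dict(count_by_label(sample)))
# → {'car': 1, 'person': 1}
```

Filtering by score before counting keeps low-confidence boxes from inflating the totals; the right threshold depends on the images and prompts used.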