VisionAgent

Florence2 Phrase Grounding

A tool that detects multiple objects from a text prompt, which can be either a list of object names or a caption. You can optionally separate the object names in the prompt with commas. It returns a list of bounding boxes with normalized coordinates, label names, and associated probability scores, which are always 1.0.

Output

python
import base64

import requests

url = "https://api.landing.ai/v1/tools/florence2"

# Read the image and encode it as a base64 string for the JSON payload.
with open("{{path_to_image}}", "rb") as image_file:
    base64_string = base64.b64encode(image_file.read()).decode("utf-8")

payload = {
    "image": base64_string,
    "prompt": "{{prompt}}",  # object names (optionally comma-separated) or a caption
    "task": "<CAPTION_TO_PHRASE_GROUNDING>",
}
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on an HTTP error status

print(response.json())
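
Because the tool returns bounding boxes with normalized coordinates, a common next step is to scale them to pixel coordinates before drawing or cropping. The following is a minimal sketch of that conversion, not the documented response schema: the field names `label`, `score`, and `bounding_box`, the `[x_min, y_min, x_max, y_max]` ordering, and the helper names `to_pixel_box` and `to_pixel_boxes` are all assumptions made for illustration. Check them against the JSON actually printed by the request above.

```python
from typing import Any, Dict, List, Sequence, Tuple


def to_pixel_box(
    box: Sequence[float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Scale one normalized box to integer pixel coordinates.

    Assumes (hypothetically) that the box is [x_min, y_min, x_max, y_max]
    with every value in the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * image_width),
        round(y_min * image_height),
        round(x_max * image_width),
        round(y_max * image_height),
    )


def to_pixel_boxes(
    detections: List[Dict[str, Any]], image_width: int, image_height: int
) -> List[Dict[str, Any]]:
    """Return copies of the detections with pixel-space bounding boxes.

    The keys "label", "score", and "bounding_box" are assumed names,
    not confirmed by the API documentation.
    """
    return [
        {
            "label": det["label"],
            "score": det["score"],
            "bounding_box": to_pixel_box(
                det["bounding_box"], image_width, image_height
            ),
        }
        for det in detections
    ]


# Hand-written sample data in the assumed shape (not real API output).
sample = [{"label": "dog", "score": 1.0, "bounding_box": [0.25, 0.5, 0.75, 1.0]}]
pixel_detections = to_pixel_boxes(sample, image_width=640, image_height=480)
print(pixel_detections)
```

The image width and height must come from the same image that was sent in the request, for example by opening it with an imaging library before encoding it.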