Florence2 Phrase Grounding
A tool that detects multiple objects in an image from a text prompt. The prompt can be a list of object names or a caption, and object names can optionally be separated with commas. The tool returns a list of bounding boxes with normalized coordinates, label names, and associated probability scores of 1.0.
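As a small illustration of the comma-separated prompt form described above, the sketch below joins a list of object names into one prompt string. The helper name `build_prompt` and the sample object names are hypothetical and not part of the tool's API.

```python
def build_prompt(object_names):
    """Join object names into one comma-separated prompt string."""
    # Strip surrounding whitespace and skip empty entries.
    cleaned = [name.strip() for name in object_names if name.strip()]
    return ", ".join(cleaned)


# Illustrative object names only.
prompt = build_prompt(["car", " traffic light ", "", "pedestrian"])
print(prompt)
```

The resulting string can be used as the prompt value in the request.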
Output
python
import base64

import requests

url = "https://api.landing.ai/v1/tools/florence2"

# Read the image file and encode it as a base64 string.
with open("{{path_to_image}}", "rb") as image_file:
    base64_string = base64.b64encode(image_file.read()).decode("utf-8")

# The prompt holds the object names (optionally comma-separated) or a caption.
payload = {
    "image": base64_string,
    "prompt": "{{prompt}}",
    "task": "<CAPTION_TO_PHRASE_GROUNDING>",
}
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

# Send the request and print the parsed JSON response.
response = requests.post(url, json=payload, headers=headers)
print(response.json())
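Because the returned coordinates are normalized, drawing or cropping usually requires scaling them back to pixels. The following is a minimal sketch of that step, not the tool's documented response handling. It assumes each box is ordered `[x_min, y_min, x_max, y_max]` with values between 0 and 1, and the keys `label`, `score`, and `bbox` in the sample detection are hypothetical. Check the actual JSON printed by the request before relying on either assumption.

```python
def to_pixel_box(norm_box, image_width, image_height):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates."""
    x_min, y_min, x_max, y_max = norm_box
    return (
        round(x_min * image_width),
        round(y_min * image_height),
        round(x_max * image_width),
        round(y_max * image_height),
    )


# Made-up detection for illustration; the real response layout may differ.
detections = [{"label": "car", "score": 1.0, "bbox": [0.25, 0.5, 0.75, 1.0]}]

for det in detections:
    # Assumed image size of 640x480 pixels.
    print(det["label"], det["score"], to_pixel_box(det["bbox"], 640, 480))
```

Rounding to integers keeps the result usable as pixel indices. Since every score is 1.0, filtering detections by score would have no effect.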