Landing AI
Florence-2 Phrase Grounding
A tool that can detect multiple objects given a text prompt, which can be a list of object names or a caption. You can optionally separate the object names in the text with commas. It returns a list of bounding boxes with normalized coordinates, label names, and associated probability scores, which are always 1.0.
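Because the boxes are normalized, they usually need to be scaled by the image size before drawing or cropping. The following is a minimal sketch, not the tool's own API: it assumes each detection is a dictionary with a "bbox" entry ordered as [x_min, y_min, x_max, y_max] in the range 0 to 1, and the helper name to_pixel_bbox is hypothetical.

```python
from typing import List


def to_pixel_bbox(bbox: List[float], width: int, height: int) -> List[int]:
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates.

    Assumption: coordinates are fractions of the image width and height.
    """
    x_min, y_min, x_max, y_max = bbox
    return [
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    ]


# Hypothetical detection in the assumed output shape (not taken from the tool).
detection = {"label": "dog", "score": 1.0, "bbox": [0.25, 0.5, 0.75, 1.0]}
pixel_box = to_pixel_bbox(detection["bbox"], width=640, height=480)
print(pixel_box)  # [160, 240, 480, 480]
```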
CountGD Counting
A tool that can precisely count multiple instances of an object given a text prompt. It returns a list of bounding boxes with normalized coordinates, label names and associated confidence scores.
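Since the result is a list of scored boxes rather than a single number, the count is typically derived from the list. A minimal sketch, assuming each detection is a dictionary with a "score" key (an assumed shape, not confirmed by the description) and using a hypothetical helper count_detections:

```python
from typing import Dict, List


def count_detections(detections: List[Dict], min_score: float = 0.5) -> int:
    """Count detections whose confidence is at least min_score.

    Assumption: every detection carries a float under the key "score".
    """
    return sum(1 for det in detections if det["score"] >= min_score)


# Hypothetical tool output used only for illustration.
sample = [
    {"label": "apple", "score": 0.9, "bbox": [0.1, 0.1, 0.2, 0.2]},
    {"label": "apple", "score": 0.4, "bbox": [0.3, 0.3, 0.4, 0.4]},
    {"label": "apple", "score": 0.75, "bbox": [0.5, 0.5, 0.6, 0.6]},
]
print(count_detections(sample))  # 2
```

The threshold of 0.5 is an arbitrary default for the sketch; a suitable value depends on the images and the prompt.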
IXC25 Image VQA
A tool that can answer questions about arbitrary images, including regular images and images of documents or presentations. It returns text as an answer to the question.
OWLv2 Image
A tool that can detect and count multiple objects in an image given a text prompt such as category names or referring expressions. The categories in the text prompt are separated by commas. It returns a list of bounding boxes with normalized coordinates, label names, and associated probability scores.
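A small sketch of how a comma-separated prompt might be assembled and how per-category counts could be read off the result. The detection shape (dictionaries with a "label" key) and both helper names are assumptions for illustration, and joining with a comma followed by a space assumes the tool tolerates that whitespace.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_prompt(categories: Iterable[str]) -> str:
    """Join category names into one comma-separated prompt, skipping blanks."""
    return ", ".join(c.strip() for c in categories if c.strip())


def count_by_label(detections: List[Dict]) -> Counter:
    """Tally detections per label (assumes each detection has a "label" key)."""
    return Counter(det["label"] for det in detections)


prompt = build_prompt(["car", " truck ", ""])
print(prompt)  # car, truck

# Hypothetical tool output used only for illustration.
detections = [
    {"label": "car", "score": 0.8},
    {"label": "truck", "score": 0.6},
    {"label": "car", "score": 0.7},
]
print(count_by_label(detections)["car"])  # 2
```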
IXC25 Temporal Localization
A tool that temporally segments a video given a prompt that can be either an object or a phrase. It returns a list of boolean values, one per frame, indicating whether the object or phrase is present in the corresponding frame.
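The per-frame booleans can be collapsed into time intervals. The sketch below assumes one boolean per sampled frame and a constant sampling rate fps supplied by the caller; neither the rate nor the helper name frames_to_segments comes from the description above.

```python
from typing import List, Tuple


def frames_to_segments(present: List[bool], fps: float) -> List[Tuple[float, float]]:
    """Turn per-frame presence flags into (start_sec, end_sec) intervals.

    Assumption: frame i covers the time span [i / fps, (i + 1) / fps).
    """
    segments = []
    start = None
    for i, flag in enumerate(present):
        if flag and start is None:
            start = i  # a run of True values begins here
        elif not flag and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # The last run extends to the end of the video.
        segments.append((start / fps, len(present) / fps))
    return segments


flags = [False, True, True, False, True]
print(frames_to_segments(flags, fps=1.0))  # [(1.0, 3.0), (4.0, 5.0)]
```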
Loca Visual Prompt Counting
A tool that counts the dominant foreground object given an image and a visual prompt which is a bounding box describing the object. It returns only the count of the objects in the image.
Loca Zero Shot Counting
A tool that counts the dominant foreground object given an image and no other information about the content. It returns only the count of the objects in the image.
Florence-2 RoBERTa VQA
A tool that takes an image and analyzes its contents, generates detailed captions and then tries to answer the given question using the generated context. It returns text as an answer to the question.
Depth Anything V2
A tool that runs the depth_anythingv2 model to generate a depth image from a given RGB image. The returned depth image is monochrome and represents depth values as pixel intensities, with pixel values ranging from 0 to 255.
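For downstream arithmetic it can be convenient to rescale the 0 to 255 intensities to the range 0 to 1. The sketch below assumes the depth image is available as a nested list of integers (rows of pixels); the helper name to_relative_depth is hypothetical. The description does not say whether larger values mean nearer or farther, so the result should be read only as a relative value.

```python
from typing import List


def to_relative_depth(depth_image: List[List[int]]) -> List[List[float]]:
    """Rescale 8-bit depth intensities (0 to 255) to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in depth_image]


# Tiny hypothetical 2x2 depth image used only for illustration.
depth = [[0, 255], [51, 102]]
relative = to_relative_depth(depth)
print(relative[0])  # [0.0, 1.0]
```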
ViT NSFW Classification
A tool that can classify an image as 'nsfw' or 'normal' based on its content. It returns the predicted label and its probability score.
Florence-2 Image Caption
A tool that can caption or describe an image based on its contents. It returns a text describing the image.
Florence-2 Sam2 Image
A tool that can segment multiple objects given a text prompt such as category names or referring expressions. The categories in the text prompt are separated by commas. It returns a list of bounding boxes, label names, mask file names, and associated probability scores, which are always 1.0.
Florence-2 Sam2 Video Tracking
A tool that can segment and track multiple entities in a video given a text prompt such as category names or referring expressions. You can optionally separate the categories in the text with commas. It only tracks entities present in the first frame and only returns segmentation masks. It is useful for tracking and counting without duplicating counts.
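Counting without duplicates amounts to counting distinct tracked entities rather than summing per-frame detections. A minimal sketch, assuming the output is a list of frames, each a list of dictionaries carrying a "track_id" key; that key and the helper name count_unique_tracks are hypothetical, since the description does not specify how tracked entities are identified.

```python
from typing import Dict, List


def count_unique_tracks(frames: List[List[Dict]]) -> int:
    """Count distinct tracked entities across all frames.

    Assumption: the same entity keeps the same "track_id" in every frame.
    """
    seen = set()
    for detections in frames:
        for det in detections:
            seen.add(det["track_id"])
    return len(seen)


# Hypothetical tracking output: two entities seen over three frames.
frames = [
    [{"track_id": 0, "label": "car"}, {"track_id": 1, "label": "car"}],
    [{"track_id": 0, "label": "car"}, {"track_id": 1, "label": "car"}],
    [{"track_id": 1, "label": "car"}],
]
print(count_unique_tracks(frames))  # 2
```

Summing detections per frame here would give 5, which is why distinct identifiers are counted instead.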