Vision
Object Detection
Detect, locate, and label objects in images and GUI screenshots.
POST /v1/vision/object-detection
Detect objects in real-world images or identify GUI elements in screenshots. Returns bounding boxes, confidence scores, and optional segmentation masks.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | conditional | URL of the source image. One of url or file_store_key is required. |
| file_store_key | string | conditional | Key of a previously uploaded file. |
| prompts | string[] | no | Target detection prompts. Each 1–150 chars. |
| features | string[] | no | Any of "object_detection", "gui". Defaults to both. |
| annotated_image | boolean | no | Return an annotated image. Defaults to false. |
| return_type | "url" \| "base64" | no | Format for annotated image. Defaults to url. |
| return_masks | boolean | no | Return binary segmentation masks for detected objects. |
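The constraints in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: `build_payload` and `build_request` are hypothetical helper names, and the choice to omit optional fields that were not set is an assumption. The table does not say whether url and file_store_key may be sent together, so the sketch only requires that at least one is present.

```python
import json
import urllib.request

API_URL = "https://api.marob.ai/v1/vision/object-detection"
ALLOWED_FEATURES = {"object_detection", "gui"}
ALLOWED_RETURN_TYPES = {"url", "base64"}


def build_payload(url=None, file_store_key=None, prompts=None, features=None,
                  annotated_image=False, return_type=None, return_masks=None):
    """Validate arguments against the parameter table and return a JSON-ready dict."""
    # At least one image source is required (assumption: both at once is not rejected here).
    if not url and not file_store_key:
        raise ValueError("one of url or file_store_key is required")
    # Each prompt must be 1-150 characters long.
    for prompt in prompts or []:
        if not 1 <= len(prompt) <= 150:
            raise ValueError(f"prompt must be 1-150 chars: {prompt!r}")
    unknown = set(features or []) - ALLOWED_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    if return_type is not None and return_type not in ALLOWED_RETURN_TYPES:
        raise ValueError(f"return_type must be one of {sorted(ALLOWED_RETURN_TYPES)}")

    # Only include fields the caller actually set, so server defaults apply otherwise.
    payload = {}
    if url:
        payload["url"] = url
    if file_store_key:
        payload["file_store_key"] = file_store_key
    if prompts:
        payload["prompts"] = list(prompts)
    if features:
        payload["features"] = list(features)
    if annotated_image:
        payload["annotated_image"] = True
    if return_type is not None:
        payload["return_type"] = return_type
    if return_masks is not None:
        payload["return_masks"] = bool(return_masks)
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request object; nothing is sent until it is opened."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Validating first surfaces errors such as an over-long prompt locally instead of as a failed API call.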
Request
curl https://api.marob.ai/v1/vision/object-detection \
  -H "Authorization: Bearer $MAROB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://marob.ai/samples/warehouse.jpg",
    "prompts": ["forklift", "worker", "hard_hat"],
    "annotated_image": true
  }'
Response
{
  "success": true,
  "_usage": {
    "input_tokens": 35,
    "output_tokens": 210,
    "inference_time_tokens": 980,
    "total_tokens": 1225
  },
  "log_id": "log_01JABC...",
  "annotated_image": "https://cdn.marob.ai/annotated/ab12.png",
  "objects": [
    {
      "label": "forklift",
      "bounds": { "x": 120, "y": 240, "width": 380, "height": 280 },
      "confidence": 0.94,
      "mask": null
    }
  ],
  "gui_elements": [],
  "tags": ["warehouse", "industrial"]
}
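A response like the one above can be unpacked into typed records for downstream use. This is a sketch under stated assumptions: `Detection` and `parse_detections` are hypothetical names, and `bounds.x` / `bounds.y` are assumed to be the top-left corner of the box in pixels, which the reference does not state.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    x: float
    y: float
    width: float
    height: float

    @property
    def corners(self):
        """Return (left, top, right, bottom), assuming x/y mark the top-left corner."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_detections(response, min_confidence=0.0):
    """Turn the decoded JSON response into Detection records, dropping low-confidence hits."""
    if not response.get("success"):
        raise RuntimeError("detection request was not successful")
    detections = []
    for obj in response.get("objects", []):
        bounds = obj["bounds"]
        detection = Detection(
            label=obj["label"],
            confidence=obj["confidence"],
            x=bounds["x"],
            y=bounds["y"],
            width=bounds["width"],
            height=bounds["height"],
        )
        if detection.confidence >= min_confidence:
            detections.append(detection)
    return detections


# Trimmed copy of the sample response shown above.
sample = {
    "success": True,
    "objects": [
        {
            "label": "forklift",
            "bounds": {"x": 120, "y": 240, "width": 380, "height": 280},
            "confidence": 0.94,
            "mask": None,
        }
    ],
    "gui_elements": [],
}

for det in parse_detections(sample, min_confidence=0.5):
    print(det.label, det.confidence, det.corners)
# prints: forklift 0.94 (120, 240, 500, 520)
```

Corner coordinates are often more convenient than x/y/width/height when drawing boxes or computing overlap, which is why the sketch exposes both.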