
Object Detection

Detect, locate, and label objects in images and GUI screenshots.

POST /v1/vision/object-detection

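Detects objects in real-world images or identifies GUI elements in screenshots. The response lists a bounding box and confidence score for each detection. It can also include binary segmentation masks (return_masks) and an annotated copy of the image (annotated_image).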

Parameters

Name             Type               Required     Description
url              string             conditional  URL of the source image. One of url or file_store_key is required.
file_store_key   string             conditional  Key of a previously uploaded file. One of url or file_store_key is required.
prompts          string[]           no           Text prompts naming the objects to detect. Each prompt must be 1–150 characters.
features         string[]           no           Detection modes to run: "object_detection", "gui", or both. Defaults to both.
annotated_image  boolean            no           Return an annotated image. Defaults to false.
return_type      "url" | "base64"   no           Format of the annotated image. Defaults to "url".
return_masks     boolean            no           Return binary segmentation masks for detected objects.
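The constraints in the parameter table can be checked on the client before a request is sent. The Python sketch below is one way to do that. `build_detection_body` is a hypothetical helper, not part of any Marob SDK. It assumes that omitting a parameter is equivalent to accepting its documented default. It rejects a body with neither url nor file_store_key but does not reject one with both, because the table does not say whether supplying both is an error. Prompt length is counted in Python characters (code points), which may not match how the service counts.

```python
from typing import Optional, Sequence

# Allowed values as listed in the parameter table.
ALLOWED_FEATURES = {"object_detection", "gui"}
ALLOWED_RETURN_TYPES = {"url", "base64"}


def build_detection_body(
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    prompts: Optional[Sequence[str]] = None,
    features: Optional[Sequence[str]] = None,
    annotated_image: Optional[bool] = None,
    return_type: Optional[str] = None,
    return_masks: Optional[bool] = None,
) -> dict:
    """Build a JSON-serializable request body after checking documented constraints.

    Hypothetical helper: parameters left as None are omitted, on the
    assumption that the server then applies its documented defaults.
    """
    if url is None and file_store_key is None:
        raise ValueError("one of url or file_store_key is required")
    if prompts is not None:
        for p in prompts:
            # Length is measured in code points; the service may count differently.
            if not 1 <= len(p) <= 150:
                raise ValueError(f"prompt must be 1-150 characters: {p!r}")
    if features is not None:
        unknown = set(features) - ALLOWED_FEATURES
        if unknown:
            raise ValueError(f"unsupported features: {sorted(unknown)}")
    if return_type is not None and return_type not in ALLOWED_RETURN_TYPES:
        raise ValueError(f"return_type must be 'url' or 'base64', got {return_type!r}")

    fields = {
        "url": url,
        "file_store_key": file_store_key,
        "prompts": list(prompts) if prompts is not None else None,
        "features": list(features) if features is not None else None,
        "annotated_image": annotated_image,
        "return_type": return_type,
        "return_masks": return_masks,
    }
    # Drop unset fields rather than sending explicit nulls.
    return {k: v for k, v in fields.items() if v is not None}
```

For example, `build_detection_body(file_store_key="abc", features=["gui"], return_masks=True)` yields a body requesting only GUI-element detection with masks on a previously uploaded file.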

Request

curl https://api.marob.ai/v1/vision/object-detection \
  -H "Authorization: Bearer $MAROB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://marob.ai/samples/warehouse.jpg",
    "prompts": ["forklift", "worker", "hard_hat"],
    "annotated_image": true
  }'
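The same request can be sent from Python with only the standard library. This is a sketch that mirrors the endpoint, headers, and JSON body of the curl example. `make_detection_request` and `detect` are hypothetical helper names. Beyond the HTTPError that urllib raises for non-2xx responses, error handling is left out.

```python
import json
import urllib.request

ENDPOINT = "https://api.marob.ai/v1/vision/object-detection"


def make_detection_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def detect(body: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response.

    urllib raises urllib.error.HTTPError for non-2xx status codes.
    """
    req = make_detection_request(body, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs network access and an API key in MAROB_API_KEY):
#   import os
#   result = detect(
#       {
#           "url": "https://marob.ai/samples/warehouse.jpg",
#           "prompts": ["forklift", "worker", "hard_hat"],
#           "annotated_image": True,
#       },
#       os.environ["MAROB_API_KEY"],
#   )
```

Splitting request construction from sending lets the request be inspected or tested without a network call.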

Response

{
  "success": true,
  "_usage": {
    "input_tokens": 35,
    "output_tokens": 210,
    "inference_time_tokens": 980,
    "total_tokens": 1225
  },
  "log_id": "log_01JABC...",
  "annotated_image": "https://cdn.marob.ai/annotated/ab12.png",
  "objects": [
    {
      "label": "forklift",
      "bounds": { "x": 120, "y": 240, "width": 380, "height": 280 },
      "confidence": 0.94,
      "mask": null
    }
  ],
  "gui_elements": [],
  "tags": ["warehouse", "industrial"]
}
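Each entry in objects carries a label, a confidence, and a bounds box. The Python sketch below filters detections by confidence and converts bounds to corner coordinates. It assumes, without confirmation from this page, that x and y give the top-left corner in pixels and that width and height extend right and down. The 0.5 default threshold is arbitrary. A response with "success": false is treated as a failure; the error payload format is not documented here. `to_corners` and `confident_objects` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List, Tuple

# (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def to_corners(bounds: Dict[str, float]) -> Box:
    # Assumption: x/y are the top-left corner in pixels; width/height extend right and down.
    x, y = bounds["x"], bounds["y"]
    return (x, y, x + bounds["width"], y + bounds["height"])


def confident_objects(
    response: Dict[str, Any], min_confidence: float = 0.5
) -> List[Tuple[str, float, Box]]:
    """Return (label, confidence, corners) for each object at or above min_confidence."""
    if not response.get("success", False):
        # The error payload is not documented here, so only log_id is surfaced.
        raise RuntimeError(f"object detection failed (log_id={response.get('log_id')})")
    return [
        (obj["label"], obj["confidence"], to_corners(obj["bounds"]))
        for obj in response.get("objects", [])
        if obj["confidence"] >= min_confidence
    ]


# The sample response, trimmed to the fields used here.
SAMPLE = json.loads("""
{
  "success": true,
  "log_id": "log_01JABC...",
  "objects": [
    {
      "label": "forklift",
      "bounds": {"x": 120, "y": 240, "width": 380, "height": 280},
      "confidence": 0.94,
      "mask": null
    }
  ]
}
""")

for label, confidence, box in confident_objects(SAMPLE, min_confidence=0.9):
    print(label, confidence, box)  # forklift 0.94 (120, 240, 500, 520)
```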
