A Google Cloud Storage image location, or a publicly accessible image URL. If both content and source are provided for an image, content takes precedence and is used to perform the image annotation request.

A publicly accessible image HTTP/HTTPS URL. When fetching images from HTTP/HTTPS URLs, Google cannot guarantee that the request will be completed. Your request may fail if the specified host denies the request (for example, due to request throttling or DoS prevention), or if Google throttles requests to the site for abuse prevention. You should not depend on externally hosted images for production applications.

When both gcsImageUri and imageUri are specified, imageUri takes precedence.
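As a sketch of how these precedence rules play out, the helper below builds the image portion of an annotate request. The function name and structure are illustrative, not part of any client library; only the field names (content, source, gcsImageUri, imageUri) come from the API.

```python
import base64

def build_image(content_bytes=None, gcs_uri=None, image_uri=None):
    """Build the `image` part of an annotate request.

    Precedence mirrors the rules above: `content` wins over `source`,
    and within `source`, `gcsImageUri` wins over `imageUri`.
    (This helper is illustrative, not part of the API.)
    """
    if content_bytes is not None:
        # Binary image data must be base64-encoded in the JSON request.
        return {"content": base64.b64encode(content_bytes).decode("ascii")}
    source = {}
    if gcs_uri is not None:
        source["gcsImageUri"] = gcs_uri
    elif image_uri is not None:
        source["imageUri"] = image_uri
    return {"source": source}
```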

The bounding polygon around the face. The coordinates of the bounding box are in the original image's scale, as returned in ImageParams. The bounding box is computed to "frame" the face in accordance with human expectations. It is based on the landmarker results. Note that one or more x and/or y coordinates may not be generated in the BoundingPoly (the polygon will be unbounded) if only a partial face appears in the image to be annotated.

The fdBoundingPoly bounding polygon is tighter than the boundingPoly, and encloses only the skin part of the face. Typically, it is used to eliminate the face from any image analysis that detects the "amount of skin" visible in an image. It is not based on the landmarker results, only on the initial face detection, hence the fd (face detection) prefix.

Zero coordinate values

When the API detects a coordinate ("x" or "y") value of 0, that coordinate is omitted in the JSON response. For example, a response could take the following form: [{},{"x": 28},{"x": 28,"y": 43},{"y": 43}]. This response shows all three representation possibilities:

{} - an empty object, when both "x" and "y" are 0.

{"x": 28} and {"y": 43} - an object with a single key-value pair, when one coordinate is 0 but the other is non-zero.

{"x": 28,"y": 43} - an object with both key-value pairs, when both coordinates are non-zero.
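A minimal sketch of reading such vertices back from dict-parsed JSON, treating omitted keys as 0 (the helper name is hypothetical):

```python
def vertex_xy(vertex):
    """Return (x, y) for a vertex object, treating omitted keys as 0."""
    return (vertex.get("x", 0), vertex.get("y", 0))

# The example response from above, as parsed JSON:
poly = [{}, {"x": 28}, {"x": 28, "y": 43}, {"y": 43}]
points = [vertex_xy(v) for v in poly]
# points == [(0, 0), (28, 0), (28, 43), (0, 43)]
```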

NormalizedVertex

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.

JSON representation

{"x": number,"y": number}

Fields

x

number

X coordinate.

y

number

Y coordinate.

Zero coordinate values

The general format for a bounding poly in the JSON response is an array of 4 vertex objects. When the API detects a coordinate ("x" or "y") value of 0.0, that coordinate is omitted in the JSON response. For example, a response could take the following form: [{},{"x": 0.028},{"x": 0.028,"y": 0.043},{"y": 0.043}]. This response shows all three representation possibilities:

{} - an empty object, when both "x" and "y" are 0.0.

{"x": 0.028} and {"y": 0.043} - an object with a single key-value pair, when one coordinate is 0.0 but the other is non-zero.

{"x": 0.028,"y": 0.043} - an object with both key-value pairs, when both coordinates are non-zero.
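Since normalized coordinates are relative to the original image, converting them back to pixels just scales by the image dimensions. A sketch, again treating omitted keys as 0.0 (helper name and sample size are illustrative):

```python
def denormalize(vertex, width, height):
    """Convert a NormalizedVertex (range 0..1) to pixel coordinates,
    treating omitted keys as 0.0 per the zero-coordinate rule."""
    return (round(vertex.get("x", 0.0) * width),
            round(vertex.get("y", 0.0) * height))

# The example response from above, scaled to a hypothetical 1000x1000 image:
poly = [{}, {"x": 0.028}, {"x": 0.028, "y": 0.043}, {"y": 0.043}]
pixels = [denormalize(v, 1000, 1000) for v in poly]
# pixels == [(0, 0), (28, 0), (28, 43), (0, 43)]
```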

Type

Face landmark (feature) type. Left and right are defined from the vantage of the viewer of the image without considering mirror projections typical of photos. So, LEFT_EYE, typically, is the person's right eye.

Enums

UNKNOWN_LANDMARK

Unknown face landmark detected. Should not be filled.

LEFT_EYE

Left eye.

RIGHT_EYE

Right eye.

LEFT_OF_LEFT_EYEBROW

Left of left eyebrow.

RIGHT_OF_LEFT_EYEBROW

Right of left eyebrow.

LEFT_OF_RIGHT_EYEBROW

Left of right eyebrow.

RIGHT_OF_RIGHT_EYEBROW

Right of right eyebrow.

MIDPOINT_BETWEEN_EYES

Midpoint between eyes.

NOSE_TIP

Nose tip.

UPPER_LIP

Upper lip.

LOWER_LIP

Lower lip.

MOUTH_LEFT

Mouth left.

MOUTH_RIGHT

Mouth right.

MOUTH_CENTER

Mouth center.

NOSE_BOTTOM_RIGHT

Nose, bottom right.

NOSE_BOTTOM_LEFT

Nose, bottom left.

NOSE_BOTTOM_CENTER

Nose, bottom center.

LEFT_EYE_TOP_BOUNDARY

Left eye, top boundary.

LEFT_EYE_RIGHT_CORNER

Left eye, right corner.

LEFT_EYE_BOTTOM_BOUNDARY

Left eye, bottom boundary.

LEFT_EYE_LEFT_CORNER

Left eye, left corner.

RIGHT_EYE_TOP_BOUNDARY

Right eye, top boundary.

RIGHT_EYE_RIGHT_CORNER

Right eye, right corner.

RIGHT_EYE_BOTTOM_BOUNDARY

Right eye, bottom boundary.

RIGHT_EYE_LEFT_CORNER

Right eye, left corner.

LEFT_EYEBROW_UPPER_MIDPOINT

Left eyebrow, upper midpoint.

RIGHT_EYEBROW_UPPER_MIDPOINT

Right eyebrow, upper midpoint.

LEFT_EAR_TRAGION

Left ear tragion.

RIGHT_EAR_TRAGION

Right ear tragion.

LEFT_EYE_PUPIL

Left eye pupil.

RIGHT_EYE_PUPIL

Right eye pupil.

FOREHEAD_GLABELLA

Forehead glabella.

CHIN_GNATHION

Chin gnathion.

CHIN_LEFT_GONION

Chin left gonion.

CHIN_RIGHT_GONION

Chin right gonion.

Position

A 3D position in the image, used primarily for Face detection landmarks. A valid Position must have both x and y coordinates. The position coordinates are in the same scale as the original image.

JSON representation

{"x": number,"y": number,"z": number}

Fields

x

number

X coordinate.

y

number

Y coordinate.

z

number

Z coordinate (or depth).
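Putting the landmark types and Position together: a face annotation carries a list of landmarks, each with a type and a 3D position. The lookup below is a sketch over dict-parsed JSON; the helper name and sample data are illustrative, not real API output. Note the viewer-vantage convention: LEFT_EYE is typically the subject's right eye.

```python
def find_landmark(landmarks, landmark_type):
    """Return the (x, y, z) position of the first landmark matching
    `landmark_type`, or None if that landmark was not detected.

    Types are from the viewer's vantage, so LEFT_EYE is typically
    the subject's right eye.
    """
    for lm in landmarks:
        if lm["type"] == landmark_type:
            p = lm["position"]
            return (p.get("x", 0.0), p.get("y", 0.0), p.get("z", 0.0))
    return None

# Illustrative landmark list (not real API output):
landmarks = [
    {"type": "LEFT_EYE", "position": {"x": 101.5, "y": 77.0, "z": -0.5}},
    {"type": "NOSE_TIP", "position": {"x": 128.0, "y": 120.0, "z": -25.0}},
]
# find_landmark(landmarks, "NOSE_TIP") == (128.0, 120.0, -25.0)
```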

Likelihood

A bucketized representation of likelihood, which is intended to give clients highly stable results across model upgrades.

locale

string

The language code for the locale in which the entity textual description is expressed.

description

string

Entity textual description, expressed in its locale language.

score

number

Overall score of the result. Range [0, 1].

confidence (deprecated)

number


Deprecated. Use score instead. The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1].

topicality

number

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1].

The location information for the detected entity. Multiple LocationInfo elements can be present because one location may indicate the location of the scene in the image, and another location may indicate the location of the place where the image was taken. Location information is usually present for landmarks.
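Since confidence is deprecated in favor of score, and topicality captures relevance to the image rather than detection accuracy, a client ranking labels might sort on topicality and fall back to score, as sketched below (function name and sample data are illustrative):

```python
def rank_labels(annotations):
    """Sort label annotations by topicality (relevance to the image),
    descending, using `score` (not the deprecated `confidence`) to
    break ties."""
    return sorted(annotations,
                  key=lambda a: (a.get("topicality", 0.0),
                                 a.get("score", 0.0)),
                  reverse=True)

# Illustrative annotations (not real API output): both labels are
# equally confident detections, but "Eiffel Tower" is more topical.
labels = [
    {"description": "tower", "score": 0.9, "topicality": 0.7},
    {"description": "Eiffel Tower", "score": 0.9, "topicality": 0.95},
]
# rank_labels(labels)[0]["description"] == "Eiffel Tower"
```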

The bounding box for the block. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

* when the text is horizontal it might look like:

0----1
|    |
3----2

* when it's rotated 180 degrees around the top-left corner it becomes:

2----3
|    |
1----0

and the vertex order will still be (0, 1, 2, 3).

The bounding box for the paragraph. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

* when the text is horizontal it might look like:

0----1
|    |
3----2

* when it's rotated 180 degrees around the top-left corner it becomes:

2----3
|    |
1----0

and the vertex order will still be (0, 1, 2, 3).

The bounding box for the word. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

* when the text is horizontal it might look like:

0----1
|    |
3----2

* when it's rotated 180 degrees around the top-left corner it becomes:

2----3
|    |
1----0

and the vertex order will still be (0, 1, 2, 3).

The bounding box for the symbol. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

* when the text is horizontal it might look like:

0----1
|    |
3----2

* when it's rotated 180 degrees around the top-left corner it becomes:

2----3
|    |
1----0

and the vertex order will still be (0, 1, 2, 3).
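Because the vertex order is fixed relative to the reading orientation, the text's rotation can be estimated from the vector between vertex 0 (top-left) and vertex 1 (top-right). A sketch under that assumption (helper name and sample boxes are illustrative):

```python
import math

def reading_angle(bounding_box):
    """Estimate the text's rotation, in degrees, from the vertex order.

    Vertices are always (0) top-left, (1) top-right, (2) bottom-right,
    (3) bottom-left *in reading orientation*, so the vector from vertex
    0 to vertex 1 points along the reading direction. Omitted
    coordinates are treated as 0 per the zero-coordinate rule.
    """
    v0, v1 = bounding_box["vertices"][0], bounding_box["vertices"][1]
    dx = v1.get("x", 0) - v0.get("x", 0)
    dy = v1.get("y", 0) - v0.get("y", 0)
    return math.degrees(math.atan2(dy, dx)) % 360

# Horizontal text: vertex 0 is at the visual top-left.
upright = {"vertices": [{"x": 10, "y": 10}, {"x": 110, "y": 10},
                        {"x": 110, "y": 40}, {"x": 10, "y": 40}]}
# Text rotated 180 degrees: vertex 0 is now at the visual bottom-right.
flipped = {"vertices": [{"x": 110, "y": 40}, {"x": 10, "y": 40},
                        {"x": 10, "y": 10}, {"x": 110, "y": 10}]}
```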

text

string

The actual UTF-8 representation of the symbol.

confidence

number

Confidence of the OCR results for the symbol. Range [0, 1].
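Since each symbol carries its own UTF-8 text and confidence, a word's text can be reassembled from its symbols, optionally dropping low-confidence ones. A sketch over dict-parsed JSON following the hierarchy above (helper name and sample data are illustrative):

```python
def word_text(word, min_confidence=0.0):
    """Concatenate a word's symbols in order, skipping any whose OCR
    confidence falls below `min_confidence`."""
    return "".join(s["text"] for s in word["symbols"]
                   if s.get("confidence", 1.0) >= min_confidence)

# Illustrative word (not real API output):
word = {"symbols": [
    {"text": "c", "confidence": 0.99},
    {"text": "a", "confidence": 0.98},
    {"text": "t", "confidence": 0.41},
]}
# word_text(word) == "cat"; word_text(word, min_confidence=0.9) == "ca"
```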

BlockType

Type of a block (text, image, etc.) as identified by OCR.

Enums

UNKNOWN

Unknown block type.

TEXT

Regular text block.

TABLE

Table block.

PICTURE

Image block.

RULER

Horizontal/vertical line box.

BARCODE

Barcode block.

SafeSearchAnnotation

Set of features pertaining to the image, computed by computer vision methods over safe-search verticals (for example, adult, spoof, medical, violence).

pixelFraction

number

The fraction of pixels the color occupies in the image. Value in range [0, 1].

Color

Represents a color in the RGBA color space. This representation is designed for simplicity of conversion to and from color representations in various languages over compactness; for example, the fields of this representation can be trivially provided to the constructor of java.awt.Color in Java; it can also be trivially provided to UIColor's +colorWithRed:green:blue:alpha method in iOS; and, with just a little work, it can be easily formatted into a CSS rgba() string in JavaScript as well.

For the alpha channel, a value of 1.0 corresponds to a solid color, whereas a value of 0.0 corresponds to a completely transparent color. The field uses a wrapper message rather than a simple float scalar so that it is possible to distinguish between a default value and the value being unset. If omitted, this color object is to be rendered as a solid color (as if the alpha value had been explicitly given with a value of 1.0).
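The CSS conversion mentioned above can be sketched as follows, assuming the color has been parsed from JSON into a dict; note the channels are floats in [0, 1] while CSS expects 0-255 integers, and an omitted alpha means fully opaque (the helper name is hypothetical):

```python
def to_css_rgba(color):
    """Format an RGBA Color message as a CSS rgba() string.

    Channel floats in [0, 1] are scaled to 0-255 integers; a missing
    alpha field means a solid color per the wrapper semantics above.
    """
    alpha = color.get("alpha", 1.0)  # omitted alpha -> solid color
    r, g, b = (round(color.get(c, 0.0) * 255)
               for c in ("red", "green", "blue"))
    return f"rgba({r}, {g}, {b}, {alpha})"

# to_css_rgba({"red": 0.5, "green": 0.25, "blue": 1.0})
#   == "rgba(128, 64, 255, 1.0)"
```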

CropHintsAnnotation

Set of crop hints that are used to generate new crops when serving images.