Nuxeo Vision

The Nuxeo Vision addon provides a gateway to computer vision services. Currently it supports the Google Vision API but other services can be easily integrated as they become available. This gateway provides a wide range of features including shape recognition, auto-classification of images, OCR, face detection and more.

See https://cloud.google.com/vision/ for more information.

Installation and Configuration

Installation

This addon requires no specific installation steps. It can be installed like any other package with nuxeoctl command line or from the Update Center.

Google Account Configuration

Follow the instructions at https://cloud.google.com/vision/docs/getting-started.

Nuxeo Platform Configuration

Once you have a google service account credential file:

Upload the JSON credential file on your Nuxeo Instance, at the same location as the instance's nuxeo.conf file.

Edit nuxeo.conf and set the credential file path:

org.nuxeo.vision.google.credential=PATH_TO_JSON_CREDENTIAL_FILE

Functional Overview

By default, the Computer Vision Service is called every time the main binary file of a picture or video document is updated. Classification labels are stored in the Tags property and OCRed text in the dc:source property.

For videos, the platform uses the images of the storyboard.

Customization

Overriding the Default Behavior

The default behavior is defined in two automation chains which can be overridden with an XML contribution.

Once the addon is installed on your Nuxeo instance, import the VisionOp operation definition in your Studio project. See the instructions on the page Referencing an Externally Defined Operation.
Create your automation chains and use the operation inside them. You can use the regular automation chains or Automation Scripting.

Create an XML extension that specifies that your automation chains should be used.

<extension target="org.nuxeo.vision.core.service" point="configuration">
    <configuration>
        <pictureMapperChainName>MY_PICTURE_CHAIN</pictureMapperChainName>
        <videoMapperChainName>MY_VIDEO_CHAIN</videoMapperChainName>
    </configuration>
</extension>

Deploy your Studio customization.

Disabling the Default Behavior

The default behavior can also be completely disabled with the following contribution:

<extension target="org.nuxeo.ecm.core.event.EventServiceComponent" point="listener">
    <listener name="visionPictureConversionChangedListener" class="org.nuxeo.vision.core.listener.PictureConversionChangedListener" enabled="false"></listener>
    <listener name="visionVideoChangedListener" class="org.nuxeo.vision.core.listener.VideoStoryboardChangedListener" enabled="false"></listener>
</extension>

Core Implementation

In order to enable you to build your own custom logic, the addon provides an automation operation, called VisionOp. This operation takes a blob or list of blobs as input and calls the Computer Vision service for each one. The list of all the available features can be found at https://cloud.google.com/vision/reference/rest/v1/images/annotate#Type.

The result of the operation is stored in a context variable and is an object of type VisionResponse.

Here’s how the operation is used in the default chain:

function run(input, params) {
    var blob = Picture.GetView(input, {
        'viewName': 'Medium'
    });
    blob = VisionOp(blob, {
        features: ['LABEL_DETECTION'],
        maxResults: 5,
        outputVariable: 'annotations'
    });
    var annotations = ctx.annotations;
    if (annotations === undefined || annotations.length === 0) return;
    var textAndLabels = annotations[0];
    // build tag list
    var labels = textAndLabels.getClassificationLabels();
    if (labels !== undefined && labels !== null && labels.length > 0) {
        var tags = [];
        for (var i = 0; i < labels.length; i++) {
            tags.push(labels[i].getText().replace(/\s/g, '+'));
        }
        input = Services.TagDocument(input, {
            'tags': tags
        });
    }
    input = Document.Save(input, {});
    return input;
}

Google Vision API Limitations

The API has some known and documented best practices and limitations you should be aware of. For example (as of December 2016):

There is limitation to the size of the image you send to the API: "Image files sent to the Google Cloud Vision API should not exceed 4 MB". There also is a limitation when you send a list of images (max. 8MB). This is an important information to handle before requesting data. And this is why, if you look at the original chain, it actually takes the “Medium” conversion, which is a JPEG we can assume is always smaller than 4MB. You also should read the limitations in terms of maximum number of images/second, etc.
Not all image formats are handled. TIFF for example is not handled.

Also, as it is a cloud service, these limitations will surely evolve, change, maybe depending on a subscription, etc.