Nuxeo Vision

The Nuxeo Vision addon provides a gateway to computer vision services. Currently it supports the Google Vision API and Amazon Rekognition API. Other services can be easily integrated as they become available. These external services allow you to understand the content of an image by encapsulating powerful machine learning models. This gateway provides a wide range of features including shape recognition, auto-classification of images, OCR, face detection and more.

Installation and Configuration

This addon requires no specific installation steps. It can be installed like any other package, either with the nuxeoctl command line or from the Update Center.

To install the Nuxeo Vision Package, you have several options:

From the Nuxeo Marketplace: install the Nuxeo Vision package.
From the command line: nuxeoctl mp-install nuxeo-vision.
From the Admin Center on JSF UI, go to Admin > Update Center > Packages from Nuxeo Marketplace.

Google Vision Configuration

Configure a Google service account
Google Cloud billing terms change from time to time, so you will likely need to activate billing on your account.
You can generate either an API Key or a Service Account Key (saved as a JSON file)

Using an API key is a common security best practice, so we recommend that you avoid using a service account.

If you generated an API key, use the org.nuxeo.vision.google.key parameter:
```
org.nuxeo.vision.google.key=[your_api_key_goes_here]
```
If you created a Service Account Key, install it on your server and edit your nuxeo.conf to add the full path to the file:
```
org.nuxeo.vision.google.credential=[path_to_credentials_goes_here]
```

See Google Documentation about Vision for more information.

AWS Rekognition Configuration

Configure a key/secret pair in the AWS console
Check the FAQ to see in which regions the API is available

Edit nuxeo.conf with the suitable information:

```
org.nuxeo.vision.default.provider=aws
nuxeo.aws.region=
nuxeo.aws.accessKeyId=
nuxeo.aws.secretKey=
```

Functional Overview

By default, the Computer Vision Service is called every time the main binary file of a picture or video document is updated. Classification labels are stored in the Tags property.

For videos, the platform sends the images of the storyboard to the cloud service.

Customization

Overriding the Default Behavior

The default behavior is defined in two automation chains which can be overridden with an XML contribution.

Once the addon is installed on your Nuxeo instance, import the VisionOp operation definition in your Studio project. See the instructions on the page Referencing an Externally Defined Operation.
Create your automation chains and use the operation inside them. You can use either regular automation chains or Automation Scripting.

Create an XML extension that specifies that your automation chains should be used.

```
<extension target="org.nuxeo.vision.core.service" point="configuration">
    <configuration>
        <defaultProviderName>${org.nuxeo.vision.default.provider:=}</defaultProviderName>
        <pictureMapperChainName>MY_PICTURE_CHAIN</pictureMapperChainName>
        <videoMapperChainName>MY_VIDEO_CHAIN</videoMapperChainName>
    </configuration>
</extension>
```

Deploy your Studio customization.

Disabling the Default Behavior

The default behavior can also be completely disabled with the following contribution:

```
<extension target="org.nuxeo.ecm.core.event.EventServiceComponent" point="listener">
    <listener name="visionPictureConversionChangedListener" class="org.nuxeo.vision.core.listener.PictureConversionChangedListener" enabled="false"></listener>
    <listener name="visionVideoChangedListener" class="org.nuxeo.vision.core.listener.VideoStoryboardChangedListener" enabled="false"></listener>
</extension>
```

Core Implementation

`VisionOp` Operation

To let you build your own custom logic, the addon provides an automation operation called VisionOp. This operation takes a blob or a list of blobs as input and calls the Computer Vision service for each one.

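The result of the operation is stored in a context variable, whose name is given by the outputVariable parameter. It holds objects of type VisionResponse, which the default chain reads by index (annotations[0]).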
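Because the operation accepts a list of blobs, a custom chain may need to merge the labels from several responses. The following is a minimal sketch of that idea, assuming each response exposes the same accessors the default chain uses (getClassificationLabels() and getText()). The names collectLabelTexts and fakeResponse are hypothetical, and the fake objects only stand in for what the addon would provide at runtime.

```javascript
// Hypothetical helper (not part of the addon): gather the distinct label
// texts found across a list of responses, preserving first-seen order.
function collectLabelTexts(annotations) {
    var texts = [];
    if (annotations === undefined || annotations === null) {
        return texts;
    }
    for (var i = 0; i < annotations.length; i++) {
        var labels = annotations[i].getClassificationLabels();
        if (labels === undefined || labels === null) {
            continue;
        }
        for (var j = 0; j < labels.length; j++) {
            var text = labels[j].getText();
            // Skip empty entries and labels already collected.
            if (text !== undefined && text !== null && texts.indexOf(text) === -1) {
                texts.push(text);
            }
        }
    }
    return texts;
}

// Stand-in objects for illustration only; they mimic the two accessors above.
function fakeResponse(labelTexts) {
    return {
        getClassificationLabels: function () {
            return labelTexts.map(function (t) {
                return { getText: function () { return t; } };
            });
        }
    };
}

var frames = [fakeResponse(['Dog', 'Grass']), fakeResponse(['Dog', 'Ball'])];
var merged = collectLabelTexts(frames); // 'Dog' appears only once in the result
```

Inside a real chain, the list passed to such a helper would be the context variable filled by VisionOp rather than these stand-ins.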

Here is how the operation is used in the default chain:

```
function run(input, params) {
    // Use the 'Medium' picture view as the image to analyze.
    var blob = Picture.GetView(input, {
        'viewName': 'Medium'
    });
    // Ask for up to 5 labels; the result is stored in ctx.annotations.
    blob = VisionOp(blob, {
        features: ['LABEL_DETECTION'],
        maxResults: 5,
        outputVariable: 'annotations'
    });

    var annotations = ctx.annotations;
    if (annotations === undefined || annotations.length === 0) {
        return;
    }
    var textAndLabels = annotations[0];
    // Build the tag list from the classification labels.
    var labels = textAndLabels.getClassificationLabels();
    if (labels !== undefined && labels !== null && labels.length > 0) {
        var tags = [];
        for (var i = 0; i < labels.length; i++) {
            var label = labels[i];
            var tag = label.getText();
            if (tag === undefined || tag === null) {
                continue;
            }
            // Replace each run of non-alphanumeric characters with '+'.
            tags.push(tag.replace(/[^A-Z0-9]+/ig, '+'));
        }
        input = Services.TagDocument(input, {'tags': tags});
    }
    input = Document.Save(input, {});
    return input;
}
```
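The tag clean-up step in the chain above can be looked at in isolation. Here is a minimal standalone sketch: the helper names normalizeTag and normalizeTags are hypothetical and not part of the addon, and only the regular expression is taken from the default chain. The de-duplication is an addition for illustration.

```javascript
// Hypothetical helper: apply the same replacement as the default chain.
function normalizeTag(text) {
    if (text === undefined || text === null) {
        return null;
    }
    // Collapse every run of characters outside A-Z, a-z and 0-9 into one '+'.
    return String(text).replace(/[^A-Z0-9]+/ig, '+');
}

// Hypothetical helper: normalize a list of label texts, skipping empty
// entries and dropping duplicates (the default chain does not de-duplicate).
function normalizeTags(texts) {
    var tags = [];
    for (var i = 0; i < texts.length; i++) {
        var tag = normalizeTag(texts[i]);
        if (tag !== null && tags.indexOf(tag) === -1) {
            tags.push(tag);
        }
    }
    return tags;
}
```

For example, normalizeTag('Mountain bike') gives 'Mountain+bike' and normalizeTag('Rock & roll') gives 'Rock+roll'. Because the character class only covers ASCII letters and digits, accented letters are replaced as well, which may matter for non-English labels.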

Listeners and Events

When using the default implementation, Nuxeo Vision sends events once the processing is done:

After handling a picture, it sends the visionOnImageDone event
After processing a video (actually, the video storyboard), it sends the visionOnVideoDone event

Listening to these events is a good way to process your own business logic when it depends on the result of the tagging: you are then sure it was processed with no error. If an error occurred during the call to the service, these events are not fired and server.log will contain the error stack.

These events are not sent if automatic processing has been disabled, and they are not sent by the VisionOp operation. If you change the default behavior, you may want to send these events yourself, depending on your configuration.

Google Vision and AWS Rekognition API Limitations

Google Vision API has some known and documented limitations you should be aware of.
You should also check the Google Vision API documentation regularly for changes. For example, TIFF was not supported when the API was first released, but it is supported as of December 2018.
Amazon Rekognition doesn't provide text-recognition services (OCR). For Rekognition, Nuxeo Vision implements only label detection and safe search.

Because these are cloud services, their limitations change over time and may depend on your subscription.
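A custom chain that must work with either provider could guard against requesting features the provider does not cover. The sketch below is one possible approach, not something the addon ships. The helper featuresFor is hypothetical, the provider value 'google' is assumed, and the feature names other than LABEL_DETECTION are borrowed from the Google Vision API feature types, so check them against your addon version.

```javascript
// Features the AWS provider is expected to handle, per the limitations above.
// 'SAFE_SEARCH_DETECTION' is an assumed name borrowed from Google Vision.
var AWS_SUPPORTED = ['LABEL_DETECTION', 'SAFE_SEARCH_DETECTION'];

// Hypothetical helper: return only the requested features that the given
// provider is expected to support. Providers other than 'aws' are left as-is.
function featuresFor(provider, requested) {
    if (provider !== 'aws') {
        return requested.slice();
    }
    return requested.filter(function (feature) {
        return AWS_SUPPORTED.indexOf(feature) !== -1;
    });
}

// With AWS, a text-detection request is dropped before calling the service.
var awsFeatures = featuresFor('aws', ['LABEL_DETECTION', 'TEXT_DETECTION']);
var googleFeatures = featuresFor('google', ['LABEL_DETECTION', 'TEXT_DETECTION']);
```

The filtered list could then be passed as the features parameter of VisionOp.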
