The Nuxeo AI package integrates Machine Learning services into the Nuxeo Platform. It can be used for several tasks, such as data enrichment.
See the GitHub Readme for the Developer project description.
Concept
The Nuxeo AI package is a core system of streams that allows the Nuxeo Platform to interact with AI services, whether external (from third-party suppliers) or internal (from Nuxeo). These services can be used in many ways within the platform.
The core of the system is a sequence of processors connected by streams. At the head of the process, a filtering system selects the documents to be processed. The next step calls the AI service to apply a classification to the data. The final step handles the data returned by the AI service and transforms it for its intended use in the Nuxeo Platform.
The first use of the Core AI streams system is to enrich data in existing or new documents. We filter on document creation events for a specific document type, call a classification system, and use the results to enrich the document via tags or a specific facet. This makes the data easier to search.
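The filter, classify, and enrich steps described above can be pictured as a chain of plain functions. The following is a minimal sketch in plain Java and not the nuxeo-ai-core API: the class `PipelineSketch`, the `Doc` type, and the stand-in classifier are all hypothetical, and the real processors run asynchronously over streams rather than in a single method call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these types belong to nuxeo-ai-core.
final class PipelineSketch {

    /** Minimal stand-in for a document: an id, a type, and the labels attached so far. */
    static final class Doc {
        final String id;
        final String type;
        final List<String> labels = new ArrayList<>();

        Doc(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /**
     * Stage 1 filters documents, stage 2 calls the (stand-in) AI service,
     * stage 3 stores the returned labels on the document.
     */
    static List<Doc> run(List<Doc> docs, Predicate<Doc> filter, Function<Doc, List<String>> aiService) {
        List<Doc> enriched = new ArrayList<>();
        for (Doc doc : docs) {
            if (!filter.test(doc)) {
                continue; // not selected for processing
            }
            List<String> result = aiService.apply(doc); // classification step
            doc.labels.addAll(result);                  // result handling step
            enriched.add(doc);
        }
        return enriched;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("1", "Picture"), new Doc("2", "Note"));
        List<Doc> out = run(docs,
                d -> "Picture".equals(d.type),            // only pictures are selected
                d -> Arrays.asList("label-for-" + d.id)); // fake classifier
        for (Doc d : out) {
            System.out.println(d.id + " -> " + d.labels);
        }
    }
}
```

In this sketch only the `Picture` document passes the filter, so the `Note` document is never sent to the stand-in classifier.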
The system is composed of a core package, called nuxeo-ai-core, and extension packages that implement extensions for using external services.
Core AI
Installation
This addon requires no specific installation steps. It can be installed like any other package with the nuxeoctl command line or from the Marketplace.
Nuxeo Configuration
You can set these parameters in your nuxeo.conf:
Parameter | Description | Default value | Since |
---|---|---|---|
nuxeo.ai.images.enabled | Create a stream for creation/modification of images. | false | Since 1.0 |
nuxeo.ai.video.enabled | Create a stream for creation/modification of video files. | false | Since 1.0 |
nuxeo.ai.audio.enabled | Create a stream for creation/modification of audio files. | false | Since 1.0 |
nuxeo.ai.text.enabled | Create a stream for text extracted from blobs. | false | Since 1.0 |
nuxeo.ai.stream.config.name | The name of the stream log config. | pipes | Since 1.0 |
nuxeo.enrichment.source.stream | The name of the stream that receives Enrichment data. | enrichment.in | Since 1.0 |
nuxeo.enrichment.save.tags | Should enrichment labels be saved as standard Nuxeo tags? | false | Since 1.0 |
nuxeo.enrichment.save.facets | Should enrichment data be saved as a document facet? | true | Since 1.0 |
nuxeo.enrichment.raiseEvent | Should an enrichmentMetadataCreated event be raised when new enrichment data is added to the stream? | true | Since 1.0 |
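To illustrate how explicit settings combine with the default values in the table above, the sketch below overlays a nuxeo.conf-style fragment on those defaults using only `java.util.Properties`. The class `ConfDefaultsSketch` and its methods are hypothetical, and this is not how the Nuxeo runtime itself resolves properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of nuxeo-ai-core.
final class ConfDefaultsSketch {

    /** Default values copied from the parameter table above. */
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("nuxeo.ai.images.enabled", "false");
        p.setProperty("nuxeo.ai.video.enabled", "false");
        p.setProperty("nuxeo.ai.audio.enabled", "false");
        p.setProperty("nuxeo.ai.text.enabled", "false");
        p.setProperty("nuxeo.ai.stream.config.name", "pipes");
        p.setProperty("nuxeo.enrichment.source.stream", "enrichment.in");
        p.setProperty("nuxeo.enrichment.save.tags", "false");
        p.setProperty("nuxeo.enrichment.save.facets", "true");
        p.setProperty("nuxeo.enrichment.raiseEvent", "true");
        return p;
    }

    /** Parses a nuxeo.conf-style fragment; keys that are not set fall back to the defaults. */
    static Properties resolve(String confFragment) {
        Properties resolved = new Properties(defaults());
        try {
            resolved.load(new StringReader(confFragment));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resolved;
    }

    /** Reads a boolean flag such as nuxeo.ai.images.enabled. */
    static boolean isEnabled(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key));
    }

    public static void main(String[] args) {
        Properties conf = resolve("nuxeo.ai.images.enabled=true\nnuxeo.enrichment.save.tags=true\n");
        System.out.println("images stream enabled: " + isEnabled(conf, "nuxeo.ai.images.enabled"));
        System.out.println("video stream enabled: " + isEnabled(conf, "nuxeo.ai.video.enabled"));
        System.out.println("source stream: " + conf.getProperty("nuxeo.enrichment.source.stream"));
    }
}
```

With this fragment, the images stream and tag saving are switched on, while every parameter that is not mentioned keeps its documented default.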
Core AI Streams
Core AI allows you to customize a series of streams and processors. It provides four default document streams, which can be activated with the configuration parameters shown above:
- images - When an image is added to a document.
- videos - When a video is added to a document.
- audio - When an audio file is added to a document.
- text - When binary text is extracted from a document.
These allow you to start your processing chain quickly.
Insight Deduplication
This feature requires an Insight subscription.
The Deduplication feature presents images that are similar to other images already in the repository. In a Nuxeo repository, this detection can run each time a picture is added to a document. For existing assets, a complete re-index can be run through Nuxeo Insight.
Configuration
To activate the deduplication feature, enable it in the nuxeo.conf file:
nuxeo.insight.dedup.enabled=true
By default, each document of Picture type will be analyzed to detect potential similar documents. Here is the exact query:
SELECT * FROM Document WHERE ecm:mixinType = 'Picture' AND ecm:tag NOT IN ('not_duplicate')
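Read as a plain condition, the default query selects documents that carry the Picture facet and are not tagged not_duplicate. The sketch below restates that condition in plain Java to make the selection rule explicit. It is an assumption-labeled illustration: the class `DedupSelectionSketch` is hypothetical, it does not execute NXQL, and it does not use any Nuxeo API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the selection rule; the real selection is done by the NXQL query.
final class DedupSelectionSketch {

    /** True when the document has the Picture facet and is not tagged not_duplicate. */
    static boolean isCandidate(Set<String> facets, Set<String> tags) {
        return facets.contains("Picture") && !tags.contains("not_duplicate");
    }

    public static void main(String[] args) {
        Set<String> pictureFacets = new HashSet<>(Arrays.asList("Picture", "Versionable"));
        // A picture without the exclusion tag is analyzed.
        System.out.println(isCandidate(pictureFacets, Collections.<String>emptySet()));
        // A picture tagged not_duplicate is skipped.
        System.out.println(isCandidate(pictureFacets, Collections.singleton("not_duplicate")));
    }
}
```

Tagging a document with not_duplicate is therefore a way to exclude it from further analysis under the default query.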
This query and the metadata to introspect can be customized through a Nuxeo extension point as follows:
<require>org.nuxeo.ai.similar.content.default.config</require>
<extension target="org.nuxeo.ai.similar.content.services.SimilarServiceComponent" point="configuration">
  <deduplication name="specific-contribution"
                 query="SELECT * FROM Document WHERE ecm:mixinType = 'Picture' AND ecm:tag NOT IN ('not_duplicate')">
    <!-- Here is the blob metadata to introspect -->
    <xpath>file:content</xpath>
    <filter id="dedup-default-filter">
      <rule grant="true">
        <type>Picture</type>
      </rule>
    </filter>
  </deduplication>
</extension>
Note that the name of the contribution, specific-contribution, must also be set in nuxeo.conf through the following variable:
nuxeo.ai.similar.content.configuration.id=specific-contribution
There are two ways to detect the similar documents:
- Via a complete re-index of the repository through Nuxeo Insight.
- Via an automatic listener that will display the similar documents in the Nuxeo UI each time an image has been added/updated/removed.
This listener is disabled by default and can be activated through nuxeo.conf using:
# Deduplication Listener activation flag
nuxeo.ai.similar.content.listener.enable=true
Nuxeo Web UI Customization Forms
To display the similar documents each time an image is added, updated, or removed, add the nuxeo-ai-dedup-grid element to the forms (create/edit/metadata) of the target document type, as follows:
Example of widget usage in Create/Edit forms:
<nuxeo-dropzone role="widget" label="[[i18n('file.content')]]" name="content" document="{{document}}"></nuxeo-dropzone>
<nuxeo-ai-dedup-grid property="file:content" doc="[[document]]"></nuxeo-ai-dedup-grid>
Example of widget usage in Metadata forms:
<nuxeo-ai-dedup-grid property="file:content" doc="[[document]]"></nuxeo-ai-dedup-grid>
- Don't forget the two-way binding {{document}} on the nuxeo-dropzone element, so that the system receives change events in the Create/Edit forms.
- The property parameter must be set to define which blob metadata you want to introspect.
Example of widget with custom content:
<nuxeo-ai-dedup-grid property="file:content" doc="[[document]]">
  <slot name="dedup-content">
    <!-- custom template for each similar document accessible via [[item]] -->
    <p>[[item.title]]</p>
  </slot>
</nuxeo-ai-dedup-grid>
Default Content is here:
<nuxeo-card heading="[[_getSimilarsLength(similars)]] [[i18n('ai.insight.dedup.label')]]" collapsible opened>
  <template is="dom-repeat" items="[[similars]]">
    <slot name="dedup-grid-content">
      <div class="thumbnailContainer" on-tap="_navigate">
        <img src="[[_thumbnail(item)]]" alt$="[[item.title]]"/>
      </div>
      <a class="title" href$="[[item.contextParameters.documentURL]]" on-tap="_navigate">
        <div class="dataContainer">
          <div class="title" id="title">[[item.title]]</div>
          <nuxeo-tag>[[formatDocType(item.type)]]</nuxeo-tag>
          <nuxeo-tooltip for="title">[[item.title]]</nuxeo-tooltip>
        </div>
      </a>
      <div class="actions">
        <div on-click="_delete" style="float:left">
          <paper-icon-button icon="delete" noink=""></paper-icon-button>
          <span class="label" hidden$="[[!showLabel]]">[[_label]]</span>
        </div>
        <nuxeo-favorites-toggle-button document="[[item]]"></nuxeo-favorites-toggle-button>
        <nuxeo-download-button document="[[item]]"></nuxeo-download-button>
      </div>
    </slot>
  </template>
</nuxeo-card>
Misc
Hooks
- An event similarDocumentsFound is fired each time a similar document has been sent to the Insight deduplication index. By creating a listener triggered by this event, you can inspect the source document and all the similar documents (by their ids), as follows:
Example
package .....
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.event.Event;
import org.nuxeo.ecm.core.event.EventListener;
import org.nuxeo.ecm.core.event.impl.DocumentEventContext;
public class ResolveDuplicatesListener implements EventListener {
    public static final AtomicReference<DocumentModel> docRef = new AtomicReference<>();
    public static final List<String> similarIds = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void handleEvent(Event event) {
        DocumentEventContext ctx = (DocumentEventContext) event.getContext();
        docRef.set(ctx.getSourceDocument());
        @SuppressWarnings("unchecked")
        List<String> ids = (List<String>) ctx.getProperty("similarIds");
        similarIds.clear();
        similarIds.addAll(ids);
    }
}
- An Automation operation can be contributed as a "deduplication operation" as follows:
<require>org.nuxeo.ai.similar.content.default.config</require>
<extension target="org.nuxeo.ai.similar.content.services.SimilarServiceComponent" point="operation">
  <deduplication-operation class="org.something.YourOperation"/>
</extension>
This operation is triggered when a batch duplicate process is launched (execution of the AI.ProcessDuplicates operation) and can be used to inspect each document together with its similar documents:
Example (default operation):
@Operation(id = DefaultDeduplicationResolverOperation.ID, category = "AI", label = "Default Deduplication resolver")
public class DefaultDeduplicationResolverOperation {
    private static final Logger log = LogManager.getLogger(DefaultDeduplicationResolverOperation.class);
    public static final String ID = "AI.DeduplicationResolverOperation";

    @Param(name = "similar")
    protected Set<Pair<String, String>> similar;

    @Param(name = "xpath")
    protected String xpath;

    @OperationMethod
    public void resolve(DocumentModel doc) {
        log.warn("Received document {} with duplicates of size {}", doc.getId(), similar.size());
    }
}
Additional Configuration
Other parameters of the deduplication stream can be updated:
# default value = 2
nuxeo.insight.dedup.concurrency
# default value = 2
nuxeo.insight.dedup.partitions
# default value = 1
nuxeo.ai.dedup.scroller.concurrency
# default value = 2
nuxeo.ai.dedup.resolver.concurrency
Extensions
Core AI provides multiple extension points for its processors.
The initial release has:
- The nuxeo-ai-aws package, which allows us to connect to the Machine Learning services supplied by Amazon.
- The nuxeo-ai-image-quality package, which uses Sightengine.
AWS
As part of the initial release, we have a set of extensions for Amazon Web Services.
These include Rekognition, Comprehend and Translate.
See the GitHub Readme for more technical details and all the services that are currently available with this extension.
Before You Start
You should be familiar with Amazon Web Services and be in possession of your credentials.
Big Picture
Specifying Your Amazon Credentials and Region
Credentials are discovered using nuxeo-runtime-aws.
The chain searches for credentials in the following order: Nuxeo's AWSConfigurationService, environment variables, system properties, profile credentials, EC2Container credentials.
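The lookup order above follows a common pattern in which the first provider that returns credentials wins. The sketch below shows that pattern in plain Java. It does not use nuxeo-runtime-aws or the AWS SDK, and the class `CredentialChainSketch`, the `Credentials` type, and the sample providers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration of an ordered provider chain; not the nuxeo-runtime-aws implementation.
final class CredentialChainSketch {

    /** Minimal stand-in for resolved credentials, remembering which source supplied them. */
    static final class Credentials {
        final String accessKeyId;
        final String source;

        Credentials(String accessKeyId, String source) {
            this.accessKeyId = accessKeyId;
            this.source = source;
        }
    }

    /** Returns the credentials from the first provider that yields a value, in list order. */
    static Optional<Credentials> resolve(List<Supplier<Optional<Credentials>>> providers) {
        for (Supplier<Optional<Credentials>> provider : providers) {
            Optional<Credentials> found = provider.get();
            if (found.isPresent()) {
                return found; // later providers are never consulted
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Supplier<Optional<Credentials>>> chain = new ArrayList<>();
        chain.add(() -> Optional.empty()); // e.g. nothing configured in nuxeo.conf
        chain.add(() -> Optional.of(new Credentials("EXAMPLE_KEY", "environment variables")));
        chain.add(() -> Optional.of(new Credentials("OTHER_KEY", "profile credentials")));

        Optional<Credentials> found = resolve(chain);
        System.out.println(found.isPresent() ? "resolved from " + found.get().source : "no credentials found");
    }
}
```

A practical consequence of this ordering is that values set in nuxeo.conf take precedence over credentials available from the environment or an instance profile.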
In nuxeo.conf, add the following lines:
nuxeo.aws.accessKeyId=your_AWS_ACCESS_KEY_ID
nuxeo.aws.secretKey=your_AWS_SECRET_ACCESS_KEY
nuxeo.aws.sessionToken=your_AWS_SESSION_TOKEN
nuxeo.aws.region=your_AWS_REGION
The region code can be found in the S3 Region Documentation. The default is us-east-1. At the time this documentation was written, the list is:
- us-east-1: US East (N. Virginia) (default)
- us-east-2: US East (Ohio)
- us-west-1: US West (N. California)
- us-west-2: US West (Oregon)
- eu-west-1: EU (Ireland)
- eu-west-2: EU (London)
- eu-west-3: EU (Paris)
- eu-central-1: EU (Frankfurt)
- ap-south-1: Asia Pacific (Mumbai)
- ap-southeast-1: Asia Pacific (Singapore)
- ap-southeast-2: Asia Pacific (Sydney)
- ap-northeast-1: Asia Pacific (Tokyo)
- ap-northeast-2: Asia Pacific (Seoul)
- ap-northeast-3: Asia Pacific (Osaka-Local)
- sa-east-1: South America (São Paulo)
- ca-central-1: Canada (Central)
- cn-north-1: China (Beijing)
- cn-northwest-1: China (Ningxia)
If you are only processing images and an S3 BinaryManager is already in use, the image data already stored in S3 is re-used and passed by reference instead of uploading the binary again.
Installation
This addon requires no specific installation steps. It can be installed like any other package with the nuxeoctl command line or from the Marketplace.
Quick Start
- Install the nuxeo-ai-aws package:
  ./bin/nuxeoctl mp-install nuxeo-ai-aws
- Add the following parameters to nuxeo.conf:
  nuxeo.ai.images.enabled=true
  nuxeo.ai.text.enabled=true
  nuxeo.enrichment.aws.images=true
  nuxeo.enrichment.aws.text=true
  nuxeo.enrichment.save.tags=true
  nuxeo.enrichment.save.facets=true
  nuxeo.enrichment.raiseEvent=true
- Set your AWS credentials.
- Start Nuxeo and upload an image.
- Wait 10 seconds, then look at the document tags and the enrichment:items facet in the document JSON.
Nuxeo Configuration
You can set these parameters in your nuxeo.conf. They are used in combination with the other configuration parameters for nuxeo-ai-core shown above.
Parameter | Description | Default value | Since |
---|---|---|---|
nuxeo.enrichment.aws.images | Run AWS enrichment services on images. | false | Since 1.0 |
nuxeo.enrichment.aws.text | Run AWS enrichment services on text. | false | Since 1.0 |
Image Quality
An implementation of an enrichment service that uses Sightengine.
See the GitHub Readme for additional technical details.
Before You Start
Register with Sightengine to obtain your apiKey and apiSecret.
Big Picture
Installation
This addon requires no specific installation steps. It can be installed like any other package with the nuxeoctl command line or from the Marketplace.
Quick Start
- Install the nuxeo-ai-image-quality package:
  ./bin/nuxeoctl mp-install nuxeo-ai-image-quality
- Add the following parameters to nuxeo.conf:
  nuxeo.ai.images.enabled=true
  nuxeo.enrichment.save.tags=true
  nuxeo.enrichment.save.facets=true
  nuxeo.enrichment.raiseEvent=true
  nuxeo.ai.sightengine.apiKey=YOUR_API_KEY
  nuxeo.ai.sightengine.apiSecret=YOUR_API_SECRET
- Start Nuxeo and upload an image.
- Wait 10 seconds, then look at the document tags and the enrichment:items facet in the document JSON.
Nuxeo Configuration
You can set these parameters in your nuxeo.conf. They are used in combination with the other configuration parameters for nuxeo-ai-core shown above.
Parameter | Description | Default value | Since |
---|---|---|---|
nuxeo.ai.sightengine.apiKey | The API key for Sightengine. | | Since 1.0 |
nuxeo.ai.sightengine.apiSecret | The API secret for Sightengine. | | Since 1.0 |
nuxeo.ai.sightengine.all | Configure an enrichment service to process the images stream and call all Sightengine models. | true | Since 1.0 |