Guides and Tutorials

Choosing How to Import Data in the Nuxeo Platform

Updated: October 16, 2020

Different Tools for Different Use Cases

The Nuxeo Platform provides several tools for managing imports. Choosing the right tool will depend on your exact use cases:

  • How much data do you need to import? Hundreds, thousands, millions of documents?
  • Do you need to run the import while the application is running? Initial/one-time import vs. everyday import.
  • How complex is your import? How many business rules are involved?
  • What source do you want to import from? Database, XML, files, ...
  • What are your skills? SQL, ETL/ESB, Java development, ...

This page will walk you through the different import options and give you the pros and cons of each approach.

Possible Approaches

User Imports

By default, the Nuxeo Platform allows users to import several documents at a time through its built-in user import features.

Import criteria details

  • Average import speed: Low. A few documents.
  • Custom logic handling: Built-in. All custom logic will be called.
  • Ability to handle huge volume: No. No transaction batching.
  • Production interruption: No.
  • Blob upload: In transaction. The blob upload is part of the import transaction.
  • Post import tasks: None.

The key point is that all these user import systems are designed for ease of use, not for high performance or huge volumes.

HTTP API

The Nuxeo HTTP Automation API can be used to run imports into the Nuxeo Platform.

You can use Automation from custom code, custom scripting, or from client tools.

Using the API allows you to easily define custom import logic on the client side, but:

  • blob upload is part of the process,
  • transaction batching is not easy, since it requires creating custom chains.
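As a sketch, an Automation call such as Document.Create is a JSON POST against the server's automation endpoint. The snippet below only builds the request body; the server URL, parent path, and helper function are illustrative placeholders, not part of the Nuxeo documentation:

```python
import json

# Hypothetical server location; adjust to your deployment.
NUXEO_AUTOMATION_URL = "http://localhost:8080/nuxeo/site/automation"

def build_create_request(parent_path, doc_type, name, title):
    """Build the JSON body for the Document.Create automation operation."""
    return {
        "input": parent_path,          # the parent document (path reference)
        "params": {
            "type": doc_type,          # e.g. "File"
            "name": name,
            "properties": f"dc:title={title}",
        },
    }

body = build_create_request("/default-domain/workspaces", "File", "report", "Q3 Report")
payload = json.dumps(body)
# POST `payload` to f"{NUXEO_AUTOMATION_URL}/Document.Create"
# with Content-Type: application/json and authentication,
# e.g. via the `requests` library.
```

Note that each such call is one HTTP round trip and one transaction, which is why transaction batching is hard with this approach.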

Import criteria details

  • Average import speed: Low / Medium. Several documents (between 5 and 20 docs).
  • Custom logic handling: Built-in. All custom logic will be called.
  • Ability to handle huge volume: No. No easy transaction batching.
  • Production interruption: No.
  • Blob upload: In process. The blob upload is part of the import process.
  • Post import tasks: None.

Platform Importer

The Platform Importer is a framework for building custom importers on top of the Java API.

Unlike the previous methods:

  • The Java API is used directly: no network or marshaling overhead.
  • Blobs are read from a local filesystem: no network cost.

The importer handles several aspects that are important for managing performance:

  • transaction batching,
  • deactivating some listeners,
  • processing event handlers in bulk mode.
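The transaction-batching idea can be illustrated outside of Nuxeo: instead of committing once per document, commit once per batch of N documents. The `Session` class below is a minimal stand-in for illustration, not a Nuxeo API:

```python
class Session:
    """Stand-in for a repository session; counts commits for illustration."""
    def __init__(self):
        self.commits = 0
        self.pending = []

    def create_document(self, doc):
        self.pending.append(doc)

    def commit(self):
        self.commits += 1
        self.pending.clear()

def batched_import(session, docs, batch_size=50):
    """Commit once per batch instead of once per document."""
    for i, doc in enumerate(docs, start=1):
        session.create_document(doc)
        if i % batch_size == 0:
            session.commit()
    if session.pending:
        session.commit()  # flush the last partial batch

session = Session()
batched_import(session, [f"doc-{i}" for i in range(1000)], batch_size=50)
# 1000 documents imported with only 20 commits
```

Committing is the expensive part (fsync, index updates, event processing), so reducing 1000 commits to 20 is where most of the speedup comes from.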

Import criteria details

  • Average import speed: High. Several hundred documents (between 50 and 500 docs).
  • Custom logic handling: Built-in. Most custom logic will be called, depending on which listeners are removed.
  • Ability to handle huge volume: Yes. Native handling of transaction batching + bulk event handler mode.
  • Production interruption: Yes. The bulk mode is not adapted for normal usage: at least a dedicated Nuxeo node should be allocated. High-speed import is likely to saturate the database, which slows down all interactive usage.
  • Blob upload: Separate. Blobs are read directly from the server-side filesystem.
  • Post import tasks: May need to restart full-text indexing and re-run processing for listeners that were bypassed. In many cases, full-text indexing is deactivated during processing, as well as other slow processes such as video conversion and thumbnail generation. After the import, these processes need to be restarted.

SQL Import

Thanks to the VCS repository's clear and clean SQL structure, you can insert your data directly with SQL.

This is by far the fastest technique, but since it bypasses the whole Java business layer, you will need to do some checks and post-processing. In addition, if you want the SQL import to be really fast, you may want to deactivate some of the integrity constraints and triggers.
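As a sketch of the technique, using SQLite in place of the actual VCS database and a made-up, simplified table layout: group the inserts into a single transaction, use batched statements, and run integrity checks afterwards since the business layer was bypassed:

```python
import sqlite3

# In-memory database and a simplified, made-up stand-in for a repository table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (id TEXT PRIMARY KEY, parentid TEXT, name TEXT)")

rows = [(f"doc-{i}", "root", f"file-{i}.txt") for i in range(10_000)]

# One transaction + executemany: far fewer round trips and commits than
# inserting and committing row by row.
with conn:
    conn.executemany(
        "INSERT INTO hierarchy (id, parentid, name) VALUES (?, ?, ?)", rows
    )

# Post-import checks, since no Java-layer validation ran:
count = conn.execute("SELECT COUNT(*) FROM hierarchy").fetchone()[0]
orphans = conn.execute(
    "SELECT COUNT(*) FROM hierarchy WHERE parentid IS NULL"
).fetchone()[0]
```

In a real deployment the equivalent post-processing would be the rebuild steps listed below (full text, ancestors cache, read ACLs).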

Import criteria details

  • Average import speed: Very high. Several thousand documents (between 500 and 5000 docs).
  • Custom logic handling: Bypass. The whole Java layer is bypassed.
  • Ability to handle huge volume: Yes. Native handling of transaction batching + bulk event handler mode.
  • Production interruption: Yes. Usually, the database server configuration is changed to make the bulk insert more efficient.
  • Blob upload: Not handled. Blobs need to be managed by a separate process.
  • Post import tasks: May need to restart full-text indexing and some triggers:
      • rebuild full text,
      • rebuild the ancestors cache,
      • rebuild read ACLs.