Different Tools for Different Use Cases
The Nuxeo Platform provides several tools for managing imports. Choosing the right tool will depend on your exact use cases:
- How much data do you need to import? Hundreds, thousands, millions of documents?
- Do you need to import while the application is running? Initial one-time import vs. everyday import.
- How complex is your import? How many business rules are involved in your import?
- What source do you want to import from? Database, XML, files, ...
- What are your skills? SQL, ETL/ESB, Java development, ...
This page will walk you through the different import options and give you the pros and cons of each approach.
Possible Approaches
User Imports
By default, the Nuxeo Platform allows users to import several documents at a time via:
Import criteria details
Criteria | Value | Comment |
---|---|---|
Average import speed | Low | A few documents. |
Custom logic handling | Built-in | All custom logic will be called. |
Ability to handle huge volume | No | No transaction batch management. |
Production interruption | No | |
Blob upload | In transaction | The blob upload is part of the import transaction. |
Post import tasks | None | |
The key point is that all these user import systems are designed to be easy to use, not for high performance or huge volumes.
HTTP API
Nuxeo HTTP Automation API can be used to run imports inside the Nuxeo Platform.
You can use Automation from custom code, custom scripting or from tools like:
- ETL: see the Talend Connector
- ESB: see the Mule Connector
Using the API allows you to easily define custom import logic on the client side, but:
- blob upload is part of the process,
- transaction batching is not easy, since it requires creating custom chains.
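To make the client-side approach concrete, the sketch below builds the JSON body for an Automation operation call. It assumes the standard Automation endpoint path (`/nuxeo/site/automation/{operation}`) and the documented `input`/`params`/`context` body shape; the server URL, workspace path, and property values are placeholders, not part of any real deployment.

```python
import json

def automation_request(operation, input_ref, params):
    """Build the URL and JSON body for a Nuxeo Automation call.

    Assumes the standard Automation endpoint layout; the host below
    is a placeholder.
    """
    url = "http://localhost:8080/nuxeo/site/automation/%s" % operation
    body = {
        "input": input_ref,   # e.g. a document path
        "params": params,     # operation parameters
        "context": {},        # optional operation context
    }
    return url, json.dumps(body)

# Example: a Document.Create call under a hypothetical workspace path.
url, body = automation_request(
    "Document.Create",
    "/default-domain/workspaces/ws",
    {"type": "File", "name": "report", "properties": "dc:title=Report"},
)
```

The request would then be sent as an authenticated HTTP POST with a JSON content type; each call is one round trip, which is why batching many documents this way stays slow.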
Import criteria details
Criteria | Value | Comment |
---|---|---|
Average import speed | Low / Medium | Several documents (between 5 and 20 docs). |
Custom logic handling | Built-in | All custom logic will be called. |
Ability to handle huge volume | No | No easy transaction batch management. |
Production interruption | No | |
Blob upload | In process | The blob upload is part of the import process. |
Post import tasks | None | |
Platform Importer
The Platform importer is a framework that can be used to build custom importers that use the Java API.
Unlike the previous methods:
- The Java API is directly used: no network and marshaling overhead.
- Blobs are read from a local filesystem: no network cost.
The importer handles several aspects that are important for performance:
- transaction batching,
- deactivating some listeners,
- processing event handlers in bulk mode.
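Transaction batching is the main lever here: instead of one commit per document, the importer commits once per batch. The sketch below shows that pattern in a generic form, with hypothetical injected `create`/`commit` callables standing in for the real Nuxeo Java API calls.

```python
def import_in_batches(docs, create, commit, batch_size=50):
    """Create documents, committing the transaction once per
    ``batch_size`` documents instead of once per document.

    ``create`` and ``commit`` are placeholder callables; in a real
    importer they would wrap the repository session and the
    transaction manager.
    """
    count = 0
    for doc in docs:
        create(doc)
        count += 1
        if count % batch_size == 0:
            commit()  # flush one full batch in a single transaction
    if count % batch_size:
        commit()      # commit the trailing partial batch
    return count

# Toy usage: 120 documents in batches of 50 trigger 3 commits.
commits = []
import_in_batches(range(120), create=lambda d: None,
                  commit=lambda: commits.append(1), batch_size=50)
```

Larger batches mean fewer commits and faster imports, at the cost of more work lost if a batch fails and has to be replayed.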
Import criteria details
Criteria | Value | Comment |
---|---|---|
Average import speed | High | Several hundreds of documents (between 50 and 500 docs). |
Custom logic handling | Built-in | Most custom logic will be called, depending on which listeners are removed. |
Ability to handle huge volume | Yes | Native handling of transaction batch + bulk event handler mode. |
Production interruption | Yes | Bulk mode is not suited to normal usage: at least a dedicated Nuxeo node should be allocated. High-speed import is likely to saturate the database, which slows down all interactive usage. |
Blob upload | Separated | Blobs are directly read on the server-side filesystem. |
Post import tasks | May need to restart full-text indexing. May need to restart processes for listeners that were bypassed. | In many cases, full-text indexing is deactivated during processing, as well as other slow processes such as video conversion and thumbnail generation. After the import, these processes need to be restarted. |
SQL Import
Thanks to the clear and clean SQL structure of the VCS repository, you can insert data directly with SQL.
This is by far the fastest technique, but since it bypasses the whole Java business layer, you will need to do some checks and post-processing. In addition, if you want the SQL import to be really fast, you may want to deactivate some of the integrity constraints and triggers.
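The sketch below illustrates the pattern with SQLite standing in for the real database: relax integrity checks, insert a whole batch in one transaction, then restore the checks. The `hierarchy` table here is a simplified stand-in, not Nuxeo's actual VCS schema, and a production import would target the real schema on the real database server.

```python
import sqlite3

# Illustrative only: SQLite and this simplified "hierarchy" table stand
# in for the real VCS schema and database server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # relax integrity checks for speed
conn.execute(
    "CREATE TABLE hierarchy (id TEXT PRIMARY KEY, parentid TEXT, name TEXT)"
)

rows = [("doc-%d" % i, "root", "file-%d" % i) for i in range(1000)]
with conn:  # one transaction for the whole batch, not one per row
    conn.executemany("INSERT INTO hierarchy VALUES (?, ?, ?)", rows)

conn.execute("PRAGMA foreign_keys = ON")   # restore checks after the import
n = conn.execute("SELECT COUNT(*) FROM hierarchy").fetchone()[0]
```

After such an import, anything the bypassed Java layer would normally have done (full-text indexing, listeners, computed fields) has to be redone as a post-processing step.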
Import criteria details
Criteria | Value | Comment |
---|---|---|
Average import speed | Very high | Several thousand documents (between 500 and 5000 docs). |
Custom logic handling | Bypass | The whole Java layer is bypassed. |
Ability to handle huge volume | Yes | Native handling of transaction batch + bulk event handler mode. |
Production interruption | Yes | Usually, the database server configuration is changed to make the bulk insert more efficient. |
Blob upload | Not handled | Blobs need to be managed by a separate process. |
Post import tasks | May need to restart full-text indexing. May need to re-enable some triggers. | |