The nuxeo-platform-importer-core module provides support for multi-threaded imports into a Nuxeo repository.
It is a configurable framework whose main part is the org.nuxeo.ecm.platform.importer.base.GenericMultiThreadedImporter class. Depending on how it is configured, this 'importer' is responsible for performing the import. Configuration starts when the 'importer' is instantiated. You need to provide a source node, which is the entry point of the content to import; the path in the current repository where the content should be imported; parameters that control the maximum number of threads created during the import; and a logger to use during the import (the default one provided by the module can be used). If you need audit support for the import, provide a 'jobName', which identifies the import run in the audit. The audit support can be used to avoid repeating an import later (when the import finished successfully).
Here is an example of how such an importer can be instantiated:
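As a rough illustration of the arguments listed above, the sketch below groups them in a plain Java class. ImportSettingsSketch and all of its fields are hypothetical stand-ins written for this page; they are not part of nuxeo-platform-importer-core, and the real GenericMultiThreadedImporter constructor may order or type its parameters differently.

```java
import java.util.Objects;
import java.util.logging.Logger;

/** Hypothetical stand-in for the importer's construction arguments; not a Nuxeo class. */
final class ImportSettingsSketch {
    final String sourceRoot;  // entry point of the content to import
    final String targetPath;  // where the content lands in the repository
    final int maxThreads;     // upper bound on the threads created during the import
    final String jobName;     // optional; a non-empty value asks for audit tracking
    final Logger logger;      // falls back to a default logger when none is given

    ImportSettingsSketch(String sourceRoot, String targetPath, int maxThreads,
                         String jobName, Logger logger) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be at least 1");
        }
        this.sourceRoot = Objects.requireNonNull(sourceRoot, "sourceRoot");
        this.targetPath = Objects.requireNonNull(targetPath, "targetPath");
        this.maxThreads = maxThreads;
        this.jobName = jobName;
        this.logger = logger != null ? logger : Logger.getLogger("importer");
    }

    /** True when a job name was supplied, i.e. the run should be tracked in the audit. */
    boolean isAudited() {
        return jobName != null && !jobName.isEmpty();
    }
}
```

Passing null as the job name in this sketch simply means the run is not audited; passing null as the logger selects the default one.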
After instantiation, an 'importer' can be configured further by providing it with 'tools' that are used during the import.
One of these 'tools' is the so-called 'factory', which is used to import a document. A 'factory' is usually expected to handle both cases: importing a folderish document and importing a leaf document (the org.nuxeo.ecm.platform.importer.factories.ImporterDocumentModelFactory interface is provided for this purpose).
Another 'tool' is the 'filter'. More than one 'filter' can be provided to an 'importer', and their purpose is to handle the events that are raised during the import. Importing a document amounts to creating a Nuxeo document model and saving properties on it, which often raises events. To improve import performance, it is usually better to block all the events raised during and after the import of a document.
The last 'tool' that can be provided to an 'importer' is the thread policy to use. If no thread policy is specified, the default multi-threaded one is used (provided by the org.nuxeo.ecm.platform.importer.threading.DefaultMultiThreadingPolicy class). Here is an example of how such tools can be provided to an instantiated importer.
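The roles of the three 'tools' can be sketched with plain JDK types. Everything below (DocFactorySketch, ImportFilterSketch, ThreadPolicySketch, ConfigurableImporterSketch and their methods) is hypothetical and only mirrors the concepts described above; it is not the Nuxeo API, and the default policy shown is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical 'factory': handles both the folderish and the leaf case. */
interface DocFactorySketch {
    String createFolderish(String parentPath, String name);
    String createLeaf(String parentPath, String name);
}

/** Hypothetical 'filter': hooks that run around the import, e.g. to block events. */
interface ImportFilterSketch {
    void beforeImport();
    void afterImport();
}

/** Hypothetical thread policy: decides whether work may be handed to a new thread. */
interface ThreadPolicySketch {
    boolean shouldFork(int pendingChildren, int activeThreads, int maxThreads);
}

final class ConfigurableImporterSketch {
    private DocFactorySketch factory;
    private final List<ImportFilterSketch> filters = new ArrayList<>();
    // Assumed default, used when no policy is set: fork while capacity remains.
    private ThreadPolicySketch policy =
            (pending, active, max) -> pending > 1 && active < max;

    void setFactory(DocFactorySketch factory) { this.factory = factory; }
    void addFilter(ImportFilterSketch filter) { filters.add(filter); }
    void setThreadPolicy(ThreadPolicySketch policy) { this.policy = policy; }
    ThreadPolicySketch getThreadPolicy() { return policy; }

    /** Imports one folder and its leaves, running every filter before and after. */
    List<String> importFolder(String parentPath, String folder, List<String> leaves) {
        if (factory == null) {
            throw new IllegalStateException("no factory configured");
        }
        List<String> created = new ArrayList<>();
        for (ImportFilterSketch f : filters) {
            f.beforeImport();
        }
        try {
            String folderPath = factory.createFolderish(parentPath, folder);
            created.add(folderPath);
            for (String leaf : leaves) {
                created.add(factory.createLeaf(folderPath, leaf));
            }
        } finally {
            // Always undo what the filters set up, even if a creation failed.
            for (ImportFilterSketch f : filters) {
                f.afterImport();
            }
        }
        return created;
    }
}
```

Keeping the factory, the filters and the policy behind separate interfaces lets each be swapped without touching the traversal logic, which is the design the section describes.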
Usually such an 'importer' should be instantiated and configured in an instance method of a class that extends the org.nuxeo.ecm.platform.importer.executor.AbstractImporterExecutor class. In this instance method, after the importer is instantiated and configured, a call to a superclass method should be made to start the import.
The second parameter of this superclass method specifies whether the import starts synchronously or asynchronously.
This subclass then serves as the base class for the import: to run the import, call the method that instantiates, configures and starts the importer.
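The difference between a synchronous and an asynchronous start can be illustrated with a small stand-alone sketch. ImportExecutorSketch and FolderImportSketch are hypothetical classes built on the JDK only; the method name doRun and its signature are assumptions made for this example and should be checked against AbstractImporterExecutor before relying on them.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical executor base class; shows only the blocking vs. background start. */
abstract class ImportExecutorSketch {
    /**
     * Starts the runner. When synchronous is true, the call returns only after
     * the runner has finished; otherwise the runner is handed to a background
     * thread and the returned future completes when it is done.
     */
    protected CompletableFuture<Void> doRun(Runnable runner, boolean synchronous) {
        if (synchronous) {
            runner.run();
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(runner);
    }
}

/** Hypothetical subclass: builds its 'importer' and starts it through the superclass. */
final class FolderImportSketch extends ImportExecutorSketch {
    final AtomicInteger imported = new AtomicInteger();

    CompletableFuture<Void> run(int documents, boolean synchronous) {
        // Stand-in for instantiating and configuring a real importer.
        Runnable importer = () -> {
            for (int i = 0; i < documents; i++) {
                imported.incrementAndGet();
            }
        };
        return doRun(importer, synchronous);
    }
}
```

With a synchronous start the counter is already final when run returns; with an asynchronous start the caller has to wait on the returned future (for example with join()) before reading it.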
Download
To download nuxeo-platform-importer-core, check the Nuxeo Marketplace or, if needed, download a more recent version of the JAR (to be installed by hand) from the Nuxeo Maven repository.
Other import tools
You can also have a look at https://github.com/nuxeo/nuxeo-platform-replication, a Nuxeo replication tool that uses the nuxeo-platform-importer-core module internally. For more details about Nuxeo replication, have a look at How to replicate the Nuxeo repository and http://doc.nuxeo.org/5.1/books/nuxeo-book/html/admin-replication.html.