Scan Documents Importer

Updated: March 18, 2024

The Scan Documents Importer addon creates documents from XML files located on the file system each time a dedicated event is fired. It can therefore easily be configured to import data on a regular schedule.


This addon requires no specific installation steps. It can be installed like any other package, either with the nuxeoctl command line tool or from the Update Center.


A step-by-step example explaining the addon configuration can be found on the Nuxeo blog: [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo

Note that XML values can only be mapped to fields that are neither multivalued nor complex. If you need to map to multivalued or complex fields, see the Advanced XML Parsing section.

A Java mapper class example can be found on GitHub. A mapper class lets you create a specific Nuxeo document type depending on the XML source.

Advanced XML Parsing

Advanced XML parsing for complex and/or multivalued fields can be achieved by adding the following bundles to your platform (copy the JAR files into the nxserver/bundles directory):

  1. nuxeo-importer-xml-parser
  2. nuxeo-importer-scan-xml-parser

These bundles provide a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) and two extension points that must be used instead of the regular ones:

  1. documentMapping, to determine which document type should be created depending on a set of conditions
  2. attributeMapping, to parse the XML and map its values to the corresponding metadata
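
As a rough illustration of what an attributeMapping entry does, the sketch below finds an element by tag name and evaluates an XPath expression relative to it, using only the JDK's XML APIs. It is not the importer's implementation: the class name XPathMappingSketch, the resolve helper, and the trimmed sample document (including its invoice root and item wrapper) are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathMappingSketch {

    // Trimmed sample invoice; the <invoice> root and <item> wrapper are assumptions.
    static final String SAMPLE =
        "<invoice>"
      + "<order_number value=\"Invoice NX38937987-421-690\"/>"
      + "<file name=\"order made on march 12 2005.pdf\"/>"
      + "<item><desc>Glue sticks</desc></item>"
      + "</invoice>";

    /**
     * Finds the first element named tagName, then evaluates xmlPath relative
     * to that element. Returns null when no such element exists.
     */
    static String resolve(String xml, String tagName, String xmlPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node element = (Node) xpath.evaluate("//" + tagName, doc, XPathConstants.NODE);
            if (element == null) {
                return null;
            }
            return xpath.evaluate(xmlPath, element);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xmlPath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(SAMPLE, "order_number", "@value")); // Invoice NX38937987-421-690
        System.out.println(resolve(SAMPLE, "file", "@name"));          // order made on march 12 2005.pdf
        System.out.println(resolve(SAMPLE, "item", "desc/text()"));    // Glue sticks
    }
}
```

Attribute values are selected with @name-style expressions and element text with text(), which is why simple tags map through @value while nested item fields use paths such as desc/text().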

Detailed documentation on advanced XML parsing can be found on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example with the original XML file and the corresponding XML configuration that can be pasted into Nuxeo Studio.

Original XML file

<invoice>
  <order_number value="Invoice NX38937987-421-690" />
  <software_source value="My accounting software" />
  <supplier value="Papeterie Stylo Dépôt" />
  <order_date value="2005-03-12T11:00:00.000Z" />
  <planned_delivery_date value="2005-04-17" />
  <total_incl_taxes value="65.90" />
  <file name="order made on march 12 2005.pdf" />
  <item>
    <!-- The item mapping also reads ref, amount and delivery_date child elements -->
    <desc>Glue sticks</desc>
  </item>
</invoice>

Corresponding XML extension in Nuxeo Studio

<!-- Doctype to create depending on XML formatting
     In this case, having an invoice tag means I should create an Invoice document in Nuxeo -->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
    <docConfig tagName="invoice">
        <docType>Invoice</docType>
    </docConfig>
</extension>

<!-- XML to metadata mapping
     In this case, my invoice schema is as follows:
         order_number               string
         software_source            string
         supplier                   string
         total_incl_taxes           float
         order_date                 date
         planned_delivery_date      date
         items                      complex, multivalued
             ref                    string
             description            string
             amount                 float
             deliverydate           date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
    <attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
    <attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
    <attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
    <attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
    <attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
    <attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>

    <!-- Attach the file named in the XML as the document's main content -->
    <attributeConfig tagName="file" docProperty="file:content">
        <mapping documentProperty="filename">@name</mapping>
        <mapping documentProperty="content">@name</mapping>
    </attributeConfig>

    <!-- Each item element becomes one entry of the complex, multivalued items field -->
    <attributeConfig tagName="item" docProperty="invoice:items">
        <mapping documentProperty="ref">ref/text()</mapping>
        <mapping documentProperty="description">desc/text()</mapping>
        <mapping documentProperty="amount">amount/text()</mapping>
        <!-- Scripted expression: parse the yyyy.MM.dd text of delivery_date into a date -->
        <mapping documentProperty="deliverydate"><![CDATA[#{
            String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
            return Fn.parseDate(date, 'yyyy.MM.dd')
        }]]></mapping>
    </attributeConfig>
</extension>
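
The deliverydate mapping only works if the text of delivery_date matches the yyyy.MM.dd pattern exactly. As a standalone sketch of that constraint, the code below uses java.time rather than the importer's Fn.parseDate helper, whose exact behaviour is not described on this page. The class name, the method names and the sample value 2005.04.17 are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class DeliveryDateSketch {

    // Same pattern string as the one passed to Fn.parseDate in the mapping.
    static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Trims the raw element text and parses it; throws DateTimeParseException on mismatch. */
    static LocalDate parseDeliveryDate(String rawText) {
        return LocalDate.parse(rawText.trim(), PATTERN);
    }

    /** Returns true when the text can be parsed with the pattern. */
    static boolean matchesPattern(String rawText) {
        try {
            parseDeliveryDate(rawText);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical value using dots as separators, as the pattern expects.
        System.out.println(parseDeliveryDate(" 2005.04.17 ")); // prints 2005-04-17
        // The dash-separated format used by planned_delivery_date does not match.
        System.out.println(matchesPattern("2005-04-17"));      // prints false
    }
}
```

If the source system writes dates in another format, the pattern string in the mapping has to be changed accordingly.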