
Scan Documents Importer

Updated: July 17, 2023

The Scan Documents Importer addon creates documents from XML files located on the file system each time a dedicated event is fired. It can therefore easily be configured to import data on a regular basis.


The Scan Documents Importer addon is available as a Nuxeo Package. See how to install a Nuxeo Package for further instructions.


A step-by-step example explaining the addon configuration can be found in the Nuxeo blogs: [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo

Note that with this addon, XML values can only be mapped to fields that are neither multivalued nor complex. If you need to map multivalued or complex fields, see the Advanced XML Parsing section.

A Java mapper class example can be found on GitHub. A custom mapper lets you create a specific Nuxeo document type depending on the XML source.

Advanced XML Parsing

Advanced XML parsing for complex and/or multivalued fields can be achieved by adding the following bundles to your platform (copy the JAR files into the nxserver/bundles directory):

  1. nuxeo-importer-xml-parser
  2. nuxeo-importer-scan-xml-parser

These bundles provide you with a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) and extension points that need to be used instead of the regular ones:

  1. documentMapping to determine which document type should be created depending on a set of conditions

  2. attributeMapping to do the XML parsing and map to the corresponding metadata
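The division of labour between the two extension points can be pictured with a small standalone sketch. It is not the Nuxeo implementation: the class name, the two lookup maps and the `plan` method are hypothetical stand-ins, and the sketch only reads a `value` attribute, whereas the real service evaluates a configurable XPath for each mapping.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TwoStepMappingSketch {

    // Hypothetical stand-in for documentMapping: XML tag name -> document type.
    static final Map<String, String> DOC_TYPES = Map.of("invoice", "Invoice");

    // Hypothetical stand-in for attributeMapping: XML tag name -> document property.
    static final Map<String, String> PROPERTIES = Map.of("software_source", "dc:source");

    /** Visits every element in document order and records what each mapping would produce. */
    static Map<String, String> plan(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            NodeList elements = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                String tag = element.getTagName();
                if (DOC_TYPES.containsKey(tag)) {
                    // Step 1: the tag decides which document type to create.
                    result.put("docType", DOC_TYPES.get(tag));
                }
                if (PROPERTIES.containsKey(tag)) {
                    // Step 2: the tag's value is copied to the mapped property.
                    result.put(PROPERTIES.get(tag), element.getAttribute("value"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<invoice><software_source value=\"My accounting software\" /></invoice>";
        System.out.println(plan(xml));
    }
}
```

Keeping the two lookups separate mirrors the configuration: changing which tag triggers document creation does not require touching the metadata mappings, and vice versa.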

Detailed documentation on advanced XML parsing can be found on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example with the original XML file and the corresponding XML configuration that can be pasted into Nuxeo Studio.

Original XML file

<invoice>
  <order_number value="Invoice NX38937987-421-690" />
  <software_source value="My accounting software" />
  <supplier value="Papeterie Stylo Dépôt" />
  <order_date value="2005-03-12T11:00:00.000Z" />
  <planned_delivery_date value="2005-04-17" />
  <total_incl_taxes value="65.90" />
  <file name="order made on march 12 2005.pdf" />
  <item>
    <!-- The mapping also reads ref, amount and delivery_date (yyyy.MM.dd) child elements -->
    <desc>Glue sticks</desc>
  </item>
</invoice>

Corresponding XML extension for Nuxeo Studio

<!-- Doctype to create depending on XML formatting
     In this case, having an invoice tag means I should create an Invoice document in Nuxeo -->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
    <docConfig tagName="invoice">
        <docType>Invoice</docType>
    </docConfig>
</extension>

<!-- XML to metadata mapping
     In this case, my invoice schema is as follows:
         order_number             string
         software_source          string
         supplier                 string
         total_inc_taxes          float
         order_date               date
         planned_delivery_date    date
         items                    complex, multivalued
             ref                  string
             description          string
             amount               float
             deliverydate         date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
    <!-- Simple fields: each value is read from the tag's "value" attribute -->
    <attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
    <attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
    <attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
    <attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
    <attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
    <attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>

    <!-- Main file: both the file name and the content are resolved from the "name" attribute -->
    <attributeConfig tagName="file" docProperty="file:content">
        <mapping documentProperty="filename">@name</mapping>
        <mapping documentProperty="content">@name</mapping>
    </attributeConfig>

    <!-- Complex, multivalued field: each item tag adds one entry to invoice:items -->
    <attributeConfig tagName="item" docProperty="invoice:items">
        <mapping documentProperty="ref">ref/text()</mapping>
        <mapping documentProperty="description">desc/text()</mapping>
        <mapping documentProperty="amount">amount/text()</mapping>
        <!-- Scripted mapping, wrapped in #{ ... }, that parses the yyyy.MM.dd delivery date -->
        <mapping documentProperty="deliverydate"><![CDATA[#{
            String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
            return Fn.parseDate(date, 'yyyy.MM.dd');
        }]]></mapping>
    </attributeConfig>
</extension>
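To check what a given xmlPath or mapping expression would select before deploying the configuration, the expressions can be tried out with the JDK's own XPath engine. The following is a minimal sketch under stated assumptions, not the importer's code: `MappingPreview`, `resolve` and `parseDeliveryDate` are hypothetical names, the `<item>` wrapper and the `2005.04.08` delivery date in the sample are invented for illustration, and `java.time` stands in for the `Fn.parseDate` helper used in the configuration.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class MappingPreview {

    // Hypothetical sample: the <item> wrapper and the delivery_date value are
    // assumptions made for this illustration only.
    static final String SAMPLE = "<invoice>"
            + "<order_number value=\"Invoice NX38937987-421-690\" />"
            + "<item><desc>Glue sticks</desc><delivery_date>2005.04.08</delivery_date></item>"
            + "</invoice>";

    /** Selects the first element named tagName, then evaluates xmlPath relative to it. */
    static String resolve(String xml, String tagName, String xmlPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node element = (Node) xpath.evaluate("//" + tagName,
                    new InputSource(new StringReader(xml)), XPathConstants.NODE);
            // No matching tag means there is nothing to map.
            return element == null ? null : xpath.evaluate(xmlPath, element).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or XML input", e);
        }
    }

    /** Parses a date written with the yyyy.MM.dd pattern used in the mapping. */
    static LocalDate parseDeliveryDate(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("yyyy.MM.dd"));
    }

    public static void main(String[] args) {
        System.out.println(resolve(SAMPLE, "order_number", "@value"));
        System.out.println(resolve(SAMPLE, "item", "desc/text()"));
        System.out.println(parseDeliveryDate(resolve(SAMPLE, "item", "delivery_date/text()")));
    }
}
```

Run against the sample, this prints the order number, then "Glue sticks", then the parsed date 2005-04-08. The expressions are evaluated relative to the matched tag, which is why `@value` and `desc/text()` carry no leading path.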