
Scan Documents Importer

Updated: October 16, 2020

The Scan Documents Importer add-on lets you create documents from XML files located on the file system each time a dedicated event is fired. It can therefore easily be configured to import data on a regular basis.
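Because the import runs whenever the dedicated event is fired, one way to import on a regular basis is a scheduler contribution that fires that event on a cron schedule. The sketch below assumes the standard Nuxeo scheduler extension point (org.nuxeo.ecm.core.scheduler.SchedulerService, point schedule). The schedule id and the event name ScanImportEvent are hypothetical placeholders: replace the event name with the one the add-on's listener actually listens to.

```xml
<!-- Sketch only: fires an event every 15 minutes so the importer runs periodically -->
<extension target="org.nuxeo.ecm.core.scheduler.SchedulerService" point="schedule">
  <schedule id="scanImportSchedule">
    <!-- Hypothetical event name: use the event the Scan Documents Importer listens to -->
    <eventId>ScanImportEvent</eventId>
    <eventCategory>default</eventCategory>
    <!-- Quartz cron syntax: seconds minutes hours day-of-month month day-of-week -->
    <cronExpression>0 0/15 * * * ?</cronExpression>
  </schedule>
</extension>
```

Adjust the cron expression to match how often new XML files are expected to arrive.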

Installation

The Scan Documents Importer add-on has been available as a Nuxeo Package since Nuxeo 5.9.4. See how to install a Nuxeo Package for further instructions.

Installation instructions

Due to a known bug in Nuxeo 5.9.5, some elements need to be added manually in addition to the Nuxeo Package:

Configuration

A step-by-step example explaining the add-on configuration is available on the Nuxeo blog: [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo

Note that with the standard configuration, XML data can only be mapped to single-valued, non-complex fields. To map complex or multivalued fields, see the Advanced XML parsing section below.
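For comparison, a standard (non-advanced) mapping contribution might look like the sketch below, which maps two single-valued fields of the invoice file shown later on this page. The component name, the mapping extension point, and the element and attribute names (fieldMapping, sourceXPath, sourceAttribute, targetXPath, targetType) are assumptions recalled from the add-on's sample configuration, not confirmed by this page; check them against the extension point documentation for your Nuxeo version.

```xml
<!-- Sketch only: component, extension point and element names are assumptions to verify -->
<extension target="org.nuxeo.ecm.platform.scanimporter.service.ScannedFileMapperComponent" point="mapping">
  <mapping>
    <fieldMappings>
      <!-- Single-valued string fields read from the "value" attribute of each tag -->
      <fieldMapping sourceXPath="//order_number" sourceAttribute="value" targetXPath="dc:title" targetType="string"/>
      <fieldMapping sourceXPath="//supplier" sourceAttribute="value" targetXPath="invoice:supplier" targetType="string"/>
    </fieldMappings>
  </mapping>
</extension>
```

In this form, the repeated item elements cannot be mapped to the complex, multivalued invoice:items field; that case requires the advanced XML parsing bundles.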

A Java mapper class example is available on GitHub. It shows how to create a specific Nuxeo document type depending on the XML source.

Advanced XML parsing

Advanced XML parsing of complex and/or multivalued fields requires adding the following bundles to your platform (copy the JAR files into the nxserver/bundles directory):

  1. nuxeo-importer-xml-parser
  2. nuxeo-importer-scan-xml-parser

These bundles provide a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) with extension points that must be used instead of the regular ones:

  1. documentMapping: determines which document type should be created, depending on a set of conditions

  2. attributeMapping: parses the XML and maps it to the corresponding metadata
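As a minimal sketch of the documentMapping idea, the contribution below declares one docConfig per XML tag, so that the tag found in the source file decides which document type is created. It reuses the syntax of the full example on this page; the credit_note tag and the CreditNote document type are hypothetical and would need to exist in your own XML files and Studio project.

```xml
<!-- Sketch only: CreditNote and credit_note are hypothetical examples -->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
  <!-- An <invoice> element creates an Invoice document -->
  <docConfig tagName="invoice">
    <docType>Invoice</docType>
  </docConfig>
  <!-- A <credit_note> element creates a CreditNote document -->
  <docConfig tagName="credit_note">
    <docType>CreditNote</docType>
  </docConfig>
</extension>
```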

Detailed documentation on advanced XML parsing is available on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example: the original XML file and the corresponding XML configuration, which can be pasted into Nuxeo Studio.

Original XML file

<invoice>
  <order_number value="Invoice NX38937987-421-690" />
  <software_source value="My accounting software" />
  <supplier value="Papeterie Stylo Dépôt" />
  <order_date value="2005-03-12T11:00:00.000Z" />
  <planned_delivery_date value="2005-04-17" />
  <total_incl_taxes value="65.90" />
  <file name="order made on march 12 2005.pdf" />
  <item>
    <ref>373668</ref>
    <desc>Pens</desc>
    <amount>12.30</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>737282</ref>
    <desc>Poster</desc>
    <amount>3.70</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>029938</ref>
    <desc>Glue sticks</desc>
    <amount>7.75</amount>
    <delivery_date>2005.04.20</delivery_date>
  </item>
</invoice>

Corresponding XML extension in Nuxeo Studio

<!-- Doctype to create depending on XML formatting
     In this case, having an invoice tag means I should create an Invoice document in Nuxeo --> 
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping"> 
    <docConfig tagName="invoice"> 
      <docType>Invoice</docType> 
    </docConfig> 
</extension> 

<!-- XML to metadata mapping
     In this case, the order number and software source are stored in the
     Dublin Core fields dc:title and dc:source, and my invoice schema is as follows:
         supplier                   string
         amount                     float
         orderdate                  date
         planneddeliverydate        date
         items                      complex, multivalued
             ref                    string
             description            string
             amount                 float
             deliverydate           date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
  <attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
  <attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
  <attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
  <attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
  <attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
  <attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>

  <!-- The file named in the name attribute becomes the document's main file -->
  <attributeConfig tagName="file" docProperty="file:content">
    <mapping documentProperty="filename">@name</mapping>
    <mapping documentProperty="content">@name</mapping>
  </attributeConfig>

  <!-- Each <item> element becomes one entry of the multivalued invoice:items field -->
  <attributeConfig tagName="item" docProperty="invoice:items">
    <mapping documentProperty="ref">ref/text()</mapping>
    <mapping documentProperty="description">desc/text()</mapping>
    <mapping documentProperty="amount">amount/text()</mapping>
    <!-- delivery_date uses the non-ISO yyyy.MM.dd format, so it is parsed with an MVEL expression -->
    <mapping documentProperty="deliverydate"><![CDATA[
      #{
        String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
        return Fn.parseDate(date, 'yyyy.MM.dd');
      }]]>
    </mapping>
  </attributeConfig>
</extension>