Nuxeo Add-Ons

Scan Documents Importer

Updated: March 18, 2024

The Scan Documents Importer addon allows to create documents from XML files located on the file system every time a dedicated event is launched. It can therefore be easily configured to import data on a regular basis.

Installation

This addon requires no specific installation steps. It can be installed like any other package with nuxeoctl command line or from the Update Center.

Configuration

A step by step example explaining the addon configuration can be found in the Nuxeo blogs : [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo

Please note that the XML can only be mapped to non-multivalued and non-complex fields. If you need this functionality, see the advanced XML parsing section.

A Java mapper class example can be found on GitHub. This allows to create a specific Nuxeo document type depending on the XML source.

Advanced XML Parsing

Advanced XML parsing for complex and / or multivalued fields can be achieved by adding the following bundles into your platform (copy the jar files into the nxserver/bundles directory):

  1. nuxeo-importer-xml-parser
  2. nuxeo-importer-scan-xml-parser

These bundles provide you with a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) and extension points that need to be used instead of the regular ones:

  1. documentMapping to determine which document type should be created depending on a set of conditions

  2. attributeMapping to do the XML parsing and map to the corresponding metadata

A detailed documentation on the advanced XML parsing usage can be found on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example with the original XML file and the corresponding XML configuration that can be pasted into Nuxeo Studio.

Original XML file

<invoice>
  <order_number value="Invoice NX38937987-421-690" />
  <software_source value="My accounting software" />
  <supplier value="Papeterie Stylo Dépôt" />
  <order_date value="2005-03-12T11:00:00.000Z" />
  <planned_delivery_date value="2005-04-17" />
  <total_incl_taxes value="65.90" />
  <file name="order made on march 12 2005.pdf" />
  <item>
    <ref>373668</ref>
    <desc>Pens</desc>
    <amount>12.30</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>737282</ref>
    <desc>Poster</desc>
    <amount>3.70</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>029938</ref>
    <desc>Glue sticks</desc>
    <amount>7.75</amount>
    <delivery_date>2005.04.20</delivery_date>
  </item>
</invoice>

Corresponding XML extension into Nuxeo Studio

<!-- Doctype to create depending on XML formatting
     In this case, having an invoice tag means I should create an Invoice document in Nuxeo -->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
    <docConfig tagName="invoice">
      <docType>Invoice</docType>
    </docConfig>
</extension>

<!-- XML to metadata mapping
     In this case, my invoice schema is as follows:
         order_number                         string
        software_source                        string
        supplier                            string
        total_inc_taxes                        float
        order_date                            date
        planned_delivery_date                date
        items                                complex, multivalued
            ref                                string
            description                        string
            amount                            float
            deliverydate                    date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
    <attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
  <attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
    <attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
  <attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
  <attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
  <attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>

  <attributeConfig tagName="file" docProperty="file:content">
        <mapping documentProperty="filename">@name</mapping>
        <mapping documentProperty="content">@name</mapping>
    </attributeConfig>

    <attributeConfig tagName="item" docProperty="invoice:items">
       <mapping documentProperty="ref">ref/text()</mapping>
    <mapping documentProperty="description">desc/text()</mapping>
    <mapping documentProperty="amount">amount/text()</mapping>
    <mapping documentProperty="deliverydate">
             #{
                String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
              return Fn.parseDate(date, 'yyyy.MM.dd')
        }]]>
        </mapping>
  </attributeConfig>
</extension>