
Scan Documents Importer

Updated: March 18, 2024

The Scan Documents Importer addon creates documents from XML files located on the file system each time a dedicated event is fired. It can therefore easily be configured to import data on a regular basis.

Installation

This addon requires no specific installation steps. It can be installed like any other package, either with the nuxeoctl command-line tool or from the Update Center.

Configuration

A step-by-step example explaining the addon configuration can be found on the Nuxeo blog: [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo

Please note that the XML can only be mapped to fields that are neither multivalued nor complex. If you need to map to multivalued or complex fields, see the Advanced XML Parsing section.

A Java mapper class example can be found on GitHub. Such a class lets you create a specific Nuxeo document type depending on the XML source.

Advanced XML Parsing

Advanced XML parsing for complex and/or multivalued fields can be achieved by adding the following bundles to your platform (copy the jar files into the nxserver/bundles directory):

These bundles provide you with a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) and extension points that need to be used instead of the regular ones:

  1. documentMapping to determine which document type should be created depending on a set of conditions

  2. attributeMapping to do the XML parsing and map to the corresponding metadata

Detailed documentation on advanced XML parsing usage can be found on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example with the original XML file and the corresponding XML configuration that can be pasted into Nuxeo Studio.

Original XML file

<invoice>
  <order_number value="Invoice NX38937987-421-690" />
  <software_source value="My accounting software" />
  <supplier value="Papeterie Stylo Dépôt" />
  <order_date value="2005-03-12T11:00:00.000Z" />
  <planned_delivery_date value="2005-04-17" />
  <total_incl_taxes value="65.90" />
  <file name="order made on march 12 2005.pdf" />
  <item>
    <ref>373668</ref>
    <desc>Pens</desc>
    <amount>12.30</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>737282</ref>
    <desc>Poster</desc>
    <amount>3.70</amount>
    <delivery_date>2005.04.17</delivery_date>
  </item>
  <item>
    <ref>029938</ref>
    <desc>Glue sticks</desc>
    <amount>7.75</amount>
    <delivery_date>2005.04.20</delivery_date>
  </item>
</invoice>

Corresponding XML extension into Nuxeo Studio

<!--
  Doctype to create depending on XML formatting.
  In this case, having an invoice tag means I should create an Invoice document in Nuxeo.
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
  <docConfig tagName="invoice">
    <docType>Invoice</docType>
  </docConfig>
</extension>

<!--
  XML to metadata mapping.
  In this case, my invoice schema is as follows:
    order_number            string
    software_source         string
    supplier                string
    total_inc_taxes         float
    order_date              date
    planned_delivery_date   date
    items                   complex, multivalued
      ref                   string
      description           string
      amount                float
      deliverydate          date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
  <attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
  <attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
  <attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
  <attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
  <attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
  <attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>

  <!-- Attached file: the name attribute provides both the filename and the content location -->
  <attributeConfig tagName="file" docProperty="file:content">
    <mapping documentProperty="filename">@name</mapping>
    <mapping documentProperty="content">@name</mapping>
  </attributeConfig>

  <!-- Each item tag adds one entry to the complex, multivalued items field -->
  <attributeConfig tagName="item" docProperty="invoice:items">
    <mapping documentProperty="ref">ref/text()</mapping>
    <mapping documentProperty="description">desc/text()</mapping>
    <mapping documentProperty="amount">amount/text()</mapping>
    <mapping documentProperty="deliverydate"><![CDATA[ #{
      String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
      return Fn.parseDate(date, 'yyyy.MM.dd')
    }]]></mapping>
  </attributeConfig>
</extension>
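To see what the XPath expressions in the attributeMapping select, it can help to evaluate them outside Nuxeo. The following is a minimal sketch using only the JDK's javax.xml.xpath API against a trimmed copy of the sample invoice. It is not the addon's implementation: the class name InvoiceMappingSketch and its helper methods are hypothetical, the scalar lookup assumes that xmlPath is resolved relative to the element matched by tagName, and java.time.LocalDate stands in for Fn.parseDate, whose exact return type is not described here.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only; not part of the Nuxeo API.
class InvoiceMappingSketch {

    // Trimmed copy of the sample invoice (two items, ASCII-only fields).
    static final String SAMPLE =
        "<invoice>"
        + "<order_number value=\"Invoice NX38937987-421-690\" />"
        + "<total_incl_taxes value=\"65.90\" />"
        + "<item><ref>373668</ref><desc>Pens</desc><amount>12.30</amount>"
        + "<delivery_date>2005.04.17</delivery_date></item>"
        + "<item><ref>029938</ref><desc>Glue sticks</desc><amount>7.75</amount>"
        + "<delivery_date>2005.04.20</delivery_date></item>"
        + "</invoice>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses the XML string and returns the root invoice element.
    static Node root(String xml) {
        try {
            return (Node) XPATH.evaluate("/invoice",
                    new InputSource(new StringReader(xml)), XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot parse invoice XML", e);
        }
    }

    // Evaluates an XPath expression relative to a context node.
    static Object eval(String expr, Node ctx, QName type) {
        try {
            return XPATH.evaluate(expr, ctx, type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    // Mimics a scalar attributeConfig: xmlPath evaluated on the child named tagName.
    static String scalar(Node doc, String tagName, String xmlPath) {
        return (String) eval(tagName + "/" + xmlPath, doc, XPathConstants.STRING);
    }

    // Mimics the item attributeConfig: one map per item tag, one key per mapping.
    static List<Map<String, Object>> items(Node doc) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyy.MM.dd");
        NodeList nodes = (NodeList) eval("item", doc, XPathConstants.NODESET);
        List<Map<String, Object>> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node item = nodes.item(i);
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("ref", eval("ref/text()", item, XPathConstants.STRING));
            entry.put("description", eval("desc/text()", item, XPathConstants.STRING));
            String amount = (String) eval("amount/text()", item, XPathConstants.STRING);
            entry.put("amount", Double.valueOf(amount));
            String date = (String) eval("delivery_date/text()", item, XPathConstants.STRING);
            entry.put("deliverydate", LocalDate.parse(date.trim(), format));
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        Node doc = root(SAMPLE);
        System.out.println("dc:title       = " + scalar(doc, "order_number", "@value"));
        System.out.println("invoice:amount = " + scalar(doc, "total_incl_taxes", "@value"));
        for (Map<String, Object> item : items(doc)) {
            System.out.println("invoice:items += " + item);
        }
    }
}
```

Running main prints the value each expression would feed into the corresponding document property, which is a quick way to check an xmlPath or a date pattern before pasting the contribution into Nuxeo Studio.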