The Scan Documents Importer add-on creates documents from XML files located on the file system every time a dedicated event is fired. It can therefore easily be configured to import data on a regular basis.
Installation
The Scan Documents Importer add-on has been available as a Nuxeo Package since Nuxeo 5.9.4. See how to install a Nuxeo Package for further instructions.
Due to a known bug in Nuxeo 5.9.5, some elements need to be added manually in addition to the Nuxeo Package:
- The jaxen 1.1.1 library JAR needs to be copied into the nxserver/lib folder.
- The nuxeo-importer-jaxrs bundle (JAR) needs to be copied into the nxserver/bundles folder.
Configuration
A step-by-step example explaining the add-on configuration can be found on the Nuxeo blog: [Monday Dev Heaven] Multi-threaded, transactional bulk import with Nuxeo
Please note that the XML can only be mapped to non-multivalued and non-complex fields. If you need to map to multivalued or complex fields, see the Advanced XML parsing section.
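To make the simple-field restriction concrete, here is a minimal sketch of what a basic mapping contribution might look like. The extension target, point names, element names and attributes are assumptions recalled from the add-on's sources and should be checked against the version you run. The Invoice document type, the invoice:supplier property and the folder paths are hypothetical placeholders.

```xml
<!-- Sketch only: names below are assumptions, not a verified reference -->
<extension target="org.nuxeo.ecm.platform.scanimporter.service.ScannedFileMapperComponent" point="mapping">
  <mapping>
    <!-- Hypothetical container and leaf document types -->
    <targetContainerType>Folder</targetContainerType>
    <targetLeafType>Invoice</targetLeafType>
    <fieldMappings>
      <!-- Each mapping reads one attribute and writes one simple (non-complex, single-valued) property -->
      <fieldMapping sourceXPath="//order_number" sourceAttribute="value" targetXPath="dc:title" targetType="string"/>
      <fieldMapping sourceXPath="//supplier" sourceAttribute="value" targetXPath="invoice:supplier" targetType="string"/>
    </fieldMappings>
  </mapping>
</extension>

<extension target="org.nuxeo.ecm.platform.scanimporter.service.ScannedFileMapperComponent" point="config">
  <importerConfig>
    <!-- Placeholder paths: where XML files are picked up and where they are moved once processed -->
    <sourcePath>/path/to/incoming</sourcePath>
    <processedPath>/path/to/processed</processedPath>
    <nbThreads>1</nbThreads>
    <batchSize>10</batchSize>
    <targetPath>/default-domain/workspaces/</targetPath>
  </importerConfig>
</extension>
```

Note that each fieldMapping in this sketch targets a single scalar property, which is why a repeated element such as an invoice line item cannot be expressed this way.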
A Java mapper class example can be found on GitHub. Such a class lets you create a specific Nuxeo document type depending on the XML source.
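Since the import runs whenever its dedicated event is fired, a regular import can be obtained by having the Nuxeo scheduler fire that event. The following is a sketch under assumptions: the event name ScanIngestionStart is recalled from the add-on's sources and should be verified, and the schedule id, user name and cron expression are illustrative values only.

```xml
<!-- Sketch only: verify the event name against your add-on version -->
<extension target="org.nuxeo.ecm.core.scheduler.SchedulerService" point="schedule">
  <schedule id="scanImporterSchedule">
    <!-- User on whose behalf the event is sent (illustrative value) -->
    <username>Administrator</username>
    <!-- Assumed name of the event that starts the scan import -->
    <eventId>ScanIngestionStart</eventId>
    <eventCategory>default</eventCategory>
    <!-- Quartz cron expression: every five minutes -->
    <cronExpression>0 0/5 * * * ?</cronExpression>
  </schedule>
</extension>
```

A longer interval is usually preferable if the source folder can receive large batches, so that one run finishes before the next one starts.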
Advanced XML parsing
Advanced XML parsing for complex and/or multivalued fields can be achieved by adding the following bundles to your platform (copy the JAR files into the nxserver/bundles directory):
These bundles provide a new service (org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent) and extension points that need to be used instead of the regular ones:
- documentMapping to determine which document type should be created depending on a set of conditions
- attributeMapping to do the XML parsing and map to the corresponding metadata
Detailed documentation on advanced XML parsing usage can be found on the nuxeo-importer-xml-parser GitHub page. To get you started, below is a working example with the original XML file and the corresponding XML configuration that can be pasted into Nuxeo Studio.
<invoice>
<order_number value="Invoice NX38937987-421-690" />
<software_source value="My accounting software" />
<supplier value="Papeterie Stylo Dépôt" />
<order_date value="2005-03-12T11:00:00.000Z" />
<planned_delivery_date value="2005-04-17" />
<total_incl_taxes value="65.90" />
<file name="order made on march 12 2005.pdf" />
<item>
<ref>373668</ref>
<desc>Pens</desc>
<amount>12.30</amount>
<delivery_date>2005.04.17</delivery_date>
</item>
<item>
<ref>737282</ref>
<desc>Poster</desc>
<amount>3.70</amount>
<delivery_date>2005.04.17</delivery_date>
</item>
<item>
<ref>029938</ref>
<desc>Glue sticks</desc>
<amount>7.75</amount>
<delivery_date>2005.04.20</delivery_date>
</item>
</invoice>
<!-- Doctype to create depending on XML formatting
In this case, having an invoice tag means I should create an Invoice document in Nuxeo -->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="documentMapping">
<docConfig tagName="invoice">
<docType>Invoice</docType>
</docConfig>
</extension>
<!-- XML to metadata mapping
In this case, my invoice schema is as follows:
order_number string
software_source string
supplier string
total_inc_taxes float
order_date date
planned_delivery_date date
items complex, multivalued
ref string
description string
amount float
deliverydate date
-->
<extension target="org.nuxeo.ecm.platform.importer.xml.parser.XMLImporterComponent" point="attributeMapping">
<attributeConfig tagName="order_number" docProperty="dc:title" xmlPath="@value"/>
<attributeConfig tagName="software_source" docProperty="dc:source" xmlPath="@value"/>
<attributeConfig tagName="supplier" docProperty="invoice:supplier" xmlPath="@value"/>
<attributeConfig tagName="total_incl_taxes" docProperty="invoice:amount" xmlPath="@value"/>
<attributeConfig tagName="order_date" docProperty="invoice:orderdate" xmlPath="@value"/>
<attributeConfig tagName="planned_delivery_date" docProperty="invoice:planneddeliverydate" xmlPath="@value"/>
<attributeConfig tagName="file" docProperty="file:content">
<mapping documentProperty="filename">@name</mapping>
<mapping documentProperty="content">@name</mapping>
</attributeConfig>
<attributeConfig tagName="item" docProperty="invoice:items">
<mapping documentProperty="ref">ref/text()</mapping>
<mapping documentProperty="description">desc/text()</mapping>
<mapping documentProperty="amount">amount/text()</mapping>
<mapping documentProperty="deliverydate">
<![CDATA[
#{
String date = currentElement.selectNodes('delivery_date/text()')[0].getText().trim();
return Fn.parseDate(date, 'yyyy.MM.dd')
}]]>
</mapping>
</attributeConfig>
</extension>