Additional Services

Conversion

Updated: July 17, 2023

The Nuxeo Platform comes with a conversion service that manages the conversion of blobs from one format to another.

This is what is used to get a PDF file from an office document, for instance. It is also the infrastructure to use if you want to plug in an AutoCAD converter, or a converter for any business-specific format that is not covered by the built-in set of converters.

The conversion service provides:

  • A simple API for transforming blobs or blobs stored on documents through the BlobHolder interface, checking the availability of a converter, etc.
  • A caching mechanism that avoids repeating a conversion when the same blob was already converted to the same target format
  • Smart chaining logic that automatically combines the right converters to go from one format to another
  • A set of built-in converters for managing many standard formats
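
The automatic chaining mentioned in the list above can be pictured as a shortest-path search over the registered converters. The following is only a sketch under that assumption, not the platform's actual implementation: the ChainSketch class, the Conv record and the converter names used with it are hypothetical and do not belong to the Nuxeo API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of the Nuxeo API: finds the shortest chain of
// converters leading from a source mime type to a destination mime type.
final class ChainSketch {

    // A registered converter: a name, one source mime type, one destination mime type.
    record Conv(String name, String source, String destination) {}

    // Breadth-first search over mime types. Returns the converter names in order,
    // or an empty list when no chain exists or when source equals destination.
    static List<String> findChain(List<Conv> registry, String source, String destination) {
        Map<String, Conv> cameFrom = new HashMap<>(); // mime type -> converter that produced it
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(destination)) {
                break;
            }
            for (Conv c : registry) {
                if (c.source().equals(current)
                        && !c.destination().equals(source)
                        && !cameFrom.containsKey(c.destination())) {
                    cameFrom.put(c.destination(), c);
                    queue.add(c.destination());
                }
            }
        }
        // Walk back from the destination to the source to rebuild the chain.
        List<String> chain = new ArrayList<>();
        for (Conv c = cameFrom.get(destination); c != null; c = cameFrom.get(c.source())) {
            chain.add(0, c.name());
        }
        return chain;
    }
}
```

With a registry holding a hypothetical "doc2pdf" converter (a/doc to a/pdf) and a hypothetical "pdf2text" converter (a/pdf to text/plain), asking for a chain from a/doc to text/plain returns both names in that order.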

Converting Blobs

Java API

The Conversion Service can be accessed via the standard Nuxeo Service lookup:

ConversionService conversionService = Framework.getService(ConversionService.class);

Synchronous Conversions

To convert a BlobHolder to a given destination mime type:

BlobHolder result = conversionService.convertToMimeType("text/plain", blobHolder, params);

params is a simple Map<String,Serializable> to pass parameters to the converter (can be null).

To use a known converter:

BlobHolder result = conversionService.convert("converterName", blobHolder, params);

Asynchronous Conversions

Since 7.10, four new methods are available on the ConversionService to schedule asynchronous conversions and retrieve the result.

To schedule a new asynchronous conversion given a converter name:

String conversionId = conversionService.scheduleConversion("converterName", blobHolder, params);

To schedule a new asynchronous conversion given a destination mime type:

String conversionId = conversionService.scheduleConversionToMimeType("text/plain", blobHolder, params);

Both methods return a conversion id, which the following methods use to get the status and the result of the conversion.

To retrieve the status of a scheduled conversion:

ConversionStatus status = conversionService.getConversionStatus(conversionId);

The ConversionStatus object holds the status of an asynchronous conversion which can be SCHEDULED, RUNNING or COMPLETED.

When the status is COMPLETED, the result of the conversion can be retrieved with:

BlobHolder result = conversionService.getConversionResult(conversionId, true);

The second parameter, a boolean, defines whether the conversion result is deleted once retrieved (true) or kept (false). If it is deleted, the next call to #getConversionResult returns null.
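
A caller usually checks the status repeatedly until the conversion is COMPLETED, then fetches the result. The sketch below shows such a polling loop under stated assumptions: the Status enum and the Supplier are stand-ins for the real ConversionService calls so that the block compiles without Nuxeo on the classpath, and the ConversionPolling class and its parameters are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the Nuxeo API.
final class ConversionPolling {

    // Stand-in mirroring the three states described above.
    enum Status { SCHEDULED, RUNNING, COMPLETED }

    // Asks for the status up to maxAttempts times, waiting delayMillis between
    // attempts. Returns true as soon as the status is COMPLETED, false otherwise.
    static boolean awaitCompletion(Supplier<Status> statusSupplier, int maxAttempts, long delayMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (statusSupplier.get() == Status.COMPLETED) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

In real code the supplier would wrap conversionService.getConversionStatus(conversionId), and a true return value would be followed by a call to getConversionResult. Bounding the number of attempts keeps a stuck conversion from blocking the caller forever.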

Utility Methods

Find a converter name for a given conversion:

String converterName = conversionService.getConverterName(sourceMimeType, destinationMimeType);

Test if a converter is available:

String converterName = conversionService.getConverterName(sourceMimeType, destinationMimeType);
ConverterCheckResult checkResult = conversionService.isConverterAvailable(converterName);

This call can throw ConverterNotRegistred if the target converter does not exist at all. The ConverterCheckResult class provides:

  • An isAvailable() method
  • A getErrorMessage() method: Returns the error that occurred while doing the availability check
  • A getInstallationMessage() method: Returns the installation message that was contributed by the converter contributor

Automation

A few operations exist to do synchronous conversions:

REST API

The REST API provides synchronous and asynchronous conversions.

Synchronous Conversions

Synchronous conversions are retrieved through the @convert adapter.

GET http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@convert?converter=any2pdf
GET http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@blob/file:content/@convert?converter=any2pdf

To convert using a named converter:

GET http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@convert?converter=any2pdf

To convert using a destination mime type:

GET http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@convert?type=application%2Fpdf

To convert using a format (destination extension):

GET http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@convert?format=pdf
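
When the request URL is built in code, the mime type has to be URL-encoded, which is why the slash appears as %2F in the type example above. A minimal sketch, assuming the server base URL and document path are supplied by the caller; the ConvertUrlSketch class is hypothetical, and the document path is used as-is (a path containing spaces or special characters would need its own encoding).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Nuxeo client library.
final class ConvertUrlSketch {

    // Builds a synchronous @convert URL for a destination mime type.
    // baseUrl is expected to end with the web context, e.g. ".../nuxeo";
    // docPath is expected to start with a slash.
    static String byMimeType(String baseUrl, String docPath, String mimeType) {
        return baseUrl + "/api/v1/path" + docPath + "/@convert?type="
                + URLEncoder.encode(mimeType, StandardCharsets.UTF_8);
    }
}
```

Calling byMimeType with "application/pdf" produces a query string ending in type=application%2Fpdf.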

All these conversions can also be performed with a POST request on the @convert adapter (with the async parameter set to false).

Asynchronous Conversions

To schedule an asynchronous conversion, send a POST request to the @convert adapter with the form parameter async set to true. Otherwise, the conversion is done synchronously.

POST http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@convert
POST http://NUXEO_SERVER/nuxeo/api/v1/path/{docPath}/@blob/file:content/@convert

Parameters

Name       Type     Description
converter  string   The converter name, such as "any2pdf".
type       string   The destination mime type, such as "application/pdf".
format     string   The destination format, such as "pdf".
async      boolean  true to schedule an asynchronous conversion, false otherwise. Defaults to false.

Note that at least one of the parameters converter, type or format must be set.

This POST returns an HTTP code 202 with the following data:

{
  "entity-type": "conversionScheduled",
  "conversionId": "id",
  "pollingURL": "http://localhost:8080/nuxeo/api/v1/conversion/id/poll"
}

The pollingURL is used to get the status of a scheduled conversion. It is part of the new conversion endpoint.

GET http://NUXEO_SERVER/nuxeo/api/v1/conversion/id/poll

For a conversion that is not yet completed, it returns an HTTP code 200 with the following data:

{
  "entity-type": "conversionStatus",
  "conversionId": "id",
  "status": "running" // scheduled, completed
}

For a completed conversion, it returns an HTTP code 303 with the result URL in the Location header.

To retrieve a conversion result:

GET http://NUXEO_SERVER/nuxeo/api/v1/conversion/id/result

Returns the result of the conversion, or HTTP code 404 if there is no conversion matching the id or if there is no result yet (conversion not completed).
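
On the client side, the polling step comes down to telling a 200 response (still pending) from a 303 response (result available). The sketch below uses the JDK's java.net.http client and is an illustration only: the PollSketch class is hypothetical, authentication is left out, and any status other than 200 or 303 is treated as an error by assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical helper, not part of any Nuxeo client library.
final class PollSketch {

    // Interprets a poll response: empty while the conversion is pending (200),
    // the result URL once it is completed (303 with a Location header).
    static Optional<String> resultUrl(int statusCode, String location) {
        if (statusCode == 200) {
            return Optional.empty();
        }
        if (statusCode == 303 && location != null) {
            return Optional.of(location);
        }
        throw new IllegalStateException("Unexpected poll response: HTTP " + statusCode);
    }

    // Sends one GET to the polling URL and interprets the answer.
    // Authentication headers are omitted in this sketch.
    static Optional<String> poll(HttpClient client, String pollingUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(pollingUrl)).GET().build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return resultUrl(response.statusCode(),
                response.headers().firstValue("Location").orElse(null));
    }
}
```

A client created with HttpClient.newHttpClient() does not follow redirects by default, so the 303 stays visible to the caller instead of being followed silently.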

Configuration and Contributions

Configuration

The Conversion Service supports a global configuration via an XML file in order to configure caching.

<component name="org.nuxeo.ecm.core.convert.config">
  <extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl" point="configuration">
    <configuration>
      <!-- define directory location for caching : default to java default tmp dir (java.io.tmpdir) -->
      <cachingDirectory>/var/ConversionCache</cachingDirectory>
      <!-- GC interval in minutes (default = 10 minutes ) -->
      <gcInterval>10</gcInterval>
      <!-- maximum size for disk cache in KB (default to 10*1024) -->
      <diskCacheSize>1024</diskCacheSize>
      <!-- Enables or disables caching (default = true)-->
      <enableCache>true</enableCache>
    </configuration>
  </extension>
</component>

Contributions

Simple Converter

To contribute a new converter, you have to contribute a class that implements the org.nuxeo.ecm.core.convert.extension.Converter interface. This class will be associated with:

  • A converter name
  • A list of source mime-types
  • One destination mime-type
  • Optional named parameters

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl" point="converter">
  <converter name="html2text" class="org.nuxeo.ecm.core.convert.plugins.text.extractors.Html2TextConverter">
    <sourceMimeType>text/html</sourceMimeType>
    <sourceMimeType>text/xhtml</sourceMimeType>
    <destinationMimeType>text/plain</destinationMimeType>
    <parameters>
      <parameter name="myParam">myValue</parameter>
    </parameters>
  </converter>
</extension>

See list of built-in contributions.

Chained Converters

You can also contribute a converter that is a chain of existing converters. To do this, the contributed converter does not have to define an implementation class, just a chain of either converters or mime types. If mime types are used, the conversion service automatically infers the converter chain from the mime-type steps.

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl" point="converter">
  <!-- explicit chain of 2 converters : converter1 + converter2 -->
  <converter name="chainedConverter" >
    <sourceMimeType>some/mimetype</sourceMimeType>
    <destinationMimeType>some/other-mimetype</destinationMimeType>
    <conversionSteps>
      <subconverter>converter1</subconverter>
      <subconverter>converter2</subconverter>
    </conversionSteps>
  </converter>
  <!-- define chain via mime types : foo/bar1 => foo/bar2 => foo/bar3 -->
  <converter name="chainedMimeType" >
    <sourceMimeType>foo/bar1</sourceMimeType>
    <destinationMimeType>foo/bar3</destinationMimeType>
    <conversionSteps>
      <step>foo/bar2</step>
    </conversionSteps>
  </converter>
</extension>

When using chained converters, the additional optional parameters are passed to each underlying converter.
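
The way parameters flow through a chain can be sketched as follows. This is a simplified illustration, not the platform's implementation: the ChainedConverterSketch class and its Step interface are hypothetical, and plain strings stand in for blobs.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a chained converter; none of these types are Nuxeo classes.
final class ChainedConverterSketch {

    // One conversion step: receives the previous output and the shared parameters.
    interface Step {
        String convert(String input, Map<String, String> params);
    }

    // Runs the steps in order, handing the same parameter map to every step.
    static String convert(List<Step> steps, String input, Map<String, String> params) {
        String current = input;
        for (Step step : steps) {
            current = step.convert(current, params);
        }
        return current;
    }
}
```

Because every step sees the same map, parameter names should be chosen so that they do not mean different things to different converters in the chain.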

Converters based on external tools (such as a command line or an OpenOffice server) can implement the ExternalConverter interface. This interface adds an isConverterAvailable() method that is called to check converter availability.

Converters Based on External Command Tools

Many conversion tools come as command-line executables, so it is often useful to wrap these command lines in a converter.

For that purpose, we provide a base class for converters that are based on a command line wrapped by the Nuxeo command-line service.

The base class org.nuxeo.ecm.platform.convert.plugins.CommandLineBasedConverter handles all the dirty work, and you only have to override the methods to define the parameters of the command line and the parsing of the output.

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl" point="converter">
  <!-- converter based on the pdftohtml command line -->
  <converter name="pdf2html" class="org.nuxeo.ecm.platform.convert.plugins.PDF2HtmlConverter">
    <sourceMimeType>application/pdf</sourceMimeType>
    <destinationMimeType>text/html</destinationMimeType>
    <parameters>
      <parameter name="CommandLineName">pdftohtml</parameter>
    </parameters>
  </converter>
</extension>
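
The two responsibilities left to a command-line based converter, defining the command parameters and parsing the output, can be illustrated with plain JDK classes. The sketch below is a stand-in only: CommandLineSketch is hypothetical, it does not extend CommandLineBasedConverter, and it does not go through the Nuxeo command-line service. The argument order (input file, then output file) is an assumption to check against the documentation of the wrapped tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not a Nuxeo class.
final class CommandLineSketch {

    // Step 1: define the parameters of the command line.
    static List<String> command(String executable, String inputFile, String outputFile) {
        return List.of(executable, inputFile, outputFile);
    }

    // Step 2: run the command and collect its output, one entry per line.
    // Standard error is merged into standard output so nothing is lost.
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> output = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ": " + output);
        }
        return output;
    }
}
```

Reading the output before calling waitFor() avoids a deadlock when the tool writes more than the pipe buffer can hold.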