
Binary Metadata

Updated: March 18, 2024

How It Works

A Nuxeo listener watches for document creation and modification and triggers metadata mapping under the following conditions:

  • On document creation, if the attached binary is not empty, the listener reads the metadata from the binary and updates the document.
  • On document modification:
    • If the attached binary is dirty and the document metadata is not dirty, the listener reads the metadata from the attached binary and writes it to the document.
    • If the attached binary is dirty and the document metadata is dirty, the listener writes the metadata from the document to the attached binary.
    • If the attached binary is not dirty and the document metadata is dirty, the listener writes the metadata from the document to the attached binary.
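
The rules above can be restated as a small decision function. The following is a minimal sketch, not the actual Nuxeo listener code: the class name, the enum, and the method names are hypothetical, and the NONE outcomes (empty binary on creation, nothing dirty on modification) are assumptions, since the rules above do not describe those cases.

```java
// Illustrative sketch only: this is NOT the Nuxeo listener source code.
// It restates the documented rules as pure functions over boolean flags.
final class MetadataSyncSketch {

    /** Hypothetical names for the possible outcomes. */
    enum Action { READ_FROM_BLOB, WRITE_TO_BLOB, NONE }

    /** Document creation: read from the binary only if one is attached. */
    static Action onCreation(boolean blobPresent) {
        // Assumption: an empty binary means there is nothing to read.
        return blobPresent ? Action.READ_FROM_BLOB : Action.NONE;
    }

    /** Document modification: decide from the two dirty flags. */
    static Action onModification(boolean blobDirty, boolean metadataDirty) {
        if (metadataDirty) {
            // Edited document metadata is written to the binary,
            // whether or not the binary itself changed.
            return Action.WRITE_TO_BLOB;
        }
        if (blobDirty) {
            // Only the binary changed: refresh the document from it.
            return Action.READ_FROM_BLOB;
        }
        // Assumption: nothing relevant changed, so nothing is done.
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println("creation, binary attached     -> " + onCreation(true));
        System.out.println("binary dirty, metadata clean  -> " + onModification(true, false));
        System.out.println("binary dirty, metadata dirty  -> " + onModification(true, true));
        System.out.println("binary clean, metadata dirty  -> " + onModification(false, true));
    }
}
```

Note that the two "metadata dirty" rules collapse into a single branch: as soon as the document metadata has been edited, the direction is always document to binary.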

You can contribute your own metadata mapping and choose to have it applied by the same rules, through Nuxeo Automation operations, or both.

By default, the Nuxeo Platform uses ExifTool, which supports many metadata formats, including EXIF, GPS, IPTC, and XMP. Refer to the ExifTool documentation for further details and a complete list of formats. Other processors can be added if needed.

Contributing Metadata Mappings

A metadata mapping is defined through an XML contribution on the metadataMappings extension point:

  <!-- Map binary metadata to Nuxeo document metadata -->
  <extension target="org.nuxeo.binary.metadata"
             point="metadataMappings">
    <!-- Define the "processor" to use and specify the attached binary's xpath ("blobXPath") -->
    <!-- The technical "id" should be unique -->
    <!-- "ignorePrefix" is set to true by default. Here the metadata names have prefixes, so it is set to false. -->
    <metadataMapping id="Example" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
      <!-- "name" = binary metadata, "xpath" = document metadata -->
      <!-- See the PDF metadata extraction example on this page -->
      <metadata name="PDF:Producer" xpath="dc:title"/>
      <metadata name="PDF:Author" xpath="dc:description"/>
    </metadataMapping>
  </extension>

Contributing Metadata Rules

This part is only needed if you plan to use your metadata mapping with the standard listener.

Metadata rules are defined through an XML contribution on the metadataRules extension point:

  <!-- Define which mappings will be called by the listener, and under which conditions -->
  <extension target="org.nuxeo.binary.metadata"
             point="metadataRules">
    <!-- "order" = priority, "async" = listener mode (set "true" to apply mapping as background work) -->
    <!-- Technical "id" should be unique -->
    <rule id="default" order="0" enabled="true" async="false">
      <metadataMappings>
        <metadataMapping-id>Example</metadataMapping-id>
        <metadataMapping-id>...</metadataMapping-id>
      </metadataMappings>
      <!-- see the link below for filter contributions -->
      <filters>
        <filter-id>hasFileType</filter-id>
        <filter-id>...</filter-id>
      </filters>
    </rule>
  </extension>

  <extension target="org.nuxeo.ecm.platform.actions.ActionService"
             point="filters">
    <filter id="hasFileType">
      <rule grant="true">
        <type>File</type>
      </rule>
    </filter>
  </extension>

For details, see the filters contribution documentation.

Default Operations

  • Document.SetMetadataFromBlob: Writes metadata to a document from a binary, according to a contributed metadata mapping.
  • Blob.SetMetadataFromDocument: Writes metadata to a blob (designated by the xpath parameter, or taken from the BlobHolder if the parameter is empty) from a document (the input), given a custom metadata mapping defined in a Properties parameter (xpath=metadataName) and using a named processor (exifTool, for instance).
  • Blob.SetMetadataFromContext: Writes the given metadata from the Context to a blob using a named processor (exifTool, for instance), and returns the updated blob.
  • Context.SetMetadataFromBlob: Reads metadata from a blob (the input) using a named processor (exifTool, for instance) and puts the result (a Map) in the Context. The metadata to read is listed in an optional StringList parameter (metadataName1, metadataName2, ...); when it is omitted, all the metadata returned by ExifTool is read.
  • Blob.ReadMetadata: Returns a Map of all the binary properties of the input blob.

Contributing a New Processor

The default binary metadata processor contributed by Nuxeo is ExifTool:

  <extension target="org.nuxeo.binary.metadata"
             point="metadataProcessors">
    <processor id="exifTool"
               class="org.nuxeo.binary.metadata.internals.ExifToolProcessor"/>
  </extension>

If you need to add a new processor:

  1. Declare a new contribution with its own id and class.

    <extension target="org.nuxeo.binary.metadata"
               point="metadataProcessors">
      <processor id="myProcessor"
                 class="org.mycompany.my.MyProcessorClazz"/>
    </extension>

  2. Implement the org.nuxeo.binary.metadata.api.BinaryMetadataProcessor interface, which declares the following methods:

    /**
     * Write given metadata into given blob. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to write.
     * @param metadata Metadata to inject.
     * @param ignorePrefix
     * @return the updated blob, or {@code null} if there was an error (since 7.4)
     */
    public Blob writeMetadata(Blob blob, Map<String, Object> metadata, boolean ignorePrefix);

    /**
     * Read from a given blob given metadata map. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to read.
     * @param metadata Metadata to extract.
     * @param ignorePrefix
     * @return Metadata map.
     */
    public Map<String, Object> readMetadata(Blob blob, List<String> metadata, boolean ignorePrefix);

    /**
     * Read all metadata from a given blob. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to read.
     * @param ignorePrefix
     * @return Metadata map.
     */
    public Map<String, Object> readMetadata(Blob blob, boolean ignorePrefix);
    

    For an example, see org.nuxeo.binary.metadata.internals.ExifToolProcessor, and refer to the command line documentation to execute third-party command lines from the Nuxeo Platform.

ExifTool Extraction Example

Metadata extraction example from a PDF file using ExifTool:

> exiftool -G -json hello.pdf
[{
  "SourceFile": "hello.pdf",
  "ExifTool:ExifToolVersion": 9.76,
  "File:FileName": "hello.pdf",
  "File:Directory": ".",
  "File:FileSize": "30 kB",
  "File:FileModifyDate": "2015:01:05 14:57:19+01:00",
  "File:FileAccessDate": "2015:01:05 17:02:43+01:00",
  "File:FileInodeChangeDate": "2015:01:05 14:57:19+01:00",
  "File:FilePermissions": "rwxr-xr-x",
  "File:FileType": "PDF",
  "File:MIMEType": "application/pdf",
  "PDF:PDFVersion": 1.4,
  "PDF:Linearized": "No",
  "PDF:PageCount": 1,
  "PDF:Language": "en-US",
  "PDF:Author": "John Doe",
  "PDF:Creator": "Writer",
  "PDF:Producer": "OpenOffice.org 3.2",
  "PDF:CreateDate": "2010:10:26 15:48:33+02:00"
}]
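
To make the link between this output and a metadata mapping concrete, here is a minimal sketch of how the "Example" mapping (PDF:Producer to dc:title, PDF:Author to dc:description) could be applied to such an extracted map. It uses plain Java collections rather than Nuxeo APIs: the MappingSketch class and its apply method are hypothetical, and skipping names that are absent from the extracted metadata is an assumption, not documented Nuxeo behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: plain collections stand in for the Nuxeo
// document and for the processor result.
final class MappingSketch {

    /**
     * Copies each mapped binary metadata value to its document xpath.
     *
     * @param mapping   binary metadata name -> document xpath
     * @param extracted metadata read from the binary
     * @return document xpath -> value to set
     */
    static Map<String, Object> apply(Map<String, String> mapping, Map<String, Object> extracted) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            Object value = extracted.get(entry.getKey());
            // Assumption: a name missing from the binary leaves the property untouched.
            if (value != null) {
                properties.put(entry.getValue(), value);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Same pairs as the "Example" metadata mapping contribution.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("PDF:Producer", "dc:title");
        mapping.put("PDF:Author", "dc:description");

        // A few values taken from the ExifTool output above.
        Map<String, Object> extracted = new LinkedHashMap<>();
        extracted.put("PDF:Author", "John Doe");
        extracted.put("PDF:Producer", "OpenOffice.org 3.2");
        extracted.put("PDF:PageCount", 1);

        System.out.println(apply(mapping, extracted));
    }
}
```

Metadata that is extracted but not listed in the mapping, such as PDF:PageCount here, is simply ignored. The keys carry their group prefix (PDF:) because the command was run with -G, which is why the mapping names are prefixed too.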

Metrics

Since 7.2, metrics have been added to the Binary Metadata services to monitor the performance of default and custom processors within Nuxeo.

To activate them, set the following variable in nuxeo.conf:

binary.metadata.monitor.enable=true

Alternatively, set the log4j level to TRACE for org.nuxeo.binary.metadata.internals.BinaryMetadataComponent.

This feature exposes execution time information through JMX: org.nuxeo.StopWatch.

Default Contribution

  • The IPTC schema has been removed from the Picture document type.
  • Only IPTC:Source, IPTC:CopyrightNotice, and IPTC:Description are stored, respectively into dc:source, dc:rights, and dc:description.
  • The summary_picture_iptc widget has been removed from the document summary.
  • The Mistral engine has been removed from the metadata extraction of the Nuxeo Platform.
  • The EXIF mapping remains identical.

Here is the default metadata mapping contribution in the Nuxeo Platform:

<extension target="org.nuxeo.binary.metadata"
 point="metadataMappings">
  <metadataMapping id="EXIF" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
    <metadata name="EXIF:ImageDescription" xpath="imd:image_description"/>
    <metadata name="EXIF:UserComment" xpath="imd:user_comment"/>
    <metadata name="EXIF:Equipment" xpath="imd:equipment"/>
    <metadata name="EXIF:DateTimeOriginal" xpath="imd:date_time_original"/>
    <metadata name="EXIF:XResolution" xpath="imd:xresolution"/>
    <metadata name="EXIF:YResolution" xpath="imd:yresolution"/>
    <metadata name="EXIF:PixelXDimension" xpath="imd:pixel_xdimension"/>
    <metadata name="EXIF:PixelYDimension" xpath="imd:pixel_ydimension"/>
    <metadata name="EXIF:Copyright" xpath="imd:copyright"/>
    <metadata name="EXIF:ExposureTime" xpath="imd:exposure_time"/>
    <metadata name="EXIF:ISO" xpath="imd:iso_speed_ratings"/>
    <metadata name="EXIF:FocalLength" xpath="imd:focalLength"/>
    <metadata name="EXIF:ColorSpace" xpath="imd:color_space"/>
    <metadata name="EXIF:WhiteBalance" xpath="imd:white_balance"/>
    <metadata name="EXIF:IccProfile" xpath="imd:icc_profile"/>
    <metadata name="EXIF:Orientation" xpath="imd:orientation"/>
    <metadata name="EXIF:FNumber" xpath="imd:fnumber"/>
  </metadataMapping>
  <metadataMapping id="IPTC" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
    <metadata name="IPTC:Source" xpath="dc:source"/>
    <metadata name="IPTC:CopyrightNotice" xpath="dc:rights"/>
    <metadata name="IPTC:Description" xpath="dc:description"/>
  </metadataMapping>
</extension>
<extension target="org.nuxeo.binary.metadata"
 point="metadataRules">
  <rule id="iptc" order="0" enabled="true" async="false">
    <metadataMappings>
      <metadataMapping-id>EXIF</metadataMapping-id>
      <metadataMapping-id>IPTC</metadataMapping-id>
    </metadataMappings>
    <filters>
      <filter-id>hasPictureType</filter-id>
    </filters>
  </rule>
</extension>
<extension target="org.nuxeo.ecm.platform.actions.ActionService"
 point="filters">
  <filter id="hasPictureType">
    <rule grant="true">
      <type>Picture</type>
    </rule>
  </filter>
</extension>