Nuxeo Server

Binary Metadata

Updated: October 11, 2017

The Nuxeo Platform can extract information from the uploaded files attached to a document and automatically fill in the document metadata at creation time. This lets you leverage metadata that exists outside the Nuxeo Platform to categorize documents automatically, so that users do not have to edit the document to re-enter this metadata. Automated metadata extraction is activated by default on Nuxeo DAM: the IPTC legend, copyright and source are used to automatically fill in the description, rights and source metadata of pictures.

How It Works

A Nuxeo listener watches for document creation and modification and triggers metadata mapping under the following conditions:

  • On document creation, if the attached binary is not empty, the listener reads the metadata and updates the document.
  • On document modification:
    • If the attached binary is dirty and the document metadata are not dirty, the listener reads the metadata from the attached binary and updates the document.
    • If the attached binary is dirty and the document metadata are dirty, the listener writes the metadata from the document to the attached binary.
    • If the attached binary is not dirty and the document metadata are dirty, the listener writes the metadata from the document to the attached binary.
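
The rules above amount to a small decision table. As a minimal sketch (the class, enum and method names are hypothetical, and this is not the listener's actual code), they can be written as a pure function of the two dirty flags. The rules do not say what happens when neither the binary nor the metadata is dirty; the sketch assumes nothing is done in that case.

```java
/** Illustrative only: hypothetical names, not part of the Nuxeo API. */
class SyncDecision {

    enum Direction { BLOB_TO_DOCUMENT, DOCUMENT_TO_BLOB, NONE }

    /** On creation, metadata is read from the binary unless the binary is empty. */
    static Direction onCreation(boolean blobEmpty) {
        return blobEmpty ? Direction.NONE : Direction.BLOB_TO_DOCUMENT;
    }

    /** On modification, dirty document metadata always wins over a dirty binary. */
    static Direction onModification(boolean blobDirty, boolean metadataDirty) {
        if (metadataDirty) {
            // Covers both "binary dirty" and "binary not dirty".
            return Direction.DOCUMENT_TO_BLOB;
        }
        if (blobDirty) {
            return Direction.BLOB_TO_DOCUMENT;
        }
        // Assumption: nothing changed, so nothing to synchronize.
        return Direction.NONE;
    }

    public static void main(String[] args) {
        System.out.println(onCreation(false));            // BLOB_TO_DOCUMENT
        System.out.println(onModification(true, false));  // BLOB_TO_DOCUMENT
        System.out.println(onModification(true, true));   // DOCUMENT_TO_BLOB
        System.out.println(onModification(false, true));  // DOCUMENT_TO_BLOB
    }
}
```

Written this way, the binary is only read on modification when the user has not touched the document metadata in the same change.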

You can contribute your own metadata mapping and choose to have it applied with the same rules, through Nuxeo Automation operations, or both.

By default, the Nuxeo Platform uses ExifTool, which supports many different metadata formats including EXIF, GPS, IPTC and XMP. You can refer to its documentation for further details and a complete list of formats. Other processors can be added if needed.

Contributing Metadata Mappings

Metadata mapping is made through an XML contribution on the metadataMappings extension point:

  <!-- Map binary metadata to Nuxeo document metadata -->
  <extension target="org.nuxeo.binary.metadata"
             point="metadataMappings">
    <!-- Define "processor" to use and specify the attached binary's xpath ("blobXPath") -->
    <!-- Technical "id" should be unique  -->
    <!-- "ignorePrefix" is by default set to true. Here metadata have prefixes, so set it to false. -->
    <metadataMapping id="Example" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
      <!-- "name" = binary metadata  , "xpath" = document metadata -->
      <!-- See PDF metadata extraction example in this page -->
      <metadata name="PDF:Producer" xpath="dc:title"/>
      <metadata name="PDF:Author" xpath="dc:description"/>
    </metadataMapping>
  </extension>

Contributing Metadata Rules

This part is only needed if you plan to use your metadata mapping with the standard listener.

Metadata rules are defined through an XML contribution on the metadataRules extension point:

  <!-- Define which mappings will be called by the listener, and under which conditions -->
  <extension target="org.nuxeo.binary.metadata"
             point="metadataRules">
    <!-- "order" = priority , "async" = listener mode (set "true" to apply mapping as background work) -->
    <!-- Technical "id" should be unique  -->
    <rule id="default" order="0" enabled="true" async="false">
      <metadataMappings>
        <metadataMapping-id>Example</metadataMapping-id>
        <metadataMapping-id>...</metadataMapping-id>
      </metadataMappings>
      <!-- see the link below for filter contributions -->
      <filters>
        <filter-id>hasFileType</filter-id>
        <filter-id>...</filter-id>
      </filters>
    </rule>
  </extension>

  <extension target="org.nuxeo.ecm.platform.actions.ActionService"
             point="filters">
    <filter id="hasFileType">
      <rule grant="true">
        <type>File</type>
      </rule>
    </filter>
  </extension>

For details on filters, refer to the filters contribution documentation.

Default Operations

  • Document.SetMetadataFromBlob: Writes metadata to a Document from a binary, according to a contributed metadata mapping.
  • Blob.SetMetadataFromDocument: Writes metadata to a Blob (xpath parameter, or BlobHolder if empty) from a document (input), given a custom metadata mapping defined in a Properties parameter (xpath=metadataName) and using a named processor (exifTool for instance).
  • Blob.SetMetadataFromContext: Writes the given metadata from the Context to a Blob using a named processor (exifTool for instance), and returns the updated Blob.
  • Context.SetMetadataFromBlob: Reads metadata from a Blob (input) using a named processor (exifTool for instance) and puts the result (a Map) in the Context. The metadata to read is listed in a StringList parameter (metadataName1, metadataName2, ...); this parameter is optional, and when it is omitted all the metadata returned by the processor is read.
  • Blob.ReadMetadata: Returns a Map of all the binary properties of the input Blob.

Contributing a New Processor

The default binary metadata processor contributed by the Nuxeo Platform is ExifTool:

  <extension target="org.nuxeo.binary.metadata"
             point="metadataProcessors">
    <processor id="exifTool"
               class="org.nuxeo.binary.metadata.internals.ExifToolProcessor"/>
  </extension>

If you need to add a new processor:

  1. Declare a new contribution with a specific id and class.

    <extension target="org.nuxeo.binary.metadata"
               point="metadataProcessors">
      <processor id="myProcessor"
                 class="org.mycompany.my.MyProcessorClazz"/>
    </extension>
    
  2. Implement the org.nuxeo.binary.metadata.api.BinaryMetadataProcessor interface and its methods:

    /**
     * Write given metadata into given blob. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to write.
     * @param metadata Metadata to inject.
     * @param ignorePrefix
     * @return the updated blob, or {@code null} if there was an error (since 7.4)
     */
    public Blob writeMetadata(Blob blob, Map<String, Object> metadata, boolean ignorePrefix);

    /**
     * Read from a given blob given metadata map. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to read.
     * @param metadata Metadata to extract.
     * @param ignorePrefix
     * @return Metadata map.
     */
    public Map<String, Object> readMetadata(Blob blob, List<String> metadata, boolean ignorePrefix);

    /**
     * Read all metadata from a given blob. Since 7.3 ignorePrefix is added.
     *
     * @param blob Blob to read.
     * @param ignorePrefix
     * @return Metadata map.
     */
    public Map<String, Object> readMetadata(Blob blob, boolean ignorePrefix);
    

    For an example, see the ExifTool implementation, org.nuxeo.binary.metadata.internals.ExifToolProcessor, and the command line documentation on executing third-party command lines from the Nuxeo Platform.

ExifTool Extraction Example

Metadata extraction example from a PDF file using ExifTool:

> exiftool -G -json hello.pdf
[{
  "SourceFile": "hello.pdf",
  "ExifTool:ExifToolVersion": 9.76,
  "File:FileName": "hello.pdf",
  "File:Directory": ".",
  "File:FileSize": "30 kB",
  "File:FileModifyDate": "2015:01:05 14:57:19+01:00",
  "File:FileAccessDate": "2015:01:05 17:02:43+01:00",
  "File:FileInodeChangeDate": "2015:01:05 14:57:19+01:00",
  "File:FilePermissions": "rwxr-xr-x",
  "File:FileType": "PDF",
  "File:MIMEType": "application/pdf",
  "PDF:PDFVersion": 1.4,
  "PDF:Linearized": "No",
  "PDF:PageCount": 1,
  "PDF:Language": "en-US",
  "PDF:Author": "John Doe",
  "PDF:Creator": "Writer",
  "PDF:Producer": "OpenOffice.org 3.2",
  "PDF:CreateDate": "2010:10:26 15:48:33+02:00"
}]
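
To relate this output to the "Example" mapping shown under Contributing Metadata Mappings, here is a minimal sketch of what applying a mapping means: each binary metadata name is looked up in the extracted map and, when present, its value is stored under the mapped document xpath. The class and method names are hypothetical, and this is not the platform's implementation; only the metadata names, xpaths and values are taken from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of the Nuxeo API. */
class MappingSketch {

    /**
     * Returns document xpath -> value for every mapped metadata name
     * that is present in the extracted metadata.
     */
    static Map<String, Object> applyMapping(Map<String, String> nameToXPath,
                                            Map<String, Object> extracted) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : nameToXPath.entrySet()) {
            Object value = extracted.get(entry.getKey());
            if (value != null) {
                properties.put(entry.getValue(), value);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // A few entries copied from the ExifTool output above.
        Map<String, Object> extracted = new LinkedHashMap<>();
        extracted.put("PDF:PageCount", 1);
        extracted.put("PDF:Author", "John Doe");
        extracted.put("PDF:Producer", "OpenOffice.org 3.2");

        // The "Example" mapping: binary metadata name -> document xpath.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("PDF:Producer", "dc:title");
        mapping.put("PDF:Author", "dc:description");

        // Prints {dc:title=OpenOffice.org 3.2, dc:description=John Doe}
        System.out.println(applyMapping(mapping, extracted));
    }
}
```

The keys in the extracted map carry their group prefix (PDF:), which is why the "Example" mapping sets ignorePrefix to false.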

Metrics

Since 7.2, metrics are available on the Binary Metadata services to monitor the performance of the default and custom processors.

To activate them, set the following variable in nuxeo.conf:

binary.metadata.monitor.enable=true

Alternatively, set the log4j level to TRACE for org.nuxeo.binary.metadata.internals.BinaryMetadataComponent.

Execution time information is then available through JMX, under org.nuxeo.StopWatch.

Default Contribution

  • The IPTC schema has been removed from the Picture document type.
  • Only IPTC:Source, IPTC:CopyrightNotice and IPTC:Description are stored, respectively, into dc:source, dc:rights and dc:description.
  • The summary_picture_iptc widget has been removed from the document summary.
  • The Mistral engine has been removed from the metadata extraction of the Nuxeo Platform.
  • The EXIF mapping remains identical.

Here is the default metadata mapping contribution in the Nuxeo Platform:

<extension target="org.nuxeo.binary.metadata"
 point="metadataMappings">
  <metadataMapping id="EXIF" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
    <metadata name="EXIF:ImageDescription" xpath="imd:image_description"/>
    <metadata name="EXIF:UserComment" xpath="imd:user_comment"/>
    <metadata name="EXIF:Equipment" xpath="imd:equipment"/>
    <metadata name="EXIF:DateTimeOriginal" xpath="imd:date_time_original"/>
    <metadata name="EXIF:XResolution" xpath="imd:xresolution"/>
    <metadata name="EXIF:YResolution" xpath="imd:yresolution"/>
    <metadata name="EXIF:PixelXDimension" xpath="imd:pixel_xdimension"/>
    <metadata name="EXIF:PixelYDimension" xpath="imd:pixel_ydimension"/>
    <metadata name="EXIF:Copyright" xpath="imd:copyright"/>
    <metadata name="EXIF:ExposureTime" xpath="imd:exposure_time"/>
    <metadata name="EXIF:ISO" xpath="imd:iso_speed_ratings"/>
    <metadata name="EXIF:FocalLength" xpath="imd:focalLength"/>
    <metadata name="EXIF:ColorSpace" xpath="imd:color_space"/>
    <metadata name="EXIF:WhiteBalance" xpath="imd:white_balance"/>
    <metadata name="EXIF:IccProfile" xpath="imd:icc_profile"/>
    <metadata name="EXIF:Orientation" xpath="imd:orientation"/>
    <metadata name="EXIF:FNumber" xpath="imd:fnumber"/>
  </metadataMapping>
  <metadataMapping id="IPTC" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
    <metadata name="IPTC:Source" xpath="dc:source"/>
    <metadata name="IPTC:CopyrightNotice" xpath="dc:rights"/>
    <metadata name="IPTC:Description" xpath="dc:description"/>
  </metadataMapping>
</extension>
<extension target="org.nuxeo.binary.metadata"
 point="metadataRules">
  <rule id="iptc" order="0" enabled="true" async="false">
    <metadataMappings>
      <metadataMapping-id>EXIF</metadataMapping-id>
      <metadataMapping-id>IPTC</metadataMapping-id>
    </metadataMappings>
    <filters>
      <filter-id>hasPictureType</filter-id>
    </filters>
  </rule>
</extension>
<extension target="org.nuxeo.ecm.platform.actions.ActionService"
 point="filters">
  <filter id="hasPictureType">
    <rule grant="true">
      <type>Picture</type>
    </rule>
  </filter>
</extension>

History: Created by Vladimir Pasquier