How It Works
A Nuxeo listener watches for document creation and modification, and triggers metadata mapping under the following conditions:
- On document creation, if the attached binary is not empty, the listener reads the metadata from the binary and updates the document.
- On document modification:
  - If the attached binary is dirty and the document metadata are not dirty, the listener reads the metadata from the attached binary into the document.
  - If the attached binary is dirty and the document metadata are dirty, the listener writes the metadata from the document to the attached binary.
  - If the attached binary is not dirty and the document metadata are dirty, the listener writes the metadata from the document to the attached binary.
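The rules above can be summarized as a small decision function. The following is a minimal Java sketch under stated assumptions: the class, enums and method names are hypothetical and not part of the Nuxeo API, and the case where neither the binary nor the metadata are dirty is assumed to trigger nothing, since the rules above do not cover it.

```java
import java.util.Objects;

/** Hypothetical model of the listener rules above; not the Nuxeo listener class. */
final class MetadataSyncDecision {

    /** Direction in which the metadata mapping is applied. */
    enum Action { READ_FROM_BINARY, WRITE_TO_BINARY, NONE }

    /** Document event that fired the listener. */
    enum Event { CREATED, MODIFIED }

    static Action decide(Event event, boolean blobEmpty, boolean blobDirty, boolean metadataDirty) {
        Objects.requireNonNull(event, "event");
        if (event == Event.CREATED) {
            // On creation, a non-empty binary is read into the document.
            return blobEmpty ? Action.NONE : Action.READ_FROM_BINARY;
        }
        if (blobDirty && !metadataDirty) {
            // Only the binary changed: refresh the document from it.
            return Action.READ_FROM_BINARY;
        }
        if (metadataDirty) {
            // The document metadata changed, with or without a new binary:
            // the document values are written to the binary.
            return Action.WRITE_TO_BINARY;
        }
        // Assumption: nothing dirty means there is nothing to synchronize.
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(Event.MODIFIED, false, true, false)); // READ_FROM_BINARY
        System.out.println(decide(Event.MODIFIED, false, true, true));  // WRITE_TO_BINARY
    }
}
```

Written this way, the last two modification rules collapse into one: whenever the document metadata are dirty, the document takes precedence over the binary.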
You can contribute your own metadata mappings and have them applied with the same rules, through Nuxeo Automation operations, or both.
By default, the Nuxeo Platform uses ExifTool, which supports many data formats, including EXIF, GPS, IPTC and XMP. Refer to the ExifTool documentation for further details and a complete list of formats. Other processors can be added if needed.
Contributing Metadata Mappings
Metadata mappings are defined through an XML contribution to the metadataMappings extension point:
<!-- Map binary metadata to Nuxeo document metadata -->
<extension target="org.nuxeo.binary.metadata"
point="metadataMappings">
<!-- Define "processor" to use and specify the attached binary's xpath ("blobXPath") -->
<!-- Technical "id" should be unique -->
<!-- "ignorePrefix" is set to true by default. Here the metadata names keep their group prefix (e.g. "PDF:"), so it is set to false. -->
<metadataMapping id="Example" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
<!-- "name" = binary metadata, "xpath" = document metadata -->
<!-- See the PDF metadata extraction example on this page -->
<metadata name="PDF:Producer" xpath="dc:title"/>
<metadata name="PDF:Author" xpath="dc:description"/>
</metadataMapping>
</extension>
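Conceptually, applying this mapping in the read direction copies each binary metadata value (name) to the matching document property (xpath). The sketch below is a hypothetical illustration in plain Java, not the Nuxeo implementation: it takes the map returned by a processor and the name-to-xpath pairs of the Example mapping, and assumes that metadata absent from the binary are simply skipped. Because ignorePrefix is false, the keys keep their group prefix, which is why the names read "PDF:Producer" rather than "Producer".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of applying a metadataMapping in the read direction. */
final class MappingSketch {

    /** Returns xpath -> value for every mapped name present in the binary metadata. */
    static Map<String, Object> toDocumentProperties(Map<String, String> nameToXpath,
                                                   Map<String, Object> binaryMetadata) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : nameToXpath.entrySet()) {
            Object value = binaryMetadata.get(mapping.getKey());
            if (value != null) { // assumption: missing metadata are skipped
                properties.put(mapping.getValue(), value);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Same pairs as the "Example" mapping above.
        Map<String, String> example = new LinkedHashMap<>();
        example.put("PDF:Producer", "dc:title");
        example.put("PDF:Author", "dc:description");
        // Values taken from the ExifTool output for hello.pdf shown on this page.
        Map<String, Object> read = Map.of(
                "PDF:Producer", "OpenOffice.org 3.2",
                "PDF:Author", "John Doe",
                "PDF:PageCount", 1);
        System.out.println(toDocumentProperties(example, read));
        // {dc:title=OpenOffice.org 3.2, dc:description=John Doe}
    }
}
```

Metadata that the mapping does not mention, such as PDF:PageCount here, never reach the document.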
Contributing Metadata Rules
This part is only needed if you plan to use your metadata mapping with the standard listener.
Metadata rules are defined through an XML contribution to the metadataRules extension point:
<!-- Define which mappings will be called by the listener, and under which conditions -->
<extension target="org.nuxeo.binary.metadata"
point="metadataRules">
<!-- "order" = priority, "async" = listener mode (set to "true" to apply the mapping as background work) -->
<!-- Technical "id" should be unique -->
<rule id="default" order="0" enabled="true" async="false">
<metadataMappings>
<metadataMapping-id>Example</metadataMapping-id>
<metadataMapping-id>...</metadataMapping-id>
</metadataMappings>
<!-- see the link below for filter contributions -->
<filters>
<filter-id>hasFileType</filter-id>
<filter-id>...</filter-id>
</filters>
</rule>
</extension>
<extension target="org.nuxeo.ecm.platform.actions.ActionService"
point="filters">
<filter id="hasFileType">
<rule grant="true">
<type>File</type>
</rule>
</filter>
</extension>
For more about filters, see the filters contribution documentation.
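To make the interplay between rules, filters and mappings concrete, here is a hypothetical Java sketch of how the mappings to run for a document could be collected. It is a simplification, not the Nuxeo listener: filters are reduced to a set of allowed document types (like the <type>File</type> rule above), and it assumes that rules with a lower order run first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of metadataRules evaluation (not the Nuxeo listener). */
final class RuleSelectionSketch {

    /** Stand-in for a rule contribution; its filters are reduced to allowed document types. */
    record Rule(String id, int order, boolean enabled, List<String> mappingIds, Set<String> allowedTypes) {}

    /** Collects the mapping ids of the enabled rules whose filters accept the document type. */
    static List<String> mappingsFor(String docType, List<Rule> rules) {
        List<String> mappingIds = new ArrayList<>();
        rules.stream()
                .filter(Rule::enabled)
                .filter(rule -> rule.allowedTypes().contains(docType))
                .sorted(Comparator.comparingInt(Rule::order)) // assumption: lower order first
                .forEach(rule -> mappingIds.addAll(rule.mappingIds()));
        return mappingIds;
    }

    public static void main(String[] args) {
        Rule defaultRule = new Rule("default", 0, true, List.of("Example"), Set.of("File"));
        System.out.println(mappingsFor("File", List.of(defaultRule)));    // [Example]
        System.out.println(mappingsFor("Picture", List.of(defaultRule))); // []
    }
}
```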
Default Operations
Document.TriggerMetadataMapping
: Triggers a contributed metadataMapping, identified by its name, on a document.
Binary.WriteMetadataFromDocument
: Writes metadata to a blob (the one at the xpath parameter, or the document's BlobHolder if xpath is empty) from a document (input), given a custom metadata mapping defined in a Properties parameter (xpath=metadataName), using a named processor (exifTool, for instance).
Binary.WriteMetadataFromContext
: Writes metadata to a blob (input) from a custom metadata mapping defined in a Properties parameter (metadataName=value), using a named processor (exifTool, for instance).
Context.ReadMetadataFromBinary
: Reads metadata from a blob (input), given a list of metadata names in a StringList parameter (metadataName1, metadataName2, ...), using a named processor (exifTool, for instance), and puts the result (a Map) in the context. The list is optional: if it is omitted, all the metadata returned by ExifTool are read.
Binary.ReadMetadata
: Returns a Map of all the metadata properties of the input binary.
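The Properties parameter of Binary.WriteMetadataFromDocument maps document xpaths to metadata names, one xpath=metadataName pair per line. The hypothetical sketch below shows how such a parameter translates into the metadataName-to-value map a processor writes. The parsing is a simplified stand-in for Nuxeo's own Properties type, and skipping missing or empty document values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the Properties parameter of Binary.WriteMetadataFromDocument. */
final class WriteMappingSketch {

    /** Parses "xpath=metadataName" lines, splitting each line on its first '='. */
    static Map<String, String> parseMapping(String param) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : param.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // blank or malformed line
            }
            mapping.put(line.substring(0, eq).strip(), line.substring(eq + 1).strip());
        }
        return mapping;
    }

    /** Builds metadataName -> value from the document values, skipping missing ones. */
    static Map<String, String> metadataToWrite(Map<String, String> xpathToName,
                                               Map<String, String> documentValues) {
        Map<String, String> metadata = new LinkedHashMap<>();
        xpathToName.forEach((xpath, name) -> {
            String value = documentValues.get(xpath);
            if (value != null && !value.isEmpty()) {
                metadata.put(name, value);
            }
        });
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parseMapping("dc:title=PDF:Producer\ndc:description=PDF:Author");
        System.out.println(metadataToWrite(mapping, Map.of("dc:title", "My title")));
        // {PDF:Producer=My title}
    }
}
```

java.util.Properties.load is deliberately not used here: its format treats ':' as a key/value separator, so a line such as dc:title=PDF:Producer would be split right after dc.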
Contributing a New Processor
The default binary metadata processor contributed by the Nuxeo Platform is ExifTool:
<extension target="org.nuxeo.binary.metadata"
point="metadataProcessors">
<processor id="exifTool"
class="org.nuxeo.binary.metadata.internals.ExifToolProcessor"/>
</extension>
If you need to add a new processor:
- Declare a new contribution with a specific id and class:
<extension target="org.nuxeo.binary.metadata" point="metadataProcessors">
<processor id="myProcessor" class="org.mycompany.my.MyProcessorClazz"/>
</extension>
- Implement org.nuxeo.binary.metadata.api.BinaryMetadataProcessor and its following methods:
/**
 * Write given metadata into given blob. Since 7.3 ignorePrefix is added.
 *
 * @param blob Blob to write.
 * @param metadata Metadata to inject.
 * @param ignorePrefix
 * @return the updated blob, or {@code null} if there was an error (since 7.4)
 */
public Blob writeMetadata(Blob blob, Map<String, String> metadata, boolean ignorePrefix);

/**
 * Read from a given blob given metadata map. Since 7.3 ignorePrefix is added.
 *
 * @param blob Blob to read.
 * @param metadata Metadata to extract.
 * @param ignorePrefix
 * @return Metadata map.
 */
public Map<String, Object> readMetadata(Blob blob, List<String> metadata, boolean ignorePrefix);

/**
 * Read all metadata from a given blob. Since 7.3 ignorePrefix is added.
 *
 * @param blob Blob to read.
 * @param ignorePrefix
 * @return Metadata map.
 */
public Map<String, Object> readMetadata(Blob blob, boolean ignorePrefix);
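The following self-contained Java sketch illustrates the shape of a processor. To stay runnable without Nuxeo on the classpath, it declares a hypothetical SimpleMetadataProcessor interface that mirrors the three methods above with byte[] in place of Blob, and implements it for a toy text format made of Group:Name=value lines. The way ignorePrefix is handled here (stripping the group prefix when reading, adding a default Custom: group when writing) is an assumption chosen for illustration; a real processor would implement BinaryMetadataProcessor and work on Blob objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyValueProcessorSketch {

    /** Hypothetical stand-in for BinaryMetadataProcessor: byte[] replaces Blob. */
    interface SimpleMetadataProcessor {
        byte[] writeMetadata(byte[] blob, Map<String, String> metadata, boolean ignorePrefix);

        Map<String, Object> readMetadata(byte[] blob, List<String> metadata, boolean ignorePrefix);

        Map<String, Object> readMetadata(byte[] blob, boolean ignorePrefix);
    }

    /** Toy processor for text blobs made of "Group:Name=value" lines. */
    static final class KeyValueProcessor implements SimpleMetadataProcessor {

        private static final String DEFAULT_GROUP = "Custom:";

        private static Map<String, String> parse(byte[] blob) {
            Map<String, String> entries = new LinkedHashMap<>();
            for (String line : new String(blob, StandardCharsets.UTF_8).split("\\R")) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    entries.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
            return entries;
        }

        private static String withoutPrefix(String name) {
            int colon = name.indexOf(':');
            return colon < 0 ? name : name.substring(colon + 1);
        }

        @Override
        public byte[] writeMetadata(byte[] blob, Map<String, String> metadata, boolean ignorePrefix) {
            if (blob == null) {
                return null; // same convention as the documented API: null signals an error
            }
            Map<String, String> entries = parse(blob);
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                String name = e.getKey();
                // Assumption: unprefixed names go to a default group when ignorePrefix is true.
                String key = ignorePrefix && name.indexOf(':') < 0 ? DEFAULT_GROUP + name : name;
                entries.put(key, e.getValue());
            }
            StringBuilder out = new StringBuilder();
            entries.forEach((k, v) -> out.append(k).append('=').append(v).append('\n'));
            return out.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Map<String, Object> readMetadata(byte[] blob, List<String> metadata, boolean ignorePrefix) {
            Map<String, Object> all = readMetadata(blob, ignorePrefix);
            Map<String, Object> selected = new LinkedHashMap<>();
            for (String name : metadata) {
                if (all.containsKey(name)) {
                    selected.put(name, all.get(name));
                }
            }
            return selected;
        }

        @Override
        public Map<String, Object> readMetadata(byte[] blob, boolean ignorePrefix) {
            Map<String, Object> result = new LinkedHashMap<>();
            parse(blob).forEach((k, v) -> result.put(ignorePrefix ? withoutPrefix(k) : k, v));
            return result;
        }
    }

    public static void main(String[] args) {
        KeyValueProcessor processor = new KeyValueProcessor();
        byte[] blob = "Custom:Title=Hello\n".getBytes(StandardCharsets.UTF_8);
        byte[] updated = processor.writeMetadata(blob, Map.of("Author", "John Doe"), true);
        System.out.println(processor.readMetadata(updated, false)); // {Custom:Title=Hello, Custom:Author=John Doe}
        System.out.println(processor.readMetadata(updated, true));  // {Title=Hello, Author=John Doe}
    }
}
```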
For an example, see the ExifTool implementation, org.nuxeo.binary.metadata.internals.ExifToolProcessor, and the command line documentation, which explains how to execute third-party command lines from the Nuxeo Platform.
ExifTool Extraction Example
Metadata extraction example from a PDF file using ExifTool:
> exiftool -G -json hello.pdf
[{
"SourceFile": "hello.pdf",
"ExifTool:ExifToolVersion": 9.76,
"File:FileName": "hello.pdf",
"File:Directory": ".",
"File:FileSize": "30 kB",
"File:FileModifyDate": "2015:01:05 14:57:19+01:00",
"File:FileAccessDate": "2015:01:05 17:02:43+01:00",
"File:FileInodeChangeDate": "2015:01:05 14:57:19+01:00",
"File:FilePermissions": "rwxr-xr-x",
"File:FileType": "PDF",
"File:MIMEType": "application/pdf",
"PDF:PDFVersion": 1.4,
"PDF:Linearized": "No",
"PDF:PageCount": 1,
"PDF:Language": "en-US",
"PDF:Author": "John Doe",
"PDF:Creator": "Writer",
"PDF:Producer": "OpenOffice.org 3.2",
"PDF:CreateDate": "2010:10:26 15:48:33+02:00"
}]
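To find the metadata names to use in a mapping, it can help to run the same command outside Nuxeo. The Java sketch below builds and runs it with ProcessBuilder; Nuxeo itself goes through its command line service instead. -G (print each tag with its group name) and -json (JSON output) are standard ExifTool options; dropping -G when ignorePrefix is true is an assumption about how prefixes are removed, not a description of ExifToolProcessor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: run the ExifTool extraction above outside Nuxeo. */
final class ExifToolCommandSketch {

    /** Builds "exiftool [-G] -json <file>"; -G prefixes each tag with its group name. */
    static List<String> readCommand(Path file, boolean ignorePrefix) {
        List<String> command = new ArrayList<>();
        command.add("exiftool");
        if (!ignorePrefix) {
            command.add("-G"); // "PDF:Author" instead of "Author"
        }
        command.add("-json");
        command.add(file.toString());
        return command;
    }

    /** Runs the command and returns its JSON standard output (exiftool must be on the PATH). */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep warnings out of the JSON
                .start();
        String json = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("exiftool exited with code " + exitCode);
        }
        return json;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> command = readCommand(Path.of("hello.pdf"), false);
        System.out.println(String.join(" ", command)); // exiftool -G -json hello.pdf
        if (args.length > 0) {
            // Only actually runs ExifTool when a file is passed on the command line.
            System.out.println(run(readCommand(Path.of(args[0]), false)));
        }
    }
}
```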
Metrics
Since 7.2, the Binary Metadata services expose metrics to monitor the performance of the default and custom processors.
To activate them, either set the following property in nuxeo.conf:
binary.metadata.monitor.enable=true
or set the log4j level to TRACE for org.nuxeo.binary.metadata.internals.BinaryMetadataComponent.
Execution times can then be retrieved through JMX, under org.nuxeo.StopWatch.
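As a starting point for inspecting these metrics, the hypothetical Java sketch below lists the MBeans of the current JVM whose domain matches a pattern and whose name contains a given text, using the standard javax.management API. The exact ObjectName under which Nuxeo registers org.nuxeo.StopWatch is not given here, so the org.nuxeo* pattern and the StopWatch filter are assumptions. Since the platform MBean server is local to a JVM, the code must run inside the Nuxeo JVM or be adapted to a remote JMX connection; a JMX console such as JConsole works as well.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper to look for StopWatch MBeans in the current JVM. */
final class JmxStopWatchSketch {

    /** Returns the canonical names of MBeans in matching domains whose name contains {@code text}. */
    static Set<String> findMBeans(String domainPattern, String text) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern;
        try {
            pattern = new ObjectName(domainPattern + ":*"); // e.g. "org.nuxeo*:*"
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid domain pattern: " + domainPattern, e);
        }
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(pattern, null)) {
            String canonical = name.getCanonicalName();
            if (canonical.contains(text)) {
                names.add(canonical);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Empty outside Nuxeo; inside the Nuxeo JVM it should list the stopwatch MBeans, if any.
        findMBeans("org.nuxeo*", "StopWatch").forEach(System.out::println);
    }
}
```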
Default Contribution
- The IPTC schema has been removed from the Picture document type.
- Only IPTC:Source, IPTC:CopyrightNotice and IPTC:Description are stored, respectively into dc:source, dc:rights and dc:description.
- The summary_picture_iptc widget has been removed from the document summary.
- The Mistral engine has been removed from the metadata extraction of the Nuxeo Platform.
- The EXIF mapping remains identical.
Here are the default metadata mapping, rule and filter contributions of the Nuxeo Platform:
<extension target="org.nuxeo.binary.metadata"
point="metadataMappings">
<metadataMapping id="EXIF" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
<metadata name="EXIF:ImageDescription" xpath="imd:image_description"/>
<metadata name="EXIF:UserComment" xpath="imd:user_comment"/>
<metadata name="EXIF:Equipment" xpath="imd:equipment"/>
<metadata name="EXIF:DateTimeOriginal" xpath="imd:date_time_original"/>
<metadata name="EXIF:XResolution" xpath="imd:xresolution"/>
<metadata name="EXIF:YResolution" xpath="imd:yresolution"/>
<metadata name="EXIF:PixelXDimension" xpath="imd:pixel_xdimension"/>
<metadata name="EXIF:PixelYDimension" xpath="imd:pixel_ydimension"/>
<metadata name="EXIF:Copyright" xpath="imd:copyright"/>
<metadata name="EXIF:ExposureTime" xpath="imd:exposure_time"/>
<metadata name="EXIF:ISO" xpath="imd:iso_speed_ratings"/>
<metadata name="EXIF:FocalLength" xpath="imd:focalLength"/>
<metadata name="EXIF:ColorSpace" xpath="imd:color_space"/>
<metadata name="EXIF:WhiteBalance" xpath="imd:white_balance"/>
<metadata name="EXIF:IccProfile" xpath="imd:icc_profile"/>
<metadata name="EXIF:Orientation" xpath="imd:orientation"/>
<metadata name="EXIF:FNumber" xpath="imd:fnumber"/>
</metadataMapping>
<metadataMapping id="IPTC" processor="exifTool" blobXPath="file:content" ignorePrefix="false">
<metadata name="IPTC:Source" xpath="dc:source"/>
<metadata name="IPTC:CopyrightNotice" xpath="dc:rights"/>
<metadata name="IPTC:Description" xpath="dc:description"/>
</metadataMapping>
</extension>
<extension target="org.nuxeo.binary.metadata"
point="metadataRules">
<rule id="iptc" order="0" enabled="true" async="false">
<metadataMappings>
<metadataMapping-id>EXIF</metadataMapping-id>
<metadataMapping-id>IPTC</metadataMapping-id>
</metadataMappings>
<filters>
<filter-id>hasPictureType</filter-id>
</filters>
</rule>
</extension>
<extension target="org.nuxeo.ecm.platform.actions.ActionService"
point="filters">
<filter id="hasPictureType">
<rule grant="true">
<type>Picture</type>
</rule>
</filter>
</extension>