File Storage

Updated: October 1, 2024

Hyland University
Watch the related courses on Hyland University:
Video on Document Blobs from the Data Persistence course

Files and Blobs

A file is what users commonly handle on their desktop or on any other file system: binary content managed by the file system, with a location (or several locations if fragmented), a path and a name. The Nuxeo Platform has no notion of file system. Content is stored as a binary stream, and the address of that content is stored in the database. The database holds a "Blob", which represents the binary stream together with a set of metadata:

  • The hash of the binary stream
  • The length
  • A name
  • A mime-type.
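To make this metadata concrete, here is a minimal, self-contained Java sketch of such a descriptor. The BlobDescriptor class and its field names are hypothetical and are not part of the Nuxeo API; MD5 is used only because it is the hash this page mentions for the default provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical model of the metadata kept alongside a binary stream (not Nuxeo API).
final class BlobDescriptor {
    final String digest;   // hash of the binary stream, hex-encoded
    final long length;     // size of the stream in bytes
    final String name;     // file name presented to users
    final String mimeType; // for example "application/pdf"

    private BlobDescriptor(String digest, long length, String name, String mimeType) {
        this.digest = digest;
        this.length = length;
        this.name = name;
        this.mimeType = mimeType;
    }

    // Builds the descriptor from in-memory content; the digest is computed, not supplied.
    static BlobDescriptor of(byte[] content, String name, String mimeType) {
        return new BlobDescriptor(md5Hex(content), content.length, name, mimeType);
    }

    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    public static void main(String[] args) {
        BlobDescriptor d = of("hello".getBytes(StandardCharsets.UTF_8), "hello.txt", "text/plain");
        System.out.println(d.digest + " " + d.length + " " + d.name + " " + d.mimeType);
    }
}
```

Note that the name and the mime-type describe the blob but play no part in the digest: two uploads with identical bytes get the same hash whatever they are called.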

At the Core level, blobs are bound to documents through a property of type BlobProperty. A document can therefore store multiple files, either as standalone blob properties or as a list of blob properties. In the schema, such a property must be declared with the nxs:content type:

<xs:element name="test" type="nxs:content"/>

This corresponds to selecting "Blob" as the type in Nuxeo Studio.

You can then set a blob on a given property either with the Document.SetBlob operation or, in the Java API, with the setPropertyValue(String xpath, Serializable value) method of the DocumentModel object. You can also have a look at the BlobAdapter pattern.

Blob Manager and Blob Providers

At a lower level, blobs are managed in the Nuxeo Platform by Blob Providers. Most of the time, blob Java objects implement the ManagedBlob interface, which provides the getKey() method. This method returns an id that identifies the blob; the id starts with a prefix that names the Blob Provider used to retrieve it.
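The prefix convention can be illustrated with a small sketch. It assumes the prefix is separated from the rest of the key by the first colon, as in the fs:foo/bar.pdf key shown later on this page. The BlobKeys helper is hypothetical and is not part of the Nuxeo API.

```java
// Hypothetical helper (not Nuxeo API) that splits a key of the form "providerId:localKey".
final class BlobKeys {

    // Returns the provider prefix, or null when the key carries no prefix.
    static String providerId(String key) {
        int colon = key.indexOf(':');
        return colon < 0 ? null : key.substring(0, colon);
    }

    // Returns the part of the key that is meaningful to the provider itself.
    static String localKey(String key) {
        int colon = key.indexOf(':');
        return colon < 0 ? key : key.substring(colon + 1);
    }

    public static void main(String[] args) {
        String key = "fs:foo/bar.pdf";
        System.out.println(providerId(key) + " -> " + localKey(key));
    }
}
```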

A Blob Provider is a component that provides an API to read and write binary streams as well as additional services such as:

  • Getting the thumbnail associated with a binary stream
  • Getting a download URI
  • Some version management features
  • Getting available conversions
  • Getting links to registered applications

As we will see later in this page, the Nuxeo Platform is shipped with several Blob Provider implementations:

  • File System implementation
  • S3, Azure, ...
  • Google Drive, Dropbox

As a specialization, a Blob Provider can implement the interface DocumentBlobProvider if it is capable of dealing with advanced document-related operations like versioning.

A single Nuxeo Platform instance can use several Blob Providers. For read and write operations, the DocumentBlobManager service determines which Blob Provider should be used, depending on various parameters. The BlobManager service is then in charge of actually using the correct Blob Provider to perform the read or write operation.

A typical low level Java call for creating a file is the following:

DocumentBlobManager blobManager = Framework.getService(DocumentBlobManager.class);
String key = blobManager.writeBlob(blob, doc, xpath);

The DocumentBlobManager service uses the contributed BlobDispatcher implementation (or the default one) to determine which Blob Provider to use for persisting the blob. It can therefore accept the document and the blob's xpath as parameters. We will review below how the default BlobDispatcher works.

Below is a sequence diagram of what happens when writing a binary stream.

Default Blob Provider

Without installing any additional addon, you will find several Blob Provider implementations that you can use.

Name | Class | Description | Resources
Default | org.nuxeo.ecm.core.blob.LocalBlobProvider | The default implementation. Stores binaries on the local filesystem using their MD5 (or other) hash. | Configuration
Encrypted | org.nuxeo.ecm.core.blob.AESBlobProvider | Stores binaries on the local filesystem, encrypted with AES. | Configuration
External File System | org.nuxeo.ecm.core.blob.FilesystemBlobProvider | Reads content stored on an external file system. | Configuration

To register a new Blob Provider, contribute a blobprovider element, with the Java class of your Blob Provider, to the configuration extension point of the BlobManager component:

<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="default">
    <class>org.nuxeo.ecm.core.blob.LocalBlobProvider</class>
    <property name="path">binaries</property>
  </blobprovider>
</extension>

Additional Blob Providers

In addition to the default ones listed above, the following implementations exist and can be used:

Name | Class | Description | Resources
Azure | org.nuxeo.ecm.blob.azure.AzureBlobProvider | Stores Content on Azure Object Store (CDN available through configuration) | Nuxeo Package - Sources - Configuration
Google Storage | org.nuxeo.ecm.core.storage.gcp.GoogleStorageBlobProvider | Stores Content on Google Storage service from Google Cloud Platform | Nuxeo Package - Sources - Configuration
MongoDB GridFS Storage | org.nuxeo.ecm.core.storage.mongodb.blob.GridFSBlobProvider | Stores Content on MongoDB backend | Sources - Configuration
Amazon S3 | org.nuxeo.ecm.blob.s3.S3BlobProvider | Stores content on Amazon S3 | Nuxeo Package - Sources - Configuration
Google Drive | org.nuxeo.ecm.liveconnect.google.drive.GoogleDriveBlobProvider | Reads content from Google Drive | Nuxeo Package - Sources
Dropbox | org.nuxeo.ecm.liveconnect.dropbox.DropboxBlobProvider | Reads content from Dropbox | Nuxeo Package - Sources

Blob Dispatcher and HSM

The Blob Manager makes it possible to enforce typical HSM (Hierarchical Storage Management) behavior, as illustrated in the high-level schema below:

This is made possible by the BlobDispatcher class.

The role of the blob dispatcher is to decide, based on a blob and its containing document, where the blob's binary is actually going to be stored. The Nuxeo Platform provides a default blob dispatcher (org.nuxeo.ecm.core.blob.DefaultBlobDispatcher) that is easy to configure for most basic needs, and it can be replaced by a custom implementation if needed.

Without specific configuration, the DefaultBlobDispatcher stores a document's blob's binary in a blob provider with the same name as the document's repository name.

Advanced dispatching configuration is possible using properties. Each property name is a list of comma-separated clauses, with each clause consisting of a property, an operator and a value. The property can be a document property XPath, or ecm:repositoryName, ecm:path, or, to match the current blob being dispatched, blob:name, blob:mime-type, blob:encoding, blob:digest, blob:length or blob:xpath. Comma-separated clauses are ANDed together. The special property name default defines the default provider, and must be present.

Available operators between property and value are =, !=, <, >, ~ and ^. The operators < and > work with integer values. The operator ~ does glob matching using ? to match a single arbitrary character, and * to match any number of characters (including none). The operator ^ does full regexp matching.
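The operator semantics described above can be sketched in a few lines of plain Java. This is an illustration only, not the DefaultBlobDispatcher source: the ClauseMatcher class is hypothetical, property values are passed in as a simple map of strings, a missing value is assumed to satisfy only !=, and values containing a comma are not supported by this simplified parser.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical evaluator for dispatcher-style clauses (not the DefaultBlobDispatcher code).
final class ClauseMatcher {

    // A rule is a comma-separated list of clauses that are ANDed together.
    static boolean matchesAll(String rule, Map<String, String> values) {
        for (String clause : rule.split(",")) {
            if (!matches(clause.trim(), values)) {
                return false;
            }
        }
        return true;
    }

    // Evaluates one clause of the form <property><operator><value>.
    static boolean matches(String clause, Map<String, String> values) {
        int pos = operatorIndex(clause);
        boolean notEquals = clause.startsWith("!=", pos);
        String property = clause.substring(0, pos);
        String expected = clause.substring(pos + (notEquals ? 2 : 1));
        String actual = values.get(property);
        if (actual == null) {
            return notEquals; // assumption: a missing value only satisfies "!="
        }
        if (notEquals) {
            return !actual.equals(expected);
        }
        switch (clause.charAt(pos)) {
            case '=': return actual.equals(expected);
            case '<': return Long.parseLong(actual) < Long.parseLong(expected);
            case '>': return Long.parseLong(actual) > Long.parseLong(expected);
            case '~': return Pattern.matches(globToRegex(expected), actual);
            case '^': return Pattern.matches(expected, actual); // full regexp match
            default: throw new IllegalArgumentException("Bad operator in clause: " + clause);
        }
    }

    // Position of the first operator character in the clause.
    private static int operatorIndex(String clause) {
        for (int i = 0; i < clause.length(); i++) {
            if ("=!<>~^".indexOf(clause.charAt(i)) >= 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("No operator in clause: " + clause);
    }

    // "?" matches one arbitrary character, "*" matches any number of characters.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }
}
```

For instance, with a map where blob:xpath is files/0/file, the clause blob:xpath~files/*/file matches, while blob:xpath~files/?/content does not.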

For example, all the videos could be stored somewhere, the attachments in a different area, the documents from a secret source in an encrypted area, and the rest in a default location. To do this, you would need to specify the following:

Example Blob Dispatcher Configuration

<extension target="org.nuxeo.ecm.core.blob.DocumentBlobManager" point="configuration">
  <blobdispatcher>
    <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
    <property name="dc:format=video">videos</property>
    <property name="blob:mime-type=video/mp4">videos</property>
    <property name="blob:xpath~files/*/file">attachments</property>
    <property name="dc:source=secret">encrypted</property>
    <property name="default">default</property>
  </blobdispatcher>
</extension>

This assumes that you have four blob providers configured: the default one and three additional ones named videos, attachments and encrypted. For example, you could have:

Defining Additional Binary Managers

<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="videos">
    <class>org.nuxeo.ecm.core.blob.LocalBlobProvider</class>
    <property name="path">binaries-videos</property>
  </blobprovider>
  <blobprovider name="attachments">
    <class>org.nuxeo.ecm.core.blob.LocalBlobProvider</class>
    <property name="path">binaries-attachments</property>
  </blobprovider>
  <blobprovider name="encrypted">
    <class>org.nuxeo.ecm.core.blob.AESBlobProvider</class>
    <property name="key">password=secret</property>
  </blobprovider>
</extension>

Separating Binaries
It is CRITICAL to keep the binaries separated between each provider. Otherwise, this will result in a shared storage configuration that will prevent the Orphaned Blobs GC from running efficiently.

Always define a different path for each local blob provider. When using Amazon S3 Online Storage (or any other cloud provider), always define a different bucket_prefix (or container prefix) for each provider that uses the same bucket (or container).

Otherwise, the following WARN message is displayed at server startup:

Shared storages detected: [path] this must be avoided, review your blob providers configuration.
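The idea behind this warning can be pictured with the following self-contained sketch. The SharedStorageCheck class is hypothetical: it only illustrates how a location declared by more than one provider can be spotted, and it is not the platform's actual detection code.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check (not Nuxeo code): finds storage locations used by several providers.
final class SharedStorageCheck {

    // Keys are provider names, values are their storage locations (path, bucket prefix, ...).
    static Set<String> sharedLocations(Map<String, String> locationByProvider) {
        Set<String> seen = new HashSet<>();
        Set<String> shared = new TreeSet<>();
        for (String location : locationByProvider.values()) {
            if (!seen.add(location)) {
                shared.add(location); // second time we meet this location
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, String> providers = new LinkedHashMap<>();
        providers.put("default", "binaries");
        providers.put("videos", "binaries-videos");
        providers.put("attachments", "binaries"); // misconfiguration: same path as "default"
        System.out.println("Shared storages detected: " + sharedLocations(providers));
    }
}
```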

The DefaultBlobDispatcher class can be replaced by your own implementation.

Download Service

All the platform code that performs a download (for the WebDAV and CMIS APIs, for custom actions in the UI, for the main download servlet) uses the same shared Java code: the download service. A typical download call looks like this:

DownloadService downloadService = Framework.getService(DownloadService.class);
downloadService.downloadBlob(req, resp, doc, xpath, null, filename, "download");

The download service handles the call to the blob manager, logs every download in the audit, and lets you contribute security rules that authorize or refuse the download of blobs depending on the document, the filename, the user, the rendition name, etc. The following activity diagram gives an idea of what happens when downloading a file:

This can also be viewed as a sequence diagram to better understand each actor's role:

File URLs / Download URL File Format

The default URL patterns for downloading files from within the JSF environment are:

  • http://NUXEO_SERVER/nuxeo/nxfile/{repository}/{uuid}/blobholder:{blobIndex}/{fileName}
  • http://NUXEO_SERVER/nuxeo/nxfile/{repository}/{uuid}/{propertyXPath}/{fileName}

Where:

  • nxfile is the download servlet.
    Note that nxbigfile is also accepted for compatibility with older versions.
  • repository is the identifier of the target repository.
  • uuid is the uuid of the target document.
  • blobIndex is the index of the Blob inside the BlobHolder adapter corresponding to the target Document Type, starting at 0: blobholder:0, blobholder:1.
  • propertyXPath is the xPath of the target Blob property inside the target document. For instance: file:content, files:files/0/file.
  • fileName is the name of the file as it should be downloaded. This information is optional and is actually not used to do the resolution.
  • ?inline=true is an optional parameter to force the download to be made with Content-Disposition: inline. This means that the content will be displayed in the browser (if possible) instead of being downloaded.

Here are some examples:

  • The main file of the document: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496
  • The main file of the document with a different name: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496/blobholder:0/mydocument.pdf
  • An attached file of the document: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496/blobholder:1
  • A file stored in the given property: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496/myschema:content
  • A file stored in the given complex property, downloaded with a specific filename: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496/files:files/0/file/myimage.png
  • The main file of the document inside the browser instead of being downloaded: http://127.0.0.1:8080/nuxeo/nxfile/default/776c8640-7f19-4cf3-b4ff-546ea1d3d496?inline=true
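As an illustration, the URL patterns above can be assembled with a small helper. The DownloadUrls class and its parameter names are hypothetical and are not part of the Nuxeo API, and percent-encoding the file name is an assumption of this sketch rather than something this page specifies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Nuxeo API) that assembles nxfile download URLs.
final class DownloadUrls {

    // blobPath is either "blobholder:N" or a property xpath; it may be null for the main file.
    // fileName is optional and only appended when a blobPath is given.
    static String nxfile(String baseUrl, String repository, String uuid,
                         String blobPath, String fileName, boolean inline) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/nxfile/").append(repository)
                .append('/').append(uuid);
        if (blobPath != null) {
            url.append('/').append(blobPath);
            if (fileName != null) {
                // Assumption: encode the file name so that spaces and special characters are safe.
                url.append('/').append(
                        URLEncoder.encode(fileName, StandardCharsets.UTF_8).replace("+", "%20"));
            }
        }
        if (inline) {
            url.append("?inline=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8080/nuxeo";
        String uuid = "776c8640-7f19-4cf3-b4ff-546ea1d3d496";
        System.out.println(nxfile(base, "default", uuid, "blobholder:0", "mydocument.pdf", false));
        System.out.println(nxfile(base, "default", uuid, null, null, true));
    }
}
```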

For the Picture document type, a similar system is available to get the attachments depending on the view name:

  • http://NUXEO_SERVER/nuxeo/nxpicsfile/{repository}/{uuid}/{viewName}:content/{fileName}

where, by default, viewName can be Original, OriginalJpeg, Medium, Thumbnail.

How It Works

The default Blob Provider implementation is based on a simple filesystem. Given its storage principles, it is safe to use even on an NFS-like filesystem, since there are no conflicts. Binaries are read through methods that take only a digest:

InputStream getStream(String digest);
Path getFile(String digest);

As you can see, the methods do not have any document-related parameters. This means the binary storage is independent of the documents:

  • Moving a document does not impact the binary stream.
  • Updating a document does not impact the binary stream.

In addition, the streams are stored using their digest. Thanks to that:

  • The BlobStore handles de-duplication automatically.
  • The BlobStore can be safely snapshotted (files are never moved or updated, and they are only removed by a garbage collection process).
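The following self-contained sketch shows why storing streams under their digest gives de-duplication for free. DigestStore is a hypothetical, simplified class written for this page, not the platform's implementation; the flat directory layout and the use of MD5 are assumptions of the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.Stream;

// Hypothetical digest-addressed store (not Nuxeo code): one file per distinct content.
final class DigestStore {
    private final Path root;

    DigestStore(Path root) {
        this.root = root;
    }

    static DigestStore inTempDirectory() {
        try {
            return new DigestStore(Files.createTempDirectory("digest-store"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the content under its digest; identical content is never stored twice.
    String write(byte[] content) {
        String digest = md5Hex(content);
        Path target = root.resolve(digest);
        try {
            if (!Files.exists(target)) {
                Path tmp = Files.createTempFile(root, "upload-", ".tmp");
                Files.write(tmp, content);
                // The final file appears in one step and is never updated afterwards.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return digest;
    }

    Path getFile(String digest) {
        return root.resolve(digest);
    }

    InputStream getStream(String digest) {
        try {
            return Files.newInputStream(getFile(digest));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long fileCount() {
        try (Stream<Path> files = Files.list(root)) {
            return files.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String md5Hex(byte[] content) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }
}
```

Writing the same bytes twice returns the same digest and leaves a single file on disk, which is also why a document can be moved or renamed without touching the stored stream.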

External File System

A dedicated Blob Provider lets you reference files stored on an external file system that the Nuxeo server can reach: org.nuxeo.ecm.core.blob.FilesystemBlobProvider

The root path is a property of the contribution:

<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="fs">
    <class>org.nuxeo.ecm.core.blob.FilesystemBlobProvider</class>
    <property name="root">/opt/nuxeo/nxserver/blobs</property>
    <property name="preventUserUpdate">true</property>
  </blobprovider>
</extension>

The preventUserUpdate property tells the UI framework not to offer users the ability to update the blob. Such a blob provider can be used when creating a document with the following code:

BlobInfo blobInfo = new BlobInfo();
blobInfo.key = "/opt/nuxeo/nxserver/blobs/foo/bar.pdf";
blobInfo.mimeType = "application/pdf";
BlobManager blobManager = Framework.getService(BlobManager.class);
Blob blob = ((FilesystemBlobProvider) blobManager.getBlobProvider("fs")).createBlob(blobInfo);

Internally the blob will then be stored in the database with a key of fs:foo/bar.pdf.
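The relationship between the configured root, the absolute file path and the stored key can be sketched as follows. FsKeys is a hypothetical helper written for illustration, not the FilesystemBlobProvider code; rejecting files outside the root is an assumption of this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not Nuxeo code): derives "providerId:relative/path" from an absolute path.
final class FsKeys {

    static String toKey(String providerId, String root, String absoluteFile) {
        Path rootPath = Paths.get(root).normalize();
        Path file = Paths.get(absoluteFile).normalize();
        if (!file.startsWith(rootPath)) {
            // Assumption: only files located under the configured root may be referenced.
            throw new IllegalArgumentException(absoluteFile + " is not under " + root);
        }
        // Use "/" in the key whatever the platform separator is.
        String relative = rootPath.relativize(file).toString().replace('\\', '/');
        return providerId + ":" + relative;
    }

    public static void main(String[] args) {
        System.out.println(
                toKey("fs", "/opt/nuxeo/nxserver/blobs", "/opt/nuxeo/nxserver/blobs/foo/bar.pdf"));
    }
}
```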

Encryption

A common question regarding the Blob Manager is the support for encryption. See Implementing Encryption for more on the configuration options.

AES Encryption

Since Nuxeo Platform 6.0, it is possible to use a Blob Provider that encrypts files using AES. Two modes are possible:

  • A fixed AES key retrieved from a Java KeyStore
  • An AES key derived from a human-readable password using the industry-standard PBKDF2 mechanism.

While the files are in use by the application, a temporary file is created in clear text. It is removed as soon as possible.
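To give an idea of the password-based mode, here is a minimal sketch of PBKDF2 key derivation followed by AES encryption, using only the standard Java cryptography API. The PasswordAes class is hypothetical, and the salt handling, iteration count (10,000), key size (128 bits), PBKDF2WithHmacSHA256 variant and CBC mode are assumptions chosen for the example; they are not the settings of the AESBlobProvider.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch (not the AESBlobProvider code): password -> PBKDF2 -> AES key.
final class PasswordAes {

    // Derives a 128-bit AES key from a human-readable password.
    static SecretKeySpec deriveKey(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 10_000, 128); // illustrative values
            byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key derivation failed", e);
        }
    }

    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE; iv must be 16 bytes long.
    static byte[] crypt(int mode, SecretKeySpec key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }
}
```

In a real setup the salt and the initialization vector would be random and stored next to the encrypted content, so that the same password can later rebuild the key for decryption.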

Built-in S3 Encryption

When using the Amazon S3 Online Storage package, the AWS S3 client library automatically supports both client-side and server-side encryption:

  • With server-side encryption, the encryption is completely transparent.
  • In client-side encryption mode, Nuxeo Platform manages the encrypt/decrypt process using the AWS S3 client library. The local temporary file is in clear text.

