Content Repository

Binary Store

Updated: October 16, 2020

Repository and BinaryManager

Each content repository must be associated with a BinaryManager implementation. The BinaryManager is a low-level interface that deals only with binary streams.

  Binary getBinary(InputStream in) throws IOException;

  Binary getBinary(String digest);

As you can see, the methods take no document-related parameters. This means that binary storage is independent of the documents:

  • Moving a document does not impact the binary stream;
  • Updating a document does not impact the binary stream.

In addition, streams are stored under their digest. As a result:

  • The BlobStore automatically handles de-duplication;
  • The BlobStore can be safely snapshotted (files are never moved or updated, and they are only removed by a GarbageCollection process).
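The de-duplication property can be illustrated with a minimal in-memory sketch. The class name DigestStoreSketch and the choice of SHA-256 are assumptions made for this example; this is not the Nuxeo implementation, only a demonstration that keying content by its digest means identical streams are stored once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store, for illustration only.
class DigestStoreSketch {

    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores the stream content under its digest and returns that digest. */
    String put(InputStream in) throws IOException {
        MessageDigest md = newDigest();
        ByteArrayOutputStream content = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);       // feed the digest
            content.write(buffer, 0, n);   // keep the bytes
        }
        String digest = toHex(md.digest());
        // Identical content yields the same key, so a second copy is never stored.
        blobs.putIfAbsent(digest, content.toByteArray());
        return digest;
    }

    /** Returns the stored bytes for a digest, or null if unknown. */
    byte[] get(String digest) {
        return blobs.get(digest);
    }

    /** Number of distinct binaries currently stored. */
    int size() {
        return blobs.size();
    }

    private static MessageDigest newDigest() {
        try {
            // SHA-256 is an assumption of this sketch.
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Storing the same bytes twice returns the same digest and leaves a single entry in the map, which is the behavior the list above describes.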

From Simple FS to S3 Binary Manager 

The default BinaryManager implementation is based on a simple filesystem. Given the storage principles above, it is safe to use this implementation even on an NFS-like filesystem, since there are no write conflicts.
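As a rough sketch of why a write-once layout avoids conflicts, the following example writes each stream to a temporary file and then moves it to a path named after its digest. The class name FsDigestStoreSketch, the flat directory layout and the use of SHA-256 are assumptions made for illustration; the actual on-disk layout of the default implementation may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical filesystem store, for illustration only.
class FsDigestStoreSketch {

    private final Path root;

    FsDigestStoreSketch(Path root) {
        this.root = root;
    }

    /** Writes the stream once, under a file named after its digest. Closes the stream. */
    String put(InputStream in) throws IOException {
        Files.createDirectories(root);
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256"); // assumption of this sketch
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Write to a temporary file first: the final name is only known at the end.
        Path tmp = Files.createTempFile(root, "upload-", ".tmp");
        try (DigestInputStream din = new DigestInputStream(in, md)) {
            Files.copy(din, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        String digest = hex(md.digest());
        Path target = root.resolve(digest);
        if (Files.exists(target)) {
            // Same digest means same content: nothing to write.
            Files.delete(tmp);
        } else {
            // Behavior when the target appears concurrently is platform specific;
            // either way the content is identical.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        }
        return digest;
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because a file is never modified after it reaches its final name, two writers can only ever produce the same bytes at the same path.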

You can also use the S3 Binary Manager to store binaries in AWS S3.

Temporary storage is used to avoid delays when the same stream is read several times (e.g. for multiple conversions) inside the Nuxeo Server.

Encryption

A common question regarding BinaryManager is the support for encryption. See Implementing Encryption for more on the configuration options.

AES Encryption

Since Nuxeo 6.0, it is possible to use a BinaryManager that encrypts files using AES. Two modes are available:

  • a fixed AES key retrieved from a Java KeyStore,
  • an AES key derived from a human-readable password using the industry-standard PBKDF2 mechanism.
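The second mode can be sketched with the standard Java cryptography API. The class name Pbkdf2KeySketch, the iteration count, the key size and the PBKDF2WithHmacSHA256 variant are assumptions chosen for this example; they are not necessarily the parameters Nuxeo uses.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, for illustration only.
class Pbkdf2KeySketch {

    static final int ITERATIONS = 10_000; // assumption: pick according to your security policy
    static final int KEY_BITS = 128;      // assumption: AES-128

    /** Derives an AES key from a password and a salt using PBKDF2. */
    static SecretKey deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] raw = factory.generateSecret(spec).getEncoded();
            // Wrap the derived bytes so they can be used with an AES Cipher.
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword();
        }
    }

    /** Generates a random salt; it must be kept to derive the same key again. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }
}
```

The same password and salt always yield the same key, so the salt has to be stored alongside the encrypted data, while the password itself is never written to disk.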

While a file is in use by the application, a temporary cleartext copy is created. It is removed as soon as possible.

Built-in S3 Encryption

Taking the S3 BinaryManager as an example, the AWS S3 Client library supports both client-side and server-side encryption:

With server-side encryption, the encryption is completely transparent.

In client-side encryption mode, the S3 Client manages the encryption and decryption process. The local temporary file is in cleartext.

Custom Encryption

You can contribute a custom implementation of the BinaryManager. Since the interface is very simple, the implementation can be simple too.

  • The first possible approach is to handle custom encryption and decryption on top of the AWS S3 Client library:

    In that case, the local temporary file is in cleartext.

  • The second possible approach is to handle the encryption and decryption process on the fly.

    This means that the temporary file is encrypted, but as a trade-off:

    • Decryption must run on the fly each time the stream is read.
    • Determining the stream size requires more work.
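The on-the-fly approach can be sketched with the standard CipherInputStream and CipherOutputStream classes. The class name OnTheFlyCryptoSketch, the AES/CBC/PKCS5Padding transformation and the convention of writing the IV as a file header are assumptions made for this example, not a description of any Nuxeo implementation. A production design would also need integrity protection, which is left out here.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper, for illustration only.
class OnTheFlyCryptoSketch {

    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding"; // assumption
    static final int IV_LENGTH = 16; // AES block size in bytes

    /** Wraps out so that everything written to the result is stored encrypted. */
    static OutputStream encrypting(OutputStream out, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        out.write(iv); // the IV is not secret; store it as a header
        return new CipherOutputStream(out, cipher);
    }

    /** Wraps in so that bytes are decrypted each time the stream is read. */
    static InputStream decrypting(InputStream in, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(in).readFully(iv); // read the header back
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(in, cipher);
    }
}
```

With this layout the encrypted file is longer than the original content (IV header plus padding), which illustrates why determining the stream size requires more work: it has to be recorded separately or computed by decrypting.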