The Nuxeo Cold Storage addon has been developed and tested for AWS. Microsoft Azure or other cloud platforms are not compatible with the addon.
Nuxeo Cold Storage is based on AWS S3 Glacier Flexible Retrieval with the Standard retrieval option.
The general cost principles of cold storage are:
- Storage is cheap
- Retrieval is expensive
- Anything sent to cold storage will be charged a minimum of 90 days of storage
When moving content to cold storage, only the main file attached to the document is sent to cold storage.
All the rest remains in regular storage, including:
- Other attachments in the document (e.g. anything stored in
- Renditions for the main file. Unless you disabled that option, a default rendition is necessary to provide a preview for the main file (see preview file configuration). Other renditions can be removed with custom logic when you move the content, if you want to save on storage space.
- Elasticsearch index for the document (including fulltext index for the main file) so that you can keep finding the document in search results.
- Annotations made on the main file if you are using Nuxeo Enhanced Viewer.
Overall, you should consider that content sent to Cold Storage is meant for archival, because retrieving the content is costly and takes time. It should be seen as an efficient way to save on storage costs for content you want to keep around but that you are unlikely to need anytime soon.
You should also consider that Amazon bills a minimum file size of 40 KB, so you should not send files smaller than that to cold storage. The rule of thumb is that the larger the file is, the more you can save.
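To make the two billing minimums mentioned above concrete (90 days of storage and a 40 KB file size), here is a minimal sketch that computes what a file would be billed for. The helper name and the choice of 1 KB = 1024 bytes are assumptions made for illustration; actual pricing and rounding rules are defined by AWS, not by this snippet.

```python
from typing import Tuple

# Illustrative only: models the two billing minimums described above.
MIN_BILLABLE_BYTES = 40 * 1024  # 40 KB minimum size (assuming 1 KB = 1024 bytes)
MIN_BILLABLE_DAYS = 90          # minimum storage duration


def billable_usage(size_bytes: int, days_stored: int) -> Tuple[int, int]:
    """Return the (bytes, days) a file would be billed for (hypothetical helper)."""
    if size_bytes < 0 or days_stored < 0:
        raise ValueError("size and duration must be non-negative")
    return (
        max(size_bytes, MIN_BILLABLE_BYTES),
        max(days_stored, MIN_BILLABLE_DAYS),
    )


if __name__ == "__main__":
    # A 10 KB file kept for 30 days is billed like a 40 KB file kept for 90 days.
    print(billable_usage(10 * 1024, 30))
```

Files well above the minimum size that stay archived for longer than 90 days are billed for exactly what they use, which is why large, rarely accessed files are the best candidates.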
Documents can be moved in bulk to cold storage once they are stored on the platform. It is not possible to import content directly into cold storage. The reason is that once the file is moved to cold storage, we cannot access it anymore unless a retrieval is requested. Before the content can be moved, Nuxeo needs to execute some actions, such as generating the preview rendition and indexing the content of the main file, to provide a good user experience while the content is under cold storage.
Multiple options are available and can be combined, depending on your situation.
This option is the best solution if you have large volumes of existing content that need to be migrated to cold storage. It is easy to use at any time, versatile, performs well, and is designed to handle large volumes of documents since it leverages the bulk action framework.
Let's say we want to move documents that are larger than 1 MB and not already under cold storage (i.e. SELECT * FROM File WHERE ecm:mixinType <> 'ColdStorage' AND file:content/length > 1048576). You could use the request below after replacing the credentials and the Nuxeo server URL with your own:
curl -u Administrator:Administrator \
-H 'Content-Type: application/json' \
-X POST 'http://localhost:8080/nuxeo/api/v1/search/bulk/moveToColdStorage?query=SELECT%20*%20FROM%20File%20WHERE%20ecm%3AmixinType%20%3C%3E%20%27ColdStorage%27%20AND%20file%3Acontent%2Flength%20%3E%201048576&queryLimit=10000'
Notice that we are using the queryLimit parameter in the example above to limit the number of documents impacted to 10,000; it is good practice to test your changes at a smaller scale first.
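Hand-encoding the NXQL query in the URL is error-prone. As a minimal sketch, the request URL from the curl example can be built programmatically. The helper name `build_bulk_move_url` is hypothetical; the endpoint path and the queryLimit parameter are taken from the example above.

```python
from urllib.parse import quote


def build_bulk_move_url(server_url: str, nxql: str, query_limit: int) -> str:
    """Build the bulk moveToColdStorage URL for an NXQL query (hypothetical helper)."""
    # Percent-encode the query the same way as the curl example ('*' is left as-is).
    encoded_query = quote(nxql, safe="*")
    endpoint = server_url.rstrip("/") + "/api/v1/search/bulk/moveToColdStorage"
    return f"{endpoint}?query={encoded_query}&queryLimit={query_limit}"


if __name__ == "__main__":
    nxql = (
        "SELECT * FROM File WHERE ecm:mixinType <> 'ColdStorage' "
        "AND file:content/length > 1048576"
    )
    print(build_bulk_move_url("http://localhost:8080/nuxeo", nxql, 10000))
```

The resulting string can be passed to curl with the same credentials and headers as in the example, which makes it easier to start with a small queryLimit and raise it once the results look right.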
Once your initial content has been moved to cold storage, you will likely need to move smaller batches of documents regularly, and you will want to automate this process. A typical use case could be "every night, send all documents that are in an archived status, were last modified more than 2 years ago, and are not under legal hold to cold storage".
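As an illustration, the nightly use case quoted above could translate into an NXQL query along these lines. This is a sketch under assumptions: the lifecycle state name 'archived', the use of dc:modified as the last modification date, and the ecm:hasLegalHold property are not confirmed by this page and should be checked against your own document model and Nuxeo version. The helper names are hypothetical.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def nightly_archive_query(today: date) -> str:
    """Build the NXQL selection for the nightly move (hypothetical helper)."""
    cutoff = years_before(today, 2)
    return (
        "SELECT * FROM Document"
        " WHERE ecm:currentLifeCycleState = 'archived'"  # assumed state name
        f" AND dc:modified < DATE '{cutoff.isoformat()}'"  # older than 2 years
        " AND ecm:hasLegalHold = 0"                       # assumed property
        " AND ecm:mixinType <> 'ColdStorage'"             # not already moved
    )


if __name__ == "__main__":
    print(nightly_archive_query(date.today()))
```

Computing the cutoff date at run time keeps the selection rolling forward every night without editing the query.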
Using a scheduled task is a perfect fit for that need. You will first contribute your event to be scheduled, then either:
if you want to leverage automation.
- Write an event listener using Java code.
Below are some code examples taken from the Nuxeo Retention addon that apply a similar logic: checking regularly for documents that are not under retention anymore and executing a bulk action on them:
You can take these examples as a basis to adapt to your own needs, knowing that the bulk action for sending content to cold storage is named
Content can also be moved to cold storage gradually on a per event basis, for example when a document reaches a particular state. In that case, you can simply configure an event handler using Nuxeo Studio.
It is not possible to dispatch content into cold storage directly.
Related question: Can I Bulk Import Content Into Cold Storage Directly?
It is still possible to find the document using a fulltext search after the file has been sent to cold storage. We keep the result of the fulltext extraction in the database, meaning that even if we don't have immediate access to the file anymore, we can rebuild the Elasticsearch index anyway.
When a document is sent to cold storage, the main file is not available (it requires a retrieval operation, taking between 3 and 5 hours). In order to preserve the user experience, we display a rendition of the document (as renditions remain on S3 standard).
By default, we are using the following renditions:
|Documents of type Picture
|Documents of type Video
You can change the renditions to be used and add new configurations for specific document type(s) and/or facet(s).
To do so, you can add an XML contribution to your Nuxeo Studio project and specify the renditions to use, as in the following example:
<extension target="org.nuxeo.coldstorage.service.ColdStorageService" point="coldStorageRendition">
  <coldStorageRendition name="defaultRendition" renditionName="thumbnail" />
  <coldStorageRendition name="pictureRendition" docType="Picture" facet="Picture" renditionName="Small" />
  <coldStorageRendition name="videoRendition" docType="Video" facet="Video" renditionName="MP4 480p" />
  <coldStorageRendition name="myCustomRendition" docType="myCustomDocumentType" facet="Picture" renditionName="OriginalJpeg" />
</extension>
- a default rendition named "defaultRendition"
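Before deploying a contribution like the one above, it can help to check that the XML is well-formed and that an entry named "defaultRendition" is present. The sketch below does this with Python's standard library. The helper names are hypothetical, and the check only inspects the XML text; it does not reproduce how the addon resolves renditions at runtime.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example contribution, used here as sample input.
CONTRIBUTION = """
<extension target="org.nuxeo.coldstorage.service.ColdStorageService" point="coldStorageRendition">
  <coldStorageRendition name="defaultRendition" renditionName="thumbnail" />
  <coldStorageRendition name="pictureRendition" docType="Picture" facet="Picture" renditionName="Small" />
</extension>
"""


def list_renditions(xml_text: str) -> dict:
    """Map each coldStorageRendition name to its renditionName (hypothetical helper)."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text.strip())
    return {
        node.get("name"): node.get("renditionName")
        for node in root.iter("coldStorageRendition")
    }


def has_default_rendition(xml_text: str) -> bool:
    """Return True if an entry named "defaultRendition" is declared."""
    return "defaultRendition" in list_renditions(xml_text)


if __name__ == "__main__":
    print(list_renditions(CONTRIBUTION))
    print(has_default_rendition(CONTRIBUTION))
```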
Retrieval takes 3 to 5 hours. Restore time should be consistent regardless of content type or file size, as S3 Glacier is designed for 35 random restore requests per pebibyte (PiB) stored per day, which should be more than sufficient.
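To get a feel for the figure quoted above, here is a small sketch that scales the 35 requests per PiB per day design figure to a given stored volume. The function name is hypothetical, and the result is only an order-of-magnitude indication, not a quota enforced by Nuxeo.

```python
RESTORE_REQUESTS_PER_PIB_PER_DAY = 35  # design figure quoted above
BYTES_PER_PIB = 2 ** 50                # 1 pebibyte


def designed_daily_restores(stored_bytes: int) -> float:
    """Scale the per-PiB design figure to the stored volume (hypothetical helper)."""
    return RESTORE_REQUESTS_PER_PIB_PER_DAY * stored_bytes / BYTES_PER_PIB


if __name__ == "__main__":
    # 2 PiB stored corresponds to a design figure of 70 random restores per day.
    print(designed_daily_restores(2 * BYTES_PER_PIB))
```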
The architecture of the Nuxeo Cold Storage addon relies on the standard Nuxeo Platform principles, which makes the cold storage service customizable using code. It is possible to disable email notifications when content is retrieved or to apply a different behavior by overriding the cold storage service.
Yes, any kind of logic around the process and rules to archive or to restore your content can be achieved using configuration or customization. How to achieve it will derive from the business rules you want to define in your application.
Yes. When moving the file to cold storage, we keep the current Elasticsearch index, meaning that anyone can still find the document using a fulltext search for example.
Related question: What Happens if I Reindex Content Under Cold Storage?
|Sending content into cold storage
|Retrieving content temporarily from cold storage
|Restoring content permanently from cold storage
WriteColdStorage permission is a new permission brought by the addon, that needs to be granted manually by default and includes the
Should you wish to adapt the permissions displayed in Web UI or configure them in a more suitable / granular manner for your project, you can take our HOWTO: Grant the Edit Permission without the Remove Permission documentation as a basis and adapt it to your needs.
Yes. All operations to send content to cold storage or retrieve it are available as automation operations and there are multiple ways to adapt the behavior to your needs. Some options are disabling the default buttons in the UI, restricting the usage of these operations to specific user groups, creating new buttons to handle these operations and associating them to an approval workflow, or a combination of these.
Yes. These are the main events you can bind logic to:
|content has been moved to Cold Storage
|content has been retrieved or restored from Cold Storage (can be used to execute background logic on the content being restored)
|content has been retrieved or restored from Cold Storage (contains properties to send the information email in the context, can be used to fine-tune the notification logic)
More specific events are available as well if needed.
Yes, an audit trail is added by default for Cold Storage related actions.
|Action triggering the event
|Document sent to cold storage
|Send document to cold storage
|User requests a retrieve
|when a user does the request, not when the retrieve is available
|User requests a restore
|when a user does the request, not when the restore is complete
|Download cold document
|User downloads the original file (
|User downloads the preview file (file:filecontent of a document sent to cold storage)
Annotations made on the main file are kept while the content is under cold storage (they are technically stored as separate documents). If you retrieve the file, they will be visible again when the file is restored.
Yes, if both addons are installed you can configure a post-retention action to send content automatically to cold storage after its retention period.
Content can only be moved to cold storage after its retention period.
Content that is under cold storage can't be put under retention.
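The two rules above can be summarized as a pair of simple checks. This is an illustrative sketch only: the function and parameter names are hypothetical and do not correspond to the addon's API, and the actual enforcement is done by the platform.

```python
from datetime import datetime, timezone
from typing import Optional


def can_move_to_cold_storage(retain_until: Optional[datetime], now: datetime) -> bool:
    """Content may be moved only once its retention period (if any) has ended."""
    return retain_until is None or retain_until <= now


def can_put_under_retention(in_cold_storage: bool) -> bool:
    """Content that is under cold storage cannot be put under retention."""
    return not in_cold_storage


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    still_retained = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(can_move_to_cold_storage(still_retained, now))
    print(can_put_under_retention(in_cold_storage=True))
```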
Synchronizing documents that are already under Cold Storage: Nuxeo Drive synchronizes the "compressed" rendition of the document. Restore and/or retrieve actions are available on synchronized documents.
Sending documents that are already synchronized to Cold Storage: the format and rendition of your document will be updated in your local Nuxeo Drive folder. If you restore the document, its format and size will go back to the original in your local Nuxeo Drive folder.
You will be able to upload documents to your instance using Direct transfer and then send them to Cold Storage.