HOWTO: Estimate Volume Usage

Updated: March 18, 2024

Documents in the Nuxeo Platform are more than simple files. In the context of the Nuxeo Platform, files are called binaries or blobs (even when they are text files).

To estimate volume usage (filesystem and database combined), you need to count and characterize your content. For each kind of content, estimate the following (sizes are averages per document):

  • Number of documents
  • File size
  • Text volume
  • Size of renditions or transformations (thumbnails, PDF, video conversions...)
  • Metadata size
  • Indexes

Because these values are often hard to guess, the most reliable approach is still to monitor the application in real use for a while, collect statistics on the documents, and extrapolate the required size over time. The guidelines below should help you produce a first estimate.

Estimating Volume Usage

Media

To estimate volume usage for pictures and videos, take the following three criteria into account:

  • File volume: typically 1 MB to 5 MB per file, unless you mostly manage video
  • Generated thumbnails: depending on the original image size, thumbnails take between 10% and 100% of the original file size
  • Text volume: 1% of the file volume
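
As a rough illustration, the three media criteria can be combined into a single estimate. This is a minimal Python sketch, not a Nuxeo API: the function name, its parameters, and the example figures (100,000 files of 2 MB each, thumbnails at 25% of the original) are assumptions chosen for illustration.

```python
def estimate_media_volume_mb(num_files, avg_file_mb, thumbnail_ratio, text_ratio=0.01):
    """Estimate media storage in MB from the guideline ratios.

    thumbnail_ratio: share of the original file size taken by thumbnails
                     (the guideline range is 0.10 to 1.00).
    text_ratio:      extracted text as a share of the file volume (1%).
    """
    if not 0.10 <= thumbnail_ratio <= 1.00:
        raise ValueError("thumbnail_ratio is expected to be between 0.10 and 1.00")
    file_mb = num_files * avg_file_mb
    return {
        "files_mb": file_mb,
        "thumbnails_mb": file_mb * thumbnail_ratio,
        "text_mb": file_mb * text_ratio,
    }


# Hypothetical workload: 100,000 pictures of 2 MB each.
estimate = estimate_media_volume_mb(100_000, 2.0, thumbnail_ratio=0.25)
```

With these assumed inputs, the files account for about 200,000 MB, the thumbnails for about 50,000 MB, and the extracted text for about 2,000 MB.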

Structured Content

Structured content consists mostly of properties and has few attached files. To estimate the volume it represents, consider:

  • Property content (metadata): typically 1-10 KB per object
  • File volume
  • Text volume: text content + 20% of the file volume

Office Files

For typical office files, you should consider:

  • Property content: less than 1 KB per object
  • File volume: 100 KB - 2 MB per object
  • Text volume: property content + 30% of the file volume
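
The office-file rule of thumb above can be sketched in the same way. This is only an illustrative Python sketch: the function name and the example workload (one million documents of 500 KB, with the 1 KB upper bound used for properties) are assumptions, not measured values.

```python
def estimate_office_volume_kb(num_docs, property_kb, file_kb, text_ratio=0.30):
    """Estimate office-file volumes in KB.

    Per object, text volume = property content + 30% of the file volume.
    """
    text_kb_per_doc = property_kb + text_ratio * file_kb
    return {
        "properties_kb": num_docs * property_kb,
        "files_kb": num_docs * file_kb,
        "text_kb": num_docs * text_kb_per_doc,
    }


# Hypothetical workload: 1,000,000 documents, 1 KB of properties, 500 KB files.
office = estimate_office_volume_kb(1_000_000, property_kb=1, file_kb=500)
```

Here each document contributes roughly 151 KB of text (1 KB + 30% of 500 KB), so the text volume is about 151,000,000 KB for the whole set.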

Storage Distribution

There are various configurations and addons providing alternative storage solutions (see the page Persistence Architecture), but here is a generic solution for storage distribution:

Filesystem

Disk usage by the Nuxeo Platform itself is stable, at about 1 GB.

It is possible to spread the filesystem resources over multiple disks or partitions: binaries, Nuxeo data, cache data, temporary files. See the page Configuration Parameters Index (nuxeo.conf) for their configuration.

  • Binaries: By default, they are stored under a sub-directory of the data directory, uncompressed but without duplication.
  • Cache files: < 1 GB (including cache of Nuxeo Packages and hotfixes).
  • Temporary files (when uploading for instance): Reserve some space which depends on the maximum size of imported files. The temporary directory can be configured using nuxeo.tmp.dir for instance. Usually: 1 GB + the maximum size of imported documents.
  • Logs: Logging is based on Log4J, so the log files are easy to configure (size limit, rolling rules, ...). The logs directory can be configured using nuxeo.log.dir. Usually: 50 MB to 5 GB, depending on the Log4J rules and the log level (error/warn/info/debug/trace).
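
The filesystem figures above can be added up as follows. This is a hedged Python sketch: the function and key names are hypothetical, the cache and log values use the upper bounds quoted above, and the example inputs (250 GB of binaries, 4 GB maximum import size) are assumptions.

```python
def estimate_filesystem_gb(binaries_gb, max_import_gb, logs_gb=5.0):
    """Break down filesystem needs in GB using the guideline figures."""
    return {
        "platform_gb": 1.0,             # Nuxeo Platform itself, about 1 GB
        "binaries_gb": binaries_gb,     # stored uncompressed, without duplication
        "cache_gb": 1.0,                # upper bound: cache files stay under 1 GB
        "tmp_gb": 1.0 + max_import_gb,  # 1 GB + maximum size of imported documents
        "logs_gb": logs_gb,             # 0.05 to 5 GB depending on Log4J rules
    }


# Hypothetical deployment: 250 GB of binaries, imports of up to 4 GB.
layout = estimate_filesystem_gb(binaries_gb=250.0, max_import_gb=4.0)
total_gb = sum(layout.values())
```

Keeping the breakdown per directory, rather than only the total, makes it easier to size each disk or partition separately when the resources are spread out.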

Database

The database will store:

  • Extracted text volume x2
  • Metadata
  • Audit: grows over time with the activity, depending on your configuration

Note that the volume depends a lot on the backend: some databases do not compress data, others compress only big fields, and MongoDB compresses everything.
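
Before any backend compression is taken into account, the database items above can be summed as in this minimal Python sketch. The function name and the example inputs are assumptions; in particular, the audit size has to come from your own activity and configuration, so it is passed in rather than derived.

```python
def estimate_database_gb(text_gb, metadata_gb, audit_gb=0.0):
    """Uncompressed database estimate in GB: 2 x extracted text + metadata + audit."""
    return 2 * text_gb + metadata_gb + audit_gb


# Hypothetical inputs: 150 GB of extracted text, 1 GB of metadata, 5 GB of audit.
db_gb = estimate_database_gb(text_gb=150.0, metadata_gb=1.0, audit_gb=5.0)
```

With these assumed inputs the estimate is 306 GB; a backend that compresses data would need less.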

Elasticsearch

  • About 30% of the extracted text volume.
  • Size will vary according to the number of populated fields and full-text fields. Note that the _source field that stores the JSON representation of a document is lightly compressed.
  • The audit index grows over time with the activity, depending on your configuration.
  • Each replica needs the same amount of disk space.
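
The Elasticsearch figures above can be combined in a similar way. This Python sketch is illustrative only: the function name and example inputs are hypothetical, and it assumes that the audit index is replicated like the document index.

```python
def estimate_elasticsearch_gb(text_gb, replicas=1, audit_index_gb=0.0, text_ratio=0.30):
    """Estimate Elasticsearch disk usage in GB.

    The primary data is about 30% of the extracted text volume plus the
    audit index; each replica needs the same amount of disk space again.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or positive")
    primary_gb = text_ratio * text_gb + audit_index_gb
    return primary_gb * (1 + replicas)


# Hypothetical inputs: 150 GB of extracted text, 2 GB audit index, 1 replica.
es_gb = estimate_elasticsearch_gb(text_gb=150.0, replicas=1, audit_index_gb=2.0)
```

With these assumed inputs, the primary data is about 47 GB and one replica doubles it to about 94 GB. The actual size still varies with the number of populated fields and full-text fields.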

Backup

Based on the above estimations, you must reserve dedicated storage for backups, either locally or remotely. Depending on your infrastructure choices, you can use compression, streaming, hot backup, rsync, and so on.