Documents in the Nuxeo Platform are more than simple files. In the context of the Nuxeo Platform, files are called binaries or blobs (even if they are text files).
To estimate the volume usage (filesystem and database combined), you need to count your content and qualify its nature. For each kind of content, estimate the following, as averages where applicable:
- Number of documents
- File size
- Text volume
- Size of renditions or transformations (thumbnails, PDF, video conversions...)
- Metadata size
- Indexes
Because these values are often hard to guess, the most reliable approach is still to monitor the application in real use for a while, gather statistics on the documents, and extrapolate the required size over time. The guidelines below should help you produce a first estimate.
Estimating Volume Usage
Media
To get an estimate of the volume usage for pictures and videos, you should take into account the following three criteria:
- File volume: typically 1 MB to 5 MB per file, unless you mostly manage video
- Generated thumbnails: depending on the original image size, the thumbnail will be between 10% and 100% of the original file size
- Text volume: 1% of the file volume
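The three criteria above can be combined into a quick calculation. The following is a minimal sketch, not a Nuxeo API: the function name, its parameters, and the decimal unit convention (1 MB = 1,000,000 bytes) are assumptions made for illustration, and the thumbnail ratio must be chosen by you within the 10% to 100% range.

```python
# Rough media storage estimate based on the ratios listed above.
# Names are illustrative only; nothing here is part of the Nuxeo Platform.

MB = 1_000_000  # assumption: decimal megabytes


def estimate_media_volume(num_files, avg_file_bytes, thumbnail_ratio, text_ratio=0.01):
    """Return estimated volumes in bytes for a collection of pictures or videos."""
    if not 0.10 <= thumbnail_ratio <= 1.0:
        raise ValueError("thumbnail_ratio is expected to be between 0.10 and 1.0")
    files = num_files * avg_file_bytes
    return {
        "files": files,                          # original binaries
        "thumbnails": files * thumbnail_ratio,   # generated thumbnails
        "text": files * text_ratio,              # extracted text (1% by default)
    }


if __name__ == "__main__":
    # Hypothetical collection: 100,000 pictures of 3 MB each.
    estimate = estimate_media_volume(100_000, 3 * MB, thumbnail_ratio=0.25)
    for name, size in estimate.items():
        print(f"{name}: {size / 1e9:.1f} GB")
```

Running the calculation at both ends of the thumbnail range gives a low and a high bound rather than a single figure.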
Structured Content
Structured content is mostly composed of properties and has few attached files. Here are the elements to consider when estimating the volume it will represent:
- Property content (metadata): Consider typically 1-10 KB / object
- File volume
- Text volume: text content + 20% of the file volume
Office Files
For typical office files, you should consider:
- Property content: Less than 1KB / object
- File volume: 100 KB - 2 MB per object
- Text volume: property content + 30% of the file volume
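Since the office file figures are ranges, it can help to compute a low and a high bound. The sketch below assumes decimal units and uses hypothetical function names; the 1 KB property size is taken as the upper limit of "less than 1 KB".

```python
# Low/high estimate for office files, using the figures listed above.
# Function names and unit conventions are assumptions for illustration.

KB = 1_000       # assumption: decimal units
MB = 1_000 * KB


def office_text_bytes(property_bytes, file_bytes, file_text_ratio=0.30):
    """Text volume per object: property content + 30% of the file volume."""
    return property_bytes + file_text_ratio * file_bytes


def office_estimate(num_docs, property_bytes, file_bytes):
    """Return estimated volumes in bytes for num_docs office documents."""
    return {
        "properties": num_docs * property_bytes,
        "files": num_docs * file_bytes,
        "text": num_docs * office_text_bytes(property_bytes, file_bytes),
    }


if __name__ == "__main__":
    docs = 50_000  # hypothetical repository size
    low = office_estimate(docs, property_bytes=1 * KB, file_bytes=100 * KB)
    high = office_estimate(docs, property_bytes=1 * KB, file_bytes=2 * MB)
    for key in low:
        print(f"{key}: {low[key] / 1e9:.2f} GB to {high[key] / 1e9:.2f} GB")
```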
Storage Distribution
There are various configurations and addons providing alternative storage solutions (see the page Persistence Architecture), but here is a generic solution for storage distribution:
Filesystem
The disk space used by the Nuxeo Platform itself is stable, at about 300 MB.
It is possible to spread the filesystem resources over multiple disks or partitions: binaries, Nuxeo data, cache data, temporary files. See the page Configuration Parameters Index (nuxeo.conf) for their configuration.
- Binaries: By default, they are stored under a sub-directory of the data directory, without compression but with no duplication.
- Cache files: < 1 GB (including cache of Nuxeo Packages and hotfixes).
- Temporary files (when uploading for instance): Reserve some space which depends on the maximum size of imported files. The temporary directory can be configured using nuxeo.tmp.dir for instance. Usually: 1 GB + the maximum size of imported documents.
- Logs: Based on Log4J, the log files can be easily configured (size limit, rolling rules, ...). The logs directory can be configured using nuxeo.log.dir. Usually: 50 MB to 5 GB depending on the Log4J rules and the log level (error/warn/info/debug/trace).
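The filesystem figures above can be added up as follows. This is a minimal sketch under stated assumptions: it takes the upper bounds for cache (1 GB) and logs (5 GB) by default, uses decimal units, and the function and key names are hypothetical.

```python
# Filesystem space to reserve, summing the items listed above.
# Defaults use the upper bounds quoted in the text; names are illustrative.

MB = 1_000_000   # assumption: decimal units
GB = 1_000 * MB


def estimate_filesystem_bytes(binaries_bytes, max_import_bytes, log_bytes=5 * GB):
    """Return a per-item breakdown (in bytes) plus the total."""
    parts = {
        "platform": 300 * MB,            # Nuxeo Platform itself
        "binaries": binaries_bytes,      # stored uncompressed, deduplicated
        "cache": 1 * GB,                 # upper bound for cache files
        "tmp": 1 * GB + max_import_bytes,
        "logs": log_bytes,               # 50 MB to 5 GB depending on Log4J setup
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Hypothetical deployment: 100 GB of binaries, imports of up to 2 GB.
    for name, size in estimate_filesystem_bytes(100 * GB, 2 * GB).items():
        print(f"{name}: {size / GB:.1f} GB")
```

Keeping the breakdown per item is useful when binaries, cache, temporary files, and logs are placed on separate disks or partitions.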
Database
The database will store:
- Extracted text volume x2
- Metadata
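As a worked example of the rule above, the database size can be approximated as twice the extracted text volume plus the metadata volume. The function name is hypothetical, and the result does not account for database-specific overhead, which this page does not quantify.

```python
# Database size approximation: extracted text volume x2, plus metadata.
# Illustrative helper only; not part of the Nuxeo Platform.


def estimate_database_bytes(extracted_text_bytes, metadata_bytes):
    """Return the approximate database size in bytes."""
    return 2 * extracted_text_bytes + metadata_bytes


if __name__ == "__main__":
    # Hypothetical figures: 30 GB of extracted text, 5 GB of metadata.
    size = estimate_database_bytes(30 * 10**9, 5 * 10**9)
    print(f"database: {size / 1e9:.0f} GB")
```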
Elasticsearch
- About 30% of the extracted text volume.
- Size will vary according to the number of populated fields and full-text fields. Note that the _source field that stores the JSON representation of a document is compressed.
- Each replica needs the same amount of disk space.
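The index size can be sketched in the same way. The example below reads "each replica needs the same amount of disk space" as: total size = primary size × (1 + number of replicas). That reading, the function name, and its parameters are assumptions for illustration, and the calculation ignores the variation caused by the number of populated fields.

```python
# Elasticsearch disk estimate: about 30% of the extracted text volume
# for the primary copy, multiplied by the number of copies kept.
# Illustrative helper only; not an Elasticsearch or Nuxeo API.


def estimate_elasticsearch_bytes(extracted_text_bytes, replicas=1, index_ratio=0.30):
    """Return the approximate cluster-wide index size in bytes."""
    if replicas < 0:
        raise ValueError("replicas must be zero or more")
    primary = extracted_text_bytes * index_ratio
    return primary * (1 + replicas)  # each replica duplicates the primary


if __name__ == "__main__":
    # Hypothetical figures: 30 GB of extracted text, one replica.
    size = estimate_elasticsearch_bytes(30 * 10**9, replicas=1)
    print(f"elasticsearch: {size / 1e9:.0f} GB")
```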
Backup
Based on the above estimates, you must reserve dedicated space to store backups, either locally or remotely. Depending on your infrastructure choices, you can use compression, streaming, hot backup, rsync...