
Garbage-Collecting Orphaned Blobs

Updated: September 18, 2023

Immediate Garbage Collection

Since LTS 2021-HF35 (see NXP-31594), the Nuxeo Platform deletes orphaned blobs, that is, blobs no longer referenced by any document, whenever:

  • a document is removed
  • a document blob property is edited
  • a document blob property is dispatched to another blob provider
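The rule behind these three triggers can be illustrated with a small Python sketch. It is a simplified, hypothetical in-memory model, not the Nuxeo implementation: each document is reduced to the set of blob keys it references (playing the role of the ecm:blobKeys field described below), the blob store is a plain set of keys, and a dispatch to another blob provider is modeled as the document's key changing. A key dropped by an operation is deleted only if no remaining document still references it.

```python
from typing import Dict, Set

# Hypothetical model: document id -> blob keys it references (an ecm:blobKeys stand-in).
Repository = Dict[str, Set[str]]


def collect_orphans(repo: Repository, candidate_keys: Set[str], blob_store: Set[str]) -> Set[str]:
    """Delete every candidate key that no remaining document references."""
    still_referenced: Set[str] = set()
    for keys in repo.values():
        still_referenced |= keys
    orphans = {key for key in candidate_keys if key not in still_referenced}
    blob_store -= orphans  # removes the orphaned blobs from the store in place
    return orphans


def remove_document(repo: Repository, doc_id: str, blob_store: Set[str]) -> Set[str]:
    """Trigger: a document is removed, so all of its keys become candidates."""
    dropped = repo.pop(doc_id, set())
    return collect_orphans(repo, dropped, blob_store)


def edit_blob_keys(repo: Repository, doc_id: str, new_keys: Set[str], blob_store: Set[str]) -> Set[str]:
    """Trigger: a blob property is edited (or dispatched, modeled as a key change).

    The keys the document no longer references become candidates.
    """
    dropped = repo[doc_id] - new_keys
    repo[doc_id] = set(new_keys)
    blob_store |= new_keys  # the new blobs are written to the store
    return collect_orphans(repo, dropped, blob_store)
```

For example, with repo = {"doc1": {"k1", "shared"}, "doc2": {"shared"}}, removing doc1 deletes only k1: the shared blob survives until its last reference disappears.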

Preconditions

Only deployments using the MongoDB backend can benefit from this feature. The following conditions must also be met:

  • repositories must have the queryBlobKeys capability
  • repositories must use LocalBlobProvider or S3BlobProvider

Repository with queryBlobKeys capability

This new GC implementation only works for repositories that have the queryBlobKeys capability.

Since LTS 2021-HF02 and NXP-29516, the blob keys referenced by a document are stored in its ecm:blobKeys field.

If ALL documents of a repository have this field computed, the repository has the queryBlobKeys capability. Consequently, a repository containing documents created by a Nuxeo server older than LTS 2021.2 / LTS 2021-HF02 does NOT have this capability.

You can query the capability endpoint to check whether a repository has the queryBlobKeys capability.

In a multi-repository configuration, all repositories must have this capability.
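A capability check across all repositories might look like the following Python sketch. It assumes the endpoint is served at /api/v1/capabilities under your Nuxeo base URL and that its JSON response contains a repository object keyed by repository name, each entry carrying a boolean queryBlobKeys flag; both the path and the response shape are assumptions to verify against your server. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List


def fetch_capabilities(base_url: str, user: str, password: str) -> Dict[str, Any]:
    """GET <base_url>/api/v1/capabilities with HTTP basic auth (assumed path)."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/capabilities")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def repositories_missing_blob_keys(capabilities: Dict[str, Any]) -> List[str]:
    """Names of repositories that do not report queryBlobKeys=true (assumed shape)."""
    repos = capabilities.get("repository", {})
    return sorted(name for name, caps in repos.items() if not caps.get("queryBlobKeys", False))


def gc_capability_ok(capabilities: Dict[str, Any]) -> bool:
    """True only if at least one repository is listed and every one has the capability."""
    repos = capabilities.get("repository", {})
    return bool(repos) and not repositories_missing_blob_keys(capabilities)
```

Requiring every listed repository to pass mirrors the multi-repository rule: a single repository without the capability is enough to rule out this GC implementation.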

Supported Blob Provider implementations

This GC implementation only works with Blob Providers extending BlobStoreBlobProvider, which are:

  • LocalBlobProvider
  • S3BlobProvider

2021 Default Blob Provider
In LTS 2021, the default Blob Provider is DefaultBinaryManager. This GC implementation is therefore not supported out of the box in LTS 2021. You can switch to LocalBlobProvider to benefit from this GC implementation with this configuration property:

nuxeo.core.binarymanager=org.nuxeo.ecm.core.blob.LocalBlobProvider

See NXP-31876.

Disablement

Immediate Garbage Collection is enabled by default. You can disable it with the following configuration property:

nuxeo.bulk.action.blobGC.enabled=false

Full Garbage Collection

Since LTS 2021-HF38 (see NXP-28565), a new Full GC implementation is available. It scans your blob store to detect and delete the blobs that are no longer referenced in your repository.
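Conceptually, the Full GC is a scan-and-sweep over the blob store. The Python sketch below is a hypothetical, single-process illustration, not the Bulk Action Framework implementation: it walks every key in the store, asks an is_referenced callback whether any document still references the key, and deletes the blob otherwise. In a real repository the callback stands for a lookup against the documents' blob keys; the referenced_keys helper builds that answer in memory, which only makes sense for small examples.

```python
from typing import Callable, Dict, Iterable, Set


def referenced_keys(documents: Iterable[Set[str]]) -> Set[str]:
    """Union of the blob keys referenced by all documents (an ecm:blobKeys stand-in)."""
    result: Set[str] = set()
    for keys in documents:
        result |= keys
    return result


def full_gc(
    store_keys: Iterable[str],
    is_referenced: Callable[[str], bool],
    delete_blob: Callable[[str], None],
) -> Dict[str, int]:
    """Scan every key of the blob store and delete the ones no document references."""
    stats = {"scanned": 0, "deleted": 0}
    # Snapshot the keys first so deleting from the store cannot disturb the iteration.
    for key in list(store_keys):
        stats["scanned"] += 1
        if not is_referenced(key):
            delete_blob(key)
            stats["deleted"] += 1
    return stats
```

Keeping the reference check behind a callback means each key is judged independently, so the scan can be split into chunks processed in parallel, which is the kind of work a bulk action can distribute across partitions.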

This Full GC leverages the Bulk Action Framework. As with other bulk actions, you can tune the following configuration properties to fit your environment:

nuxeo.bulk.action.garbageCollectOrphanBlobs.defaultConcurrency=2
nuxeo.bulk.action.garbageCollectOrphanBlobs.defaultPartitions=4

See the dedicated Blobs Management REST endpoint to invoke and monitor a blob Full GC.

Although this implementation is designed to scale to large volumes of data, fully garbage collecting a repository that references a large number of blobs will necessarily take some time.

It is recommended to first run an orphaned versions GC, which deletes orphaned versions and therefore the blob references they hold. This allows the orphaned blob Full GC to free more space. Use the Versions Management REST endpoint to do so.
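The recommended order can be scripted against the management REST API. The Python sketch below assumes, and you should verify for your version, that DELETE /api/v1/management/versions/orphaned and DELETE /api/v1/management/blobs/orphaned each schedule a bulk command and return a status with commandId and state fields, and that GET /api/v1/management/bulk/{commandId} returns the current status of that command. The terminal state names and all function names are assumptions. The HTTP call is injectable so the polling logic can be exercised without a server; the real calls need an account with administrator privileges.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# Assumed management endpoints and bulk states; verify against your Nuxeo version.
VERSIONS_GC = "/api/v1/management/versions/orphaned"
BLOBS_GC = "/api/v1/management/blobs/orphaned"
BULK_STATUS = "/api/v1/management/bulk/{command_id}"
TERMINAL_STATES = {"COMPLETED", "ABORTED", "UNKNOWN"}

Http = Callable[[str, str], Dict[str, Any]]  # (method, path) -> decoded JSON body


def make_http(base_url: str, user: str, password: str) -> Http:
    """Build a basic-auth JSON caller bound to a Nuxeo base URL."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(method: str, path: str) -> Dict[str, Any]:
        req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
        req.add_header("Authorization", f"Basic {token}")
        req.add_header("Accept", "application/json")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return call


def run_and_wait(http: Http, path: str, poll_seconds: float = 5.0,
                 max_polls: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Schedule a management bulk action, then poll its status until it ends."""
    status = http("DELETE", path)
    command_id = status["commandId"]
    polls = 0
    while status.get("state") not in TERMINAL_STATES:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"bulk command {command_id} still {status.get('state')}")
        sleep(poll_seconds)
        polls += 1
        status = http("GET", BULK_STATUS.format(command_id=command_id))
    return status


def gc_versions_then_blobs(http: Http, **kwargs: Any) -> Dict[str, Any]:
    """Run the orphaned versions GC to completion, then the blob Full GC."""
    versions = run_and_wait(http, VERSIONS_GC, **kwargs)
    if versions.get("state") != "COMPLETED":
        raise RuntimeError(f"orphaned versions GC ended in state {versions.get('state')}")
    return run_and_wait(http, BLOBS_GC, **kwargs)
```

For example, gc_versions_then_blobs(make_http("https://nuxeo.example.com/nuxeo", user, password)) (hypothetical URL) waits for the orphaned versions GC to finish before starting the blob Full GC, so the blobs released by the deleted versions can be collected in the same run.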