Immediate Garbage Collection
Since LTS 2021-HF35 (see NXP-31594), the Nuxeo Platform immediately deletes orphaned blobs whenever:
- a document is removed
- a document blob property is edited
- a document blob property is dispatched to another blob provider
Preconditions
Only deployments using the MongoDB backend can benefit from this feature. The following conditions must also be met:
- repositories must have the queryBlobKeys capability
- repositories must use a blob provider extending BlobStoreBlobProvider, such as S3BlobProvider, LocalBlobProvider, AzureBlobProvider, GoogleStorageBlobProvider or GridFSBlobProvider
Repository with queryBlobKeys capability
This new GC implementation only works for repositories having the queryBlobKeys capability.
Since LTS 2021-HF02 (see NXP-29516), the blob keys referenced by a document are stored in its ecm:blobKeys field.
A repository has the queryBlobKeys capability only if ALL of its documents have this field computed. Consequently, a repository containing documents created by a Nuxeo server older than LTS 2021.2 / LTS 2021-HF02 does NOT have this capability.
You can query the capabilities endpoint to check whether a repository has the queryBlobKeys capability.
In a multi-repository configuration, all repositories must have this capability.
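The multi-repository check can be scripted. The following is a minimal sketch, assuming bash with curl and grep, and assuming the capabilities response keeps the `"queryBlobKeys" : true/false` key/value layout shown in the sample responses on this page. The function names and the NUXEO_URL / NUXEO_CREDENTIALS variables are hypothetical, not part of Nuxeo. The check succeeds only when at least one repository reports the capability and none reports it as false.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the capabilities JSON on stdin and succeeds only
# if every repository listed reports "queryBlobKeys" : true.
# Text matching with grep is an assumption; a real JSON parser is more robust.
all_repos_have_query_blob_keys() {
  local json
  json=$(cat)
  # At least one repository must report the capability...
  printf '%s\n' "$json" | grep -Eq '"queryBlobKeys"[[:space:]]*:[[:space:]]*true' || return 1
  # ...and no repository may report it as false.
  if printf '%s\n' "$json" | grep -Eq '"queryBlobKeys"[[:space:]]*:[[:space:]]*false'; then
    return 1
  fi
  return 0
}

# Hypothetical wrapper around the capabilities endpoint used on this page.
# NUXEO_URL and NUXEO_CREDENTIALS are assumed variables with local defaults.
check_query_blob_keys_capability() {
  local url=${NUXEO_URL:-http://localhost:8080/nuxeo}
  local creds=${NUXEO_CREDENTIALS:-Administrator:Administrator}
  # An unreachable server yields empty input, so the check fails.
  curl -s -u "$creds" "$url/api/v1/capabilities" | all_repos_have_query_blob_keys
}
```

For example, `check_query_blob_keys_capability && echo "all repositories have queryBlobKeys"` prints the message only when the immediate GC precondition holds for every repository.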
Acquire the queryBlobKeys Capability
If your repository lacks the queryBlobKeys capability, the blob-keys-migration migration, available since LTS 2023-HF03, can be used to acquire it. This migration goes through all documents of the repository whose ecm:blobKeys field is NULL and populates that field. Depending on the volume of your data, this may be a long-running process.
Check Repository Does Not Have the Capability
curl -u Administrator:Administrator http://localhost:8080/nuxeo/api/v1/capabilities
{
"cluster" : {
"enabled" : false,
"nodeId" : "1231481178147993942"
},
"entity-type" : "capabilities",
"repository" : {
"default" : {
"queryBlobKeys" : false
}
},
"server" : {
"distributionName" : "lts",
"distributionServer" : "tomcat",
"distributionVersion" : "2021.39.3"
}
}
Observe that repository.default.queryBlobKeys equals false.
Run the blob-keys-migration Migration
curl -X POST -u Administrator:Administrator http://localhost:8080/nuxeo/api/v1/management/migration/blob-keys-migration/run
No output is expected.
Wait for the Migration
Wait until the migration is over (running is false) and its state is populated:
curl -u Administrator:Administrator http://localhost:8080/nuxeo/api/v1/management/migration/blob-keys-migration
{
"description" : "Populate ecm:blobKeys property",
"descriptionLabel" : "migration.dbs.blob.keys",
"entity-type" : "migration",
"id" : "blob-keys-migration",
"status" : {
"pingTime" : 0,
"progressMessage" : null,
"progressNum" : 0,
"progressTotal" : 0,
"running" : false,
"startTime" : 0,
"state" : "populated",
"step" : null
},
"steps" : []
}
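Rather than re-running the status request by hand, the wait can be automated with a polling loop. This is a minimal sketch, assuming bash with curl and grep and the status JSON layout shown above; the function names, the NUXEO_URL / NUXEO_CREDENTIALS variables and the default interval and attempt limit are hypothetical choices, not Nuxeo settings. It stops only when running is false AND state is populated, the completion condition described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the migration status JSON on stdin and succeeds
# when the migration is over ("running" : false) and "state" is "populated".
blob_keys_migration_done() {
  local json
  json=$(cat)
  printf '%s\n' "$json" | grep -Eq '"running"[[:space:]]*:[[:space:]]*false' &&
    printf '%s\n' "$json" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"populated"'
}

# Hypothetical poller for the migration status endpoint used on this page.
# $1 = seconds between polls (assumed default 10), $2 = max polls (assumed 360).
wait_for_blob_keys_migration() {
  local url=${NUXEO_URL:-http://localhost:8080/nuxeo}
  local creds=${NUXEO_CREDENTIALS:-Administrator:Administrator}
  local interval=${1:-10}
  local max_attempts=${2:-360}
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if curl -s -u "$creds" "$url/api/v1/management/migration/blob-keys-migration" |
      blob_keys_migration_done; then
      echo "blob-keys-migration is over and populated"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "blob-keys-migration not populated after $max_attempts polls; check the server logs" >&2
  return 1
}
```

For example, after the POST request that runs the migration, `wait_for_blob_keys_migration 30` polls every 30 seconds. Because a stopped run whose state is not populated is not treated as success, a failed migration surfaces as a timeout, after which the server logs should be analyzed.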
Check Repository Now Has the Capability
curl -u Administrator:Administrator http://localhost:8080/nuxeo/api/v1/capabilities
{
"cluster" : {
"enabled" : false,
"nodeId" : "1231481178147993942"
},
"entity-type" : "capabilities",
"repository" : {
"default" : {
"queryBlobKeys" : true
}
},
"server" : {
"distributionName" : "lts",
"distributionServer" : "tomcat",
"distributionVersion" : "2021.39.3"
}
}
Observe that repository.default.queryBlobKeys equals true.
If a failure occurs during the migration, the migration stops and the capability is not acquired; identify and fix the cause by analyzing the server logs. On servers running LTS 2021-HF02 or later, the ecm:blobKeys field is computed whenever a document is created or updated, even while the migration is running.
Supported Blob Provider implementations
This GC implementation only works with blob providers extending BlobStoreBlobProvider, such as:
- S3BlobProvider (when using amazon-s3-online-storage)
- LocalBlobProvider, configured with the following property:
nuxeo.core.binarymanager=org.nuxeo.ecm.core.blob.LocalBlobProvider
See NXP-31876.
Disablement
Immediate Garbage Collection is enabled by default. You can disable it with the following configuration property:
nuxeo.bulk.action.blobGC.enabled=false
Full Garbage Collection
Since LTS 2021-HF38 (see NXP-28565), a new Full GC implementation is available to scan your blob store in order to detect and delete the blobs that are no longer referenced in your repository.
This Full GC leverages the Bulk Action Framework. Like other bulk actions, the following configuration properties can be tweaked to fit your environment:
nuxeo.bulk.action.garbageCollectOrphanBlobs.defaultConcurrency=2
nuxeo.bulk.action.garbageCollectOrphanBlobs.defaultPartitions=4
See the dedicated Blobs Management REST endpoint to invoke and monitor a blob Full GC.