Content Repository

Java Data Structures and Caching

Updated: July 17, 2023

This page describes low-level implementation details of the VCS backend.

Here is a list of Java objects that hold data:

  • Row: holds a single database row using a map (or a list of values for multi-valued properties).
  • Fragment: a Row with a state. The original data is kept to pinpoint the dirty fields that need to be synchronized with the database at the next save. There are two kinds of fragments: SimpleFragment, which holds a single database row, and CollectionFragment, which holds multi-valued fields. Fragments and Rows manipulate untyped data (Serializable).
  • Node: holds a map of fragments (one per table, roughly equivalent to one per schema) and it gives access to typed properties.
  • Selection: holds a list of IDs for a node. It is equivalent to a cached subset of a query for the children, versions or proxies of a document.
  • Document: the low-level document (SQLDocumentLive), holding a Node and implementing the Session SPI.
  • DocumentModel: the high-level document representation. It is synchronized with the Document at getDocument/saveDocument time and has knowledge of rights, proxies and versions.
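
The way a Fragment keeps the original data to find dirty fields can be sketched as follows. This is only an illustration under simplifying assumptions: SketchRow and SketchFragment are hypothetical names, not the actual VCS classes, and the real implementation may track state differently.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a Row: untyped column values in a map.
class SketchRow {
    final String tableName;
    final Serializable id;
    final Map<String, Serializable> values = new HashMap<>();

    SketchRow(String tableName, Serializable id) {
        this.tableName = tableName;
        this.id = id;
    }
}

// Hypothetical, simplified stand-in for a SimpleFragment: a row plus a
// snapshot of its original values, used to find what changed since loading.
class SketchFragment {
    private final SketchRow row;
    private final Map<String, Serializable> original;

    SketchFragment(SketchRow row) {
        this.row = row;
        this.original = new HashMap<>(row.values); // snapshot taken at load time
    }

    void put(String column, Serializable value) {
        row.values.put(column, value);
    }

    Serializable get(String column) {
        return row.values.get(column);
    }

    // Columns whose current value differs from the snapshot.
    Set<String> dirtyColumns() {
        Set<String> columns = new TreeSet<>(original.keySet());
        columns.addAll(row.values.keySet());
        Set<String> dirty = new TreeSet<>();
        for (String column : columns) {
            if (!Objects.equals(original.get(column), row.values.get(column))) {
                dirty.add(column);
            }
        }
        return dirty;
    }

    // After a save, the current values become the new reference snapshot.
    void markSaved() {
        original.clear();
        original.putAll(row.values);
    }
}
```

In this sketch, only the columns returned by dirtyColumns() would need to be written at the next save. Calling markSaved() afterwards resets the snapshot.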

When a session is manipulating documents, the underlying Row objects are loaded, updated and deleted using a Mapper.

When a session is saved, the Mapper sends SQL DML instructions in batches to minimize database round trips.
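
The effect of batching on round trips can be modeled with a small in-memory sketch. SketchBatchingMapper is a hypothetical class that only counts batches and does not talk to a database. With plain JDBC, comparable behavior is usually obtained with PreparedStatement.addBatch() and executeBatch(), but whether the actual Mapper uses them this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a mapper that queues writes and sends them together.
class SketchBatchingMapper {
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> sentBatches = new ArrayList<>();

    // Called while the session manipulates documents: nothing is sent yet.
    void queue(String dml) {
        pending.add(dml);
    }

    // Called when the session is saved: all queued statements go out as one
    // batch, which counts as a single round trip in this model.
    int flush() {
        if (pending.isEmpty()) {
            return 0; // nothing to send, no round trip
        }
        sentBatches.add(new ArrayList<>(pending));
        int sent = pending.size();
        pending.clear();
        return sent;
    }

    int roundTrips() {
        return sentBatches.size();
    }
}
```

Queuing three statements and then calling flush() once yields a single round trip in this model, instead of three if each statement were sent on its own.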

The main database caching is done at the Row level.

When performing an NXQL query, the result list of IDs is not cached. Only the database rows needed to represent the documents are cached.

After a commit, the session sends cache invalidations to other sessions (or to other Nuxeo instances in cluster mode). Before starting a new transaction, the session processes the invalidations it has received to update its cache.
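
The invalidation flow described above can be sketched with two in-memory sessions. SketchSession and its methods are hypothetical names. The sketch is single-threaded and ignores cluster transport, so it only illustrates the ordering: invalidations are sent at commit and applied when the next transaction begins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical session with a private row cache and an invalidation inbox.
class SketchSession {
    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> modifiedIds = new HashSet<>();
    private final Set<String> receivedInvalidations = new HashSet<>();
    private final List<SketchSession> peers = new ArrayList<>();

    // Register two sessions as peers of each other.
    void connect(SketchSession other) {
        peers.add(other);
        other.peers.add(this);
    }

    // Simulates loading a row from the database into the cache.
    void load(String id, String value) {
        cache.put(id, value);
    }

    // Simulates a write: the local cache is updated and the ID is remembered.
    void write(String id, String value) {
        cache.put(id, value);
        modifiedIds.add(id);
    }

    boolean isCached(String id) {
        return cache.containsKey(id);
    }

    // After a commit, tell every peer which IDs are now stale for them.
    void commit() {
        for (SketchSession peer : peers) {
            peer.receivedInvalidations.addAll(modifiedIds);
        }
        modifiedIds.clear();
    }

    // Before a new transaction, drop the cached rows that peers invalidated.
    void beginTransaction() {
        cache.keySet().removeAll(receivedInvalidations);
        receivedInvalidations.clear();
    }
}
```

In this sketch a peer keeps serving its cached row until it calls beginTransaction(), which matches the ordering described above: invalidations are queued at commit time and only processed at the next transaction start.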

The default cache implementation uses one cache per session. The cache is implemented as a map of Java SoftReference values, which means that cached values can be garbage collected under memory pressure. The effective cache size therefore depends on the size of the JVM heap and on memory pressure.
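
A map of soft references can be sketched as below. SketchSoftCache is a hypothetical, minimal class meant to show the general technique with java.lang.ref.SoftReference. It is not the actual cache class, and it is not thread-safe.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal cache whose values may be reclaimed by the GC.
class SketchSoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns null if the key is unknown or if the value was collected.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was garbage collected: drop the stale entry.
            map.remove(key, ref);
        }
        return value;
    }

    void invalidate(K key) {
        map.remove(key);
    }

    int entryCount() {
        return map.size();
    }
}
```

Because the JVM clears soft references only when memory is needed, such a cache has no fixed capacity. A null result from get() has to be treated as a cache miss and the row reloaded from the database.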

The cache implementation is pluggable, so it is possible to try other strategies, such as a common cache shared by all sessions. Another available implementation, based on Ehcache, is the UnifiedCachingMapper.

Selections (lists of children, proxies or versions) are also cached using SoftReference at the session level.

Both the Row and Selection caches expose metrics, so it is possible to get the cache hit ratio.
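
As an illustration of how a hit ratio is derived, the sketch below counts hits and misses around a simple map and computes hits / (hits + misses). SketchMeteredCache is a hypothetical class. The way the real caches register and publish their metrics is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache wrapper that counts hits and misses.
class SketchMeteredCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private long hits;
    private long misses;

    // Returns the cached value, or loads it (a miss) and caches the result.
    V get(K key, Function<K, V> loader) {
        V value = cache.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

For example, three lookups of the same ID followed by one lookup of another ID give two hits and two misses, so the hit ratio is 0.5.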

 

Related pages in this documentation

VCS Performance Recommendations