Nuxeo can be clustered between several nodes (a.k.a. instances or machines) with the appropriate configuration. In addition, an HTTP load balancer with session affinity must be used in front of the nodes.
Nuxeo Reference Architecture.
To enable clustering, you must have at least two nodes with:
- A shared database
- A shared filesystem (unless you use an external binary store like S3)
- A dedicated Elasticsearch cluster, if using Elasticsearch
- A Kafka cluster or a Redis server
- A load balancer with sticky sessions
You can head to the Nuxeo cluster architecture introduction for more information on these components.
The shared filesystem is usually an NFS mount. You must not share the whole Nuxeo installation tree (see below).
To set up clustering, please update the following parameters in
true to enable clustering.
nuxeo.cluster.nodeid: must be set to a cluster node id. The id should be a string specific to this instance (and therefore all instances must have different cluster node ids).
The complete Nuxeo instance hierarchy must not be shared between instances. However, the following items must be shared.
All the binary stores must be shared by all Nuxeo instances in order for the document repository and transient stores to function correctly.
In addition to the default repository binary store used for documents, Nuxeo uses dynamically-named binary stores for the various transient stores it needs. These dynamic binary stores are created as siblings of the default one.
$NUXEO/nxserver/data/binaries and therefore by default Nuxeo would create:
However the rest of the
$NUXEO/nxserver/data directory must not be shared by several Nuxeo instances, as it contains instance-specific data.
Therefore in a cluster setting you should point
repository.binary.store to a folder like
/var/lib/nuxeo/binaries/binaries and mount/share
/var/lib/nuxeo/binaries. This way the default binary store and all the dynamic binary stores will be created under the mount point:
You can of course use a different path than
The above does not apply if binaries are stored in a network-based location, like S3.
The temporary directory configured through
nuxeo.tmp.dir must not be shared by all instances, because there are still a few name collision issues that may occur, especially during startup.
However, in order for various no-copy optimizations to be effective, the temporary directory should be on the same filesystem as the binaries directory. To do this, the recommended way is to have each instance's
nuxeo.tmp.dir point to a different subdirectory of the shared filesystem.
Using the above suggestions for the binaries directory, you could set
/var/lib/nuxeo/binaries/tmp/node1 for example, for a node with id
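The layout described above can be summarized in a short Python sketch. It is only an illustration, not a Nuxeo tool: the helper names `node_settings` and `validate_cluster` are hypothetical, and the paths are the example paths used in this section. It builds the per-node values and checks the three rules stated above: node ids differ, the binary store is the same everywhere, and temporary directories differ.

```python
from collections import Counter

# Shared mount point taken from the example in this section.
SHARED_MOUNT = "/var/lib/nuxeo/binaries"


def node_settings(node_id, shared_mount=SHARED_MOUNT):
    """Return the suggested nuxeo.conf values for one cluster node (hypothetical helper)."""
    return {
        "nuxeo.cluster.nodeid": node_id,
        # Same value on every node: the default binary store under the shared mount.
        "repository.binary.store": f"{shared_mount}/binaries",
        # Different on every node, but on the same filesystem as the binaries.
        "nuxeo.tmp.dir": f"{shared_mount}/tmp/{node_id}",
    }


def validate_cluster(configs):
    """Return a list of problems found across the nodes' settings (empty if none)."""
    problems = []
    # These keys must be unique per node.
    for key in ("nuxeo.cluster.nodeid", "nuxeo.tmp.dir"):
        counts = Counter(config[key] for config in configs)
        for value, count in counts.items():
            if count > 1:
                problems.append(f"{key}={value} is used by {count} nodes")
    # This key must be identical on all nodes.
    stores = {config["repository.binary.store"] for config in configs}
    if len(stores) > 1:
        problems.append("repository.binary.store differs between nodes")
    return problems


configs = [node_settings("node1"), node_settings("node2")]
for line in sorted(f"{key}={value}" for key, value in configs[0].items()):
    print(line)
print(validate_cluster(configs))
```

Running the same check against two nodes that were accidentally given the same id reports both the duplicate node id and the duplicate temporary directory.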
A clustered Nuxeo environment should be configured to use Quartz scheduling. The Quartz scheduling component allows nodes to coordinate scheduled tasks between themselves: a single task is routed to a single node and executed only on that node. This ensures that scheduled events, like periodic cleanups or periodic imports, are executed on one node only and not on all nodes at the same time.
For DBS (MongoDB) everything is done automatically; you don't have to use any specific configuration or template. You can skip the rest of this section.
For VCS (SQL databases) the standard configuration is available from Nuxeo templates; each node in the cluster should be configured to include the relevant
- Populate the database with the tables needed by Quartz (names
QRTZ_*). The DDL scripts come from the standard Quartz distribution and are available in the Nuxeo templates in
- Enable the Quartz-specific cluster templates by adding the template
Any instance using a clustered Quartz configuration tries to get a lock on the next scheduled job execution. Those locks are managed and shared through the database. The time must be synchronized on all instances. You should use NTP for that.
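To make the idea of a database-backed lock concrete, here is a toy Python sketch using an in-memory SQLite database. It is not how Quartz is implemented and does not use the `QRTZ_*` schema: the `job_locks` table and the `try_acquire` function are hypothetical. It only shows the principle that, when several nodes compete for the same scheduled execution, a uniqueness constraint in the shared database lets exactly one of them win.

```python
import sqlite3


def create_lock_table(conn):
    """Create the hypothetical lock table: one row per (job, fire time)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_locks ("
        " job_name TEXT NOT NULL,"
        " fire_time INTEGER NOT NULL,"
        " owner TEXT NOT NULL,"
        " PRIMARY KEY (job_name, fire_time))"
    )


def try_acquire(conn, job_name, fire_time, node_id):
    """Return True if this node wins the lock for one scheduled execution."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_locks (job_name, fire_time, owner) VALUES (?, ?, ?)",
                (job_name, fire_time, node_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already inserted the row for this execution.
        return False


conn = sqlite3.connect(":memory:")
create_lock_table(conn)
fire_time = 1_700_000_000  # next scheduled execution, as a Unix timestamp
winners = [
    node for node in ("node1", "node2", "node3")
    if try_acquire(conn, "cleanup", fire_time, node)
]
print(winners)
```

The sketch also hints at why clocks matter: the nodes only compete for the same row if they agree on the fire time, which is one reason to keep them synchronized with NTP.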
While performing a rolling upgrade on Nuxeo servers, the lock may be swapped between the instances, in which case you may encounter a warning on startup:
This scheduler instance (host-nuxeo.example.com1478524375548) is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
This message is not a problem if the NTP configuration is fine.
We advise using a session affinity mechanism: when a user is connected to a node, they should always be redirected to that node.
There are several reasons why we advise this configuration, described below.
The Nuxeo Cluster system takes care of propagating invalidations between all nodes of the cluster.
However, for performance reasons, there is a small delay by default: this means that without affinity, one call could create a document and a second call, routed to another node, might not see it yet. This state is transient and resolves after a few milliseconds, but in the context of a "multi-page transaction" it could be an issue.
Session affinity solves the visible issues. If the session affinity cannot be restored, for example because the target server has been shut down, this won't be an issue in 99.99% of cases.
The Nuxeo Platform requires all calls to be authenticated. Depending on your architecture, authentication can be stateless (e.g., Basic Auth) or stateful (e.g., Form + Cookie). Either way, you probably don't want to replay authentication on every call.
That's why session-based authentication combined with session affinity can make sense: you don't have to re-authenticate each time you call the server.
If the session affinity cannot be restored, for example because the target server has been shut down:
- stateless authentication (e.g., Basic Auth) will be replayed automatically
- for stateful authentication:
  - if you have SSO, this will be transparent
  - if you don't have SSO, the user will have to authenticate again.
The UI can be stateful or stateless:
- The default Web UI is stateless,
- JSF is stateful.
If the UI layer you use is stateful, you have to use stateful load balancing for session affinity.
Set up an HTTP or AJP load balancer such as Apache with
mod_proxy_ajp or Pound, and configure it to keep session affinity by tracking the value of the
JSESSIONID cookie and the
;jsessionid URL parameter.
If you use a stateless load balancer such as Apache with modules such as
mod_proxy_balancer, you need to make the HTTP server generate
JSESSIONID cookies with values that end with
.nxworker_n_, where
nxworker_n_ is a string suffix specific to each node (you can use any string).
- In nuxeo.conf, specify a different
nuxeo.server.jvmRoute for each node, for instance
nuxeo.server.jvmRoute=nxworker1. This will instruct the Nuxeo preprocessing phase to correctly fill the
jvmRoute attribute of the
Engine element in the generated
- Configure your stateless balancer to follow these routes. For instance, here is the relevant configuration fragment when using
    ProxyPass /nuxeo balancer://sticky-balancer stickysession=JSESSIONID|jsessionid nofailover=On
    <Proxy balancer://sticky-balancer>
      BalancerMember http://192.168.2.101:8080/nuxeo route=nxworker1
      BalancerMember http://192.168.2.102:8080/nuxeo route=nxworker2
    </Proxy>
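The routing idea behind this configuration can be illustrated with a short Python sketch. It is a simplified model, not the actual `mod_proxy_balancer` algorithm: the functions `route_of` and `pick_member` are hypothetical, and the member addresses are the ones from the fragment above. The route is read from the part of the session id that follows the last dot, and that route selects the backend node.

```python
# Routes and backend URLs copied from the balancer fragment above.
MEMBERS = {
    "nxworker1": "http://192.168.2.101:8080/nuxeo",
    "nxworker2": "http://192.168.2.102:8080/nuxeo",
}


def route_of(session_id):
    """Return the route suffix after the last dot, or None if there is none."""
    _, dot, suffix = session_id.rpartition(".")
    return suffix if dot and suffix else None


def pick_member(session_id, members, fallback):
    """Return the backend for a session id; use fallback when no route matches.

    The fallback stands for whatever the balancer does with a request that
    carries no usable route, for example a brand new session.
    """
    route = route_of(session_id) if session_id else None
    return members.get(route, fallback)


print(pick_member("0123ABCD.nxworker2", MEMBERS, MEMBERS["nxworker1"]))
print(pick_member(None, MEMBERS, MEMBERS["nxworker1"]))
```

This is why each node needs its own `nuxeo.server.jvmRoute`: if two nodes produced the same suffix, the balancer could not tell their sessions apart.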
To enable automatic eviction of unhealthy instances on your balancer, you may need a health check.
The following ensures Nuxeo runtime is initialized and up:
To test that the load balancer forwards the HTTP requests of a given session to the same node:
Add a new file on each node (after Tomcat has started),
On the first node:
and on the second node:
Using a browser with an active Nuxeo session (an already logged-in user), go to
http://yourloadbalancer/nuxeo/clusterinfo.html and check that you always return to the same node when hitting the refresh button of the browser.
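The same check can be scripted instead of refreshing the browser by hand. The Python sketch below is only an illustration under a few assumptions: each node serves a different body for the test page, you already have the `JSESSIONID` value of an authenticated session, and the helper names `http_fetch` and `check_affinity` are hypothetical. The demo at the end uses fake fetchers so that it runs without a cluster.

```python
import itertools
import urllib.request


def http_fetch(url, session_cookie, timeout=5.0):
    """Fetch url with the given JSESSIONID cookie and return the body as text."""
    request = urllib.request.Request(
        url, headers={"Cookie": f"JSESSIONID={session_cookie}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_affinity(fetch, attempts=10):
    """Call fetch() repeatedly; return (all answers identical?, distinct answers)."""
    seen = {fetch().strip() for _ in range(attempts)}
    return len(seen) == 1, sorted(seen)


# Fake fetchers standing in for a sticky and a non-sticky balancer.
def sticky():
    return "node one\n"


round_robin = itertools.cycle(["node one", "node two"]).__next__

print(check_affinity(sticky))
print(check_affinity(round_robin))
```

Against a real cluster, `fetch` would be something like `lambda: http_fetch("http://yourloadbalancer/nuxeo/clusterinfo.html", cookie)`, where `cookie` is the session id copied from the logged-in browser.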