Repository Configuration

VCS Configuration

VCS (Visible Content Store) is the default storage engine for Nuxeo documents.

The following are the options available to configure VCS repository in Nuxeo Platform. They usually go in a file named default-repository-config.xml.

In a standard Nuxeo this file is generated from a template, and many elements or attributes actually take their values from parameters in nuxeo.conf.

Example File

This file is for illustration and contains many more options than are necessary by default.

<?xml version="1.0"?>
<component name="default-repository-config">
  <extension target="org.nuxeo.ecm.core.storage.sql.RepositoryService" point="repository">
    <repository name="default" label="label.default.repository">
      <pool minPoolSize="0" maxPoolSize="20"
        blockingTimeoutMillis="100"
        idleTimeoutMinutes="10"
        activeTimeoutMinutes="5" />
      <clustering id="12345" delay="1000" enabled="true" />
      <idType>varchar</idType>
      <childNameUniqueConstraintEnabled>true</childNameUniqueConstraintEnabled>
      <schema>
        <field type="largetext">note</field>
      </schema>
      <indexing>
        <includedTypes>
          <type>File</type>
          <type>Note</type>
        </includedTypes>
        <!-- sample for excluded types -->
        <!--
        <excludedTypes>
          <type>Root</type>
          <type>Workspace</type>
        </excludedTypes>
        -->
        <fulltext analyzer="english"> <!-- PostgreSQL -->
          <index name="default">
            <!-- all props implied -->
          </index>
          <index name="title">
            <field>dc:title</field>
          </index>
          <index name="description">
            <field>dc:description</field>
            <excludeField>content/data</excludeField>
          </index>
        </fulltext>
        <queryMaker class="org.nuxeo.ecm.core.storage.sql.NXQLQueryMaker" />
        <queryMaker class="org.nuxeo.ecm.core.chemistry.impl.CMISQLQueryMaker" />
      </indexing>
      <usersSeparator key="," />
      <aclOptimizations enabled="true" readAclMaxSize="4096"/>
      <pathOptimizations enabled="true"/>
      <ddlMode>execute</ddlMode>
      <noDDL>false</noDDL> <!-- deprecated -->
      <sqlInitFile>myconf.sql.txt</sqlInitFile>
    </repository>
  </extension>
</component>

Pooling Options

<pool minPoolSize="0" maxPoolSize="20"
  blockingTimeoutMillis="100"
  idleTimeoutMinutes="10"
  activeTimeoutMinutes="5" />

minPoolSize: the minimum pool size (default is 0) (see nuxeo.vcs.min-pool-size in nuxeo.conf).
maxPoolSize: the maximum pool size, above which connections will be refused (default is 20) (see nuxeo.vcs.max-pool-size in nuxeo.conf).
blockingTimeoutMillis: the maximum time (in milliseconds) the pool will wait for a new connection to be available before deciding that it cannot answer a connection request (pool saturated).
idleTimeoutMinutes: the time (in minutes) after which an unused pool connection will be destroyed.
activeTimeoutMinutes: the time (in minutes) after which a connection is killed even if it is still active. This is used to avoid connection leaks. This should always be set higher than the Nuxeo transaction timeout.

Clustering Options

<clustering id="12345" enabled="true" delay="1000" />

id: the cluster node id, which must be unique among all cluster nodes connected to the same database.
enabled: use true to activate Nuxeo clustering (default is false, i.e., no clustering) (see repository.clustering.enabled in nuxeo.conf).
delay: a configurable delay in milliseconds between two checks at the start of each transaction, to know if there are any remote invalidations (see repository.clustering.delay in nuxeo.conf).

Column Types

Large Text / CLOB Columns

If you need to specify a length on a Nuxeo field and cannot change the XSD of the document type, you should use the following code in the repository configuration:

<schema>
  <field type="largetext">note</field>
  <field type="largetext">my:field</field>
  ...
</schema>

field type="largetext": a field that should be stored as a CLOB column inside the database instead of a standard VARCHAR column.

However note that for your own schemas, you should specify length constraints on text fields using restrictions in the XSD of your document type.

If you want the text field to a specific length limit:

  <xs:simpleType name="longString">
    <xs:restriction base="xs:string">
      <xs:maxLength value="65536" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="text" type="mail:longString"/>

If you want the text field to have no length limit:

  <xs:simpleType name="clob">
    <xs:restriction base="xs:string">
      <xs:maxLength value="999999999" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="note" type="nxs:clob"/>

This is important for your large text fields, especially for MySQL, Oracle and SQL Server which have very small defaults for standard text fields.

Using Oracle, if you attempt to save a string too big for the standard NVARCHAR2(2000) field, you will get the error:

java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column

Id Column Type

In standard Nuxeo the document id is a UUID stored as a string, for instance 9ea9a461-e131-4127-9a57-08b5b9b80ecb.

It's possible on select databases to use a more efficient id representation:

 <idType>varchar</idType>

The following values for idType are possible:

varchar: a varchar-based UUID (the default),
uuid: a native uuid (only on PostgreSQL (NXP-4803)),
sequence: a sequence-based integer on:
- PostgreSQL NXP-10894
- Oracle NXP-13800
- SQL Server 2012 (not Azure) NXP-10912.
  Instead of just sequence you can also use sequence:your_sequence_name if you want to use another sequence than the default one (hierarchy_seq).

When using a sequence, the document ids will be simple incremental small integers instead of randomly-generated UUIDs.

Note that switching this option to a new value will require a full dump, manual conversion and restore of your database, so it should be specified before starting Nuxeo for the first time.

Child Name Unique Constraint

Low-level database constraints that disallow multiple children with the same name can be enabled:

<childNameUniqueConstraintEnabled>true</childNameUniqueConstraintEnabled>

This is available and enabled by default on PostgreSQL since Nuxeo 9.2 (NXP-22421).

Indexing Options

Configuring Which Types Will Be Indexed

It's possible to configure the document types you want to index or you want to exclude from full-text indexing. This is possible using the sub-elements includedTypes and excludedTypes of the indexing element:

<includedTypes>
  <type>File</type>
  <type>Note</type>
</includedTypes>

<excludedTypes>
  <type>Root</type>
  <type>Workspace</type>
</excludedTypes>

If you set both included and excluded types, only the included types configuration will be taken into account.

Full-Text

<fulltext disabled="false" searchDisabled="true" fieldSizeLimit="1000" analyzer="english" catalog="...">
  ...
</fulltext>

disabled:
- The default is false, i.e., fulltext enabled.
- Use true to disable full-text support.

searchDisabled:
- The default is false.
- Use true to disable VCS full-text search (based on database backend), the full-text extraction is done and searchable using Elasticsearch.

fieldSizeLimit: used to specify the maximum size of the full-text that is extracted and indexed.
- The default is 131072, i.e., 128 KB.
- Use 0 to remove any limit (but this will penalize Elasticsearch indexing).

analyzer: a full-text analyzer, the content of this attribute depends on the backend used:
- H2: a Lucene analyzer, for instance org.apache.lucene.analysis.fr.FrenchAnalyzer. The default is an English analyzer.
- PostgreSQL: a Text Search configuration, for instance french. The default is english. See the PostgreSQL documentation
- Oracle: an Oracle PARAMETERS for full-text, as defined by the Oracle Text Indexing Elements (see NXP-4035 for details).
- Microsoft SQL Server: a full-text LANGUAGE, for instance english, as defined in the Microsoft Documentation. The default is english.
- other backends don't have configurable full-text analyzers.

catalog: a full-text catalog, the content of this attribute depends on the backend used:
- Microsoft SQL Server: a full-text CATALOG. The default is nuxeo.
- other backends don't need a catalog. Full-text indexes are queried in NXQL through the ecm:fulltext pseudo-field. A non-default index "foo" can be queried using ecm:fulltext_foo.

If no <index> elements are present, then a default index with all string and blob fields is used.

<fulltext ...>
  <index name="title" analyzer="..." catalog="...">
    <field>dc:title</field>
    <field>dc:description</field>
  </index>
  <index name="blobs">
    <fieldType>blob</fieldType>
  </index>
  <index name="other">
    <fieldType>string</fieldType>
    <excludeField>dc:title</excludeField >
  </index>
</fulltext>

index name: the name of the index. The default is default.
index analyzer: a full-text analyzer just for this index. See full-text options above for details.
index catalog: a full-text catalog just for this index. See full-text options above for details.
fieldType: string or blob, the default being both. This selects all these fields for indexing.
field: the name of a field that should be selected for indexing.
excludeField: the name of a field that should not be in the index.

If no <fieldType>, <field> or <excludeField> are present, then all string and blob fields are used.

Optimizations

pathOptimizations enabled: for PostgreSQL, Oracle and MS SQL Server (and H2), it is possible to disable the path-based optimizations by using false. The default is true, i.e. path optimizations enabled.
```
  <pathOptimizations enabled="false"/>
```
aclOptimization enabled: for PostgreSQL, Oracle and MS SQL Server (and H2), you can disable the read ACL optimizations by using false. The default is true, i.e. ACL optimizations enabled.
```
  <aclOptimizations enabled="false"/>
```
aclOptimizations readAclMaxSize: can be set to define the size of the largest ACL for a document; this may be useful if you have mostly assigned permissions to a lot of users instead of using groups (do not set this attribute if you disable ACL optimizations).
In case the user/group names in your directories contains the separator character used in the Read ACL cache, you can change this value using the attribute usersSeparator. The default depends on the database, it may be , (for H2 and PostgreSQL) or | (for Oracle and SQL Server).
```
  <usersSeparator key="," />
```
If you change this value on an existing database, you will need to rebuild the ACL cache with the SQL command:
```
  SELECT nx_rebuild_read_acls();
```

Database Creation Option

<ddlMode>execute</ddlMode>

The <ddlMode> specifies how the DDL (Data Definition Language) for repository initialization should be executed at startup. DDL includes:

CREATE TABLE, CREATE INDEX, ALTER TABLE ADD CONSTRAINT for a new schema or complex property,
ALTER TABLE ADD column for a new property in a schema,
CREATE FUNCTION, CREATE PROCEDURE, CREATE TRIGGER for VCS internal stored procedures and migration steps.

Depending on the chosen mode, the DDL to be executed may not be executed at all and instead dumped to the file log/ddl-vcs-default.sql. The possible values are:

ignore: no DDL detected or executed.
compat: compatibility mode with previous version, always executes DDL that recreates stored procedures (and does not attempt to detect existing ones).
execute: executes the DDL and starts Nuxeo normally.
dump: dump the DDL (if any) to the file but still executes it and starts Nuxeo normally.
dump,ignore: dump the DDL (if any) to the file but does not execute it and starts Nuxeo normally (will likely result in errors).
dump,abort: dumps the DDL (if any) to the file, and if not empty aborts startup.

For Nuxeo 6.0-HF24+ and Nuxeo 7.10-HF01+ the default is compat. For Nuxeo 8.1 and above the default is execute.

See NXP-17396 for details about <ddlMode> implementation.

<noDDL>true</noDDL> <!-- deprecated -->

For compatibility with previous Nuxeo versions, if no <ddlMode> element is specified, then <noDDL> is checked. The value true is mapped to a ddlMode of ignore, and the value false mapped to the default ddlMode.

<sqlInitFile>myconf.sql.txt</sqlInitFile>

If you need to execute additional SQL when the database is initialized (at every Nuxeo startup), you can use this to specify an additional SQL file to read and execute (unless noDDL is true). The format of an SQL init file is described below. Examples can be found in the standard SQL init files used by Nuxeo, which are available at https://github.com/nuxeo/nuxeo/tree/master/nuxeo-core/nuxeo-core-storage-sql/nuxeo-core-storage-sql/src/main/resources/nuxeovcs (in the appropriate branch for your version).

A SQL init file is a series of SQL statements.

A # starting a line (as the first character) makes the line a comment (ignored), except for a few special #-starting tags (see below).

SQL statements have to be separated from every other by a blank line.

A statement may be preceded by one or more tag lines, which are lines starting with #SOMETAG: (including the final colon), and may be:

#CATEGORY: defines the category for all future statements, until a new category is defined. See below for the use of categories.
#TEST: specifies that the following statement returns a certain number of rows, and if that number of rows is 0 then the variable emptyResult will be set to true, otherwise it will be set to false.
#SET_IF_EMPTY: variable or #SET_IF_NOT_EMPTY: variable will set the specified variable to true if the following statement returns an empty (or not empty) set of rows. If the condition is not met, the variable is not changed.
#IF: condition conditions execution of the single following statement on the value of the condition. A condition can be variable or ! variable for negation. Several #IF: tags may be used in a row (in different lines), and they are effectively ANDed together.
#IF: condition OR: othercondition OR: anothercondition allows to OR several conditions.
#PROC: somename specifies that the following SQL is a CREATE FUNCTION, CREATE PROCEDURE or CREATE TRIGGER for the given name, and that an attempt to detect whether it is already present to avoid re-creation should be done. This is used when DDL is dumped through <ddlMode>.

The following boolean variables are predefined by Nuxeo and the various database dialects, and may be used in #IF: tags:

emptyResult: true if the previous #TEST: statement returned no row,
fulltextEnabled: true if full-text is enabled,
fulltextSearchEnabled: true if full-text search is enabled,
clusteringEnabled: true if clustering is enabled,
aclOptimizationsEnabled: true if ACL optimizations are enabled,
pathOptimizationsEnabled: true if path optimizations are enabled,
proxiesEnabled: true if proxies are enabled,
softDeleteEnabled: true if soft delete is enabled,
sequenceEnabled: true if sequence-based ids are enabled.

Note that not all dialects define all variables, consult the specific dialect code or the standard Nuxeo SQL init file to know more.

SQL statements are regular SQL statements and will be executed as-is by the database, with variable substitution (see below). Depending on the dialect, it may or may not be necessary of forbidden to end some kinds of statement with a semicolon, please consult the standard Nuxeo SQL init file for the dialect to be sure. Note also that when writing multi-line stored procedures, you must not include a blank line for readability, as this blank line would be interpreted as the end of the whole multi-line SQL statement.

The following variables provide additional dialect-specific values that may be used in SQL statements using the variable substitution syntax ${variablename}:

idType: the SQL type used for ids,
idTypeParam: the SQL type used for ids in stored procedures (not all dialects use this),
idSequenceName: when sequence-based ids are enabled, the name of the sequence to use,
idNotPresent: a representation of a "marker" id to use in stored procedures to represent a non-existent id,
fulltextAnalyzer: the full-text analyzer defined in the repository configuration.

A few pseudo-SQL statements can be used to provide addition logging actions:

LOG.DEBUG message: logs the message at DEBUG level in the standard logger,
LOG.INFO message: logs the message at INFO level in the standard logger,
LOG.ERROR message: logs the message at ERROR level in the standard logger,
LOG.FATAL message: logs the message at FATAL level in the standard logger and throws an exception that will stop database initialization and make it unusable by Nuxeo.

To initialize the database, the statements of the following categories are executed in this order:

first
beforeTableCreation (at this point Nuxeo does a CREATE or ALTER on the tables based on the Nuxeo Schema definitions)
afterTableCreation
last

The following categories are executed in special circumstances:

addClusterNode: when creating a cluster node,
removeClusterNode: when shutting down a cluster node.

DBS Configuration

For a DBS (Document-Based Storage) configuration, you need to contribute to the extension point repository of the component org.nuxeo.ecm.core.storage.mongodb.MongoDBRepositoryService:

<?xml version="1.0"?>
<component name="default-repository-config">
  <require>org.nuxeo.runtime.mongodb.MongoDBComponent</require>
  <extension target="org.nuxeo.ecm.core.storage.mongodb.MongoDBRepositoryService"
      point="repository">
    <repository name="default" label="label.default.repository">
      <cache enabled="true" maxSize="1000"
             concurrencyLevel="10" ttl="10" />
      ...
    </repository>
  </extension>
</component>

This repository can be configured with the same clustering and indexing options as for VCS.

For the DBS implementation based on MongoDB, the repository definition relies on MongoDBConnectionService to instantiate a database connection.