VCS Configuration

VCS (Visible Content Store) is the default storage engine for Nuxeo documents.

The following are the options available to configure a VCS repository in the Nuxeo Platform. They usually go in a file named default-repository-config.xml.

In a standard Nuxeo installation, this file is generated from a template, and many elements or attributes take their values from parameters in nuxeo.conf.

Example File

This file is for illustration and contains many more options than are necessary by default.

<?xml version="1.0"?>
<component name="default-repository-config">
  <extension target="org.nuxeo.ecm.core.storage.sql.RepositoryService" point="repository">
    <repository name="default"
      factory="org.nuxeo.ecm.core.storage.sql.ra.PoolingRepositoryFactory">
      <pool minPoolSize="0" maxPoolSize="20"
        blockingTimeoutMillis="100" idleTimeoutMinutes="10" />
      <clustering enabled="true" delay="1000" />
      <idType>varchar</idType>
      <schema>
        <field type="largetext">note</field>
      </schema>
      <indexing>
        <includedTypes>
          <type>File</type>
          <type>Note</type>
        </includedTypes>
        <!-- sample for excluded types -->
        <!--
        <excludedTypes>
          <type>Root</type>
          <type>Workspace</type>
        </excludedTypes>
        -->
        <fulltext analyzer="english"> <!-- PostgreSQL -->
          <index name="default">
            <!-- all props implied -->
          </index>
          <index name="title">
            <field>dc:title</field>
          </index>
          <index name="description">
            <field>dc:description</field>
            <excludeField>content/data</excludeField>
          </index>
        </fulltext>
        <queryMaker class="org.nuxeo.ecm.core.storage.sql.NXQLQueryMaker" />
        <queryMaker class="org.nuxeo.ecm.core.chemistry.impl.CMISQLQueryMaker" />
      </indexing>
      <binaryStore path="binaries"/>
      <binaryManager class="org.nuxeo.ecm.core.storage.sql.DefaultBinaryManager"/>
      <usersSeparator key="," />
      <aclOptimizations enabled="true" readAclMaxSize="4096"/>
      <pathOptimizations enabled="true"/>
      <noDDL>false</noDDL>
      <sqlInitFile>myconf.sql.txt</sqlInitFile>
    </repository>
  </extension>
</component>

Pooling Options

<pool minPoolSize="0" maxPoolSize="20"
  blockingTimeoutMillis="100" idleTimeoutMinutes="10" />

minPoolSize: the minimum pool size (default is 0) (see nuxeo.vcs.min-pool-size in nuxeo.conf).
maxPoolSize: the maximum pool size, above which connections will be refused (default is 20) (see nuxeo.vcs.max-pool-size in nuxeo.conf).
blockingTimeoutMillis: the maximum time (in milliseconds) the pool will wait for a new connection to be available before deciding that it cannot answer a connection request (pool saturated).
idleTimeoutMinutes: the time (in minutes) after which an unused pool connection will be destroyed.

This is available only when using Tomcat (see NXP-9763). When using JBoss, the pool is configured through default-repository-ds.xml.

Clustering Options

<clustering enabled="true" delay="1000" />

clustering enabled: use true to activate Nuxeo clustering (default is false, i.e., no clustering) (see repository.clustering.enabled in nuxeo.conf).
clustering delay: a configurable delay in milliseconds between two checks at the start of each transaction, to know if there are any remote invalidations (see repository.clustering.delay in nuxeo.conf).

Column Types

Large Text / CLOB Columns

To specify length constraints on text fields, use restrictions in the XML Schemas of your document type.

If you want the text field to have a precise length limit:

  <xs:simpleType name="longString">
    <xs:restriction base="xs:string">
      <xs:maxLength value="65536" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="text" type="mail:longString"/>

If you want the text field to have no length limit:

  <xs:simpleType name="clob">
    <xs:restriction base="xs:string">
      <xs:maxLength value="999999999" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="note" type="nxs:clob"/>

This is important for your large text fields, especially for MySQL, Oracle and SQL Server which have very small defaults for standard text fields.

Using Oracle, if you attempt to save a string too big for the standard NVARCHAR2(2000) field, you will get the error:

java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column

Note for default Nuxeo fields

If you need a default Nuxeo field to be stored as large text, declare it in the repository configuration as follows:

<schema>
  <field type="largetext">note</field>
  <field type="largetext">my:field</field>
  ...
</schema>

field type="largetext": a field that should be stored as a CLOB column inside the database instead of a standard VARCHAR column.

Id Column Type

In standard Nuxeo the document id is a UUID stored as a string, for instance "9ea9a461-e131-4127-9a57-08b5b9b80ecb".

Starting with Nuxeo 5.7.1, it's possible on select databases to use a more efficient id representation:

 <idType>varchar</idType>

The following values for idType are possible:

varchar: a varchar-based UUID (the default),
uuid: a native uuid (only on PostgreSQL (NXP-4803)),
sequence: a sequence-based integer (on PostgreSQL (NXP-10894) and SQL Server 2012 (not Azure) (NXP-10912)). Instead of just sequence you can also use sequence:your_sequence_name if you want to use another sequence than the default one (hierarchy_seq).

When using a sequence, the document ids will be simple incremental small integers instead of randomly-generated UUIDs.

Note that switching this option to a new value will require a full dump, manual conversion and restore of your database, so it should be specified before starting Nuxeo for the first time.
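
As a sketch of the sequence variant described above, the fragment below selects sequence-based ids with a custom sequence. The name my_document_seq is a hypothetical placeholder, not a Nuxeo default. The element goes inside the <repository> element, as in the example file.

<!-- my_document_seq is a hypothetical sequence name; use plain "sequence" to keep the default hierarchy_seq -->
<idType>sequence:my_document_seq</idType>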

Indexing Options

Configuring Which Types Will Be Indexed

You can configure which document types are included in, or excluded from, full-text indexing. This is done with the includedTypes and excludedTypes tags inside the indexing tag:

<includedTypes>
  <type>File</type>
  <type>Note</type>
</includedTypes>

<excludedTypes>
  <type>Root</type>
  <type>Workspace</type>
</excludedTypes>

If you set both included and excluded types, only the included types configuration will be taken into account.
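
Because only the included types are taken into account when both lists are present, an exclusion-based setup would leave out includedTypes entirely. A minimal sketch, reusing the type names from the fragment above and assuming that every type not listed is then indexed:

<indexing>
  <!-- no includedTypes element, so the exclusion list is the one that applies -->
  <excludedTypes>
    <type>Root</type>
    <type>Workspace</type>
  </excludedTypes>
</indexing>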

Full-Text

<fulltext disabled="true" analyzer="english" catalog="...">
  ...
</fulltext>

full-text disabled: use true to disable full-text support (default is false, i.e., full-text enabled).
full-text analyzer: a full-text analyzer, the content of this attribute depends on the backend used:
- H2: a Lucene analyzer, for instance org.apache.lucene.analysis.fr.FrenchAnalyzer. The default is an English analyzer.
- PostgreSQL: a Text Search configuration, for instance french. The default is english. See http://www.postgresql.org/docs/8.3/static/textsearch-configuration.html
- Oracle: an Oracle PARAMETERS for full-text, as defined by http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/cdatadic.htm (see NXP-4035 for details).
- Microsoft SQL Server: a full-text LANGUAGE, for instance english, as defined in http://msdn.microsoft.com/en-us/library/ms187787(v=SQL.90).aspx. The default is english.
- other backends don't have configurable full-text analyzers.
full-text catalog: a full-text catalog, the content of this attribute depends on the backend used:
- Microsoft SQL Server: a full-text CATALOG, the default is nuxeo.
- other backends don't need a catalog.
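
For illustration, here is how the analyzer and catalog attributes might look for two of the backends listed above. These are sketches built only from the values named in this section (french for PostgreSQL, english and the nuxeo catalog for Microsoft SQL Server). An actual repository configuration would contain only one fulltext element.

<!-- PostgreSQL: use the french Text Search configuration -->
<fulltext analyzer="french">
  ...
</fulltext>

<!-- Microsoft SQL Server: english LANGUAGE, with the default catalog spelled out explicitly -->
<fulltext analyzer="english" catalog="nuxeo">
  ...
</fulltext>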

Full-text indexes are queried in NXQL through the ecm:fulltext pseudo-field. A non-default index "foo" can be queried using ecm:fulltext_foo.

If no <index> elements are present, then a default index with all string and blob fields is used.

<fulltext ...>
  <index name="title" analyzer="..." catalog="...">
    <field>dc:title</field>
    <field>dc:description</field>
  </index>
  <index name="blobs">
    <fieldType>blob</fieldType>
  </index>
  <index name="other">
    <fieldType>string</fieldType>
    <excludeField>dc:title</excludeField>
  </index>
</fulltext>

index name: the name of the index (the default is default).
index analyzer: a full-text analyzer just for this index. See full-text options above for details.
index catalog: a full-text catalog just for this index. See full-text options above for details.
fieldType: string or blob, the default being both. This selects all fields of the given type for indexing.
field: the name of a field that should be selected for indexing.
excludeField: the name of a field that should not be in the index.

If no <fieldType>, <field> or <excludeField> is present, then all string and blob fields are used.
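
To connect index names with the NXQL pseudo-fields mentioned earlier, the sketch below declares the default index plus a second index restricted to dc:title. The index name title comes from the example file. Following the naming rule above, it would be queried as ecm:fulltext_title, while the default index is reached through ecm:fulltext.

<fulltext analyzer="english">
  <!-- default index: no field selection, so all string and blob fields are used; queried with ecm:fulltext -->
  <index name="default"/>
  <!-- non-default index; queried with ecm:fulltext_title -->
  <index name="title">
    <field>dc:title</field>
  </index>
</fulltext>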

Binary Store

<binaryManager class="org.nuxeo.ecm.core.storage.sql.XORBinaryManager" key="abc"/>

binaryManager class: changes the Binary Manager (the default is the standard binary manager, which stores files as-is). The XORBinaryManager XORs file content with a pattern on read and write (see the key below). The on-disk binary store layout is unchanged (the hash of the file is still the filename), but the stored files are no longer directly readable. One consequence is that, for the same file, the application-level digest in the Binary object is different when encryption is enabled.
binaryManager key: the encryption key for the binary manager (if it's doing any encryption). Changing this value will of course render existing binaries unreadable.

<binaryStore path="/foo/bar"/>

binaryStore path: the filesystem path where the binary store should live. A relative path is interpreted relative to the Nuxeo Framework home. The default is the binaries directory. (See repository.binary.store in nuxeo.conf.)

Optimizations

<pathOptimizations enabled="false"/>

pathOptimizations enabled: for PostgreSQL, Oracle and MS SQL Server (and H2), it is possible to disable the path-based optimizations by using false (the default is true, i.e., path optimizations enabled).

<aclOptimizations enabled="false"/>

aclOptimizations enabled: for PostgreSQL, Oracle and MS SQL Server (and H2), you can disable the read ACL optimizations by using false (the default is true, i.e., ACL optimizations enabled).
You can set the readAclMaxSize attribute to define the maximum size of the largest read ACL for a document. This may be useful if you have mainly assigned permissions to many individual users instead of using groups (do not set this attribute if you disable ACL optimizations).
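
A sketch of setting this attribute while keeping the optimizations enabled. The value 4096 is simply the one used in the example file at the top of this page, not a recommendation:

<aclOptimizations enabled="true" readAclMaxSize="4096"/>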

<usersSeparator key="," />

If the user or group names in your directories contain the separator character used in the Read ACL cache (a comma by default), you can change the separator using the key attribute of the usersSeparator element.
If you change this value on an existing database, you will need to rebuild the ACL cache with the SQL command: SELECT nx_rebuild_read_acls();
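
For example, if group names may contain commas, a different separator could be configured as sketched here. The pipe character is an arbitrary illustrative choice; the assumption is that it never appears in your user or group names.

<!-- "|" is an illustrative value, assumed absent from all user and group names -->
<usersSeparator key="|" />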

Database Creation Option

<noDDL>true</noDDL>

Set noDDL to true to prevent VCS from executing any DDL (Data Definition Language) statements. The default is false.

When this is true, VCS will assume that no new structure has to be created in the database. This means that none of these statements will be executed:

CREATE TABLE, CREATE INDEX, ALTER TABLE ADD CONSTRAINT for a new schema or complex property,
ALTER TABLE ADD column for a new property in a schema,
CREATE FUNCTION, CREATE PROCEDURE, CREATE TRIGGER for VCS internal stored procedures and migration steps.

The only statements that VCS will execute are:

INSERT, UPDATE, DELETE for data changes,
calling of stored procedures.

This means that all tables, indexes, triggers and stored procedures needed by VCS have to be created beforehand, either by a previous execution when the flag was false, or by a manual execution of a SQL script from a previously-created Nuxeo instance.

This option is typically needed if you configure the VCS connection with a database user that is not the owner of the database, usually for security reasons.

<sqlInitFile>myconf.sql.txt</sqlInitFile>

If you need to execute additional SQL when the database is initialized (at every Nuxeo startup), you can use this to specify an additional SQL file to read and execute (unless noDDL is true). The format of an SQL init file is described below. Examples can be found in the standard SQL init files used by Nuxeo, which are available at https://github.com/nuxeo/nuxeo-core/tree/release-6.0/nuxeo-core-storage-sql/nuxeo-core-storage-sql/src/main/resources/nuxeovcs (in the appropriate branch for your version).

A SQL init file is a series of SQL statements.

A # starting a line (as the first character) makes the line a comment (ignored), except for a few special #-starting tags (see below).

SQL statements have to be separated from each other by a blank line.

A statement may be preceded by one or more tag lines, which are lines starting with #SOMETAG: (including the final colon), and may be:

#CATEGORY: defines the category for all future statements, until a new category is defined. See below for the use of categories.
#TEST: specifies that the following statement returns a certain number of rows, and if that number of rows is 0 then the variable emptyResult will be set to true, otherwise it will be set to false.
#IF: variable or #IF: ! variable, makes execution of the single following statement conditional on the value of the variable. Several #IF: tags may be used in a row (on different lines), and they are effectively ANDed together.

The following boolean variables are predefined by Nuxeo and the various database dialects, and may be used in #IF: tags:

emptyResult: true if the previous #TEST: statement returned no row,
fulltextEnabled: true if fulltext is enabled in the repository configuration,
clusteringEnabled: true if clustering is enabled,
aclOptimizationsEnabled: true if ACL optimizations are enabled,
pathOptimizationsEnabled: true if path optimizations are enabled,
proxiesEnabled: true if proxies are enabled,
softDeleteEnabled: true if soft delete is enabled,
sequenceEnabled: true if sequence-based ids are enabled.

Note that not all dialects define all variables. Consult the specific dialect code or the standard Nuxeo SQL init file to know more.

SQL statements are regular SQL statements and will be executed as-is by the database, with variable substitution (see below). Depending on the dialect, it may be necessary or forbidden to end some kinds of statements with a semicolon. Consult the standard Nuxeo SQL init file for the dialect to be sure. Note also that when writing multi-line stored procedures, you must not include a blank line for readability, as this blank line would be interpreted as the end of the whole multi-line SQL statement.

The following variables provide additional dialect-specific values that may be used in SQL statements using the variable substitution syntax ${variablename} :

idType: the SQL type used for ids,
idTypeParam: the SQL type used for ids in stored procedures (not all dialects use this),
idSequenceName: when sequence-based ids are enabled, the name of the sequence to use,
idNotPresent: a representation of a "marker" id to use in stored procedures to represent a non-existent id,
fulltextAnalyzer: the fulltext analyzer defined in the repository configuration.

A few pseudo-SQL statements can be used to provide additional logging actions:

LOG.DEBUG message: logs the message at DEBUG level in the standard logger,
LOG.INFO message: logs the message at INFO level in the standard logger,
LOG.ERROR message: logs the message at ERROR level in the standard logger,
LOG.FATAL message: logs the message at FATAL level in the standard logger and throws an exception that will stop database initialization and make it unusable by Nuxeo.

To initialize the database, the statements of the following categories are executed in this order:

first
beforeTableCreation
(at this point Nuxeo does a CREATE or ALTER on the tables based on the Nuxeo Schema definitions)
afterTableCreation
last

The following categories are executed in special circumstances:

addClusterNode: when creating a cluster node,
removeClusterNode: when shutting down a cluster node.