Automation

Content Automation Concepts

Updated: July 17, 2023

Content Automation is a Nuxeo service that exposes common actions performed in a Nuxeo application as atomic operations. You can then assemble them to create complex business rules and logic, without writing any Java code. In other words, Content Automation provides a high-level API over Nuxeo services: an API made of operations that can be assembled into more complex automation chains. These operations and chains can be called locally in Java, or remotely, since they are exposed via the REST API. Examples of business logic that you can implement with Automation are:

  • My documents should have a topic field;
  • Documents should be moved to another department when they are validated;
  • Procedures should become obsolete after one year;
  • Only the “validators” group can see a document until it is published;
  • ...

You can also create new atomic operations (i.e. write a Java class that defines an operation) and contribute them, in addition to the set of built-in operations.

If you need to define the values of operation parameters dynamically, you can use scripting (e.g. EL syntax) to fetch the actual parameter values at execution time.

Nuxeo Studio can be used to define automation chains.

The main goal of Content Automation is to enable users to rapidly build complex business logic without writing any Java code. First, they assemble the built-in set of atomic operations into complex chains. Then they can plug these chains inside the Nuxeo Platform as UI actions, event handlers, REST bindings and workflow logic.

Operation

An operation is a Java class with specific annotations, usually associated with an action. This action can be triggered by the user directly through the user interface, in response to an event, or by a REST call to a remote server.

The operations a user can invoke usually deal with the document repository (like creating or updating documents), but they can also perform other tasks, such as sending emails or converting binaries. The Automation service already provides dozens of frequently used operations that you may need when building your business logic, and developers can contribute their own operations.

The main elements that compose an operation are:

  • The category: useful for finding the operation in the Studio editor. All operations are grouped in categories depending on what they do (e.g. document related, services, blob related, UI related...);
  • The input: an operation has an input (provided by whatever triggers it: a user action, an event, or the previous operation in a chain);
  • The parameters: an operation may have zero or more parameters (used to parameterize how the operation behaves);
  • The output: an operation has an output (that can be used as the input of the next operation in the chain).
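The four elements above can be modeled in a few lines of plain Java. The sketch below is a hypothetical, dependency-free model: `SketchOperation`, `SketchDoc` and `UpdatePropertySketch` are invented names, not the Nuxeo Automation API, which relies on annotated classes instead. It only shows how a category, a typed input, named parameters and a typed output fit together.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an operation's four elements.
// NOT the Nuxeo Automation API, which uses annotated Java classes.
interface SketchOperation {
    String id();
    String category();        // groups the operation, e.g. "Document"
    Class<?> inputType();     // what the operation accepts as input
    Class<?> outputType();    // what it hands to the next operation
    Object run(Object input, Map<String, String> params);
}

// Invented stand-in for a repository document: just a property map.
final class SketchDoc {
    final Map<String, String> properties = new HashMap<>();
}

// An "Update Property"-like operation with two parameters: xpath and value.
final class UpdatePropertySketch implements SketchOperation {
    public String id() { return "Sketch.UpdateProperty"; }
    public String category() { return "Document"; }
    public Class<?> inputType() { return SketchDoc.class; }
    public Class<?> outputType() { return SketchDoc.class; }

    public Object run(Object input, Map<String, String> params) {
        SketchDoc doc = (SketchDoc) input;
        doc.properties.put(params.get("xpath"), params.get("value"));
        return doc; // the updated document becomes the next operation's input
    }
}
```

Because input and output are both `SketchDoc`, this operation could be followed by any other operation that accepts a document.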

The Operation Input

The operation input can be a Document or a Blob (i.e. a file).

The execution context provides the input, which can come from:

  • The user action in the case of a single operation or when it is the first operation in the chain;
  • The output of the previous operation when executing a chain.

Some special operations don't need any input. For example, you may want to run a query in the repository: in this case, your query operation doesn't need an input. Thus, operations can accept void as an input. To pass a void input to an operation, just use a null value as the input. If an operation doesn't expect any input (i.e. void input) and an input is given, it is ignored.

Note that for advanced use cases, it is possible to contribute new input-output types.

The Operation Parameters

An operation can define parameters to be able to modify its execution at runtime depending on those parameter values.

Any parameter value can be expressed as a string. The string is converted into the right type at runtime if possible; if not, an exception is thrown.
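As a minimal sketch of this string-to-type conversion, assuming a hypothetical `ParamConverterSketch` class (not Nuxeo's own converters), the code below handles a few of the predefined types listed below and throws when a string cannot be converted. For dates it accepts only a full ISO 8601 date-time with an offset, which is a simplification of the W3C format.

```java
import java.time.OffsetDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of runtime string-to-type parameter conversion.
final class ParamConverterSketch {
    static Object convert(String raw, String type) {
        switch (type) {
            case "string":
                return raw;
            case "boolean":
                // Boolean.valueOf never fails, so reject anything but true/false explicitly.
                if (!raw.equalsIgnoreCase("true") && !raw.equalsIgnoreCase("false")) {
                    throw new IllegalArgumentException("Not a boolean: " + raw);
                }
                return Boolean.valueOf(raw);
            case "integer":
                return Long.valueOf(raw.trim());          // throws NumberFormatException
            case "float":
                return Double.valueOf(raw.trim());        // throws NumberFormatException
            case "date":
                return OffsetDateTime.parse(raw.trim());  // e.g. 2013-05-01T10:00:00Z
            case "properties":
                return parseProperties(raw);
            default:
                throw new IllegalArgumentException("Unsupported type in this sketch: " + type);
        }
    }

    // key=value pairs separated by new lines; each line is split on its first '='.
    static Map<String, String> parseProperties(String raw) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : raw.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            int eq = trimmed.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("Not a key=value line: " + trimmed);
            props.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return props;
    }
}
```

`java.util.Properties` is deliberately not used for the properties type, because it also treats `:` as a key/value separator and would split a key such as `dc:title`.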

There are several types of predefined parameters:

  • string: any string,
  • boolean: a boolean parameter,
  • integer: an integer number,
  • float: a floating point number,
  • date: a date (in W3C format if it is specified as a string),
  • resource: a URL to a resource,
  • properties: a Java properties content (key=value pairs separated by new lines),
  • document: a Nuxeo Document (use its absolute PATH or its UID when expressing it as a string),
  • blob: a Nuxeo blob (the raw content of the blob in the case of a REST invocation),
  • documents: a list of documents,
  • bloblist: a list of blobs,
  • any other object that is convertible from a string: you can register new object converters through the adapters extension point of the org.nuxeo.ecm.core.operation.OperationServiceComponent component;
  • an expression: this represents an MVEL expression (which is compatible with basic EL expressions) that can output dynamic values. When using expressions, you must either prefix them with expr: or enclose them in @{ }. Example:

    expr:Document['dc:title']
    or
    @{Document['dc:title']}
    

    For more details about scripting you can look at the page Use of MVEL in Automation Chains.

  • an expression template: this is the same as an expression but it will be interpreted as a string (by doing variable substitution). This is very useful when you want to create expressions like this:

    expr: SELECT * FROM Document WHERE dc:title LIKE @{mytitle}
    

    where mytitle is a variable name that will be substituted with its string form. Note that you still need to prefix your template string with expr:. For more details about scripting you can look at the page Use of MVEL in Automation Chains.
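The substitution step of an expression template can be sketched as follows. This assumes a hypothetical `TemplateSketch` class that only replaces `@{name}` placeholders with the string form of named variables; real Automation evaluates full MVEL expressions, which this sketch does not attempt.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of expression-template substitution.
// Only plain variable names inside @{...} are supported; no MVEL evaluation.
final class TemplateSketch {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("@\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String render(String param, Map<String, ?> vars) {
        // Drop the expr: prefix that marks the parameter as an expression/template.
        String body = param.startsWith("expr:")
                ? param.substring("expr:".length()).trim()
                : param;
        Matcher m = PLACEHOLDER.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unknown variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The value is inserted verbatim, so any quoting required by the query language has to come from the template or from the variable itself (for example a value such as `'May%'`).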

The Operation Output

The operation output is either a Document, a Blob or void (like the input).

In some rare cases you may want your operation to not return anything (a void operation). For example, your operation may send an email without returning anything. When an operation returns void, a null Java object is returned.

As said before, the output of an operation is the input of the next operation when running in a chain.

Automation Chain

The power of operations is that they can be chained into a sort of macro operation composed of atomic operations. For example, you can construct an automation chain that creates a document, then attaches a blob to it, then publishes it, and so on. Each operation in the chain performs its step using the result of the previous operation as input, and outputs a result that the next operation uses as its input. This means that inside a chain, the input type of an operation must be compatible with the output type of the previous one. This is called the "execution path" of the chain. If your chain is not consistent in terms of operation input and output, you may get a "Cannot find valid path" exception.

Chains are operations too, and thus have the same characteristics as an atomic operation: they expect an input, provide an output and may have some parameters.
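The input/output rule and the idea that a chain is itself an operation can be sketched in plain Java. `ChainSketch` and its `Step` class are hypothetical: each step declares its input and output types, the chain checks the execution path before running, and it exposes the input type of its first step and the output type of its last step, just like an operation. The exception message only mimics the error quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a chain with an "execution path" check.
final class ChainSketch {
    static final class Step {
        final String name;
        final Class<?> in;
        final Class<?> out;
        final Function<Object, Object> body;

        Step(String name, Class<?> in, Class<?> out, Function<Object, Object> body) {
            this.name = name;
            this.in = in;
            this.out = out;
            this.body = body;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    ChainSketch add(Step step) {
        steps.add(step);
        return this;
    }

    // Like an operation, the chain has an input type and an output type.
    Class<?> inputType() { return steps.get(0).in; }
    Class<?> outputType() { return steps.get(steps.size() - 1).out; }

    // Each step must accept the type produced by the step before it.
    void checkPath() {
        for (int i = 1; i < steps.size(); i++) {
            Step prev = steps.get(i - 1);
            Step cur = steps.get(i);
            if (!cur.in.isAssignableFrom(prev.out)) {
                throw new IllegalStateException("Cannot find valid path: " + prev.name
                        + " outputs " + prev.out.getSimpleName()
                        + " but " + cur.name + " expects " + cur.in.getSimpleName());
            }
        }
    }

    Object run(Object input) {
        checkPath();
        Object current = input;
        for (Step step : steps) {
            current = step.body.apply(current); // output becomes the next input
        }
        return current;
    }
}
```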

The following chain example creates a document of type invoice and automatically validates it.

Document > Create {"type":"invoice", "name":"2013May", "properties":"dc:title=May 2013\ndc:description=hello world"}
Document > Follow Life Cycle Transition {"value":"approve"}

Calling Operations and Chains in the Framework

The framework makes it easy to call automation chains from:

  • user actions in the user interface,
  • event handlers,
  • workflows,
  • REST calls,
  • Java code.

Dynamic Expressions in Operation Parameters

Any operation parameter accepts dynamic expressions based on MVEL. This lets your chains handle more complex logic, and gives access to some useful data and functions.

In the following example, we compute the valid date stored in the Dublin Core schema.

chainA - using a dynamic language for parameter values

Fetch > Context Document(s)
Document > Update Property {"xpath":"dc:valid", "value":"@{CurrentDate.days(7).calendar}"} // set the valid date on the document to current date plus seven days
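For reference, the date arithmetic performed by the expression above can be written in plain Java with `java.time`. `ValidDateSketch` is a hypothetical helper; the base date is passed in rather than read from the clock so the result is deterministic, and Automation itself evaluates the MVEL expression, not this code.

```java
import java.time.LocalDate;

// Hypothetical plain-Java equivalent of @{CurrentDate.days(7).calendar}:
// the given day plus seven days.
final class ValidDateSketch {
    static LocalDate validDate(LocalDate today) {
        return today.plusDays(7);
    }
}
```

Calling `ValidDateSketch.validDate(LocalDate.now())` gives the date one week from today; `plusDays` handles month and year boundaries.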

See the dedicated page to learn all about the Automation expression language.

Note that a dedicated operation exists when you want to execute a complete script: Scripting > Run Script. This operation does nothing other than execute the script set as its "script" parameter.

Chain Context

Another concept of Automation is the notion of "Context". The context lets operations inside an automation chain share objects. It is accessed from expressions through the "Context" map: @{Context["my_value"]}.

The Automation module provides several operations to manage the context, under the category "Execution Context":

  • Set Context Variable (name, value): sets a variable in the context (equivalent to Context["my_variable"]="toto" in a script).
  • Set Context Variable from Input (name): sets a variable in the context from the operation input.
  • All the "Restore ..." operations: restore a document, a blob, a list of documents, or a list of blobs as the input of the next operation.

Note that the "Push & Pop" category of operations provides helpers for doing the same as with the Execution Context category. It just spares you from naming the variables, since you work with a stack (you can push / pull / pop on, from and out of the stack). There is no specific recommendation here; this is just a matter of style.
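The difference between the two styles can be sketched with a map and a stack. `ContextStyles` is a hypothetical model, not the Nuxeo implementation: named variables mirror Set Context Variable from Input and the Restore operations, while the unnamed stack mirrors Push and Pop. In this sketch, storing the input passes it through unchanged so the chain can continue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two context styles; not the Nuxeo implementation.
final class ContextStyles {
    private final Map<String, Object> context = new HashMap<>(); // named variables
    private final Deque<Object> stack = new ArrayDeque<>();       // unnamed stack

    // Like "Set Context Variable from Input": store the input under a name,
    // then pass it through unchanged to the next step.
    Object setFromInput(String name, Object input) {
        context.put(name, input);
        return input;
    }

    // Like a "Restore ..." operation: the stored object becomes the next input.
    Object restore(String name) {
        return context.get(name);
    }

    // Like Push: store the input without naming it, and pass it through.
    Object push(Object input) {
        stack.push(input);
        return input;
    }

    // Like Pop: the most recently pushed object becomes the next input.
    Object pop() {
        return stack.pop();
    }
}
```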

An example of a situation where you would need to use the context is when you want to implement inheritance. Let's say you want every document to inherit the dc:source field value from the parent workspace. You would implement the following chain:

Using the context for implementing inheritance

- Fetch > Context Document(s)
- Execution Context > Set Context Variable from Input {"name": "docToBeUpdated"}  // we store the document for which we need to update the dc:source property
- Document > Get Parent {"type":"Workspace"}  // This operation returns the first parent document of type "Workspace"
- Execution Context > Set Context Variable {"name":"workspaceSourceValue","value":"@{Document['dc:source']}"} // We store in the context the dc:source property value of the parent Workspace
- Execution Context > Restore Document From Input {"name": "docToBeUpdated"} // We restore the document for which we need to update the dc:source property as the input of the next operation
- Document > Update Property {"xpath":"dc:source", "value":"@{Context['workspaceSourceValue']}"} // We update the property dc:source
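The same inheritance logic reads as follows in plain Java. `InheritanceSketch.SketchDocument` is an invented stand-in for a repository document with a parent link, and each line is annotated with the chain step it corresponds to. Returning the document untouched when no Workspace ancestor exists is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java rendering of the inheritance chain.
final class InheritanceSketch {
    // Invented stand-in for a repository document with a parent link.
    static final class SketchDocument {
        final String type;
        final SketchDocument parent;
        final Map<String, String> props = new HashMap<>();

        SketchDocument(String type, SketchDocument parent) {
            this.type = type;
            this.parent = parent;
        }
    }

    // Like "Document > Get Parent {type}": first ancestor of the given type, or null.
    static SketchDocument getParent(SketchDocument doc, String type) {
        for (SketchDocument p = doc.parent; p != null; p = p.parent) {
            if (p.type.equals(type)) return p;
        }
        return null;
    }

    static SketchDocument inheritSource(SketchDocument doc) {
        SketchDocument docToBeUpdated = doc;                      // Set Context Variable from Input
        SketchDocument workspace = getParent(doc, "Workspace");   // Get Parent
        if (workspace == null) return docToBeUpdated;             // sketch choice: nothing to inherit
        String workspaceSourceValue = workspace.props.get("dc:source"); // Set Context Variable
        docToBeUpdated.props.put("dc:source", workspaceSourceValue);    // Restore + Update Property
        return docToBeUpdated;
    }
}
```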

Execution Flow: Sub Calls, Looping, Conditional Execution

The "Execution Flow" category provides several operations that call a chain from another one. As you will see, these operations do not all behave the same way in terms of looping and transaction management. They are also the way to implement conditional execution.

Simple Sub Calls

You can use the "Execution Flow > Run Chain" operation, which works strictly like an "include".

Consider the following chains:

chain2

- Document > Update Property {"xpath":"dc:title", "value":"hello world"}
- Document > Update Property {"xpath":"dc:source", "value":"hello source"}

chain3

- Document > Follow Life Cycle Transition {"value":"approve"}
- Document > Lock {"owner":"Administrator"}

Then, the following chains all have the same result and are equivalent:

chain4

- Document > Update Property {"xpath":"dc:title", "value":"hello world"}
- Document > Update Property {"xpath":"dc:source", "value":"hello source"}
- Document > Follow Life Cycle Transition {"value":"approve"}
- Document > Lock {"owner":"Administrator"}

chain5

- Document > Update Property {"xpath":"dc:title", "value":"hello world"}
- Document > Update Property {"xpath":"dc:source", "value":"hello source"}
- Execution Flow > Run Chain {"id":"chain3"}

chain6

- Execution Flow > Run Chain {"id":"chain2"}
- Execution Flow > Run Chain {"id":"chain3"}

Conditional Call

The same "Run Chain" operation can be used for implementing a conditional flow, by using a scripted ternary expression as the value of the "id" parameter. The following chain will run chain1 if the type of the document is File. Otherwise it will run chain2.

chain7

- Fetch > Context Document(s)
- Execution Flow > Run Chain {"id":"@{Document.type == 'File' ? 'chain1' : 'chain2'}"}

You can use this technique with all the operations that fire the execution of a sub chain (Execution Flow category).

When using sub chain calls, make sure that parent and child chains have compatible input/output types; otherwise you will get "Cannot compute any valid path" errors.
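The dispatch performed by chain7 can be sketched in plain Java: the chain id is computed from the document type, then looked up and run. `ConditionalSketch` and its chain registry are hypothetical; in Automation, the id is computed by the MVEL expression and the chains are registered on the server.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of conditional dispatch through a computed chain id.
final class ConditionalSketch {
    // Plain-Java form of the ternary used as the "id" parameter in chain7.
    static String chainIdFor(String documentType) {
        return "File".equals(documentType) ? "chain1" : "chain2";
    }

    // Like "Run Chain": look up the computed id and run that chain on the input.
    static String runChain(String documentType, Map<String, UnaryOperator<String>> registry, String input) {
        UnaryOperator<String> chain = registry.get(chainIdFor(documentType));
        if (chain == null) throw new IllegalArgumentException("No chain registered for this id");
        return chain.apply(input);
    }
}
```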

Loops

Native Loop

Some operations accept both a single object and a list of objects in their signature: document and documents, or blob and blobs. When given a list, they execute their "run" method for each element of the list. For example, if a chain first runs a query (which returns a list of documents) and then uses the "Update Property" operation (which accepts document and documents), each of the documents returned by the query is updated:

chain8 - Native Loop

- Fetch > Query {"query":"SELECT * FROM File"} // Doing a query that will return multiple documents.
- Document > Update Property {"xpath":"dc:description", "value":"Showing native looping"} // Update Property is executed as many times as there are documents returned by the previous query, before going to the next operation.
- Document > Lock {"owner":"Administrator"} // Each of the documents returned by the previous operation is locked.

An algorithm equivalent to the chain above would be something like:

// Note: The following code is pseudocode, not valid syntax; it only illustrates the logic behind the automation chain "chain8".
DocumentList list = Nuxeo.query("SELECT * FROM File");
for (DocumentModel doc : list) {
    doc.updatePropertyValue("dc:description", "Showing native looping");
}
for (DocumentModel doc : list) {
    doc.lock("Administrator");
}

Loop on the Execution of a Complete Chain

Sometimes you don't want to do a loop at each operation level. You want to execute a whole chain as many times as you have documents in your list, for instance. In this case, you can use one of the following operations:

  • "Execution Flow > Run Document Chain" to iterate over the input document list.
  • "Execution Flow > Run File Chain" to iterate over the input blob list.
  • "Execution Flow > Run For Each" to iterate over a given list (a string array, or any other list).

An example use case: for all the documents returned by a query, you want to copy the value of one field into another, such as the value of dc:source into dc:description:

chain9 - Copy of the Property, for a given Document

- Fetch > Context Document(s)
- Execution Context > Set Context Variable {"name":"sourceValue","value":"@{Document['dc:source']}"}
- Document > Update Property {"xpath":"dc:description", "value":"@{Context['sourceValue']}"}
- Document > Lock {"owner":"Administrator"}

chain10

- Fetch > Query {"query":"SELECT * FROM File"} // Doing a query that will return multiple documents.
- Execution Flow > Run Document Chain {"id": "chain9"}

An algorithm equivalent to executing chain10 would be:

// Note: The following code is pseudocode, not valid syntax; it only illustrates the logic behind the automation chain "chain10".
DocumentList list = Nuxeo.query("SELECT * FROM File");
for (DocumentModel doc : list) {
    sourceValue = doc.getPropertyValue("dc:source");
    doc.updatePropertyValue("dc:description", sourceValue);
    doc.lock("Administrator");
}
// There is only one loop, compared to the two loops of native looping.
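The difference in execution order between chain8 and chain10 can be made concrete with a small trace. `LoopOrderSketch` is hypothetical: the strings `update` and `lock` stand for the Update Property and Lock steps, and the method bodies only record the order in which the steps would run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of step order for the two looping styles.
final class LoopOrderSketch {
    // Native looping (chain8): each operation processes the whole list
    // before the next operation starts.
    static List<String> nativeLoop(List<String> docs) {
        List<String> trace = new ArrayList<>();
        for (String d : docs) trace.add("update " + d);
        for (String d : docs) trace.add("lock " + d);
        return trace;
    }

    // Run Document Chain (chain10 running chain9): the whole sub chain
    // runs once per document.
    static List<String> perDocumentChain(List<String> docs) {
        List<String> trace = new ArrayList<>();
        for (String d : docs) {
            trace.add("update " + d);
            trace.add("lock " + d);
        }
        return trace;
    }
}
```

For two documents `a` and `b`, native looping yields update a, update b, lock a, lock b, while the per-document chain yields update a, lock a, update b, lock b.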

Loop per Page

It is possible to query documents page by page using the Fetch > PageProvider operation. You can then use the Execution Flow > Run For Each Page operation to execute a chain as many times as there are pages in the result set. This chain receives as input a document list corresponding to the content of each page of the query result set. This is particularly useful when the query returns too many documents to load all the results in memory at once, which would otherwise cause an out-of-memory error.
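The page-per-page control flow can be sketched as follows, assuming a hypothetical `fetchPage` function that returns the page at a given index and an empty list once the results are exhausted. `PageLoopSketch` is invented for illustration; the point is that the sub chain (a `Consumer` here) receives one page at a time, so only one page needs to be held in memory.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical sketch of "run a chain once per page".
final class PageLoopSketch {
    // fetchPage(i) returns page i, or an empty list when there are no more results.
    // Returns the number of pages processed. This loops forever if fetchPage never
    // returns an empty page; a more defensive version could also cap the page count.
    static int runForEachPage(IntFunction<List<String>> fetchPage, Consumer<List<String>> chain) {
        int pageIndex = 0;
        List<String> page = fetchPage.apply(pageIndex);
        while (!page.isEmpty()) {
            chain.accept(page);               // the sub chain receives one page as input
            pageIndex++;
            page = fetchPage.apply(pageIndex);
        }
        return pageIndex;
    }
}
```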

Transaction Management

When looping over a large number of documents or items, you may get a transaction timeout (the default value is 5 minutes). You should therefore create a transaction for each execution of your chain, item per item. In the Execution Flow category, you will find operations that start a new transaction for each sub chain execution. These are the ones to use to create, update or delete tens of thousands of documents in one Automation execution.
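The idea of one short transaction per sub chain execution can be sketched with an invented transaction interface. `SketchTx` and `PerItemTransactions` are hypothetical, not Nuxeo APIs. In this sketch, a failure rolls back only the current item and is rethrown, while items processed earlier stay committed; the real operations may behave differently on error.

```java
import java.util.List;
import java.util.function.Consumer;

// Invented transaction interface for illustration only.
interface SketchTx {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical sketch: one short transaction per item instead of one long one.
final class PerItemTransactions {
    // Returns the number of items whose transaction was committed.
    static <T> int runEach(List<T> items, SketchTx tx, Consumer<T> subChain) {
        int committed = 0;
        for (T item : items) {
            tx.begin();
            try {
                subChain.accept(item);
                tx.commit();      // each item's work is committed on its own
                committed++;
            } catch (RuntimeException e) {
                tx.rollback();    // only the current item's work is undone
                throw e;          // earlier items stay committed
            }
        }
        return committed;
    }
}
```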

User Principal Used for Operations Execution

The principal used by default depends on the context of your Automation call:

  • REST bridge: the user authenticated by HTTP.
  • User Action: the user connected to the application.
  • Workflow: sometimes the connected user, sometimes the system user (depending on where you are in the workflow).
  • Event Handler: the user whose action fired the event.

If inside a chain you need to execute some operations under the identity of another principal, you can use the Users & Groups > Login As and Users & Groups > Logout operations. All operations in between are executed under the session of the user id provided as a parameter to Login As (System if none is provided).
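The Login As / Logout pairing can be modeled as a stack of principals. `IdentitySketch` is a hypothetical model, not the real operations, which manage Nuxeo sessions; allowing nested Login As calls is an assumption of the sketch. Following the text, a missing user id falls back to the system user.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of switching identity inside a chain.
final class IdentitySketch {
    private final Deque<String> principals = new ArrayDeque<>();

    // The caller's identity (REST user, connected user, etc.) is the starting point.
    IdentitySketch(String caller) {
        principals.push(caller);
    }

    // Like "Login As": operations after this call run as userId, or as system if none is given.
    void loginAs(String userId) {
        principals.push(userId == null ? "system" : userId);
    }

    // Like "Logout": return to the previous identity.
    void logout() {
        if (principals.size() == 1) throw new IllegalStateException("No Login As to undo");
        principals.pop();
    }

    String current() {
        return principals.peek();
    }
}
```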

Main Automation Categories Available

You now understand how powerful the Automation module is. All available operations are listed on the Nuxeo Platform Explorer, and you can also write your own operations. In this section, we only highlight a few noteworthy categories.

Conversion

In this category, you will find operations for generating PDFs (Convert To PDF), resizing pictures, rendering FreeMarker templates (Render Document), and converting to a specific MIME type.

Document

This category holds all the operations that are about handling documents: creation, deletion, security, updating properties... It is a highly populated category.

Execution Context

As we have seen, operations here are useful for managing a chain context.

Execution Flow

As we have seen, the operations here are useful for managing sub chain calls, looping and conditional execution.

Fetch

This category holds the operations that are helpful in the first position of a chain: their role is to fetch a document or file, so that the other operations of the chain can work on it. You can fetch documents by their path or id, by query, take the one selected in the user interface, and much more.

Files

With the operations here, you can get a binary stored in a property of a document, export a binary to the file system, or zip some files: all the operations around handling binaries.

Notifications

Not a lot of operations here, but essential ones: send an email, add entries to the log4j logs, send events to the Nuxeo Platform's event bus.

Push & Pop

Complementary operations related to context management.

Scripting

For executing scripts.

Services

Various operations here that wrap some of the Nuxeo Platform services: relations, tasks, audit, File Manager.

User interface

Operations here are quite "magical", as they simulate user clicks in the user interface. They let you automate document selections, trigger file downloads, play with the worklist, change the current document, show a specific creation form, and much more.

Users & Groups

One of the operations here is "Login As", which lets you execute an automation chain in the name of someone other than the currently logged-in user. This is useful, for instance, for doing things under a system session.

Workflow Context

All the operations here are about running and managing workflow instances. This is a very powerful API on top of our advanced workflow engine. More information is available in the workflow section.

Installing some modules from the Nuxeo Marketplace also deploys new operations. This is for instance the case with the Nuxeo Drive add-on, as well as with the Template Rendering add-on. You can always check which operations are available on your Nuxeo Platform instance at the following URL: http://nuxeo_host/nuxeo/site/automation/doc.