The Amazon S3 Online Storage is a Nuxeo Binary Manager for S3. It stores Nuxeo's binaries (the attached documents) in an Amazon S3 bucket.
Before You Start
You should be familiar with Amazon S3 and be in possession of your credentials.
Installation
This addon requires no specific installation steps. It can be installed like any other package, either with the nuxeoctl command line or from the Update Center.
Nuxeo Configuration
In order to configure the package, you will need to provide values for the configuration parameters that define your S3 credentials, bucket and encryption choices.
For a single repository, you can do the configuration using the nuxeo.conf properties described below. For more complex setups, you will need to use an XML extension point; see the section "Nuxeo Configuration Through Extension Point" further down for details.
Specifying Your Amazon S3 Parameters
In nuxeo.conf, add the following lines:
nuxeo.s3storage.bucket=your_s3_bucket_name
nuxeo.s3storage.awsid=your_AWS_ACCESS_KEY_ID
nuxeo.s3storage.awssecret=your_AWS_SECRET_ACCESS_KEY
You can also use IAM instance roles (since 5.9.1), in which case you do not need to specify the AWS ID and secret (the credentials will be fetched automatically from the instance metadata).
If you are using an S3-compatible storage service, you will most likely also need to set the endpoint parameter in nuxeo.conf:
nuxeo.s3storage.endpoint=hostname
If you installed the bundle JAR manually instead of using the Nuxeo Package you will also need:
nuxeo.core.binarymanager=org.nuxeo.ecm.core.storage.sql.S3BinaryManager
The bucket name must be unique across all of Amazon S3, so you should pick something original and specific.
The file nuxeo.conf now contains S3 secret access keys, so you should protect it from prying eyes.
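As a rough illustration of the two notes above, the following Python sketch parses nuxeo.conf-style text, reports which S3 properties still lack a value, and restricts the file to its owner. The function names are hypothetical and not part of Nuxeo; treating the AWS ID and secret as optional assumes IAM instance roles are in use.

```python
import os
import stat

# Property that is always needed according to the text above.
REQUIRED = ["nuxeo.s3storage.bucket"]
# Needed unless the credentials come from an IAM instance role.
CREDENTIALS = ["nuxeo.s3storage.awsid", "nuxeo.s3storage.awssecret"]


def parse_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_s3_properties(props, uses_iam_role=False):
    """Return the S3 properties that still need a value."""
    wanted = REQUIRED if uses_iam_role else REQUIRED + CREDENTIALS
    return [key for key in wanted if not props.get(key)]


def restrict_to_owner(path):
    """Make the file readable and writable by its owner only (POSIX)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

Calling restrict_to_owner on the path of your nuxeo.conf has the same effect as chmod 600; the user running Nuxeo must own the file.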
You can also add the following optional parameters:
nuxeo.s3storage.region=us-west-1
nuxeo.s3storage.bucket_prefix=myfolder/
The region code can be:
- For us-east-1 (the default), don't specify this parameter
- For us-west-1 (Northern California), use us-west-1
- For us-west-2 (Oregon), use us-west-2
- For eu-west-1 (Ireland), use EU
- For ap-southeast-1 (Singapore), use ap-southeast-1
- For ap-southeast-2 (Sydney), use ap-southeast-2
- For sa-east-1 (Sao Paulo), use sa-east-1
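The rules in the list above can be summarized in a few lines of Python. This is only a sketch of the mapping as this page describes it; the function names are hypothetical.

```python
def region_setting(aws_region):
    """Return the value for nuxeo.s3storage.region, or None to omit it."""
    if aws_region == "us-east-1":
        return None  # default region: leave the parameter out
    if aws_region == "eu-west-1":
        return "EU"  # Ireland uses the special code "EU"
    return aws_region  # the other listed regions use their own name


def region_line(aws_region):
    """Return the nuxeo.conf line for a region, or None if none is needed."""
    value = region_setting(aws_region)
    return None if value is None else "nuxeo.s3storage.region=" + value
```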
You can also use a bucket prefix to store your binaries under a specific S3 folder. The bucket_prefix syntax is available since Nuxeo 7.10-HF03; before that, you had to explicitly modify the binary manager XML file and use nuxeo.s3storage.bucket.prefix, a syntax that has since been removed.
Client-Side Crypto Options
With S3 you have the option of storing your data encrypted using S3 Client-Side Encryption. Note that the local cache will not be encrypted.
The S3 Binary Manager can use a keystore containing a keypair, but there are a few caveats to be aware of:
The Sun/Oracle JDK doesn't always allow the AES256 cipher which the AWS SDK uses internally. Depending on the US export restrictions for your country, you may be able to modify your JDK to use AES256 by installing the "Java Cryptography Extension Unlimited Strength Jurisdiction Policy Files". See the following link to download the files and installation instructions: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Don't forget to specify the key algorithm if you create your keypair with the keytool command, as this won't work with the default (DSA). The S3 Binary Manager has been tested with a keystore generated with this command:
keytool -genkeypair -keystore </path/to/keystore/file> -alias <key alias> -storepass <keystore password> -keypass <key password> -dname <key distinguished name> -keyalg RSA
If you get keytool error: java.io.IOException: Incorrect AVA format, then ensure that the distinguished name parameter has a form such as: -dname "CN=AWS S3 Key, O=example, DC=com".
Don't forget to make backups of the /path/to/keystore/file file along with the store password, key alias and key password. If you lose them (for instance if the EC2 machine hosting the Nuxeo instance with the original keystore is lost), you will lose the ability to recover any encrypted blob from the S3 bucket.
With all of the above in mind, here are the crypto options that you can add to nuxeo.conf (they are all mandatory once you specify a keystore):
nuxeo.s3storage.crypt.keystore.file=/absolute/path/to/the/keystore/file
nuxeo.s3storage.crypt.keystore.password=the_keystore_password
nuxeo.s3storage.crypt.key.alias=the_key_alias
nuxeo.s3storage.crypt.key.password=the_key_password
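To keep the keystore and the four properties above consistent, it can help to derive both from the same values. The following Python sketch builds the argument list for the keytool command shown earlier together with the matching nuxeo.conf lines. The function names are hypothetical, and the sketch only assembles strings; it does not run keytool.

```python
def keytool_args(keystore, alias, storepass, keypass, dname):
    """Build the keytool argument list (RSA, not the DSA default)."""
    return [
        "keytool", "-genkeypair",
        "-keystore", keystore,
        "-alias", alias,
        "-storepass", storepass,
        "-keypass", keypass,
        "-dname", dname,
        "-keyalg", "RSA",
    ]


def crypto_conf_lines(keystore, alias, storepass, keypass):
    """Return the four nuxeo.conf lines that go with that keystore."""
    return [
        "nuxeo.s3storage.crypt.keystore.file=" + keystore,
        "nuxeo.s3storage.crypt.keystore.password=" + storepass,
        "nuxeo.s3storage.crypt.key.alias=" + alias,
        "nuxeo.s3storage.crypt.key.password=" + keypass,
    ]
```

The list returned by keytool_args can be passed to subprocess.run(args, check=True). Passing a list rather than a single shell string avoids quoting problems with a distinguished name that contains commas and spaces.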
Server-Side Crypto Options
Alternatively, you can use S3 Server-Side Encryption with the following option:
nuxeo.s3storage.crypt.serverside=true
Client-Side Encryption is safer than Server-Side Encryption. With Client-Side Encryption, an attacker needs access to both the AWS credentials and the key to read the unencrypted data, whereas with Server-Side Encryption the AWS credentials alone are enough.
Cache Options
Files retrieved from S3 are cached locally for speed. You can configure the maximum cache size (in bytes or with the standard KB, MB, GB or TB suffixes), the maximum number of files in the cache, and the minimum age (in seconds) a file should have before being eligible for purge (the age is the time since last file access).
nuxeo.s3storage.cachesize=100MB
nuxeo.s3storage.cachecount=10000
nuxeo.s3storage.cacheminage=3600
cachecount and cacheminage are available since Nuxeo 7.10-HF03.
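The following Python sketch illustrates how the cache values can be read: it converts a size such as 100MB to bytes and lists the files old enough to be eligible for purge. It is not Nuxeo's implementation. The function names are hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption that this page does not confirm.

```python
import re

# Assumption: suffixes are binary multiples (1 KB = 1024 bytes).
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size such as '100MB' or '4096' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB|TB)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    number, unit = match.groups()
    return int(number) * UNITS[(unit or "").upper()]


def purge_candidates(files, now, cacheminage):
    """Return the names of files last accessed at least cacheminage seconds ago.

    files maps a file name to its last access time in seconds.
    """
    return sorted(name for name, last in files.items()
                  if now - last >= cacheminage)
```

Eligibility is only one part of the picture: a file that is old enough is a candidate for removal, and the size and count limits decide whether anything needs to be removed at all.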
Download From S3 Options
Since Nuxeo 7.4, you can configure downloads to be directly served to the user from S3 without going through Nuxeo. To do so, use:
nuxeo.s3storage.directdownload=true
nuxeo.s3storage.directdownload.expire=3600
The expire time is expressed in seconds (the default is one hour) and determines how long the generated S3 URLs are valid. Having short-lived URLs is better for security, but too short an expiration time could be problematic if your server clock is not exactly in sync with the absolute official time used by S3.
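The effect of clock drift on short expiration times can be reasoned about with a simplified model, sketched here in Python. It assumes that the URL's expiry is an absolute timestamp computed from the Nuxeo server clock and checked against S3's clock; the function names are hypothetical.

```python
def url_is_valid(signed_at, expire, s3_now):
    """True if a URL signed at signed_at (server clock) is still accepted
    when S3's own clock reads s3_now. All values are in seconds."""
    return s3_now < signed_at + expire


def effective_lifetime(expire, server_clock_offset):
    """Seconds during which the URL is really usable.

    server_clock_offset is server time minus S3 time. A server running
    behind S3 (negative offset) shortens the usable lifetime.
    """
    return max(0, expire + server_clock_offset)
```

In this model, with expire set to 60 and a server clock running 90 seconds behind, effective_lifetime returns 0: the URL is already expired when it is handed to the user.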
Before Nuxeo 7.10, the configuration was done using the property nuxeo.s3storage.downloadfroms3 instead of nuxeo.s3storage.directdownload (the same applies to expire). This is still available for backward compatibility after Nuxeo 7.10 but will be removed in a future version, so the nuxeo.s3storage.directdownload version above should be preferred.
Connection Pool Options
You can configure the internal S3 connection pool. This pool has a size of 50 by default, so if you've configured Nuxeo to use more sessions than this and all the sessions are accessing S3, you may run out of connections.
The following parameters can be used to change some connection pool parameters (the defaults are shown):
nuxeo.s3storage.connection.max=50
nuxeo.s3storage.connection.retry=3
nuxeo.s3storage.connection.timeout=50000
nuxeo.s3storage.socket.timeout=50000
The timeouts are expressed in milliseconds.
You can read more about these parameters on the AWS ClientConfiguration documentation page.
Checking Your Configuration
To verify that the installation went well, check your startup logs for a line like:
INFO [S3BinaryManager] Repository 'default' using S3BinaryManager
Don't forget to enable the INFO level for the group org.nuxeo in $NUXEO_HOME/lib/log4j.xml to see INFO level messages from Nuxeo classes.
If your configuration is incorrect, this line will be followed by some error messages describing the problems encountered.
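If you prefer to script this check, a short Python sketch such as the following can scan the startup log for the line shown above. The function name is hypothetical, and the pattern assumes the message text is exactly as quoted on this page; any timestamp or thread prefix before it is ignored.

```python
import re

# Matches the message quoted above and captures the repository name.
MARKER = re.compile(
    r"INFO\s+\[S3BinaryManager\] Repository '([^']+)' using S3BinaryManager")


def s3_repositories(log_text):
    """Return the repository names that reported using S3BinaryManager."""
    return MARKER.findall(log_text)
```

An empty result means either that the binary manager is not active or that INFO messages are not being logged.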
AWS Configuration
You must have appropriate permissions set on your bucket. In particular, note that the less commonly used permissions s3:AbortMultipartUpload, s3:ListMultipartUploadParts and s3:ListBucketMultipartUploads are required.
Here is a sample AWS S3 Policy that you can use; make sure that you replace yourbucketname with your own bucket name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::yourbucketname"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::yourbucketname/*"
    }
  ]
}
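To avoid leaving the placeholder in one of the two Resource entries, the policy can be generated from the bucket name. The following Python sketch reproduces the sample policy above for a given bucket; the function name and the bucket name in the usage note are hypothetical.

```python
import json

MULTIPART_ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
    "s3:ListBucketMultipartUploads",
]


def build_policy(bucket):
    """Return the sample policy above as a dict for the given bucket name."""
    arn = "arn:aws:s3:::" + bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Account-level listing of buckets.
            {"Effect": "Allow",
             "Action": ["s3:ListAllMyBuckets"],
             "Resource": "arn:aws:s3:::*"},
            # Permissions on the bucket itself.
            {"Effect": "Allow",
             "Action": ["s3:ListBucket", "s3:GetBucketLocation"]
                       + MULTIPART_ACTIONS,
             "Resource": arn},
            # Permissions on the objects inside the bucket.
            {"Effect": "Allow",
             "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"]
                       + MULTIPART_ACTIONS,
             "Resource": arn + "/*"},
        ],
    }
```

For example, json.dumps(build_policy("my-nuxeo-binaries"), indent=2) produces JSON text that can be pasted into the AWS policy editor.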
Nuxeo Configuration Through Extension Point
The above configuration uses nuxeo.conf properties prefixed with nuxeo.s3storage, which is useful for simple configurations. However, if you plan on using several S3 blob managers, you must configure them using an XML extension point. The following is an example for the default blob manager:
<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="default">
    <class>org.nuxeo.ecm.core.storage.sql.S3BinaryManager</class>
    <property name="awsid">your_AWS_ACCESS_KEY_ID</property>
    <property name="awssecret">your_AWS_SECRET_ACCESS_KEY</property>
    <property name="region">us-west-1</property>
    <property name="bucket">your_s3_bucket_name</property>
    <property name="bucket_prefix">myprefix/</property>
    <property name="directdownload">true</property>
    <property name="directdownload.expire">3600</property>
    <property name="cachesize">100MB</property>
    <property name="crypt.keystore.file">/my/keystore.jks</property>
    <property name="crypt.keystore.password">password</property>
    <property name="crypt.key.alias">mykey</property>
    <property name="crypt.key.password">password</property>
    <property name="connection.max">50</property>
    <property name="connection.retry">3</property>
    <property name="connection.timeout">50000</property>
    <property name="socket.timeout">50000</property>
  </blobprovider>
</extension>
Note that this needs to override the default configuration present in the default Nuxeo template default-repository-config.xml.nxftl, which already defines the standard configuration for the default blob provider. You may need to add <require>default-repository-config</require> in order for the override to be correctly taken into account.
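When several S3 blob providers are needed, the contribution can be generated rather than written by hand. The following Python sketch builds a component with a require element and one blobprovider per entry, using only the element and property names shown above. The function names, the component name and the second provider name in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def blob_provider(name, properties):
    """Build one <blobprovider> element using the S3BinaryManager class."""
    provider = ET.Element("blobprovider", {"name": name})
    ET.SubElement(provider, "class").text = (
        "org.nuxeo.ecm.core.storage.sql.S3BinaryManager")
    for key, value in properties.items():
        ET.SubElement(provider, "property", {"name": key}).text = value
    return provider


def contribution(component_name, providers):
    """Wrap the providers in a component that requires the default config."""
    component = ET.Element("component", {"name": component_name})
    ET.SubElement(component, "require").text = "default-repository-config"
    extension = ET.SubElement(component, "extension", {
        "target": "org.nuxeo.ecm.core.blob.BlobManager",
        "point": "configuration",
    })
    extension.extend(providers)
    return ET.tostring(component, encoding="unicode")
```

For example, contribution("my.s3.config", [blob_provider("default", {"bucket": "bucket-a"}), blob_provider("archive", {"bucket": "bucket-b"})]) returns the XML text for two providers, each with its own bucket.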