Blob Upload for Batch Processing

Extract from the course "Working with the REST API" on Hyland University

Motivations

By default, Automation handles blobs using standard HTTP multipart encoding.

This strategy does not fit when:

Your client does not natively support multipart encoding, e.g. JavaScript (without using a form) or Android SDK 2.x.
You have several files to send but prefer to send them as separate chunks, e.g. because an HTTP proxy limits the POST size.
You want to upload files as soon as possible and run the operation once everything has been uploaded to the server, e.g. when uploading pictures selected on a mobile device.

Uploading Files

Since Nuxeo 7.4, the batch upload API is exposed as a REST resource endpoint. The old API using /site/automation/batch/upload is deprecated but kept for backward compatibility.

Batch Initialization

Before uploading any file, you need to initialize a batch, even if there is only one file to upload.

POST /api/v1/upload/

This request returns a 201 CREATED status code with the following JSON data:

{"batchId": batchId}

This handshake is mandatory: it acquires a server-generated batch id, which subsequent requests use as part of the REST resource path.

The batch id can be seen as an upload session id, especially for a resumable upload.
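As a sketch of the handshake, the class below builds the initialization request with the JDK's java.net.http API (Java 11+) and extracts the batch id from the JSON response. The class and method names, the base URL (for example http://localhost:8080/nuxeo) and the regex-based parsing (used only to avoid a JSON library dependency) are assumptions; authentication is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BatchInit {
    // Matches "batchId": "<value>" in the handshake response body
    private static final Pattern BATCH_ID = Pattern.compile("\"batchId\"\\s*:\\s*\"([^\"]+)\"");

    // Handshake request: POST {baseUrl}/api/v1/upload/ with an empty body
    static HttpRequest initRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/upload/"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Extracts the server-generated batch id from a 201 CREATED response body
    static String parseBatchId(String json) {
        Matcher m = BATCH_ID.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No batchId in response: " + json);
        }
        return m.group(1);
    }
}
```

A possible usage: send the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), check for status 201, then pass the response body to parseBatchId.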

Uploading a File

You can send a simple POST whose payload is the file content; multipart-encoded upload is also supported.

POST /api/v1/upload/{batchId}/{fileIdx}

batchId is the batch identifier: you must use the one returned by the batch initialization request, otherwise you get a 404 Not Found status code.

fileIdx is the index of the file inside the batch. It lets you reference the file by its index later, and it keeps track of the client-side ordering, because the server may not receive the files in the order they were sent.

Use the same batch identifier for all the files you want to upload and attach to the same batch.

You also need to set the following HTTP headers:

Header name	Description
`X-File-Name`	Name of the file
`X-File-Type`	Mime type of the file
`Content-Type`	Should be set to `application/octet-stream`
`Content-Length`	Size of the file in bytes, required if your HTTP client doesn't add this header, typically the Nuxeo JavaScript Client

Returns a 201 CREATED status code with the following JSON data:

{"batchId": batchId, "fileIdx": fileIdx, "uploadType": "normal", "uploadedSize": xxx}

The value of the uploadType field is normal by default; it is chunked if the file was uploaded in chunks.
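A minimal sketch of a whole-file upload with java.net.http (Java 11+), assuming a base URL such as http://localhost:8080/nuxeo and no authentication; the class and method names are hypothetical. This client computes Content-Length from the body itself and rejects it as a user-set header, so only X-File-Name, X-File-Type and Content-Type are set explicitly.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class FileUpload {
    // POST {baseUrl}/api/v1/upload/{batchId}/{fileIdx} with the whole file as the body.
    // Content-Length is derived by java.net.http from the byte array.
    static HttpRequest uploadRequest(String baseUrl, String batchId, int fileIdx,
                                     String fileName, String mimeType, byte[] content) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/upload/" + batchId + "/" + fileIdx))
                .header("X-File-Name", fileName)
                .header("X-File-Type", mimeType)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }
}
```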

About the file storage implementation

The files uploaded to the batch are stored on a temporary disk space until the batch is executed or dropped.

For this purpose, the batch upload relies on the default Transient Store, which stores the uploaded files inside ${nuxeo.data.dir}/transientstores/default.

Getting Information about the Batch Files

GET /api/v1/upload/{batchId}

Returns a 200 OK status code if the batch contains at least one file, and a 204 No Content status code if the batch doesn't contain any file. JSON response data:

[{"name": file1, "size": yyy, "uploadType": "normal"}, {"name": file2, "size": zzz, "uploadType": "normal"}]
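The listing call and its two documented success codes could be handled as in this sketch (java.net.http, Java 11+). The class and method names are hypothetical, and treating any other status as an error is a choice of this sketch, not documented behavior.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BatchInfo {
    // GET {baseUrl}/api/v1/upload/{batchId}
    static HttpRequest listRequest(String baseUrl, String batchId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/upload/" + batchId))
                .GET()
                .build();
    }

    // 200 OK: the batch holds at least one file; 204 No Content: the batch is empty
    static boolean hasFiles(int statusCode) {
        switch (statusCode) {
            case 200:
                return true;
            case 204:
                return false;
            default:
                throw new IllegalStateException("Unexpected status code: " + statusCode);
        }
    }
}
```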

Getting Information about a Specific Batch File

GET /api/v1/upload/{batchId}/{fileIdx}

Returns a 200 OK status code if the batch contains a file with the given index, and a 404 Not Found status code otherwise. JSON response data:

{"name": xxx, "size": yyy, "uploadType": "normal"}
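A sketch of fetching one file's description with java.net.http, returning an empty Optional on 404 Not Found. The names are hypothetical, the raw JSON is returned unparsed, and any status other than 200 or 404 (such as 308 Resume Incomplete for a chunked file) is treated as an error in this simplified version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

class BatchFileInfo {
    // {baseUrl}/api/v1/upload/{batchId}/{fileIdx}
    static URI fileUri(String baseUrl, String batchId, int fileIdx) {
        return URI.create(baseUrl + "/api/v1/upload/" + batchId + "/" + fileIdx);
    }

    // Returns the JSON description of the file, or Optional.empty() on 404 Not Found
    static Optional<String> fetch(HttpClient client, String baseUrl, String batchId, int fileIdx)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(fileUri(baseUrl, batchId, fileIdx)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 404) {
            return Optional.empty();
        }
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status code: " + response.statusCode());
        }
        return Optional.of(response.body());
    }
}
```

A possible call: BatchFileInfo.fetch(HttpClient.newHttpClient(), "http://localhost:8080/nuxeo", batchId, 0).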

Dropping a Batch

DELETE /api/v1/upload/{batchId}

Returns a 200 OK status code with the following JSON data:

{"batchId": batchId, "dropped": "true"}

Executing a batch will automatically remove it.
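Dropping a batch, for instance when the user cancels before anything is executed, could be sketched as below. The documented response carries dropped as the string "true"; the hypothetical wasDropped helper accepts both that string and a JSON boolean, and its substring check is a simplification, not a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BatchDrop {
    // DELETE {baseUrl}/api/v1/upload/{batchId} discards the batch and its temporary files
    static HttpRequest dropRequest(String baseUrl, String batchId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/upload/" + batchId))
                .DELETE()
                .build();
    }

    // Crude check on the response body: strips whitespace, then looks for "dropped" set to true
    static boolean wasDropped(String json) {
        String compact = json.replaceAll("\\s", "");
        return compact.contains("\"dropped\":\"true\"") || compact.contains("\"dropped\":true");
    }
}
```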

Uploading a File by Chunks

Resumable upload is a requirement: otherwise, uploading large files over an unreliable connection could take days.

Using chunks is a good idea because:

It allows resuming an upload with enough granularity (restart from chunk x).
It allows multiplexing (uploading on multiple TCP streams).
It helps overcome the limitations of some reverse proxies (it reduces the risk of a POST being considered too big).

Uploading a Chunk

As for uploading a whole file, you can do a simple POST with the payload containing your chunk.

POST /api/v1/upload/{batchId}/{fileIdx}

The batchId and fileIdx serve the same purpose as for uploading a whole file. They should be common to all the chunks you want to upload for a given file in the batch.

You need to set the same HTTP headers as for a whole file, adding some extra ones:

Header name	Description
`X-Upload-Type`	`chunked`
`X-Upload-Chunk-Index`	Index of the chunk
`X-Upload-Chunk-Count`	Total chunk count
`X-File-Name`	Name of the file
`X-File-Size`	Size of the file in bytes
`X-File-Type`	Mime type of the file
`Content-Type`	Should be set to `application/octet-stream`
`Content-Length`	Size of the chunk in bytes, required if your HTTP client doesn't add this header, typically the Nuxeo JavaScript Client

X-Upload-Chunk-Index must be the number of the chunk in the ordered list of chunks, starting from 0.

For instance, if the file is made of 5 chunks, you send 5 requests with the following headers, with i between 0 and 4:

X-Upload-Chunk-Index: i
X-Upload-Chunk-Count: 5
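The chunk layout itself is client-side arithmetic. This sketch assumes a fixed chunk size chosen by the client (this section does not prescribe one): the chunk count is the ceiling of the file size divided by the chunk size, and chunk i starts at byte i * chunkSize. The class and method names are hypothetical.

```java
class ChunkPlan {
    // Value for X-Upload-Chunk-Count: ceil(fileSize / chunkSize), 0 for an empty file
    static int chunkCount(long fileSize, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    // Offset of the first byte of the chunk sent with X-Upload-Chunk-Index = index
    static long chunkStart(int index, long chunkSize) {
        return index * chunkSize;
    }

    // Size of that chunk in bytes; only the last chunk may be shorter than chunkSize
    static long chunkLength(int index, long fileSize, long chunkSize) {
        return Math.min(chunkSize, fileSize - chunkStart(index, chunkSize));
    }
}
```

For a 23-byte file and a 5-byte chunk size, chunkCount returns 5, so the chunk indexes run from 0 to 4; the last chunk starts at byte 20 and is 3 bytes long.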

Depending on the HTTP client you are using, you might also need to add the Content-Length header to specify the size of the chunk in bytes.

As for a file uploaded in one go, the chunks attached to the batch are stored on a temporary disk storage until the batch is executed or dropped.

Returns a 201 CREATED status code once all the chunks of the file have been received, and a 308 Resume Incomplete status code while chunks are still missing.

JSON response data:

{"batchId": batchId, "fileIdx": fileIdx, "uploadType": "chunked", "uploadedSize": xxx, "uploadedChunkIds": [0, 1, 2], "chunkCount": 5}
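A sketch of the request for one chunk of a file held in memory, using java.net.http (Java 11+) with a hypothetical base URL such as http://localhost:8080/nuxeo and no authentication; names are hypothetical. Because HttpClient does not follow redirects unless configured to (its default policy is HttpClient.Redirect.NEVER), a 308 Resume Incomplete response reaches the caller unchanged.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ChunkUpload {
    // POST one chunk with the headers from the table; Content-Length is computed by java.net.http
    static HttpRequest chunkRequest(String baseUrl, String batchId, int fileIdx,
                                    String fileName, String mimeType, byte[] file,
                                    int chunkIndex, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunkCount = (file.length + chunkSize - 1) / chunkSize;
        if (chunkIndex < 0 || chunkIndex >= chunkCount) {
            throw new IllegalArgumentException("chunkIndex out of range: " + chunkIndex);
        }
        int offset = chunkIndex * chunkSize;
        int length = Math.min(chunkSize, file.length - offset);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/upload/" + batchId + "/" + fileIdx))
                .header("X-Upload-Type", "chunked")
                .header("X-Upload-Chunk-Index", String.valueOf(chunkIndex))
                .header("X-Upload-Chunk-Count", String.valueOf(chunkCount))
                .header("X-File-Name", fileName)
                .header("X-File-Size", String.valueOf(file.length))
                .header("X-File-Type", mimeType)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(file, offset, length))
                .build();
    }

    // 201 CREATED: every chunk has been received; 308 Resume Incomplete: chunks are missing
    static boolean isFileComplete(int statusCode) {
        if (statusCode == 201) {
            return true;
        }
        if (statusCode == 308) {
            return false;
        }
        throw new IllegalStateException("Unexpected status code: " + statusCode);
    }
}
```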

Getting Information about a Chunked File

GET /api/v1/upload/{batchId}/{fileIdx}

Returns a 200 OK status code for a complete chunked file and a 308 Resume Incomplete status code for an incomplete chunked file.

This 308 Resume Incomplete status code tells you that you need either to upload the missing chunks or to resume an interrupted file upload.

If the batch doesn't contain any file with the given index, the server returns a 404 Not Found status code.

JSON response data:

{"name": xxx, "size": yyy, "uploadType": "chunked", "uploadedChunkIds": [0, 1, 2, 4], "chunkCount": 5}
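Resuming then comes down to comparing uploadedChunkIds with chunkCount. This hypothetical helper returns the indexes that still need to be sent, assuming the JSON fields have already been parsed into a list and an int.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ChunkResume {
    // Chunk indexes in [0, chunkCount) that are absent from uploadedChunkIds, in ascending order
    static List<Integer> missingChunks(List<Integer> uploadedChunkIds, int chunkCount) {
        Set<Integer> uploaded = new HashSet<>(uploadedChunkIds);
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            if (!uploaded.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

With the response above (uploadedChunkIds [0, 1, 2, 4] and chunkCount 5), missingChunks returns [3], so only chunk 3 has to be uploaded again.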

Using Files From a Batch

Batch Execute

You can execute an automation chain or an automation operation using the blobs associated to a batch as input.

To use the blobs as input, call the batch execute endpoint, passing the batchId and operationId path parameters:

POST /api/v1/upload/{batchId}/execute/{operationId}
Accept: application/json+nxentity, */*
Content-Type: application/json+nxrequest; charset=UTF-8
X-NXDocumentProperties: *

{"params":{"operationParam":"value", ...},"context":{...}}

Optionally, you can use the fileIdx path parameter to specify the index of the file inside the batch that you want to use as input of the chain or operation to execute.

POST /api/v1/upload/{batchId}/{fileIdx}/execute/{operationId}
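A sketch that builds the execute request for both URL forms with java.net.http, reusing the headers shown above. The class and method names, the base URL and the JSON body are caller-supplied assumptions; which params and context a given operation expects depends on that operation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BatchExecute {
    // POST {baseUrl}/api/v1/upload/{batchId}/execute/{operationId}, or
    // POST {baseUrl}/api/v1/upload/{batchId}/{fileIdx}/execute/{operationId} when fileIdx is not null
    static HttpRequest executeRequest(String baseUrl, String batchId, Integer fileIdx,
                                      String operationId, String jsonBody) {
        String path = "/api/v1/upload/" + batchId
                + (fileIdx == null ? "" : "/" + fileIdx)
                + "/execute/" + operationId;
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Accept", "application/json+nxentity, */*")
                .header("Content-Type", "application/json+nxrequest; charset=UTF-8")
                .header("X-NXDocumentProperties", "*")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```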

This way of calling an automation operation is used by the default UI to handle Drag and Drop:

Files are progressively uploaded to the server:
- You can drop several sets of files.
- There is a maximum number of concurrent uploads.
When the upload is finished, you can select the operation or chain to execute.

For more information, see the Drag and Drop configuration documentation.

Referencing a Blob From a Batch

Another option is to reference the file within the batch when building the input parameters of an operation.

To do so, add a parameter of type properties; it is automatically resolved to the matching blob if the provided properties are correct:

type = blob
length = 657656
mime-type = application/pdf
name = myfile.pdf
upload-batch = batchId-50b2ccb2-ce69-4fdc-b24e-b4ea8c155a05
upload-fileId = myfile.pdf

When using the Java Automation client, this looks like:

import org.nuxeo.ecm.automation.client.model.PropertyMap;

// blobUploading: the client-side Blob already uploaded to the batch identified by batchId
PropertyMap blobProp = new PropertyMap();
blobProp.set("type", "blob");
blobProp.set("length", Long.valueOf(blobUploading.getLength()));
blobProp.set("mime-type", blobUploading.getMimeType());
blobProp.set("name", blobUploading.getFileName());
// set information for server-side Blob mapping
blobProp.set("upload-batch", batchId);
blobProp.set("upload-fileId", blobUploading.getFileName());

Referencing a Blob from a JSON Document Resource

You can reference a batch file in the JSON document you send to the REST API by setting the upload-batch and upload-fileId properties of a blob property. In the example below, file:content references the first file of the batch (upload-fileId "0").

{
    "entity-type": "document",
    "repository": "default",
    "uid": "531d9636-46c2-497d-996b-1ae7a8f43e89",
    "path": "/default-domain",
    "type": "Domain",
    "state": "project",
    "versionLabel": "",
    "title": "Default domain",
    "lastModified": "2013-09-06T08:53:10.00Z",
    "properties": {
        "file:content": {
            "upload-batch": "batchId-50b2ccb2-ce69-4fdc-b24e-b4ea8c155a05",
            "upload-fileId": "0"
        }
    },
    "facets": [
        "SuperSpace",
        "Folderish"
    ],
    "changeToken": "1378457590000",
    "contextParameters": {}
}
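To attach a batch file to an existing document, one option is to send such a payload in an update request. The endpoint used in this sketch, PUT /api/v1/id/{docId}, is an assumption not described in this section, as is the server accepting a payload trimmed to the properties being updated; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DocumentBlobRef {
    // Minimal document payload pointing file:content at the file with index fileIdx in the batch
    static String documentJson(String batchId, int fileIdx) {
        return "{\"entity-type\":\"document\",\"properties\":{\"file:content\":"
                + "{\"upload-batch\":\"" + batchId + "\",\"upload-fileId\":\"" + fileIdx + "\"}}}";
    }

    // Assumed update call: PUT {baseUrl}/api/v1/id/{docId} with the payload above
    static HttpRequest updateRequest(String baseUrl, String docId, String batchId, int fileIdx) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/id/" + docId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(documentJson(batchId, fileIdx)))
                .build();
    }
}
```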
