How to Upload a File in Nuxeo Platform Using REST API Batch Processing Endpoint

Updated: July 17, 2023

 

Extract from the course "Working with the REST API" at Hyland University

The Platform lets you upload binaries to the server under a given "batch id", then reference that batch id when posting a document resource or fetch the binaries from a custom automation chain. For instance, to create a file with some binary content, you first have to upload the file to the BatchManager, a place on the system where you can upload temporary files in order to bind them to documents later.

There are two ways to upload a file:

  1. In one go: the full content of the file is transferred to the server as a binary stream in a single HTTP request. Such an upload is not resumable: in case of interruption you will need to start all over again.
  2. In chunks: the file content is transferred to the server as several binary streams in separate HTTP requests. Such an upload is resumable: in case of interruption you will only need to upload the remaining chunks.

Before uploading any file or chunk you need to initialize an upload batch.

Batch Initialization

POST /api/v1/upload/

Or with cURL:

curl -u Administrator:Administrator -X POST http://<host>:<port>/nuxeo/api/v1/upload

Response:

201 Created
{"batchId": "<myBatchId>"}

You need to save this batch id as it will be used in subsequent requests.
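The initialization call can also be scripted. Below is a minimal sketch using only the Python standard library; the function names are hypothetical, the base URL is whatever replaces http://<host>:<port>/nuxeo/api/v1 on your installation, and Basic authentication is assumed as in the cURL example.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_init_request(base_url, user, password):
    """Prepare, without sending, the POST that initializes an upload batch."""
    return urllib.request.Request(
        base_url + "/upload",
        headers={"Authorization": basic_auth_header(user, password)},
        method="POST",
    )


def parse_batch_id(body):
    """Extract the batch id from the JSON body of the 201 Created response."""
    return json.loads(body)["batchId"]


def init_batch(base_url, user, password):
    """Send the request and return the batch id (needs a reachable server)."""
    request = build_init_request(base_url, user, password)
    with urllib.request.urlopen(request) as response:
        return parse_batch_id(response.read())
```

Calling init_batch with your base URL and credentials should return the batch id to keep for the following steps.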

Uploading a File in One Go (not Resumable)

File upload

POST /api/v1/upload/<myBatchId>/0
X-File-Name: myFile.doc
X-File-Type: application/msword
-----------------------
The content of the file
...

Or with cURL:

curl -u Administrator:Administrator -X POST -H "X-File-Name:myFile.doc" -H "X-File-Type:application/msword" -T myFile.doc http://<host>:<port>/nuxeo/api/v1/upload/<myBatchId>/0
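For reference, the same request could be prepared in Python roughly as follows. This is only a sketch: build_upload_request is a hypothetical helper, the whole file content is assumed to fit in memory, and the Content-Type value is an assumption rather than something the endpoint is documented to require.

```python
import urllib.request


def build_upload_request(base_url, batch_id, content, file_name, file_type,
                         file_index=0, auth_header=None):
    """Prepare, without sending, a single-request upload of `content` (bytes)."""
    headers = {
        "X-File-Name": file_name,
        "X-File-Type": file_type,
        # Assumption: a generic binary content type, so that urllib does not
        # fall back to its form-encoded default for requests with a body.
        "Content-Type": "application/octet-stream",
    }
    if auth_header is not None:
        headers["Authorization"] = auth_header
    url = f"{base_url}/upload/{batch_id}/{file_index}"
    return urllib.request.Request(url, data=content, headers=headers, method="POST")
```

The returned object can be passed to urllib.request.urlopen once the file content has been read in binary mode.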

Batch File Verification

You can verify that the file has actually been uploaded.

GET /api/v1/upload/<myBatchId>

Or with cURL:

curl -u Administrator:Administrator -G http://<host>:<port>/nuxeo/api/v1/upload/<myBatchId>

Response:

200 OK
[{"name": "myFile.doc", "size": 115090, "uploadType": "normal"}]
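A client can check this response programmatically. The sketch below assumes the body is the JSON array shown above; find_uploaded_file is a hypothetical helper name.

```python
import json


def find_uploaded_file(body, file_name, expected_size=None):
    """Return the batch entry named `file_name`, or None if it is absent.

    `body` is the JSON array returned by GET /api/v1/upload/<myBatchId>.
    When `expected_size` is given, the entry must also match that size.
    """
    for entry in json.loads(body):
        if entry.get("name") != file_name:
            continue
        if expected_size is not None and entry.get("size") != expected_size:
            continue
        return entry
    return None
```

Comparing the reported size with the size of the local file is a cheap way to detect a truncated upload.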

Uploading a File in Chunks (Resumable)

Such an upload allows:

  • A simple resume process that does not require seeking to a specific byte.
  • Multiplexing / parallelizing the upload of the different chunks.

This is a standard approach, well described in the Google Drive API documentation about Resumable Upload.

Here is an example of a resumable upload of a file cut up into 5 chunks.

Uploading Chunk i out of 5

This step is repeated 5 times, once for each chunk. Let's start with <i> = 0.

POST /api/v1/upload/<myBatchId>/0
X-Upload-Type: chunked
X-Upload-Chunk-Index: <i>
X-Upload-Chunk-Count: 5
X-File-Name: myFile.doc
X-File-Type: application/msword
X-File-Size: 115090
-----------------------
The content of the chunk
...

Or with cURL:

curl -u Administrator:Administrator -X POST -H "X-Upload-Type:chunked" -H "X-Upload-Chunk-Index:<i>" -H "X-Upload-Chunk-Count:5" -H "X-File-Name:myFile.doc" -H "X-File-Type:application/msword" -H "X-File-Size:115090" -T <chunk_i> http://<host>:<port>/nuxeo/api/v1/upload/<myBatchId>/0
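How the file is cut into chunks is up to the client. As an illustration, the following sketch splits the content by a fixed chunk size and builds the headers listed above for each chunk; both helper names are hypothetical, and the chunk size of 23018 bytes in the usage note is chosen only because it divides the 115090-byte example file into exactly 5 chunks.

```python
def split_by_size(data, chunk_size):
    """Cut `data` (bytes) into consecutive chunks of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[start:start + chunk_size]
            for start in range(0, len(data), chunk_size)]


def chunk_headers(index, chunk_count, file_name, file_type, file_size):
    """Build the headers sent with chunk number `index` (0-based)."""
    return {
        "X-Upload-Type": "chunked",
        "X-Upload-Chunk-Index": str(index),
        "X-Upload-Chunk-Count": str(chunk_count),
        "X-File-Name": file_name,
        "X-File-Type": file_type,
        "X-File-Size": str(file_size),  # size of the whole file, not the chunk
    }
```

With split_by_size(content, 23018), the length of the returned list gives the value to send as X-Upload-Chunk-Count.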

Response: there are three cases here.

  1. The chunk has been uploaded but the file is incomplete, meaning some chunks are missing.

    308 Resume Incomplete
    {"batchId": "<myBatchId>", "fileIdx": "0", "uploadType": "chunked", "uploadedSize": chunkSize, "uploadedChunkIds": [0, 1, 2], "chunkCount": 5}
    

    => Repeat the step Uploading Chunk i out of 5 with X-Upload-Chunk-Index set to the index of the next chunk to upload, the easiest choice being <i + 1>. At this point you can also send a request to check which chunks are complete and determine the next chunk to upload.

  2. The chunk has been uploaded and the file is now complete, meaning this was the last chunk to upload.

    201 Created
    {"batchId": "<myBatchId>", "fileIdx": "0", "uploadType": "chunked", "uploadedSize": chunkSize, "uploadedChunkIds": [0, 1, 2, 3, 4], "chunkCount": 5}
    

    => End of upload.

  3. The request is interrupted, or you receive HTTP 503 Service Unavailable or any other 5xx response from the server: go to the Resume an Interrupted Upload step.
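The three cases above can be mapped to client-side states. The sketch below uses the status codes given in this section; the enum and function names are hypothetical, and passing None to represent a dropped connection is a convention of this example only.

```python
from enum import Enum


class UploadState(Enum):
    INCOMPLETE = "incomplete"    # more chunks to send
    COMPLETE = "complete"        # last chunk received, file is whole
    INTERRUPTED = "interrupted"  # go to the resume step


def classify_chunk_response(status_code):
    """Map the status of a chunk upload request to the next client action.

    `status_code` is the HTTP status, or None if no response was received.
    """
    if status_code is None or 500 <= status_code <= 599:
        return UploadState.INTERRUPTED
    if status_code == 308:
        return UploadState.INCOMPLETE
    if status_code == 201:
        return UploadState.COMPLETE
    raise ValueError(f"unexpected status for a chunk upload: {status_code}")
```

Any status outside these cases is raised as an error here rather than guessed at, since this section does not describe how the server reports other failures.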

Resume an Interrupted Upload

Note the importance here of having saved the batch id: it can be seen as a resumable upload session id.

GET /api/v1/upload/<myBatchId>/0

Or with cURL:

curl -u Administrator:Administrator -G http://<host>:<port>/nuxeo/api/v1/upload/<myBatchId>/0

Response: again there are three cases here.

  1. The file is incomplete.

    308 Resume Incomplete
    {"name": "myFile.doc", "size": 115090, "uploadType": "chunked", "uploadedChunkIds": [0, 1, 2], "chunkCount": 5}
    

    => Repeat the step Uploading Chunk i out of 5 with X-Upload-Chunk-Index = index of the next chunk to upload, in this case 3.

  2. The file is now complete, meaning all chunks have been uploaded. This could happen if the connection broke after all bytes were uploaded but before the client received a response from the server.

    200 OK
    {"name": "myFile.doc", "size": 115090, "uploadType": "chunked", "uploadedChunkIds": [0, 1, 2, 3, 4], "chunkCount": 5}
    

    => End of upload.

  3. The request is interrupted, or you receive HTTP 503 Service Unavailable or any other 5xx response from the server: repeat the Resume an Interrupted Upload step.
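The resume logic can be sketched as follows. The two callables are hypothetical stand-ins for the HTTP calls described above: get_status() is assumed to return the status code and the parsed JSON body of the GET request, and send_chunk(i) is assumed to upload chunk i. Injecting them keeps the example independent of any HTTP library.

```python
def missing_chunks(status_body):
    """List the chunk indexes that the server has not received yet."""
    uploaded = set(status_body["uploadedChunkIds"])
    return [i for i in range(status_body["chunkCount"]) if i not in uploaded]


def resume_upload(get_status, send_chunk):
    """Upload whatever is still missing and return the indexes that were sent."""
    status_code, body = get_status()
    if status_code == 200:
        return []  # the file is already complete, nothing to do
    if status_code != 308:
        raise RuntimeError(f"unexpected status while resuming: {status_code}")
    sent = []
    for index in missing_chunks(body):
        send_chunk(index)
        sent.append(index)
    return sent
```

With the incomplete response shown above, missing_chunks returns [3, 4], so only the last two chunks are sent again.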

Best Practices

You should follow the Best Practices advised in the Google Drive API documentation about File Upload, especially the Exponential backoff strategy.
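As an illustration of exponential backoff, here is a minimal sketch. It is not taken from the Nuxeo or Google documentation: TransientError, the default delays, and the added random jitter are all assumptions of this example, and the sleep and rng parameters exist only so that the behaviour can be tested without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a 5xx response or a dropped connection."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 1s of jitter
            delay = min(max_delay, base_delay * 2 ** attempt) + rng()
            sleep(delay)
```

The operation passed in would typically wrap a chunk upload or a resume request and raise TransientError when the request is interrupted or answered with a 5xx status.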

Creating a Document from an Uploaded File

You can create a document of type File and attach to it a file uploaded to a given batch by using a specific syntax for the file:content property.

Whether the file was uploaded in one go or in chunks makes no difference here.

POST /api/v1/path/default-domain/workspaces/myworkspace
{
  "entity-type": "document",
  "name":"myNewDoc",
  "type": "File",
  "properties" : {
    "dc:title":"My new doc",
    "file:content": {
      "upload-batch":"<myBatchId>",
      "upload-fileId":"0"
    }
  }
}

Or with cURL:

curl -X POST -H 'Content-Type: application/json' -u Administrator:Administrator -d '{"entity-type": "document", "name": "myNewDoc", "type": "File", "properties": {"dc:title": "My new doc", "file:content": {"upload-batch": "<myBatchId>", "upload-fileId": "0"}}}' http://<host>:<port>/nuxeo/api/v1/path/default-domain/workspaces/myworkspace
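The same request could be prepared from Python as sketched below. build_create_document_request is a hypothetical helper; the payload mirrors the JSON shown above, and parent_path is the repository path of the target container, for example default-domain/workspaces/myworkspace.

```python
import json
import urllib.request


def build_create_document_request(base_url, parent_path, name, title, batch_id,
                                  file_index=0, auth_header=None):
    """Prepare, without sending, the POST that creates a File document."""
    payload = {
        "entity-type": "document",
        "name": name,
        "type": "File",
        "properties": {
            "dc:title": title,
            # Reference to the file previously uploaded to the batch
            "file:content": {
                "upload-batch": batch_id,
                "upload-fileId": str(file_index),
            },
        },
    }
    headers = {"Content-Type": "application/json"}
    if auth_header is not None:
        headers["Authorization"] = auth_header
    return urllib.request.Request(
        f"{base_url}/path/{parent_path}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send the request.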

Finally, you can now access the content of your file by pointing to the following resource:

GET /api/v1/path/default-domain/workspaces/myworkspace/myNewDoc/@blob/file:content