The Nuxeo Platform allows you to upload binaries under a given "batch ID" on the server and then reference the batch ID when posting a document resource, or when fetching it from a custom Automation chain. For instance, if you need to create a file with some binary content, you first have to upload the file into the BatchManager. It's a place on the system where you can upload temporary files to bind them later.
There are two ways to upload a file:
- In one go: the full content of the file is transferred to the server as a binary stream in a single HTTP request. Such an upload is not resumable: in case of interruption you will need to start all over again.
- In chunks: the file content is transferred to the server as several binary streams in separate HTTP requests. This upload is resumable: in case of interruption you will only need to upload the remaining chunks.
Before uploading any file or chunk you need to initialize an upload batch.
Batch Initialization
POST http://NUXEO_SERVER/nuxeo/api/v1/upload/
Or with cURL:
curl -u Administrator:Administrator -X POST http://NUXEO_SERVER/nuxeo/api/v1/upload
Response:
201 Created
{"batchId": myBatchId}
Save this batch ID as it will be used in subsequent requests.
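As an illustration, the batch initialization could be prepared in Python with the standard library only. This is a minimal sketch, not an official client: the helper names build_batch_init_request and parse_batch_id are hypothetical, the request is built but not sent, and the credentials are the same sample ones used in the cURL call.

```python
import base64
import json
import urllib.request


def build_batch_init_request(server, user, password):
    """Build (without sending) the POST request that initializes an upload batch."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{server}/nuxeo/api/v1/upload/",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def parse_batch_id(response_body):
    """Extract the batch ID from the JSON body of the 201 Created response."""
    return json.loads(response_body)["batchId"]


# Sending the request requires a reachable server, for example:
# with urllib.request.urlopen(build_batch_init_request(...)) as response:
#     batch_id = parse_batch_id(response.read())
```

Keeping the batch ID in a variable (or persisting it) is what makes the later upload, verification, and resume requests possible.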
Uploading a File in One Go (not Resumable)
File upload
POST http://NUXEO_SERVER/nuxeo/api/v1/upload/<myBatchId>/0
X-File-Name: myFile.doc
X-File-Type: application/msword
-----------------------
The content of the file
...
Or with cURL:
curl -u Administrator:Administrator -X POST -H "X-File-Name:myFile.doc" -H "X-File-Type:application/msword" -T myFile.doc http://NUXEO_SERVER/nuxeo/api/v1/upload/<myBatchId>/0
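The same single-request upload could be sketched in Python as follows. The function name build_upload_request is hypothetical, authentication is left out for brevity, and the request is only built, not sent.

```python
import urllib.request


def build_upload_request(server, batch_id, file_index, file_name, mime_type, content):
    """Build (without sending) a one-go upload of `content` into a batch slot."""
    url = f"http://{server}/nuxeo/api/v1/upload/{batch_id}/{file_index}"
    headers = {
        "X-File-Name": file_name,
        "X-File-Type": mime_type,
    }
    # The whole file travels as the request body in a single HTTP request.
    return urllib.request.Request(url, data=content, method="POST", headers=headers)
```

For example, build_upload_request("NUXEO_SERVER", batch_id, 0, "myFile.doc", "application/msword", data) targets the first slot (index 0) of the batch.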
Batch File Verification
You can verify that the file has actually been uploaded.
GET /api/v1/upload/<myBatchId>
Or with cURL:
curl -u Administrator:Administrator -G http://NUXEO_SERVER/nuxeo/api/v1/upload/<myBatchId>
Response:
200 OK
[{"name": "myFile.doc", "size": 115090, "uploadType": "normal"}]
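A client might check the listing returned above before going further. The sketch below assumes the response body has the shape shown in the example; find_uploaded_file is a hypothetical helper name.

```python
import json


def find_uploaded_file(listing_body, file_name):
    """Return the batch listing entry for `file_name`, or None if it is absent."""
    for entry in json.loads(listing_body):
        if entry.get("name") == file_name:
            return entry
    return None


def is_fully_uploaded(listing_body, file_name, expected_size):
    """Check that the file is listed and that its size matches the local file."""
    entry = find_uploaded_file(listing_body, file_name)
    return entry is not None and entry.get("size") == expected_size
```

Comparing the reported size with the local file size is a cheap sanity check, not a substitute for a checksum.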
Uploading a File in Chunks (Resumable)
This upload allows you to:
- Have a simple resume process that does not require starting the upload from the beginning to be able to access a specific byte.
- Upload the different chunks in parallel.
This is the standard approach, as described in the Google Drive API documentation, Resumable Upload.
Here is an example of the resumable upload of a file cut up into 5 chunks.
Uploading Chunk i out of 5
This step will be repeated 5 times, once for each chunk. Let's just start with <i> = 0.
POST /api/v1/upload/<myBatchId>/0
X-Upload-Type: chunked
X-Upload-Chunk-Index: <i>
X-Upload-Chunk-Count: 5
X-File-Name: myFile.doc
X-File-Type: application/msword
X-File-Size: 115090
-----------------------
The content of the chunk
...
Or with cURL:
curl -u Administrator:Administrator -X POST -H "X-Upload-Type:chunked" -H "X-Upload-Chunk-Index:<i>" -H "X-Upload-Chunk-Count:5" -H "X-File-Name:myFile.doc" -H "X-File-Type:application/msword" -H "X-File-Size:115090" -T <chunk_i> http://NUXEO_SERVER/nuxeo/api/v1/upload/<myBatchId>/0
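To prepare such chunked requests programmatically, a client first has to cut the file into chunks and compute the headers for each one. The following sketch shows one way to do it; split_into_chunks and chunk_headers are hypothetical helpers, and the chunk size is a client-side choice, not something imposed by the server.

```python
def split_into_chunks(content, chunk_size):
    """Cut `content` (bytes) into consecutive chunks of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


def chunk_headers(index, count, file_name, mime_type, file_size):
    """Headers for uploading chunk `index` out of `count` chunks of one file."""
    return {
        "X-Upload-Type": "chunked",
        "X-Upload-Chunk-Index": str(index),
        "X-Upload-Chunk-Count": str(count),
        "X-File-Name": file_name,
        "X-File-Type": mime_type,
        # X-File-Size is the size of the whole file, not of the chunk.
        "X-File-Size": str(file_size),
    }
```

With the 115090-byte file of the example and a chunk size of 23018 bytes, this yields exactly 5 chunks, each sent with its own index.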
Response: there are 3 cases here.
- The chunk has been uploaded but the file is incomplete, meaning some chunks are missing.
308 Resume Incomplete {"batchId": myBatchId, "fileIdx": "0", "uploadType": "chunked", "uploadedSize": chunkSize, "uploadedChunkIds": [0, 1, 2], "chunkCount": 5}
=> Repeat the step Uploading Chunk i out of 5 with X-Upload-Chunk-Index = index of the next chunk to upload, the easiest being <i + 1>. At this point a request to verify the chunk completion and determine the next chunk to upload can be made.
- The chunk has been uploaded and the file is now complete, meaning this was the last chunk to upload.
201 Created {"batchId": myBatchId, "fileIdx": "0", "uploadType": "chunked", "uploadedSize": chunkSize, "uploadedChunkIds": [0, 1, 2, 3, 4], "chunkCount": 5}
=> End of upload.
- The request is interrupted, or you receive HTTP 503 Service Unavailable or any other 5xx response from the server.
=> Go to the Resume an Interrupted Upload step.
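The three outcomes of a chunk upload can be summarized as a small decision function. This is only a sketch of the logic described above: the function name and the returned labels are hypothetical, and None stands for a request that was interrupted before any response arrived.

```python
def next_step_after_chunk(status_code):
    """Map the outcome of a chunk upload to the next step to take."""
    if status_code == 201:
        # Last missing chunk received: the file is complete.
        return "done"
    if status_code == 308:
        # Chunk stored, but other chunks are still missing.
        return "upload_next_chunk"
    if status_code is None or 500 <= status_code <= 599:
        # Interrupted request or server error: query the upload status.
        return "resume"
    return "unexpected"
```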
Resume an Interrupted Upload
Note the importance here of having saved the batch ID: it can be seen as a resumable upload session ID.
GET /api/v1/upload/<myBatchId>/0
Or with cURL:
curl -u Administrator:Administrator -G http://NUXEO_SERVER/nuxeo/api/v1/upload/<myBatchId>/0
Response: again there are 3 cases here.
- The file is incomplete.
308 Resume Incomplete {"name": "myFile.doc", "size": 115090, "uploadType": "chunked", "uploadedChunkIds": [0, 1, 2], "chunkCount": 5}
=> Repeat the step Uploading Chunk i out of 5 with X-Upload-Chunk-Index = index of the next chunk to upload, in this case 3.
- The file is now complete, meaning all chunks have been uploaded. This could happen if the connection broke after all bytes were uploaded but before the client received a response from the server.
200 OK {"name": "myFile.doc", "size": 115090, "uploadType": "chunked", "uploadedChunkIds": [0, 1, 2, 3, 4], "chunkCount": 5}
=> End of upload.
- The request is interrupted, or you receive HTTP 503 Service Unavailable or any other 5xx response from the server.
=> Repeat the Resume an Interrupted Upload step.
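When resuming, the chunks still to upload can be derived from the status response by comparing uploadedChunkIds with chunkCount. The sketch below assumes the JSON body has the shape shown in the examples of this step; missing_chunk_indices is a hypothetical helper name.

```python
import json


def missing_chunk_indices(status_body):
    """Return the sorted indices of the chunks the server has not received yet."""
    status = json.loads(status_body)
    uploaded = set(status["uploadedChunkIds"])
    return [i for i in range(status["chunkCount"]) if i not in uploaded]
```

An empty list means the file is complete. Otherwise the first element is the next chunk to upload, and because chunks can be uploaded in parallel, the whole list can also be dispatched at once.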
Best Practices
You should follow the Best Practices advised in the Google Drive API documentation about File Upload, especially the Exponential backoff strategy.
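As an illustration of exponential backoff, here is a minimal sketch in which the wait time doubles after each failed attempt, is capped, and gets a small random jitter added. The function names, the default values (1 second base, 32 second cap, 5 retries), and the convention that operation() returns an HTTP status or None for an interrupted request are all assumptions made for this example.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=32.0, jitter=random.random):
    """Yield the wait time (in seconds) to apply before each retry."""
    for attempt in range(max_retries):
        # Double the delay at each attempt, cap it, then add up to 1 s of jitter.
        yield min(cap, base * 2 ** attempt) + jitter()


def call_with_backoff(operation, max_retries=5, sleep=time.sleep, jitter=random.random):
    """Call operation() again while it reports an interruption or a 5xx status."""
    status = operation()
    for delay in backoff_delays(max_retries, jitter=jitter):
        if status is not None and status < 500:
            break
        sleep(delay)
        status = operation()
    return status
```

The sleep and jitter parameters are injectable so the retry logic can be exercised without actually waiting.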
Creating a Document from an Uploaded File
You can create a document of type File and attach to it a file uploaded to a given batch by using the specific syntax on the file:content property.
Whether the file has been uploaded in one go or in chunks does not matter here.
POST /api/v1/path/default-domain/workspaces/myworkspace
{
"entity-type": "document",
"name":"myNewDoc",
"type": "File",
"properties" : {
"dc:title":"My new doc",
"file:content": {
"upload-batch":"<myBatchId>",
"upload-fileId":"0"
}
}
}
Or with cURL:
curl -X POST -H 'Content-Type: application/json' -u Administrator:Administrator -d '{"entity-type": "document", "name": "myNewDoc", "type": "File", "properties": {"dc:title": "My new doc", "file:content": {"upload-batch": "<myBatchId>", "upload-fileId": "0"}}}' http://NUXEO_SERVER/nuxeo/api/v1/path/default-domain/workspaces/myworkspace
Finally, you can now access the content of your file by pointing to the following resource:
GET /api/v1/path/default-domain/workspaces/myworkspace/myNewDoc/@blob/file:content
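The document creation payload shown earlier in this section could be generated rather than written by hand. The sketch below builds the same JSON structure; build_document_payload is a hypothetical helper, and the result would be sent as the body of the POST request with a Content-Type of application/json.

```python
import json


def build_document_payload(name, title, batch_id, file_index=0, doc_type="File"):
    """Build the JSON body that creates a document bound to an uploaded file."""
    return {
        "entity-type": "document",
        "name": name,
        "type": doc_type,
        "properties": {
            "dc:title": title,
            "file:content": {
                # Reference the file by its batch ID and its index in the batch.
                "upload-batch": batch_id,
                "upload-fileId": str(file_index),
            },
        },
    }


def to_request_body(payload):
    """Serialize the payload to the UTF-8 bytes of the request body."""
    return json.dumps(payload).encode("utf-8")
```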
Uploading Several Files and Using Them
The batch-upload API lets you upload several files, and then reference them by their indices in the batch (see examples above, using the "upload-fileId": "0" field).
Something important to understand is that as soon as a batch is used, it is cleared. So, if you upload 10 files, as soon as you create a document and reference the first file of the batch, the batch is cleared and the other files are lost. Referencing them in the batch afterwards will just create a null blob.
This can be avoided by passing a special header, X-Batch-No-Drop, and setting it to true. When a batch is used and this header exists and is true, the batch is not deleted.
Ultimately, as an example, to create 10 documents in a single batch upload:
- Create the batch and upload the 10 files
- Create the first 9 documents as explained above, passing the X-Batch-No-Drop header set to true.
- For the last one, omit the header so Nuxeo can clean up the batch and free memory and space.
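The steps above can be sketched as a function that prepares one creation request per uploaded file and sets X-Batch-No-Drop on all but the last one. This is an illustrative sketch: plan_document_requests and the generated document names (doc0, doc1, ...) are hypothetical, and the requests are only described as (headers, body) pairs, not sent.

```python
def plan_document_requests(batch_id, file_count):
    """Return a (headers, body) pair for each file of the batch, in index order."""
    planned = []
    for index in range(file_count):
        headers = {"Content-Type": "application/json"}
        if index < file_count - 1:
            # Keep the batch alive for the documents still to be created.
            headers["X-Batch-No-Drop"] = "true"
        body = {
            "entity-type": "document",
            "name": f"doc{index}",
            "type": "File",
            "properties": {
                "file:content": {
                    "upload-batch": batch_id,
                    "upload-fileId": str(index),
                },
            },
        }
        planned.append((headers, body))
    return planned
```

Omitting the header on the final request is what lets the server drop the batch once every file has been bound to a document.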
Documentation can be found here.
Learn More
- Follow the courses Importing Files with the REST API and Importing Documents / REST API Import at Hyland University.