Uploads (aka Imports, aka Batch Processing)¶
POSTing to /rest/v1/upload/ allows you to send many actions for processing to ActionKit in a single request. The actions will be processed using the same code and rules as User Imports in the ActionKit admin.
You should familiarize yourself with the details of User imports before using this API endpoint.
This documentation covers using uploads from the API: the requirements for upload files, the meaning of status values in the returned data, how to track progress, how to access warnings and errors, and how to stop an in-process upload.
Overview¶
Format Your Upload File¶
Your file must:
- Be formatted as a TSV or CSV.
- Include a header row with correct field names.
- Include a column that identifies the user. The choices are "email", "user_id" or "akid".
- Be saved in the UTF-8 encoding.
- Specify times in UTC.
Large files should be compressed using gzip or Zip compression to reduce upload times.
See the User Import documentation for full details on formatting your import file.
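The formatting requirements above can be sketched in Python using only the standard library. This is a minimal sketch, not ActionKit code: the function name build_upload_bytes and the sample column user_color are assumptions made for illustration.

```python
import csv
import gzip
import io

# Columns that identify the user, as listed in the requirements above.
USER_ID_COLUMNS = {"email", "user_id", "akid"}


def build_upload_bytes(rows, fieldnames):
    """Return rows (a list of dicts) as gzip-compressed, UTF-8 encoded TSV bytes."""
    if not USER_ID_COLUMNS & set(fieldnames):
        raise ValueError("header needs one of: email, user_id, akid")
    text = io.StringIO(newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()      # header row with the field names
    writer.writerows(rows)    # one line per action
    return gzip.compress(text.getvalue().encode("utf-8"))
```

The returned bytes can be written to a file opened in binary mode and then uploaded as described in the next section.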
Create A Multipart POST Request¶
How you create a correctly formatted multipart POST will depend on how you are connecting to ActionKit. ActionKit relies on Django's HttpRequest class to parse the incoming POST, which expects a standard multipart/form-data HTTP POST request (see http://tools.ietf.org/html/rfc2388).
All parameters must be sent in the POST payload; parameters in the query string will be ignored.
The request must contain at least two parameters:
- upload, the file to be processed
- page, the name of the ImportPage to use for the actions.
The request can also include:
- autocreate_user_fields, create allowed custom user fields (i.e. "user_xxx") if they don't exist; must be 'true' or 'false'; defaults to 'false'.
By default, we'll raise an error if you send a user field that isn't yet created as an allowed field in your instance. Use this parameter to automatically create those allowed user fields.
- user_fields_only, if 'true', we will attempt a fast custom user field upload. Note that your upload must only contain a user-identifying field (id, akid, email) and custom user field columns. If other columns are present, we'll downgrade to a regular upload.
You must send UTF-8 encoded Unicode data in the uploaded file.
We recommend compressing the file to reduce upload times, but even if you don't compress the file, you should treat your upload file as binary data.
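Because the optional parameters are sent as the strings 'true' and 'false' rather than as booleans, a small helper can keep the conversion in one place. The following is a sketch under that assumption; build_upload_params is a hypothetical name, and leaving out user_fields_only when it is not wanted is a choice made here, not documented behavior.

```python
def build_upload_params(page, autocreate_user_fields=False, user_fields_only=False):
    """Return the non-file POST fields for an upload request."""
    params = {
        "page": page,
        # Must be the string 'true' or 'false', not a Python boolean.
        "autocreate_user_fields": "true" if autocreate_user_fields else "false",
    }
    if user_fields_only:
        # Only sent when the fast custom user field upload is wanted.
        params["user_fields_only"] = "true"
    return params
```

The resulting dictionary would be passed as the data argument of the POST, alongside the upload file.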
Example¶
import sys

import requests

page = 'my_previously_created_import_page'
url = 'https://docs.actionkit.com/rest/v1/upload/'
upload_file = sys.argv[1]

# Open the file in binary mode so it is sent as-is, compressed or not.
with open(upload_file, 'rb') as handle:
    r = requests.post(url,
                      files={'upload': handle},
                      data={'page': page, 'autocreate_user_fields': 'true'},
                      auth=('USER', 'PASS'))

print(r.status_code)
print(r.headers['Location'])
Poll For Progress¶
On success, your initial POST will return a 201 CREATED response. The Location header will point to your new upload.
Example Response¶
HTTP/1.1 201 CREATED
Server: openresty
Date: Tue, 03 Feb 2015 14:44:43 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Location: https://docs.actionkit.com/rest/v1/upload/47/
Set-Cookie: sid=4kcpskldoek31gxw0c7v4yojt43v6rj0; expires=Tue, 03-Feb-2015 22:44:43 GMT; httponly; Max-Age=28800; Path=/
Other possible response status codes are:
STATUS CODE | LIKELY REASON |
---|---|
400 BAD REQUEST. | The response body should contain more detail. |
500 INTERNAL SERVER ERROR. | Contact ActionKit support. |
404 NOT FOUND. | The page parameter does not refer to a valid, non-hidden page. |
401 UNAUTHORIZED. | The credentials were invalid. |
403 FORBIDDEN. | This user does not have permission to perform this action. |
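One way to act on these status codes in an integration is to map each one to its likely reason and raise when the POST did not create an upload. This is only a sketch: the names UPLOAD_POST_REASONS, UploadRejected and upload_location are made up for this example.

```python
# Likely reasons, taken from the table above.
UPLOAD_POST_REASONS = {
    400: "Bad request; the response body should contain more detail.",
    401: "The credentials were invalid.",
    403: "This user does not have permission to perform this action.",
    404: "The page parameter does not refer to a valid, non-hidden page.",
    500: "Internal server error; contact ActionKit support.",
}


class UploadRejected(Exception):
    """Raised when the initial POST does not return 201 CREATED."""


def upload_location(status_code, headers, body=""):
    """Return the Location of the new upload, or raise with the likely reason."""
    if status_code == 201:
        return headers["Location"]
    reason = UPLOAD_POST_REASONS.get(status_code, "Unexpected status code.")
    raise UploadRejected("%s: %s %s" % (status_code, reason, body))
```

With the requests library this could be called as upload_location(r.status_code, r.headers, r.text).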
When you upload a file, it's added to a processing queue. You'll need to poll the Upload to see its current status: GET the Location returned by the upload request until the field is_completed is true.
We'll set is_completed to true if the upload completes without error, but also if there's a problem reading or unpacking the file, if the header is invalid, if there are too many errors to continue, or if you or someone else stops the upload.
Once an upload finishes processing you must check has_errors and has_warnings. We'll cover that in more detail in the section below, Review errors and warnings.
Note that if you're seeing dropped connections rather than a response status code, this may be due to incorrect authorization credentials. Upload attempts that fail to authenticate are disconnected for security reasons. You can check your credentials by making any simple API request with them.
Pseudo-code Polling For Completion¶
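import time

import requests

# upload_uri is the Location header returned by the initial POST.
auth = ('USER', 'PASS')
upload = requests.get(upload_uri, auth=auth).json()
while not upload['is_completed']:
    progress = upload['progress']
    print("%s/s, remaining %ss, %s ok, %s failed, %s warned" % (
        progress['rate'],
        progress['time_remaining'],
        progress['rows']['ok'],
        progress['rows']['failed'],
        progress['rows']['warned']))
    time.sleep(1)
    upload = requests.get(upload_uri, auth=auth).json()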
Response Field Reference¶
Field name | Description |
---|---|
id | Unique identifier for this upload |
resource_uri | Uri of this upload |
line_count | Approximate line count of uploaded file |
path | Internal path to the uploaded file on the ActionKit cluster |
autocreate_user_fields | Boolean indicating if the upload should autocreate user fields in the file |
compression | Boolean indicating if the uploaded file was compressed |
format | Detected format of the uploaded file: 'tsv' or 'csv' |
page | Resource_uri of the ImportPage used to process actions |
created_at | Timestamp when upload was created |
updated_at | Timestamp when upload was last updated |
started_at | Timestamp when upload started processing |
finished_at | Timestamp when upload finished processing |
has_errors | Count of errors found during processing |
errors | URI of the full list of UploadErrors |
has_warnings | Count of warnings found during processing |
warnings | URI of the full list of UploadWarnings |
is_completed | Boolean indicating if the processing is done, whether it was successful or not |
original_header | Header parsed out of the file |
override_header | Corrected header, provided by admin or API |
progress | Dictionary with details of processing progress, see below for details |
status | Current status of upload, see below for possible values |
submitter | URI of user who submitted this Upload for Processing |
stop | URI to stop the upload as soon as possible |
Progress updates:
Field name | Description |
---|---|
rate | Rows per second |
time_remaining | Estimated seconds remaining to finish processing |
rows | Dictionary of total processing counts for 'failed', 'ok' and 'warned' rows |
all | URI of all previous progress reports |
Possible status values:
Status | Description |
---|---|
new | New Upload |
downloading | Downloading file from S3 (not relevant for API uploads) |
unpacking | Unpacking files or archive |
checking | Checking Header |
header_failed | Header Failed Check |
header_ok | Header OK |
loading | Loading Data |
died | Died |
stopped | Stopped |
completed | Completed |
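The fields above can be combined into a one-line summary of an Upload resource. The sketch below assumes only the field names listed in the reference tables; describe_upload is a hypothetical helper, not part of the API.

```python
def describe_upload(upload):
    """Summarize a decoded Upload resource (a dict) in one line."""
    if not upload["is_completed"]:
        return "in progress (%s)" % upload["status"]
    parts = ["finished with status %s" % upload["status"]]
    # has_errors and has_warnings are counts, so zero means none.
    if upload["has_errors"]:
        parts.append("%d errors" % upload["has_errors"])
    if upload["has_warnings"]:
        parts.append("%d warnings" % upload["has_warnings"])
    return ", ".join(parts)
```

Because is_completed is also true for failed or stopped uploads, the summary reports the status together with the error and warning counts instead of treating completion as success.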
Review Errors And Warnings¶
The returned JSON object from the Upload resource includes a count of errors and a count of warnings, as well as links to the full lists of errors and warnings. (Both URIs are simply pointers to resources filtered by the relationship with this upload.)
For example, you will see something like this in the returned JSON.
"errors": "/rest/v1/uploaderror/?upload=47",
"warnings": "/rest/v1/uploadwarning/?upload=47"
You can use those URIs to page through the (potentially very numerous!!) errors and warnings.
Warnings and errors will have useful information even if the uploaded file failed to start processing. Problems with the format, headers and encodings will all be stored in these resources.
Be sure to include a check for errors and warnings in your integration regardless of the status of the upload.
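Paging through the errors or warnings might look like the sketch below. It assumes that list responses carry an "objects" list and a "meta" dictionary whose "next" entry is the URI of the following page (or null on the last page); that layout is an assumption here and should be checked against a real response. The fetch_json argument is a callable you supply that GETs a URI and returns the decoded JSON.

```python
def iter_objects(first_uri, fetch_json):
    """Yield every object from a paginated list resource.

    Assumes each page looks like {"meta": {"next": ...}, "objects": [...]}.
    """
    uri = first_uri
    while uri:
        page = fetch_json(uri)
        for obj in page["objects"]:
            yield obj
        # A missing or null "next" ends the loop.
        uri = page.get("meta", {}).get("next")
```

For example, fetch_json could wrap requests.get with your credentials and host name, and first_uri would be the errors or warnings value from the Upload resource.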
Stop An Upload In Progress¶
Large uploads may take a long time to process. We've provided a stop function for when you know that something is incorrect and you'll need to redo the upload.
POSTing to the URI: /rest/v1/upload/{{ upload.id }}/stop/ will stop the Upload.
This URI is included in the returned JSON for an upload that is being processed. Stopping a processing upload may take several seconds, so keep polling the status if you wish to verify that it was stopped.
The stop endpoint will return a 202 ACCEPTED if the upload was stopped, and a 404 NOT FOUND if the upload id in the URI was not found.
See the next section, Override the header and restart the Upload, for how to restart an upload.
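The stop-then-verify sequence described in this section could be sketched as follows. The post and poll arguments are hypothetical callables supplied by the caller: post(uri) sends the POST and returns the status code, and poll(uri) returns the decoded Upload resource. The max_polls limit is an assumption added so the loop cannot run forever.

```python
import time


def stop_and_wait(upload, post, poll, delay=1.0, max_polls=60):
    """POST to the upload's stop URI, then poll until processing has ended."""
    resource_uri = upload["resource_uri"]
    status_code = post(upload["stop"])
    if status_code == 404:
        raise LookupError("upload not found: %s" % resource_uri)
    if status_code != 202:
        raise RuntimeError("unexpected status code: %s" % status_code)
    for _ in range(max_polls):
        upload = poll(resource_uri)
        # is_completed also covers an upload that finished before the stop took effect.
        if upload["is_completed"]:
            return upload
        time.sleep(delay)
    raise RuntimeError("upload did not stop after %d polls" % max_polls)
```

The returned resource's status field shows whether the upload was actually stopped or had already completed.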
Override The Header And Restart The Upload¶
If your upload can't complete, or has many warnings due to a problem with the header, you can PATCH the Upload with a JSON encoded override_header and restart the upload. Restarting the processing will delete the errors, warnings and progress records from the previous processing run.
Restarting will not undo the previous upload. It will simply re-run the Upload, using any changes you've made to the override_header.
The override_header allows you to rename columns, including using the magical prefix "skip_column" to tell ActionKit to ignore a column.
You must send valid JSON, possibly inside a JSON encoded request, so you'll need to be careful about the escaping of the value. We won't validate the override_header field until you try to re-run the upload.
Let's say you have an Upload whose original_header has two columns, "email" and "user_color". The "email" column is already a valid identifier for users, so it stays unchanged. And let's say you want to ignore the column "user_color" by renaming it to "skip_column_user_color".
The original_header field is a JSON encoded list of field names:
"original_header": "[\"email\", \"user_color\"]",
So you're going to PATCH the upload with a modified list. Note the escaping of JSON within JSON.
"override_header": "[\"email\", \"skip_column_user_color\"]",
The PATCH request returns 202 ACCEPTED:
$ curl -X PATCH -uuser:password https://docs.actionkit.com/rest/v1/upload/50/ \
-d'{ "override_header": "[\"email\", \"skip_column_user_color\"]" }' \
-H'Content-type: application/json' -i
HTTP/1.1 202 ACCEPTED
Server: openresty
Date: Wed, 04 Feb 2015 10:50:16 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Set-Cookie: sid=t586psott5901yocc71mf2d86ljgra7e; expires=Wed, 04-Feb-2015 18:50:16 GMT; httponly; Max-Age=28800; Path=/
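The doubly encoded override_header value used in the PATCH above can be generated rather than escaped by hand. A minimal sketch with the standard json module follows; build_override_header and its skip and rename arguments are hypothetical names used for illustration.

```python
import json


def build_override_header(original_header, skip=(), rename=None):
    """Return a JSON encoded header list derived from original_header.

    original_header is the JSON encoded list from the Upload resource.
    Columns in skip get the "skip_column_" prefix; rename maps old names to new.
    """
    rename = rename or {}
    columns = []
    for name in json.loads(original_header):
        if name in skip:
            columns.append("skip_column_" + name)
        else:
            columns.append(rename.get(name, name))
    return json.dumps(columns)


override = build_override_header('["email", "user_color"]', skip=["user_color"])
# Encoding a second time produces the escaped JSON-within-JSON request body.
body = json.dumps({"override_header": override})
```

Here body is the string to send as the PATCH payload with a Content-type of application/json.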
Now you need to tell ActionKit to restart the upload by POSTing to the restart link in the Upload resource. It will look something like /rest/v1/upload/50/restart/ and will be included in an Upload resource if is_completed = True.
$ curl -X POST -uuser:password -H'Content-type: application/json' \
-i https://docs.actionkit.com/rest/v1/upload/50/restart/
HTTP/1.1 202 ACCEPTED
Server: openresty
Date: Wed, 04 Feb 2015 11:07:18 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Machine-Id: dev.actionkit.com
Vary: Accept,Cookie,Accept-Encoding,User-Agent
Set-Cookie: sid=g89koa7nmjwnillwslvbypung5hz4xsz; expires=Wed, 04-Feb-2015 19:07:18 GMT; httponly; Max-Age=28800; Path=/
$ curl -uuser:password -X GET \
https://docs.actionkit.com/rest/v1/upload/50/ \
| python -mjson.tool
{
"autocreate_user_fields": false,
"compression": "none",
"created_at": "2015-02-03T17:01:33",
"errors": "/rest/v1/uploaderror/?upload=50",
"finished_at": "2015-02-03T17:02:23",
"format": "csv",
"has_errors": 0,
"has_warnings": 0,
"id": 50,
"is_completed": false,
"line_count": 100001,
"original_header": "[\"email\", \"user_color\"]",
"override_header": "[\"email\", \"skip_column_user_color\"]",
"page": "/rest/v1/importpage/13/",
"path": "dev.actionkit.com:upload-4e9faf14-abc6-11e4-8f48-00163e0e21b4.tsv.gz",
"progress": {
"all": "/rest/v1/uploadprogress/?upload=50",
"rate": 0,
"rows": {
"failed": 0,
"ok": 0,
"warned": 0
},
"time_remaining": null
},
"resource_uri": "/rest/v1/upload/50/",
"started_at": "2015-02-03T17:01:38",
"status": "new",
"stop": "/rest/v1/upload/50/stop/",
"submitter": "/rest/v1/user/1/",
"updated_at": "2015-02-04T11:07:18",
"warnings": "/rest/v1/uploadwarning/?upload=50"
}
Note that the status is "new", that errors and warnings have been cleared out, and that is_completed is once again false. You can now resume polling for progress and errors.
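The two curl calls above could be combined into one Python helper. This sketch assumes session behaves like a requests.Session with authentication already configured, that upload_url is the full URL of the Upload resource ending in a slash, and that the restart link is that URL followed by restart/ as in the example; override_and_restart is a hypothetical name.

```python
import json

JSON_HEADERS = {"Content-Type": "application/json"}


def override_and_restart(session, upload_url, override_header):
    """PATCH a JSON encoded override_header, then POST to the restart link."""
    body = json.dumps({"override_header": override_header})
    patched = session.patch(upload_url, data=body, headers=JSON_HEADERS)
    if patched.status_code != 202:
        raise RuntimeError("PATCH failed with %s" % patched.status_code)
    restarted = session.post(upload_url + "restart/", headers=JSON_HEADERS)
    if restarted.status_code != 202:
        raise RuntimeError("restart failed with %s" % restarted.status_code)
```

After it returns, polling the Upload resource again should show the cleared state described above.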
A More Complete Example¶
Here is an example in Python using the requests library.
import sys
import time

import requests
from requests_toolbelt import MultipartEncoderMonitor


def progressing(monitor):
    # Called by the monitor as chunks of the request body are read.
    sys.stdout.write(".")


def authorization():
    return {'auth': ('username', 'password')}


def poll(upload_uri):
    response = requests.get(upload_uri, **authorization())
    if response.status_code != 200:
        raise Exception("Unexpected response code: %s: %s" % (response.status_code, response.content))
    return response.json()


def do_upload(page, url):
    m = MultipartEncoderMonitor.from_fields(
        fields={
            'page': page,
            'autocreate_user_fields': 'true',
            'upload': ('bigger.tsv.gz', open('bigger.tsv.gz', 'rb'), 'text/gzip')
        },
        callback=progressing
    )

    sys.stdout.write("\nStarting upload request: ")
    r = requests.post(url,
                      data=m,
                      headers={'Content-Type': m.content_type},
                      **authorization())
    if r.status_code != 201:
        raise Exception(r.content)

    upload_uri = r.headers['location']
    sys.stdout.write(" uploaded!\n")
    print("Polling for results @ %s." % upload_uri)

    upload = poll(upload_uri)
    try:
        while not upload['is_completed']:
            upload = poll(upload_uri)
            print(upload['status'])
            progress = upload['progress']
            print(" rate %s/s, remaining %ss, %s ok, %s failed, %s warned" % (
                progress['rate'],
                progress['time_remaining'],
                progress['rows']['ok'],
                progress['rows']['failed'],
                progress['rows']['warned']))
            time.sleep(1)
    except KeyboardInterrupt:
        print("Caught interrupt! Stopping upload!")
        requests.post(upload_uri + 'stop/', **authorization())
        upload = poll(upload_uri)
        # is_completed also becomes true if the upload finished before the stop took effect.
        while not upload['is_completed']:
            upload = poll(upload_uri)
            print("status: %s, waiting for stop" % upload['status'])
            time.sleep(1)

    print("Done!")
    if upload['has_errors']:
        print("Errors: %s" % upload['errors'])
    if upload['has_warnings']:
        print("Warnings: %s" % upload['warnings'])


# Page.name of a previously created ImportPage
page = 'my_previously_created_import_page'
# The full URI of the Upload resource endpoint
url = 'https://docs.actionkit.com/rest/v1/upload/'
do_upload(page, url)