Monday, November 17, 2014

Workflow for Adding Files in Parts for Portal or ArcGIS Online

Python 2.7.5 is not very good at handling big files.  Actually, it stinks, especially if you need to upload them via a multi-part POST.  There are many third-party solutions, but I found issues with both poster and requests with the requests-toolbelt add-on.  They simply didn't work.  Luckily for us, the folks who designed the AGOL/Portal REST API provided an alternative way of uploading large files.  The method is called Add Item Part.  The documentation can be found here: http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Add_Item_Part/02r300000094000000/.

To figure out how to use this function, you also need to read the documentation for the Add Item method, which explains how Add Item Part should be used.  The workflow is as follows:
  1. Call Add Item with no file data, and pass in multipart=True along with filename set to the file's name, including its extension.
  2. Break the file into parts.  Every part must be at least 5 MB, except for the last chunk, which marks the end of the file. (I break everything into 50 MB chunks because I don't want to perform tons of POST calls.)
  3. Call Add Item Part using a POST.  Pass in the partNum parameter, which is a unique integer from 1 to 10,000.  Each part needs its own partNum, or parts will overwrite each other.  Also, pass in that part's file data via multi-part POST.
  4. Now call the Commit function to tell the server, hey, I'm done.  This is an asynchronous call, so don't forget to check the item's status before moving on, or you will not be able to update the item's properties.
    • If you look at your 'My Content' page on AGOL at this point, you'll notice an item with everything shown as 'null' for the title, type, etc.  All you can do with the item right now is delete it.  This makes step 5 very important.
  5. Update the item using the updateItem REST call.  Here we state the type, title, tags, etc.: all the good stuff users need in order to find and access the data.
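
The steps above can be sketched in Python roughly as follows.  This is only a minimal sketch, not the ArcREST implementation.  The `api` object and its methods (`add_item`, `add_part`, `commit`, `wait_until_committed`, `update_item`) are hypothetical stand-ins for whatever code you use to make the actual REST calls, and `iter_file_parts` is a name made up for this example.  The point is the order of the calls and the fact that only one chunk is held in memory at a time.

```python
import os

MIN_PART_SIZE = 5 * 1024 * 1024       # every part except the last must be at least 5 MB
DEFAULT_PART_SIZE = 50 * 1024 * 1024  # the 50 MB chunk size used in this post
MAX_PARTS = 10000                     # partNum must be between 1 and 10,000


def iter_file_parts(path, part_size=DEFAULT_PART_SIZE):
    """Yield (part_num, data) pairs, reading one chunk of the file at a time."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MB")
    part_num = 0
    with open(path, "rb") as handle:
        while True:
            data = handle.read(part_size)
            if not data:
                break
            part_num += 1
            if part_num > MAX_PARTS:
                raise ValueError("more than 10,000 parts; use a larger part_size")
            yield part_num, data


def upload_in_parts(path, api, properties, part_size=DEFAULT_PART_SIZE):
    """Run the five-step workflow against a hypothetical `api` wrapper."""
    # Step 1: create the empty item, flagged as a multi-part upload.
    item_id = api.add_item(multipart=True, filename=os.path.basename(path))
    # Steps 2 and 3: send each chunk with its own unique part number.
    for part_num, data in iter_file_parts(path, part_size):
        api.add_part(item_id, part_num, data)
    # Step 4: commit, then wait for the asynchronous commit to finish.
    api.commit(item_id)
    api.wait_until_committed(item_id)
    # Step 5: fill in the title, type, tags, and so on.
    api.update_item(item_id, properties)
    return item_id
```

A call would look something like `upload_in_parts(r"C:\data\big.zip", my_api, {"title": "Big file", "type": "File Geodatabase", "tags": "demo"})`, where `my_api` is your own wrapper around the REST calls and the path and property values are made up for illustration.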

Great, now we have the item in AGOL or Portal.  Simple, right?  Not really, but it's a great way to get around Python 2.7.5's annoying memory issue.  I haven't tested this on x64 Python or Python 3.4.  Hopefully some of my readers will post a comment letting me know if the memory issues with StringIO/cStringIO still exist when performing large file POSTs.


Add Item Part support is in ArcREST today!

Get ArcREST: http://www.github.com/Esri/ArcREST
Also vote for my idea for a GUI builder for Python Add-Ins.