Another GIS Blog: 2016

Wednesday, December 14, 2016

Microsoft Compiler for Python 2.7

Doesn't everyone hate this message:

Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat).

I sure do, and I solved it by downloading a helpful program from Microsoft! Don't believe me, just google it! Install Microsoft's compiler for Python 2.7 from here: https://www.microsoft.com/en-us/download/details.aspx?id=44266 and most of the pip installs should work!

Enjoy

Thursday, December 1, 2016

So long VBA and Thanks for all the Memories

Microsoft has stopped providing fixes or troubleshooting for VB and VBA. Esri just announced the same in the 10.5 release. It's time to update that old code.

Some options for moving your old desktop applications forward are:

.NET
Python

It IS time to re-evaluate the application and see how it fits into the other development frameworks.

VB/VBA is officially dead.

Check out the article here: (https://blogs.esri.com/esri/arcgis/2016/11/14/arcgis-desktop-and-vba-moving-forward/)

Monday, September 19, 2016

Configuring Jupyter Notebook Startup Folder

By default, when you install Jupyter Notebook (formerly IPython Notebook), the product will point to Windows' My Documents folder. I find this to be less than optimal because My Documents can contain a mishmash of various documents. To change the startup directory, there is a run-time option where you can specify a folder, but that is not a permanent solution. A better solution is to create a configuration file.

After Jupyter is installed (I used Anaconda's distribution of Python 3.5), navigate to the folder containing jupyter.exe

Type the following: jupyter notebook --generate-config
This will generate an entry in your user profile: ~/.jupyter
Edit the jupyter_notebook_config.py file and find c.NotebookApp.notebook_dir
Uncomment the entry and enter in your path
Save any file changes and start jupyter

The ipython notebooks should now be saved in your new directory.
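If you would rather script the edit than do it by hand, here is a minimal sketch of the uncomment-and-set step. The helper name set_notebook_dir and the folder D:/notebooks are my own placeholders, not part of Jupyter; the only thing taken from the steps above is the c.NotebookApp.notebook_dir setting.

```python
import re


def set_notebook_dir(config_text, folder):
    """Return config_text with c.NotebookApp.notebook_dir set to folder.

    Hypothetical helper: it uncomments the existing entry if it finds one,
    otherwise it appends a new entry at the end of the text.
    """
    new_line = "c.NotebookApp.notebook_dir = {!r}".format(folder)
    pattern = re.compile(
        r"^#?\s*c\.NotebookApp\.notebook_dir\s*=.*$", re.MULTILINE
    )
    if pattern.search(config_text):
        # a function replacement avoids backslash escapes in Windows paths
        return pattern.sub(lambda match: new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


# Example: the commented-out default entry becomes an active setting.
sample = "# c.NotebookApp.notebook_dir = ''\n"
print(set_notebook_dir(sample, "D:/notebooks"))
```

In practice you would read jupyter_notebook_config.py from your ~/.jupyter folder, pass its text through the helper, and write the result back.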

Friday, August 19, 2016

Pandas DataFrame as a Process Tracker (postgres example)

Sometimes you need to keep track of the number of rows processed for a given table.

Let's assume you are working in postgres and you want to do row-by-row operations to do some sort of data manipulation. Your user requires you to keep track of each row's changes and wants to know the number of failed updates and the number of successful updates. The output must be in a text file with pretty formatting.

There are many ways to accomplish this task, but let's use Pandas, an arcpy.da.UpdateCursor, and some SQL.


import arcpy
import pandas as pd
#--------------------------------------------------------------------------
def create_tracking_table(sde, tables):
    """
    creates a pandas dataframe from a sql statement
    Input:
       sde - sde connection file
       tables - names of the tables to get the counts for
    Output:
       Pandas DataFrame with column names: Table_Name, Total_Rows,
       Processed and Errors
    """
    desc = arcpy.Describe(sde)
    connectionProperties = desc.connectionProperties
    # the schema is assumed to match the connection's user name
    username = connectionProperties.user
    sql = """SELECT
       nspname AS schemaname,relname,reltuples
    FROM pg_class C
     LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE
       nspname NOT IN ('pg_catalog', 'information_schema') AND
       relkind='r' AND
       nspname='{schema}' AND
       relname in ({tables})
    ORDER BY reltuples DESC;""".format(
                                   schema=username,
                                   tables=",".join(["'%s'" % t for t in tables])
                               )
    columns = ['schemaname','Table_Name','Total_Rows']
    con = arcpy.ArcSDESQLExecute(sde)
    rows = con.execute(sql)
    count_df = pd.DataFrame.from_records(rows, columns=columns)
    del count_df['schemaname']
    # counters that get incremented while the rows are processed
    count_df['Processed'] = 0
    count_df['Errors'] = 0
    return count_df

Now we have a function that will return a DataFrame object from a SQL statement. It contains 4 columns: Table_Name, Total_Rows, Processed, and Errors. Table_Name is the name of the table in the database. Total_Rows is the row count of the table (reltuples is postgres' own estimate, so it can lag behind the true count until the table has been analyzed). Processed is the counter you increment every time a row gets updated successfully. Errors is the counter you increment every time an update fails.

So let's use what we just made:

count_df = create_tracking_table(sde, tables)
for table in tables:
    with arcpy.da.UpdateCursor(table, "*") as urows:
        for urow in urows:
            try:
                urow[3] += 1
                urows.updateRow(urow)
                # the update worked, so count the row as processed
                count_df.loc[count_df['Table_Name'] == table, 'Processed'] += 1
            except Exception:
                # the update failed, so count the row as an error
                count_df.loc[count_df['Table_Name'] == table, 'Errors'] += 1

The pseudocode above shows that whenever an exception is raised, 'Errors' gets 1 added to it, and whenever a row is updated successfully, 'Processed' gets 1 added to it.

The third part of the task was to output the count table to a text file which can be done easily using the to_string() method.

# out_file is a placeholder for the path to your output text file
with open(out_file, 'w') as writer:
    writer.write(count_df.to_string(index=False, col_space=12, justify='left'))
    writer.flush()

So there you have it. We have a nice human readable output table in a text file.
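If you want to try the counting and formatting logic without a database, here is a small sketch that swaps the cursor for made-up in-memory rows. The table names, the row values, and the use of None to simulate a failed update are all my assumptions for illustration; only the .loc counter updates and the to_string() call come from the post.

```python
import pandas as pd

# Hypothetical tracking table; in the post it comes from create_tracking_table()
count_df = pd.DataFrame({
    "Table_Name": ["parcels", "roads"],
    "Total_Rows": [3, 2],
})
count_df["Processed"] = 0
count_df["Errors"] = 0

# Made-up rows standing in for the update cursor; None simulates a bad value
fake_rows = {"parcels": [1, None, 3], "roads": [10, 20]}

for table, values in fake_rows.items():
    is_table = count_df["Table_Name"] == table
    for value in values:
        try:
            value + 1  # raises TypeError for None, like a failed update
            count_df.loc[is_table, "Processed"] += 1
        except TypeError:
            count_df.loc[is_table, "Errors"] += 1

# Same formatting call as in the post
report = count_df.to_string(index=False, col_space=12, justify="left")
print(report)
```

With this data, parcels ends up with 2 processed rows and 1 error, and roads with 2 processed rows and no errors.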

Enjoy

Wednesday, August 3, 2016

More on Pandas Data Loading with ArcGIS (Another Example)

Large datasets can be a major problem with systems that are running 32-bit Python because there is an upper limit on memory use: 2 GB. Most times programs fail before they even hit the 2 GB mark, but there it is.

When working with large data that cannot fit into the 2 GB of RAM, how can we push the data into DataFrames?

One way is to chunk it into groups:


#--------------------------------------------------------------------------
def grouper_it(n, iterable):
    """
    creates chunks of cursor row objects to make the memory
    footprint more manageable
    """
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

This code takes an iterable object (its iterator has next() defined in Python 2.7, or __next__() in Python 3) and yields successive iterators of up to n items each, where n is a positive whole number (integer).
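To see the chunking on its own, here is a quick sketch that runs the same function over a plain range instead of a cursor. The function is repeated so the snippet runs by itself; the chunk size of 3 and the range of 7 numbers are arbitrary values picked for illustration.

```python
import itertools


def grouper_it(n, iterable):
    """Yield successive iterators of up to n items from iterable."""
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)


# Each chunk is consumed with list() before the next one is requested
chunks = [list(chunk) for chunk in grouper_it(3, range(7))]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Note that every chunk reads from the same underlying iterator, so each one has to be fully consumed before you ask for the next.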

Example Usage:


import itertools
import os
import json
import arcpy
import pandas as pd

# fc is the path to your feature class
with arcpy.da.SearchCursor(fc, ["Field1", "Field2"]) as rows:
    groups = grouper_it(n=50000, iterable=rows)
    for i, group in enumerate(groups):
        df = pd.DataFrame.from_records(group, columns=rows.fields)
        df['Field1'] = "Another Value"
        # append each chunk, writing the header only with the first one
        df.to_csv(r"\\sever\test.csv", mode='a', header=(i == 0))
        del group
        del df
    del groups

This is one way to manage your memory footprint by loading records in smaller bits.

Some considerations on 'n': I found that the following affect how large 'n' can be: number of columns, field length, and data types.

Wednesday, July 27, 2016

Reading Spatial Data Into a Pandas Dataframe

At 10.4.x, SciPy is included in your basic Python install, which is great!

Working with Pandas DataFrames can make life easy, especially if you need to analyze your data quickly.


import arcpy
import pandas as pd
import sys
#--------------------------------------------------------------------------
def trace():
    """
        trace finds the line, the filename
        and error message and returns it
        to the user
    """
    import traceback
    tb = sys.exc_info()[2]
    tbinfo = traceback.format_tb(tb)[0]
    # script name + line number
    line = tbinfo.split(", ")[1]
    # Get Python syntax error
    #
    synerror = traceback.format_exc().splitlines()[-1]
    return line, __file__, synerror

with arcpy.da.SearchCursor(r"d:\temp\scratch.gdb\INCIDENTS_points",
                           ["OBJECTID", "SHAPE@X", "SHAPE@Y"]) as rows:
    try:
        df = pd.DataFrame.from_records(data=rows,
                                       index=None,
                                       exclude=None,
                                       columns=rows.fields,
                                       coerce_float=True)
        # column names, then the mean X and mean Y of all points
        print((df.columns[1], df.columns[2]))
        print((df[df.columns[1]].mean(), df[df.columns[2]].mean()))
    except Exception:
        print(trace())

As usual, you create an arcpy.da cursor, then pass that cursor into the DataFrame's from_records(). Once the data is loaded, like in my example, you can perform operations on the frame itself. For example, let's say you needed the mean location of points. This can be quickly done by loading in all the location XY columns (SHAPE@X and SHAPE@Y) and performing a mean call on each column.

With this method you can't control the chunksize when loading the data, so be careful of your memory.
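If you don't have ArcGIS handy, the mean-location idea can be sketched with a plain list of tuples standing in for the cursor rows. The coordinates below are made up for illustration; only from_records() and mean() come from the example above.

```python
import pandas as pd

# Made-up (OBJECTID, SHAPE@X, SHAPE@Y) tuples standing in for cursor rows
rows = [(1, 0.0, 10.0), (2, 2.0, 20.0), (3, 4.0, 30.0)]
fields = ["OBJECTID", "SHAPE@X", "SHAPE@Y"]

df = pd.DataFrame.from_records(rows, columns=fields)

# Mean location: one mean() call per coordinate column
mean_x = df["SHAPE@X"].mean()
mean_y = df["SHAPE@Y"].mean()
print((mean_x, mean_y))  # (2.0, 20.0)
```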

Friday, June 10, 2016

ArcREST is now on PyPI

Installing ArcREST just got easier because you can use pip.

It's as easy as:

pip install arcrest_package

Enjoy to much fanfare.

Friday, February 19, 2016

ArcREST 3.5.3 Help Now Online

The ArcREST help documentation was updated with last week's release!

It can be found here. (http://esri.github.io/ArcREST/index.html)

As always, check out the project here: http://www.github.com/Esri/ArcREST

10.4 is Released

Check it out, 10.4 is here!

enjoy

Thursday, January 7, 2016

Opendata Added to ArcREST

I am very proud to say that the Open Data REST API has been added to ArcREST. Open Data sites hosted on ArcGIS.com allow groups to share authoritative information with users with just a few clicks on their site. Here is a simple usage example:

import arcrest
url = "http://opendata.arcgis.com"
opendata = arcrest.opendata.OpenData(url=url)
#Search by Query
searchResults = opendata.search(q="parcels")
print (searchResults)

The other big thing that you can do with the open data API is export information:

import arcrest
url = "http://opendata.arcgis.com"
itemId = "f59603825818413f87d9d819c3acff88_0"
opendata = arcrest.opendata.OpenData(url=url)
item = opendata.getDataset(itemId=itemId)
print (item.export(outFormat="kml", outFolder=r"c:\temp4"))
#supports: 'shp', 'kml', 'csv', and 'geojson' in the outFormat parameter.

Hope you enjoy searching and exporting the Open Data site!

ArcREST can be found here.
ArcREST Issues should be logged here.