Monday, December 22, 2014

Converting PDF to TIFF (10.3) using ArcPy

New at 10.3 is a handy tool many of us in the GIS world have wanted for a long time: converting geo-referenced PDFs to TIFF files.
The help describes this tool as:
Exports an existing PDF file to a Tagged Image File Format (TIFF). If the PDF has georeference information, the TIFF can be a GeoTIFF. These TIFFs can be used as a source for heads-up digitizing and viewing in ArcMap. Both GeoPDF and ISO standards of georeferenced PDFs are supported. 

It is very straightforward to use:

import arcpy
# file paths below are placeholders; swap in your own PDF and output TIFF
arcpy.PDFToTIFF_conversion(in_pdf_file=r"c:\temp\export.pdf",
                           out_tiff_file=r"c:\temp\export.tif",
                           geotiff_tags="GEOTIFF_TAGS")

For this sample, I just created a map with one of the AGOL imagery layers in a blank ArcMap layout and exported it to PDF with the georeference information embedded.  Converting that PDF back creates a TIFF that can be used as additional spatial data in the ArcMap session.



Monday, November 17, 2014

Workflow for Adding Files in Parts for Portal or ArcGIS Online

Python 2.7.5 is not very good at handling big files.  Actually, it stinks, especially if you need to upload them via a multi-part POST.  There are many 3rd-party solutions, but I found issues with both poster and requests with the toolbelt add-in; they simply didn't work.  Luckily for us, the folks who designed the AGOL/Portal REST API provided an alternative way of uploading large files.  The method is called Add Item Part, and it is documented in the ArcGIS REST API reference.

When trying to figure out how to use this function, you also need to read the Add Item method as well to get how Add Item Part should be used.  The workflow is as follows:
  1. Call Add Item with no file data, and pass in multipart=True along with the filename parameter (including the extension).
  2. Break the file into parts.  Each part must be over 5 MB, except for the last chunk, which signals the end of the file. (I break everything into 50 MB chunks because I don't want to perform tons of POST calls.)
  3. Call Add Item Part using a POST.  Pass in the parameter partNum, a unique integer from 1 to 10,000.  You must pass that in, or each part will overwrite the last.  Also pass in the file via the multi-part POST.
  4. Now call the Commit function to tell the server, "hey, I'm done."  This is an asynchronous call, so don't forget to check the item's status before moving on, or you will not be able to update the item's properties.
    • If you look at your 'My Content' page on AGOL at this point, you'll notice an item with 'null' for the title, type, etc.  All you can do is delete the item right now, which makes step 5 very important.
  5. Update the item using the updateItem REST call.  Here we state the type, title, tags, etc.: all the good stuff users need to access the data.
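Step 2 is plain Python. Here is a minimal sketch of the chunking (the 5 MB floor and 50 MB default come from the workflow above; the function name is mine):

```python
def chunk_file(path, size=50 * 1024 * 1024):
    """Yield successive byte chunks of a file.  Every chunk except
    the last is exactly `size` bytes; the short final chunk is what
    signals the end of file in the Add Item Part workflow."""
    with open(path, "rb") as f:
        while True:
            data = f.read(size)
            if not data:
                break
            yield data
```

Each yielded chunk would then be sent as the file body of one Add Item Part POST, with its own partNum.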

Great, now we have the item in AGOL or Portal.  Simple, right?  Not really, but it's a great way to get around Python 2.7.5's annoying memory issue.  I haven't tested this on x64 Python or Python 3.4.  Hopefully some of my readers will post a comment letting me know if the memory issues with StringIO/cStringIO still exist when performing large file POSTs.

Add Item Part support is in ArcREST today!

Get ArcREST from the project's GitHub page.
Also vote for my idea for a GUI builder for Python Add-Ins.

Monday, October 13, 2014

Searching for User Content on AGOL Using ArcREST

ArcGIS Online (AGOL) and Portal organizations are great, but managing the whole organization's content can be tough.  Here is a brief example of how you could search a site (in our case AGOL) and look at what a given user has shared publicly.

Searching is fairly straightforward and does not require any security logins to perform.  You can see this by going to the AGOL site and typing something into the search box.  For this example, though, I am going to use login credentials just in case I need to perform an administrative task down the line.
import arcrest

if __name__ == "__main__":
    username = "MY USERNAME"
    password = "MY PASSWORD"
    url = ""  # your portal/AGOL sharing URL
    proxy_url = None  # for fiddler
    proxy_port = None  # "8888" for fiddler
    q2 = "owner:esri"
    securityHandler = arcrest.AGOLTokenSecurityHandler(username,
                                                       password)
    admin = arcrest.manageagol.Administration(url=url,
                                              securityHandler=securityHandler)
    res = admin.query_without_creds(q=q2, start=1, num=1) # find how many public items owner:esri has
    items = []
    total = int(res['total'])
    steps = int(total / 100)
    if (total % 100) > 0:
        steps += 1
    step = 1
    while step <= steps:
        res = admin.query_without_creds(q=q2, start=1 + ((step - 1) * 100), num=100)
        for r in res['results']:
            print r  # do something with the item!
            del r
        del res
        step += 1

The item is returned from the query_without_creds() as a dictionary.  It contains information about each item that is unique to that item.

The script isn't doing much with the items that it finds, but it does show how you can loop through the owner's (esri's) items.  If you could administer the owner's items, you could change those public items to private.
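The paging arithmetic in the loop above generalizes to any REST search endpoint; here it is isolated as a pure-Python helper (the function name is mine):

```python
def page_params(total, page_size=100):
    """Yield (start, num) pairs that cover `total` search results,
    mirroring the start/num parameters of the query call."""
    steps = total // page_size
    if total % page_size:
        steps += 1
    for step in range(1, steps + 1):
        yield (1 + (step - 1) * page_size, page_size)

print(list(page_params(250)))
# [(1, 100), (101, 100), (201, 100)]
```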

To get a better sense of what you can and cannot query, and how the syntax works, I would check out the advanced search syntax topic in the REST API help.  It covers the search syntax in great detail.

You can get ArcREST from GitHub if you do not have it.

Happy searching!

Thursday, October 9, 2014

Using ArcREST to Export Hosted Feature Services


When you upload a feature class to ArcGIS Online (AGOL), you have the option to publish it.  Now let's assume you have enabled editing on that data set.  The question is: how do you pull down the data so you can have an updated version in the office?  Well, luckily for us, you have ArcREST to automate this task!  ArcREST can be found on GitHub if you do not have it.

This example will show how to automate the download of a hosted feature class as a shapefile to a centralized storage location.

The workflow is as follows:

  1. Connect to the AGOL site
  2. Export the item to a shapefile
  3. Download the item to disk
  4. Erase the exported item

import arcrest
import uuid
import os
import time
#   Inputs
username = "SOME USERNAME"
pw = "SOME PASSWORD"
itemId = "ID OF THE HOSTED FEATURE SERVICE ITEM"
url = ""  # your AGOL sharing URL
filePath = r"c:\temp"
#   Logic
#   Create security handler
shAGOL = arcrest.AGOLTokenSecurityHandler(username, pw)
#   Connect to AGOL
org = arcrest.manageagol.Administration(url=url, securityHandler=shAGOL)
#   Grab the user's content (items)
content = org.content
usercontent = content.usercontent(username=username)
#   Create an export item with a random name using UUID
#   and export it to a zipfile
result = usercontent.exportItem(title="%s" % uuid.uuid4().get_hex(),
                                itemId=itemId,
                                exportFormat="Shapefile")

exportedItemId = result['exportItemId']
jobId = result['jobId']
exportItem = content.item(itemId=exportedItemId)
#   Ensure the item is finished exporting before downloading
status = usercontent.status(itemId=exportedItemId, jobId=jobId, jobType="export")
while status['status'].lower() == "processing":
    status = usercontent.status(itemId=exportedItemId,
                                jobId=jobId,
                                jobType="export")
filePath = exportItem.itemData(f="json", savePath=filePath)
#   Erase the exported item to clean up
#   AGOL (method name may differ by ArcREST version)
usercontent.deleteItems(items=exportedItemId)
del exportItem
print 'finished!'

So what we have is a quick little script that downloads a file to the temp folder.  The key thing to look at is the while part of the code.  When performing actions like publish, generateFeatures, export, and createService, they run asynchronously on the server.  This means that you need to check whether they are finished before performing further actions on the items.  The status method returns a dictionary of values, and you need to ensure that the value is no longer 'processing', or further actions like itemData() will return invalid information or data.  In the case of itemData(), you will download an empty zip file.
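The polling pattern is worth isolating.  This sketch uses a stand-in callable instead of usercontent.status(), so the names here are illustrative only:

```python
import time

def wait_until_done(get_status, delay=0.01):
    """Poll a status callable until it stops reporting 'processing'."""
    status = get_status()
    while status.lower() == "processing":
        time.sleep(delay)
        status = get_status()
    return status

# a fake status source standing in for the real REST status call
states = iter(["processing", "processing", "completed"])
print(wait_until_done(lambda: next(states)))  # completed
```

In a real script, a small sleep between polls keeps you from hammering the server while the export job runs.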

Enjoy and happy coding.

Monday, September 15, 2014

Using ArcPy to Determine the Underlying Database

Sometimes you need to know what the underlying database is because the connection file doesn't tell you.  I know it sounds strange, but it does happen.  I created a small script to solve this problem.

import arcpy

def checkDBType(dbPathOrConnectionFile):
    """ checks the db to get the database type """
    if dbPathOrConnectionFile.lower().endswith(".mdb"):
        return "PGDB"
    elif dbPathOrConnectionFile.lower().endswith(".gdb"):
        return "FGDB"
    elif dbPathOrConnectionFile.lower().endswith(".sde"):
        queries = {
            "Informix" : """select first 1 dbinfo("version", "full") from systables;""",
            "MSSQLServer" : """SELECT @@VERSION""",
            "Oracle" : """select * from v$version""",
            "PostGreSQL" : """select version()::varchar(255);"""
        }
        conn = arcpy.ArcSDESQLExecute(dbPathOrConnectionFile)
        for k, v in queries.iteritems():
            try:
                conn.execute(v)
                return k
            except:
                pass
    return "Unknown"
Basically, the script just tries each SQL snippet, and if it works, that is the database you are using.  To run the SQL statement, you use arcpy.ArcSDESQLExecute()'s connection object, which has a function called execute().
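You can exercise the same try-each-query pattern without an SDE connection by using sqlite3, which ships with Python (the dict below is a toy stand-in for the one in the script):

```python
import sqlite3

queries = {
    "SQLite": "select sqlite_version();",
    "NotADatabase": "select nosuchfunction();",  # always fails
}

def check_db_type(conn):
    # try each vendor-specific snippet; the first one that runs wins
    for name, sql in sorted(queries.items()):
        try:
            conn.execute(sql)
            return name
        except Exception:
            pass
    return "Unknown"

print(check_db_type(sqlite3.connect(":memory:")))  # SQLite
```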

The results seem promising, but I didn't have an Informix DB to test this on.  So if you have one, please let me know if that statement works!  Oracle, SQL Server, and PostgreSQL all work well.


Friday, August 29, 2014

Speed Up Polygon Grid Generation

There are many geoprocessing tools that generate regular uniform grids, but I found a novel way of generating grids pretty quickly for large areas.  Rasters are fast and can be created on the fly easily using numpy.  To turn the raster into a feature class, ArcPy has a Raster to Polygon tool that should be available at all license levels of Desktop.

import arcpy
import numpy
import math
fc = r"c:\temp\somedata.shp" # assume projection is projected coordinate system not Lat/Long where measurements are in decimal degrees
width = 10000 # meters
height = 10000 # meters
#  Gets the feature classes' extent
desc = arcpy.Describe(fc)
extent = desc.extent
#   Set the Mask
arcpy.env.mask = fc
#   Calculate the number of columns

dist_x = math.sqrt(math.pow((extent.XMax - extent.XMin), 2))
dist_y = math.sqrt(math.pow((extent.YMax - extent.YMin), 2))
col_x = int(dist_x / width)
col_y = int(dist_y / height)
if dist_x % width > 0:
    col_x += 1
if dist_y % height > 0:
    col_y += 1
#   Construct the structured numpy array with random values
numpyArray = (numpy.random.permutation(col_x * col_y) + 1).reshape(col_y, col_x)
#   Convert array to raster
myData = arcpy.NumPyArrayToRaster(numpyArray,
                                  arcpy.Point(extent.XMin, extent.YMin),
                                  width, height)
#   Convert the raster to a polygon feature class
grid = r"%s\gridData" % arcpy.env.scratchGDB
grid = arcpy.RasterToPolygon_conversion(in_raster=myData,
                                        out_polygon_features=grid,
                                        simplify="NO_SIMPLIFY",
                                        raster_field="VALUE")
The code takes the extent of the data and creates a grid.  The mask environment setting essentially clips the raster to the shape of the feature class, so you only get cells within the given polygon area.  If you want cells to fall outside the feature class's shape, just comment out that line.
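The column/row arithmetic can be checked without arcpy; the extent values below are made up:

```python
import math

xmin, ymin, xmax, ymax = 0.0, 0.0, 25000.0, 18000.0  # hypothetical extent
width, height = 10000, 10000  # meters

# math.ceil is equivalent to the int()-plus-remainder check in the script
cols = int(math.ceil((xmax - xmin) / float(width)))
rows = int(math.ceil((ymax - ymin) / float(height)))
print(cols, rows)  # 3 2

# lower-left corner of every cell in the grid
cells = [(xmin + c * width, ymin + r * height)
         for r in range(rows) for c in range(cols)]
print(len(cells))  # 6
```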

So that is that.

If you haven't done so, check out ArcREST.
Also vote for my idea for a GUI builder for Python Add-Ins.

Monday, August 11, 2014

Checkout ArcREST!

Need to work with ArcGIS Online (AGOL) Feature Services?
Tired of republishing all your data to AGOL?
Want an easier way to work with REST endpoints of ArcGIS Server using Python?

Look no further, check out ArcREST on github!

Version 1 is pretty solid, and version 2 can be downloaded on the 'dev' branch.

Get it here:

Wednesday, June 18, 2014

Finding the Location of Your Python File

Another issue I have run into, depending on the OS, is finding the location of the Python file being run.  I like to use the __file__ property, but it's not always present.  Thanks to Google, I found this in the Python documentation:

__file__ is the path name of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the path name of the shared library file.
 This means you can't always rely on os.path.dirname(__file__) to find the folder where your Python file is located.  Here is a function with four potential checks to get the Python file's folder location:

import os
def getPythonFileLocation():
    """  returns the location of where the python file is located """
    if os.path.dirname(__file__) != "":
        return os.path.dirname(__file__)
    elif os.path.dirname(os.path.abspath(__file__)) != "":
        return os.path.dirname(os.path.abspath(__file__))
    elif os.path.dirname(os.getcwd()) != "":
        return os.path.dirname(os.getcwd())
    else:
        from inspect import getsourcefile
        return os.path.dirname(os.path.abspath(getsourcefile(lambda _: None)))

The function first tries the __file__ property using os.path.dirname(), then it tries another flavor of __file__ that I found on good old Stack Exchange.  The third check uses os.getcwd(), and finally it falls back on the inspect library.
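The inspect fallback is handy on its own, e.g. in interactive sessions where __file__ does not exist.  A tiny standalone check:

```python
import os
from inspect import getsourcefile

# getsourcefile() on a lambda defined here points back at the defining
# file; it can return None in some embedded interpreters, so guard for that
src = getsourcefile(lambda _: None)
folder = os.path.dirname(os.path.abspath(src)) if src else os.getcwd()
print(os.path.isdir(folder))  # True
```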

Hopefully this will help prevent 'NoneType' issues when trying to join string paths.


Monday, June 16, 2014

Validating a File GeoDatabase Function

I recently created a bunch of data loading scripts that load data from some URL and copy it down locally.  Nothing too special, but I noticed an interesting issue.  The file geodatabase (fgdb) would sometimes randomly get corrupted, and instead of seeing the fgdb as a database, the ArcGIS software would see it as a folder.  I created a little function that checks the fgdb using arcpy.Describe() to check the workspace ID, and if it is a folder instead of a fgdb, it erases and recreates the corrupted fgdb.  This seems to resolve the issue.
import os
import shutil
import arcpy
from arcpy import env
def validate_fgdb(fgdb=env.scratchGDB):
    """
       Checks to see if the FGDB isn't corrupt.  If
       the FGDB is corrupt, it then erases the database
       and replaces it with a new FGDB of the same name.
       Input:
          fgdb - string - full path of the file geodatabase
    """
    fgdb = str(fgdb)
    if os.path.isdir(fgdb) and \
       fgdb.lower().endswith(".gdb"):
        print 'workspace possible fgdb'
        desc = arcpy.Describe(fgdb)
        if hasattr(desc, "workspaceFactoryProgID") and \
           desc.workspaceFactoryProgID != "esriDataSourcesGDB.FileGDBWorkspaceFactory.1":
            # a folder posing as a FGDB: erase it and build a fresh one
            shutil.rmtree(fgdb, ignore_errors=True)
            arcpy.CreateFileGDB_management(os.path.dirname(fgdb),
                                           os.path.basename(fgdb))
            return True
        elif hasattr(desc, "workspaceFactoryProgID") and \
             desc.workspaceFactoryProgID == "esriDataSourcesGDB.FileGDBWorkspaceFactory.1":
            return True
        return False
    elif os.path.isdir(fgdb) == False and \
         fgdb.lower().endswith(".gdb"):
        # path does not exist yet, so create the FGDB
        arcpy.CreateFileGDB_management(os.path.dirname(fgdb),
                                       os.path.basename(fgdb))
        return True
    return False

Basically, all it does is take a path to anything and check (1) whether it exists and (2) whether it is a fgdb.  I did add one additional benefit: if the path ends in ".gdb" but does not exist, the function will create the fgdb.

I have a bunch of theories on why the fgdb gets corrupted, but the leading offender is erasing tables from the fgdb, which causes something to get hosed in the database.  Table overwrites tend to cause this issue as well.

Anyway, enjoy, and hopefully this helps you eliminate this issue in your scheduled tasks or services.

Wednesday, May 28, 2014

Playing with GeoPackages - Shrinking the Database

Awhile back, Esri announced support for the GeoPackage, which is just a SQLite database that holds vector data, and soon raster data too.

When you add and remove data to the sqlite database, it grows but never shrinks.  Think of it like a river that rises and falls, but the water line always remains.  This waterline is the space used on your hard drive.

To reduce the footprint of the database, you can run the VACUUM command to shrink the data base.
import sqlite3
SQLITE_FILE = r"c:\temp\example.gpkg"
conn = sqlite3.connect(SQLITE_FILE)
conn.execute("VACUUM")
conn.close()

Pretty simple.  What VACUUM does is reconstruct the database from scratch, leaving the database with an empty free-list and a file that is minimal in size.  Note, however, that VACUUM can take some time to run, and it can use up to twice as much temporary disk space as the original file while it is running.
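Here is a throwaway demonstration with plain sqlite3 (the file and table names are mine): delete all the rows, and the file only shrinks once VACUUM runs.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (v TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("x" * 1000,) for _ in range(5000)])
conn.commit()
before = os.path.getsize(path)

conn.execute("DELETE FROM t")   # rows gone, file size unchanged
conn.commit()
conn.execute("VACUUM")          # rebuild the file, reclaiming the space
conn.close()
after = os.path.getsize(path)
print(after < before)  # True
```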


Wednesday, April 30, 2014

10.2.x - Getting Data Extents as Feature Classes (4 Methods)

Over the past couple of years, I have written many posts on creating polygon extents from data.  Today, I am going to provide 4 ways to get extents as polygon feature classes.
  1. Use the 'Minimum Bounding Geometry' tool to create 'ENVELOPE' geometry types, with or without the 'Group Option'.  This tool is available at the 'Advanced' license level, so if you don't know how to (or don't want to) use Python, this is your go-to option.  The output is a feature class.
  2. For the whole feature class extent, use arcpy.Describe() and write the extent out to disk:
    import arcpy
    from arcpy import env
    env.overwriteOutput = True
    fc = r"c:\States.shp"
    desc = arcpy.Describe(fc)
    extent = desc.extent
    pts = [arcpy.Point(extent.XMin, extent.YMin),
           arcpy.Point(extent.XMax, extent.YMin),
           arcpy.Point(extent.XMax, extent.YMax),
           arcpy.Point(extent.XMin, extent.YMax)]
    array = arcpy.Array(items=pts)
    poly = arcpy.Polygon(array)
    arcpy.CopyFeatures_management(poly, r"%scratchgdb%\way1")
  3. Method 1 for individual features: use a cursor with the extent's XMin, YMin, XMax, and YMax, and arcpy.CreateFeatureclass()
    import arcpy
    from arcpy import env
    env.overwriteOutput = True
    fc = r"c:\States.shp"
    out_fc = r"%scratchgdb%\way2"
    if not arcpy.Exists(out_fc):
        arcpy.CreateFeatureclass_management(out_path=env.scratchGDB,
                                            out_name="way2", geometry_type="POLYGON",
                                            spatial_reference=arcpy.Describe(fc).spatialReference)
    icur = arcpy.da.InsertCursor(out_fc, "SHAPE@")
    with arcpy.da.SearchCursor(fc, "SHAPE@") as rows:
        for row in rows:
            extent = row[0].extent
            pts = [arcpy.Point(extent.XMin, extent.YMin),
                   arcpy.Point(extent.XMax, extent.YMin),
                   arcpy.Point(extent.XMax, extent.YMax),
                   arcpy.Point(extent.XMin, extent.YMax)]
            array = arcpy.Array(items=pts)
            poly = arcpy.Polygon(array)
            icur.insertRow([poly])
            del array
            del poly
            del pts
            del extent
            del row
    del icur
  4. Method 2 for individual features: use cursor objects and arcpy.CopyFeatures()
    import arcpy
    from arcpy import env
    env.overwriteOutput = True
    fc = r"c:\States.shp"
    out_fc = r"%scratchgdb%\way3"
    with arcpy.da.SearchCursor(fc, "SHAPE@") as rows:
        polys = []
        for row in rows:
            extent = row[0].extent
            array = arcpy.Array([arcpy.Point(extent.XMin, extent.YMin),
                                 arcpy.Point(extent.XMax, extent.YMin),
                                 arcpy.Point(extent.XMax, extent.YMax),
                                 arcpy.Point(extent.XMin, extent.YMax)])
            polys.append(arcpy.Polygon(array))
            del array
            del row
        arcpy.CopyFeatures_management(polys, out_fc)
        del polys
Those are 4 examples of how to create extents using Python or the built-in system tools in ArcToolbox.
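All of the Python methods ultimately build the same four corner points from an extent; isolated as pure Python (the function name is mine):

```python
def envelope_corners(xmin, ymin, xmax, ymax):
    """Four corners of an extent, lower-left first, counter-clockwise;
    these are the points fed to arcpy.Array in each method above."""
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]

print(envelope_corners(0, 0, 2, 1))
# [(0, 0), (2, 0), (2, 1), (0, 1)]
```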

Tuesday, April 29, 2014

Convert Your Custom Object to a Dictionary

Creating your own classes is great in Python.  It's easy and straightforward, and it allows Python users to mask repetitive code in scripts and create custom packages.

Let's assume that you are working with JSON, and you want to create a bunch of custom classes that return the data stored inside them as dictionaries.  This can easily be done by implementing __iter__() in the class you are designing.

The function __iter__() returns an iterator object.  The object is required to support the iterator protocol.

For example, take the class below:

import json
class test(object):
    _a = None
    _b = None
    def __init__(self, a, b):
        self._a = a
        self._b = b
    @property
    def a(self):
        return self._a
    @property
    def b(self):
        return self._b
    def __str__(self):
        """ returns object as string """
        o = {}
        for k,v in self.__iter__():
            o[k] = v
        return json.dumps(o)
    def __iter__(self):
        """ iterator generator for public values/properties """
        attributes = [attr for attr in dir(self)
                      if not attr.startswith('__') and \
                      not attr.startswith('_')]
        for att in attributes:
            yield (att, getattr(self, att))

Here __iter__() and __str__() are implemented, along with two properties, 'a' and 'b'.  From the user's standpoint, they are probably only interested in the public properties, so __iter__() yields the attributes that do not start with '__' or '_' as key/value pairs.  __str__() returns the data as JSON using the json library built into Python.  __str__ is another built-in method, called when you do print or str().

The result if you print the object as a string is: {"a": 1, "b": 2}
For the dictionary: {'a': 1, 'b': 2}

The first is a string and the second is a dictionary.
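To see the pattern end to end, here is a condensed, self-contained version (class renamed so it doesn't clash with the one above):

```python
import json

class Pair(object):
    def __init__(self, a, b):
        self._a = a
        self._b = b
    @property
    def a(self):
        return self._a
    @property
    def b(self):
        return self._b
    def __iter__(self):
        # yield only the public attributes as (name, value) pairs
        for att in [x for x in dir(self) if not x.startswith('_')]:
            yield (att, getattr(self, att))
    def __str__(self):
        return json.dumps(dict(self))

p = Pair(1, 2)
print(dict(p))  # {'a': 1, 'b': 2}
print(str(p))   # {"a": 1, "b": 2}
```

Note that dict() accepts any iterable of key/value pairs, which is exactly what __iter__() yields.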


Friday, March 21, 2014

Searching a List of Dictionaries

Often when I work in Python, I get JSON text, which I convert using Python's json module.  This is great because I now have lots of data in dictionaries and lists.  Dictionaries are easy to use because everything is in a key/value format; I view it as data with nice, neat labels.

Sometimes there is a need to search that data: if a dictionary is in a list of dictionary items, how can you easily check whether a value exists within the given list of dictionaries?  Enter the built-in function called any.

Python's 2.7.x help describes any as:
Return True if any element of the iterable is true. If the iterable is empty, return False.
any is essentially a shortcut for:
def any(iterable):
    for element in iterable:
        if element:
            return True
    return False 

Now let's use the function.  Assume we have a given list, aListofDicts, that contains n number of dictionaries defined as such:
aListofDicts = [{"alpha": "a"}, {"alpha": "b"},
               {"alpha": "joker"},{"alpha": 1.2}]

To search this data, one would off the cuff have to write the above function to find the value in an iterable, but any gives us the magic shortcut, because no one wants to write long scripts.
>>> print any(d['alpha'].lower() == 'a'.lower() for d in aListofDicts)
>>> print any(d['alpha'].lower() == 'tank'.lower() for d in aListofDicts)

So what happened in the above code?  We used a generator expression, which is just a big phrase for an object we can iterate over, like a list.  That is this part:
aListofValues = [d for d in aListofDicts]
 Next we append a condition, which yields boolean values (True/False):
aListofValues = [d['alpha'] == "a" for d in aListofDicts]
Written as a list comprehension like this, you get a list of True/False values instead of a list of the actual dictionaries or dictionary values.  This means if you have 1 million items in your list, you build 1 million True/False values.  Not very efficient.

Now enter the any function with a generator expression instead of a full list:
exists = any(d['alpha'] == "a" for d in aListofDicts)
As soon as the generator hits a True, whether it's in position 1 or position 1 million of the list, any stops the iteration, thus saving time and memory.  Hooray!
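You can watch the short-circuit happen by counting how many dictionaries actually get inspected (the helper name is mine):

```python
aListofDicts = [{"alpha": "a"}, {"alpha": "b"},
                {"alpha": "joker"}, {"alpha": 1.2}]

checked = []
def matches(d):
    checked.append(d)          # record every dictionary we touch
    return d.get('alpha') == 'a'

exists = any(matches(d) for d in aListofDicts)
print(exists)        # True
print(len(checked))  # 1
```

Because the first dictionary matches, any never evaluates the remaining three items.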

Happy Coding!

Wednesday, March 12, 2014

ArcGIS Pro and Python

At the Esri DevSummit 2014, it was announced that Python 3.4 will be the future of the language for ArcGIS Pro.

Goodbye 2.7.x and hello 3.4.

Most programmers will probably want to brush up on Python 3.x, so check out Dive into Python 3!


Monday, February 10, 2014

SQL Fiddle - Your Database Scratchpad

Ever want to test something out on your database but not want to mess everything up?  Now is your chance: check out SQL Fiddle.  Just go to the site, select your database, create a quick schema and some dummy data, and bam!  Test away.


Tuesday, February 4, 2014

Add Geometry Attributes - ArcGIS 10.2.1 Tool

New at 10.2.1 is a simplified way of adding geometry properties to a feature class.  The Add Geometry Attributes geoprocessing tool is a shortcut that replaces the Calculate Field geometry calculations.

Let's use the old way first:

    import arcpy
    fc = r"some datasource"
    fieldName1 = "xCentroid"
    fieldName2 = "yCentroid"
    fieldPrecision = 18
    fieldScale = 11
    arcpy.AddField_management(fc, fieldName1, "DOUBLE",
                              fieldPrecision, fieldScale)
    arcpy.AddField_management(fc, fieldName2, "DOUBLE",
                              fieldPrecision, fieldScale)
    # Calculate centroid
    arcpy.CalculateField_management(fc, fieldName1,
                                    "!SHAPE.CENTROID.X!", "PYTHON_9.3")
    arcpy.CalculateField_management(fc, fieldName2,
                                    "!SHAPE.CENTROID.Y!", "PYTHON_9.3")
So here we create the fields and then perform calculations on the data set to get the X/Y centroid of each feature.

At 10.2.1 you can clean this up and just do the following:

fc = "some dataset"
arcpy.AddGeometryAttributes_management(fc, "CENTROID")
With just two lines of code, we did what took multiple lines prior to 10.2.1.

The Add Geometry Attributes tool is a time saver and makes life just a tad easier.

Check out more about this tool here.


Monday, January 13, 2014

Reading OSM Data Using Python

OpenStreetMap is a free, editable, community-driven map.  The data can be downloaded in full from the OpenStreetMap site, but you can also pull smaller sections from various web services.

For world-sized pulls, I wouldn't recommend this code because you will run out of memory.  For smaller areas, say between 1-10 MB, this should work, depending on your desktop's specifications.

import xml
from xml.dom.minidom import parse
data = {}
keys = []
ways = {}
wkeys = []

relations = {}
rkeys = []

xmls = [r"c:\temp\small.osm"]
for f in xmls:
    dom = xml.dom.minidom.parse(f)
    #  parse out osm nodes
    for n in dom.getElementsByTagName("node"):
        nid = n.getAttribute("id")
        data[nid] = {}
        data[nid]["lat"] = n.getAttribute("lat")
        data[nid]["lon"] = n.getAttribute("lon")
        for tag in n.getElementsByTagName("tag"):
            k = tag.getAttribute("k")
            if k not in keys:
                keys.append(k)
            data[nid][k] = tag.getAttribute("v")
    # parse out osm ways/polygons
    for n in dom.getElementsByTagName("way"):
        wid = n.getAttribute("id")
        ways[wid] = {}
        ways[wid]['ref'] = []
        ways[wid]['geomType'] = ""
        for nd in n.getElementsByTagName('nd'):
            if nd.hasAttribute('ref'):
                ways[wid]['ref'].append(nd.getAttribute('ref'))
            del nd
        for tag in n.getElementsByTagName("tag"):
            if tag.hasAttribute("k") and \
               tag.hasAttribute("v"):
                k = tag.getAttribute("k")
                if k not in wkeys:
                    wkeys.append(k)
                ways[wid][k] = tag.getAttribute('v')
            del tag
        first = ways[wid]['ref'][0]
        last = ways[wid]['ref'][-1]
        if first == last:
            ways[wid]['geomType'] = "polygon"
        else:
            ways[wid]['geomType'] = "polyline"
    for n in dom.getElementsByTagName('relation'):
        rid = n.getAttribute('id')
        relations[rid] = {}
        relations[rid]['member'] = []
        for mem in n.getElementsByTagName('member'):
            member = {}
            if mem.hasAttribute('type'):
                member['type'] = mem.getAttribute('type')
            if mem.hasAttribute('ref'):
                member['ref'] = mem.getAttribute('ref')
            if mem.hasAttribute('role'):
                member['role'] = mem.getAttribute('role')
            relations[rid]['member'].append(member)
            del mem
        for tag in n.getElementsByTagName("tag"):
            if tag.hasAttribute("k") and \
               tag.hasAttribute("v"):
                k = tag.getAttribute("k")
                relations[rid][k] = tag.getAttribute('v')
            del tag
# do stuff with nodes, ways (lines), and polygons like converting the data from dictionaries to feature classes or shapefiles!

OSM files are just XML files that can be parsed with various XML processing tools in Python.  Some common ones are minidom (part of the standard library), lxml, and Beautiful Soup.
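As a minimal self-contained illustration of the node parsing, here is the same minidom pattern run against an inline XML string (the sample OSM snippet is made up):

```python
from xml.dom.minidom import parseString

osm = """<osm>
  <node id="1" lat="38.0" lon="-77.0">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>"""

data = {}
dom = parseString(osm)
for n in dom.getElementsByTagName("node"):
    nid = n.getAttribute("id")
    data[nid] = {"lat": n.getAttribute("lat"),
                 "lon": n.getAttribute("lon")}
    for tag in n.getElementsByTagName("tag"):
        data[nid][tag.getAttribute("k")] = tag.getAttribute("v")

print(data["1"]["amenity"])  # cafe
```

parseString() behaves just like parse() on a file, so swapping in a .osm path gives you the full script above.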

It is not the most efficient piece of code, but it demonstrates the power of python.

Enjoy, and feel free to make tweaks!