Another GIS Blog: Working with numpy's Structured Array and numpy.dtype

Friday, April 26, 2013


In my previous post, I showed how to quickly get Microsoft Access (2007/2010) data into a file geodatabase with pyODBC, without creating an ODBC connection in ArcCatalog/ArcMap. You might have noticed that I used numpy to create a table pretty easily, but you might be wondering what the dtypes are.

A numpy.dtype is a data type object that describes how the bytes in a fixed-size block of memory should be interpreted: is the data a string, a number, and so on? It describes the following aspects of the data:

Type of data (integer, float, object, ...)
Size of data (number of bytes)
For structured data, the aggregate of data types that make up each array item
Names of the fields in the record
If the data is a sub-array, its shape
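Each of these aspects can be read back from a dtype object as an attribute. A small sketch, assuming Python 3 and a recent numpy; the field names `name` and `xy` are made up for illustration:

```python
import numpy as np

# Hypothetical record: a 10-character unicode name plus a 2-element float sub-array.
dt = np.dtype([('name', 'U10'), ('xy', 'f8', (2,))])

print(dt.names)             # names of the fields in the record
print(dt.itemsize)          # total number of bytes per record
print(dt['name'].kind)      # type of data for one field ('U' = unicode)
print(dt['name'].itemsize)  # size of that field in bytes (4 bytes per character)
print(dt['xy'].shape)       # shape of the sub-array field
print(dt['xy'].base)        # element type of the sub-array
```

Here the record takes 56 bytes: 40 for the 10-character unicode field and 16 for the two 8-byte floats.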

Getting the dtype formats for common Python data types is fairly easy: pass the type to numpy.dtype() and it returns the matching format for almost any of them:

>>> print numpy.dtype(str)
|S0
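The `|S0` output above comes from Python 2, which this post uses. Under Python 3, `str` is a unicode type, so numpy maps it to a `'U'` dtype instead, while `bytes` maps to `'S'`. A minimal sketch, assuming Python 3 and a recent numpy, that prints the kind code and type string numpy picks for a few built-in types (the integer size is platform-dependent, so no output is listed here):

```python
import numpy as np

# Ask numpy which dtype it chooses for several built-in Python types.
for py_type in (bool, int, float, complex, str, bytes):
    dt = np.dtype(py_type)
    # dt.kind is the one-letter type code; dt.str adds byte order and size.
    print(py_type.__name__, dt.kind, dt.str)
```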

The first character of an array-protocol type string specifies the kind of data:

`'b'`	Boolean
`'i'`	(signed) integer
`'u'`	unsigned integer
`'f'`	floating-point
`'c'`	complex-floating point
`'S'`, `'a'`	string
`'U'`	unicode
`'V'`	anything (`void`)

(source: numpy help documents)
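These kind codes combine with a size to form a complete type string; the number is bytes per item, except for unicode, where it counts characters. A quick check, assuming Python 3 and a recent numpy, of what several such strings produce:

```python
import numpy as np

# Kind code followed by a size ('U10' means 10 characters, stored as 4 bytes each).
codes = ['b1', 'i4', 'u1', 'f8', 'c16', 'S10', 'U10', 'V4']
for code in codes:
    dt = np.dtype(code)
    # Show the kind, the number of bytes per item, and numpy's name for the type.
    print(code, dt.kind, dt.itemsize, dt.name)
```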

The characters after the code give the size, which lets you specify things like string length.
For example:

>>> dt = numpy.dtype('a25')  # 25-character string
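The length matters because numpy silently truncates longer values to fit the field. A short sketch, assuming Python 3 and a recent numpy; it uses `'S'`, which names the same byte-string type as `'a'` (newer numpy releases discourage the `'a'` spelling), alongside `'U'` for unicode text:

```python
import numpy as np

# 5-character fields: longer values are cut off to fit, without raising an error.
short_bytes = np.array([b'hello world'], dtype='S5')
short_text = np.array(['hello world'], dtype='U5')

print(short_bytes[0])  # b'hello'
print(short_text[0])   # hello
```

If your source data can contain longer values, size the string field for the longest one you expect.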

Once you know your data types, you associate them with the fields in each record. I prefer the optional dictionary form, which takes two keys: 'names' and 'formats'. You then pass the resulting dtype in when creating your structured array.

Example:

>>> dt = numpy.dtype({'names': ['r','g','b','a'],
...                   'formats': [numpy.uint8, numpy.uint8, numpy.uint8, numpy.uint8]})
>>> colors = numpy.zeros(5, dtype=dt)
>>> print colors
[(0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0)]
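Once the structured array exists, each named field can be read or written as a whole column, and each element is one record. A sketch in Python 3 syntax that rebuilds the same r/g/b/a array; the fill values are arbitrary:

```python
import numpy as np

dt = np.dtype({'names': ['r', 'g', 'b', 'a'],
               'formats': [np.uint8, np.uint8, np.uint8, np.uint8]})
colors = np.zeros(5, dtype=dt)

# Assign a whole column at once by field name.
colors['r'] = 255
colors['a'] = [0, 64, 128, 192, 255]

# Index a single record, then pull individual fields out of it by name.
first = colors[0]
print(first['r'], first['a'])
print(colors['a'])
```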

There are many ways to declare dtypes for your data; the numpy documentation on data type objects covers them all.
More on structured arrays can be found in the numpy documentation on structured arrays.
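As one example of the alternative declarations, the same four-byte color record can be written as a list of (name, format) tuples, and a comma-separated format string gives the same layout with default field names. A sketch, assuming Python 3 and a recent numpy:

```python
import numpy as np

# Dictionary form, as in the colors example.
dict_form = np.dtype({'names': ['r', 'g', 'b', 'a'],
                      'formats': ['u1', 'u1', 'u1', 'u1']})

# List of (field name, format) tuples.
list_form = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])

# Comma-separated string: same layout, but fields are auto-named f0, f1, ...
string_form = np.dtype('u1,u1,u1,u1')

print(dict_form == list_form)  # True
print(string_form.names)       # ('f0', 'f1', 'f2', 'f3')
```

The list-of-tuples form is handy for short records; the dictionary form is convenient when the names and formats already live in separate lists.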

Enjoy