Near Analysis: ArcPy vs. NumPy/SciPy

I’ve been slowly exploring the NumPy Python library. Without delving into the technical details, which I don’t understand, suffice it to say that using NumPy arrays allows certain speed advantages over nested Python lists, and presumably, ArcPy cursors.

Take the Near Analysis. For every point in a feature class, calculate the distance to and record the object ID of the nearest point. Given 1,000 points and no fancy indexing algorithms, this means calculating 1,000,000 distances (or 999,000 – we don’t care about distance to self). 10,000 points? That explodes or distance calculations to 100,000,000.

Some options:

  1. We could run the out-of-the-box ArcGIS Near tool, but that requires Advanced licensing, which I don’t have.
  2. We could make our own Near tool, using two arcpy.da cursors.
  3. We could load our points into a NumPy array, using FeatureClassToNumPyArray, and use NumPy/SciPy matrix operators for the analysis.

Here’s the code I came up with for options 2 and 3:

>>> import arcpy, numpy, numpy.lib.recfunctions, scipy.spatial, random, cProfile
... fc = "bc_geoname_albers"
... def numpyNear():
...   npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])
...   distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')
...   sqDistArray = scipy.spatial.distance.squareform(distArray)
...   nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)
...   nearDist = numpy.sort(sqDistArray)[:,1].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)
...   outFC = r'in_memory\outPts'+str(random.random()*1000)
...   outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
...   arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')
... def arcpyNear():
...   lyr = 'arcpy_Pts'
...   arcpy.MakeFeatureLayer_management(fc,lyr)
...   arcpy.AddField_management(lyr,'NearFID','LONG')
...   arcpy.AddField_management(lyr,'NearDist','DOUBLE')
...   with arcpy.da.UpdateCursor(lyr,['SHAPE@','OID@','NearFID','NearDist']) as cursor1:
...     for row in cursor1:
...       dist = float("inf")
...       with arcpy.da.SearchCursor(lyr,['SHAPE@','OID@']) as cursor2:
...         for row2 in cursor2:
...           if row[1] != row2[1]:
...             curDist = row[0].distanceTo(row2[0])
...             if curDist < dist:
...               dist = curDist
...               nearFID = row2[1]
...       row[2] = nearFID
...       row[3] = dist
...       cursor1.updateRow(row)
... cProfile.runctx('numpyNear()',None,locals())
... cProfile.runctx('arcpyNear()',None,locals())

Even though the result of each function is essentially the same, using 1,000 points the NumPy function completes in 0.516 seconds, while the ArcPy takes 202.062 seconds.

Side note: see use of cProfile for benchmarking method.

Here is the NumPy method, using three points as an example.

1

1. Create the NumPy Array (FeatureClassToNumPyArray):

npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])

[(0, [0.0, 0.0])
 (1, [3.0, 0.0])
 (2, [3.0, 4.0])]

2. Calculate all distances using SciPy pdist function (result = distance from 0 to 1, 0 to 2, and 0 to 3):

distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')

[ 3.  5.  4.]

3. Create a squareform of the distance array (row 1 is all distances from point 0, row 2 is all distances from point 1, etc.):

sqDistArray = scipy.spatial.distance.squareform(distArray)

[[ 0. 3. 5.]
 [ 3. 0. 4.]
 [ 5. 4. 0.]]

4. For each object ID in the original points, find the second nearest value’s index (so, not the distance to itself) – result is list of positions, by row, in squareform distance array holding the second-nearest value:

nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()

[1 0 1]

5. Append the array of NearFIDs to original point array, using numpy.lib.recfunctions.append_fields:

npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)

6. Get second-nearest distance value for each row in squareform distance array:

nearDist = numpy.sort(sqDistArray)[:,1].transpose()

[ 3.  3.  4.]

7. Append the array of NearDist values to original point array:

npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)

8. Convert back to feature class and save (NumPyArrayToFeatureClass):

outFC = r'in_memory\outPts'+str(random.random()*1000)
outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')

Final attribute table, including original object ID, Near object ID, and Near Distance:

2

The Real PG Snow Priority Routes?

PG_roadsThe City of Prince George publishes an annual Snow Operations Map, however, according to it my street (lowest priority) should not get plowed nearly as often as it does (almost always within 24 hours of snowfall). The accompanying, confusing infographic (which does not seem to differentiate between Priority 1 and 2, by the way), makes a passing reference to Bylaw 8625, which does shed some light on the mystery. My street is listed in the bylaw, amongst almost all other highest priority streets in PG.

I spent some of the morning modifying the PG Street Centrelines file (downloaded from the PG Open Data Catalogue) and you can inspect the resulting map here.

Don’t get me wrong: I’m not complaining that my street gets plowed regularly (thank you, plow operators!), I just think the City could use some clarification on the snow operations policy.

Word tables to Excel: Python

Question: I’ve got a folder of 100+ Word documents, each containing a table or two. Can you format that into an Excel spreadsheet for me?

Answer: Yes – bask in the power and glory of Python.

Using the Python for Windows Extension (specifically the win32com.client library), you can drive both Word and Excel with Python. Knowing that this functionality exists is half the battle.

# import libraries
import win32com.client as win32
import os

myDir = r'C:\Projects\ProjectX'

# open invisible Excel app
XL = win32.Dispatch('Excel.Application')
XL.Visible = 0
# load pre-made workbook
XLbook = XL.Workbooks.Open(os.path.join(myDir,'ProjectX_Spreadsheet.xlsx'))
# navigate to first worksheet
XLsheet = XLbook.Worksheets(1)
# counter to keep track of Excel row
XLrow = 2

# loop through files in directory
for myFile in os.listdir(myDir):
 filepath = os.path.join(myDir,myFile)
 filename = os.path.splitext(myFile)[0]
 ext = os.path.splitext(myFile)[1]

 # check if *.docx {optional}
 if ext == '.docx':
 # open invisible Word app
 word = win32.Dispatch('Word.Application')
 word.Visible = 0
 # open Word doc to read
 word.Documents.Open(filepath)
 doc = word.ActiveDocument

 # access first table in Word doc. For subsequent tables, increase index.
 table = doc.Tables(1)

 # get (Word) and set (Excel) some data
 XLsheet.Cells(XLrow,1).Value = table.Cell(Row=1, Column=1).Range.Text

 # get (Word) and set (Excel) some more data
 XLsheet.Cells(XLrow,2).Value = table.Cell(Row=2, Column=3).Range.Text

 # move to next row
 XLrow = XLrow + 1
 # close the current Word doc
 doc.Close()

# exit the Word app
word.Quit()
del word
# save and close Excel app
XLbook.Close(True)
XL.Quit()
del XL

Raspberry Pi, Minecraft, & Python

mc3

I rediscovered my Raspberry Pi, the credit card-sized computer intended to make programming accessible to kids (and everyone), this weekend. After loading an SD card with a fresh copy of NOOBS (here), I was ready to go.

News to me, a free version of Minecraft is included in the NOOBS build, so I tried it out. Not surprisingly in my old age, I was unable to figure out the point of Minecraft without Googling a tutorial, which led me to discover that there is also a Minecraft Python API packaged in the Raspberry Pi, and here’s how I used it to insert a 3×3 pillar of new blocks below my player.

mc1

  • With a fresh world open in Minecraft (above), open Python IDLE 2.x.
  • Import the Minecraft Python library, which should have been installed with NOOBS.
from mcpi import minecraft
  • Create a connection to the currently open Minecraft game
mc = minecraft.Minecraft.create()
  • Get the player’s current position
x,y,z = mc.player.getPos()
  • Loop through the nine adjacent pixels (along the x and z directions; y is vertical)
for x1 in range (-1,2,1):
    for z1 in range (-1,2,1):
  • Get the height of the adjacent pixels – they may not all be at the same height as the player’s position
        h = mc.getHeight(x+x1,z+z1)
  • Set each of the adjacent pixels to a new block type. Block type IDs can be found in the API documentation. Below I set the new blocks to 3, which is dirt.
        mc.setBlock(x+x1,h,z+z1,3)
  • Continuing this pattern while moving the player upward results in a 3×3 pillar of blocks, as below.

mc2

The entire script above would look like so:

from mcpi import minecraft
 mc = minecraft.Minecraft.create()
 x,y,z = mc.player.getPos()
 for x1 in range (-1,2,1):
     for z1 in range (-1,2,1):
         h = mc.getHeight(x+x1,z+z1)
         mc.setBlock(x+x1,h,z+z1,3)

NIR Camera Test

nir_testsI got an exciting new toy yesterday – a digital camera modified to capture near infrared (NIR), in addition to visible light. As I mentioned in a previous post, the combination of NIR and visible light allows for vegetation extraction on a pixel-by-pixel basis, after applying the NDVI formula. See an example of a raw NIR photo I took this morning, compared to the processed NDVI image, here. It’s not perfect, but it shows promise.

As a side note, my example makes use of the TwentyTwenty javascript widget, which makes sliding image overlays extremely easy.

Interactive Pixel Masking: NDVI

ndviI’ve been thinking a lot recently about how to visualize NIR (Near Infrared) photography, and here’s where I’m at.

In brief, NIR photos (combined with a regular visual light photo) allow us to calculate an index of plant health, or more technically, the ratio of absorbed photosynthetically available radiation. Healthy plants reflect almost all infrared radiation and absorb a high percentage of visually red radiation – this is why healthy plants appear green (they absorb red and blue and reflect green light).

The most often used formula for interpreting NIR photos is NDVI (Normalized Difference Vegetation Index). NDVI values range from -1 to +1. I calculated NDVI in ArcGIS, exporting to 8-bit unsigned PNG format (in retrospect, I should have exported to a higher bit value in order to retain the original NDVI values), which resulted in values ranging from 0 – 255.

The end objective was to calculate percent coverage of vegetation in photographs, whether healthy or not. My preliminary website seems to do an adequate job, although it surely needs refinement. The jQuery UI slider controls which pixels to mask as “vegetation.” The HTML5 Canvas elements take care of the rest, manipulating the NDVI image at the pixel level, coloring those above the threshold level (set in the slider) red.

Original NIR and RGB images found here: Image and Visual Representation Group

Arcpy: Polygons to Centroids (within Polygons, and with all Attributes)

Here’s a fairly simple python script that creates centroid points (constrained to fall within polygons) and attaches all the attributes (plus a link field, “ORIG_ID”) from the polygons to the centroids. Notably, it uses cursors to read raw geometry objects – no geoprocessing tools (except CreateFeatureclass). Also, while Advanced licensees may use the Feature to Point tool for this procedure, the method presented here is completely free, using no extensions or non-basic licensing – I don’t know of another free way to achieve this in ArcGIS.

Wouldn’t it be easier to run a spatial join on the points to attach the attributes? Yes, but spatial join is slow. In my test of 1000 polygons, the cursor method completed in about 2 seconds, while adding a spatial join call extended the run time to more than 10 seconds. This could mean big savings on large datasets.

  1. # import libraries
  2. import arcpy, os
  3. # set input/output parameters
  4. polyFC = arcpy.GetParameterAsText(0)
  5. outCentroids = arcpy.GetParameterAsText(1)
  6. # set overwrite environment
  7. arcpy.env.overwriteOutput = True
  8. # if the output file does not exist, create it. Add “ORIG_ID” field.
  9. if not arcpy.Exists(outCentroids):
  10.     arcpy.CreateFeatureclass_management(os.path.dirname(outCentroids),
  11.                                     os.path.basename(outCentroids),
  12.                                     “POINT”,
  13.                                     polyFC,
  14.                                     “”,
  15.                                     “”,
  16.                                     polyFC)
  17.     arcpy.AddField_management(outCentroids,‘ORIG_ID’‘LONG’)
  18. # create an InsertCursor containing all the fields in ourCentroids, which are all the fields in polyFC plus “ORIG_ID”
  19. cursor = arcpy.da.InsertCursor(outCentroids, [‘*’])
  20. # read all features in polyFC
  21. for row in arcpy.da.SearchCursor(polyFC, [“SHAPE@”,‘*’,‘OID@’]):
  22.      # create an array to hold a new, modified row
  23.     rowArray = []
  24.      # read all the fields except “SHAPE@”
  25.     for fieldnum in range(1,len(row)):
  26.         # if this is the second field [i.e. SHAPE], replace it with the centroid coordinates
  27.         if fieldnum == 2:
  28.             rowArray.append(row[0].centroid)
  29.         else:
  30.             rowArray.append(row[fieldnum])
  31.     # write the new row to the cursor
  32.     cursor.insertRow(rowArray)
  33. del row

Looping through the fields in this way (lines 28 – 35) feels cumbersome – if anyone knows a better way to do this, please let me know!