Near Analysis: ArcPy vs. NumPy/SciPy

I’ve been slowly exploring the NumPy Python library. Without delving into the technical details (which I don’t fully understand), suffice it to say that NumPy arrays offer real speed advantages over nested Python lists and, presumably, over ArcPy cursors.

Take the Near analysis: for every point in a feature class, calculate the distance to, and record the object ID of, the nearest point. Given 1,000 points and no fancy indexing algorithms, this means calculating 1,000,000 distances (or 999,000 – we don’t care about the distance to self). 10,000 points? That explodes our distance calculations to 100,000,000.

Some options:

  1. We could run the out-of-the-box ArcGIS Near tool, but that requires Advanced licensing, which I don’t have.
  2. We could make our own Near tool, using two arcpy.da cursors.
  3. We could load our points into a NumPy array, using FeatureClassToNumPyArray, and use NumPy/SciPy matrix operators for the analysis.

Here’s the code I came up with for options 2 and 3:

>>> import arcpy, numpy, numpy.lib.recfunctions, scipy.spatial, random, cProfile
... fc = "bc_geoname_albers"
... def numpyNear():
...   npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])
...   distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')
...   sqDistArray = scipy.spatial.distance.squareform(distArray)
...   nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)
...   nearDist = numpy.sort(sqDistArray)[:,1].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)
...   outFC = r'in_memory\outPts'+str(random.random()*1000)
...   outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
...   arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')
... def arcpyNear():
...   # brute-force version: an update cursor over every point, with a nested
...   # search cursor measuring the distance to every other point
...   lyr = 'arcpy_Pts'
...   arcpy.MakeFeatureLayer_management(fc,lyr)
...   arcpy.AddField_management(lyr,'NearFID','LONG')
...   arcpy.AddField_management(lyr,'NearDist','DOUBLE')
...   with arcpy.da.UpdateCursor(lyr,['SHAPE@','OID@','NearFID','NearDist']) as cursor1:
...     for row in cursor1:
...       dist = float("inf")
...       with arcpy.da.SearchCursor(lyr,['SHAPE@','OID@']) as cursor2:
...         for row2 in cursor2:
...           if row[1] != row2[1]:  # skip the point itself
...             curDist = row[0].distanceTo(row2[0])
...             if curDist < dist:   # keep the closest point seen so far
...               dist = curDist
...               nearFID = row2[1]
...       row[2] = nearFID
...       row[3] = dist
...       cursor1.updateRow(row)
... cProfile.runctx('numpyNear()',None,locals())
... cProfile.runctx('arcpyNear()',None,locals())

Even though the result of each function is essentially the same, with 1,000 points the NumPy function completes in 0.516 seconds, while the ArcPy function takes 202.062 seconds.

Side note: note the use of cProfile as the benchmarking method.
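For anyone who hasn’t used it, cProfile works on any callable and prints per-function call counts and cumulative times. Here is a minimal sketch of the same pattern with a throwaway stand-in function (slow_sum is my own example, not part of the Near code):

import cProfile

def slow_sum(n=100000):
  # stand-in workload: add up the first n integers one at a time
  total = 0
  for i in range(n):
    total += i
  return total

# runctx executes the statement against the supplied globals/locals and
# prints a table of call counts and cumulative times per function
cProfile.runctx('slow_sum()', None, locals())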

Here is the NumPy method, step by step, using three points as an example: (0, 0), (3, 0), and (3, 4).


1. Create the NumPy Array (FeatureClassToNumPyArray):

npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])

[(0, [0.0, 0.0])
 (1, [3.0, 0.0])
 (2, [3.0, 4.0])]
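If you want to follow the remaining steps without ArcGIS, an equivalent structured array can be built directly; the dtypes below are my assumption of what FeatureClassToNumPyArray returns for an object ID and an (x, y) coordinate pair:

import numpy

# three example points in the same layout as the array printed above
npArray = numpy.array(
  [(0, (0.0, 0.0)),
   (1, (3.0, 0.0)),
   (2, (3.0, 4.0))],
  dtype=[('OID@', '<i4'), ('SHAPE@XY', '<f8', 2)])

print(npArray['SHAPE@XY'])  # a 3 x 2 array of coordinates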

2. Calculate all pairwise distances using the SciPy pdist function (the result is the condensed distance list: point 0 to 1, point 0 to 2, and point 1 to 2):

distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')

[ 3.  5.  4.]

3. Create a squareform of the distance array (the first row is all distances from point 0, the second row is all distances from point 1, etc.):

sqDistArray = scipy.spatial.distance.squareform(distArray)

[[ 0. 3. 5.]
 [ 3. 0. 4.]
 [ 5. 4. 0.]]
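Both calls work on any plain n-by-2 coordinate array, so this part is easy to test outside ArcGIS; a quick check with the same three points:

import numpy
import scipy.spatial

pts = numpy.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])

# condensed pairwise distances, in the order (0,1), (0,2), (1,2)
distArray = scipy.spatial.distance.pdist(pts, 'euclidean')
print(distArray)  # [ 3.  5.  4.]

# expand into the full symmetric matrix with zeros on the diagonal
print(scipy.spatial.distance.squareform(distArray))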

4. For each row of the squareform array, argsort the distances and take the index in column 1 – the second-smallest value, since column 0 is always the zero distance to self – then use those row positions to index the original object IDs. The result is each point’s nearest object ID:

nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()

[1 0 1]
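This is the least obvious step, so here it is in isolation on the squareform matrix above. Sorting each row puts the zero self-distance in column 0, which is why column 1 of the argsort is the nearest other point:

import numpy

sqDistArray = numpy.array([[0., 3., 5.],
                           [3., 0., 4.],
                           [5., 4., 0.]])

# argsort each row; column 1 holds the row index of the nearest other point
nearIdx = numpy.argsort(sqDistArray)[:, 1]
print(nearIdx)  # [1 0 1]

# in the post these positions are used to index npArray['OID@'], turning row
# indices into object IDs; here the IDs happen to equal the indices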

5. Append the array of NearFID values to the original point array, using numpy.lib.recfunctions.append_fields:

npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)
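append_fields returns a new array with the extra column (by default a masked array, since usemask defaults to True). A quick way to sanity-check the result is to look at the field names; the tiny base array below is just for illustration:

import numpy
import numpy.lib.recfunctions

base = numpy.array([(0,), (1,), (2,)], dtype=[('OID@', '<i4')])
nearFID = numpy.array([1, 0, 1])

npAppend = numpy.lib.recfunctions.append_fields(base, 'NearFID', nearFID)
print(npAppend.dtype.names)  # ('OID@', 'NearFID')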

6. Get the second-smallest distance value – the nearest-neighbour distance – from each row of the squareform array:

nearDist = numpy.sort(sqDistArray)[:,1].transpose()

[ 3.  3.  4.]
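Same trick as step 4, but sorting the values instead of the indices; column 1 of each sorted row is the nearest-neighbour distance:

import numpy

sqDistArray = numpy.array([[0., 3., 5.],
                           [3., 0., 4.],
                           [5., 4., 0.]])

# column 0 of each sorted row is the zero self-distance, column 1 the nearest
print(numpy.sort(sqDistArray)[:, 1])  # [ 3.  3.  4.]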

7. Append the array of NearDist values to the original point array:

npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)

8. Convert back to a feature class and save (NumPyArrayToFeatureClass):

outFC = r'in_memory\outPts'+str(random.random()*1000)
outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')

The final attribute table includes the original object ID, the NearFID (1, 0, 1 for the three example points), and the NearDist (3, 3, 4).



2 thoughts on “Near Analysis: ArcPy vs. NumPy/SciPy”

  1. Kim

    Nice idea to test out using NumPy functions, but this is a bit of a straw-man comparison. NumPy is working on an optimised array in memory.

    Who would do a Near using two cursors and a slow geometry function? For a start, a Python hypot() call on the coordinates would be 10 times faster. You have also used SHAPE@ when the help (or the training course) specifically suggests SHAPE@XY is much, much faster in a cursor. Even the Near() command has a better alternative, GenerateNearTable(), which has some practical options such as a search tolerance, so you don’t have to square the point counts for real problems.
    But staying with your self-imposed Basic licence, I did a similar exercise to find the farthest town from the sea (inverted to make it harder), and after several rounds of refactoring I got it faster than Near as well – but only by removing the generality of the solution.
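For what it’s worth, the hypot()-on-coordinates shortcut the comment describes would look something like this inside the inner cursor loop, using SHAPE@XY tuples instead of geometry objects (a sketch; planar_dist, xy and xy2 are my names, not from the post or the comment):

import math

def planar_dist(xy, xy2):
  # straight-line distance between two (x, y) tuples, no geometry objects
  return math.hypot(xy[0] - xy2[0], xy[1] - xy2[1])

print(planar_dist((0.0, 0.0), (3.0, 4.0)))  # 5.0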
