I’ve been slowly exploring the NumPy Python library. Without delving into the technical details, which I don’t understand, suffice it to say that using NumPy arrays allows certain speed advantages over nested Python lists, and presumably, ArcPy cursors.

Take the Near Analysis. For every point in a feature class, calculate the distance to and record the object ID of the nearest point. Given 1,000 points and no fancy indexing algorithms, this means calculating 1,000,000 distances (or 999,000 – we don’t care about distance to self). 10,000 points? That explodes or distance calculations to 100,000,000.

Some options:

- We could run the out-of-the-box ArcGIS Near tool, but that requires Advanced licensing, which I don’t have.
- We could make our own Near tool, using two arcpy.da cursors.
- We could load our points into a NumPy array, using FeatureClassToNumPyArray, and use NumPy/SciPy matrix operators for the analysis.

Here’s the code I came up with for options 2 and 3:

>>> import arcpy, numpy, numpy.lib.recfunctions, scipy.spatial, random, cProfile ... fc = "bc_geoname_albers" ... def numpyNear(): ... npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"]) ... distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean') ... sqDistArray = scipy.spatial.distance.squareform(distArray) ... nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose() ... npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID) ... nearDist = numpy.sort(sqDistArray)[:,1].transpose() ... npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist) ... outFC = r'in_memory\outPts'+str(random.random()*1000) ... outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY") ... arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts') ... def arcpyNear(): ... lyr = 'arcpy_Pts' ... arcpy.MakeFeatureLayer_management(fc,lyr) ... arcpy.AddField_management(lyr,'NearFID','LONG') ... arcpy.AddField_management(lyr,'NearDist','DOUBLE') ... with arcpy.da.UpdateCursor(lyr,['SHAPE@','OID@','NearFID','NearDist']) as cursor1: ... for row in cursor1: ... dist = float("inf") ... with arcpy.da.SearchCursor(lyr,['SHAPE@','OID@']) as cursor2: ... for row2 in cursor2: ... if row[1] != row2[1]: ... curDist = row[0].distanceTo(row2[0]) ... if curDist < dist: ... dist = curDist ... nearFID = row2[1] ... row[2] = nearFID ... row[3] = dist ... cursor1.updateRow(row) ... cProfile.runctx('numpyNear()',None,locals()) ... cProfile.runctx('arcpyNear()',None,locals())

Even though the result of each function is essentially the same, using 1,000 points the NumPy function completes in **0.516** seconds, while the ArcPy takes **202.062** seconds.

Side note: see use of cProfile for benchmarking method.

Here is the NumPy method, using three points as an example.

1. Create the NumPy Array (FeatureClassToNumPyArray):

npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"]) [(0, [0.0, 0.0]) (1, [3.0, 0.0]) (2, [3.0, 4.0])]

2. Calculate all distances using SciPy pdist function (result = distance from 0 to 1, 0 to 2, and 0 to 3):

distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean') [ 3. 5. 4.]

3. Create a squareform of the distance array (row 1 is all distances from point 0, row 2 is all distances from point 1, etc.):

sqDistArray = scipy.spatial.distance.squareform(distArray) [[ 0. 3. 5.] [ 3. 0. 4.] [ 5. 4. 0.]]

4. For each object ID in the original points, find the second nearest value’s index (so, not the distance to itself) – result is list of positions, by row, in squareform distance array holding the second-nearest value:

nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose() [1 0 1]

5. Append the array of NearFIDs to original point array, using numpy.lib.recfunctions.append_fields:

npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)

6. Get second-nearest distance value for each row in squareform distance array:

nearDist = numpy.sort(sqDistArray)[:,1].transpose() [ 3. 3. 4.]

7. Append the array of NearDist values to original point array:

npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)

8. Convert back to feature class and save (NumPyArrayToFeatureClass):

outFC = r'in_memory\outPts'+str(random.random()*1000) outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY") arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')

Final attribute table, including original object ID, Near object ID, and Near Distance:

KimNice idea to test out using Numpy functions but this is a bit of a straw man comparison. Numpy is working on an optimised array in memory.

Who would do a Near using two cursors and a slow geometry function? For a start a python hypot() function on the coordinates would be 10 times faster. You have also got out @SHAPE when the help (or training course) specifically suggests @SHAPEXY is much much faster in a cursor. Even the Near() command has a better tool: GenerateNearTable() that has some practical options such as a search tolerance so that you don’t have to square the point counts for real problems.

But staying with your self-imposed Basic licence, I did a similar exercise to find the farthest town from the sea (inverted to make it harder) and after several refactoring I got it faster than Near as well but only by removing the generality of the solution.

dkwiensPost authorThanks for your response. Of course, you’re right: hypot()/@SHAPEXY would definitely be faster.