# Mapbox Visual Center Algorithm: arcpy (attempt #2)

Yesterday, I attempted to replicate the Mapbox Visual Center algorithm using ArcGIS arcpy (Python). Today, I actually looked at the source code (code, license) and attempted to fully port it over from JavaScript. I think it works properly, but you tell me.

def polylabel(polygon, centroid, precision, debug):
cells = []
precision = precision or 1.0
extent = polygon.extent
minX, minY, maxX, maxY = (extent.XMin, extent.YMin, extent.XMax, extent.YMax)
width = extent.width
height = extent.height
cellSize = min([width, height])
h = cellSize / 2
cellQueue = []
x = minX
while x < maxX:
y = minY
while y < maxY:             cell = Cell(x+h, y+h, h, polygon)             cells.append(cell.geom)             cellQueue.append(cell)             y += cellSize         x += cellSize     bestCell = Cell(centroid[0], centroid[1], 0, polygon)     numProbes = len(cellQueue)     while len(cellQueue):         cellQueue.sort(key=lambda cell: cell.d)         cell = cellQueue.pop()         cells.append(cell.geom)         if cell.d > bestCell.d:
bestCell = cell
if cell.max - bestCell.d <= precision:
continue
h = cell.h/2
cellQueue.append(Cell(cell.x-h, cell.y-h, h, polygon))
cellQueue.append(Cell(cell.x+h, cell.y-h, h, polygon))
cellQueue.append(Cell(cell.x-h, cell.y+h, h, polygon))
cellQueue.append(Cell(cell.x+h, cell.y+h, h, polygon))
numProbes += 4
return [bestCell.x, bestCell.y, cells]
class Cell():
def __init__(self, x, y, h, polygon):
self.x = x
self.y = y
self.h = h
self.d = pointToPolygonDist(x, y, polygon)
self.max = self.d + self.h * math.sqrt(2)
self.geom = arcpy.Polygon(arcpy.Array([arcpy.Point(x-h,y-h),arcpy.Point(x+h,y-h),arcpy.Point(x+h,y+h),arcpy.Point(x-h,y+h)]))
def pointToPolygonDist(x, y, polygon):
point_geom = arcpy.PointGeometry(arcpy.Point(x, y),sr)
polygon_outline = polygon.boundary()
min_dist = polygon_outline.queryPointAndDistance(point_geom)
sign_dist = min_dist[2] * ((min_dist[3]-0.5)*2)
return sign_dist
fc = 'BEC_POLY selection'
sr = arcpy.Describe(fc).spatialReference
centroids = []
label_points = []
cells = []
with arcpy.da.SearchCursor(fc,['SHAPE@','SHAPE@XY']) as cursor:
for row in cursor:
best_point = polylabel(row[0],row[1],.75,'#')
centroids.append(arcpy.PointGeometry(row[0].centroid,sr))
label_points.append(arcpy.PointGeometry(arcpy.Point(best_point[0], best_point[1]),sr))
for cell in best_point[2]:
cells.append(cell)
arcpy.CopyFeatures_management(centroids,r'in_memory\centroids')
arcpy.CopyFeatures_management(label_points, r'in_memory\label_points')
arcpy.CopyFeatures_management(cells, r'in_memory\cells')

# Mapbox Visual Center Algorithm: arcpy

On Monday, Mapbox published a JavaScript implementation of a fast algorithm for finding the visual center of a polygon using quadtrees (blog, code). I took a stab at it using Python/arcpy in ArcGIS:

def quadify(poly): # split polygon into quads
ext = poly.extent
TM = arcpy.Point((ext.XMin+ext.XMax)/2,ext.YMax)
LM = arcpy.Point(ext.XMin,(ext.YMin+ext.YMax)/2)
BM = arcpy.Point((ext.XMin+ext.XMax)/2,ext.YMin)
RM = arcpy.Point(ext.XMax,(ext.YMin+ext.YMax)/2)
TL = arcpy.Polygon(arcpy.Array([ext.upperLeft,LM,poly.centroid,TM]),sr)
TR = arcpy.Polygon(arcpy.Array([TM,poly.centroid,RM,ext.upperRight]),sr)
BL = arcpy.Polygon(arcpy.Array([LM,ext.lowerLeft,BM,poly.centroid]),sr)
BR = arcpy.Polygon(arcpy.Array([poly.centroid,BM,ext.lowerRight,RM]),sr)
return [TL,TR,BL,BR]
poly_boundary = poly.boundary()
dist = q_dist[2] if q_dist[3] else -q_dist[2]
max_dist = 0
dists = {}
max_dist = cur_dist if cur_dist > max_dist else max_dist
for k,v in dists.iteritems():
if k > max_dist * 0.90: # precision = 90%
fc = 'wetlands_select' # feature class/layer
sr = arcpy.Describe(fc).spatialReference # spatial reference
out_polys = []
out_points = []
with arcpy.da.SearchCursor(fc,'SHAPE@',spatial_reference=sr) as cursor: # loop polygons
for row in cursor:
ext = row[0].extent
center = arcpy.PointGeometry(arcpy.Point((ext.XMin+ext.XMax)/2,(ext.YMin+ext.YMax)/2),sr)
for i in range(10): # quadify 10x
arcpy.CopyFeatures_management(out_polys,r'in_memory\polys')
arcpy.CopyFeatures_management(out_points,r'in_memory\points')

Notes:

I’m not 100% sure that this emulates the base Mapbox algorithm, and I didn’t attempt to implement the “priority queue” enhancement.

The results are somewhat sensitive to the precision factor, but I suppose that’s the be expected with a speed algorithm.

I realize that you can calculate a label point much easier within ArcGIS using out-of-the-box tools. This was just me exploring the algorithm.

# Near Analysis: ArcPy vs. NumPy/SciPy

I’ve been slowly exploring the NumPy Python library. Without delving into the technical details, which I don’t understand, suffice it to say that using NumPy arrays allows certain speed advantages over nested Python lists, and presumably, ArcPy cursors.

Take the Near Analysis. For every point in a feature class, calculate the distance to and record the object ID of the nearest point. Given 1,000 points and no fancy indexing algorithms, this means calculating 1,000,000 distances (or 999,000 – we don’t care about distance to self). 10,000 points? That explodes or distance calculations to 100,000,000.

Some options:

1. We could run the out-of-the-box ArcGIS Near tool, but that requires Advanced licensing, which I don’t have.
2. We could make our own Near tool, using two arcpy.da cursors.
3. We could load our points into a NumPy array, using FeatureClassToNumPyArray, and use NumPy/SciPy matrix operators for the analysis.

Here’s the code I came up with for options 2 and 3:

>>> import arcpy, numpy, numpy.lib.recfunctions, scipy.spatial, random, cProfile
... fc = "bc_geoname_albers"
... def numpyNear():
...   npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])
...   distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')
...   sqDistArray = scipy.spatial.distance.squareform(distArray)
...   nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)
...   nearDist = numpy.sort(sqDistArray)[:,1].transpose()
...   npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)
...   outFC = r'in_memory\outPts'+str(random.random()*1000)
...   outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
...   arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')
... def arcpyNear():
...   lyr = 'arcpy_Pts'
...   arcpy.MakeFeatureLayer_management(fc,lyr)
...   with arcpy.da.UpdateCursor(lyr,['SHAPE@','OID@','NearFID','NearDist']) as cursor1:
...     for row in cursor1:
...       dist = float("inf")
...       with arcpy.da.SearchCursor(lyr,['SHAPE@','OID@']) as cursor2:
...         for row2 in cursor2:
...           if row[1] != row2[1]:
...             curDist = row[0].distanceTo(row2[0])
...             if curDist < dist:
...               dist = curDist
...               nearFID = row2[1]
...       row[2] = nearFID
...       row[3] = dist
...       cursor1.updateRow(row)
... cProfile.runctx('numpyNear()',None,locals())
... cProfile.runctx('arcpyNear()',None,locals())

Even though the result of each function is essentially the same, using 1,000 points the NumPy function completes in 0.516 seconds, while the ArcPy takes 202.062 seconds.

Side note: see use of cProfile for benchmarking method.

Here is the NumPy method, using three points as an example.

1. Create the NumPy Array (FeatureClassToNumPyArray):

npArray = arcpy.da.FeatureClassToNumPyArray(fc,["OID@","SHAPE@XY"])

[(0, [0.0, 0.0])
(1, [3.0, 0.0])
(2, [3.0, 4.0])]

2. Calculate all distances using SciPy pdist function (result = distance from 0 to 1, 0 to 2, and 0 to 3):

distArray = scipy.spatial.distance.pdist(npArray['SHAPE@XY'],'euclidean')

[ 3.  5.  4.]

3. Create a squareform of the distance array (row 1 is all distances from point 0, row 2 is all distances from point 1, etc.):

sqDistArray = scipy.spatial.distance.squareform(distArray)

[[ 0. 3. 5.]
[ 3. 0. 4.]
[ 5. 4. 0.]]

4. For each object ID in the original points, find the second nearest value’s index (so, not the distance to itself) – result is list of positions, by row, in squareform distance array holding the second-nearest value:

nearFID = npArray['OID@'][numpy.argsort(sqDistArray)[:,1]].transpose()

[1 0 1]

5. Append the array of NearFIDs to original point array, using numpy.lib.recfunctions.append_fields:

npAppend = numpy.lib.recfunctions.append_fields(npArray,'NearFID',nearFID)

6. Get second-nearest distance value for each row in squareform distance array:

nearDist = numpy.sort(sqDistArray)[:,1].transpose()

[ 3.  3.  4.]

7. Append the array of NearDist values to original point array:

npAppend = numpy.lib.recfunctions.append_fields(npAppend,'NearDist',nearDist)

8. Convert back to feature class and save (NumPyArrayToFeatureClass):

outFC = r'in_memory\outPts'+str(random.random()*1000)
outFeat = arcpy.da.NumPyArrayToFeatureClass(npAppend,outFC,"SHAPE@XY")
arcpy.CopyFeatures_management(outFC,r'in_memory\numpy_Pts')

Final attribute table, including original object ID, Near object ID, and Near Distance:

# Arcpy: Polygons to Centroids (within Polygons, and with all Attributes)

Here’s a fairly simple python script that creates centroid points (constrained to fall within polygons) and attaches all the attributes (plus a link field, “ORIG_ID”) from the polygons to the centroids. Notably, it uses cursors to read raw geometry objects – no geoprocessing tools (except CreateFeatureclass). Also, while Advanced licensees may use the Feature to Point tool for this procedure, the method presented here is completely free, using no extensions or non-basic licensing – I don’t know of another free way to achieve this in ArcGIS. Wouldn’t it be easier to run a spatial join on the points to attach the attributes? Yes, but spatial join is slow. In my test of 1000 polygons, the cursor method completed in about 2 seconds, while adding a spatial join call extended the run time to more than 10 seconds. This could mean big savings on large datasets.

1. # import libraries
2. import arcpy, os
3. # set input/output parameters
4. polyFC = arcpy.GetParameterAsText(0)
5. outCentroids = arcpy.GetParameterAsText(1)
6. # set overwrite environment
7. arcpy.env.overwriteOutput = True
8. # if the output file does not exist, create it. Add “ORIG_ID” field.
9. if not arcpy.Exists(outCentroids):
10.     arcpy.CreateFeatureclass_management(os.path.dirname(outCentroids),
11.                                     os.path.basename(outCentroids),
12.                                     “POINT”,
13.                                     polyFC,
14.                                     “”,
15.                                     “”,
16.                                     polyFC)
18. # create an InsertCursor containing all the fields in ourCentroids, which are all the fields in polyFC plus “ORIG_ID”
19. iCursor = arcpy.da.InsertCursor(outCentroids, [‘*’])
20. # read all features in polyFC
21. with arcpy.da.SearchCursor(polyFC, [“SHAPE@”,‘*’,‘OID@’]) as sCursor:
22.     for row in sCursor:
23.          # create an array to hold a new, modified row
24.         rowArray = []
25.          # read all the fields except “SHAPE@”
26.         for fieldnum in range(1,len(row)):
27.             # if this is the 2nd field [i.e. SHAPE], replace it with the centroid coordinates
28.             if fieldnum == 2:
29.                 rowArray.append(row[0].centroid)
30.             else:
31.                 rowArray.append(row[fieldnum])
32.         # write the new row to the cursor
33.         iCursor.insertRow(rowArray)
34. del iCursor

Looping through the fields in this way (lines 29 – 36) feels cumbersome – if anyone knows a better way to do this, please let me know!

————————————————-

EDIT (January 19, 2016): Following some discussion on LinkedIn’s GIS and Geography group regarding this post, here is an example of why you must use the SHAPE@ token (and Polygon.centroid) rather than either the SHAPE@XY or SHAPE@TRUECENTROID tokens to guarantee that the point falls inside the given polygon:

… points = []
… with arcpy.da.SearchCursor(“MY_FEATURE_LAYER”,[‘SHAPE@’,’Shape@XY’,’SHAPE@TRUECENTROID’]) as cursor:
…   for row in cursor:
…     points.append(arcpy.PointGeometry(row[0].centroid)) # SHAPE@
…     points.append(arcpy.PointGeometry(arcpy.Point(row[1][0],row[1][1]))) # SHAPE@XY
…     points.append(arcpy.PointGeometry(arcpy.Point(row[2][0],row[2][1]))) # SHAPE@TRUECENTROID
… arcpy.CopyFeatures_management(points, r’in_memory\mypoints’)

# Define Projections for Hard Drive of Files with os.walk

Received a hard drive full of imagery today. Tiles in two UTM zones, luckily specified in file name. Lots of folders. No projections defined. Here’s how to do it.

import os, arcpy
inFolder = r"ROOT PATH OF ALL IMAGERY"
folders = os.walk(inFolder)
sr9 = arcpy.Describe(r"PATH TO FILE ALREADY IN ZONE 9").spatialReference
sr10 = arcpy.Describe(r"PATH TO FILE ALREADY IN ZONE 10").spatialReference
for folder in folders:
for file in folder[2]:
if file.endswith(".ers"):
if 'U10W84' in file:
arcpy.DefineProjection_management(os.path.join(folder[0], file), sr10)
elif 'U9W84' in file:
arcpy.DefineProjection_management(os.path.join(folder[0], file), sr9)
else:
print file