Monthly Archives: July 2015

GeoNet Prime Answering Times


ESRI runs a biannual contest to encourage participation on their help forum, GeoNet. Prizes go to the top ten point-getters. I found out during the last contest that it takes a great deal of time and persistence to keep up with the top of the pack (I ended up 5th). There are several tips for optimizing your effort (some of which I outlined, somewhat sarcastically, here).

One such tip is: get on GeoNet during times when there are the greatest number of fresh, unanswered questions. If you’ve spent any time on GeoNet, you have likely noticed that questions are generally asked during North American working hours, when people are struggling to get through their work-related GIS tasks. I wanted to put some better numbers to this idea, so I set about gathering the data myself.

All the information is there: each post has the date/time it was asked written right there in the posting. You could click on each post and record that date/time in an Excel spreadsheet and be done with it, but that would be awfully tedious. This is where screen scraping comes in. Screen scraping is the equivalent of having your computer control your web browser: click here, find this part of the HTML code, read it, and do something with it. Luckily, your computer doesn’t care if it has to spend all day doing the same thing over and over and over…

I chose to use Python, but you can do this in other languages, as well. Useful libraries to download are Requests and lxml. I use Requests for making the, you guessed it, “requests”, which are similar to typing a URL in the address bar of your browser. I use lxml for parsing and traversing the returned HTML code, which you can look at on any web page by pressing Ctrl+u (at least, in Chrome).

from lxml import html
import requests, time, csv

with open('C:/junk/geonet.csv', 'w', newline='') as csvfile: # create and/or open a CSV file (newline='' avoids blank rows on Windows in Python 3)
  csvWriter = csv.writer(csvfile, delimiter=" ", quoting=csv.QUOTE_MINIMAL) # writer
  dateList = []
  baseUrl = '' # store the URL prefix

  for i in range(10): # loop through the first 10 'Content' pages
    page = requests.get(baseUrl + '?start=' + str(i*20)) # navigate to page
    tree = html.fromstring(page.text) # retrieve the HTML
    linkList = tree.iterlinks() # find all the links on the page
    threads = []

    for link in linkList: # loop through the links
      if link[2].startswith('/thread/'): # find those starting with "thread"
        threads.append(link[2]) # add the link to the list

    threadBase = '' # store the URL prefix
    for thread in threads: # loop through the threads listed on the 'Content' page
      page = requests.get(threadBase + thread) # navigate to the correct thread page
      tree = html.fromstring(page.text) # retrieve the HTML
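      dates = tree.find_class('j-post-author') # find the element(s) holding the post date
      if dates: # skip the thread if that element is missing
        date = dates[0].text_content().strip() # pull out the text and trim whitespace
        dateList.append(date) # write to list
        csvWriter.writerow([date]) # write to CSV as one field (a bare string would be split into characters)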
      time.sleep(5) # wait 5s to give server a chance to handle someone else's requests
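
If you want to see what the two lxml calls in the script return before pointing it at a live site, here is a minimal offline sketch. The sample HTML, the link targets, the timestamp text, and the helper names `thread_links` and `first_text_with_class` are all made up for illustration; only `iterlinks()` and `find_class()` come from the script above.

```python
from lxml import html

# Hypothetical stand-in for a page; the real GeoNet markup may differ.
SAMPLE_PAGE = """
<html><body>
  <a href="/thread/12345">How do I join two tables?</a>
  <a href="/people/someone">A user profile</a>
  <a href="/thread/67890">Raster clip fails</a>
  <span class="j-post-author">Jul 10, 2015 8:30 AM</span>
</body></html>
"""

def thread_links(tree):
    """Return the URLs of all links that point at a thread."""
    # iterlinks() yields (element, attribute, link, position) tuples,
    # which is why the script above reads the URL from link[2].
    return [link for _element, _attribute, link, _pos in tree.iterlinks()
            if link.startswith('/thread/')]

def first_text_with_class(tree, class_name):
    """Return the stripped text of the first element with the given class, or None."""
    matches = tree.find_class(class_name)  # list of matching elements, possibly empty
    return matches[0].text_content().strip() if matches else None

tree = html.fromstring(SAMPLE_PAGE)
print(thread_links(tree))
print(first_text_with_class(tree, 'j-post-author'))
```

Testing against a saved or hand-written snippet like this also means you are not hitting the server while you work out the parsing.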

Anyhow, the graph at the start of this post shows pretty much what I expected: people on the East Coast get confused, then people on the West Coast get confused, then everyone goes home.
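
For anyone who wants to go from the scraped strings to hourly counts like the ones in a graph of this kind, here is a rough sketch using only the standard library. It assumes each scraped string has already been trimmed down to a bare timestamp that looks like "Jul 10, 2015 8:30 AM"; that layout, the `ASSUMED_FORMAT` constant, and the `posts_per_hour` helper are my assumptions for illustration, so adjust the format string to whatever the site actually displays. It also takes the timestamps at face value, in whatever time zone they were shown.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "Jul 10, 2015 8:30 AM" (hypothetical; adjust as needed).
ASSUMED_FORMAT = '%b %d, %Y %I:%M %p'

def posts_per_hour(timestamps, fmt=ASSUMED_FORMAT):
    """Count posts for each hour of the day (0-23), skipping unparseable strings."""
    counts = Counter()
    for stamp in timestamps:
        try:
            counts[datetime.strptime(stamp, fmt).hour] += 1
        except ValueError:
            continue  # the string did not match the assumed layout
    return [counts.get(hour, 0) for hour in range(24)]

# Made-up sample values, not real GeoNet data.
sample = ['Jul 10, 2015 8:30 AM', 'Jul 10, 2015 8:45 AM',
          'Jul 11, 2015 1:05 PM', 'not a date']
hourly = posts_per_hour(sample)
for hour, count in enumerate(hourly):
    if count:
        print(hour, count)
```

The resulting 24-item list can be pasted into a spreadsheet or handed to any plotting library to draw the bars.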