ESRI runs a biannual contest to encourage participation on their help forum, GeoNet. Prizes go to the top ten point-getters. I found out during the last contest that it takes a great deal of time and persistence to keep up with the top of the pack (I ended up 5th). There are several tips for optimizing your effort (some of which I outlined, somewhat sarcastically, here).
One such tip is: get on GeoNet when the number of fresh, unanswered questions is greatest. If you’ve spent any time on GeoNet, you have likely noticed that questions are generally asked during North American working hours, when people are struggling to get through their work-related GIS tasks. I wanted to put some better numbers to this idea, so I set about gathering the data myself.
All the information is there: each post has the date and time it was asked written right in the posting. You could click on each post and record that date/time in an Excel spreadsheet and be done with it, but that would be awfully tedious. This is where screen scraping comes in. Screen scraping is the equivalent of having your computer control your web browser: click here, find this part of the HTML code, read it, and do something with it. Luckily, your computer doesn’t care if it has to spend all day doing the same thing over and over and over…
I chose to use Python, but you can do this in other languages as well. Two useful libraries to install are Requests and lxml. I use Requests for making the, you guessed it, “requests”, which are similar to typing a URL into the address bar of your browser. I use lxml for parsing and traversing the returned HTML, which you can view for any web page by pressing Ctrl+U (at least in Chrome).
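To get a feel for the lxml half before touching the network, here is a minimal sketch that pulls thread links out of a chunk of HTML. The sample markup and the thread_links helper are made up for illustration; the real GeoNet page is certainly messier.

```python
from lxml import html

# Hypothetical HTML standing in for a GeoNet 'Content' page; the real markup may differ.
SAMPLE_PAGE = """
<html><body>
  <a href="/thread/101">How do I clip a raster?</a>
  <a href="/people/someone">A user profile</a>
  <a href="/thread/102">Field calculator error</a>
</body></html>
"""

def thread_links(page_source):
    """Return the target of every link that points at a thread (hypothetical helper)."""
    tree = html.fromstring(page_source)
    # iterlinks() yields (element, attribute, link, pos) tuples; only the link is needed here
    return [link for _, _, link, _ in tree.iterlinks() if link.startswith('/thread/')]

print(thread_links(SAMPLE_PAGE))
```

Swapping SAMPLE_PAGE for the text of a Requests response is all it takes to point the same function at a live page.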
from lxml import html
import requests, time, csv

# newline='' keeps the csv module from writing blank rows on Windows (Python 3)
with open('C:/junk/geonet.csv', 'w', newline='') as csvfile:  # create and/or open a CSV file
    csvWriter = csv.writer(csvfile, delimiter=" ", quoting=csv.QUOTE_MINIMAL)  # writer
    dateList = []  # collected date strings
    baseUrl = 'https://geonet.esri.com/content'  # store the URL prefix
    for i in range(10):  # loop through the first 10 'Content' pages
        page = requests.get(baseUrl + '?start=' + str(i*20))  # navigate to page
        tree = html.fromstring(page.text)  # retrieve the HTML
        threads = []
        # iterlinks() yields (element, attribute, link, pos) tuples for every link on the page
        for element, attribute, link, pos in tree.iterlinks():  # loop through the links
            if link.startswith('/thread/'):  # find those starting with "thread"
                threads.append(link)  # add the link to the list
        threadBase = 'https://geonet.esri.com'  # store the URL prefix
        for thread in threads:  # loop through the threads listed on the 'Content' page
            page = requests.get(threadBase + thread)  # navigate to the correct thread page
            tree = html.fromstring(page.text)  # retrieve the HTML
            dates = tree.find_class('j-post-author')  # find_class() returns a list of matching elements
            if dates:  # skip threads where the element is missing
                postDate = dates[0].text_content().strip()  # retrieve the date
                dateList.append(postDate)  # write to list
                csvWriter.writerow([postDate])  # write to CSV (writerow() expects a sequence of fields)
            time.sleep(5)  # wait 5s to give server a chance to handle someone else's requests
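Once the timestamps are collected, counting posts by hour of day is a short step. The sketch below is one way to do it, not necessarily how I built my graph. The timestamp format and the posts_per_hour helper are assumptions, so adjust the format string to whatever the scrape actually returns.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format, e.g. "Mar 3, 2015 9:41 AM"; the scraped text may look different.
ASSUMED_FORMAT = '%b %d, %Y %I:%M %p'

def posts_per_hour(timestamps, fmt=ASSUMED_FORMAT):
    """Count posts by hour of day (0-23), skipping strings that do not parse."""
    counts = Counter()
    for stamp in timestamps:
        try:
            counts[datetime.strptime(stamp, fmt).hour] += 1
        except ValueError:
            continue  # not a timestamp in the assumed format
    return counts

# Made-up sample values, only to show the shape of the input
sample = ['Mar 3, 2015 9:41 AM', 'Mar 3, 2015 9:05 AM', 'Mar 4, 2015 2:15 PM', 'not a date']
hourly = posts_per_hour(sample)
for hour in sorted(hourly):
    print(f'{hour:02d}:00  {hourly[hour]}')
```

The hours come out in whatever time zone the site displays, so keep that in mind before reading too much into the peaks.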
Anyhow, the graph at the start of this post shows pretty much what I expected: people on the East Coast get confused, then people on the West Coast get confused, then everyone goes home.