CBC Sports Word Cloud


Still playing around with PHP/XML. Here’s a method for parsing an RSS feed (CBC Sports) with PHP and loading the results (the item descriptions) into a D3 word cloud. And here’s why you shouldn’t use word clouds; I don’t really care about that, though, since this was about learning some PHP.
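For comparison, here is a minimal sketch that skips the recursion and reads each item's description directly. It assumes the feed uses the standard RSS 2.0 layout (rss, then channel, then one item per story) and swaps the <p> splitting for PHP's strip_tags. The function name itemDescriptions and the sample feed are made up for illustration, and the XML is passed in as a string so it can be tried without fetching the live feed.

```php
<?php
// Hypothetical helper: return the plain-text description of every <item>
// in an RSS 2.0 document, assuming the <rss><channel><item> layout.
function itemDescriptions(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return []; // not well-formed XML
    }

    $descriptions = [];
    foreach ($xml->channel->item as $item) {
        // Descriptions often carry HTML (an image, a paragraph tag);
        // strip_tags drops the markup and keeps only the text.
        $descriptions[] = trim(strip_tags((string) $item->description));
    }
    return $descriptions;
}

// Hypothetical sample feed; the first description holds escaped HTML.
$sample = <<<XML
<rss version="2.0"><channel>
  <title>Sample</title>
  <description>Channel description</description>
  <item><title>A</title><description>&lt;p&gt;Leafs win in overtime&lt;/p&gt;</description></item>
  <item><title>B</title><description>Raptors trade guard</description></item>
</channel></rss>
XML;

print_r(itemDescriptions($sample));
```

Going straight to channel->item also leaves out the channel's own description, which the recursive version picks up along with the item descriptions.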

Anyways, the PHP code I used was:

//load xml
$xml_file_open = simplexml_load_file("http://rss.cbc.ca/lineup/sports.xml");

//make array of stopwords to ignore: go get them from here, then paste them
//between the quotes as one comma-separated string
$stopwords = "";
$stopwords = array_map('trim', explode(",", $stopwords));

//parse xml
function displayChildrenRecursive($xmlObj, $stopwords, $depth = 0) {
    //loop through xml tags
    foreach ($xmlObj->children() as $child) {
        //get descriptions
        if ($child->getName() == 'description') {
            //get text
            $childStringRaw = (string) $child;
            //drop excess HTML in description: keep the text inside the first <p>...</p>,
            //or the whole description if it has no <p>
            $childStringMid = explode("<p>", $childStringRaw, 2);
            $childString = isset($childStringMid[1])
                ? explode("</p>", $childStringMid[1])[0]
                : $childStringRaw;
            //separate description into individual words
            $childArray = explode(" ", $childString);
            //loop through words
            foreach ($childArray as $word) {
                //get rid of surrounding whitespace
                $trimmed = trim($word);
                //disregard empty strings and stopwords
                if ($trimmed !== '' && !in_array(strtolower($trimmed), $stopwords, TRUE)) {
                    //add to javascript array; json_encode quotes the word and escapes
                    //quotes, backslashes and newlines for JavaScript
                    echo "js_array.push(" . json_encode($trimmed) . ");\n";
                }
            }
        }
        //run on next tag
        displayChildrenRecursive($child, $stopwords, $depth + 1);
    }
}

//start at the root element (simplexml_load_file returns false if the feed can't be loaded)
if ($xml_file_open !== false) {
    displayChildrenRecursive($xml_file_open, $stopwords);
}
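The loop above pushes every word, repeats included, and leaves the counting to the JavaScript side. Here is a sketch of doing the counting in PHP instead, assuming the page feeds Jason Davies' d3-cloud layout objects with text and size fields (with a fontSize accessor that reads size, as its examples do). The function name wordSizes, the sample headlines and the splitting rule (runs of anything other than letters, digits and apostrophes) are assumptions for illustration, not the code from this post.

```php
<?php
// Hypothetical helper: count how often each non-stopword appears and
// return a list of ['text' => word, 'size' => count], most frequent first.
function wordSizes(array $texts, array $stopwords): array
{
    // Flip the stopword list so lookups are isset() checks on keys.
    $stop = array_flip(array_map('strtolower', $stopwords));
    $counts = [];
    foreach ($texts as $text) {
        // Lower-case (strtolower folds ASCII letters only), then split on
        // anything that is not a letter, digit or apostrophe.
        $words = preg_split("/[^\\p{L}\\p{N}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (!isset($stop[$word])) {
                $counts[$word] = ($counts[$word] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // highest count first
    $entries = [];
    foreach ($counts as $word => $count) {
        // Cast back to string: numeric words like "2014" become int keys.
        $entries[] = ['text' => (string) $word, 'size' => $count];
    }
    return $entries;
}

$texts = ["Leafs win in overtime", "Leafs trade for a goalie"];
$entries = wordSizes($texts, ["in", "for", "a"]);

// Emit a JavaScript array literal that a page could hand to the cloud layout.
echo "var words = " . json_encode($entries) . ";\n";
```

Raw counts make poor font sizes on their own, so in practice they would usually be mapped onto a pixel range (for example with a D3 scale) before the layout runs.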

Hat tips:

Sherif’s Tech Blog

Jason Davies Blog

Nieman Journalism Lab


One thought on “CBC Sports Word Cloud”

  1. Pingback: My First Geo D3 | Darren's Side Projects
