Let's Build a Better DC News Aggregator

Posted Saturday, February 19, 2011 at 10:29 p.m. by Chris Amico in Projects

This month's Carnival of Journalism asks: What can I do to increase the number of news sources?

I take the following as given: DC doesn't have a shortage of news, or an inadequate supply of sources. There are three daily newspapers (with metro sections), a weekly, several TV stations and radio on both AM and FM. TBD's community network points to 225 blogs focused on the metro area. Between WordPress, Facebook, cell phones and cheap broadband access, anyone with an urge to join the conversation and report news (however defined) can do so.

(A disclosure I'll come back to: My wife's site, Homicide Watch DC, is a member of TBD's network.)

The problem (here, at least) is surfacing useful and relevant information in a timely manner. DC has a low signal-to-noise ratio. To extend the metaphor, there are also large zones of radio silence, but those zones are hard to spot because there is no way to see the entire picture.

So, let's build a better aggregator. The point here is to assemble the many bits of news and try to see the big picture, and from there, to see the gaps.

Start by pulling the RSS feeds of every outlet I mentioned above. Include every blog in TBD's network, plus the metro sections of the Post, the Times and the Examiner. Text is easier to deal with, so we'll start there. We'll get to TV and radio later.
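A minimal sketch of this first step, using only Python's standard library. It assumes each outlet publishes a plain RSS 2.0 feed (Atom feeds, which some blogs serve, would come back empty here), and the feed names and URLs are hypothetical placeholders, not the outlets' real addresses. A dedicated library such as feedparser would cope with malformed feeds more gracefully.

```python
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Union


@dataclass
class Story:
    source: str
    title: str
    link: str
    description: str


def parse_rss(source: str, xml_text: Union[str, bytes]) -> list[Story]:
    """Turn one RSS 2.0 document into Story records.

    Only <item> elements are read, so an Atom feed (which uses <entry>)
    yields an empty list rather than an error.
    """
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        stories.append(
            Story(
                source=source,
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                description=(item.findtext("description") or "").strip(),
            )
        )
    return stories


def fetch_all(feeds: dict[str, str]) -> list[Story]:
    """Download every feed and merge the items, skipping feeds that fail."""
    stories = []
    for source, url in feeds.items():
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                # Pass raw bytes so the parser honors the feed's declared encoding.
                stories.extend(parse_rss(source, resp.read()))
        except Exception as exc:  # one broken feed shouldn't stop the whole crawl
            print(f"skipping {source}: {exc}")
    return stories


# Hypothetical placeholders; the real list would cover every outlet above.
FEEDS = {
    "example-metro-section": "https://example.com/metro/rss.xml",
    "example-neighborhood-blog": "https://example.org/feed/",
}
```

Calling `fetch_all(FEEDS)` returns one flat list of stories tagged with their source, which is the raw material for everything that follows.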

Run every story and blog post through OpenCalais, tagging it with known entities and social tags. Geocode, probably using code from OpenBlock. Sort by neighborhood, zip code, etc. For simple news stories that are mostly text, none of this is hard. Add some weighting based on votes or click-through rates, and you have a little Hacker News for DC.
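Tagging and geocoding depend on outside services (OpenCalais, OpenBlock's geocoder), so the sketch below picks up after that step. It assumes each story already carries a hypothetical `neighborhood` tag, a vote count and an age in hours. It groups stories by neighborhood and orders each group with the gravity-style formula widely reported for Hacker News's original ranking, score = (votes - 1) / (age_hours + 2) ** 1.8. The formula and the 1.8 exponent are borrowed for illustration, not tuned for DC news, and click-through counts could stand in for votes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaggedStory:
    title: str
    neighborhood: str  # assumed output of the tagging/geocoding step
    votes: int         # click-throughs could be used here instead
    age_hours: float


def hn_score(votes: int, age_hours: float, gravity: float = 1.8) -> float:
    """Hacker News-style score: more votes push a story up, age pulls it down."""
    return (votes - 1) / (age_hours + 2) ** gravity


def front_pages(stories: list[TaggedStory]) -> dict[str, list[TaggedStory]]:
    """Group stories by neighborhood, highest score first within each group."""
    groups: dict[str, list[TaggedStory]] = defaultdict(list)
    for story in stories:
        groups[story.neighborhood].append(story)
    return {
        hood: sorted(items, key=lambda s: hn_score(s.votes, s.age_hours), reverse=True)
        for hood, items in groups.items()
    }


if __name__ == "__main__":
    # Invented sample stories, for illustration only.
    stories = [
        TaggedStory("Old but popular", "Petworth", votes=50, age_hours=48),
        TaggedStory("Fresh", "Petworth", votes=10, age_hours=1),
        TaggedStory("New bar opens", "Adams Morgan", votes=3, age_hours=5),
    ]
    for hood, ranked in front_pages(stories).items():
        print(hood, [s.title for s in ranked])
```

With these numbers the two-day-old Petworth story drops below the hour-old one despite five times the votes, which is the point of the gravity term: a neighborhood page should favor what is new.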

The first goal here is to start driving traffic to smaller sites covering niche issues. TBD's promotion, for example, has been a boon to Homicide Watch, giving the site early exposure that has led (in part) to more attention (and hopefully, eventually, resources).

But the second goal is to start seeing the gaps. Chart how many stories cover Anacostia, or Petworth, or Adams Morgan. How many mention each member of the city council? Does Michelle Rhee still get more coverage than Kaya Henderson, her successor? Now we have a metric to find under-reported stories.
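Once stories are tagged, finding gaps is mostly counting. A minimal sketch, assuming each story has already been reduced to a set of entity names by the tagging step: count how many stories mention each neighborhood or person on a watch list, then report the ones below a chosen threshold. The watch list is iterated explicitly because the entities with zero stories matter most, and a bare `Counter` would never list them. The sample data is made up and says nothing about actual coverage.

```python
from collections import Counter


def coverage(stories: list[set[str]], watch_list: list[str]) -> dict[str, int]:
    """Number of stories mentioning each watched entity, including zeros."""
    counts: Counter[str] = Counter()
    for entities in stories:
        counts.update(entities)  # a set, so each story counts once per entity
    return {name: counts[name] for name in watch_list}


def gaps(stories: list[set[str]], watch_list: list[str], threshold: int = 1) -> list[str]:
    """Watched entities mentioned in fewer than `threshold` stories, least covered first."""
    counts = coverage(stories, watch_list)
    under = [name for name in watch_list if counts[name] < threshold]
    return sorted(under, key=lambda name: counts[name])


if __name__ == "__main__":
    watch = ["Anacostia", "Petworth", "Adams Morgan", "Michelle Rhee", "Kaya Henderson"]
    # Made-up tagged stories, for illustration only.
    stories = [
        {"Adams Morgan", "Michelle Rhee"},
        {"Michelle Rhee"},
        {"Petworth", "Kaya Henderson"},
    ]
    print(coverage(stories, watch))
    print(gaps(stories, watch, threshold=2))
```

Run over weeks of real stories, the same two functions produce the chart described above: a count per neighborhood or council member, with the silent ones at the top of the list.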
