Database journalism

From Wikipedia, the free encyclopedia

Database journalism or structured journalism is a principle in information management whereby news content is organized around structured pieces of data, as opposed to news stories.

See also: Data journalism

Communication scholar Wiebke Loosen defines database journalism as "supplying databases with raw material - articles, photos and other content - by using medium-agnostic publishing systems and then making it available for different devices."[1]

History and development of database journalism

Computer programmer Adrian Holovaty wrote what is now considered the manifesto of database journalism in September 2006.[2] In this article, Holovaty explained that most material collected by journalists is "structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers".[3] For him, a key difference between database journalism and traditional journalism is that the latter produces articles as the final product while the former produces databases of facts that are continually maintained and improved.

Database journalism developed rapidly in 2007.[4] A December 2007 investigation by The Washington Post (Fixing DC's schools) aggregated dozens of items about more than 135 schools in a database, which distributed the content on a map, on individual webpages or within articles.

The importance of database journalism was highlighted when the Knight Foundation awarded $1,100,000 to Adrian Holovaty's EveryBlock project,[5] which offers local news at the level of the city block, drawing from existing data. The Pulitzer Prize awarded to the St. Petersburg Times' PolitiFact in April 2009 was described as a "Color of Money moment" by Aron Pilhofer,[6] head of the New York Times technology team. Referring to The Color of Money, Bill Dedman's Pulitzer Prize-winning series of articles, Pilhofer suggested that database journalism had been accepted by the trade and would develop, much as computer-assisted reporting (CAR) did in the 1980s and 1990s.

Seeing journalistic content as data has pushed several news organizations to release APIs, including the BBC, The Guardian, The New York Times and National Public Radio (NPR) in the United States.[7] By doing so, they let others aggregate the data they have collected and organized. In other words, they acknowledge that the core of their activity is not story-writing, but data gathering and data distribution.

In the early 2000s, some researchers broadened the conceptual role of databases in journalism, particularly in digital journalism (or cyberjournalism).[8] This conceptual approach treats databases as a defining feature of digital journalism, expanding their meaning and identifying them with a specific code. It contrasts with the earlier view, found in some systematic studies from the 1990s, that treated databases as tools, that is, as sources for producing journalistic stories.

Difference with data-driven journalism

Data-driven journalism is a process whereby journalists build stories using numerical data or databases as their primary material. In contrast, database journalism is an organizational structure for content. It focuses on building and maintaining the database upon which web or mobile applications can be built, and from which journalists can extract data to produce data-driven stories.
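
The distinction can be illustrated with a minimal sketch in Python. It assumes a hypothetical `schools` table; the table name, columns, function names and sample values are all invented for illustration and are not taken from any real project. The records are maintained in one database, and both an application view (one record as JSON) and a figure usable in a data-driven story (an aggregate by ward) are derived from it.

```python
import json
import sqlite3

# Hypothetical schema and sample values, invented purely for illustration.
SCHEMA = """
CREATE TABLE schools (
    name TEXT PRIMARY KEY,
    ward INTEGER NOT NULL,
    open_repairs INTEGER NOT NULL
)
"""


def build_database(rows):
    """Create an in-memory database and load the structured records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO schools VALUES (?, ?, ?)", rows)
    return conn


def update_record(conn, name, open_repairs):
    """Maintain the database: correct or refresh a single fact."""
    conn.execute(
        "UPDATE schools SET open_repairs = ? WHERE name = ?",
        (open_repairs, name),
    )


def school_page(conn, name):
    """Application view: one record serialized as JSON for a web or mobile page."""
    row = conn.execute(
        "SELECT name, ward, open_repairs FROM schools WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return json.dumps({"name": row[0], "ward": row[1], "open_repairs": row[2]})


def repairs_by_ward(conn):
    """Story view: an aggregate a reporter could use in a data-driven story."""
    query = "SELECT ward, SUM(open_repairs) FROM schools GROUP BY ward ORDER BY ward"
    return dict(conn.execute(query).fetchall())


conn = build_database([
    ("Adams", 1, 4),
    ("Baker", 1, 2),
    ("Clark", 2, 7),
])
update_record(conn, "Baker", 3)   # the database is corrected, not an article
print(school_page(conn, "Baker"))
print(repairs_by_ward(conn))
```

In this sketch the database is the maintained product: correcting one record changes every view derived from it, whereas a published article would have to be rewritten.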

Examples of database journalism

Early projects in this new database journalism were mySociety in the UK, launched in 2004, and Adrian Holovaty's, released in 2005.[9]

As of 2011, several databases could be considered journalistic in themselves. They include EveryBlock, OpenCorporates, and


See also