Filling in the blanks on DocumentCloud

My OJR post on DocumentCloud, the much-buzzed-about $1-million Knight News Challenge grant proposal, is up. I did an e-mail interview with three of the proposal’s authors, Aron Pilhofer of The New York Times and Scott Klein and Eric Umansky of ProPublica. Here’s an excerpt:

Aron Pilhofer: The grant would be used to create an independent, non-profit organization called DocumentCloud, which would manage the grant, build and maintain the software and so forth. Given the intensely competitive nature of the news business, we reckoned that this project had to be in the hands of an independent, impartial broker in order for a consortium like this to work.

More here.

Making sense of data at The New York Times

(After a long holiday hiatus, I’m finally getting around to posting this write-up of my visit with Aron Pilhofer at the NYT.)

"Movable Type" at The New York Times building

Tuesday, Dec. 23, 2008: The digital art installation in the lobby of the new New York Times building says more, I think, about the future of news and of the Times Company than its creators may have intended. Yes, we know that the future is digital and real-time and kinetic, like the work by Mark Hansen and Ben Rubin. But, more than that, the journalism of the future will be defined by its capacity to extract meaning from countless bits of data. The work, titled Movable Type, elegantly illustrates the bits. Making sense of them is Aron Pilhofer’s domain.

It is my first visit to the new building, directly across 8th Avenue from the Port Authority Bus Terminal. I am meeting Pilhofer, who leads the paper’s interactive news technology team, for a quick tour and chat. His group of 10 developers, assembled over the last year or so, works on editorial projects (such as the Times’ live election results) but doesn’t report to the newsroom. Their boss is Marc Frons, the website’s CTO.

Over cups of caffeinated liquid in the Times’ airy 14th-floor cafeteria, Pilhofer tells me about Represent, a newly launched project from his team that, as the name suggests, lets you “keep track of what the people who represent you are doing.” Though still in soft launch, it’s already generating some nice buzz. (A bit on the tech specs from co-creator Derek Willis: They’re using GeoDjango to drive the mapping features.)

Pilhofer is an archetypal journo-techie, raised in the computer-assisted reporting school and fluent both in the cadences of the newsroom and in the technical lingo used by his fellow geeks. Before joining the Times’ computer-assisted reporting team, he honed his skills at the Center for Public Integrity, a D.C. nonprofit that seems to have been a sort of proving ground for smart, webby CAR folks (Willis and my former LAT colleague, Ben Welsh, are also alums).

This is the breed of journalist that web-oriented newsrooms would like to find more of. The problem is, “they just don’t exist,” Pilhofer says of his ilk. When I throw out the old question about whether it’s easier to teach a journalist programming skills or to teach a techie the principles of journalism, he tells me it’s not so much a question of trainability. Rather, he says, “there are more programmers out there that will find journalism interesting to learn” than vice versa. He tells me that, with a couple of exceptions, the people on his team have either “very limited journalism experience or none whatsoever.”

Given that most of Pilhofer’s group comes from a hardcore tech background, I wonder whether they’ve acceded to rigid product development conventions like wireframes and detailed requirements documents. His response: “Hell no.” (Actually, he uses a more colorful four-letter word, but you get the point.)

He does throw out a lot of prod-dev terms like agile development, scrums, Extreme Programming and pair programming, but he uses newsroom analogies to describe them. Agile development methodology, for example, which stresses frequent deadlines and shuns long meetings, has a lot in common with the rhythm of a newsroom. And pair programming, an unconventional workflow in which two coders work in tandem on the same problem and test each other’s work as they go, is analogous to team reporting.

Some other highlights from our chat:

+++
Software: Pilhofer says his team relies heavily on open-source solutions. Ruby on Rails is the workhorse in this shop, but it’s been adapted to produce flat files when necessary (as opposed to rendering pages on the fly), a performance tweak that enabled the Times to keep up with unprecedented traffic to its election results data.
+++

+++
Hosting: Amazon’s EC2 service, used for most of the team’s data projects, has enabled them to scale with demand. “Amazon has been the savior of this group,” Pilhofer says.
+++

+++
Newsroom geography: The interactive news technology group sits on the Times building’s second floor, in close proximity to the graphics and CAR teams, the two groups Pilhofer says his team works most closely with. (The paper’s business desk takes up much of the rest of the floor.) The graphics desk, in particular, has been a close collaborator, bringing sophisticated visual interpretations to many of the team’s projects. Pilhofer calls deputy graphics director Matthew Ericson the “de facto co-manager” of the interactive news technology team.
+++

+++
Roles and hierarchy: Responsibility for the Times’ interactive projects is shared among Pilhofer’s team, the graphics department and other groups in the newsroom (a highly collaborative, loosely organized structure that reminds me of how interactive projects got done at the L.A. Times, but on a much larger scale). “I kind of like the way it’s working right now, where there isn’t some big, centralized, one-person-in-charge-of-everything,” Pilhofer says. “I think it’s healthier.” Each group brings certain strengths. For instance, the graphics folks want to do really intense, deep immersive online interactives, but they can’t do that without back-end help from Pilhofer’s team, so the two groups work together. Organizationally, Pilhofer says his team benefits from a direct connection to the website’s software and infrastructure folks while other teams are more closely tied to the newsroom. The downside to this setup, of course, is that it’s sometimes hard to know who owns what.
+++
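For the curious, here is a rough sketch of the flat-file idea mentioned under Software above: render each page once, write it to disk, and let the web server hand out the static file instead of running application code on every request. This is not the Times’ code (their stack is Ruby on Rails, and I haven’t seen their implementation). It is a minimal Python illustration, and the function names (`render_page`, `bake`), the template and the sample data are all made up for the example.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would use its framework's views.
PAGE = Template(
    "<html><body><h1>$race</h1>"
    "<p>$leader leads with $pct%</p></body></html>"
)


def render_page(result: dict) -> str:
    """Render one result record to HTML (the work normally done per request)."""
    return PAGE.substitute(
        race=result["race"], leader=result["leader"], pct=result["pct"]
    )


def bake(results: list[dict], out_dir: Path) -> list[Path]:
    """Render every page once and write each one to disk as a flat file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for result in results:
        target = out_dir / f"{result['slug']}.html"
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(render_page(result), encoding="utf-8")
        # Swap the finished file into place so readers never see a partial page.
        tmp.replace(target)
        written.append(target)
    return written


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [{"slug": "senate", "race": "Senate", "leader": "Smith", "pct": 52}]
    for path in bake(sample, Path(tempfile.mkdtemp())):
        print(path)
```

In a setup like this, a scheduled job would presumably re-run `bake` whenever fresh data arrives, so the cost of rendering is paid once per update rather than once per visitor.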

More Pilhofer: Old Media, New Tricks recently published an interview with Pilhofer.

Update 2009.01.13: Emily Nussbaum has a feature on Pilhofer, Ericson and other NYT geeks in New York Magazine.

Coming next week: A look at DocumentCloud, a promising Knight News Challenge proposal from Pilhofer and ProPublica’s Eric Umansky and Scott Klein.

Photo by Eric Ulken.