Beyond the Data Portal

I’m a data portal skeptic. I have been for years, but I’ve gotten tripped up when trying to explain why. I’m certainly not anti open data. I’m not even anti data portal. But I worry that organizations think that setting up an open data portal is a way to make data useful, when it’s really just a small step toward that goal.

More cynically, I worry that people are setting up open data portals, holding press conferences to announce them, dusting their hands off, and moving on, confident that they can check “get open data” off their to-do list.

It’s time to acknowledge that data is not made useful simply by making it available online. As we work to make data open and available, we also need to train people who can help make it accessible and useful.

Consider this scenario:

You’re doing research on transportation in Pacific Beach, a neighborhood in San Diego. You want to find out how dangerous Pacific Beach’s streets are. You heard that the city of San Diego recently launched an open data portal, so you go to it and find a database showing the locations of streetlights and recently filled potholes. That’s it. That’s all you can find in the city’s portal related to the streets in Pacific Beach.

This is a completely hypothetical situation. San Diego doesn’t even have an open data portal (yet). But it highlights a key limitation of any data portal: it’s bound by its jurisdiction. If you’re anywhere in the city of San Diego, you’re also in San Diego County, as well as in California and the United States. Oh, and you’re also within a bi-national region that some of us call Tijuego. A city data portal, by definition, will only hold a portion of the data that might be useful to you.

Now consider this scenario:

You’re doing your research, but you’ve heard of the San Diego Regional Data Library. You go to its website and see that you can email, call, or chat online with a data librarian who can help you find the information you need. You call the library and speak with a librarian who tells you that the data you need is provided by the county rather than the city. You also learn about datasets available from California’s Department of Transportation, a non-profit called BikeSD, and some other data from the city that hasn’t been opened up yet.

This is also a hypothetical situation. In fact, it is the hypothesis behind my research in Beth Noveck’s Gov 3.0 course.

The concept of an open data portal is relatively new, and most scholarly research on government open data has so far focused on why governments do or do not create open data portals. Findings on the impact of portals are still scarce, but my hunch (and others’) is that open data portals don’t accomplish much per se. As you can tell from the scenario I describe above, I believe they need librarians: helpful people who understand how to find information.

While he might not consider himself a librarian, Chicago’s Director of Analytics, Tom Schenk, provides a good example of how this can work. According to Christopher Whitaker, Chicago’s Code for America Brigade Captain, Schenk not only oversees the city’s data portal, but also “does a great job of communicating with civic technologists about the data” and attends local hack nights. Christopher says Schenk’s presence makes “all the difference in the world.”

I’m optimistic about data’s ability to promote social development, but doing research is difficult, and finding useful data for research is extremely difficult. It requires an understanding of what datasets exist, what format they’re in, how often they’re updated, and how reliable they are. Open data portals can make some datasets easier to find, but a library can provide the additional human touch needed to make data accessible and put it in the hands of the people who need it.

My project for Gov 3.0 is to figure out how a data library would work. How should a data library be staffed? Who would fund it? Could it fund itself? Should data libraries be regional (e.g. a library focused on data about San Diego) or topical (e.g. a library focused on all data about transportation)?

If you have ideas about how to make data useful beyond open data portals, I’d love to hear from you. Please drop me a line at

— Jed Sundwall (@jedsundwall)
