I recently tried to get some data (that is already public, at least in theory) from the government, to take an empirical look at what legislators do in Congress: who sponsors bills, votes, and speaks on the floor, and how that changes over time (by seniority and by decade). It is basically impossible as far as I can see (see below for the full story, but it’s not that interesting).
The government as a whole does a poor job of providing data to the public. Even when they try to do a good job, their websites are generally crummy, missing features that many people have come to expect from large entities (good search, layout, etc.). Additionally, they make it very hard to query data (as a silly example, find the titles of all bills longer than 1000 words that dealt with corn explicitly). You can’t even begin to get an answer to that question on government web sites. To begin with, not all bills are searchable (only recent ones), and you can’t search across sessions. Length is not a queryable field, and for that matter, there are no fields; it’s just raw text. So this seemingly simple question, which would take a few minutes to hack together if the data were in some SQL database somewhere, is suddenly difficult (or even downright impossible).
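To make the "few minutes to hack together" claim concrete, here is a rough sketch of the corn query in Python with SQLite. Everything about the data is an assumption on my part: the `bills` table, its columns (`session`, `title`, `body`, `word_count`), and the sample rows are made up for illustration, since no such database actually exists. The word match is also crude (it only splits on spaces and ignores punctuation).

```python
import sqlite3


def build_demo_db(bills):
    """Load (session, title, body) tuples into an in-memory SQLite database.

    The `bills` table and its columns are hypothetical; word_count is
    computed here by splitting the text on whitespace.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bills "
        "(session INTEGER, title TEXT, body TEXT, word_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bills VALUES (?, ?, ?, ?)",
        [(s, t, b, len(b.split())) for s, t, b in bills],
    )
    return conn


def long_bills_mentioning(conn, term, min_words=1000):
    """Titles of bills longer than min_words that contain term as a whole word.

    Searches every session at once. Padding the body with spaces is a crude
    whole-word check; it does not handle punctuation next to the word.
    """
    pattern = f"% {term.lower()} %"
    rows = conn.execute(
        "SELECT title FROM bills "
        "WHERE word_count > ? AND ' ' || lower(body) || ' ' LIKE ? "
        "ORDER BY session, title",
        (min_words, pattern),
    )
    return [title for (title,) in rows]


# Made-up sample rows, purely for illustration.
demo = build_demo_db([
    (101, "Made-Up Corn Subsidy Act", "corn " * 1200),
    (102, "Made-Up Short Corn Note", "corn is mentioned briefly"),
    (103, "Made-Up Highway Act", "road " * 1500),
])
print(long_bills_mentioning(demo, "corn"))  # ['Made-Up Corn Subsidy Act']
```

The point is not this particular schema, just that once length and session are real fields, the whole question is a single `WHERE` clause.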
While Congress is large enough that it has at least some data available online, most parts of the government do not. If you are a researcher who wants to study town hall meetings across the country, too bad. Raw data is going to be painful to come by. The sad part is, all the data is already digital (in almost all cases) or can easily be made digital (it was typed, and therefore OCR will probably work very well and cheaply). Additionally, every state has some analog of the Freedom of Information Act, where if someone requests the data, and it does not contain anything that could be harmful/confidential, it must be released. So the data is public anyway; all the hassle does is make it less useful.
While I don’t think that data is a panacea for every problem, it does make rational discussion possible in a way that simply isn’t possible otherwise. If you know that particular communities have seen dramatic drops in crime rates (data), you can look for commonalities in the town hall minutes (more data) to see what changed, all automatically (there are algorithms and programs that can do a decent job at this).
The argument can move away from “Here is what makes sense to me” or “Here is what I see in the small number of communities that I was able to survey” to “Here is what we know after comparing a large number of communities” (some significant fraction of them, or even just all of them).
I am not saying that the government should do all this research for us, or even run all the queries for us, but merely that if the data is already accessible (if you are willing to put in the time), it should be made centrally available and downloadable free of charge, so that real research is possible, or at least easier.
What inspired this:
I recently tried to get a digital copy of the Congressional Record so I could attempt to figure out what is done in Congress and who does it. Do freshman congressmen propose more bills? Vote more? Have better attendance? Speak on the House floor for longer? Does it drop off in a predictable way (or rise as time goes on)? I was stunned at the difficulty of getting the data. The GPO (Government Printing Office), whose motto is “Keeping America Informed,” has a “cannot retrieve page” error, and in case you wanted to see what the page looked like before, you can’t, because they have a robots.txt file that tells Google (and any other search engines) something like “We would really appreciate it if you didn’t look at our stuff in these really broad categories.”
If you want, they make available a subscription to a bi-weekly publication that summarizes the Congressional Record for you (now available in microfiche as well as paper), but that’s really not as useful. They have a searchable version of the Congressional Record, but it’s basically useless: you can only search one session of Congress at a time, the search is just a word search (does this contain the word I am looking for?), it has no ability to sort (other than by date), and it only goes back around 20 years (though they do have the worst index I’ve seen in a long time, which goes back to 1983). If you are looking for a relevant result on a bunch of common words, you are out of luck.
I can’t even find a print copy floating around that is anywhere near reasonably complete (or even has anything that is complete from 1983 to the present). I figure that the Library of Congress probably has a copy, so I go to their website, search for Congressional Record, and I get “Congress speaks on Nixon’s visit to mainland China.” The only result that even has to do with a copy of the Congressional Record is buried at item 34, is from 1909, and just contains a list of the volumes. Good job, Congress.