Now that I have your attention, I note with interest that the Obama administration is now providing us with everything we ever wanted to know about the skin color and sex of U.S. farmers but were afraid to ask.
The president, through his budget director, demanded last year that all government agencies provide "three high-value datasets" online in an "open format" on data.gov by Friday, January 22, 2010. The Agriculture Department has responded with the "2007 Census of Agriculture Race, Ethnicity, and Gender".
I didn't download the TXT/CSV file that contained the data, but I am almost certain that this data involves the sex and the skin color of the farmers and not the animals. Other "open formats" apparently supported on data.gov include Microsoft's (MSFT) .xls, the W3C's Extensible Markup Language (XML), Google's (GOOG) Keyhole Markup Language (KML)/KMZ, and Shapefile. Those are open enough by my market-centric definition of standardization, but I am not sure open-standards purists would agree. Where is Sun's (JAVA) OpenDocument Format (ODF)? Apparently aware that this might be an issue, the Obamanistas did not include Microsoft's .xls in their catalog's logo, although they did show Google's KML.
Speaking of purity, if you're prudish about such things as farmer sex, race, and ethnicity, there is a feed grains database (also from the Department of Agriculture), t
The budget director's memo itself was formatted in Adobe's (ADBE) .pdf, one of about a dozen "open formats" recognized by the International Organization for Standardization (ISO).
-- Dennis Byron
I think that if you talk to experts in this area, they would express a preference for datasets (as opposed to documents) to be published in a format that is directly consumable, in particular by web apps for making mashups. So they want the datasets as plain XML, Atom feeds, or JSON objects. Presentation-level documents like XLS, PDF, or indeed ODF are not the best choice for publishing a dataset (as opposed to a document).
You might find this discussion interesting: http://www.sunlightlabs.com/blog/2009/adobe-bad-open-government/
Posted by: Rob Weir | January 27, 2010 at 11:55 AM
Dennis Byron's reply:
Thanks for the comment. As a consumer of such data for my research, I agree. I'm happy with CSV and TXT (which is why I always found the whole OOXML/ODF thing simply an IBM/Sun attack on Microsoft). Just to be clear, it is the U.S. government that chose .xls and the other formats; I am not advocating one or the other.
Posted by: Dennis Byron | January 27, 2010 at 12:31 PM