To create an effective data standard, decisions need to be made about four main things: schema, taxonomies, license, and format.
A schema is the structure of the data. It outlines all the different facets/fields of information that should be captured about each individual item. If you think about data as content on a spreadsheet, then the schema is the first row, which defines the information in all the columns.
When figuring out how to share data with other people and organizations, one of the most important conversations to have is about the labels on the column headers. If you use the same column headers as other resources, it will be much easier for your resources to be integrated with theirs, enabling people to search many resources at one time. Making your resources interoperable with other resources is really important because it allows fewer people to develop resources and keep them up to date, while producing superior resources for your constituents.
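The spreadsheet analogy can be made concrete with a small sketch. The example below is only an illustration: the column names (`name`, `address`, `phone`), the sample rows, and the helper functions are all hypothetical and not part of any published standard. It treats the header row as the schema and combines two resource lists only when their headers match.

```python
import csv
import io

# Two hypothetical resource lists that share the same column headers.
FOOD_CSV = """name,address,phone
Eastside Food Pantry,12 Example St,555-0100
"""
HOUSING_CSV = """name,address,phone
Example Housing Help,34 Sample Ave,555-0101
"""


def read_resources(text):
    """Return (schema, rows): the header row and a list of dicts keyed by it."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def merge(*sources):
    """Combine sources only when every header row matches the first one."""
    schema, merged = None, []
    for text in sources:
        fields, rows = read_resources(text)
        if schema is None:
            schema = fields
        elif fields != schema:
            raise ValueError(f"schema mismatch: {fields} != {schema}")
        merged.extend(rows)
    return schema, merged


schema, resources = merge(FOOD_CSV, HOUSING_CSV)
print(schema)          # the shared header row
print(len(resources))  # number of combined records
```

If one list used `title` where the other used `name`, `merge` would raise an error instead of silently producing mismatched columns, which is the practical cost of not agreeing on headers.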
Taxonomies consist of a list of descriptive terms that are used to classify and categorize information. It is useful to think of these as the terms you’d like to be able to filter resources by. For example, food stamps help people overcome hunger, so you might want to attach the term “hunger” to a program or service that provides food stamps to people.
Just like with fields, if you use the same taxonomy as other groups doing similar work, it will be much easier for people to maintain resources and for people to find what they're looking for.
With services information, we need two taxonomies: one that describes “what” the service provides and another that describes “who” the service is meant to assist. We call the “what” taxonomy “service_type” and the “who” taxonomy “audience.”
* Civic Services Taxonomy
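As a rough sketch of how the two taxonomies act as filters, the example below tags a few services with `service_type` and `audience` terms and then filters on them. The service names, the terms other than “hunger”, and the `filter_services` helper are invented for illustration and are not drawn from the Civic Services Taxonomy itself.

```python
# Hypothetical services, each tagged with terms from the two taxonomies.
SERVICES = [
    {
        "name": "Food Stamps Enrollment",
        "service_type": {"hunger"},
        "audience": {"families", "seniors"},
    },
    {
        "name": "Tenant Legal Clinic",
        "service_type": {"housing", "legal"},
        "audience": {"renters"},
    },
    {
        "name": "Senior Meal Delivery",
        "service_type": {"hunger"},
        "audience": {"seniors"},
    },
]


def filter_services(services, service_type=None, audience=None):
    """Return names of services tagged with the given terms.

    A value of None means "do not filter on this taxonomy".
    """
    results = []
    for service in services:
        if service_type is not None and service_type not in service["service_type"]:
            continue
        if audience is not None and audience not in service["audience"]:
            continue
        results.append(service["name"])
    return results


# "What": every service that addresses hunger.
print(filter_services(SERVICES, service_type="hunger"))
# "What" and "who": hunger services meant for families.
print(filter_services(SERVICES, service_type="hunger", audience="families"))
```

Filtering like this only works across organizations if everyone attaches the same terms, which is why sharing a taxonomy matters as much as sharing field names.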
You'll also need a taxonomy for location information, such as neighborhoods. We recommend using NYC's official neighborhood list for
* Location Taxonomy
If you want information to be as accessible and useful as possible while still retaining legal control over it, a Creative Commons license should be used. There are many different types. You can learn more here: http://creativecommons.org/choose/
Format describes the type of file being managed. Examples of different electronic formats include PDF, CSV, TXT, XLS, XML, etc.
For resources to be easily shared online, they must be in the same or compatible formats. Some formats are easier to share offline than others. The chart below shows a list of popular formats with rankings from (1) worst to (3) best.
- PDFs are great for printing but make it nearly impossible to share their data with other systems.
- CSVs are fantastic for sharing data but almost entirely unintelligible in print.
- The best way to share news and events information is through feed formats. Most popular website content management systems (CMS) produce XML news feeds called RSS and event feeds called iCal (iCalendar, which is a plain-text format rather than XML). Google Calendar also produces iCal feeds.
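To show why a feed is easy for other systems to consume, here is a minimal sketch that reads headlines from an RSS document using Python's standard library. The feed content and the `read_headlines` helper are made up for the example; a real feed would be fetched from a CMS and may contain more elements than shown.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 feed of the kind a CMS might publish.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Community News</title>
    <item>
      <title>Food drive this Saturday</title>
      <link>http://example.org/news/food-drive</link>
    </item>
    <item>
      <title>New tenant clinic hours</title>
      <link>http://example.org/news/clinic-hours</link>
    </item>
  </channel>
</rss>
"""


def read_headlines(rss_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in read_headlines(RSS):
    print(f"{title}: {link}")
```

The same few lines work for any feed that follows the RSS structure, whereas extracting the same headlines from a PDF would need custom, fragile handling for each document.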
"How to Contribute" could be considered a fifth category.
For a data standard to remain relevant and useful, the community that uses it should be able to update the standard to meet its ever-changing needs. To do this effectively, the standard should be managed by a body (a group of people) that gives people the ability to modify it.