Part of implementing the Federal Government’s M-13-13 “Open Data Policy – Managing Information as an Asset” is to create and maintain an Enterprise Data Inventory (EDI). The EDI is supposed to catalog government-wide SRDAs (Strategically Relevant Data Assets). The challenge is threefold: the definition of an SRDA is subjective within the context of an internal IT system, there’s not enough budget to catalog the huge number of legacy systems, and it’s hard to know when the complete set has been documented.
Enter DDOD (Demand-Driven Open Data). While it doesn’t solve these challenges directly, its practical approach to managing open data initiatives can certainly improve the situation. Every time an internal “system of record” is identified for a DDOD Use Case, we’re presented with a new opportunity to make sure that system is included in the EDI. Already, DDOD has been able to identify missing assets.
But DDOD can do even better. By working one Use Case at a time, we have the opportunity to catalog the data asset at a much more granular level. The data assets on HealthData.gov and Data.gov are cataloged at the dataset level, using the W3C DCAT (Data Catalog) Vocabulary. The goal is to catalog datasets associated with DDOD Use Cases at the level of a field-level data dictionary. Ultimately, we’d want to attain a level of sophistication at which we’re semantically tagging fields using controlled vocabularies.
Field-level cataloging has a couple of important advantages. First, it enables better indexing and more sophisticated data discovery on HealthData.gov and other HHS portals. Second, it identifies opportunities to link across datasets from different organizations and even across different domains. The mechanics of DDOD in relation to EDI, HealthData.gov, data discoverability and linking are further explained in the Data Owners section of the DDOD website.
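To make the linking advantage concrete, here is a minimal sketch of how field-level tags could surface join opportunities across datasets. Everything in it is an assumption made for illustration: the dataset titles, the field names, the "fields" and "concept" keys, and the concept labels are hypothetical and are not part of DCAT or of any actual HHS catalog.

```python
from collections import defaultdict

# Hypothetical field-level catalog entries. The "fields" and "concept" keys
# and all values are illustrative only; they are not defined by DCAT.
datasets = [
    {
        "title": "Hospital Discharges (example)",
        "publisher": "Agency A",
        "fields": [
            {"name": "prov_npi", "concept": "provider-identifier"},
            {"name": "dx_code", "concept": "diagnosis-code"},
        ],
    },
    {
        "title": "Provider Directory (example)",
        "publisher": "Agency B",
        "fields": [
            {"name": "npi", "concept": "provider-identifier"},
            {"name": "zip", "concept": "postal-code"},
        ],
    },
]


def build_concept_index(catalog):
    """Map each concept tag to the (dataset title, field name) pairs that carry it."""
    index = defaultdict(list)
    for ds in catalog:
        for field in ds.get("fields", []):
            concept = field.get("concept")
            if concept:
                index[concept].append((ds["title"], field["name"]))
    return dict(index)


def linkable_concepts(index):
    """Keep only concepts that appear in more than one distinct dataset."""
    return {
        concept: refs
        for concept, refs in index.items()
        if len({title for title, _ in refs}) > 1
    }


index = build_concept_index(datasets)
links = linkable_concepts(index)
for concept, refs in links.items():
    print(concept, refs)
```

In this toy catalog, the two datasets name their provider field differently, yet the shared tag flags them as candidates for linking. A real implementation would draw the tags from an agreed controlled vocabulary rather than free-text labels.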
Note: The HHS EDI is not currently available as a stand-alone data catalog. It is, however, incorporated into http://www.healthdata.gov/data.json, since that catalog covers all three access levels: public, restricted public, and non-public datasets.
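As a rough sketch of how the three access levels could be told apart in such a catalog, the snippet below tallies entries by their "accessLevel" value. It assumes the file follows the Project Open Data metadata schema, with a top-level "dataset" array; the sample entries are made up for illustration, and the live file is not fetched here.

```python
from collections import Counter

# Made-up sample in the shape of a Project Open Data catalog (assumption:
# a top-level "dataset" list whose entries carry an "accessLevel" string).
sample_catalog = {
    "dataset": [
        {"title": "Example A", "accessLevel": "public"},
        {"title": "Example B", "accessLevel": "public"},
        {"title": "Example C", "accessLevel": "restricted public"},
        {"title": "Example D", "accessLevel": "non-public"},
    ]
}


def count_access_levels(catalog):
    """Count catalog entries per accessLevel; entries without one are 'unspecified'."""
    return Counter(
        entry.get("accessLevel", "unspecified")
        for entry in catalog.get("dataset", [])
    )


counts = count_access_levels(sample_catalog)
for level, n in counts.items():
    print(f"{level}: {n}")
```

To run this against the real catalog, the dictionary could be loaded with `json.load` from a downloaded copy of data.json, provided the file matches the assumed shape.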