Monthly Archives: March 2015

The Birth of Demand-Driven Open Data

And so it begins

My project as an Entrepreneur-in-Residence with the HHS IDEA Lab is called “Innovative Design, Development and Linkages of Databases”.  Think of it as Web 3.0 (the next generation of machine readable and programmable internet applications) applied to open government and focused on healthcare and social service applications.  The underlying hypothesis was that by investigating how HHS could better leverage its vast data repositories as a strategic asset, we would discover innovative ways to create value by linking across datasets from different agencies.

So to sum up…  I was to find opportunities across a trillion dollar organization, where the experts already working with the data have a lifetime of domain-specific experience and several acronyms after their names.  And I was to accomplish this within one year, without any dedicated resources.  Pretty easy, right?

My hope was that my big data experience in industry, both for startups and large-scale enterprises, would be a sufficient catalyst to make progress.  And I had one other significant asset to make it all come together…  I was fortunate that the project was championed by a phenomenal group of internal backers: Keith Tucker and Cynthia Colton, who lead the Enterprise Data Inventory (EDI) in the Office of the Chief Information Officer (OCIO), and Damon Davis, who heads up the Health Data Initiative and HealthData.gov.

Tell me your data fantasies

The first step was to set out on a journey of discovery.  With guidance and clout from the internal sponsors, I was able to secure meetings with leaders and innovators for big data and analytics efforts across HHS.  I had the privilege of engaging in stimulating discussions at CMS, FDA, NIH, CDC, NCHS, ONC, ASPE and several other organizations.

When I tried to synthesize the information gathered into something actionable, I noticed that past open data projects fell into two camps.  In the first camp were those with ample examples of how external organizations were doing fantastic and often unexpected things with the data.  In the second were projects that may have been implemented successfully from a technical perspective, but where it wasn’t clear whether or how the data was being used.

The “aha” moment

That’s when it hit me: we were trying to solve the wrong problem.  It seemed that the greatest value created with existing HHS data, and thereby the most innovative linkages, had come from industry, researchers and citizen activists.  That meant we could accomplish the main goals of the project by looking at the problem a bit differently.  Instead of outright building the linkages that we think have value, we can accelerate the rate at which external organizations do what they do best.

It seemed so obvious now.  In fact, I had experienced this phenomenon myself.  Prior to my HHS fellowship, I built an online marketplace for medical services called Symbiosis Health.  I made use of three datasets across different HHS organizations.  But I did so with great difficulty.  Each had deficiencies which I thought should be easy to fix.  The fix might be more frequent refreshes, a field that enables joins to another dataset, a data dictionary or consolidated data sources.  If only I could have told someone at HHS what we needed!

Let’s pivot this thing

Thus, the “pivot” was made.  While pivoting is a well known concept for rapid course correction in Lean Startup circles, it’s not something typically associated with government.  Entrepreneurs are supposed to allow themselves to make mistakes and make fast course corrections.  Government is supposed to plan ahead and stay the course.  Except in this case we have the best of both worlds — IDEA Lab.  It gives access to all the resources and deep domain expertise of HHS, but with the ability to pivot and continue to iterate without being weighed down by original assumptions!  I feel fortunate for an opportunity to work in such an environment.

Pivoting into Demand-Driven Open Data


So what exactly is this thing?

The project born from this pivot is called Demand-Driven Open Data (DDOD).  It’s a framework of tools and methods to provide a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need.  With DDOD, all open data efforts are managed in terms of “use cases”, which enables allocation of limited resources based on value.  It’s the Lean Startup approach to open data.  The concept is to minimize up-front development by acquiring customers before you build the product.

As the use cases are completed, several things happen.  Beyond the actual work done on adding and improving datasets, both the specifications and the solution associated with each use case are documented and made publicly available on the DDOD website.  Additionally, for the datasets involved and linkages enabled, we add or enhance relevant tagging, dataset-level metadata, data dictionaries, cross-dataset relationships and long form dataset descriptions.  This approach, in turn, accelerates future discoveries of datasets.  And best of all, it stimulates the linking we wanted in the first place, through coded relationships and field-level matching.
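
To make “field-level matching” a little more concrete, here is a minimal Python sketch of the idea.  It is not how DDOD is actually implemented.  It simply assumes that each data dictionary entry carries a semantic tag from some shared vocabulary, and that two fields in different datasets are candidates for linking when they share a tag.  The `Field` class, the `find_linkages` function, the tag names and the catalog entries are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Field:
    """One data dictionary entry: a column in a dataset plus a semantic tag."""
    dataset: str
    name: str
    tag: str  # hypothetical shared-vocabulary term, e.g. "provider_id"


def find_linkages(fields):
    """Return (tag, field_a, field_b) for fields in different datasets that share a tag."""
    by_tag = defaultdict(list)
    for f in fields:
        by_tag[f.tag].append(f)

    links = []
    for tag in sorted(by_tag):
        # Pair up every two fields with the same tag, skipping pairs within one dataset.
        for a, b in combinations(by_tag[tag], 2):
            if a.dataset != b.dataset:
                links.append((tag, a, b))
    return links


# Made-up catalog entries, for illustration only.
CATALOG = [
    Field("provider_directory", "npi", "provider_id"),
    Field("claims_summary", "provider_npi", "provider_id"),
    Field("claims_summary", "svc_zip", "zip_code"),
    Field("facility_list", "zip", "zip_code"),
    Field("facility_list", "facility_name", "organization_name"),
]

if __name__ == "__main__":
    for tag, a, b in find_linkages(CATALOG):
        print(f"{tag}: {a.dataset}.{a.name} <-> {b.dataset}.{b.name}")
```

With the made-up catalog above, the sketch reports two candidate linkages, one on `provider_id` and one on `zip_code`.  The point is that once fields are tagged consistently, discovering joinable datasets becomes a lookup rather than detective work.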

How does it fit into the big picture?

It’s beautiful how the pieces come together.  DDOD integrates quite well with HHS’s existing Health Data Initiative (HDI) and HealthData.gov.  While DDOD is demand-driven from outside of HHS, you can think of HDI as its supply-driven counterpart.  That’s the one guided by brilliant subject matter experts throughout HHS.  Finally, HealthData.gov is the data indexing and discovery platform that serves as a home for enabling both of these components.  As a matter of fact, we’re looking for DDOD to serve as the community section of HealthData.gov.

Let’s roll!

So now the fun begins.  Next up…  More adventure as we work through actual pilot use cases.  We’ll also cover some cool potential components of DDOD that would put more emphasis on the “linkages” aspect of the project.  These include usage analytics, data maturity reporting, and semantic tagging of the dataset catalog and fields in the data dictionary.  Stay tuned.

In the meantime, you can get involved in two ways…  Get the word out to your network about the opportunities provided by DDOD.  Or, if you have actual use cases to add, go to http://demand-driven-open-data.github.io/ and get them entered.