
Open Data Panel at All Things Open Conference

Open Data Panel to be Featured at All Things Open

Open Data will be a featured panel discussion at the All Things Open conference this year.  With a new administration set to transition into place in January and multiple new initiatives starting at both the state and federal levels, the topic has never been more important.  The session, which will take place Wednesday, October 26 at 1:30 pm ET, will feature some of the foremost experts in the world.

Topics to be discussed will include:

  • The new Open Data Transition Report
  • Future opportunities for open data at the federal and local levels, including the DATA Act
  • How the open data landscape is evolving, particularly through Demand-Driven Open Data (DDOD)
  • How the panel’s insights can help local governments create demand-driven open data programs

The world-class lineup of panel members will include:

  • Joel Gurin (President and Founder, Center for Open Data Enterprise)
  • Hudson Hollister (Founder and Executive Director, Data Coalition)
  • David Portnoy (Founder, Demand-Driven Open Data)
  • Tony Fung (Deputy Secretary of Technology, State of Virginia)
  • Andreas Addison (President, Civic Innovator LLC)
  • Sam McClenney (Economist, RTI International)
  • Caroline Sullivan (Wake County Commissioner)

The panel is open to attendees of All Things Open, the largest “open” technology event on the East Coast of the United States.


Rheumatoid Arthritis Data Challenge

Looking forward to seeing the evolution of the Rheumatoid Arthritis Data Challenge.  Here are the parameters…

  • Title: Rheumatoid Arthritis Data Challenge
  • Announcement date: March 8, 2016
  • Award date: May 10, 2016
  • Summary:
The Rheumatoid Arthritis Data Challenge is a code-a-thon, described as:

“Striking at the heart of a key issue in health outcomes research, participants will be provided access to a secured development environment in a staged competition over three weeks to create the best competitive algorithms to gauge clinical response in Rheumatoid Arthritis management.”
The challenge is hosted by Health Datapalooza in May 2016.  It’s sponsored by Optum, AcademyHealth, and the US Department of Health and Human Services (HHS).  The challenge uses non-governmental de-identified administrative claims data and electronic health record (EHR) clinical data, with the goal of establishing algorithms to predict clinical response to rheumatoid arthritis management.  Applications are open to any team of health data enthusiasts, but only 15 teams will be selected to participate.  (Register at: https://hdpalooza.wufoo.com/forms/rheumatoid-arthritis-data-challenge-reg-form/.)  Winners will be announced at the Health Datapalooza on May 10, 2016, with $40,000 in prizes to be awarded.

Public Access Repositories for Federally Funded Research

According to OSTP, there has been growth in the use of public access repositories intended to store the results of federally funded research.  That’s good news.  Despite a February 2013 mandate that such results be made available, adoption by the research community has been slow.  Challenges include the competitive nature of research, the mixing of multiple funding sources, licensing conflicts with private peer-reviewed publications, privacy concerns for study subjects, and many others.  Ultimately, even the raw data and the source code for the calculations need to be made available.  For a research study, the clearest measure of meeting this mandate is complete reproducibility.

So while we’re quite far from the ultimate goal, there have been incremental gains.  The HHS statistical agencies (including NIH, AHRQ, CDC, FDA and ASPR) in particular have been using two systems: PubMed Central and CDC Stacks.  According to the latest figures from OSTP, on a typical weekday PubMed has more than 1.2 million unique users downloading 2 million articles.  While that’s impressive, the actual growth in the number of articles in the two years since the mandate is approximately 30% (from about 2.7 million to 3.5 million).  So much more work remains.


Open Access repositories at a glance

Plans for Demand-Driven Open Data 2.0

Demand-Driven Open Data (DDOD) is a component of HHS’s Health Data Initiative (HDI), represented publicly by HealthData.gov.  DDOD is a framework of tools and methods that provides a systematic, ongoing and transparent mechanism for industry and academia to tell HHS more about their data needs.  The DDOD project description has recently been updated on the HHS IDEA Lab website: http://www.hhs.gov/idealab/projects-item/demand-driven-open-data/.  The writeup includes the problem description, background and history, the DDOD solution and process, and future plans.

In November 2015, the project underwent an extensive evaluation of the activities and accomplishments of the prior year.  Based on the observations, plans are in place to deploy DDOD 2.0 in 2016.  On the process side, the new version will have clearly defined SOPs (standard operating procedures), better instructions for data requesters and data program owners, and up-front validation of use cases.  On the technology side, DDOD will integrate with the current HealthData.gov platform, with the goals of optimizing data discoverability and usability.  It will also include dashboards, data quality analytics, and automated validation of use case content.  These features help guide the operations of DDOD and the HealthData.gov workflow.

Invisible Illness Codathon

Identifying Datasets for Invisible Illness Codathon

Several datasets were identified for use in a recent White House codathon on mental illness and suicide prevention.  (See related press release.)  Many of them came from HHS (U.S. Department of Health and Human Services) agencies: CDC, SAMHSA and AHRQ.  Datasets throughout government were tagged with “Suicide” for easy retrieval.  These tags were then ingested and aggregated up to Data.gov, specifically http://catalog.data.gov/dataset?tags=suicide.
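Those aggregated tags can also be queried programmatically: Data.gov runs CKAN, which exposes the standard Action API at /api/3/action/.  A minimal sketch of building such a tag query (the helper name and row count are illustrative, not part of any official client):

```python
from urllib.parse import urlencode

def tag_search_url(tag: str, rows: int = 10) -> str:
    """Build a CKAN package_search query that filters datasets by tag."""
    base = "https://catalog.data.gov/api/3/action/package_search"
    return base + "?" + urlencode({"fq": f'tags:"{tag}"', "rows": rows})

# Fetching this URL returns JSON with a "result" object listing matching datasets.
print(tag_search_url("suicide"))
```

The same `fq` filter syntax works for any tag, which is what makes the “tag once, aggregate everywhere” approach convenient.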

Source: White House – Suicide Prevention/Mental Health & Data for Invisible Illnesses

Data sources

  • WHO Statistical Information System (WHOSIS) – WHOSIS is an interactive database bringing together core health statistics for the 193 WHO Member States. It comprises more than 70 indicators, which can be accessed by way of a quick search, by major categories, or through user-defined tables. The data can be further filtered, tabulated, charted and downloaded.
  • International Crime Victims Surveys
  • National Inpatient Sample (NIS) – The NIS is a database of hospital inpatient stays used to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. The NIS is the largest publicly available all-payer inpatient care database in the United States, containing data from approximately 8 million hospital stays at about 1,000 hospitals, sampled to approximate a 20-percent stratified sample of U.S. community hospitals.
  • National Survey on Drug Use and Health (NSDUH) – Beginning in 2008, the National Survey on Drug Use and Health started asking about suicidal thoughts and behaviors among all adults aged 18 or older. Along with responses to the suicide-related questions, the survey collects nationally and state-representative information on socio-demographic items such as age group, sex, ethnicity, employment, and income.
  • Pan American Health Organization, Regional Core Health Data Initiative – In 1995, the Regional Core Health Data and Country Profile Initiative was launched by the Pan American Health Organization to monitor the attainment of health goals of the Member States. The initiative includes a database with 117 health-related indicators, country health profiles, and reference documents.
  • The American Association of Suicidology – The goal of the American Association of Suicidology (AAS) is to understand and prevent suicide. The Research Division of AAS is dedicated to advancing knowledge about suicidal behavior through science.
  • Suicide Attack Database – The current CPOST-SAD release contains the universe of suicide attacks from 1982 through June 2015, a total of 4,620 attacks in over 40 countries.
  • Behavioral Risk Factor Surveillance System (BRFSS) – Collects data on a variety of behavioral health issues through a national telephone survey developed by the US Centers for Disease Control and Prevention (CDC) and administered to a sample of households in the US. Some states include questions on suicidal behavior.
  • Department of Defense Suicide Event Report (DoDSER) – The DoDSER is the system of record for health surveillance related to suicide ideations, attempts, and deaths.

 

Overview for using these data sources


Record matching on mortality data

I’m looking forward to teaming up with my fellow HHS Entrepreneurs-in-Residence Paula Braun and Adam Culbertson.  We have a “perfect storm” coming up, where all three of our projects are intersecting.  Paula is working on modernizing the nation’s mortality reporting capabilities.  Adam has been working with HIMSS (the Healthcare Information and Management Systems Society) to improve algorithms and methods for matching patient records.  And I, for the DDOD project, have been working on a use case to leverage the NDI (National Death Index) for outcomes research.  So the goals of mortality system modernization, patient matching and outcomes research are converging.

To that end, Adam organized a hackathon at the HIMSS Innovation Center in Cleveland for August 2015.  This event throws in one more twist: the FHIR (Fast Healthcare Interoperability Resources) specification.  FHIR is a flexible standard for exchanging healthcare information electronically using RESTful APIs.  The hackathon intends to demonstrate what can be accomplished when experts from different domains combine their insights on patient matching and add FHIR as a catalyst.  The event is broken into two sections:

Section 1:  Test Your Matching Algorithms
Connect matching algorithms to a FHIR resource server containing synthetic patient resources.  The matching algorithms will be updated to take in FHIR patient resources and then perform a de-duplication of the records.  A final list of patient resources should be produced.  Basic performance metrics can then be calculated to determine the success of the matching exercise.  Use the provided tools, or bring your own and connect them up.

Section 2:  Development Exercise
Develop applications that allow EHRs to easily update the status of patients who are deceased. A synthetic centralized mortality database, such as the National Death Index or a state’s vital statistics registry, will be made available through a FHIR interface.  External data sources, such as EHRs, will be matched against this repository to flag decedents. The applications should be tailored to deliver data to decision makers. This scenario will focus on how different use cases drive different requirements for matching.
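The de-duplication flow described in Section 1 can be sketched as follows.  The sample Patient resources are synthetic stand-ins for what a FHIR server would return from a `GET [base]/Patient` call, and the matching key is deliberately simplistic; this is an illustrative sketch, not the hackathon’s actual tooling:

```python
def match_key(patient: dict) -> tuple:
    """Build a deterministic key from the first name entry and birthDate."""
    name = patient["name"][0]
    family = name["family"].strip().lower()
    given = name["given"][0].strip().lower()
    return (family, given, patient.get("birthDate"))

def dedupe(patients: list) -> list:
    """Return the first occurrence of each distinct match key."""
    seen, unique = set(), []
    for p in patients:
        key = match_key(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

# Synthetic FHIR Patient resources (trimmed to the fields the key uses).
patients = [
    {"resourceType": "Patient",
     "name": [{"family": "Smith", "given": ["William"]}], "birthDate": "1973-01-02"},
    {"resourceType": "Patient",
     "name": [{"family": "smith", "given": ["william"]}], "birthDate": "1973-01-02"},
    {"resourceType": "Patient",
     "name": [{"family": "Jones", "given": ["Ann"]}], "birthDate": "1980-05-14"},
]
print(len(dedupe(patients)))  # 2 unique patients
```

From here, the “basic performance metrics” step is just a matter of comparing the de-duplicated list against the known ground truth for the synthetic data.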

Matching algorithms for patient records

Patient matching and de-duplication is an important topic in EHRs (Electronic Health Records) and HIEs (Health Information Exchanges), where uniquely identifying a patient impacts clinical care quality, patient safety, and research results.  It becomes increasingly important as organizations exchange records electronically and patients seek treatment across multiple healthcare providers.  (See the related assessment titled “Patient Identification and Matching Report” that was delivered to HHS’s ONC in 2014.)

We’re looking forward to reporting on progress on all three initiatives and the common goal.

This topic is covered on the HHS IDEA Lab blog:  http://www.hhs.gov/idealab/2015/08/10/teaming-advance-patient-matching-hackathon/

Appendix: Background on patient matching

Additional challenges occur because real-world data often has errors, variations and missing attributes.  Common errors could include misspellings and transpositions.  Many first names in particular could be written in multiple ways, including variations in spelling, formality, abbreviations and initials.  In large geographies, it’s also common for there to be multiple patients with identical first and last names.

| Data set   | Name             | Date of birth | City of residence    |
| Data set 1 | William J. Smith | 1/2/73        | Berkeley, California |
| Data set 2 | Smith, W. J.     | 1973.1.2      | Berkeley, CA         |
| Data set 3 | Bill Smith       | Jan 2, 1973   | Berkeley, Calif.     |

Although there’s a broad range of matching algorithms, they can be divided into two main categories:

  • Deterministic algorithms search for an exact match between attributes
  • Probabilistic algorithms score an approximate match between records

These are often supplemented with exception-driven manual review.  From a broader, mathematical perspective, the concept we’re dealing with is entity resolution (ER).  There’s a good introductory ER tutorial that summarizes the work in Entity Resolution for Big Data, presented at KDD 2013.  Although it looks at the discipline more generically, it’s still quite applicable to patient records.  It delves into the areas of data preparation, pairwise matching, algorithms in record linkage, de-duplication, and canonicalization.  To enable scalability, it suggests the use of blocking techniques and canopy clustering.  These capabilities are needed so often that they may be built into commercial enterprise software; IBM’s InfoSphere MDM (Master Data Management) is an example.
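A minimal sketch contrasting the two categories on records like the “William J. Smith” example.  The attribute weights and the 0.8 threshold are illustrative choices for this sketch, not drawn from any standard:

```python
from difflib import SequenceMatcher

def deterministic_match(a: dict, b: dict) -> bool:
    """Exact match on all attributes: any variation breaks the match."""
    return (a["last"], a["first"], a["dob"]) == (b["last"], b["first"], b["dob"])

def probabilistic_score(a: dict, b: dict) -> float:
    """Weighted similarity score in [0, 1] across the attributes."""
    sim = lambda x, y: SequenceMatcher(None, x.lower(), y.lower()).ratio()
    return (0.4 * sim(a["last"], b["last"])
            + 0.3 * sim(a["first"], b["first"])
            + 0.3 * (a["dob"] == b["dob"]))

r1 = {"first": "William", "last": "Smith", "dob": "1973-01-02"}
r2 = {"first": "Bill", "last": "Smith", "dob": "1973-01-02"}

print(deterministic_match(r1, r2))        # False: the first names differ exactly
print(probabilistic_score(r1, r2) > 0.8)  # True: approximate match scores high
```

Real systems layer blocking on top of this (comparing only records that share, say, a birth year) so that pairwise scoring stays tractable at scale.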

Metrics for patient matching

When comparing multiple algorithms for effectiveness, we have a couple of good metrics: precision and recall.  Precision identifies how many of the matches were relevant, while recall identifies how many of the relevant items were matched.  F-measure combines the two.  Note that the accuracy metric, the ratio of items accurately identified to the total number of items, should be avoided: it suffers from the “accuracy paradox”, where a model with lower accuracy may actually be more predictive.


  • Precision:     p = TP/(TP+FP)
  • Recall:    r = TP/(TP+FN)
  • F-Measure =  2 p r / (p + r)
  • Accuracy:   a = (TP+TN)/(TP+TN+FP+FN)
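These metrics can be computed directly from the four match counts; the counts in this example are made up.  Notice how the large number of true negatives keeps accuracy high even though precision and recall tell the real story, which is exactly the accuracy paradox described above:

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard classification metrics from a confusion matrix."""
    p = tp / (tp + fp)                    # precision
    r = tp / (tp + fn)                    # recall
    f = 2 * p * r / (p + r)               # F-measure
    a = (tp + tn) / (tp + tn + fp + fn)   # accuracy (use with caution)
    return {"precision": p, "recall": r, "f_measure": f, "accuracy": a}

# Illustrative counts from a hypothetical matching run.
m = metrics(tp=80, tn=900, fp=20, fn=20)
print(round(m["precision"], 2), round(m["recall"], 2))  # 0.8 0.8
print(round(m["accuracy"], 2))                          # 0.96 despite 40 errors
```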

In the long run, the challenge can also be approached from the other side.  In other words, how can the quality of data entry and storage within an organization be improved?  This approach could reap benefits in downstream matching, reducing the need for complex algorithms and improving accuracy.  AHIMA published a primer on patient matching in HIEs, in which they go as far as calling for a nationwide standard that would facilitate more accurate matching.  They suggest standardizing on commonly defined demographic elements, eliminating use of free text entry except for proper names, and ensuring multiple values aren’t combined in single fields.

Using DDOD to identify and index data assets

Part of implementing the Federal Government’s M-13-13 “Open Data Policy – Managing Information as an Asset” is to create and maintain an Enterprise Data Inventory (EDI).   EDI is supposed to catalog government-wide SRDAs (Strategically Relevant Data Assets).  The challenge is that the definition of an SRDA is subjective within the context of an internal IT system, there’s not enough budget to catalog the huge number of legacy systems, and it’s hard to know when you’re done documenting the complete set.

Enter DDOD (Demand-Driven Open Data).  While it doesn’t solve these challenges directly, its practical approach to managing open data initiatives certainly can improve the situation.  Every time an internal “system of record” is identified for a DDOD Use Case, we’re presented with a new opportunity to make sure that an internal system is included in the EDI.  Already, DDOD has been able to identify missing assets.

DDOD helps with EDI and field-level data dictionary

But DDOD can do even better.  By focusing on one use case at a time, we have the opportunity to catalog the data asset at a much more granular level.  The data assets on HealthData.gov and Data.gov are cataloged at the dataset level, using the W3C DCAT (Data Catalog) Vocabulary.  The goal is to catalog datasets associated with DDOD use cases down to the field-level data dictionary.  Ultimately, we’d want to attain a level of sophistication at which we’re semantically tagging fields using controlled vocabularies.
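To make the idea concrete, here’s a rough sketch of what a field-level entry might look like.  The `fields` block and the concept URIs are hypothetical extensions invented for this example; standard DCAT, as published in data.json, describes datasets (title, description, keyword, distribution), not individual fields:

```python
# Hypothetical field-level extension of a dataset-level catalog entry.
dataset_entry = {
    "title": "Example Hospital Utilization Dataset",   # illustrative dataset
    "keyword": ["hospital", "utilization"],
    "fields": [
        {"name": "provider_id", "type": "string",
         "concept": "hypothetical-vocab:ProviderIdentifier"},
        {"name": "admission_date", "type": "date",
         "concept": "hypothetical-vocab:AdmissionDate"},
    ],
}

# Field-level concepts make cross-dataset linking discoverable: two datasets
# whose fields share a concept URI are candidates for a join.
concepts = {f["concept"] for f in dataset_entry["fields"]}
print(len(concepts))  # 2
```

With semantic tags like these in place, a catalog search could answer “which datasets contain a provider identifier?” rather than just “which datasets mention hospitals?”.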

Performing field-level cataloging has a couple of important advantages.  First, it enables better indexing and more sophisticated data discovery on HealthData.gov and other HHS portals.  Second, it identifies opportunities to link datasets across different organizations and even different domains.  The mechanics of DDOD in relation to EDI, HealthData.gov, data discoverability and linking are further explained in the Data Owners section of the DDOD website.

Note: HHS EDI is not currently available as a stand-alone data catalog.  But it’s incorporated into http://www.healthdata.gov/data.json, because this catalog includes all 3 types of access levels: public, restricted public, and non-public datasets.

DDOD Love from Health Datapalooza 2015

Health Datapalooza

Demand-Driven Open Data (DDOD) has gotten a lot of coverage throughout Health Datapalooza 2015.  I participated in 4 panels throughout the week and had the opportunity to explain DDOD to many constituents.

  • Developer HealthCa.mp
    Developer HealthCa.mp is a collaborative event for learning about existing and emerging APIs that can be used to develop applications that help consumers, patients and/or beneficiaries achieve better care through access to health data, especially their own!  Areas of focus include:
    • Prototype BlueButton on FHIR API from CMS
    • Project Argonaut
    • Privacy on FHIR initiative
    • Sources of population data from CMS and elsewhere around HHS
  • Health Datapalooza DataLab
    HHS has so much data! Medicare, substance abuse and mental health, social services and disease prevention are only some of the MANY topical domains where HHS provides huge amounts of free data for public consumption. It’s all there on HealthData.gov! Don’t know how the data might be useful for you? In the DataLab you’ll meet the people who collect and curate this trove of data assets as they serve up their data for your use. But if you still want inspiration, many of the data owners will co-present with creative, insightful, innovative users of their data to truly demonstrate its alternative value for positive disruptions in health, health care, and social services.

    Moderator: Damon Davis, U.S. Department of Health & Human Services

    Panelists: Natasha Alexeeva, Caretalia; Christina Bethell, PhD, MBA, MPH, Johns Hopkins; Lily Chen, PhD, National Center for Health Statistics; Steve Cohen, Agency for Healthcare Research & Quality; Manuel Figallo, SAS; Reem Ghandour, DrPH, MPA, Maternal and Child Health Bureau; Jennifer King, U.S. Department of Health & Human Services; Jennie Larkin, PhD, National Institutes of Health; Brooklyn Lupari, Substance Abuse & Mental Health Services Administration; Rick Moser, PhD, National Cancer Institute; David Portnoy, MBA, U.S. Department of Health & Human Services; Chris Powers, PharmD, Centers for Medicare and Medicaid Services; Elizabeth Young, RowdMap

  • No, You Can’t Always Get What You Want: Getting What You Need from HHS
    While more data is better than less, pushing out any ol’ data isn’t good enough.  As the Data Liberation movement matures, the folks releasing the data face a major challenge in determining what’s the most valuable stuff to put out.  How do they move from smorgasbord to intentionally curated data releases prioritizing the highest-value data?  Folks at HHS are wrestling with this, going out of their way to make sure they understand what you want and ensure you get the yummy data goodies you’re craving.  Learn how HHS is using your requests and feedback to share data differently.  This session explores HHS’s new initiative, Demand-Driven Open Data (DDOD): a lean-startup approach to public-private collaboration.  A new initiative out of the HHS IDEA Lab, DDOD is bold and ambitious, intending to change the fundamental data-sharing mindset throughout HHS agencies: from quantity of datasets published to actual value delivered.

    Moderator: Damon Davis, U.S. Department of Health & Human Services

    Panelists: Phil Bourne, National Institutes of Health (NIH); Niall Brennan, Centers for Medicare & Medicaid Services; Jim Craver, MMA, Centers for Disease Control & Prevention; Chris Dymek, EdD, U.S. Department of Health & Human Services; Taha Kass-Hout, Food & Drug Administration; Brian Lee, MPH, Centers for Disease Control & Prevention; David Portnoy, MBA, U.S. Department of Health & Human Services

  • Healthcare Entrepreneurs Boot Camp: Matching Public Health Data with Real-World Business Models
    If you’ve ever considered starting something using health data, whether a product, a service, an offering in an existing business, or a start-up company to take over the world, this is something you won’t want to miss.  In this highly interactive, games-based brouhaha, we pack the room full of flat-out gurus to get an understanding of what it takes to be a healthcare entrepreneur.  Your guides will come from finance and investment; clinical research and medical management; sales and marketing; technology and information services; operations and strategy; analytics and data science; government and policy; business, product, and line owners from payers and providers; and some successful entrepreneurs who have been there and done it for good measure.  We’ll take your idea from the back of a napkin and give you the know-how to make it a reality!

    Orchestrators: Sujata Bhatia, MD, PhD, Harvard University; Niall Brennan, Centers for Medicare & Medicaid Services; Joshua Rosenthal, PhD, RowdMap; Marshall Votta, Leverage Health Solutions

    Panelists: Michael Abate, JD, Dinsmore & Shohl LLP; Stephen Agular, Zaffre Investments; Chris Boone, PhD, Health Data Consortium; Craig Brammer, The Health Collaborative; John Burich, Passport Health Plan; Jim Chase, MHA, Minnesota Community Measurement; Arnaub Chatterjee, Merck; Henriette Coetzer, MD, RowdMap; Jim Craver, MAA, Centers for Disease Control; Michelle De Mooy, Center for Democracy and Technology; Gregory Downing, PhD, U.S. Department of Health & Human Services; Chris Dugan, Evolent Health; Margo Edmunds, PhD, AcademyHealth; Douglas Fridsma, MD, PhD, American Medical Informatics Association; Tina Grande, MHS, Healthcare Leadership Council; Mina Hsiang, US Digital Services; Jessica Kahn, Centers for Medicare & Medicaid Services; Brian Lee, MPH, Centers for Disease Control; David Portnoy, MBA, U.S. Department of Health & Human Services; Aaron Seib, National Association for Trusted Exchange; Maksim Tsvetovat, OpenHealth; David Wennberg, MD, The Dartmouth Institute; Niam Yaraghi, PhD, Brookings Institution; Jean-Ezra Yeung, Ayasdi


There were follow-up publications as well.  Among them was “HHS on a mission to liberate health data” from GCN.

GCN article on DDOD
HHS found that its data owners were releasing datasets that were easy to generate and least risky to release, without much regard to what data consumers could really use. The DDOD framework lets HHS prioritize data releases based on the data’s value, because every request is considered a use case.  It lets users, be they researchers, nonprofits or local governments, request data in a systematic, ongoing and transparent way, and it ensures there will be data consumers for information that’s released, providing immediate, quantifiable value to both the consumer and HHS.

My list of speaking engagements at Palooza is here.

Investment Model for Pharma

I had the opportunity to attend a presentation on “Entry and Investment Decisions in the Pharmaceutical Industry” by Anita Rao, PhD, of the Booth School of Business, University of Chicago.  The concepts examined are applicable to any product that has lengthy periods of pre-launch R&D investment and competing products.  But there’s an aspect of this particular research that’s unique to pharmaceuticals: the uncertainty introduced by FDA’s drug approval process.  With that in mind, the paper analyzes historical FDA data to infer how firms working on potentially competing products may respond to each other’s actions prior to approval.

Quick side note…  I love what you can do by analyzing readily available historical data in a new way.  I think there’s an opportunity to improve on this model by leveraging valuable data that’s still buried deep within the FDA.  That’s exactly the kind of opportunity Demand-Driven Open Data (DDOD) was designed to address.

There was one key question the research aimed to answer: what net effect would accelerating the drug approval process have on investment decisions and the NPV (net present value) of each product?  The conclusion was that the increased incentive from an accelerated return on investment significantly outweighed the disincentive from the risk of intensified competition.

Most immediately this model has potential to assist investors in making better decisions in regulated industries with substitute products and long investment time periods.  Investors in the areas of medical devices, agriculture and alternative energy might also be able to use this model.

But I’d love for this model to go beyond use by investors and help inform public policy.  For that to happen, it needs to take into account a bigger picture, including the cost to the regulating body and, by implication, typically the taxpayer.  So in this particular case, we would need to assess the cost the FDA bears in its current approval process, as well as estimate the likely increases due to accelerated approvals.

To take the concept further, there are certain questions that policymakers could address in order to maximize the total economic value to all participants.   For example, are there opportunities for the firms to fund or offset some of the additional cost from accelerating the approval process?  Is there an efficient way for the FDA to prioritize approvals more dynamically based on economic or public health value?  And is there a way to do so without significant conflicts of interest and minimal additional risk to consumers?

The Birth of Demand-Driven Open Data

And so it begins

My project as an Entrepreneur-in-Residence with the HHS IDEA Lab is called “Innovative Design, Development and Linkages of Databases”.  Think of it as Web 3.0 (the next generation of machine readable and programmable internet applications) applied to open government and focused on healthcare and social service applications.  The underlying hypothesis was that by investigating how HHS could better leverage its vast data repositories as a strategic asset, we would discover innovative ways to create value by linking across datasets from different agencies.

So to sum up…  I was to find opportunities across a trillion dollar organization, where the experts already working with the data have a lifetime of domain-specific experience and several acronyms after their name.  And I was to accomplish this without any dedicated resources within one year.  Pretty easy, right?

My hope was that my big data experience in industry — both for startups and large scale enterprises — was a sufficient catalyst to make progress.  And I had one other significant asset to make it all come together…  I was fortunate that the project was championed by a phenomenal group of internal backers: Keith Tucker and Cynthia Colton, who lead the Enterprise Data Inventory (EDI) in the Office of the Chief Information Officer (OCIO), and Damon Davis, who heads up the Health Data Initiative and HealthData.gov.

Tell me your data fantasies

The first step was to set out on a journey of discovery.  With guidance and clout from the internal sponsors, I was able to secure meetings with leaders and innovators for big data and analytics efforts across HHS.  I had the privilege of engaging in stimulating discussions at CMS, FDA, NIH, CDC, NCHS, ONC, ASPE and several other organizations.

Upon attempting to synthesize the information gathered into something actionable, I noticed that past open data projects fell into two camps.  In the first camp were those with ample examples of how external organizations were doing fantastic and often unexpected things with the data.  In the second, while the projects may have been successfully implemented from a technical perspective, it wasn’t clear whether or how the data was being used.

The “aha” moment

That’s when it hit me: we’re trying to solve the wrong problem.  It seemed that the greatest value created from existing HHS data (and thereby the most innovative linkages) had come from industry, researchers and citizen activists.  That meant we could accomplish the main goals of the project if we looked at the problem a bit differently.  Instead of outright building the linkages that we think have value, we can accelerate the rate at which external organizations do what they do best.

It seemed so obvious now. In fact, I had personally experienced this phenomenon myself.  Prior to my HHS fellowship, I built an online marketplace for medical services called Symbiosis Health.  I made use of three datasets across different HHS organizations.  But I did so with great difficulty.  Each had deficiencies which I thought should be easy to fix.  It might be providing more frequent refreshes, adding a field that enables joins to another dataset, providing a data dictionary or consolidating data sources.  If only I could have told someone at HHS what we needed!

Let’s pivot this thing

Thus, the “pivot” was made.  While pivoting is a well known concept for rapid course correction in Lean Startup circles, it’s not something typically associated with government.  Entrepreneurs are supposed to allow themselves to make mistakes and make fast course corrections.  Government is supposed to plan ahead and stay the course.  Except in this case we have the best of both worlds — IDEA Lab.  It gives access to all the resources and deep domain expertise of HHS, but with the ability to pivot and continue to iterate without being weighed down by original assumptions!  I feel fortunate for an opportunity to work in such an environment.

Pivoting into Demand-Driven Open Data


So what exactly is this thing?

The project born from this pivot is called Demand-Driven Open Data (DDOD).  It’s a framework of tools and methods to provide a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need.  With DDOD, all open data efforts are managed in terms of “use cases” which enables allocation of limited resources based on value.  It’s the Lean Startup approach to open data.  The concept is to minimize up front development, acquiring customers before you build the product.

As the use cases are completed, several things happen.  Outside of the actual work done on adding and improving datasets, both the specifications and the solution associated with the use cases are documented and made publicly available on the DDOD website.  Additionally, for the datasets involved and linkages enabled, we add or enhance relevant tagging, dataset-level metadata, data dictionary, cross-dataset relationships and long form dataset descriptions.  This approach, in turn, accelerates future discoveries of datasets.  And best of all, it stimulates the linking we wanted in the first place, through coded relationships and field-level matching. 

How does it fit into the big picture?

It’s beautiful how the pieces come together.  DDOD fits quite well with HHS’s existing Health Data Initiative (HDI) and HealthData.gov.  While DDOD is demand-driven from outside of HHS, you can think of HDI as its supply-driven counterpart.  That’s the one guided by brilliant subject matter experts throughout HHS.  Finally, HealthData.gov is the data indexing and discovery platform that serves as a home for enabling both these components.  As a matter of fact, we’re looking for DDOD to serve as the community section of HealthData.gov.

Let’s roll!

So now the fun begins.  Next up…  More adventure as we work through actual pilot use cases.  We’ll also cover some cool potential components of DDOD that would put more emphasis on the “linkages” aspect of the project.  These include usage analytics, data maturity reporting, and semantic tagging of the dataset catalog and fields in the data dictionary.  Stay tuned.

In the meantime, you can get involved in two ways.  Get the word out to your network about the opportunities provided by DDOD.  Or, if you have actual use cases to add, go to http://demand-driven-open-data.github.io/ and enter them.