
Healthcare Provider Registries

As I’ve been reviewing Use Cases for DDOD (Demand-Driven Open Data), I’m realizing how much the industry depends on an up-to-date, reliable source of information on healthcare providers (physicians, groups, hospitals, etc.).  Although some people may call such an effort an “NPI registry”, the actual need goes well beyond the fields and capabilities of the existing NPPES database.

Here are just the Use Cases that directly mention NPPES and other existing registries.

Besides these, at least a dozen more would benefit from such a repository, since they rely on the “provider” dimension for their analytics.  For example, most analyses of provider quality, utilization, and fraud depend on this dimension.

The most obvious improvements needed are around:

  • More realistic association between provider, group, and location, recognizing that these are many-to-many relationships that change with time
  • More accurate specialty taxonomy
  • More up-to-date information (since NPPES entries are rarely updated)
  • Easier method to query this information (rather than relying on zip file downloads)
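
To make the first point concrete, here is a minimal sketch of how time-bound, many-to-many associations between providers, groups, and locations might be represented.  It is only an illustration under my own assumptions: the `Affiliation` record, its field names, the helper functions, and the sample NPIs are all hypothetical and do not reflect the actual NPPES schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    """One provider practicing with one group at one location for a period.

    Hypothetical record: a provider can have many of these, and so can a
    group or a location, which is what makes the relationship many-to-many.
    """
    provider_npi: str
    group_id: str
    location_id: str
    start: date
    end: Optional[date] = None  # None means the affiliation is still active

    def active_on(self, day: date) -> bool:
        """True if the affiliation covers the given day (end date inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


def locations_for(affiliations, provider_npi, day):
    """Sorted location ids where the provider practiced on the given day."""
    return sorted({a.location_id for a in affiliations
                   if a.provider_npi == provider_npi and a.active_on(day)})


def providers_at(affiliations, location_id, day):
    """Sorted NPIs of providers practicing at the location on the given day."""
    return sorted({a.provider_npi for a in affiliations
                   if a.location_id == location_id and a.active_on(day)})


# Made-up sample data: one provider moves between groups over time,
# and one location is shared by two providers.
AFFILIATIONS = [
    Affiliation("1000000001", "group-a", "loc-1", date(2012, 1, 1), date(2014, 6, 30)),
    Affiliation("1000000001", "group-b", "loc-2", date(2013, 1, 1)),
    Affiliation("1000000002", "group-b", "loc-2", date(2014, 1, 1)),
]

if __name__ == "__main__":
    print(locations_for(AFFILIATIONS, "1000000001", date(2013, 6, 1)))
    print(locations_for(AFFILIATIONS, "1000000001", date(2015, 1, 1)))
    print(providers_at(AFFILIATIONS, "loc-2", date(2015, 1, 1)))
```

Keeping each association as its own dated record, rather than a single address field on the provider, is what lets a query answer “where did this provider practice on a given day?” without losing history.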

But there are challenges on the “input” side of the equation as well.  There seems to be some confusion about assigning rights for modifying registries.  For example, it’s not easy for a provider group to figure out how to delegate update rights for all of its physicians to a third-party administrator.

There’s a growing list of companies and non-profits (including the American Medical Association) that have been trying to capitalize on the opportunities for a better solution.  As we work on the use cases mentioned here, I’ll be looking to build a body of knowledge that contributes to solving the core problems identified.

Related post:  CMS is enabling streamlined access to NPPES, PECOS, EHR to 3rd parties


The Birth of Demand-Driven Open Data

And so it begins

My project as an Entrepreneur-in-Residence with the HHS IDEA Lab is called “Innovative Design, Development and Linkages of Databases”.  Think of it as Web 3.0 (the next generation of machine readable and programmable internet applications) applied to open government and focused on healthcare and social service applications.  The underlying hypothesis was that by investigating how HHS could better leverage its vast data repositories as a strategic asset, we would discover innovative ways to create value by linking across datasets from different agencies.

So to sum up…  I was to find opportunities across a trillion dollar organization, where the experts already working with the data have a lifetime of domain-specific experience and several acronyms after their name.  And I was to accomplish this without any dedicated resources within one year.  Pretty easy, right?

My hope was that my big data experience in industry — both for startups and large scale enterprises — was a sufficient catalyst to make progress.  And I had one other significant asset to make it all come together…  I was fortunate that the project was championed by a phenomenal group of internal backers: Keith Tucker and Cynthia Colton, who lead the Enterprise Data Inventory (EDI) in the Office of the Chief Information Officer (OCIO), and Damon Davis, who heads up the Health Data Initiative and

Tell me your data fantasies

The first step was to set out on a journey of discovery.  With guidance and clout from the internal sponsors, I was able to secure meetings with leaders and innovators for big data and analytics efforts across HHS.  I had the privilege of engaging in stimulating discussions at CMS, FDA, NIH, CDC, NCHS, ONC, ASPE and several other organizations.

While trying to synthesize the information gathered into something actionable, I noticed that past open data projects fell into two camps.  In the first camp were those with ample examples of how external organizations were doing fantastic and often unexpected things with the data.  In the second were projects that may have been successfully implemented from a technical perspective, but where it wasn’t clear whether or how the data was being used.

The “aha” moment

That’s when it hit me: we’re trying to solve the wrong problem.  It seemed that the greatest value created with existing HHS data, and thereby the most innovative linkages, has come from industry, researchers and citizen activists.  That meant we can accomplish the main goals of the project if we look at the problem a bit differently.  Instead of outright building the linkages that we think have value, we can accelerate the rate at which external organizations do what they do best.

It seemed so obvious now.  In fact, I had experienced this phenomenon myself.  Prior to my HHS fellowship, I built an online marketplace for medical services called Symbiosis Health.  I made use of three datasets across different HHS organizations, but I did so with great difficulty.  Each had deficiencies that I thought should be easy to fix: providing more frequent refreshes, adding a field that enables joins to another dataset, providing a data dictionary or consolidating data sources.  If only I could have told someone at HHS what we needed!

Let’s pivot this thing

Thus, the “pivot” was made.  While pivoting is a well known concept for rapid course correction in Lean Startup circles, it’s not something typically associated with government.  Entrepreneurs are supposed to allow themselves to make mistakes and make fast course corrections.  Government is supposed to plan ahead and stay the course.  Except in this case we have the best of both worlds — IDEA Lab.  It gives access to all the resources and deep domain expertise of HHS, but with the ability to pivot and continue to iterate without being weighed down by original assumptions!  I feel fortunate for an opportunity to work in such an environment.

Pivoting into Demand-Driven Open Data

So what exactly is this thing?

The project born from this pivot is called Demand-Driven Open Data (DDOD).  It’s a framework of tools and methods to provide a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need.  With DDOD, all open data efforts are managed in terms of “use cases”, which enables allocation of limited resources based on value.  It’s the Lean Startup approach to open data.  The concept is to minimize up-front development, acquiring customers before you build the product.

As the use cases are completed, several things happen.  Outside of the actual work done on adding and improving datasets, both the specifications and the solution associated with the use cases are documented and made publicly available on the DDOD website.  Additionally, for the datasets involved and linkages enabled, we add or enhance relevant tagging, dataset-level metadata, data dictionary, cross-dataset relationships and long form dataset descriptions.  This approach, in turn, accelerates future discoveries of datasets.  And best of all, it stimulates the linking we wanted in the first place, through coded relationships and field-level matching. 

How does it fit into the big picture?

It’s beautiful how the pieces come together.  DDOD incorporates quite well with HHS’s existing Health Data Initiative (HDI) and  While DDOD is demand-driven from outside of HHS, you can think of HDI as its supply-driven counterpart.  That’s the one guided by brilliant subject matter experts throughout HHS.  Finally, is the data indexing and discovery platform that serves as a home for enabling both these components.  As a matter of fact, we’re looking for DDOD to serve as the community section of

Let’s roll!

So now the fun begins.  Next up…  More adventure as we work through actual pilot use cases.  We’ll also cover some cool potential components of DDOD that would put more emphasis on the “linkages” aspect of the project.  These include usage analytics, data maturity reporting, and semantic tagging of the dataset catalog and fields in the data dictionary.  Stay tuned.

In the meantime, you can get involved in two ways…  Get the word out to your network about the opportunities provided by DDOD.  Or, if you have actual use cases to add, go to and get them entered.


CMS is enabling streamlined access to NPPES, PECOS, EHR to 3rd parties

I had a couple of conversations this week with subject matter experts from industry and government about the NPPES and PECOS systems.

NPPES (National Plan and Provider Enumeration System) is a registry of healthcare providers, including their NPI (National Provider Identifier), specialty taxonomy and contact information.

PECOS (Medicare Provider Enrollment, Chain, and Ownership System) is a system that supports Medicare enrollment for providers and has its own similar database.

There seems to be a lot of demand for these systems to be:

  1. Kept up to date.  Currently NPPES is often too out of date to be useful for patients.  PECOS is updated more frequently, but isn’t available publicly.
  2. Easier to update.  One of the reasons NPPES is not updated often is the difficulty and overhead of doing so.  It would benefit greatly from an easier user interface, a public API and the ability for surrogate 3rd parties to make updates.
  3. More realistic.  The data model for NPPES is much too simplistic to reflect the way providers currently do their work.  It should allow for many-to-many relationships between physicians, organizations and locations.
  4. Kept in sync.  Discrepancies between NPPES and PECOS may be hard to resolve.  Sometimes it’s due to NPPES being out of date.  Other times it’s because the provider handles billing for Medicare differently.
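
As a rough illustration of the “kept in sync” point, the sketch below compares two sets of provider records keyed by NPI and reports the fields that disagree.  Everything here is an assumption for the sake of the example: the function names, the field names (`specialty`, `practice_zip`), and the sample records are made up, and neither the real NPPES file layout nor the non-public PECOS data is represented.

```python
def normalize(value):
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    if value is None:
        return ""
    return " ".join(str(value).split()).lower()


def find_discrepancies(nppes, pecos, fields):
    """Compare records that share an NPI across two registries.

    Both inputs are dicts of {npi: {field: value}}.  Returns
    {npi: {field: (nppes_value, pecos_value)}} for fields that differ
    after normalization.  NPIs present in only one registry are skipped.
    """
    report = {}
    for npi in sorted(nppes.keys() & pecos.keys()):
        diffs = {}
        for field in fields:
            left = nppes[npi].get(field)
            right = pecos[npi].get(field)
            if normalize(left) != normalize(right):
                diffs[field] = (left, right)
        if diffs:
            report[npi] = diffs
    return report


# Hypothetical sample records, not real providers.
NPPES_SAMPLE = {
    "1000000001": {"specialty": "Family Medicine", "practice_zip": "20201"},
    "1000000002": {"specialty": "Cardiology", "practice_zip": "21244"},
    "1000000003": {"specialty": "Dermatology", "practice_zip": "20852"},
}
PECOS_SAMPLE = {
    "1000000001": {"specialty": "family  medicine", "practice_zip": "20852"},
    "1000000002": {"specialty": "Cardiology", "practice_zip": "21244"},
}

if __name__ == "__main__":
    print(find_discrepancies(NPPES_SAMPLE, PECOS_SAMPLE, ["specialty", "practice_zip"]))
```

A report like this only flags the mismatch.  Deciding which side is correct (stale NPPES entry versus a deliberate difference in Medicare billing) would still need a human or additional business rules.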

First, my colleague and fellow HHS Entrepreneur-in-Residence, Alan Viars, has been leading a phenomenal effort to build a robust API for NPPES.  It was created as part of HHS IDEA Lab’s NPPES Modernization Project.  It’s designed to handle both efficient read access wanted by many applications and robust methods for making changes.  It was developed to focus on functionality and let external developers design beautiful user interfaces.

Second, CMS’s Identity & Access (I&A) Management System may help with some of these needs.  I&A is supposed to enable “streamlined access to NPPES, PECOS, and EHR” to both healthcare providers and their 3rd party surrogates.  There’s an introductory presentation on the topic that explains further.  That said, I still need to familiarize myself with it and its capabilities.


PS: In an effort to help people who had problems with the CMS website, I uploaded a video to YouTube that demonstrates how a 3rd party can request to work on behalf of a healthcare provider as a surrogate.

U.S. Turns to Private Sector for IT Innovation

Last week, the Department of Health and Human Services welcomed its third group of “entrepreneurs-in-residence” — mainly private-sector tech experts and start-up founders who are spending a year advising the agency on its health IT projects.   Read the full article in The Washington Post

HHS's IDEA Lab External Entrepreneurs

U.S. Turns to Private Sector for IT Innovation and HHS’s IDEA Lab External Entrepreneurs make it happen.
(Photo: David Portnoy, from left, Mark Scrimshire, Niall Brennan and Damon Davis working on project in the HHS IDEA Lab.)

HHS IDEA Lab – Innovative Linkages Initiative

Full speed ahead on a bold new initiative from the HHS IDEA Lab called “Innovative Design, Development and Linkages of Databases”.

As the largest funder of biomedical research in the world, the U.S. Department of Health and Human Services (HHS) directly and indirectly generates massive amounts of scientific data through research, grants, and contracts. The HHS Office of the Chief Information Officer and the HHS IDEA Lab want to build an innovative strategy to design, develop and link public-facing research database applications for HHS.

The goal of this project is to create a solution to HHS’s current problem of multiple, disparate data sources, one that also meets the requirements of two new White House memoranda (Increasing Access to Results of Federally Funded Scientific Research and Open Data Policy – Managing Information as an Asset).

Case study in Linked Data and Semantic Web: The Human Genome Project

The National Human Genome Research Institute’s “GWAS Catalog” (Genome-Wide Association Studies) project is a successful implementation of Linked Data and Semantic Web concepts.  This article discusses how the project has been implemented, challenges faced and possible paths for the future.