Monthly Archives: July 2015

Provider network directory standards

Here’s my most recent contribution to the effort around deploying data interoperability standards for use with healthcare provider network directories.  The schema proposed for use by QHP (Qualified Health Plans) on health insurance marketplaces can be found on GitHub:  Designing an improved model for the provider directory and plan coverage standards required analysis of:

The data model now looks like this:

Background info on this topic can be found in the related DDOD article.

Vision of healthcare provider network directories


There are four pieces of information that U.S. consumers need to make informed choices about their healthcare insurance coverage.

  1. Directory: What are the healthcare provider demographics, including specialty, locations, hours, credentialing?
  2. Coverage: Does the provider take a particular insurance plan?
  3. Benefits: What are the benefits, copays and formularies associated with my plan?
  4. Availability: Is the provider accepting new patients for this particular insurance plan and location?

Without having these capabilities in place, consumers are likely to make uninformed decisions or delay decisions.  That in turn has significant health and financial impacts.


Healthcare provider directories have historically been supplied by the NPPES database.  But NPPES data has often been inaccurate, out of date, or structurally unable to represent reality.  First, the overhead of making changes is quite high and there hasn’t been an easy way for a provider to delegate the ability to make changes.  Second, the incentives aren’t there.  There are no penalties for abandoning updates, and many providers don’t realize how frequently NPPES data is downloaded and propagated to consumer-facing applications.  Third, the data model is fixed by regulation, and it cannot accurately represent the many-to-many relationships among practitioners, groups, facilities and locations.  It also doesn’t adequately support multiple specialties and accreditations.
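To make the many-to-many point concrete, here is a minimal sketch of how such relationships could be modeled with a separate link record.  The class and field names (`Practitioner`, `Affiliation`, `group_id`, `location_id`) are illustrative assumptions only; they are not taken from NPPES or from the proposed QHP schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Practitioner:
    npi: str
    name: str
    specialties: Tuple[str, ...] = ()  # a practitioner may hold several specialties


@dataclass(frozen=True)
class Affiliation:
    """One link row: a practitioner working for a group at a location (hypothetical)."""
    npi: str
    group_id: str
    location_id: str


def locations_for(npi, affiliations):
    """All locations where the given practitioner works."""
    return sorted({a.location_id for a in affiliations if a.npi == npi})


def practitioners_at(location_id, affiliations):
    """All practitioners who work at the given location."""
    return sorted({a.npi for a in affiliations if a.location_id == location_id})


# Made-up sample data: one practitioner at two locations, two practitioners sharing one.
affiliations = [
    Affiliation("1111111111", "group-a", "loc-1"),
    Affiliation("1111111111", "group-b", "loc-2"),
    Affiliation("2222222222", "group-a", "loc-1"),
]
```

Because each link is its own record, a practitioner can appear at several locations and a location can host several practitioners, which a single flat row per provider cannot express.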

Incidentally, my work in the area of provider directories has been driven by the needs of DDOD.  Specifically, there were at least five DDOD use cases that directly depended on solving the provider directory problems.  But the actual problem extends well past the use cases.  An accurate and standardized “provider dimension” is needed for any type of analytics or applications involving providers.  That could range from access to insurance coverage information to analytics on utilization, open payments, fraud and comparative effectiveness research.

Addressing consumers’ need to understand their coverage and benefit options has historically been a challenge, and one that has yet to be solved.  There are routine complaints of consumers signing up for new coverage, only to find out that their provider doesn’t take their new plan or isn’t accepting new patients for that plan.  These problems have driven Insurance Marketplaces (aka, FFMs) to institute a new rule requiring QHPs (Qualified Health Plans) to publish machine-readable provider network directories that are updated at least monthly.  This rule, which is effective for open enrollment 2015, and the technical challenges around it are described in detail in the related DDOD discussion on provider network directories.  (Note that although the rule refers to “provider directories”, in reality it includes all 4 pieces of information.)  CMS already collects all this information from QHPs during the annual qualifications process.  It asks payers to submit template spreadsheets containing information about their plans, benefits and provider networks.

The seemingly simple question of whether a provider is taking new patients has been a challenge as well.  That’s because the answer is both non-binary and volatile.  The answer might be different depending on insurance plan, type of referral, location and even time of day.  It may also fluctuate based on patient load, vacations and many other factors.  The challenge becomes even harder when you consider that providers often don’t have the time or financial incentive to update this information with the payers.


Aneesh Chopra and I put together an industry workgroup to help determine how best to implement the QHP rule.  The workgroup spans the full spectrum of industry participants: payers, payer-provider intermediaries, providers and consumer applications.  It should be noted that we have especially strong representation from payers and intermediaries, representing a substantial portion of the market.  While looking at the best ways to implement the rule from a technical and logistical perspective, we identified a missing leg: incentives.

3 pillars needed to reach critical mass for a new standard to become sustainable:
Technology, Logistics, Incentives

The QHP rule and the specified data schema provide a starting point for the technology.  Workgroup participants also suggested how to use their organizations’ existing systems capabilities to fulfill the rule requirements.  We discussed the logistics of how data can be moved from its multiple points of origin to CMS submission.

Through this exercise, it became quite clear that the implementation of the QHP mandate could make significant progress towards achieving its stated goals if certain actions are taken in another area: Medicare Advantage (MA).  That’s because much of the data in the proposed standard originates with providers, rather than payers.  Such data typically includes provider demographics, credentialing, locations, and whether they’re accepting new patients.  But at this point, marketplaces are only able to exert economic pressure on payers.  MA, on the other hand, can leverage the STAR rating system to establish incentives for providers as well, and these incentives typically get propagated into provider-payer contracts.  STAR incentives are adjusted every year.  So it should be well within CMS’s ability to establish the desired objectives.  CMS can also leverage the CAHPS survey to measure how much progress these efforts are making towards providing the necessary decision-making tools to consumers.  At the moment, marketplaces don’t have any such metric.

It’s worth noting that Original Medicare (aka, Medicare FFS or Fee for Service) has an even stronger ability to create incentives for providers and I’ve been talking with CMS’s CPI group about publishing PECOS data to the new provider directory standard.  PECOS enjoys much more accurate and up to date provider data than NPPES, due to its use for billing.  But the PECOS implementation is not as challenging as its QHP counterpart in that we’re effectively publishing coverage for only one plan.  So complexities around plan coverage and their mapping to provider networks don’t apply.  But consumers still benefit from up to date provider information.


If we create incentive-driven solutions in the areas of Marketplaces, Medicare Advantage, Managed Medicaid, and Original Medicare, we might be able to solve the problems plaguing NPPES without requiring new regulation or a systems overhaul.  We would include the vast majority of practitioners across the U.S. and almost all payers, and deliver the information consumers need to make decisions about their coverage.

Finally, we are partnering with Google to leverage the timing of the QHP rule with a deployment of a compatible standard on  Doing so would help cement the standards around provider directories and insurance coverage even further.  It empowers healthcare providers and payers to publish their information in a decentralized manner.  Since updating information is so easy, it can happen more frequently.  Third party applications could pull this information directly from the source, rather than relying on a central body.  And the fact that search engines correctly interpret and index previously unstructured data means faster answers for consumers even outside of specialized applications.

Record matching on mortality data

I’m looking forward to teaming up with my HHS Entrepreneur-in-Residence cohorts Paula Braun and Adam Culbertson.  We have a “perfect storm” coming up, where all three of our projects are intersecting.  Paula is working on modernizing the nation’s mortality reporting capabilities.  Adam has been working with the HIMSS (Healthcare Information and Management Systems Society) organization to improve algorithms and methods for matching patient records.  And I, for the DDOD project, have been working on a use case to leverage NDI (National Death Index) for outcomes research.  So the goals of mortality system modernization, patient matching and outcomes research are converging.

Patient Matching Exercise

To that end, Adam organized a hackathon at the HIMSS Innovation Center in Cleveland for August 2015.  This event throws in one more twist: the FHIR (Fast Healthcare Interoperability Resources) specification.  FHIR is a flexible standard for exchanging healthcare information electronically using RESTful APIs.  The hackathon intends to demonstrate what can be accomplished when experts from different domains combine their insights on patient matching and add FHIR as a catalyst.  The event is broken into two sections:

Section 1:  Test Your Matching Algorithms
Connect matching algorithms to a FHIR resource server containing synthetic patient resources.  The matching algorithms will be updated to take in FHIR patient resources and then perform a de-duplication of the records.  A final list of patient resources should be produced.  Basic performance metrics can then be calculated to determine the success of the matching exercise.  Use the provided tools, or bring your own and connect them up.

Section 2:  Development Exercise
Develop applications that allow EHRs to easily update the status of patients who are deceased. A synthetic centralized mortality database, such as the National Death Index or a state’s vital statistics registry, will be made available through a FHIR interface.  External data sources, such as EHRs, will be matched against this repository to flag decedents. The applications should be tailored to deliver data to decision makers. This scenario will focus on how different use cases drive different requirements for matching.
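As a rough sketch of the Section 2 scenario, the snippet below matches EHR patients against a mortality source and copies over the date of death.  It assumes both sides are plain dictionaries shaped like FHIR Patient resources (`name`, `birthDate`, `deceasedDateTime`), that the mortality source exposes decedents in that same shape, and that an exact key on family name, first given name and birth date is good enough for illustration.  The helper names and sample records are made up.

```python
def patient_key(patient):
    """Build a simple (family, first given, birthDate) key from a Patient-like dict."""
    name = (patient.get("name") or [{}])[0]
    family = name.get("family", "")
    if isinstance(family, list):  # some early FHIR drafts allow a list of family names
        family = " ".join(family)
    given = (name.get("given") or [""])[0]
    return (family.strip().lower(), given.strip().lower(), patient.get("birthDate", ""))


def flag_decedents(ehr_patients, mortality_records):
    """Return copies of EHR patients found in the mortality data, with the death date set."""
    deaths = {patient_key(m): m.get("deceasedDateTime") for m in mortality_records}
    flagged = []
    for patient in ehr_patients:
        key = patient_key(patient)
        if key in deaths:
            updated = dict(patient)  # leave the original EHR record untouched
            updated["deceasedDateTime"] = deaths[key]
            flagged.append(updated)
    return flagged


# Synthetic sample data, invented for this sketch.
ehr = [
    {"resourceType": "Patient", "id": "ehr-1",
     "name": [{"family": "Smith", "given": ["William"]}], "birthDate": "1973-01-02"},
    {"resourceType": "Patient", "id": "ehr-2",
     "name": [{"family": "Jones", "given": ["Ann"]}], "birthDate": "1980-05-06"},
]
mortality = [
    {"resourceType": "Patient", "id": "ndi-9",
     "name": [{"family": "SMITH", "given": ["William", "J"]}],
     "birthDate": "1973-01-02", "deceasedDateTime": "2015-03-04"},
]
decedents = flag_decedents(ehr, mortality)
```

An exact key like this is the strictest possible choice.  A use case that can tolerate false positives, such as flagging records for manual review, might prefer a looser, scored comparison instead.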

Matching algorithms for patient records

Patient matching and de-duplication is an important topic in EHRs (Electronic Health Records) and HIEs (Health Information Exchanges), where identifying a patient uniquely impacts clinical care quality, patient safety, and research results.  It becomes increasingly important as organizations exchange records electronically and patients seek treatment across multiple healthcare providers.  (See related assessment titled “Patient Identification and Matching Report” that was delivered to HHS’s ONC in 2014.)

We’re looking forward to reporting on progress on all three initiatives and the common goal.

This topic is covered on the HHS IDEA Lab blog:

Appendix: Background on patient matching

Additional challenges occur because real-world data often has errors, variations and missing attributes.  Common errors could include misspellings and transpositions.  Many first names in particular could be written in multiple ways, including variations in spelling, formality, abbreviations and initials.  In large geographies, it’s also common for there to be multiple patients with identical first and last names.

Data set   | Name             | Date of birth | City of residence
Data set 1 | William J. Smith | 1/2/73        | Berkeley, California
Data set 2 | Smith, W. J.     | 1973.1.2      | Berkeley, CA
Data set 3 | Bill Smith       | Jan 2, 1973   | Berkeley, Calif.
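The three rows above can be reduced to one canonical form before any matching takes place.  The sketch below is one possible approach, not a production normalizer: the nickname and state lookup tables are tiny, hypothetical stand-ins, the date formats cover only the three variants shown, and dates like 1/2/73 are assumed to be month/day/year.

```python
from datetime import datetime

NICKNAMES = {"bill": "william"}  # hypothetical, deliberately tiny lookup table
STATES = {"california": "CA", "calif": "CA", "ca": "CA"}
DATE_FORMATS = ("%m/%d/%y", "%Y.%m.%d", "%b %d, %Y")


def normalize_name(raw):
    """Return (last name, first initial), handling 'Last, First' and 'First Last'."""
    cleaned = raw.replace(".", "")
    if "," in cleaned:
        last, rest = [part.strip() for part in cleaned.split(",", 1)]
        first = rest.split()[0]
    else:
        tokens = cleaned.split()
        first, last = tokens[0], tokens[-1]
    first = NICKNAMES.get(first.lower(), first.lower())
    return last.lower(), first[0]


def normalize_date(raw):
    """Try each known format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_residence(raw):
    """Return (lowercase city, two-letter state code)."""
    city, state = [part.strip() for part in raw.split(",", 1)]
    return city.lower(), STATES[state.rstrip(".").lower()]


def normalize(record):
    name, dob, residence = record
    return normalize_name(name) + (normalize_date(dob),) + normalize_residence(residence)


records = [
    ("William J. Smith", "1/2/73", "Berkeley, California"),
    ("Smith, W. J.", "1973.1.2", "Berkeley, CA"),
    ("Bill Smith", "Jan 2, 1973", "Berkeley, Calif."),
]
normalized = {normalize(r) for r in records}  # all three collapse to a single tuple
```

Reducing the first name to an initial is what lets “W.” line up with “William”, but it would also conflate William with Walter.  That trade-off is one reason exact comparison alone is rarely sufficient.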

Although there’s a broad range of matching algorithms, they can be divided into two main categories:

  • Deterministic algorithms search for an exact match between attributes
  • Probabilistic algorithms score an approximate match between records

These are often supplemented with exception-driven manual review.  From a broader, mathematical perspective, the concept we’re dealing with is entity resolution (ER).  There’s a good introductory ER tutorial that summarizes the work in Entity Resolution for Big Data, presented at KDD 2013.  Although it looks at the discipline more generically, it’s still quite applicable to patient records.  It delves into the areas of Data Preparation, Pairwise Matching, Algorithms in Record Linkage, De-duplication, and Canonicalization.  To enable scalability, it suggests the use of Blocking techniques and Canopy Clustering.  These capabilities are needed so often that they may be built into commercial enterprise software.  IBM’s InfoSphere MDM (Master Data Management) is an example.
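The difference between the two categories, and the role of blocking, can be sketched in a few lines.  This is a simplified stand-in rather than a real record-linkage model: the score is a weighted string similarity from Python’s `difflib`, and the weights, the 0.85 threshold and the choice of birth year as the blocking key are all arbitrary assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

WEIGHTS = {"last": 0.4, "first": 0.2, "dob": 0.4}  # illustrative weights, sum to 1
MATCH_THRESHOLD = 0.85                             # arbitrary cut-off for this sketch


def deterministic_match(a, b, fields=("last", "first", "dob")):
    """Exact agreement on every compared attribute."""
    return all(a[f] == b[f] for f in fields)


def similarity(x, y):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def probabilistic_score(a, b):
    """Weighted approximate agreement across attributes."""
    return sum(weight * similarity(a[f], b[f]) for f, weight in WEIGHTS.items())


def candidate_pairs(records, block_key):
    """Blocking: only compare records that share the same block key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


records = [
    {"id": 1, "last": "Smith", "first": "William", "dob": "1973-01-02"},
    {"id": 2, "last": "Smyth", "first": "William", "dob": "1973-01-02"},
    {"id": 3, "last": "Jones", "first": "Ann", "dob": "1980-05-06"},
]
# Block on birth year so that record 3 is never compared with records 1 and 2.
pairs = list(candidate_pairs(records, block_key=lambda r: r["dob"][:4]))
```

Here the Smith/Smyth pair fails the deterministic test but clears the scored threshold, while blocking cuts the comparisons from three pairs to one.  Pairs that land near the threshold are the natural candidates for the manual review mentioned above.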

Metrics for patient matching

When comparing multiple algorithms for effectiveness, we have a couple of good metrics: precision and recall.  Precision identifies how many of the matches were relevant, while recall identifies how many of the relevant items were matched.  F-Measure combines the two.  It should be noted that the accuracy metric, which is the ratio of items accurately identified to the total number of items, should be avoided.  It suffers from the “accuracy paradox”, where lower measures of accuracy may actually be more predictive.


  • Precision:     p = TP/(TP+FP)
  • Recall:    r = TP/(TP+FN)
  • F-Measure:    F = 2 p r / (p + r)
  • Accuracy:   a = (TP+TN)/(TP+TN+FP+FN)
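A small worked example shows why accuracy can mislead.  The counts below are invented for illustration: out of 10,000 candidate pairs only 100 are true matches, and a matcher that never declares a match is compared with one that finds most of them.

```python
def matching_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Made-up counts: 10,000 candidate pairs, of which 100 are true matches.
never_matches = matching_metrics(tp=0, fp=0, fn=100, tn=9900)
real_matcher = matching_metrics(tp=90, fp=150, fn=10, tn=9750)
```

The do-nothing matcher scores higher on accuracy (0.99 versus 0.984) even though its recall and F-measure are zero.  With rare matches, true negatives dominate the accuracy formula, which is why precision, recall and F-measure are the better basis for comparison.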

In the long run, the challenge can also be approached from the other side.  In other words, how can the quality of data entry and storage within an organization be improved.  This approach could reap benefits in downstream matching, reducing the need for complex algorithms and improving accuracy.  AHIMA published a primer on Patient Matching in HIEs, in which they go as far as calling for a nationwide standard that would facilitate more accurate matching.  They suggest standardizing on commonly defined demographic elements, eliminating use of free text entry except for proper names, and ensuring multiple values aren’t combined in single fields.

Using DDOD to identify and index data assets

Part of implementing the Federal Government’s M-13-13 “Open Data Policy – Managing Information as an Asset” is to create and maintain an Enterprise Data Inventory (EDI).   EDI is supposed to catalog government-wide SRDAs (Strategically Relevant Data Assets).  The challenge is that the definition of an SRDA is subjective within the context of an internal IT system, there’s not enough budget to catalog the huge number of legacy systems, and it’s hard to know when you’re done documenting the complete set.

Enter DDOD (Demand-Driven Open Data).  While it doesn’t solve these challenges directly, its practical approach to managing open data initiatives certainly can improve the situation.  Every time an internal “system of record” is identified for a DDOD Use Case, we’re presented with a new opportunity to make sure that an internal system is included in the EDI.  Already, DDOD has been able to identify missing assets.

DDOD helps with EDI and field-level data dictionary

But DDOD can do even better.  By working one Use Case at a time, we have the opportunity to catalog the data asset at a much more granular level.  The data assets on and are cataloged at the dataset level, using the W3C DCAT (Data Catalog) Vocabulary.  The goal is to catalog datasets associated with DDOD Use Cases at the field level, as a data dictionary.  Ultimately, we’d want to attain a level of sophistication at which we’re semantically tagging fields using controlled vocabularies.

Performing field-level cataloging has a couple of important advantages.  First, it enables better indexing and more sophisticated data discovery on and other HHS portals.  Second, it identifies opportunities to link across datasets from different organizations and even across different domains.  The mechanics of DDOD in relation to EDI,, data discoverability and linking are further explained in the Data Owners section of the DDOD website.
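To illustrate the second advantage, here is a hedged sketch of how field-level tags could surface linking opportunities.  The catalog entries, the `fields` extension and the `concept` tags are all hypothetical; they are not part of DCAT or of any existing HHS catalog, and a real implementation would draw the tags from a controlled vocabulary.

```python
from collections import defaultdict

# Hypothetical catalog: dataset-level metadata plus a field-level data dictionary.
catalog = [
    {
        "title": "Example provider dataset",
        "accessLevel": "public",
        "fields": [
            {"name": "npi", "type": "string", "concept": "provider-identifier"},
            {"name": "zip", "type": "string", "concept": "postal-code"},
        ],
    },
    {
        "title": "Example payments dataset",
        "accessLevel": "restricted public",
        "fields": [
            {"name": "physician_npi", "type": "string", "concept": "provider-identifier"},
            {"name": "amount", "type": "decimal", "concept": "payment-amount"},
        ],
    },
]


def linkable_fields(catalog):
    """Group fields by semantic tag and keep tags that appear in more than one dataset."""
    by_concept = defaultdict(list)
    for dataset in catalog:
        for column in dataset["fields"]:
            by_concept[column["concept"]].append((dataset["title"], column["name"]))
    return {
        concept: refs
        for concept, refs in by_concept.items()
        if len({title for title, _ in refs}) > 1
    }


links = linkable_fields(catalog)
```

In this made-up catalog, the two datasets use different column names for the same concept, so only the shared tag reveals that they could be joined on provider identifier.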

Note: HHS EDI is not currently available as a stand-alone data catalog.  But it’s incorporated into, because this catalog includes all 3 types of access levels: public, restricted public, and non-public datasets.

Obtaining data on cost of FDA drug approval process

To follow up on the post describing Investment Model for Pharma…   We’re working on obtaining data on the cost of the FDA drug approval process via DDOD (Demand-Driven Open Data).  Use Case 34: Cost of drug approval process describes this effort.  It identifies the drivers for obtaining this data and its value in informing policy.  The writeup identifies several data sources and how to go about using them.  The information provided has come from discussions with FDA’s CDER Office of Strategic Programs (OSP).

Data sources identified:

  • IND activity: Distinct count of new INDs (Investigational New Drug) received during the calendar year and previously received INDs which had an incoming document during the same period: INDs with Activity page
  • PDUFA reports: The Prescription Drug User Fee Act (PDUFA) requires FDA to submit two annual reports to the President and the Congress for each fiscal year: 1) a performance report and 2) a financial report
  • FTE reports: Statistics on number of FDA employees and grade levels
  • might provide glimpses into drug approval activity, although it’s not complete (especially for Phase 1 trials) and mixes in non-IND trials.
  • Citeline has counts of active compounds under development, including breakdown by Phase

As more users come forward to identify specifics of how they need to use the data, there’s an opportunity to refine the use case and focus efforts on obtaining data not yet available.