Category Archives: standards

Leveraging healthcare data for consumer solutions

On April 23, 2016, over 300 developers from around the country descended on San Francisco for the weekend to tackle some of the hardest challenges facing the nation.  The event was called BayesHack, sponsored by the nonprofit Bayes Impact.  There were representatives from 7 cabinet-level federal agencies present to set up the 11 “prompts”, mentor the teams and judge the entries.  The prompts for the U.S. Department of Health and Human Services and the Department of Veterans Affairs asked challenging questions on how to leverage existing datasets…

  • How can data connect individuals with the health providers they need?
  • How can data get help to sufferers of opioid addiction?
  • How can data predict and prevent veteran suicide?
  • How can data tackle End Stage Renal Disease (ESRD) and Chronic Kidney Disease (CKD)?


As part of the judging process, the teams had to pitch their solutions to both agency and private sector judges, such as partners at Andreessen Horowitz.  All teams submitted their code to the event’s GitHub account, so that it could be used for judging and to ensure that it remains available in the public domain.  For hackathons such as this one, it’s important to recognize that even if similar commercial products already exist, getting solutions into the public domain makes it possible for others to build on them later.  (Incidentally, this focus on actual working prototypes via GitHub is surprisingly lacking from many hackathons.  Bayes did a great job focusing on potential implementation beyond just the weekend.)

Of particular focus was the “How can data connect individuals with the health providers they need?” prompt, since this data has only recently become available due to a regulatory requirement.  The data consisted of commercial healthcare provider networks for plans on ACA insurance marketplaces, including plan coverage, practice locations, specialties, copays and drug formularies.  There were 7 team submissions, most of which focused on usability for consumers and advanced analytics for policy makers.  Some teams expanded the scope to include not just insurance selection, but access to care in general.

To summarize some of the novel ideas in the solutions…

  • Simplified mobile-first user experience, resembling TurboTax for health selection
  • Visualizations and what-if analysis for policy makers
  • Voice recognition and NLP, as in Google freeform search instead of menus and buttons
  • Ranking algorithms and recommendation engines
  • Ingesting additional 3rd party information (such as Vitals, Yelp, and Part D claims) for consumers who need additional information before they can make an informed choice
  • Providing an API for other apps to leverage
  • Enabling self-reporting of network accuracy, like GasBuddy for health plan coverage

Here are some notable entries for this prompt:

The Hhs-marketplace team created an app that leverages chart visualizations to let a consumer compare plan attributes against benchmarks, such as state averages.  The example below shows a user entering a zip code and the specialists they’re interested in seeing.  The app finds the plans that meet those criteria, displays cost comparisons for them and a graphical comparison of the options.

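The screenshot itself isn’t reproduced here, but the benchmark idea is easy to sketch with made-up plans and a hypothetical state-average premium:

```
import matplotlib.pyplot as plt

# Hypothetical monthly premiums for plans matching the user's criteria.
plans = {"Plan A": 310, "Plan B": 275, "Plan C": 420}
state_avg = 340  # made-up state-average benchmark

plt.bar(list(plans), list(plans.values()))
plt.axhline(state_avg, linestyle="--", color="gray", label="State average")
plt.ylabel("Monthly premium ($)")
plt.title("Plan cost vs. state benchmark")
plt.legend()
plt.show()
```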

The Fhir salamander team created a mobile-first, responsive web front end that takes the user through a series of simple menu choices to get them to recommended plans.  Along the way, for convenience and efficiency, it lets the user click a button to place a telephone call to a plan (to ensure that the doctor they want is taking new patients from that plan) or to view the summary plan description files.

In working on the challenge, the team transcribed the JSON provider network schema into a relational model.  They reported identifying data quality issues and therefore needing to clean up the raw data in order to use it for analytics.  They also generated state-level statistics to assist in comparison.  The app is written in JavaScript, while the analytics are in Python.  They feel that the relational model, the code to load it and the code to clean up the data could be reused elsewhere.  While the AWS website (http://tiny.cc/bayeshhs_fsdemo) is no longer live, the deck is available (http://tiny.cc/bayeshhs_fs).
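To make the reuse idea concrete, here’s a minimal sketch of that kind of flattening.  It’s not the team’s actual code, and it assumes the QHP providers.json layout in which each provider record carries a nested plans array:

```
import json

import pandas as pd

# One providers.json file fetched from an issuer's registered URL.
with open("providers.json") as f:
    providers = json.load(f)

# Emit one relational row per (provider, plan) pair.
rows = []
for p in providers:
    for plan in p.get("plans", []):
        rows.append({
            "npi": p.get("npi"),
            "type": p.get("type"),            # INDIVIDUAL / FACILITY
            "accepting": p.get("accepting"),  # new-patient status
            "state": (p.get("addresses") or [{}])[0].get("state"),
            "plan_id": plan.get("plan_id"),
            "network_tier": plan.get("network_tier"),
        })

df = pd.DataFrame(rows)

# State-level statistics of the sort the team generated for comparisons.
print(df.groupby(["state", "plan_id"]).size().rename("provider_count"))
```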

The Hhs insights team produced an interactive provider density map.  Their approach was to target policy makers rather than consumers.  For that purpose, they built aggregate analytics and related visualizations.  For example, their code uses DOL MSAs (Metropolitan Statistical Areas) for GeoJSON calculations and visualizations.  In order to enable the needed analytics, they had to take on the challenge of normalizing the JSON schema of provider networks into a tabular format, as well as pre-calculating several aggregate metrics.

The Hhs marketplace finder team created an app that displays the pros and cons of the top 5 plan options for the user, along with visualizations that make quantitative comparisons easy to understand.  Bad choices are suppressed to avoid screen clutter.  It starts with fewer than 10 simple questions, then adds a prediction of the user’s healthcare needs, determined from statistics by age, gender, preexisting conditions and location.  Finally, it would eventually make it possible for a user to estimate their total cost under different events, such as hospitalization or illness.
A data science team from Berkeley calling themselves Semantic Search submitted an extremely ambitious project: essentially, a Google PageRank for healthcare decisions.  Instead of the menus and buttons of a traditional app UI, this solution used a freeform field for the user to indicate what they were looking for.  The goal is to let a consumer who is not tech savvy explain their situation in a natural way, without the interface and technology getting in the way.  Under the covers it uses natural language processing, ranking algorithms and a recommendation engine.  The user is ultimately presented with the top few plans, along with explanations of why they were recommended.  To make the solution possible, the app has to collect behavioral data logs, use logistic regression to predict the probability that a certain plan would work, and leverage the LETOR (learning to rank) mechanism to provide answers.
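As a hedged sketch of that ranking idea (not the team’s code; the features and training data are hypothetical placeholders):

```
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training rows, one per (user, plan) pair: monthly premium,
# deductible, whether the needed specialist is in network, miles to care.
X_train = np.array([[350, 6000, 1, 4.2],
                    [220, 1500, 0, 11.0],
                    [410, 500, 1, 2.5],
                    [180, 7000, 0, 25.0]])
y_train = np.array([1, 0, 1, 0])  # 1 = the user was satisfied with the plan

model = LogisticRegression().fit(X_train, y_train)

# Candidate plans for a new user, same feature layout.
candidates = np.array([[300, 2000, 1, 3.0],
                       [250, 4500, 0, 8.0]])
scores = model.predict_proba(candidates)[:, 1]  # P(plan works for this user)
ranking = np.argsort(-scores)                   # best plan first
print(ranking, scores[ranking])
```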
As an interesting side note, a Schema.org standard for U.S. health insurance networks has recently been adopted.  Eventually, medical groups and insurance companies will be able to publish semantically tagged information directly to the web, bypassing the current single point of collection at CMS.  This would allow for a growth of relevant data that could be used by applications like this one.


Disclaimer: The challenge prompt used for HHS does not constitute the department’s official stance or endorsement of this activity.  It was used in an unofficial capacity only and intended to take advantage of data newly available from industry due to changes in regulations of the health insurance marketplace.


Schema.org publishes health plan and provider network schemas

Some good news on healthcare standards

I have been working with the Google semantic web group for many months to design several schemas that represent healthcare provider networks and health insurance plan coverage.  The good news is that these schemas have been officially published for use with Schema.org.  This is the first step towards wider adoption of a more consistent representation for this type of information.  The schemas are:

  • Health Insurance Plan: a list of health plans and their corresponding networks of providers and drug formularies. http://pending.webschemas.org/HealthInsurancePlan
  • Health Plan Network: defines a network of providers within a health plan. http://pending.webschemas.org/HealthPlanNetwork
  • Health Plan Cost Sharing Specification: a list of costs to be paid by the covered beneficiary. http://pending.webschemas.org/HealthPlanCostSharingSpecification
  • Health Plan Formulary: a list of the drugs covered by a health plan. http://pending.webschemas.org/HealthPlanFormulary
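To illustrate how these types compose, here’s a hedged JSON-LD sketch.  The property names follow the pending schemas above, while the identifiers and values are invented for illustration:

```
{
  "@context": "http://schema.org",
  "@type": "HealthInsurancePlan",
  "healthPlanId": "12345XX9876543",
  "healthPlanMarketingUrl": "https://example-insurer.com/plans/silver",
  "usesHealthPlanIdStandard": "HIOS",
  "includesHealthPlanNetwork": {
    "@type": "HealthPlanNetwork",
    "healthPlanNetworkId": "PREFERRED-001",
    "healthPlanNetworkTier": "Preferred"
  },
  "includesHealthPlanFormulary": {
    "@type": "HealthPlanFormulary",
    "healthPlanDrugTier": "GENERIC",
    "offersPrescriptionByMail": true
  }
}
```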

Now for the background…

In November 2015, the US health agency Centers for Medicare & Medicaid Services (CMS) enacted a new regulatory requirement for health insurers who list plans on insurance marketplaces. They must now publish a machine-readable version of their provider network directory and health plan coverage, publish it to a specified JSON standard, and update it at least monthly. Many major health insurance companies across the US have already started to publish their health plan coverage, provider directories and drug formularies to this standard.

The official schema is kept in a GitHub repository: https://github.com/CMSgov/QHP-provider-formulary-APIs.  This format makes it possible to see which changes were made and when.  It also has an issues section to facilitate ongoing discussion about the optimal adoption of the standard.  There’s a website that goes into a more detailed explanation of the background of this effort: https://www.cms.gov/CCIIO/Resources/Data-Resources/marketplace-puf.html.

This website also includes the “Machine-readable URL PUF” seed file pointing to the actual data that has been published by each insurance company.  This file contains URLs that can be crawled to aggregate the latest plan and provider data.
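A hedged sketch of that crawl; the seed file name, its URL column header, and the index.json keys are assumptions about the published layout:

```
import csv
import json
import urllib.request

def fetch_json(url):
    """Download and parse one JSON document."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

# Seed file downloaded from the CMS page above (file name and column
# header are assumptions).
with open("machine-readable-url-puf.csv", newline="") as f:
    index_urls = [row["URL Submitted"] for row in csv.DictReader(f)]

for index_url in index_urls:
    try:
        index = fetch_json(index_url)   # the issuer's index.json
    except Exception as err:            # stale registrations are common
        print(f"skipping {index_url}: {err}")
        continue
    # index.json groups the issuer's documents by type (assumed keys).
    for url in index.get("provider_urls", []):
        providers = fetch_json(url)
        print(f"{url}: {len(providers)} provider records")
```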

In terms of adoption, U.S. health plans that participate in insurance marketplaces have published:*

  • 39 states
  • 398 health plans
  • ~26,000 URLs describing insurance coverage, provider networks, drug formularies

* Updated November 2016


A group of companies representing the provider, payer and consumer segments of healthcare convened to discuss the standard throughout 2015.  The considerations that went into formation of the standard can be found at: http://ddod.healthdata.gov/wiki/Interoperability:_Provider_network_directories

Open Referral standard

The DDOD program is currently assisting the proponents of a new open standard for publishing human services, called Open Referral.  In order for us to be able to justify promoting this standard and publishing data to it, we’re first looking to develop clear and concise use cases.

The Background

Open Referral is a standard that originally came out of a Code for America initiative a couple of years ago, with the goal of automating updates to the human services offered across many programs.  Doing so would not only make the offered services more discoverable, but also lower the cost of administration for service providers and referring organizations.

The Problem: A landscape of siloed directories

It’s hard to see the safety net. Which agencies provide what services to whom? Where and how can people access them? These details are always in flux. Nonprofit and government agencies are often under-resourced and overwhelmed, and it may not be a priority for them to push information out to attract more customers.

So there are many ‘referral services’ — such as call centers, resource directories, and web applications — that collect directory information about health, human, and social services. However, these directories are all locked in fragmented and redundant silos. As a result of this costly and ineffective status quo:

  • People in need have difficulty discovering and accessing services that can help them live better lives.
  • Service providers struggle to connect clients with other services that can help meet complex needs.
  • Decision-makers are unable to gauge the effectiveness of programs at improving community health.
  • Innovators are stymied by lack of access to data that could power valuable tools for any of the above.  

– Source: Open Referral project description

As for potential use cases, a small handful of government programs have been identified as candidate pilots.  These include:


The Competition

Open Referral is not without competing standards.  In fact, the AIRS/211 Taxonomy is already widely used among certified providers of information and referral services, such as iCarol.  However, AIRS/211 has two drawbacks in comparison with Open Referral.  

First, it’s not a free and open standard.  While there are sample PDFs available for parts of the taxonomy, a full spec requires a subscription.

“If you wish to evaluate the Taxonomy prior to subscribing, you can register for evaluation purposes and have access to the full Taxonomy for a limited period of time through the search function.”  – Source: AIRS/211 Download page and Subscription page

The taxonomy also requires an annual license fee, which could be a challenge to continue funding in perpetuity for government and nonprofit organizations.

“Organizations need a license to engage in any use of the Taxonomy.”
— Source: AIRS/211 Subscription page

Second, the AIRS/211 taxonomy is highly structured and extensive.  While that has advantages for consistency and interoperability, it raises other challenges.  It imposes a steep learning curve and therefore creates potential barriers for organizations without technical expertise.  Open Referral positions itself as a more lightweight option.

It should also be noted that there’s a CivicServices schema defined for use with  Schema.org.  Its approach is to embed machine-readable “Microdata” throughout human-readable HTML web pages.  Schema.org standards are intended to be interpreted by web engines like Google, Bing and Yahoo when indexing a website.  That said, the degree of adoption for CivicServices in particular – from either search engines or information publishers – is unclear at this point.


Onward!

In concept, the Open Referral standard would lower the cost and lag time for organizations to update relevant services for their constituents.  The standard is being evangelized by Greg Bloom, who started with Code for America and has been reaching out to organizations that would be consuming this data (such as Crisis Text Line, Purple Binder and iCarol) for the purpose of defining a compelling use case.

There’s a DDOD writeup on this topic at “Interoperability: Directories of health, human and social services”, intended to facilitate creation of practical use cases.



Provider Network Directories on FHIR

I’ve done a lot of work on designing provider network directory schemas.  Much of it is described in this blog (“provider directories” tag) and in the related “Interoperability” entry on the DDOD website.  But so far, the effort has been focused on designing a standard data schema that could adequately represent the way the healthcare industry currently operates in terms of provider networks and health insurance coverage.  Now I’d like to highlight an important factor that’s been overlooked: the mechanics of moving this data between systems.

[Diagram: simplified provider network directory model]

In their recent machine-readability requirement for insurance issuers on health insurance marketplaces, CMS/CCIIO did not specify the transport mechanism for the QHP schema.  The only requirement is to register the URL containing the data with HIOS (Health Insurance Oversight System).  The URLs could point to a static page or to a dynamic RESTful query.  I’d like to point out that CMS or third-party services have an opportunity to provide significant value to both consumer applications and transaction-oriented systems by adding a RESTful FHIR layer, ideally in front of globally aggregated datasets that have been registered in HIOS.  The resulting FHIR API would have resource types of Provider, Network and Plan, which correspond to the JSON files of the QHP provider directory schema.

Much of the usefulness of the machine-readable provider network requirement is around enabling consumers to ask certain common questions when they need to select an insurance plan.  (For example: Which insurance plans is my doctor in?  Is she taking new patients at a desired facility under a particular plan?  Which plans have the specialists I need in a specific geographic region?)  These questions translate readily to FHIR queries using the Search interaction on any of the defined resource types.  With required monthly updates and potentially frequent changes in network and provider demographics, there are also use cases that benefit from the History interaction, either as a type-level change log or an instance-level version view.  Additionally, by adding search parameters, response record count limits, and pagination in front of network directory datasets, traffic load on aggregated data servers could be handled much more efficiently.

I set up a server with an example of a FHIR API implemented for provider directories, although limited to the NPPES data model.  A big thanks to Dave McCallie for creating and sharing the original codebase: GitHub.com/DavidXPortnoy/nppes_fhir_demo.  You can find the live non-production sandbox version here: http://fhir-dev.ddod.us:8080/nppes_fhir.  Here are a few sample queries you can run against it:
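(Illustrative only; since the demo is limited to the NPPES data model, the resource path and search parameters below are FHIR-style assumptions rather than verified queries.)

```
GET http://fhir-dev.ddod.us:8080/nppes_fhir/Practitioner?name=smith
GET http://fhir-dev.ddod.us:8080/nppes_fhir/Practitioner?_id=1234567890
GET http://fhir-dev.ddod.us:8080/nppes_fhir/Practitioner?name=smith&_count=20
```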

I’m working on expanding the functionality of this server to accommodate the full provider network directory schema, including components of provider demographics, facilities, organizations, credentialing, insurance plans, plan coverage, and formularies.



Edit 10/2015: It should be said that my HHS Entrepreneur-in-Residence colleague, Alan Viars, has led an effort to build a robust API for NPPES as part of HHS IDEA Lab’s NPPES Modernization Project.  It’s designed to handle both the efficient read access wanted by many applications and robust methods for making changes.  Although it initially focused on providing the simplest purpose-built API possible, Alan is now looking at creating a version that would be based on FHIR practices.


Additional FHIR server implementations

The current FHIR server is quite simple.  It’s implemented in Python, with Elasticsearch as the document store for NPPES records, Flask as the web server, and Gunicorn as the WSGI gateway.  Let’s call it the Flask-Elasticsearch implementation.  There are a couple of other, more popular alternatives, described below; first, a rough sketch of the current stack.
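This is a minimal sketch, not the actual nppes_fhir_demo code; the index name, document fields, and URL path are assumptions:

```
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request

app = Flask(__name__)
es = Elasticsearch(["http://localhost:9200"])  # local NPPES document store

@app.route("/nppes_fhir/Practitioner")
def search_practitioner():
    """Translate a FHIR-style search (?name=...&_count=...) into Elasticsearch."""
    name = request.args.get("name", "")
    count = int(request.args.get("_count", 10))   # FHIR-style page size
    result = es.search(index="nppes", body={      # index name is assumed
        "query": {"match": {"last_name": name}},  # field name is assumed
        "size": count,
    })
    entries = [{"resource": hit["_source"]} for hit in result["hits"]["hits"]]
    # Wrap results in a FHIR Bundle-shaped envelope.
    return jsonify({"resourceType": "Bundle", "entry": entries})

if __name__ == "__main__":
    app.run()  # Gunicorn fronts this WSGI app in the deployed setup
```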

It seems that the most active FHIR open-source codebase is HAPI, located at https://github.com/jamesagnew/hapi-fhir.  It’s managed by James Agnew at University Health Network.  This is a Java/Maven library for creating both FHIR servers and clients.  Its ability to easily bolt FHIR onto any database makes it ideal for extending the API to existing applications.  It also enables existing apps to connect to other FHIR servers as a client.  This codebase is quite full-featured, supporting all current FHIR resource types, most operations, and both XML and JSON encodings.  Relative to the alternatives, it’s well documented as well.  There’s a live demo project available: http://fhirtest.uhn.ca/

Finally, FHIRbase, located at https://github.com/fhirbase/fhirbase, is a relational storage server for FHIR with a document API.  It uses PostgreSQL as the relational database engine and is written in PL/pgSQL.  FHIRplace, located at https://github.com/fhirbase/fhirplace, provides a server that accesses FHIRbase.  It’s written in Clojure, Node.js, and JavaScript.  And like HAPI, it supports all current FHIR resource types, operations, and both XML and JSON encodings.

There is also a surprisingly large number of Windows-based FHIR servers that I haven’t considered, due to a desire to stay on non-proprietary platforms.  Although perhaps it shouldn’t be that surprising, given the Windows-heavy history of EHR and other healthcare apps.


Provider network directory standards

Here’s my most recent contribution to the effort around deploying data interoperability standards for use with healthcare provider network directories.  The schema proposed for use by QHPs (Qualified Health Plans) on health insurance marketplaces can be found on GitHub:  https://github.com/CMSgov/QHP-provider-formulary-APIs.  Designing an improved model for the provider directory and plan coverage standards required analysis of:

The data model now looks like this:

Background info on this topic can be found in the related DDOD article.

Vision of healthcare provider network directories

Background

There are four pieces of information that U.S. consumers need to make informed choices about their healthcare insurance coverage.

  1. Directory: What are the healthcare provider demographics, including specialty, locations, hours, credentialing?
  2. Coverage: Does the provider take a particular insurance plan?
  3. Benefits: What are the benefits, copays and formularies associated with my plan?
  4. Availability: Is the provider accepting new patients for this particular insurance plan and location?

Without these capabilities in place, consumers are likely to make uninformed decisions or to delay them.  That in turn has significant health and financial impacts.

Problem

Healthcare provider directories have historically been supplied by the NPPES database.  But it has been lacking in accuracy, timeliness, and even the ability to represent reality.  First, the overhead of making changes is quite high, and there hasn’t been an easy way for a provider to delegate the ability to make changes.  Second, the incentives aren’t there: there are no penalties for abandoning updates, and many providers don’t realize how frequently NPPES data is downloaded and propagated to consumer-facing applications.  Third, the data model is fixed by regulation, but it cannot accurately represent the many-to-many relationships among practitioners, groups, facilities and locations.  It also doesn’t adequately reflect the ability to manage multiple specialties and accreditations.


Incidentally, my work in the area of provider directories has been driven by the needs of DDOD.  Specifically, there were at least five DDOD use cases that directly depended on solving the provider directory problems.  But the actual problem extends well past those use cases.  An accurate and standardized “provider dimension” is needed for any type of analytics or application involving providers.  That could range from access to insurance coverage information to analytics on utilization, open payments, fraud and comparative effectiveness research.

Addressing consumers’ need to understand their options in terms of coverage and benefits has historically been a challenge, and it’s yet to be solved.  There are routine complaints of consumers signing up for new coverage, only to find out that their provider doesn’t take the new plan or isn’t accepting new patients under it.  These problems have been the driver for Insurance Marketplaces (aka FFMs) instituting a new rule requiring QHPs (Qualified Health Plans) to publish machine-readable provider network directories that are updated on at least a monthly basis.  This rule, effective with the 2015 open enrollment period, and the technical challenges around it are described in detail in the related DDOD discussion on provider network directories.  (Note that although the rule refers to “provider directories”, in reality it covers all 4 pieces of information.)  CMS already collects all this information from QHPs during the annual qualification process; it asks payers to submit template spreadsheets containing information about their plans, benefits and provider networks.

The seemingly simple question of whether a provider is taking new patients has been a challenge as well.  That’s because the answer is both non-binary and volatile.  It might be different depending on insurance plan, type of referral, location and even time of day.  It may also fluctuate based on patient load, vacations and many other factors.  The challenge becomes even harder when you consider that providers often don’t have the time or financial incentive to update this information with the payers.

Approach

Aneesh Chopra and I put together an industry workgroup to help determine how best to implement the QHP rule.  The workgroup spans the full spectrum of industry participants: payers, payer-provider intermediaries, providers and consumer applications.  It should be noted that we have especially strong representation from payers and intermediaries, covering a substantial portion of the market.  While looking at the best ways to implement the rule from a technical and logistical perspective, we identified a missing leg: incentives.

Three pillars are needed to reach critical mass for a new standard to become sustainable: technology, logistics and incentives.

The QHP rule and the specified data schema provide a starting point for the technology.  Workgroup participants also suggested how their organizations’ existing systems capabilities could be used to fulfill the rule’s requirements.  We discussed the logistics of how data can be moved from its multiple points of origin to CMS submission.

Through this exercise, it became quite clear that the implementation of the QHP mandate could make significant progress towards achieving its stated goals if certain actions are taken in another area: Medicare Advantage (MA).  That’s because much of the data in the proposed standard originates with providers, rather than payers.  Such data typically includes provider demographics, credentialing, locations, and whether they’re accepting new patients.  But at this point, marketplaces are able to exert economic pressure only on payers.  MA, on the other hand, can leverage the STAR rating system to establish incentives for providers as well, which typically get propagated into provider-payer contracts.  STAR incentives are adjusted every year, so it should be well within CMS’s ability to establish the desired objectives.  CMS can also leverage the CAHPS survey to measure how much progress these efforts are making towards providing the necessary decision-making tools to consumers.  At the moment, marketplaces don’t have any such metric.

It’s worth noting that Original Medicare (aka Medicare FFS, or Fee for Service) has an even stronger ability to create incentives for providers, and I’ve been talking with CMS’s CPI group about publishing PECOS data to the new provider directory standard.  PECOS enjoys much more accurate and up-to-date provider data than NPPES, due to its use for billing.  The PECOS implementation is also less challenging than its QHP counterpart, in that we’re effectively publishing coverage for only one plan, so the complexities around plan coverage and its mapping to provider networks don’t apply.  But consumers still benefit from up-to-date provider information.

Vision

If we create incentive-driven solutions in the areas of Marketplaces, Medicare Advantage, Managed Medicaid, and Original Medicare, we might be able to solve the problems plaguing NPPES without requiring new regulation or a systems overhaul.  We would cover the vast majority of practitioners across the U.S. and almost all payers, and deliver the information consumers need to make decisions about their coverage.

Finally, we are partnering with Google to pair the timing of the QHP rule with the deployment of a compatible standard on Schema.org.  Doing so would help cement the standards around provider directories and insurance coverage even further.  It empowers healthcare providers and payers to publish their information in a decentralized manner.  Since updating information becomes so easy, it can happen more frequently.  Third-party applications could pull this information directly from the source, rather than relying on a central body.  And the fact that search engines can correctly interpret and index previously unstructured data means faster answers for consumers, even outside of specialized applications.

Field-level data dictionaries for open data

Typically, publicly available open data repositories — especially those hosted or indexed via CKAN — have been described only at the dataset level.  That is, datasets are typically described in a DCAT-compatible schema.  This includes the metadata schema required by Project Open Data for Data.gov and all agency-specific data hosting websites.

But ideally, the cataloging of these datasets should move to a more granular level of detail: the field level.  Doing so makes it possible for search capabilities to go well beyond the typical tags and predefined categories.  With fields defined, we can quickly find all datasets that have common fields.  That in turn makes it easier to find opportunities for linking across datasets and allows for a related-dataset recommendation engine.  The solution becomes even more powerful if the fields are labeled with a predefined semantic vocabulary that is globally and uniquely defined.  (See the approach described in the Health2.0 Metadata Challenge.)
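As a toy illustration of the field-linking idea, the dataset names and field sets below are made up, but the mechanics are the point:

```
from itertools import combinations

# Hypothetical field-level data dictionaries: one set of field names per dataset.
catalog = {
    "hospital_spending": {"npi", "state", "total_payment"},
    "provider_directory": {"npi", "specialty", "state"},
    "drug_events": {"drug_name", "state"},
}

# Any shared field is a candidate for joining datasets or recommending
# related ones.
for (a, fields_a), (b, fields_b) in combinations(catalog.items(), 2):
    shared = fields_a & fields_b
    if shared:
        print(f"{a} <-> {b}: joinable on {sorted(shared)}")
```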

One challenge to this goal is that CKAN has not historically been good at defining a standard, machine-readable data dictionary.  We’ve examined a range of standards and suggestions for defining data dictionaries.  These include common SQL DDL, XML, JSON, and YAML formats.

* ANSI SQL Standard 
   - DDL (Data Definition Language): "CREATE TABLE"
   - SQL/Schemata
 
        ```
        testdb-# \d company
                    Table "public.company"
          Column   |     Type      | Modifiers
        -----------+---------------+-----------
         id        | integer       | not null
         name      | text          | not null
         address   | character(50) |
         join_date | date          |
        Indexes:
            "company_pkey" PRIMARY KEY, btree (id)
        ```
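    For comparison, the DDL "CREATE TABLE" form of the same table would look like this (a sketch matching the `\d` output above):

        ```
        CREATE TABLE company (
            id        integer NOT NULL,
            name      text    NOT NULL,
            address   character(50),
            join_date date,
            CONSTRAINT company_pkey PRIMARY KEY (id)
        );
        ```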

* JSON Table Schema: http://dataprotocols.org/json-table-schema/

    ```
    "schema": {
      "fields": [
        {
          "name": "name of field (e.g. column name)",
          "title": "A nicer human readable label or title for the field",
          "type": "A string specifying the type",
        },
        ... more field descriptors
      ],
      "primaryKey": ...
      "foreignKeys": ...
    }
    ```

* YAML schema files used for Doctrine ORM: http://doctrine.readthedocs.org/en/latest/en/manual/yaml-schema-files.html

* XML schema syntax for Google's DSPL (Dataset Publishing Language): https://developers.google.com/public-data/docs/schema/dspl9

* W3 XML Schema: http://www.w3.org/TR/xmlschema-2/#built-in-primitive-datatypes


CSV storage formats
* Open Knowledge Data Packager - CKAN Extension http://ckan.org/2014/06/09/the-open-knowledge-data-packager/

* Tabular Data Package Spec: http://dataprotocols.org/tabular-data-package/

* The above two are also part of a W3C standards track:
http://www.w3.org/TR/tabular-data-model/


Enter the all-powerful CSV

The CSV format is often desired for its high interoperability.  However, it suffers from the fact that its metadata must be kept separately.  This in turn causes challenges with version control, broken links and correctly identifying the column order.  There’s also the all-too-common and annoying test that has to be performed to determine whether the first row is data or a column header.

So is there an elegant, machine-readable, standard-ish way to embed the metadata within the data file itself?  OKFN suggests that this could be accomplished via Tabular Data Packages.  Basically, you have the option to provide the data “inline” directly in the datapackage.json file, in addition to specifying the full schema (as per JSON Table Schema) and CSV dialect (as per the CSVDDF Dialect specification) in the same file.  We just need simple scripts to eventually extract these components into separate CSV files and a JSON Table Schema, as sketched below.  Open Knowledge Data Packager is a CKAN extension that makes use of JSON Table Schema and Tabular Data Package for hosting datasets on the CKAN FileStore.
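Here’s a minimal sketch of such a script, assuming a datapackage.json that carries inline data along with its JSON Table Schema:

```
import csv
import json

# Load a Tabular Data Package descriptor (file name per the spec).
with open("datapackage.json") as f:
    package = json.load(f)

for resource in package.get("resources", []):
    if "data" not in resource:                 # only inline resources
        continue
    fields = [fld["name"] for fld in resource["schema"]["fields"]]
    out_name = resource.get("name", "resource") + ".csv"

    with open(out_name, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)                # header row from the schema
        for row in resource["data"]:
            # Inline data may be row arrays or keyed objects; handle both.
            if isinstance(row, dict):
                writer.writerow([row.get(name) for name in fields])
            else:
                writer.writerow(row)

    # Keep the JSON Table Schema alongside the extracted CSV.
    with open(out_name + ".schema.json", "w") as s:
        json.dump(resource["schema"], s, indent=2)
```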

Finally, there’s a helpful article on Implementing CSV on the Web and W3C’s CSV working group is seeking feedback on model and vocabulary for tabular data.


Is “SchemaStore” CKAN’s mystical unicorn?

As mentioned previously, CKAN hasn’t been strong at storing and managing standard, machine-readable data dictionaries.  So a special shout-out goes to Greg Lawrence, who has figured out how to work around this limitation.  He built a CKAN “SchemaStore” and a custom Java app to index content into CKAN’s DataStore object.  It grabs the needed information by running SQL exports on Oracle tables.  The code that enables SchemaStore is incorporated into the BC Data Catalogue CKAN extension on GitHub.  The field tags are defined in the edc_datasets.py file of this repository.

An example of the SchemaStore implementation can be found in this sample dataset under the “Object Description” section.  Here you’re able to see all of the relevant elements from the Oracle table object: Column Name, Short Name, Data Type, Data Precision, and Comments.  The data dictionary for this dataset is in machine readable JSON format.  For example, the first 3 fields of the data dictionary are:

"details": [
  {
    "data_precision": "0",
    "column_comments": "The date and time the information was entered.",
    "data_type": "DATE",
    "short_name": "TIMESTAMP",
    "column_name": "ENTRY_TIMESTAMP"
  },
  {
    "data_precision": "0",
    "column_comments": "The identification of the user that created the initial record.",
    "data_type": "VARCHAR2",
    "short_name": "ENT_USR_ID",
    "column_name": "ENTRY_USERID"
  },
  {
    "data_precision": "0",
    "column_comments": "A feature code is most importantly a means of linking a features to its name and definition.",
    "data_type": "VARCHAR2",
    "short_name": "FEAT_CODE",
    "column_name": "FEATURE_CODE"
  }, ...
],
See related issue for Data.gov (“Make field level metadata searchable and link common fields across the catalog”):  https://github.com/GSA/data.gov/issues/640

CMS is enabling streamlined access to NPPES, PECOS, EHR to 3rd parties

I had a couple conversations this week with subject matter experts from industry and government about the NPPES and PECOS systems.

NPPES (National Plan and Provider Enumeration System) is a registry of healthcare providers, including their NPI (National Provider Identifier), specialty taxonomy and contact information.

PECOS (Medicare Provider Enrollment, Chain, and Ownership System) is a system that supports Medicare enrollment for providers and has its own similar database.

There seems to be a lot of demand for these systems to be:

  1. Kept up to date.  Currently NPPES is often too out of date to be useful for patients.  PECOS is updated more frequently, but isn’t available publicly.
  2. Easier to update.  One of the reasons NPPES is not updated often is the difficulty and overhead of doing so.  It would benefit greatly from an easier user interface, a public API and the ability for surrogate 3rd parties to make updates.
  3. More realistic.  The data model for NPPES is much too simplistic to reflect the way providers currently work.  It should allow for many-to-many relationships between physicians, organizations and locations.
  4. Kept in sync.  Discrepancies between NPPES and PECOS may be hard to resolve.  Sometimes it’s due to NPPES being out of date.  Other times it’s because the provider handles billing for Medicare differently.

First, my colleague and fellow HHS Entrepreneur-in-Residence, Alan Viars, has been leading a phenomenal effort to build a robust API for NPPES.  It was created as part of HHS IDEA Lab’s NPPES Modernization Project.  It’s designed to handle both efficient read access wanted by many applications and robust methods for making changes.  It was developed to focus on functionality and let external developers design beautiful user interfaces.

Second, CMS’s Identity & Access (I&A) Management System may help with some of these needs.  I&A is supposed to enable “streamlined access to NPPES, PECOS, and EHR” to both healthcare providers and their 3rd party surrogates.  There’s an introductory presentation on the topic that explains further: http://www.cms.gov/Outreach-and-Education/Outreach/NPC/Downloads/508-IA-Call-FINALDM2.pdf.  That said, I still need to familiarize myself with it and its capabilities.


PS: In an effort to help people who had problems with the CMS website, I uploaded a video to YouTube that demonstrates how a 3rd party can request to work on behalf of a healthcare provider as a surrogate.