
Schema.org publishes health plan and provider network schemas

Some good news on healthcare standards

I have been working with the Google semantic web group for many months to design several schemas that represent healthcare provider networks and health insurance plan coverage.  The good news is that these schemas have now been officially published for use with Schema.org.  This is a first step toward wider adoption of a more consistent representation for this type of information.  The schemas are:

  • Health Insurance Plan: List of health plans and their corresponding networks of providers and drug formularies. http://pending.webschemas.org/HealthInsurancePlan
  • Health Plan Network: Defines a network of providers within a health plan. http://pending.webschemas.org/HealthPlanNetwork
  • Health Plan Cost Sharing Specification: List of costs to be paid by the covered beneficiary. http://pending.webschemas.org/HealthPlanCostSharingSpecification
  • Health Plan Formulary: List of drugs covered by a health plan. http://pending.webschemas.org/HealthPlanFormulary
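
To make this concrete, here’s a minimal sketch of what JSON-LD markup using these pending types could look like.  The property names follow the pending Schema.org definitions, but the plan itself and all of its names, IDs, and URLs are made-up placeholders, not data from any real insurer:

```python
import json

# A minimal, illustrative HealthInsurancePlan document built from the pending
# Schema.org types.  All names, IDs, and URLs below are hypothetical.
plan = {
    "@context": "http://schema.org",
    "@type": "HealthInsurancePlan",
    "name": "Example Silver 2000",                # hypothetical plan name
    "healthPlanId": "12345XX9876543",             # hypothetical plan ID
    "healthPlanMarketingUrl": "https://insurer.example.com/plans/silver-2000",
    "includesHealthPlanNetwork": {
        "@type": "HealthPlanNetwork",
        "healthPlanNetworkId": "EXAMPLE-PPO",
        "healthPlanNetworkTier": "Preferred",
    },
    "includesHealthPlanFormulary": {
        "@type": "HealthPlanFormulary",
        "healthPlanDrugTier": "GENERIC",
        "healthPlanCostSharing": True,
    },
}

# Emit JSON-LD suitable for a <script type="application/ld+json"> block.
print(json.dumps(plan, indent=2))
```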

Now for the background…

In November 2015, the US health agency Centers for Medicare & Medicaid Services (CMS) enacted a new regulatory requirement for health insurers who list plans on insurance marketplaces.  They must now publish a machine-readable version of their provider network directory and health plan coverage in a specified JSON format and update it at least monthly.  Many major health insurance companies across the US have already started publishing their health plan coverage, provider directories, and drug formularies to this standard.

The official schema is kept in a GitHub repository: https://github.com/CMSgov/QHP-provider-formulary-APIs.  This format makes it possible to see which changes were made and when.  It also has an issues section to facilitate ongoing discussion about the optimal adoption of the standard.  There’s a website that goes into more detail on the background of this effort: https://www.cms.gov/CCIIO/Resources/Data-Resources/marketplace-puf.html.

This website also includes the “Machine-readable URL PUF” seed file, which points to the actual data published by each insurance company.  The file contains URLs that can be crawled to aggregate the latest plan and provider data.
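
As a rough sketch of what that crawl looks like: per the spec in the GitHub repository, each issuer publishes an index file that groups its URLs into plan, provider, and formulary lists.  The snippet below assumes that index shape (plan_urls, provider_urls, formulary_urls) and uses a hypothetical issuer URL; treat it as illustrative rather than a reference client:

```python
import json
import urllib.request

def fetch_json(url):
    """Download and parse one JSON document."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def crawl_issuer_index(index_url):
    """Yield (kind, url) pairs for every document an issuer has published.

    Assumes the index layout from the QHP-provider-formulary-APIs spec:
    {"plan_urls": [...], "provider_urls": [...], "formulary_urls": [...]}
    """
    index = fetch_json(index_url)
    for kind in ("plan_urls", "provider_urls", "formulary_urls"):
        for url in index.get(kind, []):
            yield kind, url

# Hypothetical issuer index URL, as would be found in the seed file.
for kind, url in crawl_issuer_index("https://insurer.example.com/cms-data/index.json"):
    print(kind, url)
```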

In terms of adoption, U.S. health plans that participate in insurance marketplaces have published data covering: *

  • 39 states
  • 398 health plans
  • ~26,000 URLs describing insurance coverage, provider networks, and drug formularies

* Updated November 2016

A group of companies representing the provider, payer, and consumer segments of healthcare convened throughout 2015 to discuss the standard.  The considerations that went into the formation of the standard can be found at: http://ddod.healthdata.gov/wiki/Interoperability:_Provider_network_directories

What Happened to the Semantic Web?

It looks bleak

Over the past few years, there have been questions about the viability of the Semantic Web (aka SemWeb) envisioned by Tim Berners-Lee.  In the strictest sense, the original standards set out by the W3C have not proliferated at any great pace and have not been widely adopted commercially.  There have also been no multi-billion-dollar acquisitions or IPOs in the SemWeb space.  Even in government and academia, the vast majority of “open data” is in traditional relational form (rather than RDF linked datasets) and doesn’t reference widely adopted ontologies.

[Chart: Evidence of decline?]

But it’s a matter of framing

The outlook changes drastically if we look at the question a bit differently.  Rather than defining the SemWeb as the original set of standards or that narrow vision, what if we look at the related technologies it spawned or influenced?  Then a number of success stories emerge.  We have the tremendous growth of Schema.org and the adoption of Microdata by the three big search engines: Google, Yahoo, and Bing.  We also have SemWeb concepts applied in Google’s Knowledge Graph, Google Rich Snippets, and Facebook’s Social Graph.  Even IBM’s Watson is no longer just an IBM Research project; it’s being commercialized into IBM’s verticals, including healthcare, insurance, and finance.  So SemWeb technologies are alive, in a sense.  For the purpose of clarity, let’s refer to the original W3C vision discussed since 2001 as the “old SemWeb” and the recent commercial successes as the “new SemWeb”.  Of course, these are fuzzy definitions, since the new SemWeb is not formally defined.


What’s wrong with the original vision?

The W3C breaks the elements of the old SemWeb into: (1) Linked Data, (2) Vocabularies, (3) Inference, and (4) Query.  Each of these is widely in use today, but in ways that differ from the original specs.  For example, linked data implemented as Microdata or JSON-LD has gained popularity over the heavier and more verbose RDF/XML.  Most websites forgo formally defined OWL ontologies in favor of vocabularies found in databases like Schema.org or Freebase.  Rule engines and reasoners are already built into products we use; they are what happens in the “brains” of Google’s PageRank and ad-optimization algorithms.  And instead of the SPARQL query language, humans often interact with the new SemWeb through natural language searches, while machines do so through RESTful APIs.  IBM’s Watson translates questions into sophisticated queries involving federation and inference against its knowledge base.
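
To see the weight difference concretely, the sketch below expresses one made-up fact in both RDF/XML and JSON-LD and confirms that the two serializations parse to the same triples.  It assumes rdflib version 6 or later, which bundles a JSON-LD parser; the organization and its URIs are hypothetical:

```python
from rdflib import Graph

# One fact ("Example Health is an Organization with a name"), two serializations.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="https://insurer.example.com/#org">
    <rdf:type rdf:resource="http://schema.org/Organization"/>
    <schema:name>Example Health</schema:name>
  </rdf:Description>
</rdf:RDF>"""

json_ld = """{
  "@context": {"schema": "http://schema.org/"},
  "@id": "https://insurer.example.com/#org",
  "@type": "schema:Organization",
  "schema:name": "Example Health"
}"""

g1 = Graph().parse(data=rdf_xml, format="xml")
g2 = Graph().parse(data=json_ld, format="json-ld")

# Same graph either way; JSON-LD just says it with far less ceremony.
assert g1.isomorphic(g2)
print(f"Both serializations parse to the same {len(g1)} triples.")
```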

There are a couple of other difficulties with the old SemWeb worth noting.  It’s been said that it’s too rigid to keep up with today’s rate of data creation and structural evolution; the overhead of frequent updates to ontologies, tagging, and linkages is just too high.  Another problem is the anemic adoption of the SPARQL language.  The high level of both technical and domain proficiency required to leverage SPARQL directly, especially for federated queries or those involving inference, is simply impractical in most commercial situations.  It might, however, be feasible to find such skills in a highly specialized domain, such as the human genome project.  (See the post below for a case study of such a SemWeb implementation.)
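
For a sense of that proficiency barrier, here is what even a basic SPARQL query involves (prefixes, triple patterns, OPTIONAL semantics), run with rdflib against a toy graph; both the data and the query are illustrative:

```python
from rdflib import Graph

# A toy graph: two hypothetical organizations, only one of which has a name.
data = """{
  "@context": {"schema": "http://schema.org/"},
  "@graph": [
    {"@id": "https://a.example.com/#org", "@type": "schema:Organization",
     "schema:name": "Example Health"},
    {"@id": "https://b.example.com/#org", "@type": "schema:Organization"}
  ]
}"""
g = Graph().parse(data=data, format="json-ld")

# "Find every organization and, if present, its name" already requires
# prefix declarations, triple patterns, and OPTIONAL semantics.
query = """
PREFIX schema: <http://schema.org/>
SELECT ?org ?name WHERE {
  ?org a schema:Organization .
  OPTIONAL { ?org schema:name ?name }
}
"""
for org, name in g.query(query):
    print(org, name)
```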

But even in highly specialized domains, you run into another problem: ontological realism.  The problem is one of ontological “silos” that naturally occur when an ontology is optimized for a specific domain and then needs to integrate with ontologies built for neighboring domains.  Such silos reduce the effectiveness of SemWeb efforts because they impair the ability to run queries and inference across multiple data sources.  What’s needed is a widely adopted base ontology, and a corresponding design methodology, that works across multiple domains without interfering with any specific one.  And because ontologies need to evolve over time, consistent effort is required to adhere to such a methodology and avoid eventual silos.

Why has adoption of the old SemWeb lagged that of simpler implementations like Schema.org?  One could draw an analogy to the adoption of API integration standards, where REST/JSON has overtaken SOAP/XML.  (See chart below.)  To understand why, we need to look at the domains in which these technologies are applied.  The compelling use case of loose coupling between unrelated companies or independent teams favored the simplicity of REST.  That said, within the confines of large corporate environments, the rigor of SOAP implementations can still make sense.

[Chart: analogy of REST vs. SOAP to the Semantic Web]


When does it make sense?

One of the biggest challenges to the adoption of the old SemWeb has been the lack of clear commercial benefits.  To many corporate CIOs and CTOs, any potential benefit was overshadowed by the TCO (total cost of ownership, including migration overhead and ongoing maintenance).  No doubt the technology and concepts proposed for the old SemWeb are exhilarating.  But rather than falling in love with the technology, the key to adoption has been the existence and realization of a clear business case.  That’s exactly what’s been happening for the successful implementations of the new SemWeb.  For example, Google sees tremendous ROI in implementing its Knowledge Graph, because it greatly improves ad revenue.  Webmasters and Google’s advertisers, in turn, are eager to organize and tag their content per Schema.org for the purpose of SEO/SEM.

Sure, that’s fine for deep-pocketed visionaries like Google.  But what about the risk-averse?  How would they know when there’s likely to be sufficient ROI in adopting SemWeb technologies?  CEOs and CTOs looking to incorporate such technologies into their product lines might watch for a trend of increasing acquisitions or VC funding for SemWeb-related services.  CIOs looking to support their business operations might wait for success stories from similar corporate implementations.  Researchers and universities may ask whether there have been any discoveries substantially aided by SemWeb initiatives.

Additionally, there may be some hope even for the aspects of the old SemWeb vision that haven’t gained adoption yet.  The LOD2 Technology Stack is being funded by the European Commission within the Seventh Framework Programme. It is a set of standards and integrated semantic web tools being developed in conjunction with the EU Open Data Portal. It’s too early to see any obvious success stories. But it’s quite possible that such government support will lead to unexpected new developments from SemWeb efforts. After all, the US Department of Defense’s funding of ARPANET led to the development of the Internet.

There are many paths to adopting the new SemWeb.  Go find yours.

Case study in Linked Data and Semantic Web: The Human Genome Project

The National Human Genome Research Institute’s “GWAS Catalog” (Genome-Wide Association Studies) project is a successful implementation of Linked Data (http://linkeddata.org/) and Semantic Web (http://www.w3.org/standards/semanticweb/) concepts.  This article discusses how the project has been implemented, the challenges it faced, and possible paths for the future.