Category Archives: open data

DDOD Love from Health Datapalooza 2015


Demand-Driven Open Data (DDOD) has gotten a lot of coverage throughout Health Datapalooza 2015.  I participated in 4 panels throughout the week and had the opportunity to explain DDOD to many constituents.

  • Health DevCamp
    Health DevCamp is a collaborative event for learning about existing and emerging APIs that can be used to develop applications that will help consumers, patients and/or beneficiaries achieve better care through access to health data, especially their own! Areas of focus include:
    • Prototype BlueButton on FHIR API from CMS
    • Project Argonaut
    • Privacy on FHIR initiative
    • Sources of population data from CMS and elsewhere around HHS
  • Health Datapalooza DataLab
    HHS has so much data! Medicare, substance abuse and mental health, social services and disease prevention are only some of the MANY topical domains where HHS provides huge amounts of free data for public consumption. It’s all there on HealthData.gov! Don’t know how the data might be useful for you? In the DataLab you’ll meet the people who collect and curate this trove of data assets as they serve up their data for your use. But if you still want inspiration, many of the data owners will co-present with creative, insightful, innovative users of their data to truly demonstrate its alternative value for positive disruptions in health, health care, and social services.

    Moderator: Damon Davis, U.S. Department of Health & Human Services

    Panelists: Natasha Alexeeva, Caretalia; Christina Bethell, PhD, MBA, MPH, Johns Hopkins; Lily Chen, PhD, National Center for Health Statistics; Steve Cohen, Agency for Healthcare Research & Quality; Manuel Figallo, SAS; Reem Ghandour, DrPH, MPA, Maternal and Child Health Bureau; Jennifer King, U.S. Department of Health & Human Services; Jennie Larkin, PhD, National Institutes of Health; Brooklyn Lupari, Substance Abuse & Mental Health Services Administration; Rick Moser, PhD, National Cancer Institute; David Portnoy, MBA, U.S. Department of Health & Human Services; Chris Powers, PharmD, Centers for Medicare and Medicaid Services; Elizabeth Young, RowdMap

  • No, You Can’t Always Get What You Want: Getting What You Need from HHS
    While more data is better than less, pushing out any ol’ data isn’t good enough.  As the Data Liberation movement matures, the folks releasing the data face a major challenge in determining what’s the most valuable stuff to put out.  How do they move from smorgasbord to intentionally curated data releases prioritizing the highest-value data?  Folks at HHS are wrestling with this, going out of their way to make sure they understand what you want and ensure you get the yummy data goodies you’re craving.  Learn how HHS is using your requests and feedback to share data differently.  This session explores HHS’s new initiative, Demand-Driven Open Data (DDOD): a lean startup approach to public-private collaboration.  A new initiative out of the HHS IDEA Lab, DDOD is bold and ambitious, intending to change the fundamental data sharing mindset throughout HHS agencies — from quantity of datasets published to actual value delivered.

    Moderator: Damon Davis, U.S. Department of Health & Human Services

    Panelists: Phil Bourne, National Institutes of Health (NIH); Niall Brennan, Centers for Medicare & Medicaid Services; Jim Craver, MMA, Centers for Disease Control & Prevention; Chris Dymek, EdD, U.S. Department of Health & Human Services; Taha Kass-Hout, Food & Drug Administration; Brian Lee, MPH, Centers for Disease Control & Prevention; David Portnoy, MBA, U.S. Department of Health & Human Services

  • Healthcare Entrepreneurs Boot Camp: Matching Public Health Data with Real-World Business Models
    If you’ve ever considered starting something using health data, whether a product, service, or offering in an existing business, or a start-up company to take over the world, this is something you won’t want to miss.  In this highly interactive, games-based brouhaha, we pack the room full of flat-out gurus to get an understanding of what it takes to be a healthcare entrepreneur.  Your guides will come from finance and investment; clinical research and medical management; sales and marketing; technology and information services; operations and strategy; analytics and data science; government and policy; business, product, and line owners from payers and providers; and some successful entrepreneurs who have been there and done it for good measure.  We’ll take your idea from the back of a napkin and give you the know-how to make it a reality!

    Orchestrators: Sujata Bhatia, MD, PhD, Harvard University; Niall Brennan, Centers for Medicare & Medicaid Services; Joshua Rosenthal, PhD, RowdMap; Marshall Votta, Leverage Health Solutions

    Panelists: Michael Abate, JD, Dinsmore & Shohl LLP; Stephen Agular, Zaffre Investments; Chris Boone, PhD, Health Data Consortium; Craig Brammer, The Health Collaborative; John Burich, Passport Health Plan; Jim Chase, MHA, Minnesota Community Measurement; Arnaub Chatterjee, Merck; Henriette Coetzer, MD, RowdMap; Jim Craver, MAA, Centers for Disease Control & Prevention; Michelle De Mooy, Center for Democracy and Technology; Gregory Downing, PhD, U.S. Department of Health & Human Services; Chris Dugan, Evolent Health; Margo Edmunds, PhD, AcademyHealth; Douglas Fridsma, MD, PhD, American Medical Informatics Association; Tina Grande, MHS, Healthcare Leadership Council; Mina Hsiang, U.S. Digital Service; Jessica Kahn, Centers for Medicare & Medicaid Services; Brian Lee, MPH, Centers for Disease Control & Prevention; David Portnoy, MBA, U.S. Department of Health & Human Services; Aaron Seib, National Association for Trusted Exchange; Maksim Tsvetovat, OpenHealth; David Wennberg, MD, The Dartmouth Institute; Niam Yaraghi, PhD, Brookings Institution; Jean-Ezra Yeung, Ayasdi


There were follow-up publications as well.  Among them was “HHS on a mission to liberate health data” from GCN.

HHS found that its data owners were releasing datasets that were easy to generate and least risky to release, without much regard to what data consumers could really use.  The DDOD framework lets HHS prioritize data releases based on the data’s value, because every request is considered a use case.  It lets users — be they researchers, nonprofits or local governments — request data in a systematic, ongoing and transparent way and ensures there will be data consumers for information that’s released, providing immediate, quantifiable value to both the consumer and HHS.

My list of speaking engagements at Palooza is here.

Healthcare Provider Registries

As I’ve been reviewing Use Cases for DDOD (Demand-Driven Open Data), I’m realizing how much the industry depends on an up-to-date, reliable source of healthcare providers (physicians, groups, hospitals, etc.).  Although some people may also call such an effort an “NPI registry”, the actual need identified encompasses much more than even the fields and capabilities of the existing NPPES database.

Here are just the Use Cases that directly mention NPPES and other existing registries.

And besides these, there are at least a dozen more that would benefit from this repository, since they rely on the “provider” dimension for their analytics.  For example, most analysis on provider quality, utilization, and fraud depends on this dimension.

The most obvious improvements needed are around:

  • More realistic association between provider, group, and location, recognizing that these are many-to-many relationships that change with time
  • More accurate specialty taxonomy
  • More up-to-date information (since NPPES entries are rarely updated)
  • Easier method to query this information (rather than relying on zip file downloads)
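To make the first improvement concrete, here is a minimal Python sketch of modeling provider, group, and location as time-bounded many-to-many affiliations. The record layout and identifiers are invented for illustration; this is not the actual NPPES schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Affiliation:
    npi: str                     # provider's National Provider Identifier
    group_id: str                # practice group (illustrative ID)
    location_id: str             # practice location (illustrative ID)
    effective: date              # when the affiliation began
    end: Optional[date] = None   # None means still active

def active_affiliations(affiliations, as_of):
    """Return the affiliations in force on a given date."""
    return [a for a in affiliations
            if a.effective <= as_of and (a.end is None or a.end > as_of)]

# A provider who moved from group G1 to group G2 mid-2014:
links = [
    Affiliation("1234567893", "G1", "L1", date(2012, 1, 1), date(2014, 6, 30)),
    Affiliation("1234567893", "G2", "L2", date(2014, 7, 1)),
]
print([a.group_id for a in active_affiliations(links, date(2015, 1, 1))])  # ['G2']
```

Because each link carries its own effective dates, history is preserved instead of being overwritten, which is exactly what a flat one-row-per-NPI registry can't do.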

But there are challenges on the “input” side of the equation as well.  There also seems to be some confusion in terms of assigning rights for modifying registries.  For example, it’s not easy for a provider group to figure out how to delegate update rights for all of its physicians to a third-party administrator.

There’s a growing list of companies and non-profits (including the American Medical Association) that have been trying to capitalize on the opportunities for a better solution.  As we go about working on the use cases mentioned here, I’ll be looking to build a body of knowledge that would contribute to solving the core problems identified.

Related post:  CMS is enabling streamlined access to NPPES, PECOS, EHR to 3rd parties


Field-level data dictionaries for open data

Typically, publicly available open data repositories — especially those hosted or indexed via CKAN — have been described only at the dataset level.  That is, datasets are typically described in a DCAT-compatible schema.  This includes the metadata schema required by Project Open Data for Data.gov and all agency-specific data hosting websites.

But ideally, the cataloging of these datasets should move to a more granular level of detail: field-level.  Doing so makes it possible for search capabilities to go well beyond the typical tags and predefined categories.  With fields defined, we can quickly find all datasets that have common fields.  That in turn makes it easier to find opportunities for linking across datasets and allows for a related dataset recommendation engine.  The solution becomes even more powerful if the fields are labeled with a predefined semantic vocabulary — one that is globally and uniquely defined.  (See the approach described in the Health 2.0 Metadata Challenge.)
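As a rough illustration of the recommendation idea: if each catalog entry carried a set of field names, "related datasets" can be suggested by simple set intersection. The dataset names and fields below are made up for the example.

```python
# Hypothetical catalog: dataset name -> set of field names.
catalog = {
    "hospital_charges":  {"npi", "drg_code", "avg_charge", "zip"},
    "provider_registry": {"npi", "specialty", "zip"},
    "census_income":     {"zip", "median_income"},
}

def related(dataset, catalog, min_shared=1):
    """Rank other datasets by how many fields they share with `dataset`."""
    base = catalog[dataset]
    scored = [(len(base & fields), name)
              for name, fields in catalog.items() if name != dataset]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= min_shared]

# hospital_charges shares {npi, zip} with provider_registry and {zip} with census_income:
print(related("hospital_charges", catalog))  # ['provider_registry', 'census_income']
```

With semantically tagged fields (rather than raw column names), the same intersection works across agencies that name the same concept differently.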

One challenge to this goal is that CKAN has not historically been good at defining a standard, machine readable data dictionary.  We’ve examined a range of standards and suggestions for defining data dictionaries.  These include common SQL DDL, XML, JSON, and YAML formats.

* ANSI SQL Standard
   - DDL (Data Definition Language): "CREATE TABLE"
   - SQL/Schemata

        testdb=# \d company
                 Table "public.company"
          Column   |     Type      | Modifiers
        -----------+---------------+-----------
         id        | integer       | not null
         name      | text          | not null
         address   | character(50) |
         join_date | date          |
        Indexes:
            "company_pkey" PRIMARY KEY, btree (id)

* JSON Table Schema:

    "schema": {
      "fields": [
        {
          "name": "name of field (e.g. column name)",
          "title": "A nicer human readable label or title for the field",
          "type": "A string specifying the type"
        },
        ... more field descriptors
      ],
      "primaryKey": ...,
      "foreignKeys": ...
    }
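To show how such a field list can be put to work, here is a hedged Python sketch that casts raw CSV strings according to the declared types. Only a few types are handled; the actual JSON Table Schema spec defines many more, and the schema below is invented for illustration.

```python
from datetime import date

# Minimal mapping from declared type names to Python casts.
CASTS = {
    "string": str,
    "integer": int,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}

# Hypothetical schema fragment in JSON Table Schema style:
schema = {"fields": [{"name": "npi", "type": "string"},
                     {"name": "beds", "type": "integer"},
                     {"name": "certified", "type": "date"}]}

def cast_row(schema, row):
    """Cast a list of raw CSV strings per the schema's declared field types."""
    return {f["name"]: CASTS[f["type"]](value)
            for f, value in zip(schema["fields"], row)}

print(cast_row(schema, ["1234567893", "250", "2014-07-01"]))
```

The same field list that drives casting can also drive validation, documentation, and the cross-dataset field matching discussed above.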

* YAML schema files used for Doctrine ORM

* XML schema syntax for Google's DSPL (Dataset Publishing Language)

* W3 XML Schema

## CSV storage formats
* Open Knowledge Data Packager - CKAN Extension

* Tabular Data Package Spec

* The above two are also part of a W3C standards track


Enter the all-powerful CSV

CSV is often a desired format because of its high interoperability.  However, it suffers from the fact that we need to keep its metadata separately defined.  This in turn causes challenges in version control, broken links and correctly identifying the column order.  There’s also the all-too-common and annoying test that has to be performed to determine if the first row is data or column header.
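That header-or-data test can at least be automated. Python's standard library ships a heuristic for it in csv.Sniffer; being a heuristic, it can guess wrong on ambiguous files, which is part of why embedded metadata is preferable.

```python
import csv

# Two samples with invented provider data: one with a header row, one without.
sample_with_header = ("npi,name,zip\n"
                      "1234567893,Dr. Smith,21201\n"
                      "1679576722,Dr. Jones,02115\n")
sample_without = ("1234567893,Dr. Smith,21201\n"
                  "1679576722,Dr. Jones,02115\n")

sniffer = csv.Sniffer()
# has_header compares the first row's types/lengths against the rest.
print(sniffer.has_header(sample_with_header))  # True
print(sniffer.has_header(sample_without))      # False
```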

So is there an elegant, machine-readable, standard-ish way to embed the metadata within the data file itself?  OKFN suggests that the solution could be accomplished via Tabular Data Packages.  Basically, you have the option to provide the data “inline” directly in the datapackage.json file.  The data would be in addition to specifying the full schema (as per JSON Table Schema) and CSV dialect (as per CSVDDF Dialect specification) in the same file.  We just need to have simple scripts that eventually extract these components into separate CSV files and JSON Table Schema.  Open Knowledge Data Packager is a CKAN extension that makes use of JSON Table Schema and Tabular Data Package for hosting datasets on CKAN FileStore.
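A minimal sketch of that extraction idea, assuming a simplified datapackage.json with inline JSON records. The resource layout below is abbreviated relative to the full Tabular Data Package spec, and the dataset is invented.

```python
import csv
import json

# Hypothetical data package carrying both schema and inline data:
datapackage = {
    "name": "provider-sample",
    "resources": [{
        "schema": {"fields": [{"name": "npi", "type": "string"},
                              {"name": "state", "type": "string"}]},
        "data": [{"npi": "1234567893", "state": "MD"},
                 {"npi": "1679576722", "state": "MA"}],
    }],
}

resource = datapackage["resources"][0]
fields = [f["name"] for f in resource["schema"]["fields"]]

# Split the package back out into a plain CSV file...
with open("provider-sample.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(resource["data"])

# ...and a standalone machine-readable schema file.
with open("provider-sample.schema.json", "w") as out:
    json.dump(resource["schema"], out, indent=2)
```

Keeping data and schema in one file sidesteps the version-skew and broken-link problems, while the extraction step preserves CSV's interoperability for downstream tools.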

Finally, there’s a helpful article on Implementing CSV on the Web and W3C’s CSV working group is seeking feedback on model and vocabulary for tabular data.


Is “SchemaStore” CKAN’s mystical unicorn?

As mentioned previously, CKAN hasn’t been strong in storing and managing standard, machine readable data dictionaries.  So a special shout out goes to Greg Lawrence, who has figured out how to solve this limitation.  He’s built a CKAN “SchemaStore” and a custom Java app to index content into CKAN’s DataStore object.  It grabs the needed information by running SQL exports on Oracle tables.  The code that enables SchemaStore is incorporated into the BC Data Catalogue CKAN extension on GitHub.  The field tags are defined in a file in this repository.

An example of the SchemaStore implementation can be found in this sample dataset under the “Object Description” section.  Here you’re able to see all of the relevant elements from the Oracle table object: Column Name, Short Name, Data Type, Data Precision, and Comments.  The data dictionary for this dataset is in machine readable JSON format.  For example, the first 3 fields of the data dictionary are:

details: [
    {
      data_precision: "0",
      column_comments: "The date and time the information was entered.",
      data_type: "DATE",
      short_name: "TIMESTAMP",
      column_name: "ENTRY_TIMESTAMP"
    },
    {
      data_precision: "0",
      column_comments: "The identification of the user that created the initial record.",
      data_type: "VARCHAR2",
      short_name: "ENT_USR_ID",
      column_name: "ENTRY_USERID"
    },
    {
      data_precision: "0",
      column_comments: "A feature code is most importantly a means of linking a features to its name and definition.",
      data_type: "VARCHAR2",
      short_name: "FEAT_CODE",
      column_name: "FEATURE_CODE"
    },
    ...
]
See the related issue, “Make field level metadata searchable and link common fields across the catalog”.
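For reference, CKAN's DataStore API (the datastore_search action) already returns a fields array alongside the records, which is one place field-level metadata can be read from programmatically. The response below is a hand-written, trimmed example of that shape, not captured from a live instance.

```python
import json

# Trimmed, hand-built example of a datastore_search response body:
response = json.loads("""
{
  "success": true,
  "result": {
    "fields": [
      {"id": "_id", "type": "int4"},
      {"id": "ENTRY_TIMESTAMP", "type": "timestamp"},
      {"id": "FEATURE_CODE", "type": "text"}
    ],
    "records": []
  }
}
""")

def field_index(response):
    """Map field name -> type, skipping CKAN's internal _id column."""
    return {f["id"]: f["type"]
            for f in response["result"]["fields"]
            if not f["id"].startswith("_")}

print(field_index(response))  # {'ENTRY_TIMESTAMP': 'timestamp', 'FEATURE_CODE': 'text'}
```

Harvesting these fields arrays across resources would be one lightweight way to build the searchable field-level index the issue asks for.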

The Birth of Demand-Driven Open Data

And so it begins

My project as an Entrepreneur-in-Residence with the HHS IDEA Lab is called “Innovative Design, Development and Linkages of Databases”.  Think of it as Web 3.0 (the next generation of machine readable and programmable internet applications) applied to open government and focused on healthcare and social service applications.  The underlying hypothesis was that by investigating how HHS could better leverage its vast data repositories as a strategic asset, we would discover innovative ways to create value by linking across datasets from different agencies.

So to sum up…  I was to find opportunities across a trillion dollar organization, where the experts already working with the data have a lifetime of domain-specific experience and several acronyms after their name.  And I was to accomplish this without any dedicated resources within one year.  Pretty easy, right?

My hope was that my big data experience in industry — both for startups and large scale enterprises — was a sufficient catalyst to make progress.  And I had one other significant asset to make it all come together…  I was fortunate that the project was championed by a phenomenal group of internal backers: Keith Tucker and Cynthia Colton, who lead the Enterprise Data Inventory (EDI) in the Office of the Chief Information Officer (OCIO), and Damon Davis, who heads up the Health Data Initiative and HealthData.gov.

Tell me your data fantasies

The first step was to set out on a journey of discovery.  With guidance and clout from the internal sponsors, I was able to secure meetings with leaders and innovators for big data and analytics efforts across HHS.  I had the privilege of engaging in stimulating discussions at CMS, FDA, NIH, CDC, NCHS, ONC, ASPE and several other organizations.

Upon attempting to synthesize the information gathered into something actionable, I noticed that past open data projects fell into two camps.  In the first camp, were those with ample examples of how external organizations were doing fantastic and often unexpected things with the data.  In the second, while the projects may have been successfully implemented from a technical perspective, it wasn’t clear whether or how the data was being used.

The “aha” moment

That’s when it hit me — we’re trying to solve the wrong problem.  It seemed that the greatest value that has been created with existing HHS data — and thereby the most innovative linkages — has been done by industry, researchers and citizen activists.  That meant we can accomplish the main goals of the project if we look at the problem a bit differently.  Instead of outright building the linkages that we think have value, we can accelerate the rate at which external organizations do what they do best.

It seemed so obvious now. In fact, I had personally experienced this phenomenon myself.  Prior to my HHS fellowship, I built an online marketplace for medical services called Symbiosis Health.  I made use of three datasets across different HHS organizations.  But I did so with great difficulty.  Each had deficiencies which I thought should be easy to fix.  It might be providing more frequent refreshes, adding a field that enables joins to another dataset, providing a data dictionary or consolidating data sources.  If only I could have told someone at HHS what we needed!

Let’s pivot this thing

Thus, the “pivot” was made.  While pivoting is a well known concept for rapid course correction in Lean Startup circles, it’s not something typically associated with government.  Entrepreneurs are supposed to allow themselves to make mistakes and make fast course corrections.  Government is supposed to plan ahead and stay the course.  Except in this case we have the best of both worlds — IDEA Lab.  It gives access to all the resources and deep domain expertise of HHS, but with the ability to pivot and continue to iterate without being weighed down by original assumptions!  I feel fortunate for an opportunity to work in such an environment.

Pivoting into Demand-Driven Open Data

So what exactly is this thing?

The project born from this pivot is called Demand-Driven Open Data (DDOD).  It’s a framework of tools and methods to provide a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need.  With DDOD, all open data efforts are managed in terms of “use cases”, which enables allocation of limited resources based on value.  It’s the Lean Startup approach to open data.  The concept is to minimize up-front development by acquiring customers before you build the product.

As the use cases are completed, several things happen.  Outside of the actual work done on adding and improving datasets, both the specifications and the solution associated with the use cases are documented and made publicly available on the DDOD website.  Additionally, for the datasets involved and linkages enabled, we add or enhance relevant tagging, dataset-level metadata, data dictionary, cross-dataset relationships and long form dataset descriptions.  This approach, in turn, accelerates future discoveries of datasets.  And best of all, it stimulates the linking we wanted in the first place, through coded relationships and field-level matching. 

How does it fit into the big picture?

It’s beautiful how the pieces come together.  DDOD fits in quite well with HHS’s existing Health Data Initiative (HDI) and HealthData.gov.  While DDOD is demand-driven from outside of HHS, you can think of HDI as its supply-driven counterpart.  That’s the one guided by brilliant subject matter experts throughout HHS.  Finally, HealthData.gov is the data indexing and discovery platform that serves as a home for enabling both these components.  As a matter of fact, we’re looking for DDOD to serve as the community section of HealthData.gov.

Let’s roll!

So now the fun begins.  Next up…  More adventure as we work through actual pilot use cases.  We’ll also cover some cool potential components of DDOD that would put more emphasis on the “linkages” aspect of the project.  These include usage analytics, data maturity reporting, and semantic tagging of the dataset catalog and fields in the data dictionary.  Stay tuned.

In the meantime, you can get involved in two ways…  Get the word out to your network about the opportunities provided by DDOD.  Or, if you have actual use cases to add, go to the DDOD website and get them entered.


CMS is enabling streamlined access to NPPES, PECOS, EHR to 3rd parties

I had a couple conversations this week with subject matter experts from industry and government about the NPPES and PECOS systems.

NPPES (National Plan and Provider Enumeration System) is a registry of healthcare providers, including their NPI (National Provider Identifier), specialty taxonomy and contact information.

PECOS (Medicare Provider Enrollment, Chain, and Ownership System) is a system that supports Medicare enrollment for providers and has its own similar database.

There seems to be a lot of demand for these systems to be:

  1. Kept up to date.  Currently NPPES is often too out of date to be useful for patients.  PECOS is updated more frequently, but isn’t available publicly.
  2. Easier to update.  One of the reasons NPPES is not updated often is the difficulty and overhead of doing so.  It would benefit greatly from an easier user interface, a public API and ability for surrogate 3rd parties to make updates.
  3. More realistic.  The data model for NPPES is much too simplistic to reflect the way providers currently do their work.  It should allow for many-to-many relationships between physicians, organizations and locations.
  4. Kept in sync. Discrepancies between NPPES and PECOS may be hard to resolve.  Sometimes it’s due to NPPES being out of date.  Other times it’s because the provider handles billing for Medicare differently.
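On the "kept in sync" point, the comparison itself is simple once both records are in hand. A sketch, with invented records and field names (real NPPES and PECOS layouts differ):

```python
def discrepancies(nppes_rec, pecos_rec, fields):
    """Report fields where the two registries disagree, as (nppes, pecos) pairs."""
    diffs = {}
    for f in fields:
        a, b = nppes_rec.get(f), pecos_rec.get(f)
        if a != b:
            diffs[f] = (a, b)
    return diffs

# Hypothetical records for the same provider from each system:
nppes = {"npi": "1234567893", "practice_zip": "21201", "specialty": "207Q00000X"}
pecos = {"npi": "1234567893", "practice_zip": "21202", "specialty": "207Q00000X"}

print(discrepancies(nppes, pecos, ["practice_zip", "specialty"]))
# {'practice_zip': ('21201', '21202')}
```

The hard part isn't detecting the mismatch; it's deciding which source is authoritative, which is exactly the governance question raised above.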

First, my colleague and fellow HHS Entrepreneur-in-Residence, Alan Viars, has been leading a phenomenal effort to build a robust API for NPPES.  It was created as part of HHS IDEA Lab’s NPPES Modernization Project.  It’s designed to handle both efficient read access wanted by many applications and robust methods for making changes.  It was developed to focus on functionality and let external developers design beautiful user interfaces.

Second, CMS’s Identity & Access (I&A) Management System may help with some of these needs.  I&A is supposed to enable “streamlined access to NPPES, PECOS, and EHR” to both healthcare providers and their 3rd party surrogates.  There’s an introductory presentation on the topic that explains further.  That said, I still need to familiarize myself with it and its capabilities.


PS: In an effort to help people who had problems with the CMS website, I uploaded a video to YouTube that demonstrates how a 3rd party can request to work on behalf of a healthcare provider as a surrogate.

U.S. Turns to Private Sector for IT Innovation

Last week, the Department of Health and Human Services welcomed its third group of “entrepreneurs-in-residence” — mainly private-sector tech experts and start-up founders who are spending a year advising the agency on its health IT projects.   Read the full article in The Washington Post


U.S. Turns to Private Sector for IT Innovation and HHS’s IDEA Lab External Entrepreneurs make it happen.
(Photo: David Portnoy, from left, Mark Scrimshire, Niall Brennan and Damon Davis working on project in the HHS IDEA Lab.)

HHS IDEA Lab – Innovative Linkages Initiative

Full speed ahead on a bold new initiative from the HHS IDEA Lab called “Innovative Design, Development and Linkages of Databases”.

As the largest funder of biomedical research in the world, the U.S. Department of Health and Human Services (HHS) directly and indirectly generates massive amounts of scientific data through research, grants, and contracts. The HHS Office of the Chief Information Officer and the HHS IDEA Lab want to build an innovative strategy to design, develop and link public-facing research database applications for HHS.

The goal of this project is to create a solution to the U.S. Department of Health and Human Services’ (HHS)  current problem of multiple, disparate data sources that simultaneously meets the requirements of two new White House memoranda (Increasing Access to Results of Federally Funded Scientific Research and Open Data Policy – Managing Information as an Asset).

What Happened to the Semantic Web?

It looks bleak

Over the past few years, questions have been asked about the viability of the Semantic Web (aka SemWeb) envisioned by Tim Berners-Lee.  In the strictest sense, the original standards set out by the W3C have not proliferated at any great pace and have not been widely adopted commercially.  There are also no multi-billion dollar acquisitions or IPOs in the SemWeb space.  Even in government and academia, the vast majority of “open data” is in traditional relational form (rather than RDF linked datasets) and doesn’t reference widely adopted ontologies.



But it’s a matter of framing

The outlook changes drastically if we look at the question a bit differently.  Rather than defining the SemWeb as the original set of standards or narrow vision, what if we look at related technologies that it may have spawned or influenced?  Then a number of success stories emerge.  We have the tremendous growth of schema.org and the adoption of Microdata among the three big search engines: Google, Yahoo, and Bing.  We also have SemWeb concepts applied in Google’s Knowledge Graph, Google Rich Data Snippets, and Facebook’s Social Graph.  Even IBM’s Watson is no longer just an IBM Research project.  It’s being commercialized into IBM’s verticals, including healthcare, insurance and finance.  So SemWeb technologies are alive — in a sense.  For the purpose of clarity, let’s refer to the original W3C vision discussed since 2001 as the “old SemWeb” and the recent commercial successes as the “new SemWeb”.  Of course, these are fuzzy definitions, since the new SemWeb is not formally defined.


What’s wrong with the original vision?

The W3C breaks the elements of the old SemWeb into: (1) Linked Data, (2) Vocabularies, (3) Inference, and (4) Query.  Each of these is widely in use today, but in ways that differ from the original specs.  For example, linked data implemented as Microdata or JSON-LD has gained popularity over the heavier and more verbose RDF/XML.  Most websites forgo formally defined OWL ontologies for vocabularies found in databases like schema.org or Freebase.  Rule engines and reasoners are already built into products we use; they are what happens in the “brains” of Google’s page rank and ad optimization algorithms.  And instead of the SPARQL query language, humans often interact with the new SemWeb through natural language searches, while machines do so through RESTful APIs.  IBM’s Watson translates questions into sophisticated queries involving federation and inference against its knowledge base.
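As a small illustration of the lighter-weight style: the kind of statement that once called for RDF/XML can be expressed as compact JSON-LD over schema.org terms. This is a minimal hand-built example (the hospital itself is fictional), serialized here with Python for convenience.

```python
import json

# A schema.org Hospital described in JSON-LD: the @context maps plain keys
# like "name" to globally defined vocabulary terms, with no XML machinery.
doc = {
    "@context": "http://schema.org",
    "@type": "Hospital",
    "name": "Example General Hospital",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Baltimore",
        "addressRegion": "MD",
    },
}

print(json.dumps(doc, indent=2))
```

The same information as RDF/XML would need namespaces, rdf:Description wrappers, and typed literals; the JSON-LD form is what search engines actually consume today.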

There are a couple of other difficulties with the old SemWeb worth noting.  It’s been said that it’s too rigid to effectively keep up with today’s rate of data creation and structural evolution.  The overhead of frequent updates to ontologies, tagging and linkages is just too high.  Another problem is the anemic adoption of the SPARQL language.  The high level of both technical and domain proficiency required to leverage SPARQL directly — especially when it comes to federated queries or those involving inference — is simply impractical in most commercial situations.  However, it might be feasible to have such skills in a highly specialized domain, such as the human genome project.  (See the post on a case study of such a SemWeb implementation.)

But even in highly specialized domains, you run into another problem: ontological realism.  This problem is one of ontological “silos” that naturally occur as a result of optimizing for a specific domain and the need to integrate with ontologies built for neighboring domains.  Such silos reduce the effectiveness of SemWeb efforts, because they impair the ability to run queries and inference across multiple data sources.  There needs to be a widely adopted base ontology and corresponding design methodology that works across multiple domains, yet wouldn’t interfere with your specific domain.  The fact that ontologies need to evolve over time means that consistent effort is needed to adhere to such methodologies to avoid eventual silos.

Why has adoption of the old SemWeb lagged that of simpler implementations, like schema.org?  One could draw an analogy to the adoption of API integration standards.  Adoption of REST/JSON has overtaken SOAP/XML.  (See chart below.)  To understand why, we need to look at the domains in which these technologies are applied.  The compelling use case of loose coupling between unrelated companies or independent teams favored the simplicity of REST.  That said, within the confines of large corporate environments, the rigor of SOAP implementations still makes sense.


When does it make sense?

One of the biggest challenges to the adoption of the old SemWeb has been the lack of clear commercial benefits.  To many corporate CIOs and CTOs, any potential benefit was overshadowed by the TCO (total cost of ownership, including migration overhead and ongoing maintenance).  No doubt the technology and concepts proposed for the old SemWeb are exhilarating.  But rather than falling in love with the technology, the key to adoption has been the existence and realization of a clear business case.  That’s exactly what’s been happening for the successful implementations of the new SemWeb.  For example, Google sees tremendous ROI in implementing its Knowledge Graph, because it greatly improves ad revenue.  Webmasters and Google’s advertisers, in turn, are eager to organize and tag their content per schema.org for SEO/SEM purposes.

Sure, that’s fine for deep-pocketed visionaries like Google.  But how about for the risk averse?  How would they know when there’s likely a sufficient ROI in adopting SemWeb technologies?  CEOs and CTOs looking to incorporate such technologies into their product lines might watch for a trend of increasing acquisitions or VC funding for SemWeb related services.  CIOs looking to support their business operations might wait to hear about success stories from similar corporate implementations.  Researchers and universities may ask whether there have been any discoveries substantially aided by SemWeb initiatives.

Additionally, there may be some hope even for the aspects of the old SemWeb vision that haven’t gained adoption yet.  The LOD2 Technology Stack is being funded by the European Commission within the Seventh Framework Programme. It is a set of standards and integrated semantic web tools being developed in conjunction with the EU Open Data Portal. It’s too early to see any obvious success stories. But it’s quite possible that such government support will lead to unexpected new developments from SemWeb efforts. After all, the US Department of Defense’s funding of ARPANET led to the development of the Internet.

There are many paths to adopting the new SemWeb.  Go find yours.

Case study in Linked Data and Semantic Web: The Human Genome Project

The National Human Genome Research Institute’s “GWAS Catalog” (Genome-Wide Association Studies) project is a successful implementation of Linked Data and Semantic Web concepts.  This article discusses how the project has been implemented, challenges faced and possible paths for the future.