It looks bleak
Over the past few years, questions have been raised about the viability of the Semantic Web (a.k.a. SemWeb) envisioned by Tim Berners-Lee. In the strictest sense, the original standards set out by the W3C have not proliferated at any great pace and have not been widely adopted commercially. There have also been no multi-billion-dollar acquisitions or IPOs in the SemWeb space. Even in government and academia, the vast majority of “open data” is in traditional relational form (rather than RDF linked datasets) and doesn’t reference widely adopted ontologies.
But it’s a matter of framing
The outlook changes drastically if we look at the question a bit differently. Rather than defining the SemWeb as the original set of standards or narrow vision, what if we look at related technologies that it may have spawned or influenced? Now a number of success stories emerge. We have the tremendous growth of Schema.org and the adoption of Microdata among the three big search engines: Google, Yahoo, and Bing. We also have SemWeb concepts applied in Google’s Knowledge Graph, Google Rich Snippets, and Facebook’s Social Graph. Even IBM’s Watson is no longer just an IBM Research project; it’s being commercialized into IBM’s verticals, including healthcare, insurance, and finance. So SemWeb technologies are alive — in a sense. For the purpose of clarity, let’s refer to the original W3C vision discussed since 2001 as the “old SemWeb” and the recent commercial successes as the “new SemWeb”. Of course, these are fuzzy definitions, since the new SemWeb is not formally defined.
What’s wrong with the original vision?
The W3C breaks the elements of the old SemWeb into: (1) Linked Data, (2) Vocabularies, (3) Inference, and (4) Query. Each of these is widely used today, but in ways that differ from the original specs. For example, linked data implemented as Microdata or JSON-LD has gained popularity over the heavier and more verbose RDF/XML. Most websites forgo formally defined OWL ontologies in favor of vocabularies found in databases like Schema.org or Freebase. Rule engines and reasoners are already built into products we use; they are what happens in the “brains” of Google’s PageRank and ad-optimization algorithms. And instead of the SPARQL query language, humans often interact with the new SemWeb through natural-language searches, while machines do so through RESTful APIs. IBM’s Watson, for example, translates questions into sophisticated queries involving federation and inference against its knowledge base.
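To make the contrast concrete, here is a minimal sketch of the kind of JSON-LD markup a webpage might embed using the Schema.org vocabulary. The headline, author, and date are made up for illustration; only the `@context`, `@type`, and property names come from Schema.org.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "An Example Headline",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  },
  "datePublished": "2014-05-01"
}
```

Expressing the same statements in RDF/XML would require XML namespace declarations and nested description elements, which is part of why the lighter syntaxes have caught on.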
There are a couple of other difficulties with the old SemWeb worth noting. It’s been said that it’s too rigid to keep up with today’s rate of data creation and structural evolution; the overhead of frequent updates to ontologies, tagging, and linkages is simply too high. Another problem is the anemic adoption of the SPARQL query language. The high level of both technical and domain proficiency required to use SPARQL directly — especially for federated queries or those involving inference — is impractical in most commercial situations. However, such skills might be feasible in a highly specialized domain, such as the human genome project. (See post on a case study of such a SemWeb implementation.)
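To give a sense of that proficiency barrier, here is a sketch of what a federated SPARQL query might look like in a genomics setting. Everything here is hypothetical: the prefixes, classes, properties, and remote service endpoint are illustrative placeholders, not real services.

```sparql
# Hypothetical federated query: for each gene in a local dataset,
# fetch the name of the protein it encodes from a remote endpoint.
# All URIs below are made-up placeholders.
PREFIX ex: <http://example.org/genome#>

SELECT ?gene ?proteinName
WHERE {
  ?gene a ex:Gene ;
        ex:encodes ?protein .
  SERVICE <http://example.org/sparql/proteins> {
    ?protein ex:recommendedName ?proteinName .
  }
}
```

Writing even this small query requires knowing the SPARQL 1.1 SERVICE mechanism plus the ontologies on both sides of the federation, exactly the combination of technical and domain expertise described above.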
But even in highly specialized domains, you run into another problem: ontological realism. This problem is one of ontological “silos” that naturally occur as a result of optimizing for a specific domain and the need to integrate with ontologies built for neighboring domains. Such silos reduce the effectiveness of SemWeb efforts, because they impair the ability to run queries and inference across multiple data sources. There needs to be a widely adopted base ontology and corresponding design methodology that works across multiple domains, yet wouldn’t interfere with your specific domain. The fact that ontologies need to evolve over time means that consistent effort is needed to adhere to such methodologies to avoid eventual silos.
Why has adoption of the old SemWeb lagged that of simpler implementations, like Schema.org? One could draw an analogy to the adoption of API integration standards, where REST/JSON has overtaken SOAP/XML. (See chart below.) To understand why, we need to look at the domains in which these technologies are applied. The compelling use case of loose coupling between unrelated companies or independent teams favored the simplicity of REST. That said, within the confines of large corporate environments, the rigor of SOAP implementations still makes sense.
When does it make sense?
One of the biggest challenges to the adoption of the old SemWeb has been the lack of clear commercial benefits. To many corporate CIOs and CTOs, any potential benefit was overshadowed by the TCO (total cost of ownership, including migration overhead and ongoing maintenance). No doubt the technology and concepts proposed for the old SemWeb are exhilarating. But rather than falling in love with the technology, the key to adoption has been the existence and realization of a clear business case. That’s exactly what’s been happening for the successful implementations of the new SemWeb. For example, Google sees tremendous ROI in implementing its Knowledge Graph, because it greatly improves ad revenue. Webmasters and Google’s advertisers, in turn, are eager to organize and tag their content per Schema.org for the purpose of SEO/SEM.
Sure, that’s fine for deep-pocketed visionaries like Google. But what about the risk-averse? How would they know when there’s likely a sufficient ROI in adopting SemWeb technologies? CEOs and CTOs looking to incorporate such technologies into their product lines might watch for a trend of increasing acquisitions or VC funding for SemWeb-related services. CIOs looking to support their business operations might wait to hear about success stories from similar corporate implementations. Researchers and universities may ask whether there have been any discoveries substantially aided by SemWeb initiatives.
Additionally, there may be some hope even for the aspects of the old SemWeb vision that haven’t gained adoption yet. The LOD2 Technology Stack is being funded by the European Commission within the Seventh Framework Programme. It is a set of standards and integrated semantic web tools being developed in conjunction with the EU Open Data Portal. It’s too early to see any obvious success stories. But it’s quite possible that such government support will lead to unexpected new developments from SemWeb efforts. After all, the US Department of Defense’s funding of ARPANET led to the development of the Internet.
There are many paths to adopting the new SemWeb. Go find yours.