Open Data Panel at All Things Open Conference

Open Data Panel to be Featured at All Things Open

Open Data will be a featured panel discussion at the All Things Open conference this year. With a new administration set to take office in January and multiple new initiatives starting at both the state and federal levels, the topic has never been more important. The session, which will take place Wednesday, October 26 at 1:30 pm ET, will feature some of the foremost experts in the world.

Topics to be discussed will include:

  • The New Open Data Transition Report
  • Future opportunities for Open Data at the local and federal levels with the DATA Act
  • How the Open Data landscape is evolving, particularly through Demand-Driven Open Data (DDOD)
  • Future opportunities in open data at the Federal and local levels
  • How the panel’s insights can help local governments create demand driven open data programs

The world-class lineup of panel members will include:

  • Joel Gurin (President and Founder, Center for Open Data Enterprise)
  • Hudson Hollister (Founder and Executive Director, Data Coalition)
  • David Portnoy (Founder, Demand-Driven Open Data)
  • Tony Fung (Deputy Secretary of Technology, State of Virginia)
  • Andreas Addison (President, Civic Innovator LLC)
  • Sam McClenney (Economist, RTI International)
  • Caroline Sullivan (Wake County Commissioner)

The panel is open to attendees of All Things Open, the largest “open” technology event on the East Coast of the United States.



Python Serverless Microframework

AWS has introduced a Python serverless microframework. It’s a beautiful concept, making it very simple to create and deploy APIs with virtually unlimited scalability. It does so by leveraging Amazon API Gateway and AWS Lambda, without their usual learning curve. That said, the ideal use cases for the framework are rapid prototyping and highly scalable deployment of a very simple REST API. It’s the right choice as long as the API you need is a good candidate for development in Python with Flask-like view decorators.

The framework is available from the Chalice GitHub repository and provides a CLI (command-line tool) for creating, deploying, and managing your app. All you need to deploy a new API is a Python file containing your app, plus configured AWS credentials. How little effort is needed to get going? How about this…

pip install chalice
chalice new-project <project-name>
cd <project-name>
chalice deploy

Now you’re ready to hit the endpoint you configured!
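The “Flask-like view decorators” mentioned above look like this in Chalice’s own quickstart: an app object created with `Chalice(app_name=...)` and handler functions registered with `@app.route('/')`. To show what such a decorator actually does without requiring Chalice or AWS, here is a dependency-free toy. The `TinyRouter` class and its `dispatch` method are hypothetical and are not Chalice’s implementation; they only illustrate the pattern of recording which function handles which path so a request can later be routed to it. In a deployed Chalice app, API Gateway and Lambda play the role of `dispatch`.

```python
from typing import Callable, Dict, Tuple

View = Callable[[], dict]


class TinyRouter:
    """Toy stand-in for a Flask/Chalice-style app object (hypothetical, not Chalice)."""

    def __init__(self) -> None:
        # Maps (HTTP method, path) to the view function that handles it.
        self._routes: Dict[Tuple[str, str], View] = {}

    def route(self, path: str, methods: Tuple[str, ...] = ("GET",)) -> Callable[[View], View]:
        """Return a decorator that registers a view for `path` and `methods`."""
        def register(view: View) -> View:
            for method in methods:
                self._routes[(method.upper(), path)] = view
            return view  # the decorated function itself is left unchanged
        return register

    def dispatch(self, method: str, path: str) -> Tuple[int, dict]:
        """Look up the registered view and call it; return 404 if nothing matches."""
        view = self._routes.get((method.upper(), path))
        if view is None:
            return 404, {"error": "not found"}
        return 200, view()


app = TinyRouter()


@app.route("/")
def index() -> dict:
    return {"hello": "world"}


if __name__ == "__main__":
    print(app.dispatch("GET", "/"))   # (200, {'hello': 'world'})
    print(app.dispatch("POST", "/"))  # (404, {'error': 'not found'})
```

The design point is that the decorator only registers the function and returns it unchanged, so the view stays an ordinary, individually testable Python function.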

(Note that, as a tradeoff for simplicity, not all API Gateway and Lambda features are exposed through Chalice. It does, however, provide a simple way to consume AWS Lambda’s built-in logging via Amazon CloudWatch Logs.)


Predictive Analytics: Why IoT is different

The challenge: Moving from batch retrospective to real-time predictive analytics

First, let’s get the lay of the land. Batch-processed analytics has a solid track record, with well-defined use cases and best practices. But most analytics today are still limited to retrospective answers. Many organizations are turning to more forward-looking predictive analytics to broaden the actions they can take beyond the confines of historical decisions, and to add the ability to ask what-if questions.

The Internet of Things (IoT), on the other hand, is still quite new, and methodologies for solving its unique problems, such as those around real-time processing and connectivity limitations, are still being hashed out. Now add the desire to move from retrospective to predictive, and you have a world of new challenges. In the past few years, great strides have been made in both the technologies and the methods for processing data in real time. But the move to predictive often takes an even bigger leap, because each incremental gain in insight can require a disproportionately large increase in data ingestion and processing capacity. As a result, predictive analytics still accounts for only a small fraction of a typical organization’s analytical capabilities.


IoT adds another twist: Network and processing bottlenecks

What makes IoT different when it comes to predictive analytics? In some applications, most of the collected data loses its value within milliseconds. Historically, data collection has been the hard part of a predictive analytics system. However, that’s shifting, especially with advances in industrial IoT technologies, which make it possible to scale the volume of data collection while reducing latency. As a result, collection is becoming the easy part, and the bottlenecks now start at sanitization, modeling and integration. This in turn makes the downstream stages, analytics and taking action, more challenging.

When looking at optimizing an IoT implementation, it’s important to balance the roles and capabilities of “edge” vs. “cloud”.  Edge refers to specialized infrastructure that can improve performance through physical proximity.  It enables analytics and knowledge generation to occur closer to the source of the data.  Edge gives you responsiveness, but not scale.  Cloud, on the other hand, gives you scale but not responsiveness.  
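One way to make the edge-versus-cloud trade-off concrete is a latency budget: if a result is only useful within some deadline, the network round trip to the cloud may already use up that budget. The sketch below is a hypothetical illustration, not a method from the talk; the `Task` fields, the `place` function, and all numbers are assumptions. It prefers the cloud (scale) whenever the cloud can still meet the deadline and falls back to the edge (responsiveness) otherwise. Real placement decisions would also weigh cost, connectivity, and data volume.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline_ms: float       # how long a result stays useful (hypothetical)
    compute_ms_edge: float   # estimated processing time on the edge device
    compute_ms_cloud: float  # estimated processing time in the cloud


def place(task: Task, cloud_round_trip_ms: float, edge_available: bool = True) -> str:
    """Choose 'edge', 'cloud', or 'infeasible' under a simple latency-budget rule."""
    cloud_latency_ms = cloud_round_trip_ms + task.compute_ms_cloud
    if cloud_latency_ms <= task.deadline_ms:
        return "cloud"  # the cloud can still answer in time, so use its scale
    if edge_available and task.compute_ms_edge <= task.deadline_ms:
        return "edge"   # only local processing is fast enough
    return "infeasible"


# Hypothetical workloads: a millisecond-scale control action and an hourly trend model.
shutoff = Task("valve_shutoff", deadline_ms=10, compute_ms_edge=2, compute_ms_cloud=1)
trend = Task("hourly_trend", deadline_ms=3_600_000, compute_ms_edge=50_000_000,
             compute_ms_cloud=60_000)

print(place(shutoff, cloud_round_trip_ms=80))  # edge
print(place(trend, cloud_round_trip_ms=80))    # cloud
```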


Making it work: Configuring the 3 components

There are three core components in any IoT implementation. First, collection, which includes sensing, networking, storage and query capability. Second, learning, which analyzes the data and generates predictions. And third, taking action on the results of the prior stages, typically through automated methods. In a traditional cloud-centric IoT architecture, the sensors sit outside the cloud, but the collect, learn and act components often run into responsiveness challenges. Where data collection volumes are particularly large, network communication overhead has a significant impact on cost, sometimes accounting for up to 50% of the cost of the entire system.
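The three components can be pictured as stages chained together, where in a cloud-centric design each hand-off between stages may mean moving data over the network. Below is a minimal sketch, assuming a single stream of numeric sensor readings. The function names are hypothetical, and the exponentially weighted moving average used for “learning” is a deliberately simple stand-in for a real predictive model, not a technique named in the original post.

```python
import math
from typing import Iterable, Iterator, List, Optional, Tuple


def collect(readings: Iterable[float]) -> Iterator[float]:
    """Collection stage: pass sensor readings through, dropping NaN (failed) reads."""
    for value in readings:
        if not math.isnan(value):
            yield value


def learn(values: Iterable[float], alpha: float = 0.5) -> Iterator[Tuple[float, float]]:
    """Learning stage: one-step-ahead forecast via an exponentially weighted moving average.

    Yields (observed value, forecast made before that value was seen).
    """
    forecast: Optional[float] = None
    for value in values:
        if forecast is not None:
            yield value, forecast
            forecast = alpha * value + (1 - alpha) * forecast
        else:
            forecast = value  # the first reading seeds the forecast


def act(pairs: Iterable[Tuple[float, float]], tolerance: float) -> List[str]:
    """Action stage: raise an alert when a reading strays too far from its forecast."""
    return [
        f"alert: observed {observed:.1f}, expected about {forecast:.1f}"
        for observed, forecast in pairs
        if abs(observed - forecast) > tolerance
    ]


readings = [20.0, 20.5, float("nan"), 21.0, 35.0, 21.5]
alerts = act(learn(collect(readings)), tolerance=7.0)
print(alerts)  # one alert, for the 35.0 reading
```

Because the stages are generators, the same chain could run entirely on an edge device, or be split so that only the output of one stage crosses the network.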

Moving any portion of these three components to the edge can improve performance, because less data needs to move between them. More buffering and storage can occur at the edge as the cost of memory and disk continues to drop. (It should also be noted, however, that the storage and query functions of the system may become less prevalent as data is processed and acted on in real time.) Various aspects of data filtering, computation and predictive analytics can then be executed at the edge, so that only the data required for centralized processing needs to be moved. These gains need to be balanced against the added system complexity that comes from edge resources not always being connected to the network. So there are clearly many ways to implement and fine-tune predictive analytics for IoT, and doing so should only get easier as the field matures.
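To illustrate “only data required for centralized processing needs to be moved,” here is a minimal edge-side reduction sketch. It assumes, hypothetically, that the central analytics needs a per-window summary plus every reading outside an expected range; the function name, message layout, window size and thresholds are all illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List


def edge_summarize(readings: List[float], window: int, low: float, high: float) -> List[Dict]:
    """Reduce raw readings at the edge before sending anything upstream.

    Emits one summary message per window of `window` readings, plus an
    individual message for each reading outside [low, high].
    """
    messages: List[Dict] = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        messages.append({"type": "summary", "start": start, "n": len(chunk),
                         "min": min(chunk), "max": max(chunk), "mean": mean(chunk)})
        for offset, value in enumerate(chunk):
            if not low <= value <= high:
                messages.append({"type": "outlier", "index": start + offset, "value": value})
    return messages


raw = [20.0] * 99 + [80.0]  # 100 hypothetical readings with one spike at the end
upstream = edge_summarize(raw, window=25, low=0.0, high=50.0)
print(f"{len(raw)} raw readings -> {len(upstream)} upstream messages")  # 100 -> 5
```

The trade-off mentioned above shows up directly: the cloud no longer sees every raw reading, so anything not captured in the summaries is unavailable for later centralized analysis.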

Predix IoT cloud platform
For context… The subject matter for this post came from a talk by Venu Vasudevan, Professor of Electrical & Computer Engineering at Rice University, where we discussed what makes IoT more challenging specifically for predictive analytics. The talk was presented at an IoT meetup for the Predix platform, a cloud-based IoT platform-as-a-service (PaaS) created by GE. It’s open source and built on the Cloud Foundry stack. Predix is available on AWS and Azure cloud services.


Open Referral standard

The DDOD program is currently assisting the proponents of a new open standard for publishing directory information about human services, called Open Referral. To justify promoting this standard and publishing data to it, we’re first looking to develop clear and concise use cases.

The Background

Open Referral is a standard that originally came out of a Code for America initiative a couple of years ago, with the goal of automating updates to information about human services offered across many programs. Doing so would not only make those services more discoverable, but also lower the cost of administration for service providers and referring organizations.
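Why a shared format helps with automated updating: if several referral directories publish records in the same machine-readable structure, their data can be combined mechanically instead of being re-keyed by hand. The sketch below is hypothetical. The record layout (`id`, `name`, `organization`, `updated`) is a simplification invented for illustration, not the Open Referral specification’s actual entities or field names, and it assumes that sources share stable record IDs, which is itself a nontrivial requirement in practice.

```python
from datetime import date
from typing import Dict, Iterable, List


def merge_directories(*directories: Iterable[Dict]) -> List[Dict]:
    """Merge service records from several directories that share one format.

    Records are matched on a shared `id`; when two sources describe the same
    service, the record with the most recent `updated` date wins.
    """
    latest: Dict[str, Dict] = {}
    for directory in directories:
        for record in directory:
            current = latest.get(record["id"])
            if current is None or record["updated"] > current["updated"]:
                latest[record["id"]] = record
    return sorted(latest.values(), key=lambda r: r["id"])


# Hypothetical records from two siloed directories.
call_center = [
    {"id": "svc-1", "name": "Food pantry", "organization": "Org A", "updated": date(2016, 1, 5)},
]
web_directory = [
    {"id": "svc-1", "name": "Food pantry (new hours)", "organization": "Org A",
     "updated": date(2016, 3, 1)},
    {"id": "svc-2", "name": "Shelter intake", "organization": "Org B", "updated": date(2016, 2, 9)},
]

merged = merge_directories(call_center, web_directory)
print([r["name"] for r in merged])  # ['Food pantry (new hours)', 'Shelter intake']
```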

The Problem: A landscape of siloed directories

It’s hard to see the safety net. Which agencies provide what services to whom? Where and how can people access them? These details are always in flux. Nonprofit and government agencies are often under-resourced and overwhelmed, and it may not be a priority for them to push information out to attract more customers.

So there are many ‘referral services’ — such as call centers, resource directories, and web applications — that collect directory information about health, human, and social services. However, these directories are all locked in fragmented and redundant silos. As a result of this costly and ineffective status quo:

  • People in need have difficulty discovering and accessing services that can help them live better lives.
  • Service providers struggle to connect clients with other services that can help meet complex needs.
  • Decision-makers are unable to gauge the effectiveness of programs at improving community health.
  • Innovators are stymied by lack of access to data that could power valuable tools for any of the above.  

– Source: Open Referral project description

A small handful of government programs have been identified as potential pilot use cases. These include:


The Competition

Open Referral is not without competing standards.  In fact, the AIRS/211 Taxonomy is already widely used among certified providers of information and referral services, such as iCarol.  However, AIRS/211 has two drawbacks in comparison with Open Referral.  

First, it’s not a free and open standard.  While there are sample PDFs available for parts of the taxonomy, a full spec requires a subscription.

“If you wish to evaluate the Taxonomy prior to subscribing, you can register for evaluation purposes and have access to the full Taxonomy for a limited period of time through the search function.” – Source: AIRS/211 Download page and Subscription page

The taxonomy also requires an annual license fee, which government and nonprofit organizations may find difficult to fund indefinitely.

“Organizations need a license to engage in any use of the Taxonomy.”
– Source: AIRS/211 Subscription page

Second, the AIRS/211 taxonomy is highly structured and extensive. While that has advantages for consistency and interoperability, it raises other challenges: it has a steep learning curve and therefore creates potential barriers for organizations without technical expertise. Open Referral describes itself as a more lightweight option.

It should also be noted that there’s a CivicServices schema designed for structured web markup. Its approach is to embed machine-readable “Microdata” throughout human-readable HTML web pages. Such markup standards are intended to be interpreted by search engines like Google, Bing and Yahoo when indexing a website. That said, the degree of adoption for CivicServices in particular, by either search engines or information publishers, is unclear at this point.



In principle, the Open Referral standard would lower the cost and lag time for organizations to update information about the services relevant to their constituents. The standard is being evangelized by Greg Bloom, who started the effort with Code for America and has been reaching out to organizations that would consume this data (such as Crisis Text Line, Purple Binder and iCarol) in order to define a compelling use case.

There’s a DDOD writeup on this topic, “Interoperability: Directories of health, human and social services”, intended to facilitate the creation of practical use cases.



Further reading…

Additional information on Open Referral can be found at:


Rheumatoid Arthritis Data Challenge

Looking forward to seeing the evolution of the Rheumatoid Arthritis Data Challenge.  Here are the parameters…

  • Title: Rheumatoid Arthritis Data Challenge
  • Announcement date: March 8, 2016
  • Award date: May 10, 2016
  • Summary:
The Rheumatoid Arthritis Data Challenge is a code-a-thon, described as:

“Striking at the heart of a key issue in health outcomes research, participants will be provided access to a secured development environment in a staged competition over three weeks to create the best competitive algorithms to gauge clinical response in Rheumatoid Arthritis management.”
The challenge is hosted by Health Datapalooza in May 2016 and sponsored by Optum, AcademyHealth, and the US Department of Health and Human Services (HHS). It uses non-governmental, de-identified administrative claims data and clinical electronic health record (EHR) data, with the goal of establishing algorithms to predict clinical response to rheumatoid arthritis management. Applications are open to any team of health data enthusiasts, but only 15 teams will be selected to participate. Winners will be announced at Health Datapalooza on May 10, 2016, with $40,000 in prizes to be awarded.

Open Data Discoverability

I’m adding a working document to cover the topic of open data discoverability and usability. This appears to be an area in urgent need of attention. I have come across it tangentially throughout much of my work, and it deserves to be aggregated and curated. There are also some lingering opportunities to make practical use of semantic web concepts. There are vast repositories of data assets throughout government, academia and industry that could be better leveraged. So let’s make it happen.
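One practical semantic web step toward discoverability is describing each dataset with a shared vocabulary, so that any catalog or crawler can interpret its title, publisher and keywords the same way. The sketch below uses the W3C DCAT and Dublin Core Terms vocabularies serialized as JSON-LD. The `dataset_jsonld` helper and the example dataset values are hypothetical; only the vocabulary namespaces and terms are real.

```python
import json
from typing import Dict, List


def dataset_jsonld(identifier: str, title: str, publisher: str, keywords: List[str]) -> Dict:
    """Describe one dataset using DCAT / Dublin Core terms, as a JSON-LD object."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:publisher": {"@type": "foaf:Organization", "foaf:name": publisher},
        "dcat:keyword": keywords,
    }


# Hypothetical dataset record.
record = dataset_jsonld(
    identifier="https://example.org/dataset/hospital-stays",
    title="Example hospital inpatient stays",
    publisher="Example Agency",
    keywords=["hospital", "inpatient", "utilization"],
)
print(json.dumps(record, indent=2))
```

Because the terms resolve to published vocabularies, two catalogs that never coordinated directly can still agree on what `dcat:keyword` means.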



DDOD featured on Digital Gov

The Demand-Driven Open Data (DDOD) program has recently been featured on DigitalGov. (See DigitalGov article.)

It should be added that a major project in the works is merging DDOD tools and methodologies into the larger program. The effort seeks to maximize the value of existing data assets from across HHS agencies (CMS, FDA, CDC, NIH, etc.). New features to enhance data discoverability and usability are already planned.

We’re also looking into how to improve the growing knowledge base of DDOD use cases by leveraging semantic web and linked open data (LOD) concepts. A couple of years ago, HHS organized the Health Data Platform Metadata Challenge – Health 2.0. The findings from this exercise could be leveraged for both DDOD and



Public Access Repositories for Federally Funded Research

According to OSTP, there has been growth in the use of public access repositories intended to store the results of federally funded research. That’s good news. Despite a mandate from February 2013 that such results be made available, adoption by the research community has been slow. Challenges include the competitive nature of research, the mixing of multiple funding sources, licensing conflicts with private peer-reviewed publications, privacy concerns for study subjects, and many others. In fact, even the raw data and the source code for the calculations need to be made available. For a research study, the clearest measure of meeting this mandate is complete reproducibility.

So while we’re quite far from the ultimate goal, there have been incremental gains. HHS agencies (including NIH, AHRQ, CDC, FDA and ASPR) in particular have been using two systems: PubMed Central and CDC Stacks. According to the latest figures from OSTP, on a typical weekday PubMed has more than 1.2 million unique users downloading 2 million articles. While that’s impressive, the actual growth in the number of articles in the two years since the mandate is approximately 30% (from about 2.7 million to 3.5 million). So much more work remains.
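As a quick check on the figures above (using the rounded article counts as given), the two-year growth and the equivalent compound annual rate work out as follows:

```latex
\begin{aligned}
\text{two-year growth} &= \frac{3.5\,\text{M} - 2.7\,\text{M}}{2.7\,\text{M}} = \frac{0.8}{2.7} \approx 0.296 \approx 30\% \\
\text{annualized growth} &= \left(\frac{3.5}{2.7}\right)^{1/2} - 1 \approx 1.139 - 1 \approx 14\% \text{ per year}
\end{aligned}
```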


Open Access repositories at a glance


Plans for Demand-Driven Open Data 2.0

Demand-Driven Open Data (DDOD) is a component of HHS’s Health Data Initiative (HDI). DDOD is a framework of tools and methods that provides a systematic, ongoing and transparent mechanism for industry and academia to tell HHS more about their data needs. The DDOD project description has recently been updated on the HHS IDEA Lab website. The writeup includes the problem description, background and history, the DDOD solution and process, and future plans.

In November 2015, the project underwent an extensive evaluation of the prior year’s activities and accomplishments. Based on those observations, plans are in place to deploy DDOD 2.0 in 2016. On the process side, the new version will have clearly defined SOPs (standard operating procedures), better instructions for data requesters and data program owners, and up-front validation of use cases. On the technology side, DDOD will integrate with the current platform, with the goal of optimizing data discoverability and usability. It will also include dashboards, data quality analytics, and automated validation of use case content. These features will help guide DDOD operations and workflow.
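To give a feel for what “automated validation of use case content” could involve, here is a minimal sketch. DDOD’s actual validation rules aren’t described in this post; the required fields in `REQUIRED_FIELDS`, the minimum word count, and the example entry are all hypothetical.

```python
from typing import Dict, List

# Hypothetical required fields for a use case entry (illustrative only).
REQUIRED_FIELDS = ("title", "requester", "data_needed", "intended_use")


def validate_use_case(entry: Dict[str, str], min_description_words: int = 10) -> List[str]:
    """Return a list of problems found in a use case entry (empty list means it passes)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field, "").strip():
            problems.append(f"missing field: {field}")
    intended_use = entry.get("intended_use", "")
    if intended_use.strip() and len(intended_use.split()) < min_description_words:
        problems.append("intended_use is too short to evaluate")
    return problems


example = {
    "title": "County-level prescribing data",
    "requester": "Example research team",
    "data_needed": "Prescribing rates by county and year",
    "intended_use": "Compare county level opioid prescribing rates with overdose "
                    "trends to target outreach",
}
print(validate_use_case(example))            # []
print(validate_use_case({"title": "Draft"}))  # three missing-field messages
```

Checks like these are cheap to run on every submission, which is what makes up-front validation practical before a human reviewer gets involved.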


Invisible Illness Codathon

Identifying Datasets for Invisible Illness Codathon

Several datasets were identified for use at a recent White House codathon on mental illness and suicide prevention. (See related press release.) Many of them came from HHS (U.S. Department of Health and Human Services) agencies: CDC, SAMHSA and AHRQ. Datasets throughout government were tagged with “Suicide” for easy retrieval. These tags were then ingested and aggregated up to, specifically

Source: White House – Suicide Prevention/Mental Health & Data for Invisible Illnesses
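Tagging pays off because a catalog can then filter on the tag programmatically. Many government open data catalogs run on CKAN, whose Action API offers a `package_search` endpoint that accepts a Solr-style `fq` filter. Whether the catalog referred to above is CKAN-based is an assumption here, and `https://catalog.example.gov` is a placeholder base URL. The sketch only builds the query and parses a response body, so it can be exercised without network access.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def tag_search_url(base_url: str, tag: str, rows: int = 100, start: int = 0) -> str:
    """Build a CKAN Action API package_search URL that filters on a single tag."""
    params = {"fq": f'tags:"{tag}"', "rows": rows, "start": start}
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def titles_from_response(body: str) -> List[str]:
    """Extract dataset titles from a package_search JSON response body."""
    payload: Dict = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [pkg["title"] for pkg in payload["result"]["results"]]


url = tag_search_url("https://catalog.example.gov", "Suicide")  # placeholder host
print(url)
# To run against a real CKAN catalog, fetch `url` (e.g. with urllib.request.urlopen)
# and pass the decoded response body to titles_from_response().
```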

Data sources

  • WHO Statistical Information System (WHOSIS): WHOSIS is an interactive database bringing together core health statistics for the 193 WHO Member States. It comprises more than 70 indicators, which can be accessed by way of a quick search, by major categories, or through user-defined tables. The data can be further filtered, tabulated, charted and downloaded.
  • International Crime Victims Surveys
  • National Inpatient Sample (NIS): The NIS is a database of hospital inpatient stays used to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. It is the largest publicly available all-payer inpatient care database in the United States, containing data from approximately 8 million hospital stays from about 1,000 hospitals, sampled to approximate a 20-percent stratified sample of U.S. community hospitals.
  • National Survey on Drug Use and Health (NSDUH): Beginning in 2008, the National Survey on Drug Use and Health started asking all adults aged 18 or older about suicidal thoughts and behaviors. Along with responses to the suicide-related questions, the survey collects nationally and state-representative information on socio-demographic items such as age group, sex, ethnicity, employment, and income.
  • Pan American Health Organization, Regional Core Health Data Initiative: In 1995, the Regional Core Health Data and Country Profile Initiative was launched by the Pan American Health Organization to monitor the attainment of health goals of the Member States. The initiative includes a database with 117 health-related indicators, country health profiles, and reference documents.
  • The American Association of Suicidology: The goal of the American Association of Suicidology (AAS) is to understand and prevent suicide. The Research Division of AAS is dedicated to advancing knowledge about suicidal behavior through science.
  • Suicide Attack Database (CPOST-SAD): The current release contains the universe of suicide attacks from 1982 through June 2015, a total of 4,620 attacks in over 40 countries.
  • Behavioral Risk Factor Surveillance System (BRFSS): Collects data on a variety of behavioral health issues through a national telephone survey developed by the US Centers for Disease Control and Prevention (CDC) and administered to a sample of US households. Some states include questions on suicidal behavior.
  • Department of Defense Suicide Event Report (DoDSER) Data: The DoDSER is the system of record for health surveillance related to suicide ideations, attempts, and deaths.


Overview for using these data sources


