An API for Ebola Data

By CJ Hendrix

The Humanitarian Data Exchange (HDX) repository offers several ways to access data. Currently, most of the data on HDX lives in one of two places: external sites, for which HDX simply catalogs a download link (like this one), and datasets (such as spreadsheets) uploaded by users (like the files here).

Both of these options support downloading only complete data files. We are experimenting with a new option, database storage (using the CKAN Datastore), which will allow users to query and extract individual data elements through a web API.

Downloading vs API

HDX maintains a dataset of top line figures for the Ebola crisis. It contains information for six indicators:

Cumulative Cases of Ebola
Cumulative Deaths from Ebola
Open Ebola Treatment Centers
People Receiving Food Aid
Appeal Coverage
Currently Affected Countries

If you are writing a report, the easiest approach is simply to download the dataset as a spreadsheet and copy the numbers into your document (possibly after doing some analysis or generating some charts). This is the workflow that HDX has supported in the past.

However, let’s say instead that you are building a web application that needs to use just one of these indicators, e.g. Appeal Coverage, and to keep that indicator up-to-date. In this case, repeatedly downloading the data as a spreadsheet, parsing it, then updating it in your own application is unnecessarily complicated, especially when some of the spreadsheets can be fairly large.

This is where our new data API can help: instead of downloading a large file and trying to find one piece of information in it, your application can send a simple query over the web, and HDX will return exactly the data it needs, in a machine-readable format (JSON).

Example: API query for Ebola data

To pull the value of the Appeal Coverage indicator from the datastore, we send a standard CKAN datastore query to HDX:

https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85&q=appeal%20coverage

The above API call contains two parameters: resource_id (which can be found in the URL for a given resource or through the CKAN Action API) and the query string q, here “appeal coverage”, which matches any field containing that string. The result is a JSON dictionary that includes the following (some additional metadata is omitted here to save space):

{
   "success": true,
   "q": "appeal coverage",
   "records": [
     {
       "latest_date": "2014-11-10T00:00:
       "title": "Appeal Coverage",
       "source_link": "https:\/\/data.hdx.rwlabs.org\/dataset\/fts-ebola-coverage",
       "notes": "",
       "value": 0.658,
       "source": "OCHA",
       "explore": "",
       "_full_count": "1",
       "rank": 0.0901673,
       "units": "ratio",
       "_id": 5
     }
   ]
}
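
For a web application, the query described above can be wrapped in a few lines of code. The following is a minimal Python sketch, not an official HDX client: the function names (build_query_url, extract_value, fetch_indicator) are made up for illustration, and only the endpoint, resource ID and field names are taken from this post. CKAN normally nests the response body under a "result" key, while the abridged sample above shows "records" at the top level, so the sketch accepts either shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and resource ID as quoted in this post.
BASE_URL = "https://data.hdx.rwlabs.org/api/action/datastore_search"
RESOURCE_ID = "a02903a9-022b-4047-bbb5-45127b591c85"


def build_query_url(resource_id, query=None):
    """Return a datastore_search URL, optionally with a full-text query."""
    params = {"resource_id": resource_id}
    if query is not None:
        params["q"] = query
    # urlencode escapes the values; a space is encoded as "+".
    return BASE_URL + "?" + urlencode(params)


def extract_value(payload, title):
    """Return the 'value' of the first record with this title, else None."""
    # Assumption: records sit either under "result" or at the top level.
    body = payload.get("result", payload)
    for record in body.get("records", []):
        if record.get("title") == title:
            return record.get("value")
    return None


def fetch_indicator(title):
    """Query HDX for a single indicator (requires network access)."""
    url = build_query_url(RESOURCE_ID, title.lower())
    with urlopen(url) as response:
        payload = json.load(response)
    return extract_value(payload, title)
```

Calling fetch_indicator("Appeal Coverage") would send the query and return the record's value field. Matching on the exact title after the full-text search guards against the query also hitting other fields, such as notes.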

If you want to pull all the data for this resource, simply leave out the query string and search by resource ID alone:

https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85
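
Once the full set of records has been retrieved, an application will usually want to look indicators up by name. As a rough sketch in Python (the helper name index_by_title is hypothetical, and the record layout is assumed to match the sample response shown earlier, with "title" and "value" fields), the parsed JSON can be turned into a simple lookup table:

```python
def index_by_title(payload):
    """Map each record's title to its value.

    Assumes the parsed JSON holds "records" either at the top level or
    under a "result" key, and that records carry "title" and "value".
    """
    body = payload.get("result", payload)
    return {
        record["title"]: record["value"]
        for record in body.get("records", [])
        if "title" in record and "value" in record
    }


# Usage with the one record quoted in this post:
sample = {"records": [{"title": "Appeal Coverage", "value": 0.658, "units": "ratio"}]}
indicators = index_by_title(sample)
print(indicators["Appeal Coverage"])  # prints 0.658
```

Records that lack either field are skipped rather than raising an error, which seems a sensible default while the structure of the dataset may still change.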

It’s an experiment!

We have just started to experiment with the CKAN Datastore and with this dataset in particular. Most of our datasets are not datastore-enabled right now. We may shortly be adding sparklines and other visualizations that require a time series of these data, so the structure of this dataset may change. You can learn more about using the API on a datastore-enabled resource in the HDX repository by clicking on the “Data API” button, like the one on this page.