The Humanitarian Data Exchange (HDX) repository offers several ways to access data. Currently, most of the data stored on the HDX repository is in one of two places: external sites, for which HDX simply catalogs a download link (like this one), and datasets (such as spreadsheets) uploaded by users (like the files here).
Both of these options support only downloading of complete data files. We are experimenting with a new option, database storage (using the CKAN Datastore), which will allow users to query and extract individual data elements through a web API.
Downloading vs API
HDX maintains a dataset of top line figures for the Ebola crisis. It contains information for six indicators:
- Cumulative Cases of Ebola
- Cumulative Deaths from Ebola
- Open Ebola Treatment Centers
- People Receiving Food Aid
- Appeal Coverage
- Currently Affected Countries
If you are writing a report, the easiest approach is simply to download the dataset as a spreadsheet and copy the numbers into your document (possibly after doing some analysis or generating some charts). This is the workflow that HDX has supported in the past.
However, let’s say instead that you are building a web application that needs to use just one of these indicators, e.g. Appeal Coverage, and to keep that indicator up-to-date. In this case, repeatedly downloading the data as a spreadsheet, parsing it, then updating it in your own application is unnecessarily complicated, especially when some of the spreadsheets can be fairly large.
This is where our new data API can help: instead of downloading a large file and trying to find one piece of information in it, your application can send a simple query over the web, and HDX will return exactly the data it needs, in a machine-readable format (JSON).
Example: API query for Ebola data
To pull the value for the Appeal Coverage indicator from the datastore, we send the standard CKAN datastore query to HDX:
https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85&q=appeal%20coverage
The above API call contains two parameters: resource_id (which can be found in the URL for a given resource or through the CKAN Action API) and q, the query string “appeal coverage”, which matches any field containing that string. The result is a JSON dictionary, including the following (along with some additional metadata omitted here to save space):
{
  "success": true,
  "q": "appeal coverage",
  "records": [
    {
      "latest_date": "2014-11-10T00:00:…",
      "title": "Appeal Coverage",
      "source_link": "https:\/\/data.hdx.rwlabs.org\/dataset\/fts-ebola-coverage",
      "notes": "",
      "value": 0.658,
      "source": "OCHA",
      "explore": "",
      "_full_count": "1",
      "rank": 0.0901673,
      "units": "ratio",
      "_id": 5
    }
  ]
}
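For an application that wants just the one number, a call like this could be wrapped in a few lines of Python. The following is only a minimal sketch using the standard library: the function names are hypothetical, and it assumes the full reply nests the records under a "result" key, as standard CKAN action responses do (the listing above is abbreviated), while also accepting the flat shape shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.hdx.rwlabs.org/api/action/datastore_search"
RESOURCE_ID = "a02903a9-022b-4047-bbb5-45127b591c85"


def build_search_url(resource_id, query=None):
    """Return a datastore_search URL, optionally with a full-text query (q)."""
    params = {"resource_id": resource_id}
    if query:
        params["q"] = query
    return BASE_URL + "?" + urlencode(params)


def extract_value(response, title):
    """Return the 'value' of the first record with this title, or None."""
    # Assumption: records sit under "result" in the full reply; fall back
    # to the top level to match the abbreviated listing in this post.
    payload = response.get("result", response)
    for record in payload.get("records", []):
        if record.get("title") == title:
            return record.get("value")
    return None


def fetch_indicator(title):
    """Query HDX over the network and return the indicator's value."""
    url = build_search_url(RESOURCE_ID, title.lower())
    with urlopen(url) as reply:
        return extract_value(json.load(reply), title)
```

Calling fetch_indicator("Appeal Coverage") would then return the current ratio, provided the resource still exists and keeps the same field names.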
If you want to pull all the data for this resource, simply leave out the query string and search by resource ID alone:
https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85
It’s an experiment!
We have just started to experiment with the CKAN Datastore and with this dataset in particular. Most of our datasets are not datastore-enabled right now. We may shortly be adding sparklines and other visualizations that require a time series of these data, so the structure of this dataset may change. You can learn more about using the API on a datastore-enabled resource in the HDX repository by clicking on the “Data API” button, like the one on this page.
Feedback?
If you have some thoughts about using APIs to access our data, email us at hdx.feedback@gmail.com.
I can say from experience that Datastore is very flexible. You could easily build field apps that talk directly with a dataset via the API to keep datasets updated.
When a rapid response is required and data needs to be published in near real time, CKAN can certainly be used.
You can also structure your datasets to be flexible in the type of values they store. Using the same sort of approach as Google Analytics, you can store both the field name and the field value along with some basic details such as timestamps. This allows you to start injecting data for new fields into a dataset without adding any new columns.
Such universal approaches to the data storage would allow developers to build visualizations or previews of the data that are somewhat independent of the situation being tracked. They’d need to transform the data rather than using the inbuilt CKAN preview tools, but that isn’t so hard.
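To make the idea concrete, here is a rough Python sketch of that name/value layout and of the kind of transformation a visualization would need. The row shape (field, value, timestamp) and the function names are assumptions chosen for illustration, not a schema that HDX or CKAN prescribes.

```python
from collections import defaultdict


def make_row(field, value, timestamp):
    """Build one generic record: a field name, its value, and an ISO timestamp."""
    return {"field": field, "value": value, "timestamp": timestamp}


def latest_by_field(rows):
    """Collapse name/value rows into {field: value}, keeping the newest value."""
    latest = {}
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        latest[row["field"]] = row["value"]
    return latest


def series_by_field(rows):
    """Group rows into {field: [(timestamp, value), ...]} ordered by time."""
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        series[row["field"]].append((row["timestamp"], row["value"]))
    return dict(series)
```

A new indicator is then just another row with a new field name, and the same two functions keep working without any change to the table's columns.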
Excellent work, platform and resource
This is fantastic. This is how systems should work. The interface pulls information from different places and displays it to the user, who gets a better experience using the tool, while the API work behind it stays transparent to them. At some point the process may involve more than “displaying information”, such as processing “transactions”; I mean changing one system based on actions that happen in another.
By the way, is the API using HXL in some way? And is it possible to run country-level queries for those indicators, perhaps by adding more variables to the API call?
You guys are doing great 😀 Keep it up!
Thanks for the encouragement, Luis. Right now we are using the standard CKAN API, which is “blind” to the schema of any given dataset. But you can use the SQL form of the API queries to filter by any column in the dataset. Check out the documentation linked in the post for more details. As for HXL, we have thought about how we may eventually add HXL-aware storage and API functionality to the CKAN datastore. We’ll take your comment as a vote for prioritizing that work. 🙂
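As a rough illustration of the SQL form, the Python sketch below builds a query URL that filters on a single column. It assumes that the standard CKAN datastore_search_sql action is enabled on HDX and that the resource has a "title" column, as in the JSON shown in the post; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: HDX exposes the standard CKAN SQL search action at this path.
SQL_URL = "https://data.hdx.rwlabs.org/api/action/datastore_search_sql"
RESOURCE_ID = "a02903a9-022b-4047-bbb5-45127b591c85"


def quote_literal(text):
    """Quote a string as a SQL literal by doubling any single quotes."""
    return "'" + text.replace("'", "''") + "'"


def quote_identifier(name):
    """Quote a table or column name by doubling any double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sql_url(resource_id, column, value):
    """Return a datastore_search_sql URL selecting rows where column = value."""
    sql = "SELECT * FROM {} WHERE {} = {}".format(
        quote_identifier(resource_id),
        quote_identifier(column),
        quote_literal(value),
    )
    return SQL_URL + "?" + urlencode({"sql": sql})
```

For example, build_sql_url(RESOURCE_ID, "title", "Appeal Coverage") yields a URL that asks only for the Appeal Coverage row. A country-level filter would work the same way on whatever column holds the country, if the dataset has one.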
Hello – thank you for this wonderful resource. I’m doing a little web prototype and trying to read the data here into a JSON object (via XMLHttpRequest) so that I can display it on the page: http://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85
I am receiving a no-access error: “No ‘Access-Control-Allow-Origin’ header is present on the requested resource.”
Can you please confirm that the URL is open and readable from an external resource?
Thank you!
Karen
Thank you for using HDX.
I confirm, the URL you mentioned is open and readable.
However, you might want to use the https version of that URL directly (we redirect all http requests to https); you may be experiencing the issue because of the redirect.
Use https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85
After testing with:
curl "https://data.hdx.rwlabs.org/api/action/datastore_search?resource_id=a02903a9-022b-4047-bbb5-45127b591c85" -o OUTPUT -D HEADERS
I get the resulting JSON in OUTPUT, and the content of the HEADERS file is:
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 30 Jan 2015 22:12:12 GMT
Content-Type: application/json;charset=utf-8
Content-Length: 5287
Pragma: no-cache
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, PUT, GET, DELETE, OPTIONS
Access-Control-Allow-Headers: X-CKAN-API-KEY, Authorization, Content-Type
X-Nginx-Cache: BYPASS
X-Cache: MISS from rp-C1
X-Cache-Lookup: MISS from rp-C1:80
Connection: keep-alive
I hope this helps you.
Serban
This is great – would love to see more API coverage.
Hi Maxwell, let us know (hdx.feedback@gmail.com) if there are specific datasets you would like to see available through the API, or if you have detailed suggestions. We are working to increase our coverage and would love to focus on what our users are interested in.