Manage Data Responsibly
Knowing the disclosure risk helps you make informed decisions about whether and how to share the data. Because we want to bias toward sharing data responsibly, it is important to consider options that allow you to share the data in a way that protects the individuals in the dataset, rather than simply not sharing the data at all.
Key Takeaways
Use disclosure control techniques to reduce the risk of disclosure.
Disclosure control techniques are either non-perturbative or perturbative. Non-perturbative methods preserve the integrity of the data (the values that are released are not altered) but limit the disclosure risk by reducing the detail in the microdata. These methods include local suppression, global recoding, and removing variables. With local suppression, individual values are suppressed and replaced with missing values (NA), whereas with global recoding, the number of distinct values of a variable is reduced, for example by grouping exact values into intervals. Perturbative methods, on the other hand, alter values and limit disclosure risk by creating uncertainty about what the true values are.
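To make the two families of techniques more concrete, here is a minimal Python sketch on hypothetical toy records. The function names (`global_recode`, `local_suppress`, `add_noise`), the 10-year age bands, the threshold k = 2, the choice of which variable to suppress, and the use of Gaussian noise as the perturbative method are all illustrative assumptions, not part of any HDX tool. Local suppression here targets records whose combination of key-variable values is shared by fewer than k records.

```python
import random
from collections import Counter


def global_recode(value, width=10):
    """Non-perturbative global recoding (illustrative): map an exact number to a
    fixed-width interval label, e.g. 23 -> "20-29", reducing distinct values."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def local_suppress(records, keys, suppress_var, k=2):
    """Non-perturbative local suppression (illustrative): in every record whose
    combination of key-variable values occurs fewer than k times, replace
    `suppress_var` with None (NA). The threshold k is an assumption."""
    counts = Counter(tuple(r[v] for v in keys) for r in records)
    result = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if counts[tuple(r[v] for v in keys)] < k:
            r[suppress_var] = None
        result.append(r)
    return result


def add_noise(values, scale, seed=0):
    """Perturbative noise addition (illustrative): add zero-mean Gaussian noise
    with standard deviation `scale`, so released values are no longer the true ones."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, scale) for v in values]


# Hypothetical toy microdata: age, sex and district are treated as key variables.
records = [
    {"age": 23, "sex": "F", "district": "A", "income": 120.0},
    {"age": 27, "sex": "F", "district": "A", "income": 95.0},
    {"age": 34, "sex": "M", "district": "B", "income": 150.0},
    {"age": 38, "sex": "M", "district": "B", "income": 130.0},
    {"age": 71, "sex": "F", "district": "C", "income": 80.0},
]

# 1. Global recoding of age into 10-year bands.
recoded = [dict(r, age=global_recode(r["age"])) for r in records]
# 2. Local suppression of district for records that are still rare.
protected = local_suppress(recoded, keys=["age", "sex", "district"], suppress_var="district")
# 3. Perturbation of a continuous variable.
noisy_income = add_noise([r["income"] for r in protected], scale=5.0)
```

In this toy data, after recoding only the record in the 70-79 band in district C is still unique, so only its district is set to missing. A real workflow would choose which value to suppress more carefully and re-check the risk afterwards; for noise addition, the `scale` parameter directly controls how much utility the perturbation costs.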
Navigate the trade-off between disclosure risk and data utility.
The optimum trade-off between risk and utility in the statistical disclosure control (SDC) process depends greatly on who the users are and the conditions under which the microdata is shared. Applying disclosure control techniques always results in some loss of information. After applying SDC, you need to quantify the information loss to determine whether the data still has enough value to be worth sharing. If it does not, you may need to reverse course and find other ways to share the data.
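The text does not prescribe a particular information-loss measure. As a minimal sketch, assuming simple hypothetical metrics (`suppression_rate`, `mean_abs_relative_change`, `relative_change_in_mean` are illustrative names, not a standard API), the Python below computes the share of values lost to suppression and two views of how much perturbation moved a continuous variable: record by record, and through an aggregate statistic an intended user might rely on.

```python
from statistics import mean


def suppression_rate(original, protected, var):
    """Share of records whose value of `var` was present originally but is
    missing (None) after suppression."""
    lost = sum(
        1 for o, p in zip(original, protected)
        if o[var] is not None and p[var] is None
    )
    return lost / len(original)


def mean_abs_relative_change(original, perturbed):
    """Record-level loss: average |perturbed - original| / |original|,
    skipping original values of zero to avoid division by zero."""
    return mean(abs(p - o) / abs(o) for o, p in zip(original, perturbed) if o != 0)


def relative_change_in_mean(original, perturbed):
    """Aggregate-level loss: how far the mean moved, relative to the original mean."""
    m = mean(original)
    return abs(mean(perturbed) - m) / abs(m)


# Hypothetical toy data before and after SDC.
original = [{"district": d} for d in ["A", "A", "B", "B", "C"]]
protected = [{"district": d} for d in ["A", "A", "B", "B", None]]
income = [100.0, 200.0, 400.0]
perturbed_income = [110.0, 190.0, 400.0]

print(suppression_rate(original, protected, "district"))
print(mean_abs_relative_change(income, perturbed_income))
print(relative_change_in_mean(income, perturbed_income))
```

Here one district value in five is lost, individual incomes move by 5% on average, and the mean income does not change at all. This is why information loss is best judged against the intended uses: a user who needs only averages loses little, while one who needs exact individual values loses more.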
Find other ways to share your data responsibly.
If the disclosure risk or information loss after applying SDC is too high, there are still options for sharing the data. For example, you can share only the metadata on HDX via HDXConnect. This lets users know that the data exists and is available ‘by request’. When users request access, you decide whether and how to share it. Alternatively, you could decide to share the data with trusted partners under strict terms and conditions defined in a data sharing agreement or information sharing protocol.
General Questions
There is a trade-off between the risk of disclosure and data utility. The goal of the SDC process is to minimise the disclosure risk while maximising the data utility. Data utility is a measure of how useful and valid your data remains after Statistical Disclosure Control. Whenever possible, the reduction in data utility should be evaluated with respect to the intended uses of the data. However, because it is not possible to anticipate every use of the data, you can also quantify the information loss that results from applying SDC.
HDX allows microdata to be shared publicly through the site. However, to protect individuals and vulnerable groups, our team runs a disclosure risk assessment on any resource containing microdata. Once you have successfully uploaded the resource, our team will review it to better understand the likelihood of a disclosure taking place. If any risk is detected, we notify the contributor within 24 hours and then work with them to decide together whether and how the data should be shared on HDX.