Prepare the Disclosure Risk Assessment
Before you start the risk assessment, it is important to explore your data. This could involve reviewing the original questionnaire and the sampling methodology, assessing the data environment and conducting exploratory analysis to understand the relationships between variables.
Key Takeaways
Start by reviewing the questionnaire.
In the case of survey data, you should review the questionnaire before starting the assessment. This will help you to understand the different variables represented in the dataset.
Have the sampling weights on hand.
Sampling weights correct for systematic differences in the selection probabilities of different respondents. If your data were collected through sampling, you will need the sampling weights to perform a disclosure risk assessment.
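To make the role of the weights concrete, here is a minimal sketch in base R. It assumes the simplest case, where a design weight is the inverse of a respondent's selection probability. Weights in a real survey usually include further adjustments, for example for non-response. The data frame, its column names and the probabilities are all hypothetical.

```r
# Hypothetical survey extract: every column name and value here is made up.
survey <- data.frame(
  region      = c("North", "North", "South", "South", "South"),
  sex         = c("F", "M", "F", "F", "M"),
  select_prob = c(0.01, 0.01, 0.05, 0.05, 0.05)  # assumed selection probabilities
)

# Assumption: a basic design weight is the inverse of the selection
# probability, so each respondent "stands for" 1 / p people in the population.
survey$weight <- 1 / survey$select_prob

# Sample frequency: how many respondents share each region/sex combination.
sample_freq <- aggregate(count ~ region + sex,
                         data = transform(survey, count = 1),
                         FUN = sum)

# Estimated population frequency: the sum of weights for the same combination.
pop_freq <- aggregate(weight ~ region + sex, data = survey, FUN = sum)

# One row per combination, with both frequencies side by side.
freq <- merge(sample_freq, pop_freq)
print(freq)
```

A combination that is rare in the sample may still be common in the population. That is why the weights matter when estimating re-identification risk.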
Explore your data.
The first step in the risk assessment is to get to know the data you have. Applying Statistical Disclosure Control requires you to understand relationships between variables. Before jumping into the assessment, take the time to dig into those relationships.
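As an illustration of what this exploration can look like, the sketch below uses only base R on a small, made-up dataset. The variable names are hypothetical. In practice you would load your own file and choose the variables that matter for your context.

```r
# Hypothetical microdata: all names and values are illustrative.
survey <- data.frame(
  region     = c("North", "North", "South", "South", "South", "North"),
  occupation = c("Farmer", "Teacher", "Farmer", "Farmer", "Teacher", "Farmer"),
  age        = c(34, 51, 29, 42, 38, 67)
)

# Structure and per-variable summaries: types, ranges, number of records.
str(survey)
summary(survey)

# Cross-tabulate two categorical variables to see which combinations are rare.
region_by_occupation <- table(survey$region, survey$occupation)
print(region_by_occupation)

# Compare a numeric variable across groups.
mean_age_by_region <- tapply(survey$age, survey$region, mean)
print(mean_age_by_region)
```

Cells with very small counts in a cross-tabulation are a first hint at combinations of values that could single out a respondent.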
Remove all direct identifiers from the dataset.
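Direct identifiers, such as names or phone numbers, point to an individual on their own and should be removed before the assessment. It is also important to gather information about the survey methodology, such as strata, sampling methods, survey design and sampling weights. This information will be needed throughout the Statistical Disclosure Control process.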
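A minimal base R sketch of this preparation step follows. It drops the columns treated as direct identifiers and keeps the design variables (stratum and weight) that the later steps rely on. Every column name is an assumption made for the example. Which columns count as direct identifiers depends on your dataset.

```r
# Hypothetical raw survey file: every column name and value is made up.
raw <- data.frame(
  respondent_name = c("A. Example", "B. Example", "C. Example"),
  phone_number    = c("000-0001", "000-0002", "000-0003"),
  region          = c("North", "South", "South"),
  age             = c(34, 51, 29),
  stratum         = c("urban", "rural", "rural"),
  weight          = c(120.5, 80.0, 80.0)
)

# Columns that identify a person on their own (assumed list for this example).
direct_identifiers <- c("respondent_name", "phone_number")

# Drop the direct identifiers but keep the design variables.
cleaned <- raw[, setdiff(names(raw), direct_identifiers), drop = FALSE]

# Sanity checks: identifiers are gone, design variables are still present.
stopifnot(!any(direct_identifiers %in% names(cleaned)))
stopifnot(all(c("stratum", "weight") %in% names(cleaned)))
```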
Set up your tool of choice.
At the Centre for Humanitarian Data, we use sdcMicro to perform the disclosure risk assessment. This is one of a few open-source tools that can be used to apply Statistical Disclosure Control. If this is your first time using sdcMicro, you can download the package from the Comprehensive R Archive Network.
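For a first-time setup, the following R snippet is one way to install and load the package. It assumes R is already installed and that the machine can reach CRAN.

```r
# Install sdcMicro from CRAN only if it is not already available.
# This step needs internet access the first time it runs.
if (!requireNamespace("sdcMicro", quietly = TRUE)) {
  install.packages("sdcMicro")
}

# Load the package and confirm which version is in use.
library(sdcMicro)
packageVersion("sdcMicro")
```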
General Questions
sdcMicro is an open-source add-on package for R. It was developed with support from the World Bank and is one tool that can be used to assess the risk of re-identification in your data. Learn more about why we chose sdcMicro here. Using sdcMicro requires an understanding of the R programming language. We have developed a step-by-step tutorial that takes you through conducting a disclosure risk assessment with sdcMicro.
Both R and sdcMicro are freely available from CRAN (the Comprehensive R Archive Network), and R runs on Mac, Windows and Linux. Applying Statistical Disclosure Control using sdcMicro requires some basic knowledge of statistics as well as of the R programming language.