Improving data quality with digital data collection

james ICT4D, Methodology, Real Geek

Emily Tomkys Valteri, Alexia Pretari and Simone Lombardini share practical tips to help improve quality in survey data collection, and introduce the latest case study in the ‘Going Digital’ series.

Data collection in Zambia. Photo: Bryan/Oxfam

Sometimes survey data doesn’t add up: two household members are married to each other and yet have different marital statuses, consent statements have not been fully read, percentages of income add up to 131%, data from the same respondents across different surveys cannot be linked up – just a few common issues you may have encountered.

In our latest paper from the Going Digital Series, Improving data quality with digital data collection, we share some of the survey features we have employed to improve the quality and the accuracy of data collected.

Oxfam’s Responsible Program Data Policy states that respondents have the right to make an informed decision about whether to consent to taking part in a survey or any other data collection exercise. At times it has been evident that enumerators do not give consent forms enough attention at the beginning of an interview. To tackle this issue, we piloted a pre-recorded audio consent statement, which enumerators played before starting the survey, and set up speed violations that flagged whether the audio file was skipped and, if so, at what point in the file.
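To make the idea of flagging a skipped consent recording concrete, here is a minimal sketch in Python. It is not the SurveyCTO set-up described in the paper: the function name, the `min_fraction` threshold and the assumption that the time spent on the consent screen is recorded in seconds are all hypothetical, chosen only for illustration.

```python
def consent_audio_flag(audio_seconds, seconds_on_screen, min_fraction=0.95):
    """Return None if the consent audio was (almost) fully played,
    otherwise a short message saying where it was cut off.

    All inputs are hypothetical fields: the length of the recording and
    the time the enumerator spent on the consent screen, both in seconds.
    """
    if seconds_on_screen >= audio_seconds * min_fraction:
        return None
    return (
        f"consent audio left after {seconds_on_screen:.0f}s "
        f"of {audio_seconds:.0f}s"
    )


# A 60-second recording that was skipped after 20 seconds is flagged.
print(consent_audio_flag(60, 20))
```

A supervisor could run a check of this kind over each day's submissions and follow up with the enumerators whose interviews are flagged.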

As part of work to improve questionnaire design, Oxfam’s Impact Evaluation Advisers have A/B tested survey components, comparing different designs of time modules and consent forms. We also set up a range of hard checks and soft checks which flag potential errors in the survey, allowing for follow-up with specific enumerators and re-training if needed.
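The difference between the two kinds of check can be sketched in a few lines of Python. This is an illustration only, not the SurveyCTO or Stata code from the paper. It assumes the common convention that a hard check marks a value that cannot be valid and must be corrected, while a soft check marks a value that is unusual and should be confirmed. The function names, field names and tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    severity: str  # "hard" (must be fixed) or "soft" (should be confirmed)
    message: str


def check_income_shares(shares, tolerance=0.5):
    """Hard check: reported income shares (in percent) must sum to 100."""
    total = sum(shares.values())
    if abs(total - 100) > tolerance:
        return [Flag("hard", f"income shares sum to {total:g}%, expected 100%")]
    return []


def check_spouse_statuses(status_a, status_b):
    """Soft check: two members married to each other report different statuses."""
    if status_a != status_b:
        return [Flag("soft", f"spouses report '{status_a}' and '{status_b}'")]
    return []


# Hypothetical household record with both problems.
flags = check_income_shares({"farming": 80, "trade": 51})
flags += check_spouse_statuses("married", "single")
for flag in flags:
    print(flag.severity, "-", flag.message)
```

Keeping the severity alongside each message lets the survey team decide which flags block a submission and which simply prompt a conversation with the enumerator.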

These are some of the examples described in the paper, alongside the corresponding coding in SurveyCTO, a mobile survey platform, and Stata, a statistical analysis package. Adding these types of features has positive implications for the quality of the data, and therefore the validity of the study. It also helps reduce the time and resources spent on data cleaning, which can instead be invested in other activities, for example sharing results with communities (see Going Digital: Using and sharing real-time data during fieldwork).

Technology alone is not the solution: quality checks ultimately rely on survey supervisors and managers having enough time, expertise and resources to follow up with enumerators and respondents, and to invest in re-training enumerators if needed. We therefore suggest planning which checks are a priority for the study’s validity, and carefully identifying the line between adding value and creating an extra burden for the survey team and the respondents.

Quality checks rarely replace human interaction, and discrepancies in the data should be investigated as a team.

Using digital data collection technologies responsibly has the potential to increase data quality, even when resources are limited. Our hope in releasing this paper, and making these simple examples available alongside SurveyCTO and Stata coding, is that our colleagues will build on them, leading ultimately to greater knowledge, learning and impact.

Download “Improving data quality with digital data collection”
