Big data creates a variety of business possibilities and helps organizations gain competitive advantage through prediction, optimization and adaptability. However, much big data analysis does not consider the impact of errors or inconsistencies across the different sources, of where the data originates, or of how frequently the data is acquired.
This thesis examines big data quality challenges in the context of business analytics. Its intent is to improve knowledge of big data quality issues and of testing big data.
Most of the quality challenges are related to understanding the data, coping with messy source data and interpreting analytical results. Producing analytics requires subjective decisions along the analysis pipeline, and analytical results may not lead to objective truth. Errors in big data are not corrected as they are in traditional data; instead, the focus of testing moves towards process-oriented validation.