Browsing by Author "Fang, Shuqing"

  • Fang, Shuqing (2017)
    Big data is now widely used and developing rapidly, and research in this area is valuable because of the variety of information such data provides. Answering aggregation queries is likewise important in both research and commercial settings. In this paper we introduce a sampling method for answering aggregation queries on realistic massive data within a controlled relative error bound. Unlike related existing work, our experiments use JSON data: Wikipedia records stored as a large JSON file provide a realistic data environment, which makes the results meaningful and trustworthy. We process the Wikipedia JSON file and modify and adapt the sampling algorithm to a given relative error bound. Specifically, we first preprocess the large JSON file, retrieving the attributes of interest and storing the filtered attributes for sampling. We then modify the bucket-division algorithm so that data is divided into buckets according to similarity and the weight of each data group, and answer the aggregation queries on the sampled data. We analyze the experimental results in terms of error bound, confidence, and running time, as well as the relation between error bound and sample size. We expect the observed error to stay within the given bound, so that the results are reliable, and we expect the sampling method to dramatically reduce the running time and space needed to answer aggregation queries.
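
    The pipeline described in the abstract (filter an attribute out of JSON records, divide the values into buckets of similar values, sample each bucket, and answer an aggregation query from the sample) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the `ratio` and `fraction` parameters, the one-record-per-line JSON layout, and the rule "largest value in a bucket is at most `ratio` times the smallest" are all assumptions made for illustration, and the sketch assumes positive numeric values. Under those assumptions the estimated SUM differs from the exact SUM by a relative error of at most `ratio - 1`.

```python
import json
import random


def extract_attribute(json_lines, key):
    """Keep only the numeric attribute of interest from each JSON record.

    Assumes one JSON object per line (hypothetical layout).
    Records without a numeric value under `key` are skipped.
    """
    values = []
    for line in json_lines:
        record = json.loads(line)
        value = record.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(float(value))
    return values


def divide_into_buckets(values, ratio=1.1):
    """Group similar values: within a bucket, max <= ratio * min.

    Assumes positive values; `ratio` is an illustrative parameter.
    """
    buckets = []
    for v in sorted(values):
        # buckets[-1][0] is the smallest value of the current bucket.
        if buckets and v <= buckets[-1][0] * ratio:
            buckets[-1].append(v)
        else:
            buckets.append([v])
    return buckets


def estimate_sum(buckets, fraction, rng):
    """Estimate SUM by sampling each bucket and scaling the sample mean."""
    total = 0.0
    for bucket in buckets:
        # Draw at least one value per bucket so every bucket contributes.
        k = max(1, round(len(bucket) * fraction))
        sample = rng.sample(bucket, k)
        total += sum(sample) / k * len(bucket)
    return total
```

    Because every sampled mean lies between the smallest and largest value of its bucket, a tighter `ratio` gives a smaller error bound at the cost of more buckets and therefore more sampled values.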