GuardSpark++: Fine-grained purpose-aware access control for secure data sharing and analysis in Spark
Jan 1, 2020
Tao Xue
Yu Wen
Bo Luo
Boyang Zhang
Yang Zheng
Yanfei Hu
Yingjiu Li
Gang Li
Dan Meng
Abstract
With the development of computing and communication technologies, extremely large amounts of data have been collected, stored, utilized, and shared, while new security and privacy challenges have arisen. Existing platforms do not provide flexible and practical access control mechanisms for big data analytics applications. In this paper, we present GuardSpark++, a fine-grained access control mechanism for secure data sharing and analysis in Spark. In particular, we first propose a purpose-aware access control (PAAC) model, which introduces new concepts of data processing/operation purposes to conventional purpose-based access control. An automatic purpose analysis algorithm is developed to identify purposes from data analytics operations and queries, so that access control can be enforced accordingly. Moreover, we develop an access control mechanism in Spark Catalyst, which provides unified PAAC enforcement for heterogeneous data sources and upper-layer applications. We evaluate GuardSpark++ with five data sources and four structured data analytics engines in Spark. The experimental results show that GuardSpark++ provides effective access control functionalities with a very small performance overhead (3.97% on average).
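To make the idea of purpose-aware enforcement concrete, the following is a minimal toy sketch in plain Python. It is not the paper's PAAC model, its purpose analysis algorithm, or any Spark Catalyst API. The purpose names (`retrieve`, `compare`, `aggregate`), the operation-to-purpose table, and the helpers `ColumnUse`, `infer_purposes`, and `check_access` are all hypothetical, chosen only to illustrate how purposes inferred from query operations might be checked against a per-column policy.

```python
from dataclasses import dataclass

# Hypothetical mapping from query operations to operation purposes.
# These names are illustrative assumptions, not the paper's taxonomy.
OPERATION_PURPOSE = {
    "select": "retrieve",
    "filter": "compare",
    "avg": "aggregate",
    "count": "aggregate",
}


@dataclass(frozen=True)
class ColumnUse:
    """One use of a column by one operation in a query (toy representation)."""
    column: str
    operation: str


def infer_purposes(query):
    """Collect, per column, the set of purposes implied by the query's operations."""
    purposes = {}
    for use in query:
        purpose = OPERATION_PURPOSE[use.operation]
        purposes.setdefault(use.column, set()).add(purpose)
    return purposes


def check_access(query, policy):
    """Return {column: denied purposes}; an empty dict means the query is allowed."""
    violations = {}
    for column, used in infer_purposes(query).items():
        allowed = policy.get(column, set())  # columns without a policy are denied
        denied = used - allowed
        if denied:
            violations[column] = denied
    return violations


# Hypothetical policy: salary may only be aggregated, never retrieved directly.
policy = {
    "salary": {"aggregate"},
    "department": {"retrieve", "compare", "aggregate"},
}

ok_query = [ColumnUse("department", "select"), ColumnUse("salary", "avg")]
bad_query = [ColumnUse("salary", "select")]
```

Under this toy policy, `check_access(ok_query, policy)` reports no violations, while `check_access(bad_query, policy)` flags `salary` because direct retrieval is not among its allowed purposes. A real enforcement layer would rewrite or reject the query plan rather than just report violations.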
Type
Publication
Proceedings of the 36th Annual Computer Security Applications Conference (ACSAC), 2020. (Accepted, CCF-B)