SparkAC: Fine-Grained Access Control in Spark for Secure Data Sharing and Analytics
May 22, 2022
Tao Xue
Yu Wen
Bo Luo
Gang Li
Yingjiu Li
Boyang Zhang
Yang Zheng
Yanfei Hu
Dan Meng
Abstract
With the development of computing and communication technologies, an extremely large amount of data is being collected, stored, utilized, and shared, which raises new security and privacy challenges. Existing access control mechanisms provided by big data platforms have limitations in granularity and expressiveness. In this article, we present SparkAC, a novel access control mechanism for secure data sharing and analysis in Spark. In particular, we first propose a purpose-aware access control (PAAC) model, which introduces the new concepts of data processing purpose and data operation purpose, as well as an automatic purpose analysis algorithm that identifies purposes from data analytics operations and queries. Moreover, we develop a unified access control mechanism that implements the PAAC model in two modules: GuardSpark++ supports structured data access control in Spark Catalyst, and GuardDAG supports unstructured data access control in Spark core. Finally, we evaluate GuardSpark++ and GuardDAG with multiple data sources, applications, and data analytics engines. Experimental results show that SparkAC provides effective access control functionality with very small (GuardSpark++) or medium (GuardDAG) performance overhead.
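To make the two kinds of purpose more concrete, the following is a minimal sketch of a purpose-aware check in plain Python. It is not SparkAC's algorithm and does not use Spark's Catalyst or RDD APIs. The `Query` class stands in for a logical plan, and the operation purposes `project`, `filter`, and `aggregate` are assumed labels chosen for illustration. The `Policy` class and the `analyze_purposes` function are likewise hypothetical. Each rule grants a subject, acting under a declared processing purpose, a set of allowed operation purposes on one column. Access is granted only if every (column, operation purpose) pair derived from the query is allowed.

```python
from dataclasses import dataclass, field


# Hypothetical toy stand-in for a query's logical plan (NOT Spark Catalyst).
@dataclass(frozen=True)
class Query:
    projected: tuple[str, ...] = ()   # columns returned to the user as-is
    filtered: tuple[str, ...] = ()    # columns referenced only in WHERE conditions
    aggregated: tuple[str, ...] = ()  # columns referenced only inside aggregates


def analyze_purposes(query: Query) -> set[tuple[str, str]]:
    """Derive (column, operation_purpose) pairs; the purpose labels are assumed."""
    pairs = {(c, "project") for c in query.projected}
    pairs |= {(c, "filter") for c in query.filtered}
    pairs |= {(c, "aggregate") for c in query.aggregated}
    return pairs


@dataclass
class Policy:
    # Hypothetical rule store: (subject, processing_purpose, column) -> allowed ops.
    rules: dict[tuple[str, str, str], frozenset[str]] = field(default_factory=dict)

    def allow(self, subject: str, processing_purpose: str, column: str, *ops: str) -> None:
        self.rules[(subject, processing_purpose, column)] = frozenset(ops)

    def check(self, subject: str, processing_purpose: str, query: Query) -> list[tuple[str, str]]:
        """Return the denied (column, operation_purpose) pairs; empty means allowed."""
        denied = []
        for column, op in sorted(analyze_purposes(query)):
            allowed = self.rules.get((subject, processing_purpose, column), frozenset())
            if op not in allowed:
                denied.append((column, op))
        return denied


policy = Policy()
policy.allow("analyst", "research", "patients.age", "aggregate", "filter")
policy.allow("analyst", "research", "patients.zip", "filter")

q_ok = Query(filtered=("patients.zip",), aggregated=("patients.age",))
q_bad = Query(projected=("patients.age",))
print(policy.check("analyst", "research", q_ok))   # []
print(policy.check("analyst", "research", q_bad))  # [('patients.age', 'project')]
```

This design separates why the data is processed (the processing purpose declared with the request) from how each column is used (the operation purpose inferred from the query). As a result, a single rule can permit an aggregate over `patients.age` while still denying a raw projection of the same column. The real system performs its analysis on Spark Catalyst plans (GuardSpark++) and on Spark core DAGs (GuardDAG), not on a toy class like this one.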
Publication
IEEE Transactions on Dependable and Secure Computing (TDSC), 2022. (Accepted, CCF-A)