tutorial

PRIVAaaS: privacy approach for a distributed cloud-based data analytics platforms

Authors:

Wagner Meira, Jr.

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Pages 1108 - 1116

https://doi.org/10.1109/CCGRID.2017.136

Published: 14 May 2017

Abstract

Data privacy is a key challenge that is exacerbated by Big Data storage and analytics processing requirements. Big Data and Cloud Computing are closely related and allow users to access data from any device, which makes data privacy essential because the data sets are exposed through the web. Organizations care about data privacy because it directly affects clients' confidence that their personal data are safe. This paper presents a data privacy approach, PRIVAaaS, and its integration into LEMONADE, a Web-based platform developed to compose ETL (Extract, Transform, Load) processes and Machine Learning workflows. The 3-level approach of PRIVAaaS, based on data anonymization policies, is implemented in a software toolkit that provides a set of libraries and tools for controlling and reducing data leakage in the context of Big Data processing.

Cited By

Silva J, de França B, Rubira C, Cavalcante E, Dantas F, Batista T, Pinto G (2020). Generating Trustworthiness Adaptation Plans Based on Quality Models for Cloud Platforms. Proceedings of the 14th Brazilian Symposium on Software Components, Architectures, and Reuse, pp. 141-150. DOI: 10.1145/3425269.3425272. Online publication date: 19-Oct-2020.
https://dl.acm.org/doi/10.1145/3425269.3425272

Information & Contributors

Information

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

May 2017

1167 pages

ISBN:9781509066100

Publisher

IEEE Press

Qualifiers

Tutorial
Research
Refereed limited

Conference

CCGrid '17

Sponsor:

SIGARCH

CCGrid '17: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

May 14 - 17, 2017

Madrid, Spain

Bibliometrics & Citations

Total Citations: 1
Total Downloads: 94
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025
