DOI: 10.1109/CCGRID.2017.136
tutorial

PRIVAaaS: privacy approach for a distributed cloud-based data analytics platforms

Published: 14 May 2017

Abstract

Data privacy is a key challenge that is exacerbated by Big Data storage and analytics processing requirements. Big Data and Cloud Computing are closely related and allow users to access data from any device, which makes data privacy essential because data sets are exposed through the web. Organizations care about data privacy because it directly affects clients' confidence that their personal data are safe. This paper presents a data privacy approach, PRIVAaaS, and its integration into LEMONADE, a Web-based platform developed to compose ETL (Extract, Transform, Load) processes and Machine Learning workflows. The 3-level approach of PRIVAaaS, based on data anonymization policies, is implemented in a software toolkit that provides a set of libraries and tools for controlling and reducing data leakage in the context of Big Data processing.
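
The abstract does not describe the policy format or the toolkit API, so the following is only a minimal sketch of what policy-driven anonymization can look like in general. The `POLICY` mapping, the technique names (`suppress`, `mask`, `generalize`), and all function names are hypothetical and are not taken from PRIVAaaS or LEMONADE.

```python
# Hypothetical policy: field name -> anonymization technique.
# This is an illustrative format, not the PRIVAaaS policy format.
POLICY = {
    "name": "suppress",
    "email": "mask",
    "age": "generalize",
}


def mask(value, visible=2):
    """Keep the first `visible` characters and replace the rest with '*'."""
    text = str(value)
    return text[:visible] + "*" * max(len(text) - visible, 0)


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record, policy=POLICY):
    """Return a copy of `record` with the policy applied field by field."""
    out = {}
    for field, value in record.items():
        action = policy.get(field)
        if action == "suppress":
            continue  # drop the field entirely
        if action == "mask":
            out[field] = mask(value)
        elif action == "generalize":
            out[field] = generalize_age(value)
        else:
            out[field] = value  # fields without a policy entry pass through
    return out


record = {"name": "Ana Silva", "email": "ana@example.org", "age": 34, "city": "Campinas"}
print(anonymize(record))
```

In this sketch the policy is kept separate from the code that applies it, so the same records can be released under different policies without changing the processing step.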

Cited By

  • (2020) Generating Trustworthiness Adaptation Plans Based on Quality Models for Cloud Platforms. Proceedings of the 14th Brazilian Symposium on Software Components, Architectures, and Reuse, pp. 141-150. DOI: 10.1145/3425269.3425272. Online publication date: 19-Oct-2020

Information & Contributors

Information

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017
1167 pages
ISBN: 9781509066100

Publisher

IEEE Press

Author Tags

  1. LEMONADE
  2. anonymization
  3. cloud-based data analytics platform
  4. data privacy

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CCGrid '17