Patterns for Anonymization, Pseudonymization and Perturbation: Focus Group Report

Published: 10 December 2024

Abstract

Ensuring privacy while sharing sensitive data is critical, particularly in fields such as healthcare and wherever compliance with data protection regulations is required. Anonymization and pseudonymization techniques are essential for preserving individual privacy, but it is challenging to select the most appropriate methods given particular privacy and utility requirements. We conducted a focus group during the EuroPLoP 2024 conference to obtain feedback on the patterns we documented in this space and on the pattern map we outlined, and to identify patterns related to the anonymization or pseudonymization of data that have not yet been documented. Some of the patterns we documented were not known to the participants. On the other hand, we found several techniques that are potentially privacy-preserving patterns not yet documented, and framed these techniques according to the categories of our pattern map. Although the results suggest that our current patterns address some recurring privacy challenges, further exploration and documentation of these techniques are necessary to capture the full range of privacy-preserving solutions.

1 Introduction

Protecting individuals’ identities and sensitive information is crucial when sharing data, in order to comply with the principles of different data protection regulations. The General Data Protection Regulation (GDPR), for example, was established to guarantee that all companies that use or collect data of European citizens keep their identities protected [8]. To fulfill this requirement, the legislation recommends using anonymization or pseudonymization techniques.
Anonymization is a process in which personal data is irreversibly changed to achieve privacy [2]. Various anonymization techniques, such as generalization, suppression, and slicing, are applied to this end. Pseudonymization, on the other hand, does not modify the personal data. Instead, it replaces the personal identifier with a pseudonym [3]. This pseudonym is associated with the personal data and can be created using pseudonymization techniques such as counters and encryption, among others.
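As an illustration, the following is a minimal sketch of counter-based pseudonymization in Python; the records, field names, and pseudonym format are hypothetical and only meant to show the idea of replacing an identifier with a pseudonym kept in a separate mapping.

    # Minimal sketch of counter-based pseudonymization (illustrative only).
    # Records, field names and the pseudonym format are hypothetical.
    from itertools import count

    _counter = count(1)
    _pseudonym_table = {}  # identifier -> pseudonym; must be stored securely if re-identification is needed

    def pseudonymize(identifier: str) -> str:
        """Replace a personal identifier with a counter-based pseudonym."""
        if identifier not in _pseudonym_table:
            _pseudonym_table[identifier] = f"P-{next(_counter):06d}"
        return _pseudonym_table[identifier]

    records = [
        {"name": "Alice Smith", "diagnosis": "flu"},
        {"name": "Bob Jones", "diagnosis": "asthma"},
    ]
    pseudonymized = [{**r, "name": pseudonymize(r["name"])} for r in records]
    # -> [{'name': 'P-000001', 'diagnosis': 'flu'}, {'name': 'P-000002', 'diagnosis': 'asthma'}]
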
Preserving individual privacy is paramount in many scenarios, such as healthcare, where data contains sensitive personal information. When this kind of data needs to be shared or analyzed, it must comply with regulations that safeguard the identities of individuals. Pseudonymization and anonymization techniques can be used to achieve this, but it remains challenging to identify and apply the most suitable methods. The choice of appropriate techniques often depends on the particularities of each project regarding aim, scope, and the prioritization of data privacy or data utility, among other factors. Therefore, it is important to understand which techniques can be used in each context, especially when deciding how to balance data privacy and utility.
Identifying and documenting patterns can bridge the gap in determining the most appropriate strategy for the context of each project. This report presents the validation and pattern-mining results of a focus group [7] conducted during the EuroPLoP 2024 conference. We sought three main results: to obtain feedback on the patterns we have previously described [5, 6], to obtain feedback on the pattern map we outlined, and to identify patterns related to the anonymization or pseudonymization of data that have not yet been documented.
The remainder of this report is organized as follows: Section 2 presents the methods used in this research; Section 3 presents the results of the focus group; Section 4 describes the limitations of the study; Section 5 discusses our findings and future work; and Section 6 summarizes the main findings of this research and outlines research directions.

2 Methods

Our study follows the method presented by Kontio et al. [4], which consists of the following steps: Defining the research problem, Selecting the participants, and Planning and conducting the focus group session. The steps are detailed below.

2.1 Defining the research problem

Our specific aim is to provide insights regarding the following three research questions:
1. To what extent do our documented patterns address recurring problems related to the anonymization of datasets?
2. To what extent does the previously designed pattern map show the main patterns related to the anonymization and pseudonymization of datasets?
3. What techniques that can address recurring problems related to data anonymization and pseudonymization have not yet been documented as patterns?

2.2 Selecting the participants

This focus group was held as part of the EuroPLoP 2024 conference [1]. The conference provided both participants for the activity and a slot in the program to carry it out. During the event, we announced the session, outlining its theme and objectives, and inviting conference attendees to join. In total, five participants took part in the session, in addition to the focus group moderators.

2.3 Planning and conducting the focus group session

The session was designed to last one hour and thirty minutes and to include at least three participants. It begins with each participant introducing themselves and with the moderators introducing the context in which they needed to guarantee data privacy. This step is important for understanding the context of each expert.
Next, the research team briefly presents the concepts of anonymization and pseudonymization. After the initial presentation, we ask whether the participants have used any anonymization or pseudonymization techniques in their professional practice. This question can help put into context the rest of the information gathered from the participants.
We then ask the participants to report which anonymization or pseudonymization techniques they have already used or heard about during their professional practice. This question aims to identify possible pattern solutions used by the participants.
After gathering the cited techniques, we present and explain our pattern map and, together with the participants, attempt to match the cited techniques with the categories on the pattern map. Finally, we try to match the techniques with our own patterns.

3 Results

We found that two participants used anonymization to provide (and/or receive) test data. One participant stated that she understood the concepts but thought she had never used them. However, we noticed that this participant had already used manual pseudonymization techniques to hide IDs in documents before making them available. The other two participants had heard about anonymization and pseudonymization but never used any related technique.
We also found that participants had already used or heard about the following techniques in their professional practice: hide personal data, leave out the data from the dataset, deterministic pseudonym generation, shuffle attribute values, synthetic data that keeps the attributes of real data, don’t store the data, and add noise by adding attributes.
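The technique names above are the participants’ own wording. As one possible interpretation, the following minimal Python sketch shows what shuffle attribute values could look like in practice: the values of a single attribute are permuted across records so that they no longer line up with the original individuals. The records, attribute name, and seed are hypothetical.

    # Minimal sketch of one interpretation of "shuffle attribute values":
    # permute a sensitive column's values across records. Data is hypothetical.
    import random

    def shuffle_attribute(records, attribute, seed=None):
        """Return a copy of the records with the given attribute's values permuted."""
        values = [r[attribute] for r in records]
        rng = random.Random(seed)
        rng.shuffle(values)
        return [{**r, attribute: v} for r, v in zip(records, values)]

    patients = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 47}]
    print(shuffle_attribute(patients, "age", seed=42))
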
After collecting the aforementioned techniques, we presented our pattern map, which features a collection of patterns and their primary connections (cf. Figure 1). The pattern map is divided into three major categories: Anonymization, Pseudonymization, and Perturbation.
Figure 1: Pattern map of our proposed patterns for anonymization, pseudonymization and perturbation.
Within the anonymization category, we included ten patterns. Two subcategories represent the suppression technique (Suppression) and the handling of outliers in the datasets (Handle Outliers). The anonymization category includes the Suppress Identifiers pattern, which can be used as an alternative to the pseudonymization category. The pseudonymization category contains five patterns. The other four pseudonymization patterns can support the Pseudonymize IDs pattern, because they are used to generate pseudonyms.
Finally, the Perturbation category is introduced as an alternative to anonymization. Instead of anonymizing the datasets, perturbation can be used to achieve privacy by adding noise to the data; the two categories thus complement each other. This category includes two patterns that describe the two practices used in perturbation. We believe that this category can accommodate more patterns, documenting the noise techniques used to achieve the two models of differential privacy: global and local.
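As an example of the kind of noise technique such patterns could document, the following minimal Python sketch adds Laplace noise to a counting query, the mechanism commonly used for global differential privacy; the epsilon, sensitivity, data, and query below are illustrative assumptions rather than values prescribed by our patterns.

    # Minimal sketch of perturbation with the Laplace mechanism, commonly used for
    # global differential privacy. Epsilon, sensitivity, data and query are illustrative.
    import numpy as np

    rng = np.random.default_rng(seed=0)

    def noisy_count(true_count: int, sensitivity: float = 1.0, epsilon: float = 1.0) -> float:
        """Release a count with Laplace noise of scale sensitivity / epsilon."""
        return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    ages = [34, 51, 47, 29, 62]
    print(noisy_count(sum(1 for a in ages if a > 40)))
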
After presenting the pattern map, we attempted to place the cited techniques within our pattern map categories. The result is shown in Figure 2. The only technique that we could not categorize was Don’t store the data, because we consider it a preventive technique for data privacy that is not related to anonymization or pseudonymization.
Figure 2: Pattern map of our proposed patterns for anonymization, pseudonymization and perturbation, annotated with the techniques mentioned during the focus group.
Finally, to understand whether our documented patterns address recurring problems, we attempted to match the techniques with our patterns. We found that the techniques Hide personal data and Leave out the data from the dataset are equivalent to our patterns Suppress ID and Suppress QID.
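To illustrate the kind of solution these patterns describe, the following is a minimal Python sketch of suppression in the spirit of Suppress ID and Suppress QID: direct identifiers and selected quasi-identifier columns are removed from each record. The column names and the record are hypothetical, not taken from the patterns themselves.

    # Minimal sketch of suppression in the spirit of Suppress ID / Suppress QID.
    # Column names and the record are hypothetical.
    IDENTIFIERS = {"name", "social_security_number"}
    QUASI_IDENTIFIERS = {"zip_code", "birth_date"}

    def suppress(record: dict, columns: set) -> dict:
        """Return a copy of the record without the suppressed columns."""
        return {k: v for k, v in record.items() if k not in columns}

    record = {"name": "Alice Smith", "zip_code": "4200", "birth_date": "1990-05-01", "diagnosis": "flu"}
    anonymized = suppress(suppress(record, IDENTIFIERS), QUASI_IDENTIFIERS)
    print(anonymized)  # {'diagnosis': 'flu'}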

4 Limitations

Some factors may have influenced the results of this study, limiting the validity of our conclusions. In particular, the small sample of five participants, recruited from the same community, is very likely not fully representative of the range of professional experiences with anonymization and pseudonymization techniques. Validity may also be limited by the participants’ varied understanding of key terms; we mitigated this by clarifying the concepts at the beginning of and during the session.

5 Discussion

The focus group results highlight varying levels of familiarity and experience with anonymization and pseudonymization techniques among the participants, revealing both practical applications and gaps in understanding. The pattern map provided valuable insights, particularly in showing how suppression and pseudonymization patterns overlap or complement each other. For instance, suppression techniques such as Suppress Identifiers can serve as alternatives to pseudonymization, suggesting a fluid boundary between the two categories.
The pattern map categories accommodated most of the techniques elicited from the participants. This shows that although there is still room for documenting and accommodating new patterns, the pattern map encompasses the most important categories of anonymization and pseudonymization patterns.
We also found that one-third of the techniques cited by participants were present in our pattern map under different names. These results suggest that even though the literature documents some practices, further exploration is required to fully capture the range of privacy solutions used by practitioners.

6 Conclusion

We conducted a focus group with five participants during EuroPLoP 2024 to answer three research questions. The focus group results highlight varying levels of familiarity and experience with anonymization and pseudonymization techniques among the participants, revealing both practical applications and gaps in understanding.
Participants mentioned various techniques used in their professional practice and helped us evaluate our documented patterns and pattern map. The answers to the research questions are presented below.
To what extent do our documented patterns address recurring problems related to the anonymization of datasets? We found that one-third of the techniques cited by the participants were documented in our pattern map. This result indicates that our pattern mining technique successfully found recurring problems related to the anonymization of datasets.
To what extent does the previously designed pattern map show the main patterns related to the anonymization and pseudonymization of datasets? The focus group results showed that we successfully designed the most important pattern categories related to the anonymization and pseudonymization of datasets. However, further exploration is required to fully capture the range of privacy solutions used in practice.
What techniques that can address recurring problems related to data anonymization and pseudonymization have not yet been documented as patterns? We found four candidate techniques to be documented as patterns: deterministic pseudonym generation, shuffle attribute values, synthetic data that keeps the attributes of real data, and add noise by adding attributes.
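For instance, one possible reading of deterministic pseudonym generation is deriving a stable pseudonym from an identifier with a keyed hash, so that the same input always yields the same pseudonym; the following minimal Python sketch illustrates this assumption (the key, input, and pseudonym format are hypothetical, and key management is out of scope).

    # Minimal sketch of one possible reading of "deterministic pseudonym generation":
    # a keyed hash (HMAC) maps the same identifier to the same pseudonym on every run.
    # The secret key, input and pseudonym format are hypothetical.
    import hashlib
    import hmac

    SECRET_KEY = b"replace-with-a-securely-stored-key"

    def deterministic_pseudonym(identifier: str) -> str:
        digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"P-{digest[:12]}"

    print(deterministic_pseudonym("alice@example.com"))  # same output for the same input and key
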
In future work, we plan to explore the techniques identified in this focus group session to document them as patterns. Additionally, we plan to hold new focus group sessions with a broader audience to identify other important techniques used in practice and the main recurring problems they resolve.

Acknowledgments

We would like to thank the participants in our focus group at EuroPLoP 2024 for attending the session and helping us think about and discuss this topic: Tim Wellhausen, Thomas Majestrick, Luciane Adolfo, Diogo Maia, and Francisca Almeida.
This work is co-financed by Component 5 - Capitalization and Business Innovation, integrated in the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the period 2021 - 2026, within project HfPT, with reference 41.

References

[1] 2024. EuroPLoP ’24: Proceedings of the 29th European Conference on Pattern Languages of Programs (Irsee, Germany). Association for Computing Machinery, New York, NY, USA.
[2] ENISA. 2018. Recommendations on shaping technology according to GDPR provisions - An overview on data pseudonymisation. https://www.enisa.europa.eu/publications/recommendations-on-shaping-technology-according-to-gdpr-provisions.
[3] ENISA. 2019. Pseudonymisation techniques and best practices. Report/Study. ENISA. https://www.enisa.europa.eu/publications/pseudonymisation-techniques-and-best-practices.
[4] Jyrki Kontio, Laura Lehtola, and Johanna Bragge. 2004. Using the focus group method in software engineering: obtaining practitioner and user experiences. In Proceedings of the 2004 International Symposium on Empirical Software Engineering (ISESE ’04). IEEE, 271–280.
[5] Mariana Monteiro, Filipe F. Correia, Paulo G. G. Queiroz, Rui J. Ramos, Dinis F. Trigo, and Gonçalo C. Gonçalves. 2024. Patterns of Data Anonymization. In Proceedings of the 29th European Conference on Pattern Languages of Programs (Irsee, Germany) (EuroPLoP ’24). Association for Computing Machinery, New York, NY, USA.
[6] Mariana Mirra Monteiro. 2024. Patterns for anonymization and pseudonymization of datasets. Master’s thesis. Faculdade de Engenharia da Universidade do Porto.
[7] Jane Farley Templeton. 1996. The Focus Group: A Strategic Guide to Organizing, Conducting and Analyzing the Focus Group Interview. McGraw-Hill.
[8] Ben Wolford. 2018. What is GDPR, the EU’s new data protection law? https://gdpr.eu/what-is-gdpr/.
