1 Introduction
The past few years have seen a push to ‘shift privacy to the left’, reflecting increasing incorporation of Privacy by Design [10] to address user privacy issues early in the product development lifecycle. Part of that push is implementing efficient security and privacy threat modeling workflows. Threat modeling is an iterative process of analyzing an abstracted version of a system to identify weaknesses or flaws in an application’s architecture, e.g., by analyzing data flows. Frameworks like STRIDE [27] and PASTA [51] have been operationalized to detect security weaknesses and help developers build security into products. A similar push for privacy has inspired research in privacy threat modeling frameworks, the foremost of which is LINDDUN [53], framed in terms of privacy threat mechanisms. As organizations begin to adopt privacy threat modeling processes based on these frameworks, they are beginning to identify operational challenges, including the following:
(1)
These frameworks often provide an incomplete picture of privacy threats. Many privacy threat frameworks are attacker-centric, vulnerability-centric, or asset-centric, and the context of a privacy threat as a whole remains hidden.
(2)
Frameworks like LINDDUN [53] help address the privacy threats of an application overall, but it is difficult to distinguish whether the harm is caused by the application to individuals or done to the application and thereby directed at the organization.
(3)
While adversarial categories of threat actors have been used to map security threats [45], privacy threats are not necessarily caused by malicious actors. A significant portion of privacy threats comes from “benign” actors (as we see in Section 5), who do not intend to cause harm but may use an application in a way that is privacy-invasive to others. For instance, retailers can provide targeted advertisements to consumers based on their purchase history, but these advertisements might inadvertently reveal sensitive details about them, such as whether they are pregnant [23]. Adversarial threat models may miss benign actors as potential threat actors.
(4)
Even when such frameworks provide concrete directions, they are expansive in nature and broadly framed, which makes them time-consuming for developers to use [49].
(5)
Operationalizing privacy threat frameworks requires a high level of domain expertise, which developers without an extensive background in privacy may not have. This makes them reluctant to use such frameworks as an additional step in their development process.
All of these factors make it difficult for framework implementations to scale beyond a few systems. What we need is a developer-centric tool that privacy non-experts can use to make applications more privacy friendly. Such a tool would be easy for developers to adopt: they would not need privacy domain knowledge but could rely on the existing common vocabulary of privacy. It should also provide a more comprehensive picture of a privacy threat and the resulting harm, making threats easier to reproduce across similar applications and simplifying the development of mitigation measures to address them.
In the past few decades, core HCI methods have been applied successfully to improve security and privacy tools. One such user research method is personas [39], widely used by designers to elicit concrete software use cases. Personas have been shown to improve software usability by attempting to emulate the user - their motivations, skills, and goals [25]. We use the same concept of personas but with a change: instead of modeling intended end users of systems, we apply personas to privacy threat actors who can break privacy in an application either intentionally or unintentionally. Furthermore, we provide additional context using established threat mechanism categories and privacy harms to situate the personas.
In this paper we build on existing work to present a personas-based approach to privacy threat modeling. Personas have been shown to improve user interaction with interfaces [11] and to identify edge users facing disproportionate harm whom designers might not have considered in their process [35]. The use of personas in threat modeling, consequently, can help identify threats that can harm users, especially vulnerable groups (we define vulnerable groups as “population groups... which have specific characteristics...” that put them at a higher risk of harm [31]; in the context of this work, such groups are at a higher risk of privacy harms. These characteristics include age, gender, race, and economic status, among others, and can place vulnerable groups at higher risk of privacy threats, such as heightened surveillance [47]).
Our current work follows a similar approach for privacy threat modeling. The general principle is that personas help us think about the user in context and, consequently, gather deeper insight into the user’s background and behavior. For example, Mead et al. identified Personae non Gratae (PnGs) for smart drones and repurposed them to represent a security threat actor instead of an end user [33] by amalgamating known security threat modeling approaches like STRIDE [27] and PASTA [51]. This fictional modeling of user behavior can guide developers in thinking about different threats more effectively. By looking at threat actors in terms of personas, acting through specific threat mechanisms, and generating specific impacts, threat modelers can easily reference privacy threat scenarios. In this paper, we make three contributions:
(1)
We present a derived privacy threat framework called Models of Privacy (MAP) consisting of threat actor type, expertise, mechanism, and impact, by re-purposing personas as threat actors instead of users. This framework, built on the foundation of existing privacy threat modeling literature, is customizable, reproducible, and easy to implement by privacy non-experts. It can be used to create privacy threat personas during threat modeling sessions at the software requirement gathering stage. Threat modelers have the opportunity to add and customize threats according to their use case.
(2)
We present a fictional case study of how to operationalize these privacy personas in a communications organization, showing examples of how personas can describe different privacy threats, both malicious and benign. This case study shows a novel application of personas to privacy threat elicitation.
(3)
Finally, we check this framework against a known set of historical data breaches caused by privacy violations to demonstrate that the framework is extensive enough to cover over ninety percent of privacy threat vectors. Thus, we show that MAP can be used to categorize privacy threats to an application by presenting a use case of categorizing previous privacy breaches.
The paper is organized as follows. In Section 2 we discuss relevant literature on privacy, design, and threat modeling. Sections 3 and 4 outline the MAP framework and discuss an example of operationalizing MAP using personas for a communications organization, respectively. In Section 5, we check the applicability of MAP using a database of previous privacy breaches. Section 6 addresses scaling the framework for practical use and the limitations of our approach. Finally, we close in Section 7 with a discussion of the relevance of this work.
3 Framework Development
In this section, we propose a privacy threat modeling framework called MAP. The goal of this framework is to provide a scalable and customizable solution that addresses privacy threats arising from both malicious actors and benign entities. We can then use personas to leverage this framework for specific applications. The key features of MAP are the following:
•
Flexible: The framework structure makes it easy to add and delete categories as required. For instance, it is easy to add an expert sub-type to a threat actor category if needed. Similarly, if a threat mechanism category is not applicable, for example disclosure of information, it can be removed without changing the structure of the framework.
•
Scalable: The components of the framework form a piece-wise architecture. Thus, it is easy for developers to pick a sub-type from each category and create a persona. If a template is created based on each category, a developer can select one item from each category to automatically generate a persona if desired.
•
Customizable: The framework is independent of industry type and scale. This makes the generated personas easy to customize. For instance, if a developer or a threat modeler selects one item from each of the three component categories, they can customize the resulting persona based on the nature of their industry.
•
Moving away from an attacker-only approach: Literature on security personas has largely focused on adversarial personas, since a majority of security threats originate from attackers. However, privacy threats can be caused by both malicious and benign threat actors. Typically, when we think of threat personas, they are malicious; in several cases, though, the threat actor is benign - they do not know that what they are doing causes a privacy harm. In such instances, the threat actor is not deemed malicious by the organization and may slip through the cracks in traditional threat modeling, since the threat is not an attack on data or a system but on a person. MAP addresses both kinds of threat actors across different scenarios.
In terms of structure, the framework has three main components: (i) Threat Actor (characteristics and expertise), (ii) Threat Mechanism, and (iii) Threat Impact. The overall framework structure is shown in Figure 1. In the following sections, we detail the different components that constitute the framework.
Our central approach to the development of this framework follows the Cause-and-Effect model of risk analysis [38]. Better known as the “Fishbone Model”, this model helps us look at both the cause of an event and its resulting impact. Thus, we follow a holistic approach to understanding the threat: (i) who caused it (threat actor), (ii) how it was caused (threat mechanism), and (iii) what resulted from it (threat impact).
3.1 Threat Actor
The most important component of MAP is the user. Since our ‘user’ is a privacy threat actor, we needed to establish motivation, skills, and goals. For motivation and goals, we decided on two broad categories, neutral and aggressive, recognizing that privacy threat actors can also be people who do not intend to cause harm to the organization. We added two further categories, inside and outside threat actors, depending on whether they are able to access an organization’s restricted systems. Thus, our four major threat actor characteristics are inside aggressive, inside neutral, outside aggressive, and outside neutral. The definitions for each of these levels are given below:
•
Inside Aggressive: Present inside the organization, intends to cause privacy harm
•
Inside Neutral: Present inside the organization, does not intend to cause privacy harm
•
Outside Aggressive: Present outside the organization, intends to cause privacy harm
•
Outside Neutral: Present outside the organization, does not intend to cause privacy harm
In order to account for skills, we divided threat actors into experts and non-experts. We defined expertise as the ability to use ICT (Information and Communication Technology): having at least five years of experience either studying or working in Computer Science, Computer Networking, Information Technology, or a related field. This is based on the common simplified definition of expert and non-expert users in the security literature [24, 55]. The definitions for each of these levels are given in Table 1.
Within these four categories, we identified several sub-types of threat actors that fall into each combination of characteristics and expertise. Table 2 enumerates ten such sub-types, derived from CSAN’s threat actor typology developed in 2016 [8]. Note that ‘related entities’ are considered inside neutral. While they are technically outside an organization - for example, third parties like vendors and suppliers - they are considered inside because of their relationship with the organization and because they have access to organizational information that unrelated entities would not have. Similarly, employees considered ‘non-expert’ are employees who do not actively use their technical skills to cause intentional privacy harm; if they did, they would be considered ‘inside attackers’. Additionally, if customers were ‘experts’ causing privacy harm, they would fall into the ‘individual hacker’ sub-type. These sub-types are not all-encompassing, and more can be added as additional threat actors are identified.
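To make the structure of this component concrete, the sketch below (in Python, as an illustration only) models the actor characteristics and expertise levels as enumerations and shows a partial sub-type mapping. The sub-type labels are drawn from the examples above, while the specific characteristic and expertise groupings shown are illustrative guesses rather than the normative assignment in Table 2.

```python
from enum import Enum

class Characteristic(Enum):
    """The four threat actor characteristics of MAP."""
    INSIDE_AGGRESSIVE = "Inside Aggressive"    # inside the organization, intends harm
    INSIDE_NEUTRAL = "Inside Neutral"          # inside the organization, no intent to harm
    OUTSIDE_AGGRESSIVE = "Outside Aggressive"  # outside the organization, intends harm
    OUTSIDE_NEUTRAL = "Outside Neutral"        # outside the organization, no intent to harm

class Expertise(Enum):
    """Expertise, simplified to ICT experience (>= 5 years of study or work)."""
    EXPERT = "Expert"
    NON_EXPERT = "Non-Expert"

# Partial, illustrative sub-type mapping; Table 2 enumerates the full ten sub-types.
SUB_TYPES = {
    "Inside attacker": (Characteristic.INSIDE_AGGRESSIVE, Expertise.EXPERT),
    "Employee": (Characteristic.INSIDE_NEUTRAL, Expertise.NON_EXPERT),
    "Related entity (e.g., vendor, supplier)": (Characteristic.INSIDE_NEUTRAL, Expertise.NON_EXPERT),
    "Individual hacker": (Characteristic.OUTSIDE_AGGRESSIVE, Expertise.EXPERT),
    "Researcher": (Characteristic.OUTSIDE_NEUTRAL, Expertise.EXPERT),
    "Customer / end user": (Characteristic.OUTSIDE_NEUTRAL, Expertise.NON_EXPERT),
}
```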
3.2 Threat Mechanism
The second component of MAP is threat mechanism. The generic risk model proposed in NIST Special Publication 800-30 Revision 1, Guide for Conducting Risk Assessments [4], provides a standard approach to conducting risk assessments, consisting of (i) threat source, (ii) threat event, (iii) vulnerability, and (iv) impact. Since threat source and threat impact are already accounted for as separate components of MAP, we decided to combine threat event and vulnerability, because we did not find a catalogue of privacy vulnerabilities comparable to the one for security (the CVE database [57]). This combined category was renamed “threat mechanism”. It is a privacy threat library developed by combining the privacy threat categories from the LINDDUN framework [53] with the NIST Privacy Risk Assessment Methodology (PRAM) catalog of problematic data actions [22]. Note that both are non-exhaustive lists of threat mechanisms, and the list can be expanded as new threats are discovered.
The LINDDUN framework has the following threat categories: (i) Linkability, (ii) Identifiability, (iii) Non-repudiation, (iv) Detectability, (v) Disclosure of Information, (vi) Unawareness, and (vii) Non-compliance. Similarly, the NIST PRAM has the following problematic data actions: (i) Appropriation, (ii) Distortion, (iii) Induced Disclosure, (iv) Insecurity, (v) Re-identification, (vi) Stigmatization, (vii) Surveillance, (viii) Unanticipated Revelation, and (ix) Unwarranted Restrictions. We kept all the threat categories from the LINDDUN framework. From PRAM, we kept only distortion, stigmatization, and unanticipated revelation; the remaining problematic data actions are already accounted for in the LINDDUN framework and in libraries developed based on it [50]. There are additional, more granular threat libraries [5] that can be integrated into the threat mechanism categories as appropriate for particular use cases. The resulting threat library has the 10 threat categories listed in Table 3 (along with their definitions). An important thing to note here is that we consider only threats which are already classified as privacy threats and are in addition to security threats. The category ‘Disclosure of Information’, however, is significantly accounted for in security threat modeling due to its data-centric description in LINDDUN. In fact, LINDDUN [53] mentions that this category should have an extended process in security threat modeling. We have nevertheless retained this category to maintain the link between security and privacy threat frameworks; organizations can decide whether to include or exclude it depending on their use case.
3.3 Threat Impact
The third component of the framework is “threat impact”. Threat impact categories are well documented in Citron and Solove’s taxonomy of privacy harms [13]. These privacy harms, however, can affect both individuals and organizations, so we replicated the definition of each of the seven privacy harms for individuals as well as for organizations. We found that for organizations, only three of the seven harms were applicable: physical, economic, and reputational. The other types of harm - discrimination, relationship, autonomy, and psychological - were limited to individuals. We repurposed the definition of physical harm for organizations as damage to assets, operations, or critical infrastructure, and redefined reputational harm for organizations as loss of trust from both customers and business partners. Thus, we have a total of ten privacy harms. We denote organizational harms with the suffix ‘-O’.
We further divided the privacy harms into explicit and implicit, based on whether the harm is measurable. This distinction not only helps us create a more holistic framework but also helps developers focus on which category of impact is applicable to their system. It is consistent with how courts approach privacy harms in terms of ‘showing substantial damage’ [9]. Measurable forms of harm, like physical and economic, are categorized as ‘explicit’, while the other types of harm are categorized as ‘implicit’. The threat impact categories, along with their sub-types and definitions, are presented in Table 4.
Note that a single privacy threat incident could have multiple privacy impacts. For example, a Health Insurance Portability and Accountability Act (HIPAA) [3] breach due to linkability between two data sets can cause harm to individuals as well as to the organization involved in the breach.
Thus, the final framework consists of three components in its expanded form, as shown in Figure 2. As a persona tool, each of these broad categories (four for threat actors, ten for threat mechanisms, and four for threat impact) would be available as a drop-down menu for developers to select from.
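To illustrate how such a selection tool could enumerate the full persona space, the sketch below lists the broad categories named above and generates every actor x mechanism x impact combination. The four impact labels assume the explicit/implicit split for individuals and organizations described in Section 3.3; this is an illustrative sketch, not a prescribed implementation.

```python
from itertools import product

ACTORS = ["Inside Aggressive", "Inside Neutral", "Outside Aggressive", "Outside Neutral"]
MECHANISMS = [
    "Linkability", "Identifiability", "Non-repudiation", "Detectability",
    "Disclosure of Information", "Unawareness", "Non-compliance",
    "Distortion", "Stigmatization", "Unanticipated Revelation",
]
IMPACTS = ["Explicit", "Explicit-O", "Implicit", "Implicit-O"]

# Every combination is a candidate persona situation: 4 x 10 x 4 = 160.
persona_space = list(product(ACTORS, MECHANISMS, IMPACTS))
print(len(persona_space))  # 160
```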
Consequently, we present a unified, scalable framework for privacy threat modeling and operationalize it using personas.
4 Operationalizing the Personas in a Communications Organization
In this section, we demonstrate how personas can be created based on the framework in Section 3.
We leverage developers’ existing familiarity with personas to provide them with an understanding of privacy threats. We have four broad categories of threat actors, ten threat mechanisms, and four threat impacts. This gives us a total of 160 personas. As mentioned before, depending on additional threat actors or mechanisms discovered, this list can be expanded. We then situate these personas for our use case.
The step-by-step process of creating a persona is shown in Figure 3 and described as follows. These are the steps developers would follow to create threat personas for their application. They can consult with threat modelers to identify additional personas in a second iteration.
Step 1: Review the framework. The first step is to understand the different components of the framework and what the different threats mean. This is especially important for developers new to privacy threat modeling.
Step 2: Select relevant categories. For each of the three components, developers select the sub-types of threats that are relevant to the application they are building. For example, they can choose to consider only threat actors of the “Outside Neutral” and “Outside Aggressive” sub-types.
Step 3: Select one category from each of the three components. Personas work as a combination of threat actor, mechanism, and impact. A developer chooses one category from each of the three components - one actor, one mechanism, and one impact. For example, she can choose “Employee” as threat actor, “Identifiability” as threat mechanism, and “Economic-Individual” as threat impact. The next step is to create a persona based on this combination.
Step 4: Create personas. Each combination of three categories provides a situation for a persona. Following the example from the previous step, it could be “Employee (Threat Actor) of Company X can view other employees’ credit card information through application Y (Threat Mechanism). They can use this information to make unauthorized purchases (Threat Impact).” One can try other combinations by keeping the same threat actor and swapping out the mechanism and impact.
Step 5: Apply personas. Multiple personas can be relevant for an application, and all of them need to be listed for a complete picture of potential privacy threats. After the different personas are created, they have to be crafted to fit the specific application. Continuing the previous example: “Employee Banana of Company Telco can view credit card information of other employees in plain text. They can use this information to make unauthorized purchases.” It is then up to the development team to record this persona, communicate it to relevant teams, and come up with a plan to handle this potential threat.
Hypothetical Example
Let us consider a hypothetical use case of a communications organization Telco. Telco is a small communications company working as an internet service provider. Telco has a new division on smart devices and they have been releasing a couple of new devices on their new Telex platform. Telco has a team of threat modelers and developers working together to make sure this product is privacy-preserving.
We now consider MAP to see which components would be applicable for Telco’s use case (the newly developed Telex platform). In Table 5, we list the categories that are applicable for Telex; the categories used in the example threat personas are highlighted in bold. We assume that Telco already has a threat modeling team that models security attackers like cybercriminals, individual hackers, and nation state actors, and we exclude these actors from the possible threat actors for illustration purposes. Furthermore, we assume that Telco is interested in privacy impact to the organization and in some impact categories for individuals. They keep autonomy because a large number of impacts are autonomy harms to individuals (from our validation in Section 5). As for threat mechanisms, Telex already implements mitigation strategies for linkability (for example, access control strategies and segregation) and non-repudiation (for example, maintaining logs and data access requests). Given that threat modeling is an iterative process at Telco, in the first round of threat modeling at the development stage, Telex developers let the Telco user safety team handle distortion and stigmatization threats. Thus, as shown in Table 5, there are four threat actors (seven sub-types), six threat mechanisms, and four threat impacts (six sub-types), giving a total of 96 possible personas.
We listed all possible combinations of threat actor, threat mechanism, and threat impact for the 96 personas and tested their applicability for Telco’s use case. We found 84 personas that were applicable, six possibly applicable, and six not applicable. Note that not all 96 personas are necessarily needed for an application at Telco - this is simply the maximum number of personas that can be generated. It is up to the needs of the application to scale the number of personas up or down based on applicability and risk. For example, if Telco wanted to look at only the top three from each component, they would need fewer personas. Furthermore, it would be up to Telco’s cybersecurity or privacy team to decide which personas to include or exclude as they conduct this exercise for their application.
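As a rough sketch of this scoping step (assuming the six mechanisms that remain after Telco’s exclusions and the four impact categories listed earlier), the scoped persona space and an applicability filter could be enumerated as follows; the filter rule shown is purely hypothetical and would in practice come from Telco’s threat modeling team.

```python
from itertools import product

ACTORS = ["Inside Aggressive", "Inside Neutral", "Outside Aggressive", "Outside Neutral"]
# Six mechanisms left after excluding linkability, non-repudiation, distortion, and stigmatization.
MECHANISMS = [
    "Identifiability", "Detectability", "Disclosure of Information",
    "Unawareness", "Non-compliance", "Unanticipated Revelation",
]
IMPACTS = ["Explicit", "Explicit-O", "Implicit", "Implicit-O"]

candidates = list(product(ACTORS, MECHANISMS, IMPACTS))
print(len(candidates))  # 4 x 6 x 4 = 96

def applicable(actor: str, mechanism: str, impact: str) -> bool:
    """Hypothetical exclusion rule; real rules are decided during brainstorming sessions."""
    return not (actor == "Inside Aggressive" and mechanism == "Unawareness")

shortlist = [c for c in candidates if applicable(*c)]
```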
In the following sections, we describe four of the 84 applicable personas. These are hypothetical personas modeled after actual privacy breaches to provide the reader more context, but have been reframed to fit Telco’s narrative. In practice, Telco’s cybersecurity team would conduct brainstorming sessions to walk developers through these combinations for each threat actor persona and test if they are applicable. Whether these threats have been handled or not should ideally be documented to note the privacy posture of the application.
A noticeable aspect of the personas described below is the use of fruit names instead of people’s names for threat actors. While personas are typically given the attributes (name and characteristics) of a person, we refrained from doing so in order to limit the projection of descriptive biases. For instance, we did not want ‘hacker’ personas to be of a specific gender or race.
Furthermore, such attributes have been found to be distracting and misleading instead of adding meaning to a persona [32]. Developers may choose to use different attributes for threat actors, but we have chosen fruit personas for illustration purposes.
We also provide a ‘persona card’ created using Figma which developers can use to map privacy threats.
4.1 Persona 1: Outside Neutral (Expert) + Unanticipated Revelation + Implicit-O
Telex is a new application for Telco’s smart home devices that lets users measure their cycling distance and create a route map in the app. Security researchers find that this application unexpectedly discloses information about the location of military bases [7] and poses a high security risk. The disclosure can lead to wide press coverage about Telex.
A privacy threat such as this could be described by the following MAP categories.
4.1.1 Threat Actor: Outside Neutral, Expert (Researcher).
Coconut is a security researcher at Neverland University, Telco’s research partner at their headquarters in Hawaii. Coconut has a doctoral degree in Computer Science, specializes in location privacy, and looks at how user location can be protected across Telco’s products. They are examining one of the applications on Telex, Telco’s new offering that helps fitness enthusiasts track their route while jogging, running, or cycling.
4.1.2 Threat Mechanism: Unanticipated Revelation.
The United States armed forces contracted Telco to provide its services to measure jogging duration for its personnel. However, the location map generated by Telex is publicly available on the Telex website so that people can see their route data and share it with others. By observing a heat map of the routes for a specific area, anyone on the web can potentially infer the location of military bases, which show a high density of such route data.
4.1.3 Threat Impact: Implicit-O - Reputational (Organization).
When Coconut finds this, they can release it to the press instead of informing Telco, so that people learn about it immediately. This can lead to a loss of trust between the armed forces contractors and Telco, since they were not informed about the public availability of Telex data and it poses a national security risk for them. Furthermore, it harms Telco’s public reputation. Note that it also causes distress to users, which can be recorded as an additional impact.
This persona is described using the card in Figure 4.
4.2 Persona 2: Outside Aggressive (Expert) + Disclosure of Information + Explicit
An identity theft company gets access to a large credit reporting organization’s data. This data also contains personal information about thousands of users who have bought Telco’s smart home devices for home insurance purposes. The identity theft company can publish the dataset containing Telco’s users’ personal information on an online black market for profit. A privacy threat such as this could be described by the following MAP categories.
4.2.1 Threat Actor: Outside Aggressive, Expert (Private Organization).
Watermelon is an experienced hacker working at an identity theft company. This company profits by selling identification documents after stealing those from real people. Watermelon’s job is to find large exposed databases containing personal information.
4.2.2 Threat Mechanism: Disclosure of Information.
Watermelon, posing as a law enforcement official, creates a fake subpoena to ask Telco’s legal team for customer records containing the personal and financial information of thousands of customers who access the credit reporting facilities of the breached agency. They then turn this data over to their employer, who in turn sells the data for profit.
4.2.3 Threat Impact: Explicit - Economic.
The published data contains not only financial information but also usage data for the smart home devices linked to personal information such as users’ full names. Cybercriminals can purchase this data and then buy gift cards worth several hundred dollars using the credit card information.
This persona is described using the persona card in Figure 5.
4.3 Persona 3: Outside Neutral (Expert) + Unawareness + Implicit
An activism group releases the personal information of thousands of citizens to show that government websites were insecure, without the knowledge of these citizens and without regard for the future consequences this may have [26].
A privacy threat such as this could be described by the following MAP categories.
4.3.1 Threat Actor: Outside Neutral, Expert (Regulators or Privacy Activists).
Orange participates in an activism group. They are passionate about user privacy and strongly believe that organizations should be audited for better privacy controls. In their free time, Orange audits fitness applications (like those on Telex) to test whether personal information of citizens can be extracted.
4.3.2 Threat Mechanism: Unawareness.
Orange finds that the online fitness portal developed by the Telex team has an archive section that allows anyone to view the health data of people who use the fitness application. Orange releases the personal information of thousands of users who have registered on the portal to show how exposed it is.
4.3.3 Threat Impact: Implicit - Autonomy.
The Telex application’s archive section is in a hidden part of the portal. Users have no knowledge that their historical data is being stored by the portal and have no control over restricting access to their data.
This persona is described using the card in Figure 6.
4.4 Persona 4: Outside Neutral (Non-Expert) + Identifiability + Explicit
Telex adds support for a new device Telco launched, called TelexTags. TelexTags can be used to share the live location of any object they are attached to by hopping onto the signal of nearby Telco phones. Non-expert users can use TelexTags to identify and locate a person’s car.
A privacy threat such as this could be described by the following MAP categories.
4.4.1 Threat Actor: Outside Neutral, Non-Expert (End user).
Telco has recently launched a real-time location tracking device called TelexTag to help customers easily find objects like keys and phones. Jackfruit bought and recently started using a TelexTag for their keys. At a restaurant, Jackfruit saw a person whose car they liked and decided to attach the TelexTag to that car so that they would know where it is in real time.
4.4.2 Threat Mechanism: Identifiability.
TelexTags connect to and use the network of surrounding Telco phones to transmit the real-time location of an object. This way, if a Telco phone is near the tag - carried by its owner or by anyone passing by - the TelexTag can report the object’s location. The TelexTag attached to the car provides the real-time location of the person Jackfruit is following.
4.4.3 Threat Impact: Explicit - Physical.
The person who is being followed can have their car stolen because Jackfruit is able to accurately locate where the car is kept. Jackfruit can also steal the car from the last location of the TelexTag.
This persona is described using the card in Figure 7.
5 Use Case: Categorizing Privacy Breaches Using MAP
Once the framework development was complete, we needed to test whether MAP can sufficiently describe real-world privacy threats so that it can help categorize the privacy threats developers may find. To do this, we looked at a repository of historical privacy breaches across different organizations to see whether each described privacy threat can be categorized using our framework. This repository serves as our use case study, showing that the categories of actor, mechanism, and impact cover a wide range of privacy breaches. Hence, personas based on the MAP framework can account for a wide range of threats.
For this use case, we needed a breach database with enough information about the threat actor, the mechanism of the threat (how it happened), and the impact. We used an open-source repository of data breaches called the VERIS Community Database (VCDB) [1]. It contains information on threat actors, categories of breach, a summary of each incident along with the information source, and impact. It is a non-exhaustive database of breaches that have occurred across the world and has been widely used in the literature for data breach studies [30, 41, 54].
We selected all breaches containing the word “privacy” from the list of data breaches that were verified and marked as validated (i.e., incidents which have been checked for correctness by the VCDB team). Selecting incidents from 2015 to 2021 gave us a total of 208 incidents. We then manually went through their summaries to verify that each description matched a real privacy breach event. One incident was not relevant to privacy and was removed, giving a total of 207 incidents. For our purpose, this was a sufficient sample size to be representative of privacy breaches at a 95% confidence level with a 7% margin of error (within the commonly accepted margin of error of 8%).
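As a rough check of this claim, the margin of error for a simple random sample can be approximated as z * sqrt(p(1 - p) / n); the sketch below assumes the conventional worst-case proportion p = 0.5 and a large underlying population.

```python
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Approximate margin of error at 95% confidence (z = 1.96) for a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(207), 3))  # ~0.068, i.e., roughly a 7% margin of error
```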
We relied on the summary and the description of the privacy incident given in the sample to categorize each incident according to MAP. One researcher went through the summary of each incident, referring back to the sources provided in VCDB to cross-check the summaries, and noted the threat actor, mechanism, and impact for each of the 207 incidents. This classification was then reviewed by another researcher to ensure that it was adequate and not missing information. If an incident had a complete set of at least one actor, one mechanism, and one impact, they marked that MAP was sufficient to describe that specific incident. For example, note the following incident summary:
“Nova Scotia’s privacy commissioner says she’s shocked by how a grocery-store pharmacist was able to snoop into the electronic personal health information of dozens of people she knew.”
We noted the threat actor as ‘Inside Neutral - Employee’, the threat mechanism as ‘Identifiability’, and the threat impact as ‘Implicit - Autonomy’ from the summary and additional information in the source provided. There could be an additional impact to the grocery store as ‘Implicit - Reputational-O’. Incidents where at least one of the categories was missing were marked as non-classified. For example, consider the incident summary quoted below:
“As part of an ongoing investigation of a VA employee, an allegation has been made that this employee used his VA Outlook account to send sensitive information outside the VA. The Privacy Officer (PO) has been asked to investigate and make a determination whether any privacy and or security rules have been violated.”
While we could classify the threat actor as ‘Inside Neutral - Employee’ and threat mechanism as ‘Non-compliance’, the summary and related links did not have enough information about threat impact. We marked such incidents as non-classified.
From our sample of 207 incidents, 183 (88.4%) matched the framework categories and 24 (11.6%) did not. Among these 24, 17 breaches did not have enough information about the threat actor or mechanism. The remaining seven incidents could not be categorized into a threat mechanism since they occurred due to an error (for example, a fax sent to person A instead of person B due to an employee’s error and reported as a privacy breach). Even though these incidents could be classified under the ‘Identifiability’ mechanism of our framework, we chose not to include this specific set of incidents, because all such errors in the VCDB were HIPAA violations and did not have enough information about whether they involved sensitive or personally identifiable information. Developers, when they categorize possible threats, can decide whether to include such incidents as a new threat category or under ‘Identifiability’. Thus, we were able to categorize 183 of 190 (96.3%) breaches, after excluding the 17 which lacked enough information.
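For reference, the reported percentages follow directly from these counts:

```python
total, matched = 207, 183
insufficient_info = 17  # part of the 24 non-classified incidents; the other 7 were error-only

print(round(matched / total * 100, 1))                        # 88.4
print(round(matched / (total - insufficient_info) * 100, 1))  # 96.3
```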
In Table 6, we list the categories that were most common in each of the components - threat actor, threat mechanism, and threat impact.
Several interesting observations emerged from this use case analysis. First, a large portion of privacy breaches came not from malicious actors but from benign actors; the largest number of incidents came from “Inside Neutral” actors (152 incidents). This indicates that organizations may be falling short on adequate internal privacy controls or privacy training for entities within the organization, especially entities who have no intention of causing harm to the organization. Second, we found that 158 incidents occurred due to non-compliance, disclosure of information, and identifiability - all of which already have recommended mitigation measures in the literature [4, 53]. Organizations which conduct privacy compliance assessments have de-identification and other policies in place to prevent these from happening, so this non-conformance indicates a possible need to check whether such policies are well enforced. Third, we were interested in which categories of privacy harm occur most frequently. For this use case, since our goal was to see whether the framework maps adequately, we considered at most one impact category, the one indicated in the incident summary. While implicit harm to both individuals and organizations was the largest category, it is quite possible that one incident could fit into two or more categories. Either way, implicit harm was the predominant category of harm (73 + 43 = 116 instances in Table 6).
This use case shows how MAP can be used to categorize different potential privacy threats in applications and find which threat actors and mechanisms are the biggest contributors.
6 Discussion and Scaling the Framework
The four example personas in Section 4 show how components of MAP can be selected to represent a privacy threat by looking at both who a threat actor persona can be and what harm they can cause. Personas also make it easier to communicate these risks to other relevant stakeholders. Similarly, Section 5 shows how MAP can be used to classify threats. The use case can help a developer or a threat modeler understand privacy threats more structurally, using previously known literature as a basis. Since we combine privacy vocabulary that is already widely known, this helps developers reduce the time spent learning about these threats.
6.1 Scaling
While MAP as a framework can provide structure for classifying privacy threats, personas based on MAP can help developers proactively operationalize threats. Systematic generation of personas is essential in such cases. However, with 160 personas (four actor categories, ten mechanisms, and four impact categories), coming up with a persona for every applicable combination can be time-consuming. As a potential way of reducing this effort, we propose a dictionary-based scaling approach, which can be created by cybersecurity teams at organizations based on their threat definitions.
The first step would be to create dictionary entries of components (threat actors, threat mechanisms, and threat impacts), with a generic description for categories within each component. We can then insert the dictionary entries as required into a template. The template will take three categories from the component dictionaries, which are the three components in the personas framework – Threat Actor, Threat Mechanism, and Threat Impact. For example, a template could look like:
[Threat Actor]. They cause a data breach by [Threat Mechanism]. This incident leads to [Threat Impact].
As an example, let us consider from each of the dictionaries, [Threat Actor] as Inside Aggressive (Inside Attacker), [Threat Mechanism] as Identifiability, and [Threat Impact] as Implicit-O (Reputational). The category entries in the dictionary would be:
•
[Threat Actor] : Mango is a seasoned IT technician at Telco with 10 years of working in the industry. Due to a management conflict, they are wrongfully accused of deleting company information and fired. Mango is an expert developer with years of experience in open-source tools.
•
[Threat Mechanism] : releasing confidential company records which can identify other employees and contractors through their personal information.
•
[Threat Impact] : loss of public trust and trust of other businesses who are associated with the company.
This dictionary can have an entry for every category of actor, mechanism, and impact similar to the one shown above. For the previous example, the final persona generated would be the following:
[Mango is a seasoned IT technician at Telco with 10 years of working in the industry. Due to a management conflict, they are wrongfully accused of deleting company information and fired. Mango is an expert developer with years of experience in open-source tools]. They cause a data breach by [releasing confidential company records which can identify other employees and contractors through their personal information.]. This incident leads to [loss of public trust and trust of other businesses who are associated with the company.].
Once the dictionary is ready, with a description of each of the threat actor, mechanism, and impact categories, a persona can be automatically generated as a developer chooses from each component. This way, there is no need to hand-craft personas for all combinations - the only requirement is a 4 x 10 x 4 (actor x mechanism x impact) dictionary of descriptions.
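A minimal sketch of this dictionary-and-template approach is shown below. The entry texts are the illustrative Mango example from above, while the nested-dictionary structure and function names are one possible implementation rather than a prescribed one; a full dictionary would hold four actor, ten mechanism, and four impact entries.

```python
# One example entry per component dictionary; keys follow the category labels used in MAP.
ACTOR_DICT = {
    "Inside Aggressive (Inside Attacker)": (
        "Mango is a seasoned IT technician at Telco with 10 years of working in the industry. "
        "Due to a management conflict, they are wrongfully accused of deleting company "
        "information and fired. Mango is an expert developer with years of experience in "
        "open-source tools"
    ),
}
MECHANISM_DICT = {
    "Identifiability": (
        "releasing confidential company records which can identify other employees and "
        "contractors through their personal information"
    ),
}
IMPACT_DICT = {
    "Implicit-O (Reputational)": (
        "loss of public trust and trust of other businesses who are associated with the company"
    ),
}

TEMPLATE = "{actor}. They cause a data breach by {mechanism}. This incident leads to {impact}."

def generate_persona(actor_key: str, mechanism_key: str, impact_key: str) -> str:
    """Fill the template with the dictionary entries a developer selects from each component."""
    return TEMPLATE.format(
        actor=ACTOR_DICT[actor_key],
        mechanism=MECHANISM_DICT[mechanism_key],
        impact=IMPACT_DICT[impact_key],
    )

print(generate_persona("Inside Aggressive (Inside Attacker)",
                       "Identifiability",
                       "Implicit-O (Reputational)"))
```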
This dictionary would have two additional caveats. First, threat modelers should provide rules based on an application’s use case to prevent the generation of improbable personas; for example, if a combination of threat actor, mechanism, and impact is not possible, this scaling should not generate a persona. Second, there would be some loss of context for each application, as the dictionary entries for threat mechanisms can be quite generic. It will then be up to the development teams to interpret the applicability of such personas in the context of their application.
6.2 Limitations and Future Work
Our work presents a personas-based approach to threat modeling in privacy. As such, the framework will benefit from continuous user testing with developers and threat modelers. In future work, we hope to include user studies that inform the personas and test their usability on real-world applications. Such user testing is imperative to the development of better threat personas [21, 32] as privacy threats evolve. Nevertheless, the personas have received input and positive feedback from a threat modeling team at a large communications company, which is in the process of adopting the framework. With the help of threat modelers and product teams, we were able to pilot the MAP personas on two sample applications to find and classify some privacy threats. In this study, we present privacy threat personas as one way of understanding, finding, and communicating privacy threats from developers to stakeholders who are privacy non-experts. As indicated in Section 5, a privacy incident can have multiple impact categories. For instance, a data breach can cause both economic harm to customers and reputational harm to an organization. The impact from such cases should be considered separately while addressing privacy threats. Similarly, a threat actor could be both a Customer (Outside Neutral) and an Employee (Inside Neutral). In such cases, we looked at each breach to determine their role; if they acted in the capacity of an employee, they were categorized as an Employee (Inside Neutral).
One limitation of the VCDB that we used for validation is that a large portion of its privacy breaches are healthcare-related. This is because the list of breaches is sourced heavily from databreaches.net, which maintains a repository of HIPAA-related privacy violations. It can be argued that the VCDB is primarily a security-focused database of incidents. While we did find incidents across several categories, future validation could include other privacy-focused databases, especially those which track privacy lawsuits. Additional information sources and databases will be helpful for understanding a wider range of threats.
For automatic generation of personas (Section 6.1), a dictionary of inputs could be used to generate text instead of a fixed set of inputs. The current MAP framework is meant to be generic and applicable across all industry types. The flexible nature of the framework, however, makes it easy for industries to craft their own contextually appropriate privacy personas.
7 Conclusion
In this work, we draw from existing literature to construct MAP, a framework for privacy threat modeling. We then demonstrate how MAP can be used to create privacy personas for a hypothetical application. Finally, we show that MAP can also help categorize threats from and to an application through a use case of previous privacy breaches.
Privacy personas, in particular, can be used at any stage of the development process, but they are most effective in the planning and threat modeling stages. To incorporate Privacy by Design, it is desirable to introduce privacy threat modeling early in the design process so that development teams can put controls in place to address the threats that emerge. Once a data flow diagram is finalized and all application components are well understood, privacy personas can be applied to test the design against potential threats. Privacy personas generated using MAP can also be used to evaluate an application post deployment. The threat analyst has to consider the applicable categories from the framework for the application; it is then up to the developers to accept, mitigate, avoid, or transfer the risk from the potential threats.