Abstract
This chapter is concerned with setting up practical guardrails within the research activities and environments of Computational Social Science (CSS). It aims to provide CSS scholars, as well as policymakers and other stakeholders who apply CSS methods, with the critical and constructive means needed to ensure that their practices are ethical, trustworthy, and responsible. It begins by providing a taxonomy of the ethical challenges faced by researchers in the field of CSS. These are challenges related to (1) the treatment of research subjects, (2) the impacts of CSS research on affected individuals and communities, (3) the quality of CSS research and to its epistemological status, (4) research integrity, and (5) research equity. Taking these challenges as motivation for cultural transformation, it then argues for the incorporation of end-to-end habits of Responsible Research and Innovation (RRI) into CSS practices, focusing on the role that contextual considerations, anticipatory reflection, impact assessment, public engagement, and justifiable and well-documented action should play across the research lifecycle. In proposing the inclusion of habits of RRI in CSS practices, the chapter lays out several practical steps needed for ethical, trustworthy, and responsible CSS research activities. These include stakeholder engagement processes, research impact assessments, data lifecycle documentation, bias self-assessments, and transparent research reporting protocols.
4.1 Introduction
Since its inception, one of the great promises of Computational Social Science (CSS) has been the possibility of leveraging a variety of algorithmic techniques to gain insights and identify patterns in big social data that would have otherwise been unavailable to the researchers and policymakers who had to draw on more traditional, non-computational approaches to the study of society. By applying computational methods to the vast amounts of data generated from today’s complex, digitised, and datafied society, CSS for policy is well placed to generate empirically grounded inferences, explanations, theories, and predictions about human behaviours, networks, and social systems, which not only effectively manage the volume and high dimensionality of big data but also, in fact, draw epistemic advantage from their unprecedented breadth, quantity, depth, and scale. This chapter is concerned with fleshing out the myriad ethical challenges faced by this endeavour. It aims to provide CSS scholars, as well as policymakers and other stakeholders who apply CSS methods, with the critical and constructive means needed to ensure that their research is ethical, trustworthy, and responsible.
Though some significant attempts to articulate the ethical stakes of CSS have been made by scholars and professional associations over the past two decades,Footnote 1 the scarcity of ethics in the mainstream labours of CSS, and across its history,Footnote 2 signals a general lack of awareness that is illustrative of several problematic dimensions of current CSS research practices that will motivate the arguments presented in this chapter. It is illustrative insofar as the absence of an active recognition of the ethical issues surrounding the social practice and wider human impacts of CSS may well shed light on a troublesome disconnection that persists between the self-understanding of CSS researchers who implicitly see themselves largely as neutral and disinterested scientists operating within the pure, self-contained confines of the laboratory or lecture hall, on the one hand, and the lived reality of their existence as contextually situated scholars whose framings, subject matters, categories, and methods have been forged in the crucible of history, society, and culture, on the other.
To be sure, when CSS researchers assume a scientistic “view from nowhere”Footnote 3 and regard the objects of their study solely through quantitative and computational lenses, they run two significant risks. First, they risk assuming positivistic attitudes that frame the objects of their study through the quantifying and datafying lenses of models, formalisms, behaviours, networks, simulations, and systems, thereby setting aside or trivialising ethical considerations in an effort to get to the real science without further ado. When the objects of the study of CSS are treated solely as elements of automated information analysis rather than as human subjects—each of whom possesses a unique dignity and is thus, first and foremost, worthy of moral regard and interpretive care—scientistic subspecies of CSS are liable to run roughshod over fundamental rights and freedoms like privacy, autonomy, meaningful consent, and non-discrimination with a blindered view to furthering computational insight and data quantification (Fuchs, 2018; Hollingshead et al., 2021). Second, they risk seeing themselves as operationally independent or even immune from the conditioning dynamics of the social environments they study and in which their own research activities are embedded (Feenberg, 1999, 2002). This can create conditions of deficient reflexivity—i.e., defective self-awareness of the limitations of one’s own standpoint—and ethical precarity (Leslie et al., 2022a). As John Dewey long ago put it, “the notion of the complete separation of science from the social environment is a fallacy which encourages irresponsibility, on the part of scientists, regarding the social consequences of their work” (Dewey, 1938, p. 489).
In the case of CSS for policy, the price of this misperceived independence of researchers from the formative dynamics of their sociohistorical environment has been extremely high. CSS practices have developed and matured in an age of unprecedented sociotechnical sea change—an age of unbounded digitisation, datafication, and mediatisation. The cascading societal effects of these revolutionary transformations have, in fact, directly shaped and implicated CSS in its research trajectories, motivations, objects, methods, and practices. The rise of the veritably limitless digitisation and datafication of social life has brought with it a corresponding impetus—among an expanding circle of digital platforms, private corporations, and governmental bodies—to engage in behavioural capture and manipulation at scale. In this wider societal context, the aggressive extraction and harvesting of data from the digital streams and traces generated by human activities, more often than not, occur without the meaningful consent or active awareness of the people whose digital and digitalised livesFootnote 4 are the targets of increasing surveillance, consumer curation, computational herding, and behavioural steering. Such extractive and manipulative uses of computational technologies also often occur neither with adequate reflection on the potential transformative effects that they could have on the identity formation, agency, and autonomy of targeted data subjects nor with appropriate and community-involving assessment of the adverse impacts they could have on civic and social freedoms, human rights, the integrity of interpersonal relationships, and communal and biospheric well-being.
The real threat here, for CSS, is that the prevailing “move fast and break things” attitude possessed by the drivers of the “big data revolution”, and by the beneficiaries of its financial and administrative windfalls, will simply be transposed into the key of the data-driven research practices they influence, making a “research fast and break things” posture a predominant disposition. This threat to the integrity of CSS research activity, in fact, derives from the potentially inappropriate dependency relationships which can emerge from power imbalances that exist between the CSS community of practice and those platforms, corporations, and public bodies who control access to the data resources, compute infrastructures, project funding opportunities, and career advancement prospects upon which CSS researchers rely for their professional viability and endurance. Here, the misperceived independence of researchers from their social environments can mask toxic and agenda-setting dependencies.
Taken together, these downstream hazards signal potential deficits in the social responsibility, trustworthiness, and ethical permissibility of CSS practices. To confront such hazards, this chapter will first provide a taxonomy of ethical challenges faced by CSS researchers. These are (1) challenges related to the treatment of research subjects, (2) challenges related to the impacts of CSS research on affected individuals and communities, (3) challenges related to the quality of CSS research and to its epistemological status, (4) challenges related to research integrity, and (5) challenges related to research equity. Taking these challenges as a motivation for cultural transformation, it will then argue for the incorporation into CSS practices of end-to-end habits of Responsible Research and Innovation (RRI), focusing, in particular, on the role that contextual considerations, anticipatory reflection, public engagement, and justifiable and well-documented action should play across the research lifecycle. The primary goal of this focus on RRI is to centre the understanding of CSS as “science with and for society” and to foster, in turn, critical self-reflection about the consequential role that human values, norms, and purposes play in its discovery and design processes and in considerations of the real-world effects of the insights and tools that these processes yield. In proposing the inclusion of habits of RRI in CSS practices, the chapter lays out several practical steps needed for ethical, trustworthy, and responsible CSS research activities. These include stakeholder engagement processes, research impact assessments, data lifecycle documentation, bias self-assessments, and transparent research reporting protocols.
4.2 Ethical Challenges Faced by CSS
A preliminary step needed to motivate the centring of Responsible Research and Innovation practices in CSS is the identification of the range of ethical challenges faced by its researchers. These challenges can be broken down into five categories:
1. Challenges related to the treatment of research subjects. These challenges have to do with the interrelated aspects of confidentiality, data privacy and protection, anonymity, and informed consent.
2. Challenges related to the impacts of CSS research on affected individuals and communities. These challenges cover areas such as the potential adverse impacts of CSS research activities on the respect for human dignity and on other fundamental rights and freedoms.
3. Challenges related to the quality of CSS research and to its epistemological status. Challenges related to the quality of CSS research include erroneous data linkage, dubious “ideal user assumptions”, the infusion of algorithmic influence in observational datasets of digital traces, the “illusion of the veracity of volume”, and blind spots vis-à-vis non-human data generation that undermine data quality and integrity. Challenges related to the epistemological status of CSS include the inability of computation-driven techniques to fully capture non-random missingness in datasets and sociocultural conditions of data generation and hence a broader tendency to potentially misrepresent the real social world in the models, simulations, analyses, and predictions it generates.
4. Challenges related to research integrity. These challenges are rooted in the asymmetrical dynamics of resourcing and influence that can emerge from power imbalances between the CSS research community and the corporations and public agencies upon whom CSS scholars rely for access to the data resources, compute infrastructures, project funding opportunities, and career advancement prospects they need for their professional subsistence and advancement.
5. Challenges related to research equity. These challenges include the potential reinforcement of digital divides and data inequities through biased sampling techniques that render digitally marginalised groups invisible as well as potential aggregation biases in research results that mask meaningful differences between studied subgroups and therefore hide the existence of real-world inequities. Research equity challenges may also derive from long-standing dynamics of regional and global inequality that may undermine reciprocal sharing between research collaborators from more and less resourced geographical areas, universities, or communities of practice.
Let us expand on each of these challenges in turn.
4.2.1 Challenges Related to the Treatment of Research Subjects
When identifying and exploring challenges related to the treatment of research subjects in CSS, it is helpful to make a distinction between participation-based and observation-based research, namely, between CSS research that is gathering data directly from research subjects through their deliberate involvement in digital media (e.g., research that uses online methods to gather data by way of human involvement in surveys, experiments, or participatory activities) and CSS research that is investigating human action and social interaction in observed digital environments, like social media or search platforms, through the recording, measurement, and analysis of digital life, digital traces, and digitalised life (Eynon et al., 2017). Though participation-based and observation-based research raise some overlapping issues related to privacy and data protection, there are notable differences that yield unique challenges.
Several general concerns about privacy preservation, data protection, and the responsible handling and storage of data are common to participation-based and observation-based CSS research. This is because empirical CSS research often explores topics that require the collection, analysis, and management of personal data, i.e., data that can uniquely identify individual human beings. Although CSS research frequently spans different jurisdictions, which may have diverging privacy and data protection laws, responsible research practices that aim to optimally protect the rights and interests of research subjects in light of risks posed to confidentiality, privacy, and anonymity should adhere to the highest standards of privacy preservation, data protection, and the responsible handling and storage of data. They should also establish and institute proportionate protocols for attaining informed and meaningful consent that are appropriate to the specific contexts of the data extraction and use and that cohere with the reasonable expectations of the targeted research subjects.
Notwithstanding this common footing for ethics considerations related to data protection and the privacy of research subjects, participation-based and observation-based approaches to CSS research each raise distinctive issues. For researchers who focus on online observation or who use data captured from digital traces or data extracted from connected mobile devices, the Internet of Things, public sensors and recording devices, or networked cyber-physical systems, coming to an appropriate understanding of the reasonable expectations of research subjects regarding their privacy and anonymity is a central challenge. When observed research subjects move through their synchronous digital and connected environments striving to maintain communication flows and coherent social interactions, they must navigate moment-to-moment choices about the disclosure of personal information (Joinson et al., 2007). In physical public spaces and in online settings, the perception of anonymity (i.e., of the ability to speak and act freely without feeling like one is continuously being identified or under constant watch) is an important precondition of frictionless information exchange and, correspondingly, of the exercise of freedoms of movement, expression, speech, assembly, and association (Jiang, 2013; Paganoni, 2019; Selinger & Hartzog, 2020).
On the internet, moreover, an increased sense of anonymity may lead data subjects to more freely disclose personal information, opinions, and beliefs that they may not have shared in offline milieus (Meho, 2006). In all these instances of perceived anonymity, research subjects may act under reasonable expectations of gainful obscurity and “privacy in public” (Nissenbaum, 1998; Reidenberg, 2014). These expectations are responsive to and bounded by the changing contexts of communication, namely, by contextual factors like who one is interacting with, how one is exchanging information, what type of information is being exchanged, how sensitive it is perceived to be, and where and when such exchanges are occurring (Quan-Haase & Ho, 2020). This means not only that the protection of privacy must, first and foremost, consider contextual determinants (Collmann & Matei, 2016; Nissenbaum, 2011; Steinmann et al., 2015). It also implies that privacy protection considerations must acknowledge that the privacy preferences of research subjects can change from circumstance to circumstance and are therefore not one-off or one-dimensional decisions that can be made at the entry point to the usage of digital or social media applications through Terms of Service or end-user license agreements—which often go unread—or the initial determination of privacy settings (Henderson et al., 2013). For this reason, the conduct of observation-based research in CSS that pertains to digital and digitalised life should be informed by contextual considerations about the populations and social groups from whom the data are drawn, the character and potential sensitivities of their data, the nature of the research question (as it may be perceived by observed research subjects), research subjects’ reasonable expectations of privacy in public, and the data collection practices and protocols of the organisation or company which has extracted the data (Hollingshead et al., 2021). Notably, thorough assessment of these issues by members of a research team may far exceed formal institutional processes for gaining ethics approval, and it is the responsibility of CSS researchers to evaluate the appropriate scale and depth of privacy considerations regardless of minimal legal and institutional requirements (Eynon et al., 2017; Henderson et al., 2013).
Apart from these contextual considerations, the protection of the privacy and anonymity of CSS research subjects also requires that risks of re-identification through triangulation and data linkage are anticipated and addressed. While processes of anonymisation and removal of personally identifiable information from datasets scraped or extracted from digital platforms and digitalised behaviour may seem straightforward when those data are treated in isolation, multiple sources of linkable data points and multiple sites of downstream data collection pose tangible risks of re-identification via the combination and linkage of datasets (de Montjoye et al., 2015; Eynon et al., 2017; Obole & Welsh, 2012). As Narayanan & Shmatikov (2009) and de Montjoye et al. (2015) both demonstrate, the inferential triangulation of social data collected from just a few sources can lead to re-identification even under conditions where datasets have been anonymised in the conventional, single dataset sense. Moreover, when risks of triangulation and re-identification are considered longitudinally, downstream risks of de-anonymisation also arise. In this case, the endurance of the public accessibility of social data on the internet over time means that information that could lead to re-identification is ready-to-hand indefinitely. By the same token, the production and extraction of new data that post-dates the creation and use of anonymised datasets also present downstream opportunities for data linkage and inference creep that can lead to re-identification through unanticipated triangulation (Weinhardt, 2020).
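To make the mechanics of such triangulation concrete, consider the following minimal sketch in Python (the datasets, column names, and quasi-identifiers are hypothetical assumptions introduced for illustration, not material from the studies cited above). An “anonymised” research extract that retains a few quasi-identifiers can be re-identified simply by joining it with an auxiliary dataset that carries the same attributes alongside names:

```python
import pandas as pd

# Hypothetical "anonymised" research extract: direct identifiers removed,
# but quasi-identifiers (postcode, birth year, gender) retained.
research_extract = pd.DataFrame({
    "postcode": ["N1 9GU", "EC1A 1BB", "SW1A 2AA"],
    "birth_year": [1987, 1992, 1975],
    "gender": ["F", "M", "F"],
    "sensitive_attribute": ["condition_a", "condition_b", "condition_c"],
})

# Hypothetical auxiliary dataset (e.g., scraped public profiles) that still
# carries names alongside the same quasi-identifiers.
public_profiles = pd.DataFrame({
    "name": ["A. Example", "B. Example", "C. Example"],
    "postcode": ["N1 9GU", "EC1A 1BB", "SW1A 2AA"],
    "birth_year": [1987, 1992, 1975],
    "gender": ["F", "M", "F"],
})

# A simple join on the shared quasi-identifiers re-attaches names to the
# "anonymised" records, re-identifying the research subjects.
reidentified = research_extract.merge(
    public_profiles, on=["postcode", "birth_year", "gender"], how="inner"
)
print(reidentified[["name", "sensitive_attribute"]])
```

As the studies discussed above indicate, combinations of only a few such attributes are often enough to single out individuals, which is why anonymisation assessed on a single dataset in isolation offers little protection against linkage across sources.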
Although many of these privacy and data protection risks also affect participation-based research (especially in cases where observational research is combined or integrated with it), experimental and human-involving CSS projects face additional challenges. Signally, participation-based CSS research must confront several issues surrounding the ascertainment of informed and meaningful consent. The importance of consent has been a familiar part of the “human subjects” paradigm of research ethics from its earliest expressions in the World Medical Association (WMA) Declaration of HelsinkiFootnote 5 and the Belmont Report.Footnote 6 However, the exponentially greater scale and societal penetration of CSS in comparison to more conventional forms of face-to-face, survey-driven, or laboratory-based social scientific research present a new order of hazards and difficulties. First, since CSS researchers, or their collaborators, often control essential digital infrastructure like social media platforms, they have the capability to efficiently target and experiment on previously unimaginable numbers of human subjects, with potential N’s approaching magnitudes of hundreds of thousands or even millions of people. Moreover, in the mould of such platforms, these researchers have an unprecedented capacity to manipulate or surreptitiously intervene in the unsuspecting activities and behaviours of such large, targeted groups.
The controversy around the 2014 Facebook emotional contagion experiment demonstrates some of the potential risks generated by this new scale of research capacity (Grimmelmann, 2015; Lorenz, 2014; Puschmann & Bozdag, 2014). In the study, researchers from Facebook, Cornell, and the University of California involved almost 700,000 unknowing Facebook users in what has since been called a “secret mood manipulation experiment” (Meyer, 2014). Users were split into two experimental groups and exposed to negative or positive emotional content to test whether News Feed posts could spread the relevant positive or negative emotion. Critics of the approach soon protested that the failure to obtain consent—or even to inform research subjects about the experiment—violated basic research ethics. Some also highlighted the dehumanising valence of these research tactics: “To Facebook, we are all lab rats”, wrote Vindu Goel in the New York Times (Goel, 2014). Hyperbole aside, this latter comment makes explicit the internal logic of many of the moral objections to the experiment that were voiced at the time. The Facebook researchers had blurred the relationship between the laboratory and the lifeworld. They had, in effect, unilaterally converted the social world of people connecting and interacting online into a world of experimental objects that subsisted merely as standing reserve for computational intervention and study—a transformation of the interpersonally animated life of the community into the ethically impoverished terrain of an “information laboratory” (Cohen, 2019a). Behind such a degrading conversion was the assertion of the primacy of objectifying and scientistic attitudes over considerations of the equal moral status and due ethical regard of research subjects. The experiment had, on the critical view, reduced Facebook users to the non-human standing of laboratory rodents, thereby disregarding their dignity and autonomy and consequently failing to properly consult them so as to attain their informed consent to participate.
Even when the consent of research participants is sought by CSS researchers, a few challenges remain. These revolve around the question of how to ensure that participants are fully informed so that they can freely, meaningfully, and knowledgeably consent to their involvement in the research (Franzke et al., 2020). Though diligent documentation protocols for gaining consent are an essential element of ascertaining informed and meaningful consent in any research environment, in the digital or online milieus of CSS, the provision of this kind of text-based information is often inadequate. When consent documentation is provided in online environments through one-way or vertical information flows that do not involve real, horizontal dialogue between researchers and potential research subjects, opportunities to clarify possible misunderstandings of the terms of consent can be lost (Varnhagen et al., 2005). What is more, it becomes difficult under these conditions of incomplete or impeded communication to confirm that research subjects actually comprehend what they are agreeing to do as research participants (Eynon et al., 2017). Relatedly, barriers to information exchange in the online environment can prevent researchers from being able to verify the capacity of research subjects to consent freely and knowledgeably (Eynon et al., 2017; Kraut et al., 2004). That is, it is more difficult to detect potential limitations of or impairments in the competence of participants (e.g., from potentially vulnerable subgroups) in giving consent where researchers are at a significant digital remove from research subjects. In all these instances, various non-dialogical techniques for confirming informed consent are available—such as comprehension tests, smart forms that employ branching logic to ensure essential text is completely read, identity verification, etc. Such techniques, however, present varying degrees of uncertainty and drop-out risk (Kraut et al., 2004; Varnhagen et al., 2005), and they do not adequately substitute for interactive mechanisms that could connect researchers directly with participants and their potential questions and concerns.
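As a rough illustration of one such non-dialogical technique, the sketch below (the questions, threshold, and function names are hypothetical assumptions, not a prescribed protocol) gates online enrolment behind a short comprehension check, so that consent is recorded only once a participant has demonstrated an understanding of key terms of the study:

```python
from dataclasses import dataclass

@dataclass
class ComprehensionItem:
    question: str
    options: list[str]
    correct_index: int

# Hypothetical comprehension items covering key terms of the consent form.
ITEMS = [
    ComprehensionItem(
        "What will happen to your data after the study?",
        ["Deleted after analysis", "Sold to advertisers", "Published with my name"],
        0,
    ),
    ComprehensionItem(
        "Can you withdraw from the study at any time?",
        ["No", "Yes, without giving a reason", "Only with the researcher's approval"],
        1,
    ),
]

def record_consent(answers: list[int], pass_threshold: float = 1.0) -> bool:
    """Record consent only if the participant answers enough items correctly.

    A failed check should route the participant back to the information sheet
    (or to a live contact with the research team), not simply re-ask the quiz.
    """
    correct = sum(a == item.correct_index for a, item in zip(answers, ITEMS))
    return (correct / len(ITEMS)) >= pass_threshold

# Example: only a participant who answers both items correctly may proceed.
assert record_consent([0, 1]) is True
assert record_consent([1, 1]) is False
```

Even so, as noted above, such mechanisms carry uncertainty and drop-out risks of their own and are no substitute for interactive channels through which participants can raise questions directly with researchers.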
4.2.2 Challenges Related to the Impacts of CSS Research on Affected Individuals and Communities
While drawing on the formal techniques and methods of mathematics, statistics, and the exact sciences, CSS is a research practice that is policy-oriented, problem-driven, and societally consequential. As an applied science that directly engages with issues of immense social concern like socioeconomic inequality, the spread of infectious disease, and the growth of disinformation and online harm, it impacts individuals and communities with the results, capabilities, and tools it generates. Moreover, CSS is an “instrument-enabled science” (Cioffi-Revilla, 2014, p. 4) that employs computational techniques, which can be applied to large-scale datasets excavated from veritably all societal sectors and spheres of human activity and experience. This makes its researchers the engineers and custodians of a general purpose research technology whose potential scope in addressing societal challenges is seemingly unbounded. With this in view, Lazer et al. (2020) call for the commitment of “resources, from public and private sources, that are extraordinary by current standards of social science funding” to underwrite the rapid expansion of CSS research infrastructure, so that its proponents can enlarge their quest to “solve real-world problems” (p. 1062). Beyond the dedication of substantial resources, such an expansion, Lazer et al. (2020) argue, also requires the formulation of “policies that would encourage or mandate the ethical use of private data that preserves public values like privacy, autonomy, security, human dignity, justice, and balance of power to achieve important public goals—whether to predict the spread of disease, shine a light on societal issues of equity and access, or the collapse of the economy” (p. 1061). CSS, along these lines, is not simply an applied social science, a science for policy. It is a social impact science par excellence.
The mission-driven and impact-oriented perspective conveyed here is, however, a double-edged sword. On the one hand, the drive to improve the human lot and to solve societal problems through the fruits of scientific discovery has constructively guided the impetus of modern scientific research and innovation at least since the seventeenth-century dawning of the Baconian and Newtonian revolutions. In this sense, the practical and problem-solving aspirations for CSS expressed by Lazer et al. (2020) are continuous with a deeper tradition of societally oriented science.
On the other hand, the view that CSS is a mission-driven and impact-oriented science raises a couple of thorny ethical issues that are not necessarily solvable by the application of its own methodological and epistemic resources. First, the assumption of a mission-driven starting point surfaces a difficult set of questions about the relationship of CSS research to the values, interests, and power dynamics that influence the trajectories of its practice: Whose missions are driving CSS and whose values and interests are informing the policies that are guiding these missions? To what extent are these values and interests shared by those who are likely to be impacted by the research? To what extent do these values and interests, and the policies they shape, sufficiently reflect the plurality of values and interests that are possessed by members of communities who will potentially be affected by the research (especially those from historically marginalised, discriminated-against, and vulnerable social groups)? Are these missions determined through democratic and community-involving processes or do other parties (e.g., funders, research collaborators, resource providers, principal investigators, etc.) wield asymmetrical agenda-setting power in setting the direction of travel for the research and its outputs? Who are the beneficiaries of these mission-driven research projects and who are at risk of any adverse impacts that they could have? Are these potential risks and benefits equitably distributed or are some stakeholders disparately exposed to harm while others are in positions of disproportionate advantage?
Taken together, these questions about the role that values, interests, and power dynamics play in shaping mission-driven research and its potential impacts evoke critical, though often concealed, interdependencies that exist between the CSS community of practice and the social environments in which its research activities, subject matters, and outputs are embedded. They likewise evoke the inadequacy of evasive scientistic tendencies to appeal to neutral or value-free stances when faced with queries about how values, interests, and power dynamics motivate and influence the aims, purposes, and areas of concern that steer vectors of CSS research. Responding appropriately to such questions surrounding the social determinants of research paths and potential impacts demands an inclusive broadening of the conversations that shape, articulate, and determine the missions to be pursued, the problems to be addressed, and the assessment of potential harms and benefits—a broadening both in terms of the types of knowledge and expertise that are integrated into such deliberative processes and in terms of the range of stakeholder groups that should be involved.
Second, the recognition of a mission-driven and impact-oriented starting point elevates the importance of identifying the potential adverse effects of CSS research so that these can, as far as possible, be pinpointed at the outset of research projects and averted. Such practices of anticipatory reflection are necessary because the intended and unintended consequences of the societally impactful insights, tools, and capabilities CSS research produces could be negative and injurious rather than positive and mission-supporting. As the short history of the “big data revolution” demonstrates, the rapid and widespread proliferation of algorithmic systems, data-driven technologies, and computation-led analytics has already had numerous deleterious effects on human rights, fundamental freedoms, democratic values, and biospheric sustainability. Such harmful effects have penetrated society at multiple levels including on the planes of individual agency, social interaction, and biospheric integrity. Let us briefly consider these levels in turn.
4.2.2.1 Adverse Impacts at the Individual Level
At the agent level, the predominance of “radical behaviourist” attitudes among the academic, industrial, and governmental drivers of data innovation ecosystems has led to the pervasive mobilisation of individual-targeting predictive analytics which have had damaging impacts across a range of human activities (Cardon, 2016; Cohen, 2019b; Zuboff, 2019). For instance, in the domain of e-commerce and ad-tech, strengthening regimes of consumer surveillance have fuelled the use of “large-scale behavioural technologies” (Ball, 2019) that have enabled incessant practices of hyper-personalised psychographic profiling, consumer curation, and behavioural nudging. As critics have observed, such technologies have tended to exploit the emotive vulnerabilities and psychological weaknesses of targeted people (Helbing et al., 2019), instrumentalising them as monetisable sites of “behavioural surplus” (Zuboff, 2019) and treating them as manipulable objects of prediction and “behavioural certainty” rather than as reflective subjects worthy of decision-making autonomy and moral regard (Ball, 2019; Yeung, 2017). Analogous behaviourist postures have spurred state actors and other public bodies to subject their increasingly datafied citizenries to algorithmic nudging techniques that aim to obtain aggregated patterns of desired behaviour which accord with government-generated models and predictions (Fourcade & Gordon, 2020; Hern, 2021). Some scholars have characterised such an administrative ambit as promoting the paternalistic displacement of individual agency and the degradation of the conditions needed for the successful exercise of human judgment, moral reasoning, and practical rationality (Fourcade & Gordon, 2020; Spaulding, 2020).
In like manner, the nearly ubiquitous scramble to capture behavioural shares of user engagement across online search, entertainment, and social media platforms has led to parallel feedback loops of digital surveillance, algorithmic manipulation, and behavioural engineering (Van Otterlo, 2014). The proliferation of the so-called “attention market” business model (Wu, 2019) has prompted digital platforms to measure commercial success in terms of the non-consensual seizure and monopolisation of focused mental activity. This has fostered the deleterious attachment of targeted consumer populations to a growing ecosystem of “distraction technologies” (Syvertsen, 2020; Syvertsen & Enli, 2020) and compulsion-forming social networking sites and reputational platforms, consequently engendering, on some accounts, widespread forms of surveillant anxiety (Crawford, 2014), cognitive impairment (Wu, 2019), mental health issues (Banjanin et al., 2015; Barry et al., 2017; Lin et al., 2016; Méndez-Diaz et al., 2022; Peterka-Bonetta et al., 2019), and diminished adolescent self-esteem and quality of life (Scott & Woods, 2018; Viner et al., 2019; Woods & Scott, 2016).
4.2.2.2 Adverse Impacts at the Social Level
Setting aside the threats to basic individual dignity and human autonomy that these patterns of instrumentalisation, disempowerment, and exploitation present (Aizenberg & van den Hoven, 2020; Halbertal, 2015), the proliferation of data-driven behavioural steering at the collective level has also generated risks to the integrity of social interaction, interpersonal solidarity, and democratic ways of life. In current digital information and communication environments, for example, the predominant steering force of social media and search engine platforms has mobilised opaque computational methods of relevance ranking, popularity sorting, and trend predicting to produce calculated digital publics devoid of any sort of active participatory social or political choice (Beer, 2017; Bogost, 2015; Cardon, 2016; Gillespie, 2014; O’Neil, 2016; Striphas, 2015; Ziewitz, 2016). Rather than being guided by the deliberatively achieved political will of interacting citizens, this vast meshwork of connected digital services shapes these computationally fashioned publics in accordance with the drive to commodify monitored behaviour and to target and capture user attention (Carpentier, 2011; De Cleen & Carpentier, 2008; Dean, 2010; Fuchs, 2021; John, 2013; Zuckerman, 2020). And, as this manufacturing of digital publics is ever more pressed into the service of profit seeking by downstream algorithmic mechanisms of hyper-personalised profiling, engagement-driven filtering, and covert behavioural manipulation, democratic agency and participation-centred social cohesion will be increasingly supplanted by insidious forms of social sorting and digital atomisation (Vaidhyanathan, 2018; van Dijck, 2013; van Dijck et al., 2018). Combined with complementary dynamics of wealth polarisation and rising inequality (Wright et al., 2021), such an attenuation of social capital, discursive interaction, and interpersonal solidarity is already underwriting the crisis of social and political polarisation, the widespread kindling of societal distrust, and the animus towards rational debate and consensus-based science that have come to typify contemporary post-truth contexts (Cosentino, 2020; D’Ancona, 2017; Harsin, 2018; McIntyre, 2018).
Indeed, as these and similar kinds of computation-based social sorting and management infrastructures continue to multiply, they promise to jeopardise more and more of the formative modes of open interpersonal communication that have enabled the development of crucial relations of mutual trust and responsibility among interacting individuals in modern democratic societies. This is beginning to manifest in the widespread deployment of algorithmic labour and productivity management technologies, where manager-worker and worker-worker relations of reciprocal accountability and interpersonal recognition are being displaced by depersonalising mechanisms of automated assessment, continuous digital surveillance and computation-based behavioural incentivisation, discipline, and control (Ajunwa et al., 2017; Akhtar & Moore, 2016; Kellogg et al., 2020; Moore, 2019). The convergence of the unremitting sensor-based tracking and monitoring of workers’ movements, affects, word choices, facial expressions, and other biometric cues, with algorithmic models that purport to detect and correct defective moods, emotions, and levels of psychological engagement and well-being, may not simply violate a worker’s sense of bodily, emotional, and mental integrity by rendering their inner life legible and available for managerial intervention as well as productivity optimisation (Ball, 2009). These forms of ubiquitous personnel tracking and labour management can also have so-called panoptic effects (Botan, 1996; Botan & McCreadie, 1990), causing people to alter their behaviour on suspicion it is being constantly observed or analysed and deterring the sorts of open worker-to-worker interactions that enable the development of reciprocal trust, social solidarity, and interpersonal connection. This labour management example merely signals a broader constellation of ethical hazards that are raised by the parallel use of sensor- and location-based surveillance, psychometric and physiognomic profiling (Agüera y Arcas et al., 2017; Barrett et al., 2019; Chen & Whitney, 2019; Gifford, 2020; Hoegen et al., 2019; Stark & Hutson, 2021), and computation-driven technologies of behavioural governance in areas like education (Andrejevic & Selwyn, 2020; Pasquale, 2020), job recruitment (Sánchez-Monedero et al., 2020; Sloane et al., 2022), criminal justice (Brayne, 2020; Pasquale & Cashwell, 2018), and border control (Amoore, 2021; Muller, 2019). The heedless deployment of these kinds of algorithmic systems could have transformative effects on democratic agency, social cohesion, and interpersonal intimacy, preventing people from exercising their freedoms of expression, assembly, and association and violating their right to participate fully and openly in the moral, cultural, and political life of the community.
4.2.2.3 Adverse Impacts at the Biospheric Level
Lastly, at the level of biospheric integrity and sustainability, the exploding computing power—which has played a major part in ushering in the “big data revolution” and the rise of CSS—has also had significant environmental costs that deserve ethical consideration. As Lannelongue et al. (2021) point out, “the contribution of data centers and high-performance computing facilities to climate change is substantial… with 100 megatonnes of CO2 emissions per year, similar to American commercial aviation”. At bottom, this increased energy consumption has hinged on the development of large, computationally intensive algorithmic models that ingest abundant amounts of data in their training and tuning, that undergo iterative model selection and hyperparameter experiments, and that require exponential augmentations in model size and complexity to achieve relatively modest gains in accuracy (Schwartz et al., 2020; Strubell et al., 2019). In real terms, this has meant that the amount of compute needed to train complex, deep learning models increased by 300,000 times in 6 years (from 2013 to 2019) with training expenditures of energy doubling every 6 months (Amodei & Hernandez, 2018; Schwartz et al., 2020). Strubell et al. (2019) observe, along these lines, that training Google’s large language model, BERT, on GPU, produces substantial carbon emissions “roughly equivalent to a trans-American flight”. Though recent improvements in algorithmic techniques, software, and hardware have meant some efficiency gains in the operational energy consumption of computationally hungry, state-of-the-art models, some have stressed that such training costs are increasingly compounded by the carbon emissions generated by hardware manufacturing and infrastructure (e.g., designing and fabricating integrated circuits) (Gupta et al., 2020). Regardless of the sources of emissions, important ethical issues emerge both from the overall contribution of data research and innovation practices to climate change and to the degradation of planetary health and from the differential distribution of the benefits and risks that derive from the design and use of computationally intensive models. As Bender et al. (2021) have emphasised, such allocations of benefits and risks have closely tracked the historical patterns of environmental racism, coloniality, and “slow violence” (Nixon, 2011) that have typified the disproportionate exposure of marginalised communities (especially those who inhabit what has conventionally been referred to as “the Global South”) to the pollution and destruction of local ecosystems and to involuntary displacement.
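A back-of-the-envelope calculation can make the scale of such training costs tangible. The sketch below (all parameter values are illustrative assumptions, not figures drawn from the studies cited above) multiplies hardware power draw, training time, data-centre overhead, and grid carbon intensity to estimate the footprint of a single training run:

```python
# Illustrative (assumed) parameters for a single large training run.
num_gpus = 64                    # accelerators used in parallel
gpu_power_kw = 0.3               # average draw per accelerator, in kW
training_hours = 24 * 14         # a two-week training run
pue = 1.5                        # data-centre power usage effectiveness (overhead)
grid_intensity_kg_per_kwh = 0.4  # carbon intensity of the local grid (kg CO2e/kWh)

energy_kwh = num_gpus * gpu_power_kw * training_hours * pue
emissions_kg = energy_kwh * grid_intensity_kg_per_kwh

print(f"Energy: {energy_kwh:,.0f} kWh, emissions: {emissions_kg / 1000:.1f} tonnes CO2e")
# With these assumptions: roughly 9,700 kWh and about 3.9 tonnes CO2e,
# before counting repeated runs, hyperparameter searches, or the embodied
# emissions of manufacturing the hardware itself.
```

As the paragraph above notes, iterative model selection and hyperparameter experimentation multiply such a figure many times over for a single research project.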
As a whole, these cautionary illustrations of the hazards posed at individual, societal, and environmental levels by ever more ubiquitous computational interventions in the social world should impel CSS researchers to adopt an ethically sober and pre-emptive posture when reflecting on the potential impacts of their projects. The reason for this is not just that many of the methods, tools, capabilities, and epistemic frameworks that they utilise have already operated, in the commercial and political contexts of datafication, as accessories to adverse societal impacts. It is, perhaps more consequentially, that, as Wagner et al. (2021) point out, CSS practices of measurement and corollary theory construction in “algorithmically infused societies… indirectly alter behaviours by informing the development of social theories and subsequently influence the algorithms and technologies that draw on those theories” (p. 197). This dimension of the “performativity” of CSS research—i.e., the way that the activities and theories of CSS researchers can function to reformat, reorganise, and shape the phenomena that they purport only to measure and analyse—is crucial (Healy, 2015; Wagner et al., 2021). It enjoins, for instance, an anticipatory awareness that the methodological predominance of measurement-centred and prediction-driven perspectives in CSS can support the noxious proliferation of the scaled computational manipulation and instrumentalisation of large populations of affected people (Eynon et al., 2017; Schroeder, 2014). It also implores cognizance that an unreflective embrace of unbounded sociometrics and the pervasive sensor-based observation and monitoring of research subjects may support wider societal patterns of “surveillance creep” (Lyon, 2003; Marx, 1988) and ultimately have chilling effects on the exercise of fundamental rights and freedoms. The intractable endurance of these kinds of risks of adverse effects and the possibilities for unintended harmful consequences recommends vigilance both in the assessment of the potential impacts of CSS research on affected individuals and communities and in the dynamic monitoring of the effects of the research outputs, and the affordances they create, once these are released into the social world.
4.2.3 Challenges Related to the Quality of CSS Research and to Its Epistemological Status
CSS research that is of dubious quality or that misrepresents the world can produce societal harms by misleading people, misdirecting policies, and misguiding further academic research. Many of the pitfalls that can undermine CSS research quality are precipitated by deficiencies in the accuracy and the integrity of the datasets on which it draws. First off, erroneous data linkage can lead to false theories and conclusions. Researchers face ongoing challenges when they endeavour to connect the data generated by identified research subjects to other datasets that are believed to include additional information about those individuals (Weinhardt, 2020). Mismatches can poison downstream inferences in undetectable ways and lead to model brittleness, hampered explanatory power, and distorted world pictures.
The poisoning of inferences by corrupted, inaccurate, invalid, or unreliable datasets can occur in a few other ways. Where CSS researchers are not sufficiently critical of the “ideal user assumption” (Lazer & Radford, 2017), they can overlook instances in which data subjects intentionally misrepresent themselves, subsequently perverting the datasets in which they are included. For example, online actors can multiply their identities as “sock puppets” by creating fake accounts that serve different purposes; they can also engage in “gaslighting” or “catfishing” where intentional methods of deception about personal characteristics and misrepresentation of identities are used to fool other users or to game the system; they can additionally impersonate real internet users to purposefully mislead or exploit others (Bu et al., 2013; Ferrara, 2015; Lazer & Radford, 2017; Wang et al., 2006; Woolley, 2016; Woolley & Howard, 2018; Zheng et al., 2006). Such techniques of deception can be automated or deployed using various kinds of robots (e.g., chat bots, social media bots, robocalls, spam bots, etc.) (Ferrara et al., 2016; Gupta et al., 2015; Lazer & Radford, 2017; Ott et al., 2011). If researchers are not appropriately attentive to the distortions that may arise in datasets as a result of such non-human sources of misleading data, they can end up unintentionally baking the corresponding corruptions of the underlying distribution that are present in the sample into their models and theories, thereby misrepresenting or painting a false picture of the social world (Ruths & Pfeffer, 2014; Shah et al., 2015). Similar blind spots in detecting dataset corruption can arise when sparse attention is paid to how the algorithms, which pervade the curation and delivery of information on online platforms, affect and shape the data that is generated by the users that they influence and steer (Wagner et al., 2021).
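The following minimal sketch illustrates why such non-human data generation matters for inference (the account records, threshold, and posting-cadence heuristic are assumptions for demonstration only, not a validated bot-detection method). An estimate computed over all scraped accounts is compared with one computed after excluding accounts whose activity looks automated:

```python
import statistics

# Hypothetical per-account records: posting volume and a sentiment score
# attached to the issue being studied.
accounts = [
    {"id": "user_a", "posts_per_day": 3, "sentiment": 0.1},
    {"id": "user_b", "posts_per_day": 5, "sentiment": -0.2},
    {"id": "user_c", "posts_per_day": 2, "sentiment": 0.0},
    {"id": "amplifier_1", "posts_per_day": 400, "sentiment": 0.9},
    {"id": "amplifier_2", "posts_per_day": 650, "sentiment": 0.95},
]

# Crude heuristic (an assumption, not a recommendation): treat implausibly
# high posting cadence as a sign of automated or coordinated activity.
def looks_automated(account, max_posts_per_day=100):
    return account["posts_per_day"] > max_posts_per_day

naive_mean = statistics.mean(a["sentiment"] for a in accounts)
filtered_mean = statistics.mean(
    a["sentiment"] for a in accounts if not looks_automated(a)
)

print(f"Mean sentiment, all accounts:       {naive_mean:+.2f}")
print(f"Mean sentiment, humans (heuristic): {filtered_mean:+.2f}")
# The unfiltered estimate is pulled sharply upwards by a handful of
# high-volume automated accounts, a distortion that scales silently as such
# accounts make up a larger share of the scraped data.
```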
Attentiveness to such data quality and integrity issues can be hindered by the illusion of the veracity of volume or, what has been termed, “big data hubris” (Hollingshead et al., 2021; Kitchin, 2014; Lazer et al., 2014; Mahmoodi et al., 2017). This is the misconception that, in virtue of their sheer volume, big data can “solve all problems”, including potential deficiencies in data quality, sampling, and research design (Hollingshead et al., 2021; Meng, 2018). When it is believed that “data quantity is a substitute for knowledge-driven methodologies and theories” (Mahmoodi et al., 2017, p. 57), the rigorous and epistemically vetted approaches to social measurement, theory construction, explanation, and understanding that have evolved over decades in the social sciences and statistics can be perilously neglected or even dismissed.
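A small simulation makes this point concrete. In the sketch below (the population, selection mechanism, and sample sizes are all assumed for illustration), a modest random sample recovers a population mean far better than a sample hundreds of times larger whose inclusion is correlated with the quantity being measured; volume alone does nothing to remove the selection bias:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: an attitude score for one million people.
population = [random.gauss(0.0, 1.0) for _ in range(1_000_000)]
true_mean = statistics.mean(population)

# Small but genuinely random sample.
small_random = random.sample(population, 1_000)

# Huge "big data" sample whose inclusion probability rises with the value
# being measured (e.g., stronger opinions are more likely to be posted).
big_biased = [x for x in population if random.random() < 0.9 * (x > 0) + 0.1]

print(f"True mean:                 {true_mean:+.3f}")
print(f"Random sample (n=1,000):   {statistics.mean(small_random):+.3f}")
print(f"Biased sample (n={len(big_biased):,}): {statistics.mean(big_biased):+.3f}")
# The biased sample contains hundreds of thousands of records, yet its
# estimate is off by a large, systematic margin; the small random sample
# sits close to the truth. Quantity does not substitute for design.
```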
Such a potential impoverishment of epistemic vigour can also result when CSS researchers fall prey to the enticements of the flip side of big data hubris, namely, computational solutionism. Predispositions to computational solutionism have emerged as a result of the coalescence of the rapid growth of computing power and the accelerating development of complex algorithmic modelling techniques that have together complemented the explosion of voluminous data and the big data revolution. This new access to the computational tools availed by potent compute and high-dimensional algorithmic machinery has led to the misconception in some corners of CSS that tools themselves can, by and large, “solve all problems”. Rather than confronting the contextual complexities that lie behind the social processes and historical conditions that generate observational data (Shaw, 2015; Törnberg & Uitermark, 2021), and that concomitantly create manifold possibilities for non-random missingness and meaningful noise, the computational solutionist reverts to a toolbox of heuristic algorithms and technical tricks to “clean up” the data, so that computational analysis can forge ahead frictionlessly (Agniel et al., 2018; Leonelli, 2021). At heart, this contextual sightlessness among some CSS researchers originates in scientistic attitudes that tend to naturalise and reify digital trace data (Törnberg & Uitermark, 2021), treating them as primitive and organically given units of measurement that facilitate the analytical capture of “social physics” (Pentland, 2015), “the ‘physics of culture’” (Manovich, 2011), or the “physics of society” (Caldarelli et al., 2018). The scientistic aspiration to discover invariant “laws of society” rests on this erroneous naturalisation of social data. Were the confidence of CSS research in such a naturalist purity of data to be breached and their contextual and sociohistorical origins appropriately acknowledged, then the scientistic metanarratives that underwrite beliefs in “social physics”, and in its nomological character, would consequently be subverted. Computational solutionism provides an epistemic strategy for the wholesale avoidance of this problem: it directs researchers to rely solely on the virtuosity of algorithmic tooling and the computational engineering of observational data to address congenital problems of noise, confounders, and non-random missingness rather than employing a genuine methodological pluralism that takes heed of the critical importance of context and of the complicated social and historical conditions surrounding the generation and construction of data. Such a solutionist tack, however, comes at the cost of potentially misapprehending the circumstantial intricacies and the historically contingent evolution of agential entanglements, social structures, and interpersonal relations and of thereby “misrepresenting the real world” in turn (Ruths & Pfeffer, 2014, p. 1063).
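To illustrate the specific worry about heuristically “cleaning up” non-random missingness, the following sketch (the data-generating process and imputation rule are assumptions for demonstration) applies a routine mean-imputation step to values that are missing precisely because they are low. The resulting dataset looks complete, yet the estimate it supports remains biased:

```python
import random
import statistics

random.seed(1)

# Hypothetical true values (e.g., time spent on a civic activity).
true_values = [random.gauss(10.0, 3.0) for _ in range(10_000)]

# Non-random missingness: low values are far more likely to go unrecorded
# (people who do little of the activity rarely leave a digital trace of it).
observed = [x if (x > 8 or random.random() < 0.2) else None for x in true_values]

# A solutionist "fix": impute every missing entry with the observed mean.
seen = [x for x in observed if x is not None]
imputed = [x if x is not None else statistics.mean(seen) for x in observed]

print(f"True mean:             {statistics.mean(true_values):.2f}")
print(f"Mean after imputation: {statistics.mean(imputed):.2f}")
# The imputed dataset has no gaps and passes superficial quality checks,
# yet the estimate stays biased upwards, because the imputation inherits
# the selection effect rather than modelling why the data are missing.
```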
In addition to these risks posed to the epistemic integrity of CSS by big data hubris and computational solutionism, CSS researchers face another challenge related to the epistemological status of the claims and conclusions they put forward. This has to do with the problem of interpretability. As the mathematical models employed in CSS research have come to possess ever greater access both to big data and to increasing computing power, their designers have correspondingly been able to enlarge the feature spaces of these computational systems and to turn to gradually more complex mapping functions in order either to forecast future observations or to explain underlying causal structures or effects. In many cases, this has meant vast improvements in the performance of models that have become more accurate and expressive, but this has also meant the growing prevalence of non-linearity, non-monotonicity, and high-dimensional complexity in an expanding array of so-called “black box” models (Leslie, 2019). Once high-dimensional feature spaces and complex functions are introduced into algorithmic models, the effects of changes in any given input can become so entangled with the values and interactions of other inputs that understanding the rationale behind how individual components are transformed into outputs becomes extremely difficult. The complex and unintuitive curves of many of these models’ decision functions preclude linear and monotonic relations between their inputs and outputs. Likewise, the high-dimensionality of their architectures—frequently involving millions of parameters and complex correlations—presents a sweep of compounding statistical associations that range well beyond the limits of human-scale cognition and understanding. Such increasing complexity in input-output mappings creates model opacity and barriers to interpretability. The epistemological problem, here, is that, as a science that seeks to explain, clarify, and facilitate a better understanding of the human phenomena it investigates, CSS would seemingly have to avoid or renounce incomprehensible models that obstruct the demonstration of sound scientific reasoning in the conclusions and results attained.
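The trade-off at issue can be seen in miniature in the sketch below (the synthetic data and model choices are assumptions for illustration, using the scikit-learn library). A sparse logistic regression exposes its learned coefficients for inspection, whereas a larger non-linear ensemble may predict more accurately while offering no comparably direct account of how its inputs are transformed into outputs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic (assumed) data standing in for a social-outcome prediction task.
X, y = make_classification(n_samples=2_000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interpretable model: each feature's contribution is a single, inspectable weight.
linear = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("Linear accuracy:", round(linear.score(X_test, y_test), 3))
print("Largest absolute coefficients:",
      np.round(np.sort(np.abs(linear.coef_[0]))[-3:], 2))

# Opaque model: hundreds of interacting trees; no single weight explains an output.
ensemble = GradientBoostingClassifier(n_estimators=300, random_state=0)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", round(ensemble.score(X_test, y_test), 3))
# Any accuracy gain here comes at the price of an input-output mapping that
# cannot be read directly off the fitted object, which is the trade-off the
# interpretability debate in CSS turns on.
```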
A few epistemic strategies have emerged over the past decade or so to deal with the challenge posed by the problem of interpretability in CSS. First, building on a longstanding distinction originally made by statisticians between the predictive and explanatory functions of computational modelling (Breiman, 2001; Mahmoodi et al., 2017; Shmueli, 2010), some CSS scholars have focused on the importance of predictive accuracy, de-prioritising the goals of discovering and explaining the causal mechanisms and reasons that lie behind the dynamics of human behaviour and social systems (Anderson, 2008; Hindman, 2015; Lin, 2015; Yarkoni & Westfall, 2017). Lin (2015), for instance, makes a distinction between the goal of “better science”, i.e., “to reveal insights about the human condition”, what Herbert Simon called the “basic science” of explaining phenomena (2002), and the goal of “better engineering”, i.e., “to produce computational artifacts that are more effective according to well-defined metrics” (p. 35)—what Simon called the “applied science” of inferring or predicting from known variables to unknown variables (Shmueli, 2010; Simon, 2002). For Lin, if the purpose of CSS, as an applied science, is “better engineering”, then “whatever improves those [predictive] metrics should be exploited without prejudice. Sound scientific reasoning, while helpful, is not necessary to improve engineering”. Such a positivistic view would, of course, tamp down or even cast aside the desideratum of interpretability.
However, even for scholars that aspire to retain both the explanatory and predictive dimensions of CSS, the necessity of using interpretable models is far from universally embraced. Illustratively, Hofman et al. (2021) argue for “integrating explanation and prediction in CSS” by treating these approaches as complementary (cf. Engel, 2021; James et al., 2013; Mahmoodi et al., 2017). Still, these authors simultaneously claim that explanatory modelling is about “the estimation of causal effects, regardless of whether those effects are explicitly tied to theoretically motivated mechanisms that are interpretable as ‘the cogs and wheels of the causal process’” (Hofman et al., 2021, p. 186). To be sure, they maintain that:
interpretability is logically independent of both the causal and predictive properties of a model. That is, in principle a model can accurately predict outcomes under interventions or previously unseen circumstances (out of distribution), thereby demonstrating that it captures the relevant causal relationships, and still be resistant to human intuition (for example, quantum mechanics in the 1920s). Conversely, a theory can create the subjective experience of having made sense of many diverse phenomena without being either predictively accurate or demonstrably causal (for example, conspiracy theories). (pp. 186–187)
These justifications for treating the goal of interpretability as independent of the causal and predictive characteristics of a model raise some concerns. At an epistemic level, the extreme claim that “interpretability is logically independent of both the causal and predictive properties of a model” is unsupported by the observation that people can be deluded into believing false states of affairs. The attempt to cast aside the principal need for the rational acceptability and justification of the assertoric validity claims that explain a model’s causal and predictive properties, because it is possible to be misled by “subjective experience”, smacks of a curious epistemological relativism which is inconsistent with the basic requisites of scientific reasoning and deliberation. It offends the “no magic doctrine” (Anderson & Lebiere, 1998) of interpretable modelling, namely, that “it needs to be clear how (good) model performance comes about, that the components of the model are understandable and linked to known processes” (Schultheis, 2021). To level off all adjudications of explanatory claims (strong or weak) about a model because humans can be duped by misleading feelings of subjective experience amounts to an absurdity: People can be convinced of bad explanations that are not predictively or causally efficacious (look at all those sorry souls who have fallen prey to conspiracy theories), so all explanations of complex models are logically independent of their actual causal and predictive properties. This line of thinking ends up in a ditch of epistemic whataboutism.
Moreover, at an ethical level, the analogy offered by Hofman et al. between the opaqueness of quantum physics and the opaqueness of “black box” predictive models about human behaviours and social dynamics is misguided and unsupportable. Such an erroneous parallelism is based on a scientistic confusion of the properties of natural scientific variables (like the wavelike mechanics of electrons) that function as heuristics for theory generation, testing, and confirmation in the exact physical sciences with the properties of the social variables of CSS whose generation, construction, and correlation are the result of human choices, evolving cultural patterns, and path dependencies created by sociohistorical structures. Unlike the physics data generated, for instance, by firing a spectroscopic light through a perforated cathode and measuring the splitting of the Balmer lines of a radiated hydrogen spectrum, the all-too-human genealogy of social data means that they can harbour discriminatory biases and patterns of sociohistorical inequity and injustice that become buried within the architectures of complex computational models. In this respect, the “relevant causal relationships” that are inaccessible in opaque models might be fraught with objectionable sociohistorical patterns of inequity, prejudice, coloniality, and structural racism, sexism, ableism, etc. (Leslie et al., 2022a). Because “human data encodes human biases by default” (Packer et al., 2018), complex algorithmic models can house and conceal a troubling range of unfair biases and discriminatory associations—from social biases against gender (Bolukbasi et al., 2016; Lucy & Bamman, 2021; Nozza et al., 2021; Sweeney & Najafian, 2019; Zhao et al., 2017), race (Benjamin, 2019; Noble, 2018; Sweeney, 2013), accented speech (Lawrence, 2021; Najafian et al., 2017), and political views (Cohen & Ruths, 2013; Iyyer et al., 2014; Preoţiuc-Pietro et al., 2017) to structures of encoded prejudice like proxy-based digital redlining (Cottom, 2016; Friedline et al., 2020) and the perpetuation of harmful stereotyping (Abid et al., 2021; Bommasani et al., 2021; Caliskan et al., 2017; Garrido-Muñoz et al., 2021; Nadeem et al., 2020; Weidinger et al., 2021). A lack of interpretability in complex computational models whose performant causal and predictive properties could draw opaquely on secreted discriminatory biases or patterns of inequity is therefore ethically intolerable. As Wallach (2018) observes:
the use of black box predictive models in social contexts…[raises] a great deal of concern—and rightly so—that these models will reinforce existing structural biases and marginalize historically disadvantaged populations… we must [therefore] treat machine learning for social science very differently from the way we treat machine learning for, say, handwriting recognition or playing chess. We cannot just apply machine learning methods in a black-box fashion, as if computational social science were simply computer science plus social data. We need transparency. We need to prioritize interpretability—even in predictive contexts. (p. 44) (cf. Lazer et al., 2020, p. 1062)
4.2.4 Challenges Related to Research Integrity
Challenges related to research integrity are rooted in the asymmetrical dynamics of resourcing and influence that can emerge from power imbalances between the CSS research community and the corporations and government agencies upon whom CSS scholars often rely for access to the data resources, compute infrastructures, project funding opportunities, and career advancement prospects they need for their professional subsistence and advancement. Such challenges can manifest, inter alia, in the exercise of research agenda-setting power by private corporations and governmental institutions, which set the terms of project funding schemes and data sharing agreements, and in the willingness of CSS researchers to produce insights and tools that support scaled behavioural manipulation and surveillance infrastructures.
These threats to the integrity of CSS research activity manifest in a cluster of potentially unseemly alignments and conflicts of interest between its own community of practice and those platforms, corporations, and public bodies who control access to the data resources and compute infrastructures upon which CSS researchers depend (Theocharis & Jungherr, 2021). First, there is the potentially unseemly alignment between the extractive motives of digital platforms, which monetise, monger, and link their vast troves of personal data and marshal inferences derived from these to classify, mould, and behaviourally nudge targeted data subjects, and the professional motivations of CSS researchers who desire to gain access to as much of this kind of social big data as possible (Törnberg & Uitermark, 2021). A similar alignment can be seen between the motivations of CSS researchers to accumulate data and the security and control motivations of political bodies, which collect large amounts of personal data from the provision and administration of essential social goods and services often in the service of such motivations (Fourcade & Gordon, 2020). There is also a potentially unseemly alignment between the epistemic leverage and sociotechnical capabilities desired by private corporations and political bodies interested in scaled behavioural control and manipulation and the epistemic leverage and sociotechnical capabilities cultivated, as a vocational raison d’être, by some CSS researchers who build predictive tools. This alignment is made all the more worrying by the asymmetrical power that can be exercised by the former organisations over the latter researchers, who not only are increasingly reliant on private companies and governmental bodies for essential data access and computing resources but are also increasingly the obliged beneficiaries of academic-corporate research partnerships and academic-corporate “dual-affiliation” career trajectories that are funded by large tech corporations (Roberge et al., 2019). Finally, there is a broader-scale cultural alignment between the way that digital platforms and tech companies pursue their corporate interests through technology practices that privilege considerations of strategic control, market creation, and efficiency and that are thereby functionally liberated from the constraints of social licence, democratic governance, and considerations of the interests of impacted people (Feenberg, 1999, 2002), and the way that CSS scholars can pursue their professional interests through research practices similarly treated as operationally autonomous and independent from the societal conditions they impact and the governance claims of affected individuals and communities.
4.2.5 Challenges Related to Research Equity
Challenges related to research equity fall under two categories: (1) inequities that arise within the outputs of CSS research in virtue of biases that crop up within its methods and analytical approaches and (2) inequities that arise within the wider field of CSS research that result from material inequalities and capacity imbalances between different research communities. Challenges emerging from the first category include the potential reinforcement of digital divides and data inequities through biased sampling techniques that render digitally marginalised groups invisible as well as potential aggregation biases in research results that mask meaningful differences between studied subgroups and therefore hide the existence of real-world inequities. Challenges emerging from the second category include exploitative data appropriation by well-resourced researchers and the perpetuation of capacity divides between research communities, both of which derive from long-standing dynamics of regional and global inequality that may undermine reciprocal sharing and collaboration between researchers from more and less resourced geographical areas, universities, or communities of practice.
Issues of sampling or population bias in CSS datasets extracted from social media platforms, internet use, and connected devices arise when the sampled population that is being studied differs from the larger target population in virtue of the non-random selection of certain groups into the sample (Hargittai, 2015, 2020; Hollingshead et al., 2021; Mehrabi et al., 2021; Olteanu et al., 2019; Tufekci, 2014). It has been widely observed that people do not select randomly into social media sites like Twitter (Blank, 2017; Blank & Lutz, 2017), MySpace (boyd, 2011), Facebook (boyd, 2011; Hargittai, 2015), and LinkedIn (Blank & Lutz, 2017; Hargittai, 2015). As Hargittai (2015) shows, in the US context, people with greater educational attainment and higher income were more likely to be users of Twitter, Facebook, and LinkedIn than others of less privilege. Hargittai (2020) claims, more generally, that “big data derived from social media tend to oversample the views of more privileged people” and people who possess greater levels of “internet skill”. Earlier studies and surveys have also demonstrated that, at any given time, “different user demographics tend to be drawn to different social platforms” (Olteanu et al., 2019), with men and urban populations significantly over-represented among Twitter users (Mislove et al., 2011) and women over-represented on Pinterest (Ottoni et al., 2013).
The oversampling of self-selecting privileged and dominant groups, and the under-sampling or exclusion of members of other groups who may lack technical proficiency, digital resources, or access to connectivity, for example, large portions of elderly populations (Friemel, 2016; Haight et al., 2014; Quan-Haase et al., 2018), can lead to an inequitable lack of representativity in CSS datasets—rendering those who have been left out of data collection for reasons of accessibility, skills, and resource barriers “digitally invisible” (Longo et al., 2017). Such sampling biases can cause deficiencies in the ecological validity of research claims (Olteanu et al., 2019), impaired performance of predictive models for non-majority subpopulations (Johnson et al., 2017), and, more broadly speaking, the failure of CSS models to generalise from sampled behaviours and opinions to the wider population (Blank, 2017; Hargittai & Litt, 2012; Hollingshead et al., 2021). This hampered generalisability can be especially damaging when the insights and results of CSS models, which oversample privileged subpopulations and thus disadvantage those missing from datasets, are applied willy-nilly to society as a whole and used to shape the policymaking approaches to solving real-world problems. As Hollingshead et al. (2021) put it, “the ethical concern here is that, as policymakers and corporate stakeholders continue to draw insights from big data, the world will be recursively fashioned into a space that reflects the material interests of the infinitely connected” (p. 173).Footnote 7
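The mechanics of this kind of sampling bias can be illustrated with a small simulation. The following sketch is a hypothetical example under stated assumptions (a logistic selection model and a synthetic opinion variable correlated with socioeconomic advantage), not an analysis of any real platform; it simply shows how non-random selection onto a platform can push a “big data” estimate away from the true population value.

```python
# Illustrative sketch only: simulates how non-random selection onto a platform
# biases platform-derived estimates relative to the true population value.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical population: an opinion score correlated with socioeconomic advantage.
advantage = rng.normal(size=n)
opinion = 0.8 * advantage + rng.normal(size=n)

# Selection onto the platform is more likely for more advantaged individuals.
p_on_platform = 1 / (1 + np.exp(-(advantage - 1.0)))  # logistic selection model
on_platform = rng.random(n) < p_on_platform

print("true population mean opinion:", round(opinion.mean(), 3))
print("platform-sample mean opinion:", round(opinion[on_platform].mean(), 3))
print("share of population sampled: ", round(on_platform.mean(), 3))
```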
Another research inequity that can crop up within CSS methods and analytical approaches is aggregation bias (Mehrabi et al., 2021; Suresh & Guttag, 2021). This occurs when a model’s analysis is applied in a “one-size-fits-all” manner to subpopulations that have different conditional distributions, thereby treating the results as “population-level trends” that map inputs to outputs uniformly across groups despite their possession of diverging characteristics (Hollingshead et al., 2021; Suresh & Guttag, 2021). Such aggregation biases can lead models to fit optimally for dominant or privileged subpopulations that are oversampled while underperforming for groups that lack adequate representation. These biases can also conceal patterns of inequity and discrimination that are differentially distributed among subpopulations (Barocas & Selbst, 2016; boyd & Crawford, 2012; Hollingshead et al., 2021; Longo et al., 2017; Olteanu et al., 2019), consequently entrenching or even augmenting structural injustices that are hidden from view on account of the irresponsible statistical homogenisation of target populations.
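A minimal sketch can make the aggregation problem concrete. The subgroup sizes, effect sizes, and use of a pooled linear regression below are illustrative assumptions introduced here, not results from any cited study; the point is only that a single “one-size-fits-all” fit tracks the dominant subgroup and misrepresents the minority subgroup whose conditional distribution differs.

```python
# Illustrative sketch only: a single pooled ("one-size-fits-all") regression masks
# subgroup-specific relationships when conditional distributions differ across groups.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 5_000

# Two hypothetical subgroups: a majority (80%) and a minority (20%) with an
# input-output relationship of the opposite sign.
group = (rng.random(n) < 0.2).astype(int)  # 0 = majority, 1 = minority
x = rng.normal(size=(n, 1))
slope = np.where(group == 0, 1.0, -1.0)
y = slope * x[:, 0] + rng.normal(scale=0.3, size=n)

pooled = LinearRegression().fit(x, y)
# The pooled estimate (~0.6) reflects the majority group and misstates the minority's effect.
print("pooled slope estimate:", round(pooled.coef_[0], 2))

for g in (0, 1):
    m = LinearRegression().fit(x[group == g], y[group == g])
    print(f"group {g} slope estimate:", round(m.coef_[0], 2))
```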
A different set of research inequities arise within the wider field of CSS research as a consequence of material inequalities and capacity imbalances that exist between different research communities. Long-standing dynamics of global inequality, for instance, may undermine reciprocal sharing between research collaborators from high-income countries (HICs) and those from low-/middle-income countries (LMICs) (Leslie, 2020). Given asymmetries in resources, infrastructure, and research capabilities, data sharing between LMICs and HICs, and transnational research collaboration, can lead to inequity and exploitation (Bezuidenhout et al., 2017; Leonelli, 2013; Shrum, 2005). That is, data originators from LMICs may put immense amounts of effort and time into developing useful datasets (and openly share them) only to have their countries excluded from the benefits derived by researchers from HICs who have capitalised on such data in virtue of greater access to digital resources and compute infrastructure (World Health Organization, 2022). Moreover, data originators from LMICs may generate valuable datasets that they are then unable to independently and expeditiously utilise for needed research, because they lack the aptitudes possessed by researchers from HICs who are the beneficiaries of arbitrary asymmetries in education, training, and research capacitation (Bull et al., 2015; Merson et al., 2015).
This can create a twofold architecture of research inequity wherein the benefits of data production and sharing do not accrue to originating researchers and research subjects, and scientists from LMICs are placed at a relative disadvantage vis-à-vis those from HICs, whose research efficacy and ability to convert data into insights more rapidly function, in fact, to undermine the efforts of their disadvantaged research partners (Bezuidenhout et al., 2017; Crane, 2011). It is important to note, here, that such gaps in research resources and capabilities also exist within HICs, where large research universities and technology corporations (as opposed to less well-resourced universities and companies) are well positioned to advance data research given their access to data and compute infrastructures (Ahmed & Wahed, 2020).
In redressing these access barriers, emphasis must be placed on “the social and material conditions under which data can be made useable, and the multiplicity of conversion factors required for researchers to engage with data” (Bezuidenhout et al., 2017, p. 473). Equalising know-how and capability is a vital counterpart to equalising access to resources, and both together are necessary preconditions of just research environments. CSS scholars engaging in international research collaborations should focus on forming substantively reciprocal partnerships where capacity-building and asymmetry-aware practices of cooperative innovation enable participatory parity and thus greater research access and equity.
4.3 Incorporating Habits of Responsible Research and Innovation into CSS Practices
The foregoing taxonomy of the five main ethical challenges faced by CSS is intended to provide CSS researchers with a critical lens that enables them to sharpen their field of vision so that they are equipped to engage in the sort of anticipatory reflection which roots out irresponsible research practices and harmful impacts. However, preventing the endurance of “research fast and break things” attitudes requires a deeper cultural transformation in the CSS community of practice. It requires the end-to-end incorporation of habits of Responsible Research and Innovation (RRI) into all its research activities. An RRI perspective provides CSS researchers with an awareness that all processes of scientific discovery and problem-solving possess sociotechnical aspects and ethical stakes. Rather than conceiving of research as independent from human values, RRI regards these activities as ethically implicated social practices. For this reason, such practices are charged with a responsibility for critical self-reflection about the role that these values play both in discovery, engineering, and design processes and in considerations of the real-world effects of the insights and technologies that these processes yield.
Those who have been writing on the ethical dimension of CSS for the past decade have emphasised the importance of precisely these kinds of self-reflective research practices (for instance, British Sociological Association, 2016; Eynon et al., 2017; Franzke et al., 2020; Hollingshead et al., 2021; Lomborg, 2013; Markham & Buchanan, 2012; Moreno et al., 2013; Weinhardt, 2020). Reacting to recent miscarriages of research ethics that have undermined public trust, such as the 2016 mass sharing of sensitive personal information that had been extracted by researchers from the OKCupid dating site (Zimmer, 2016), they have stressed the need for “a bottom-up, case-based approach to research ethics, one that emphasizes that ethical judgment must be based on a sensible examination of the unique object and circumstances of a study, its research questions, the data involved, and the expected analysis and reporting of results, along with the possible ethical dilemmas arising from the case” (Lomborg, 2013, p. 20). What is needed to operationalise such a “bottom-up, case-based approach to research ethics” is the development across the CSS community of habits of RRI. In this section, we will explore how CSS practices can incorporate habits of RRI, focusing, in particular, on the role that contextual considerations, anticipatory reflection, public engagement, and justifiable action should play across the research lifecycle.
Building on research in Science and Technology Studies and Applied Technology Ethics, the RRI view of “science with and for society” has been transformed into helpful general guidance in such interventions as the Engineering and Physical Sciences Research Council (EPSRC)’s 2013 AREA frameworkFootnote 8 and the 2014 Rome DeclarationFootnote 9 (Fisher & Rip, 2013; Owen, 2014; Owen et al., 2012, 2013; Stilgoe et al., 2013; von Schomberg, 2013). More recently, EPSRC’s AREA principles (anticipate, reflect, engage, act) have been extended into the fields of data science and AI by the CARE & Act Framework (consider context, anticipate impacts, reflect on purposes, positionality, and power, engage inclusively, act responsibly and transparently) (Leslie, 2020; Leslie et al., 2022b). The application of the CARE & Act principles to CSS aims to provide a handy tool that enables its researchers to continuously sense-check the social and ethical implications of their research practices and that helps them to establish and sustain responsible habits of scientific investigation and reporting. Putting the CARE & Act Framework into practice involves taking its several guiding maxims as a launching pad for continuously reflective and deliberate choice-making across the research workflow. Let us explore each of these maxims in turn.
4.3.1 Consider Context
The imperative of considering context enjoins CSS researchers to think diligently about the conditions and circumstances surrounding their research activities and outputs. This involves focusing on the norms, values, and interests that inform the people undertaking the research and that shape and motivate the reasonable expectations of research subjects and those who are likely to be impacted by the research and its results: How are these norms, values, and interests influencing or steering the project and its outputs? How could they influence research subjects’ meaningful consent and expectations of privacy, confidentiality, and anonymity? How could they shape a research project’s reception and impacts across impacted communities? Considering context also involves taking into account the specific domain(s), geographical location(s), and jurisdiction(s) in which the research is situated and reflecting on the expectations of affected stakeholders that derive from these specific contexts: How do the existing institutional norms and rules in a given domain or jurisdiction shape expectations regarding research goals, practices, and outputs? How do the unique social, cultural, legal, economic, and political environments in which different research projects are embedded influence the conditions of data generation, the intentions and behaviours of the research subjects that are captured by extracted data, and the space of possible inferences that data analytics, modelling, and simulation can yield?
Responsiveness to context has been identified as significant in internet research ethics for nearly two decades (Buchanan, 2011; Markham, 2006) and has especially been emphasised more recently in the Internet Research: Ethical Guidelines 3.0 of the Association of Internet Researchers (AoIR), where the authors stress that a “basic ethical approach” involves focussing “on the fine-grained contexts and distinctive details of each specific ethical challenge” (Franzke et al., 2020, p. 4).Footnote 10 For Franzke et al., such a
process- and context-oriented approach… helps counter a common presumption of “ethics” as something of a “one-off” tick-box exercise that is primarily an obstacle to research. On the contrary…taking on board an ongoing attention to ethics as inextricably interwoven with method often leads to better research as this attention entails improvements on both research design and its ethical dimensions throughout the course of a project. (pp. 4–5)
This ongoing attention entails a keen awareness of the need to “respect people’s values or expectations in different settings” (Eynon et al., 2017) as well as the need to acknowledge cultural differences, ethical pluralism, and diverging interpretations of moral values and concepts (Capurro, 2005, 2008; C. M. Ess, 2020; Hongladarom & Ess, 2007; Leslie et al., 2022a). Likewise, contextual considerations need to include a recognition of interjurisdictional differences in legal and regulatory requirements (for instance, variations in data protection laws and legal privacy protections across regions and countries whence digital trace data is collected).
All in all, contextual considerations should, at minimum, track three vectors: The first involves considering the contextual determinants of the conditions under which the research is produced (e.g., thinking about the positionality of the research team, the expectations of the relevant CSS community of practice, and the external influences on the aims and means of research by funders, collaborators, and providers of data and research infrastructure). The second involves considering the context of the subjects of research (e.g., thinking about research subjects’ reasonable expectations of gainful obscurity and “privacy in public” and considering the changing contexts of their communications such as with whom they are interacting, where, how, and what kinds of data are being shared). The third involves considering the contexts of the social, cultural, legal, economic, and political environments in which different research projects are embedded as well as the historical, geographic, sectoral, and jurisdictional specificities that configure such environments (e.g., thinking about the ways different social groups—both within and between cultures—understand and define key values, research variables, and studied concepts differently as well as the ways that these divergent understandings place limitations on what computational approaches to prediction, classification, modelling, and simulation can achieve).
4.3.2 Anticipate Impacts
The imperative of anticipating impacts enjoins CSS researchers to reflect on and assess the potential short-term and long-term effects their research may have on impacted individuals (e.g., research participants, data subjects, and the researchers themselves) and on affected communities and social groups, more broadly. The purpose of this kind of anticipatory reflection is to safeguard the sustainability of CSS projects across the entire research lifecycle. To ensure that the activities and outputs of CSS research remain socially and environmentally sustainable and support the sustainability of the communities they affect, researchers must proceed with a continuous responsiveness to the real-world impacts that their research could have. This entails concerted and stakeholder-involving exploration of the possible adverse and beneficial effects that could otherwise remain hidden from view if deliberate and structured processes for anticipating downstream impacts were not in place. Attending to sustainability, along these lines, also entails the iterative re-visitation and re-evaluation of impact assessments. To be sure, in its general usage, the word “sustainability” refers to the maintenance of and care for an object or endeavour over time. In the CSS context, this implies that building sustainability into a research project is not a “one-off” affair. Rather, carrying out an initial research impact assessment at the inception of a project is only a first, albeit critical, step in a much longer, end-to-end process of responsive re-evaluation and re-assessment. Such an iterative approach enables sustainability-aware researchers to pay continuous attention both to the dynamic and changing character of the research lifecycle and to the shifting conditions of the real-world environments in which studies are embedded.
This demand to anticipate research impacts is not new in the modern academy—especially in the biomedical and social sciences, where Institutional Review Board (IRB) processes for research involving human subjects have been in place for decades (Abbott & Grady, 2011; Grady, 2015). However, the novel human scale, breadth, and reach of CSS research, as well as the new (and often subtler) range of potential harms it poses to impacted individuals, communities, and the biosphere, call into question the adequacy of conventional IRB processes (Metcalf & Crawford, 2016). While the latter have been praised as a necessary step forward in protecting the physical, mental, and moral integrity of human research subjects, building public trust in science, and institutionalising needed mechanisms for ethical oversight (Resnik, 2018), critics have also highlighted their unreliability, superficiality, narrowness, and inapplicability to the new set of information hazards posed by the processing of aggregated big data (Prunkl et al., 2021; Raymond, 2019).
A growing awareness of these deficiencies has generated an expanding effort within CSS-adjacent computational disciplines (like machine learning, artificial intelligence, and computational linguistics) to come up with more robust impact assessment regimes and ethics review processes (Hecht et al., 2021; Leins et al., 2020; Nanayakkara et al., 2021). For instance, in 2020, the NeurIPS (Neural Information Processing Systems) conference introduced a new ethics review protocol that required paper submissions to include an impact statement “discussing the broader impact of their work, including possible societal consequences—both positive and negative” (Neural Information Processing Systems Conference, 2020). Informatively, this protocol was converted into a responsible research practices checklist in 2021 (Neural Information Processing Systems, 2021) after technically oriented researchers protested that they lacked the training and guidance needed to carry out impact assessments effectively (Ashurst et al., 2021; Johnson, 2020; Prunkl et al., 2021). Though both the AI and CSS research communities have recently made progress in integrating some form of ethics training into professional development (Ashurst et al., 2020; Salganik & The Summer Institutes in Computational Social Science, n.d.) and in articulating guidelines for anticipating ethical impacts (Neural Information Processing Systems, 2022), there remains a lack of institutionalised instruction, codified guidance, and professional stewardship for research impact assessment processes. As an example, conferences such as the International AAAI Conference on Web and Social Media, ICWSM (2022); the International Conference on Machine Learning, ICML (2022); the North American Chapter of the Association for Computational Linguistics, NAACL (2022); and Empirical Methods in Natural Language Processing, EMNLP (2022) each require some form of research impact evaluation and ethical consideration, but aside from directing researchers to relevant professional guidelines and codes of conduct (e.g., from the Association for Computational Linguistics, ACL; Association for Computing Machinery, ACM; and Association for the Advancement of Artificial Intelligence, AAAI), there is scant direction on how to operationalise impact assessment processes (Prunkl et al., 2021).
What is missing from this patchwork of ethics review requirements and guidance is a set of widely accepted procedural mechanisms that would enable and standardise conscientious research impact assessment practices. To fill this gap, recent research into the governance practices needed to create responsible data research environments has called for a coherent, integrated, and holistic approach to impact assessment that includes several interrelated elements (Leslie, 2019, 2020; Leslie et al., 2021; Leslie et al., 2022c, 2022d, 2022e):
-
Stakeholder analysis: Diligent research impact assessment practices should include processes that allow researchers to identify and evaluate the salience and contextual characteristics of individuals or groups who may be affected by, or may affect, the research project under consideration (Mitchell et al., 2017; Reed et al., 2009; Schmeer, 1999; Varvasovszky & Brugha, 2000). Stakeholder analysis aims to help researchers understand the relevance of each identified stakeholder to their project and to its use contexts. It does this by providing a structured way to assess the relative interests, rights, vulnerabilities, and advantages of identified stakeholders as these characteristics may be impacted by, or may impact, the research.
-
Establishment of clear normative criteria for impact assessment: Effective research impact assessment practices should start from a clear set of ethical values or human rights criteria against which the potential impacts of a project on affected individuals and communities can be evaluated. Such criteria should provide a common but non-exclusive point of departure for collective deliberation about the ethical permissibility of the research project under consideration. Adopting common normative criteria from the outset enables reciprocally respectful, sincere, and open discussion about the ethical challenges a research project may face by helping to create a shared vocabulary for informed dialogue and impact assessment. Such a common starting point also facilitates deliberation about how to balance ethical values when they come into tension.
-
Methodical evaluation of potential impacts: The actual research impact assessment process provides an opportunity for research teams (and engaged stakeholders, where deemed appropriate) to produce detailed evaluations of the potential and actual impacts that the project may have, to contextualise and corroborate potential harms and benefits, to make possible the collaborative assessment of the severity of potential adverse impacts identified, and to facilitate the co-design of an impact mitigation plan.
-
Impact mitigation planning: Once impacts have been evaluated and the severity of any potential harms assessed, impact prevention and mitigation planning should commence. Diligent impact mitigation planning begins with a scoping and prioritisation stage. Research team members (and engaged stakeholders, where appropriate) should go through all the identified potential adverse impacts and map out the interrelations and interdependencies between them as well as surrounding social factors (such as contextually specific stakeholder vulnerabilities and precariousness) that could make impact mitigation more challenging. Where prioritisation of prevention and mitigation actions is necessary (for instance, where delays in addressing a potential harm could reduce its remediability), decision-making should be steered by the relative severity of the impacts under consideration. As a general rule, while impact prevention and mitigation planning may involve prioritisation of actions, all potential adverse impacts must be addressed. When potential adverse impacts have been mapped out and organised, and mitigation actions have been considered, the research team (and engaged stakeholders, where appropriate) should begin co-designing an impact mitigation plan (IMP). The IMP will become the part of the project’s transparent reporting methodology that specifies the actions and processes needed to address the adverse impacts which have been identified and that assigns responsibility for the completion of these tasks and processes. As such, the IMP will serve a crucial documenting function (a schematic sketch of one possible way to record and prioritise identified impacts follows this list).
-
Establishment of protocols for re-visitation and re-evaluation of the research impact assessment: Research impact assessments must pay continuous attention both to the dynamic and changing character of the research lifecycle and to the shifting conditions of the real-world environments in which research practices, results, and outputs are embedded. There are two sets of factors that should inform when and how often initial research impact assessments are re-visited to ensure that they remain adequately responsive to factors that could present new potential harms or significantly influence impacts that have been previously identified: (1) research workflow and production factors: Choices made at any point along the research workflow may affect the veracity of prior impact assessments, leading to a need for re-assessment, reconsideration, and amendment. For instance, research design choices could be made that were not anticipated in the initial impact assessment (such choices might include adjusting the variables that are included in the model, choosing more complex algorithms, or grouping variables in ways that may impact specific groups); (2) environmental factors: Changes in project-relevant social, regulatory, policy, or legal environments (occurring during the time in which the research is taking place) may have a bearing on how well the resulting computational model works and on how the research outputs impact affected individuals and groups. Likewise, domain-level reforms, policy changes, or changes in data recording methods may take place in the population of concern in ways that affect whether the data used to train the model accurately portrays phenomena, populations, or related factors.
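As a purely illustrative sketch of how the outputs of such a process might be recorded, the following shows one possible structure for an impact register with severity-based prioritisation. The field names, severity scale, and example entries are hypothetical assumptions introduced here, not a prescribed standard or the framework’s own template.

```python
# Illustrative sketch only: one possible structure for recording identified impacts
# and ordering mitigation actions by assessed severity. Fields, scales, and the
# example entries are hypothetical assumptions, not a prescribed standard.
from dataclasses import dataclass, field

@dataclass
class ImpactRecord:
    description: str             # the potential adverse impact identified
    affected_stakeholders: list  # who could be harmed
    severity: int                # e.g., 1 (minor) to 5 (severe), agreed by the team
    remediability: str           # how reversible the harm would be if it occurred
    mitigation_actions: list = field(default_factory=list)
    owner: str = ""              # who is responsible for completing the actions
    review_date: str = ""        # when the assessment is next re-visited

impacts = [
    ImpactRecord("Re-identification of research subjects via data linkage",
                 ["survey respondents"], severity=5, remediability="low",
                 mitigation_actions=["aggregate small cells", "restrict data access"],
                 owner="data steward", review_date="2024-01-15"),
    ImpactRecord("Under-representation of low-connectivity groups in the sample",
                 ["digitally excluded residents"], severity=3, remediability="medium",
                 mitigation_actions=["supplement with offline survey"],
                 owner="principal investigator", review_date="2024-01-15"),
]

# Prioritise mitigation work by severity, as suggested above, while noting that
# all identified impacts must still be addressed.
for record in sorted(impacts, key=lambda r: r.severity, reverse=True):
    print(record.severity, "-", record.description)
```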
4.3.3 Reflect on Purposes, Positionality, and Power
The foregoing elements of research impact assessment presuppose that the CSS researchers who undertake them also engage in reflexive practices that scrutinise the way potential perspectival limitations and power imbalances can exercise influence on the equity and integrity of research projects and on the motivations, interests, and aims that steer them. The imperative of reflecting on purposes, positionality, and power makes explicit the importance of this dimension of inward-facing reflection.
All individual human beings come from unique places, experiences, and life contexts that shape their perspectives, motivations, and purposes. Reflecting on these contextual attributes is important insofar as it can help researchers understand how their viewpoints might differ from those around them and, more importantly, from those who have diverging cultural and socioeconomic backgrounds and life experiences. Identifying and probing these differences enables individual researchers to better understand how their own backgrounds, for better or worse, frame the way they see others, the way they approach and solve problems, and the way they carry out research and engage in innovation. By undertaking such efforts to recognise social position and differential privilege, they may gain a greater awareness of their own personal biases and unconscious assumptions. This then can enable them to better discern the origins of these biases and assumptions and to confront and challenge them in turn.
Social scientists have long referred to this site of self-locating reflection as “positionality” (Bourke, 2014; Kezar, 2002; Merriam et al., 2001). When researchers take their own positionalities into account, and make this explicit, they can better grasp how the influence of their respective social and cultural positions potentially creates research strengths and limitations. On the one hand, one’s positionality—with respect to characteristics like ethnicity, race, age, gender, socioeconomic status, education and training levels, values, geographical background, etc.—can have a positive effect on an individual’s contributions to a research project; the uniqueness of each person’s lived experience and standpoint can play a constructive role in introducing insights and understandings that other team members do not have. On the other hand, one’s positionality can assume a harmful role when hidden biases and prejudices that derive from a person’s background, and from differential privileges and power imbalances, creep into decision-making processes undetected and subconsciously sway the purposes, trajectories, and approaches of research projects.Footnote 11
4.3.4 Engage Inclusively
While practices of inward-facing reflection on purposes, positionality, and power can strengthen the reflexivity, objectivity, and reasonableness of CSS research activities (D’Ignazio & Klein, 2020; Haraway, 1988; Harding, 1992, 1995, 2008, 2015), practices of outward-facing stakeholder engagement and community involvement can bolster a research project’s legitimacy, social license, and democratic governance as well as ensure that its outputs will possess an appropriate degree of public accountability and transparency. A diligent stakeholder engagement process can help research teams to identify stakeholder salience, undertake team positionality reflection, and facilitate proportionate community involvement and input throughout the research project workflow. This process can also safeguard the equity and the contextual accuracy of impact assessments and facilitate appropriate end-to-end processes of transparent project governance by supporting their iterative re-visitation and re-evaluation. Moreover, community-involving engagement processes can empower the public and the CSS community alike by introducing the transformative agency of “citizen science” into research processes (Albert et al., 2021; Sagarra et al., 2016; Tauginienė et al., 2020).
It is important to note, however, that all stakeholder engagement processes can run the risk either of being cosmetic or tokenistic tools employed to legitimate research projects without substantial and meaningful participation or of being insufficiently participatory, i.e., of being one-way information flows or nudging exercises that serve as public relations instruments (Arnstein, 1969; Tritter & McCallum, 2006). To avoid such hazards of superficiality, CSS researchers should shore up a proportionate approach to stakeholder engagement through deliberate and precise goal setting. Researchers should prioritise the establishment of clear and explicit stakeholder engagement objectives. Relevant questions to pose in establishing these goals include: Why are we engaging with stakeholders? What do we envision the ideal purpose and the expected outcomes of engagement activities to be? How can we best draw on the insights and lived experience of participants to inform and shape our research?Footnote 12
4.3.5 Act Transparently and Responsibly
The imperative of acting transparently and responsibly enjoins CSS researchers to marshal the habits of Responsible Research and Innovation cultivated in the CARE processes to produce research that prioritises data stewardship and that is robust, accountable, fair, non-discriminatory, explainable, reproducible, and replicable. While the mechanisms and procedures which are put in place to ensure that these normative goals are achieved will differ from project to project (based on the specific research contexts, research design, and research methods), all CSS researchers should incorporate the following priorities into their governance, self-assessment, and reporting practices:
-
Full documentation of data provenance, lineage, linkage, and sourcing: This involves keeping track of and documenting responsible data management practices across the entire research lifecycle, from data extraction or procurement and data analysis, cleaning, and pre-processing to data use, retention, deletion, and updating (Bender & Friedman, 2018; Gebru et al., 2021; Holland et al., 2018). It also involves demonstrating that the data is ethically sourced, responsibly linked, and legally available for research purposes (Weinhardt, 2020) and making explicit the measures taken to ensure data quality (source integrity and measurement accuracy, timeliness and recency, relevance, sufficiency of quantity, dataset representativeness), data integrity (attributability, consistency, completeness, contemporaneousness, traceability, and auditability), and FAIR data (findable, accessible, interoperable, and reusable). A schematic sketch of one possible provenance record follows this list.
-
Full documentation of privacy, confidentiality, consent, and data protection due diligence: This involves demonstrating that data has been handled securely and responsibly from beginning to end of the research lifecycle so that any potential breaches of confidentiality, privacy, and anonymity have been prevented and any risks of re-identification through triangulation and data linkage mitigated. Regardless of the jurisdictions of data collection and use, researchers should aim to optimally protect the rights and interests of research subjects by adhering to the highest standards of privacy preservation, data protection, and responsible data handling and storage such as those contained in the IRE 3.0 and the National Committee for Research Ethics in the Social Sciences and the Humanities (NESH) guidelines (Franzke et al., 2020; National Committee for Research Ethics in the Social Sciences and the Humanities (NESH), 2019). They should also demonstrate that they have sufficiently taken into account contextual factors in meeting the privacy expectations of observed research subjects (like who is involved in observed interactions, how and what type of information is exchanged, how sensitive it is perceived to be, and where and when such exchanges occur). Documentation should additionally include evidence that researchers have instituted proportionate protocols for attaining informed and meaningful consent that are appropriate to the specific contexts of the data extraction and use and that cohere with the reasonable expectations of targeted research subjects.
-
Transparent and accountable reporting of research processes and results and appropriate publicity of datasets: Research practices and methodological conduct should be carried out deliberately, transparently, and in accordance with recording protocols that enable the interpretability, reproducibility, and replicability of results. For prediction models, the documentation protocols presented in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) provide a good starting point for best conduct guidelines in research reporting (Collins et al., 2015; Moons et al., 2015).Footnote 13 Following TRIPOD, transparent and accountable reporting should demonstrate diligent methodological conduct across all stages and elements of research. For prediction models, this includes clear descriptions of research participants, predictors, outcome variables, sample size, missing data, statistical analysis methods, model specification, model performance, model validation, model updating, and study limitations. While transparent research conduct can facilitate reproducibility and replicability, concerns about the privacy and anonymity of research subjects should also factor into how training data, models, and results are made available to the scientific community. This notwithstanding, CSS researchers should prioritise the publication of well-archived, high-quality, and accessible datasets that enable the replication of results and the advancement of further research (Hollingshead et al., 2021). They should also pursue research design, analysis, and reporting in an interpretability-aware manner that prioritises process transparency, the understandability of models, and the accessibility and explainability of the rationale behind their results.
-
An end-to-end process for bias self-assessment: This should cover all research stages as well as all sources of biases that could arise in the data; in the data collection; in the data pre-processing; in the organising, categorising, describing, annotating, and structuring of data (text-as-data, in particular); and in research design and execution choices. Bias self-assessment processes should cover social, statistical, and cognitive biases (Leslie et al., 2022a). Such a process should move across the research lifecycle, pinpointing specific forms of social, statistical, and cognitive bias that could arise at each stage (for instance, social biases like representation bias and label bias as well as statistical biases like missing data bias and measurement bias could arise in the data pre-processing stage of a research project).
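To illustrate the documentation priorities above, the following is a minimal sketch of a machine-readable provenance record in the spirit of the datasheets and data statements cited in the first item of this list. The schema, field names, and example values are hypothetical assumptions introduced here for illustration, not a published standard.

```python
# Illustrative sketch only: a minimal, machine-readable provenance record in the
# spirit of "datasheets for datasets" and "data statements". The schema and values
# are hypothetical assumptions, not a published standard.
import json
from datetime import date

provenance_record = {
    "dataset": "example-social-media-corpus",  # hypothetical dataset name
    "source": "public platform API, terms-of-service-compliant collection",
    "collection_period": {"start": "2023-01-01", "end": "2023-06-30"},
    "legal_basis": "research exemption documented in the data management plan",
    "consent_and_privacy": {
        "informed_consent": "platform-level notice; no direct identifiers retained",
        "anonymisation": "usernames hashed; free text scrubbed of contact details",
    },
    "pre_processing": ["language filtering", "deduplication", "bot-account removal"],
    "known_limitations": ["over-representation of young, urban, high-connectivity users"],
    "retention": {"delete_after": "2026-06-30", "storage": "encrypted institutional server"},
    "last_reviewed": date.today().isoformat(),
}

# Persist the record alongside the dataset so that provenance travels with the data.
with open("provenance_record.json", "w") as f:
    json.dump(provenance_record, f, indent=2)
```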
4.4 Conclusion
This chapter has explored the spectrum of ethical challenges that CSS for policy faces across the myriad possibilities of its application. It has further elaborated on how these challenges can be met head-on only through the adoption of habits of RRI that are instantiated in end-to-end governance mechanisms which set up practical guardrails throughout the research lifecycle. As a quintessential social impact science, CSS for policy holds great promise to advance social justice, human flourishing, and biospheric sustainability. However, CSS is also an all-too-human science—conceived in particular social, cultural, and historical contexts and pursued amidst intractable power imbalances, structural inequities, and potential conflicts of interest. Its proponents, in both research and policymaking communities, must thus remain continuously self-critical about the role that values, interests, and power dynamics play in shaping mission-driven research. Likewise, they must vigilantly take heed of the complicated social and historical conditions surrounding the generation and construction of data as well as the way that the activities and theories of CSS researchers can function to reformat, reorganise, and shape the phenomena that they purport only to measure and analyse. Such a continuous labour of exposing and redressing the often-concealed interdependencies that exist between CSS and the social environments in which its research activities, subject matters, and outputs are embedded will only strengthen its objectivity and ensure that its impacts are equitable, ethical, and responsible. Such a human-centred approach will make CSS for policy a “science with and for society” second-to-none.
Notes
- 1.
See, for instance, the series of Association of Internet Researchers (AoIR) guidelines on internet research ethics published in 2002, 2012, and 2019 as well as the British Sociological Association (BSA) guidance. For scholarly interventions, see (Collmann & Matei, 2016; Dobrick et al., 2018; Ess & Jones, 2004; Eynon et al., 2017; Franzke et al., 2020; Giglietto et al., 2012; Hollingshead et al., 2021; Lomborg, 2013; Markham & Buchanan, 2012; Moreno et al., 2013; Salganik, 2019; Weinhardt, 2020).
- 2.
For example, across the four volumes of Nigel Gilbert’s magisterial Computational Social Science (2010), none of the 66 contributing chapters are dedicated to ethics. Likewise, no explicit mention or discussion of research ethics appears in Conte et al. (2012). There are only two passing mentions of ethics in the 10 chapters of Cioffi-Revilla’s substantial Introduction to Computational Social Science (2014), and the word “ethics” also appears only twice (and only in the final chapter) in Chen’s edited volume, Big Data for the Computational Social Sciences and Humanities (2018).
- 3.
As Sorell (2013) has argued, scientism is typified by the privileging of natural or exact scientific language, knowledge, and methods over those of other branches of learning and culture, especially those of the “human sciences” like philosophy, ethics, history, anthropology, and sociology. Such a privileging of exact scientific “ideas, methods, practices, and attitudes” can be especially damaging where these are extended “to matters of human social and political concern” (Olson, 2008, p. 1)—matters that require an understanding of subtle historical, ethical, and sociocultural contexts, contending human values, norms, and purposes, and subjective meaning-complexes of action and interaction (Apel, 1984; Habermas, 1988; Taylor, 2021; von Wright, 2004; Weber, 1978; Wittgenstein, 2009).
- 4.
The use of the terms ‘digital’ and ‘digitalised’ follows Lazer & Radford (2017).
- 5.
- 6.
- 7.
A similar and compounding form of sampling bias can occur when survey data is linked, through participant consent, to digital trace data from social media networks. Here the dynamic of non-random self-selection manifests in the select group of research subjects (likely those who are privileged and young and more frequently male) who have social media accounts and who consent to having them linked to the survey research (Al Baghal et al., 2020; Stier et al., 2020).
- 8.
- 9.
- 10.
It is important to note that contextual considerations have also been present in earlier versions of the AoIR guidelines, which date back two decades (Internet Research Ethics—IRE 1.0, 2002; Internet Research Ethics—IRE 2.0, 2012).
- 11.
When taking positionality into account, researchers should reflect on their own positionality matrix. They should ask: to what extent do my personal characteristics, group identifications, socioeconomic status, educational, training, and work background, team composition, and institutional frame represent sources of power and advantage or sources of marginalisation and disadvantage? How does this positionality influence my (and my research team’s) ability to identify and understand affected stakeholders and the potential impacts of my project? For details on this process see Leslie et al. (2022b).
- 12.
An elaboration on the essential components of a responsible stakeholder engagement process can be found in Leslie et al. (2022b).
- 13.
Though the TRIPOD method is intended to be applied in the medical domain, its reporting protocols are largely applicable to CSS studies.
References
Abbott, L., & Grady, C. (2011). A systematic review of the empirical literature evaluating IRBs: What we know and what we still need to learn. Journal of Empirical Research on Human Research Ethics, 6(1), 3–19. https://doi.org/10.1525/jer.2011.6.1.3
Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 298–306. https://doi.org/10.1145/3461702.3462624
Agniel, D., Kohane, I. S., & Weber, G. M. (2018). Biases in electronic health record data due to processes within the healthcare system: Retrospective observational study. BMJ, 361, k1479. https://doi.org/10.1136/bmj.k1479
Agüera y Arcas, B., Mitchell, M., & Todorov, A. (2017, May 7). Physiognomy’s New Clothes. Medium. https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
Ahmed, N., & Wahed, M. (2020). The De-democratization of AI: Deep learning and the compute divide in artificial intelligence research. Cornell University Library, arXiv.org. https://doi.org/10.48550/ARXIV.2010.15581
Aizenberg, E., & van den Hoven, J. (2020). Designing for human rights in AI. Big Data & Society, 7(2), 205395172094956. https://doi.org/10.1177/2053951720949566
Ajunwa, I., Crawford, K., & Schultz, J. (2017). Limitless worker surveillance. California Law Review, 105, 735. https://doi.org/10.15779/Z38BR8MF94
Akhtar, P., & Moore, P. (2016). The psychosocial impacts of technological change in contemporary workplaces, and trade union responses. International Journal of Labour Research, 8(1/2), 101.
Al Baghal, T., Sloan, L., Jessop, C., Williams, M. L., & Burnap, P. (2020). Linking twitter and survey data: The impact of survey mode and demographics on consent rates across three UK studies. Social Science Computer Review, 38(5), 517–532. https://doi.org/10.1177/0894439319828011
Albert, A., Balázs, B., Butkevičienė, E., Mayer, K., & Perelló, J. (2021). Citizen social science: New and established approaches to participation in social research. In K. Vohland, A. Land-Zandstra, L. Ceccaroni, R. Lemmens, J. Perelló, M. Ponti, R. Samson, & K. Wagenknecht (Eds.), The science of citizen science (pp. 119–138). Springer International Publishing. https://doi.org/10.1007/978-3-030-58278-4_7
Amodei, D., & Hernandez, D. (2018, May 16). AI and Compute. OpenAI. https://openai.com/blog/ai-and-compute/
Amoore, L. (2021). The deep border. Political Geography, 102547. https://doi.org/10.1016/j.polgeo.2021.102547
Anderson, C. (2008, June 23). The end of theory: The data deluge makes the scientific method obsolete. Wired Magazine. https://www.wired.com/2008/06/pb-theory/
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Lawrence Erlbaum Associates.
Andrejevic, M., & Selwyn, N. (2020). Facial recognition technology in schools: Critical questions and concerns. Learning, Media and Technology, 45(2), 115–128. https://doi.org/10.1080/17439884.2020.1686014
Apel, K.-O. (1984). Understanding and explanation: A transcendental-pragmatic perspective. MIT Press.
Arnstein, S. R. (1969). A ladder of citizen participation. Journal of the American Institute of Planners, 35(4), 216–224. https://doi.org/10.1080/01944366908977225
Ashurst, C., Barocas, S., Campbell, R., Raji, D., & Russell, S. (2020). Navigating the broader impacts of AI research. https://aibroader-impacts-workshop.github.io/
Ashurst, C., Hine, E., Sedille, P., & Carlier, A. (2021). AI ethics statements—Analysis and lessons learnt from NeurIPS broader impact statements. ArXiv: 2111.01705 [Cs]. http://arxiv.org/abs/2111.01705
Ball, K. (2009). Exposure: Exploring the subject of surveillance. Information, Communication & Society, 12(5), 639–657. https://doi.org/10.1080/13691180802270386
Ball, K. (2019). Review of Zuboff’s The age of surveillance capitalism. Surveillance & Society, 17(1/2), 252–256. https://doi.org/10.24908/ss.v17i1/2.13126
Banjanin, N., Banjanin, N., Dimitrijevic, I., & Pantic, I. (2015). Relationship between internet use and depression: Focus on physiological mood oscillations, social networking and online addictive behavior. Computers in Human Behavior, 43, 308–312. https://doi.org/10.1016/j.chb.2014.11.013
Barocas, S., & Selbst, A. D. (2016). Big data’s disparate impact. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2477899
Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20(1), 1–68. https://doi.org/10.1177/1529100619832930
Barry, C. T., Sidoti, C. L., Briggs, S. M., Reiter, S. R., & Lindsey, R. A. (2017). Adolescent social media use and mental health from adolescent and parent perspectives. Journal of Adolescence, 61(1), 1–11. https://doi.org/10.1016/j.adolescence.2017.08.005
Beer, D. (2017). The social power of algorithms. Information, Communication & Society, 20(1), 1–13. https://doi.org/10.1080/1369118X.2016.1216147
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587–604. https://doi.org/10.1162/tacl_a_00041
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922
Benjamin, R. (2019). Race after technology: Abolitionist tools for the new Jim code. Polity.
Bezuidenhout, L. M., Leonelli, S., Kelly, A. H., & Rappert, B. (2017). Beyond the digital divide: Towards a situated approach to open data. Science and Public Policy, 44(4), 464–475. https://doi.org/10.1093/scipol/scw036
Blank, G. (2017). The digital divide among twitter users and its implications for social research. Social Science Computer Review, 35(6), 679–697. https://doi.org/10.1177/0894439316671698
Blank, G., & Lutz, C. (2017). Representativeness of social media in Great Britain: Investigating Facebook, LinkedIn, Twitter, Pinterest, Google+, and Instagram. American Behavioral Scientist, 61(7), 741–756. https://doi.org/10.1177/0002764217717559
Bogost, I. (2015, January 15). The Cathedral of computation. The Atlantic. https://www.theatlantic.com/technology/archive/2015/01/the-cathedral-of-computation/384300/
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. ArXiv:1607.06520 [Cs, Stat]. http://arxiv.org/abs/1607.06520
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., et al. (2021). On the opportunities and risks of foundation models. Cornell University Library, arXiv.org. https://doi.org/10.48550/ARXIV.2108.07258
Botan, C. (1996). Communication work and electronic surveillance: A model for predicting panoptic effects. Communication Monographs, 63(4), 293–313. https://doi.org/10.1080/03637759609376396
Botan, & McCreadie. (1990). Panopticon: Workplace of the information society. International Communication Association Conference, Dublin, Ireland.
Bourke, B. (2014). Positionality: Reflecting on the research process. The Qualitative Report, 19, 1. https://doi.org/10.46743/2160-3715/2014.1026
boyd, danah. (2011). White flight in networked publics? How race and class shaped American teen engagement with MySpace and Facebook. In Race After the Internet (pp. 203–222). Routledge.
boyd, d., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679. https://doi.org/10.1080/1369118X.2012.678878
Brayne, S. (2020). Predict and Surveil: Data, discretion, and the future of policing (1st ed.). Oxford University Press. https://doi.org/10.1093/oso/9780190684099.001.0001
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199. https://doi.org/10.1214/ss/1009213726
British Sociological Association. (2016). Ethics guidelines and collated resources for digital research. Statement of ethical practice annexe. https://www.britsoc.co.uk/media/24309/bsa_statement_of_ethical_practice_annexe.pdf
Bu, Z., Xia, Z., & Wang, J. (2013). A sock puppet detection algorithm on virtual spaces. Knowledge-Based Systems, 37, 366–377. https://doi.org/10.1016/j.knosys.2012.08.016
Buchanan, E. A. (2011). Internet research ethics: Past, present, and future. In M. Consalvo & C. Ess (Eds.), The handbook of internet studies (pp. 83–108). Wiley-Blackwell. https://doi.org/10.1002/9781444314861.ch5
Bull, S., Cheah, P. Y., Denny, S., Jao, I., Marsh, V., Merson, L., Shah More, N., Nhan, L. N. T., Osrin, D., Tangseefa, D., Wassenaar, D., & Parker, M. (2015). Best practices for ethical sharing of individual-level health research data from low- and middle-income settings. Journal of Empirical Research on Human Research Ethics, 10(3), 302–313. https://doi.org/10.1177/1556264615594606
Caldarelli, G., Wolf, S., & Moreno, Y. (2018). Physics of humans, physics for society. Nature Physics, 14(9), 870–870. https://doi.org/10.1038/s41567-018-0266-x
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Capurro, R. (2005). Privacy. An intercultural perspective. Ethics and Information Technology, 7(1), 37–47. https://doi.org/10.1007/s10676-005-4407-4
Capurro, R. (2008). Intercultural information ethics: Foundations and applications. Journal of Information, Communication and Ethics in Society, 6(2), 116–126. https://doi.org/10.1108/14779960810888347
Cardon, D. (2016). Deconstructing the algorithm: Four types of digital information calculations. In R. Seyfert & J. Roberge (Eds.), Algorithmic cultures (pp. 95–110). Routledge. http://spire.sciencespo.fr/hdl:/2441/19a26i12vl9epootg7j45rfpmk
Carpentier, N. (2011). Media and participation: A site of ideological-democratic struggle. Intellect Ltd. https://doi.org/10.26530/OAPEN_606390
Chen, S.-H. (Ed.). (2018). Big data in computational social science and humanities (1st ed.). Springer. https://doi.org/10.1007/978-3-319-95465-3
Chen, Z., & Whitney, D. (2019). Tracking the affective state of unseen persons. Proceedings of the National Academy of Sciences, 116(15), 7559–7564. https://doi.org/10.1073/pnas.1812250116
Cioffi-Revilla, C. (2014). Introduction to computational social science. Springer London. https://doi.org/10.1007/978-1-4471-5661-1
Cohen, J. E. (2019a). Between truth and power: The legal constructions of informational capitalism. Oxford University Press.
Cohen, J. E. (2019b). Review of Zuboff’s the age of surveillance capitalism. Surveillance & Society, 17(1/2), 240–245. https://doi.org/10.24908/ss.v17i1/2.13144
Cohen, R., & Ruths, D. (2013). Classifying political orientation on twitter: It’s not easy! Proceedings of the International AAAI Conference on Web and Social Media, 7(1), 91–99.
Collins, G. S., Reitsma, J. B., Altman, D. G., & Moons, K. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMC Medicine, 13(1), 1. https://doi.org/10.1186/s12916-014-0241-z
Collmann, J., & Matei, S. A. (2016). Ethical reasoning in big data: An exploratory analysis (1st ed.). Springer. https://doi.org/10.1007/978-3-319-28422-4
Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., Loreto, V., Moat, S., Nadal, J.-P., Sanchez, A., Nowak, A., Flache, A., San Miguel, M., & Helbing, D. (2012). Manifesto of computational social science. The European Physical Journal Special Topics, 214(1), 325–346. https://doi.org/10.1140/epjst/e2012-01697-8
Cosentino, G. (2020). Social media and the post-truth world order: The global dynamics of disinformation. Springer International Publishing. https://doi.org/10.1007/978-3-030-43005-4
Cottom, T. M. (2016). Black cyberfeminism: Intersectionality, institutions and digital sociology. Policy Press.
Crane, J. (2011). Scrambling for Africa? Universities and global health. The Lancet, 377(9775), 1388–1390. https://doi.org/10.1016/S0140-6736(10)61920-4
Crawford, K. (2014, May 30). The anxieties of big data. The New Inquiry. https://thenewinquiry.com/the-anxieties-of-big-data/
D’Ancona, M. (2017). Post truth: The new war on truth and how to fight back. Ebury Press.
De Cleen, B., & Carpentier, N. (2008). Introduction: Blurring participations and convergences. In N. Carpentier & B. De Cleen (Eds.), Participation and media production. Critical reflections on content creation (pp. 1–12). Cambridge Scholars Publishing.
de Montjoye, Y.-A., Radaelli, L., Singh, V. K., & “Sandy” Pentland, A. (2015). Unique in the shopping mall: On the reidentifiability of credit card metadata. Science, 347(6221), 536–539. https://doi.org/10.1126/science.1256297
Dean, J. (2010). Blog theory: Feedback and capture in the circuits of drive. Polity Press.
Dewey, J. (1938). Logic: The theory of inquiry. Holt, Richart and Winston.
D’Ignazio, C., & Klein, L. F. (2020). Data feminism. The MIT Press.
Dobrick, F. M., Fischer, J., & Hagen, L. M. (Eds.). (2018). Research ethics in the digital age. Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-12909-5
Engel, U. (2021). Causal and predictive modeling in computational social science. In U. Engel, A. Quan-Haase, S. X. Liu, & L. Lyberg (Eds.), Handbook of computational social science, volume 1 (1st ed., pp. 131–149). Routledge. https://doi.org/10.4324/9781003024583-10
Ess, C., & Jones, S. (2004). Ethical decision-making and Internet research: Recommendations from the aoir ethics working committee. In Readings in virtual research ethics: Issues and controversies (pp. 27–44). IGI Global.
Ess, C. M. (2020). Interpretative pros hen pluralism: From computer-mediated colonization to a pluralistic intercultural digital ethics. Philosophy & Technology, 33(4), 551–569. https://doi.org/10.1007/s13347-020-00412-9
Eynon, R., Fry, J., & Schroeder, R. (2017). The ethics of online research. In N. Fielding, R. Lee, & G. Blank (Eds.), The SAGE handbook of online research methods (pp. 19–37). SAGE Publications, Ltd. https://doi.org/10.4135/9781473957992.n2
Feenberg, A. (1999). Questioning technology. Routledge.
Feenberg, A. (2002). Transforming technology: A critical theory revisited. Oxford University Press.
Ferrara, E. (2015). Manipulation and abuse on social media. Cornell University Library, arXiv.org. https://doi.org/10.48550/ARXIV.1503.03752
Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96–104. https://doi.org/10.1145/2818717
Fisher, E., & Rip, A. (2013). Responsible innovation: Multi-level dynamics and soft intervention practices. In R. Owen, J. Bessant, & M. Heintz (Eds.), Responsible Innovation (pp. 165–183). Wiley. https://doi.org/10.1002/9781118551424.ch9
Fourcade, M., & Gordon, J. (2020). Learning like a state: Statecraft in the digital age. Journal of Law and Political Economy, 1(1). https://doi.org/10.5070/LP61150258
Franzke, A. S., Bechmann, A., Zimmer, M., Ess, C. M., & the Association of Internet Researchers. (2020). Internet research: Ethical guidelines 3.0. https://aoir.org/reports/ethics3.pdf
Friedline, T., Naraharisetti, S., & Weaver, A. (2020). Digital redlining: Poor rural communities’ access to fintech and implications for financial inclusion. Journal of Poverty, 24(5–6), 517–541. https://doi.org/10.1080/10875549.2019.1695162
Friemel, T. N. (2016). The digital divide has grown old: Determinants of a digital divide among seniors. New Media & Society, 18(2), 313–331. https://doi.org/10.1177/1461444814538648
Fuchs, C. (2018). ‘Dear Mr. Neo-Nazi, Can You Please Give Me Your Informed Consent So That I Can Quote Your Fascist Tweet?’: Questions of social media research ethics in online ideology critique. In G. Meikle (Ed.), The Routledge companion to media and activism. Routledge.
Fuchs, C. (2021). Social media: A critical introduction (3rd ed.). SAGE.
Garrido-Muñoz, I., Montejo-Ráez, A., Martínez-Santiago, F., & Ureña-López, L. A. (2021). A survey on bias in deep NLP. Applied Sciences, 11(7), 3184. https://doi.org/10.3390/app11073184
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé, H., III, & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Gifford, C. (2020, June 15). The problem with emotion-detection technology. The New Economy. https://www.theneweconomy.com/technology/the-problem-with-emotion-detection-technology
Giglietto, F., Rossi, L., & Bennato, D. (2012). The open laboratory: Limits and possibilities of using Facebook, twitter, and YouTube as a research data source. Journal of Technology in Human Services, 30(3–4), 145–159. https://doi.org/10.1080/15228835.2012.743797
Gilbert, G. N. (Ed.). (2010). Computational Social Science. SAGE.
Gillespie, T. (2014). The relevance of algorithms. In T. Gillespie, P. J. Boczkowski, & K. A. Foot (Eds.), Media technologies (pp. 167–194). The MIT Press. https://doi.org/10.7551/mitpress/9780262525374.003.0009
Goel, V. (2014, June 29). Facebook tinkers with users’ emotions in news feed experiment, stirring outcry. The New York Times. https://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html
Grady, C. (2015). Institutional review boards. Chest, 148(5), 1148–1155. https://doi.org/10.1378/chest.15-0706
Grimmelmann, J. (2015). The law and ethics of experiments on social media users. Colorado Technology Law Journal, 13, 219.
Gupta, P., Srinivasan, B., Balasubramaniyan, V., & Ahamad, M. (2015). Phoneypot: Data-driven understanding of telephony threats. In: Proceedings 2015 Network and Distributed System Security Symposium. Network and Distributed System Security Symposium, San Diego, CA. https://doi.org/10.14722/ndss.2015.23176
Gupta, U., Kim, Y. G., Lee, S., Tse, J., Lee, H.-H. S., Wei, G.-Y., Brooks, D., & Wu, C.-J. (2020). Chasing carbon: The elusive environmental footprint of computing. Cornell University Library, arXiv.org. https://doi.org/10.48550/ARXIV.2011.02839
Habermas, J. (1988). On the logic of the social sciences. MIT Press.
Haight, M., Quan-Haase, A., & Corbett, B. A. (2014). Revisiting the digital divide in Canada: The impact of demographic factors on access to the internet, level of online activity, and social networking site usage. Information, Communication & Society, 17(4), 503–519. https://doi.org/10.1080/1369118X.2014.891633
Halbertal, M. (2015, November 11). The Dewey lecture: Three concepts of human dignity. https://www.law.uchicago.edu/news/dewey-lecture-three-concepts-human-dignity
Haraway, D. (1988). Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Studies, 14(3), 575. https://doi.org/10.2307/3178066
Harding, S. (1992). Rethinking standpoint epistemology: What is ‘strong objectivity?’. The Centennial Review, 36(3), 437–470.
Harding, S. (1995). ‘Strong objectivity’: A response to the new objectivity question. Synthese, 104(3), 331–349. https://doi.org/10.1007/BF01064504
Harding, S. G. (2008). Sciences from below: Feminisms, postcolonialities, and modernities. Duke University Press.
Harding, S. G. (2015). Objectivity and diversity: Another logic of scientific research. The University of Chicago Press.
Hargittai, E. (2015). Is bigger always better? Potential biases of big data derived from social network sites. The Annals of the American Academy of Political and Social Science, 659(1), 63–76. https://doi.org/10.1177/0002716215570866
Hargittai, E. (2020). Potential biases in big data: Omitted voices on social media. Social Science Computer Review, 38(1), 10–24.
Hargittai, E., & Litt, E. (2012). Becoming a tweep: How prior online experiences influence Twitter use. Information, Communication & Society, 15(5), 680–702. https://doi.org/10.1080/1369118X.2012.666256
Harsin, J. (2018). Post-truth and critical communication studies. In J. Harsin (Ed.), Oxford research Encyclopedia of communication. Oxford University Press. https://doi.org/10.1093/acrefore/9780190228613.013.757
Healy, K. (2015). The performativity of networks. European Journal of Sociology, 56(2), 175–205. https://doi.org/10.1017/S0003975615000107
Hecht, B., Wilcox, L., Bigham, J. P., Schöning, J., Hoque, E., Ernst, J., Bisk, Y., De Russis, L., Yarosh, L., Anjum, B., Contractor, D., & Wu, C. (2021). It’s time to do something: Mitigating the negative impacts of computing through a change to the peer review process. ArXiv:2112.09544 [Cs]. http://arxiv.org/abs/2112.09544
Helbing, D., Frey, B. S., Gigerenzer, G., Hafen, E., Hagner, M., Hofstetter, Y., van den Hoven, J., Zicari, R. V., & Zwitter, A. (2019). Will democracy survive big data and artificial intelligence? In D. Helbing (Ed.), Towards digital enlightenment (pp. 73–98). Springer International Publishing. https://doi.org/10.1007/978-3-319-90869-4_7
Henderson, M., Johnson, N. F., & Auld, G. (2013). Silences of ethical practice: Dilemmas for researchers using social media. Educational Research and Evaluation, 19(6), 546–560. https://doi.org/10.1080/13803611.2013.805656
Hern, A. (2021, September 8). Study finds growing government use of sensitive data to ‘nudge’ behaviour. The Guardian. https://www.theguardian.com/technology/2021/sep/08/study-finds-growing-government-use-of-sensitive-data-to-nudge-behaviour
Hindman, M. (2015). Building better models: Prediction, replication, and machine learning in the social sciences. The Annals of the American Academy of Political and Social Science, 659(1), 48–62. https://doi.org/10.1177/0002716215570279
Hoegen, R., Gratch, J., Parkinson, B., & Shore, D. (2019). Signals of emotion regulation in a social dilemma: Detection from face and context. In: 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII) (pp. 1–7). https://doi.org/10.1109/ACII.2019.8925478
Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., Margetts, H., Mullainathan, S., Salganik, M. J., Vazire, S., Vespignani, A., & Yarkoni, T. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181–188. https://doi.org/10.1038/s41586-021-03659-0
Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. ArXiv:1805.03677 [Cs]. http://arxiv.org/abs/1805.03677
Hollingshead, W., Quan-Haase, A., & Chen, W. (2021). Ethics and privacy in computational social science: A call for pedagogy. In U. Engel, A. Quan-Haase, S. X. Liu, & L. Lyberg (Eds.), Handbook of computational social science, volume 1 (pp. 171–185). Routledge.
Hongladarom, S., & Ess, C. (2007). Information technology ethics: Cultural perspectives. IGI Global. https://doi.org/10.4018/978-1-59904-310-4
Iyyer, M., Enns, P., Boyd-Graber, J., & Resnik, P. (2014). Political ideology detection using recursive neural networks. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1113–1122). https://doi.org/10.3115/v1/P14-1105
James, G., Witten, D., Hastie, T., & Tibshirani, R. (Eds.). (2013). An introduction to statistical learning: With applications in R. Springer.
Jiang, M. (2013). Internet sovereignty: A new paradigm of internet governance. In M. Haerens & M. Zott (Eds.), Internet censorship (opposing viewpoints series) (pp. 23–28). Greenhaven Press.
John, N. A. (2013). Sharing and Web 2.0: The emergence of a keyword. New Media & Society, 15(2), 167–182. https://doi.org/10.1177/1461444812450684
Johnson, I., McMahon, C., Schöning, J., & Hecht, B. (2017). The effect of population and ‘structural’ biases on social media-based algorithms: A case study in geolocation inference across the urban-rural spectrum. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1167–1178). https://doi.org/10.1145/3025453.3026015
Johnson, K. (2020, February 24). NeurIPS requires AI researchers to account for societal impact and financial conflicts of interest. Venturebeat. https://venturebeat.com/ai/neurips-requires-ai-researchers-to-account-for-societal-impact-and-financial-conflicts-of-interest/
Joinson, A. N., Woodley, A., & Reips, U.-D. (2007). Personalization, authentication and self-disclosure in self-administered internet surveys. Computers in Human Behavior, 23(1), 275–285. https://doi.org/10.1016/j.chb.2004.10.012
Kellogg, K. C., Valentine, M. A., & Christin, A. (2020). Algorithms at work: The new contested terrain of control. Academy of Management Annals, 14(1), 366–410. https://doi.org/10.5465/annals.2018.0174
Kezar, A. (2002). Reconstructing static images of leadership: An application of positionality theory. Journal of Leadership Studies, 8(3), 94–109. https://doi.org/10.1177/107179190200800308
Kitchin, R. (2014). Big data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 205395171452848. https://doi.org/10.1177/2053951714528481
Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Report of Board of Scientific Affairs’ advisory group on the conduct of research on the internet. American Psychologist, 59(2), 105–117. https://doi.org/10.1037/0003-066X.59.2.105
Lannelongue, L., Grealey, J., & Inouye, M. (2021). Green algorithms: Quantifying the carbon footprint of computation. Advanced Science, 8(12), 2100707. https://doi.org/10.1002/advs.202100707
Lawrence, H. M. (2021). Siri Disciplines. In T. S. Mullaney, B. Peters, M. Hicks, & K. Philip (Eds.), Your computer is on fire (pp. 179–198). The MIT Press. https://doi.org/10.7551/mitpress/10993.003.0013
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
Lazer, D. M. J., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-Bailon, S., King, G., Margetts, H., Nelson, A., Salganik, M. J., Strohmaier, M., Vespignani, A., & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060–1062. https://doi.org/10.1126/science.aaz8170
Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43(1), 19–39. https://doi.org/10.1146/annurev-soc-060116-053457
Leins, K., Lau, J. H., & Baldwin, T. (2020). Give me convenience and give her death: Who should decide what uses of NLP are appropriate, and on what basis? ArXiv:2005.13213 [Cs]. http://arxiv.org/abs/2005.13213
Leonelli, S. (2013). Why the current insistence on open access to scientific data? Big data, knowledge production, and the political economy of contemporary biology. Bulletin of Science, Technology & Society, 33(1–2), 6–11. https://doi.org/10.1177/0270467613496768
Leonelli, S. (2021). Data science in times of pan(dem)ic. Harvard Data Science Review. https://doi.org/10.1162/99608f92.fbb1bdd6
Leslie, D. (2019). Understanding artificial intelligence ethics and safety. ArXiv:1906.05684 [Cs, Stat]. https://doi.org/10.5281/zenodo.3240529
Leslie, D. (2020). Tackling COVID-19 through responsible AI innovation: Five steps in the right direction. Harvard Data Science Review. https://doi.org/10.1162/99608f92.4bb9d7a7
Leslie, D., Burr, C., Aitken, M., Katell, M., Briggs, M., & Rincón, C. (2021). Human rights, democracy, and the rule of law assurance framework: A proposal. The Alan Turing Institute. https://doi.org/10.5281/zenodo.5981676
Leslie, D., Katell, M., Aitken, M., Singh, J., Briggs, M., Powell, R., Rincón, C., Chengeta, T., Birhane, A., Perini, A., Jayadeva, S., & Mazumder, A. (2022a). Advancing data justice research and practice: An integrated literature review. Zenodo. https://doi.org/10.5281/ZENODO.6408304
Leslie, D., Katell, M., Aitken, M., Singh, J., Briggs, M., Powell, R., Rincón, C., Perini, A., Jayadeva, S., & Burr, C. (2022c). Data justice in practice: A guide for developers. Zenodo. https://doi.org/10.5281/ZENODO.6428185
Leslie, D., Rincón, C., Burr, C., Aitken, M., Katell, M., & Briggs, M. (2022b). AI fairness in practice. The Alan Turing Institute and the UK Office for AI.
Leslie, D., Rincón, C., Burr, C., Aitken, M., Katell, M., & Briggs, M. (2022d). AI sustainability in practice: Part I. The Alan Turing Institute and the UK Office for AI.
Leslie, D., Rincón, C., Burr, C., Aitken, M., Katell, M., & Briggs, M. (2022e). AI sustainability in practice: Part II. The Alan Turing Institute and the UK Office for AI.
Lin, J. (2015). On building better mousetraps and understanding the human condition: Reflections on big data in the social sciences. The Annals of the American Academy of Political and Social Science, 659(1), 33–47. https://doi.org/10.1177/0002716215569174
Lin, L. Y. I., Sidani, J. E., Shensa, A., Radovic, A., Miller, E., Colditz, J. B., Hoffman, B. L., Giles, L. M., & Primack, B. A. (2016). Association between social media use and depression among U.S. young adults. Depression and Anxiety, 33(4), 323–331. https://doi.org/10.1002/da.22466
Lomborg, S. (2013). Personal internet archives and ethics. Research Ethics, 9(1), 20–31. https://doi.org/10.1177/1747016112459450
Longo, J., Kuras, E., Smith, H., Hondula, D. M., & Johnston, E. (2017). Technology use, exposure to natural hazards, and being digitally invisible: Implications for policy analytics: Policy implications of the digitally invisible. Policy & Internet, 9(1), 76–108. https://doi.org/10.1002/poi3.144
Lorenz, T. (2014, March 7). Plugin allows you to recreate Facebook’s controversial mood-altering experiment on YOUR News Feed. The Daily Mail. https://www.dailymail.co.uk/sciencetech/article-2678561/Facebook-mood-altering-experiment-News-Feed.html
Lucy, L., & Bamman, D. (2021). Gender and representation bias in GPT-3 generated stories. Proceedings of the Third Workshop on Narrative Understanding (pp. 48–55). https://doi.org/10.18653/v1/2021.nuse-1.5
Lyon, D. (Ed.). (2003). Surveillance as social sorting: Privacy, risk, and digital discrimination. Routledge.
Mahmoodi, J., Leckelt, M., van Zalk, M., Geukes, K., & Back, M. (2017). Big data approaches in social and behavioral science: Four key trade-offs and a call for integration. Current Opinion in Behavioral Sciences, 18, 57–62. https://doi.org/10.1016/j.cobeha.2017.07.001
Manovich, L. (2011). Trending: The promises and the challenges of big social data. Debates in the Digital Humanities, 2(1), 460–475.
Markham, A. (2006). Ethic as method, method as ethic: A case for reflexivity in qualitative ICT research. Journal of Information Ethics, 15(2), 37–54. https://doi.org/10.3172/JIE.15.2.37
Markham, A., & Buchanan, E. (2012). Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0). Association of Internet Researchers. https://aoir.org/reports/ethics2.pdf
Marx, G. T. (1988). Undercover: Police surveillance in America. University of California Press.
McIntyre, L. C. (2018). Post-truth. MIT Press.
Meho, L. I. (2006). E-mail interviewing in qualitative research: A methodological discussion. Journal of the American Society for Information Science and Technology, 57(10), 1284–1295. https://doi.org/10.1002/asi.20416
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35. https://doi.org/10.1145/3457607
Méndez-Diaz, N., Akabr, G., & Parker-Barnes, L. (2022). The evolution of social media and the impact on modern therapeutic relationships. The Family Journal, 30(1), 59–66. https://doi.org/10.1177/10664807211052495
Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2). https://doi.org/10.1214/18-AOAS1161SF
Merriam, S. B., Johnson-Bailey, J., Lee, M.-Y., Kee, Y., Ntseane, G., & Muhamad, M. (2001). Power and positionality: Negotiating insider/outsider status within and across cultures. International Journal of Lifelong Education, 20(5), 405–416. https://doi.org/10.1080/02601370120490
Merson, L., Phong, T. V., Nhan, L. N. T., Dung, N. T., Ngan, T. T. D., Kinh, N. V., Parker, M., & Bull, S. (2015). Trust, respect, and reciprocity: Informing culturally appropriate data-sharing practice in Vietnam. Journal of Empirical Research on Human Research Ethics, 10(3), 251–263. https://doi.org/10.1177/1556264615592387
Metcalf, J., & Crawford, K. (2016). Where are human subjects in big data research? The emerging ethics divide. Big Data & Society, 3(1), 205395171665021. https://doi.org/10.1177/2053951716650211
Meyer, R. (2014, June 28). Everything we know about Facebook’s secret mood manipulation experiment. The Atlantic. https://www.theatlantic.com/technology/archive/2014/06/everything-we-know-about-facebooks-secret-mood-manipulation-experiment/373648/
Mislove, A., Lehmann, S., Ahn, Y.-Y., Onnela, J.-P., & Rosenquist, J. (2011). Understanding the demographics of Twitter users. Proceedings of the International AAAI Conference on Web and Social Media, 5(1), 554–557.
Mitchell, R. K., Lee, J. H., & Agle, B. R. (2017). Stakeholder prioritization work: The role of stakeholder salience in stakeholder research. In D. M. Wasieleski & J. Weber (Eds.), Business and society 360 (Vol. 1, pp. 123–157). Emerald Publishing Limited. https://doi.org/10.1108/S2514-175920170000006
Moons, K. G. M., Altman, D. G., Reitsma, J. B., Ioannidis, J. P. A., Macaskill, P., Steyerberg, E. W., Vickers, A. J., Ransohoff, D. F., & Collins, G. S. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Annals of Internal Medicine, 162(1), W1–W73. https://doi.org/10.7326/M14-0698
Moore, P. V. (2019). E(a)ffective precarity, control and resistance in the digitalised workplace. In D. Chandler & C. Fuchs (Eds.), Digital objects, digital subjects (pp. 125–144). University of Westminster Press. http://www.jstor.org/stable/j.ctvckq9qb.12
Moreno, M. A., Goniu, N., Moreno, P. S., & Diekema, D. (2013). Ethics of social media research: Common concerns and practical considerations. Cyberpsychology, Behavior and Social Networking, 16(9), 708–713. https://doi.org/10.1089/cyber.2012.0334
Muller, B. J. (2019). Biometric borders. In Handbook on Critical Geographies of Migration. Edward Elgar Publishing.
Nadeem, M., Bethke, A., & Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. ArXiv:2004.09456 [Cs]. http://arxiv.org/abs/2004.09456
Najafian, M., Hsu, W.-N., Ali, A., & Glass, J. (2017). Automatic speech recognition of Arabic multi-genre broadcast media. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 353–359). https://doi.org/10.1109/ASRU.2017.8268957
Nanayakkara, P., Hullman, J., & Diakopoulos, N. (2021). Unpacking the expressed consequences of AI research in broader impact statements. AI, Ethics, and Society. https://doi.org/10.48550/ARXIV.2105.04760
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing Social Networks. In: 2009 30th IEEE Symposium on Security and Privacy (pp. 173–187). https://doi.org/10.1109/SP.2009.22
National Committee for Research Ethics in the Social Sciences and the Humanities (NESH). (2019). A guide to internet research ethics. NESH.
Neural Information Processing Systems. (2021). NeurIPS 2021 paper checklist guidelines. https://neurips.cc/Conferences/2021/PaperInformation/PaperChecklist
Neural Information Processing Systems. (2022). NeurIPS 2022 ethical review guidelines. https://nips.cc/public/EthicsGuidelines
Neural Information Processing Systems Conference. (2020). Getting started with NeurIPS 2020. https://neuripsconf.medium.com/getting-started-with-neurips-2020-e350f9b39c28
Nissenbaum, H. (1998). Protecting privacy in an information age: The problem of privacy in public. Law and Philosophy, 17(5), 559–596. https://doi.org/10.1023/A:1006184504201
Nissenbaum, H. (2011). A contextual approach to privacy online. Daedalus, 140(4), 32–48. https://doi.org/10.1162/DAED_a_00113
Nixon, R. (2011). Slow violence and the environmentalism of the poor. Harvard University Press.
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
Nozza, D., Bianchi, F., & Hovy, D. (2021). HONEST: Measuring hurtful sentence completion in language models. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 2398–2406). https://doi.org/10.18653/v1/2021.naacl-main.191
Obole, A., & Welsh, K. (2012). The danger of big data: Social media as computational social science. First Monday, 17(7). https://doi.org/10.5210/fm.v17i7.3993
Olson, R. (2008). Science and scientism in nineteenth-century Europe. University of Illinois Press.
Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social data: Biases, methodological pitfalls, and ethical boundaries. Frontiers in Big Data, 2, 13.
O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy (1st ed.). Crown.
Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. ArXiv:1107.4557 [Cs]. http://arxiv.org/abs/1107.4557
Ottoni, R., Pesce, J. P., Las Casas, D., Franciscani, G., Jr., Meira, W., Jr., Kumaraguru, P., & Almeida, V. (2013). Ladies first: Analyzing gender roles and behaviors in Pinterest. Proceedings of the International AAAI Conference on Web and Social Media, 7(1), 457–465.
Owen, R. (2014). The UK Engineering and Physical Sciences Research Council’s commitment to a framework for responsible innovation. Journal of Responsible Innovation, 1(1), 113–117. https://doi.org/10.1080/23299460.2014.882065
Owen, R., Macnaghten, P., & Stilgoe, J. (2012). Responsible research and innovation: From science in society to science for society, with society. Science and Public Policy, 39(6), 751–760. https://doi.org/10.1093/scipol/scs093
Owen, R., Stilgoe, J., Macnaghten, P., Gorman, M., Fisher, E., & Guston, D. (2013). A framework for responsible innovation. Responsible Innovation: Managing the Responsible Emergence of Science and Innovation in Society, 31, 27–50.
Packer, B., Halpern, Y., Guajardo-Céspedes, M., & Mitchell, M. (2018, April 13). Text embedding models contain bias. Here’s why that matters. Google AI. https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html
Paganoni, M. C. (2019). Ethical concerns over facial recognition technology. Anglistica AION, 23(1), 85–94. https://doi.org/10.19231/angl-aion.201915
Pasquale, F. (2020). New laws of robotics: Defending human expertise in the age of AI. The Belknap Press of Harvard University Press.
Pasquale, F., & Cashwell, G. (2018). Prediction, persuasion, and the jurisprudence of behaviourism. University of Toronto Law Journal, 68(supplement 1), 63–81. https://doi.org/10.3138/utlj.2017-0056
Pentland, A. (2015). Social physics: How social networks can make us smarter. Penguin Press.
Peterka-Bonetta, J., Sindermann, C., Elhai, J. D., & Montag, C. (2019). Personality associations with smartphone and internet use disorder: A comparison study including links to impulsivity and social anxiety. Frontiers in Public Health, 7, 127. https://doi.org/10.3389/fpubh.2019.00127
Preoţiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political ideology prediction of Twitter users. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers, pp. 729–740). https://doi.org/10.18653/v1/P17-1068
Prunkl, C. E. A., Ashurst, C., Anderljung, M., Webb, H., Leike, J., & Dafoe, A. (2021). Institutionalizing ethics in AI through broader impact requirements. Nature Machine Intelligence, 3(2), 104–110. https://doi.org/10.1038/s42256-021-00298-y
Puschmann, C., & Bozdag, E. (2014). Staking out the unclear ethical terrain of online social experiments. Internet Policy Review, 3(4). https://doi.org/10.14763/2014.4.338
Quan-Haase, A., & Ho, D. (2020). Online privacy concerns and privacy protection strategies among older adults in East York, Canada. Journal of the Association for Information Science and Technology, 71(9), 1089–1102. https://doi.org/10.1002/asi.24364
Quan-Haase, A., Williams, C., Kicevski, M., Elueze, I., & Wellman, B. (2018). Dividing the Grey divide: Deconstructing myths about older adults’ online activities, skills, and attitudes. American Behavioral Scientist, 62(9), 1207–1228. https://doi.org/10.1177/0002764218777572
Raymond, N. (2019). Safeguards for human studies can’t cope with big data. Nature, 568(7752), 277–277. https://doi.org/10.1038/d41586-019-01164-z
Reed, M. S., Graves, A., Dandy, N., Posthumus, H., Hubacek, K., Morris, J., Prell, C., Quinn, C. H., & Stringer, L. C. (2009). Who’s in and why? A typology of stakeholder analysis methods for natural resource management. Journal of Environmental Management, 90(5), 1933–1949. https://doi.org/10.1016/j.jenvman.2009.01.001
Reidenberg, J. R. (2014). Privacy in public. University of Miami Law Review, 69, 141.
Resnik, D. B. (2018). The ethics of research with human subjects: Protecting people, advancing science, promoting trust (1st ed.). Springer. https://doi.org/10.1007/978-3-319-68756-8
Roberge, J., Morin, K., & Senneville, M. (2019). Deep Learning’s governmentality: The other black box. In A. Sudmann (Ed.), The democratization of artificial intelligence (pp. 123–142). transcript Verlag. https://doi.org/10.1515/9783839447192-008
Ruths, D., & Pfeffer, J. (2014). Social media for large studies of behavior. Science, 346(6213), 1063–1064. https://doi.org/10.1126/science.346.6213.1063
Sagarra, O., Gutiérrez-Roig, M., Bonhoure, I., & Perelló, J. (2016). Citizen science practices for computational social science research: The conceptualization of pop-up experiments. Frontiers in Physics, 3. https://doi.org/10.3389/fphy.2015.00093
Salganik, M. J. (2019). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M., & The Summer Institutes in Computational Social Science. (n.d.). Ethics and Computational Social Science. https://sicss.io/overview/ethics-part-1
Sánchez-Monedero, J., Dencik, L., & Edwards, L. (2020). What does it mean to ‘solve’ the problem of discrimination in hiring?: Social, technical and legal perspectives from the UK on automated hiring systems. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 458–468). https://doi.org/10.1145/3351095.3372849
Schmeer, K. (1999). Stakeholder analysis guidelines. Policy Toolkit for Strengthening Health Sector Reform, 1, 1–35.
Schroeder, R. (2014). Big data and the brave new world of social media research. Big Data & Society, 1(2), 205395171456319. https://doi.org/10.1177/2053951714563194
Schultheis, H. (2021). Computational cognitive modeling in the social sciences. In U. Engel, A. Quan-Haase, S. X. Liu, & L. Lyberg (Eds.), Handbook of computational social science, volume 1 (pp. 53–65). Routledge.
Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63. https://doi.org/10.1145/3381831
Scott, H., & Woods, H. C. (2018). Fear of missing out and sleep: Cognitive behavioural factors in adolescents’ nighttime social media use. Journal of Adolescence, 68(1), 61–65. https://doi.org/10.1016/j.adolescence.2018.07.009
Selinger, E., & Hartzog, W. (2020). The inconsentability of facial surveillance. Loyola Law Review, 66, 33.
Shah, D. V., Cappella, J. N., & Neuman, W. R. (2015). Big data, digital media, and computational social science: Possibilities and perils. The Annals of the American Academy of Political and Social Science, 659(1), 6–13. https://doi.org/10.1177/0002716215572084
Shaw, R. (2015). Big data and reality. Big Data & Society, 2(2), 205395171560887. https://doi.org/10.1177/2053951715608877
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3). https://doi.org/10.1214/10-STS330
Shrum, W. (2005). Reagency of the internet, or, how I became a guest for science. Social Studies of Science, 35(5), 723–754. https://doi.org/10.1177/0306312705052106
Simon, H. A. (2002). Science seeks parsimony, not simplicity: Searching for pattern in phenomena. In A. Zellner, H. A. Keuzenkamp, & M. McAleer (Eds.), Simplicity, inference and modelling: Keeping it sophisticatedly simple (pp. 32–72). Cambridge University Press. https://doi.org/10.1017/CBO9780511493164.003
Sloane, M., Moss, E., & Chowdhury, R. (2022). A Silicon Valley love triangle: Hiring algorithms, pseudo-science, and the quest for auditability. Patterns, 3(2), 100425. https://doi.org/10.1016/j.patter.2021.100425
Sorell, T. (2013). Scientism: Philosophy and the infatuation with science. Routledge.
Spaulding, N. W. (2020). Is human judgment necessary?: Artificial intelligence, algorithmic governance, and the law. In M. D. Dubber, F. Pasquale, & S. Das (Eds.), The Oxford handbook of ethics of AI (pp. 374–402). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.25
Stark, L., & Hutson, J. (2021). Physiognomic artificial intelligence. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3927300
Steinmann, M., Shuster, J., Collmann, J., Matei, S. A., Tractenberg, R. E., FitzGerald, K., Morgan, G. J., & Richardson, D. (2015). Embedding privacy and ethical values in big data technology. In S. A. Matei, M. G. Russell, & E. Bertino (Eds.), Transparency in social media (pp. 277–301). Springer International Publishing. https://doi.org/10.1007/978-3-319-18552-1_15
Stier, S., Breuer, J., Siegers, P., & Thorson, K. (2020). Integrating survey data and digital trace data: Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503–516. https://doi.org/10.1177/0894439319843669
Stilgoe, J., Watson, M., & Kuo, K. (2013). Public engagement with biotechnologies offers lessons for the governance of geoengineering research and beyond. PLoS Biology, 11(11), e1001707. https://doi.org/10.1371/journal.pbio.1001707
Striphas, T. (2015). Algorithmic culture. European Journal of Cultural Studies, 18(4–5), 395–412. https://doi.org/10.1177/1367549415577392
Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv.org. https://doi.org/10.48550/ARXIV.1906.02243
Suresh, H., & Guttag, J. V. (2021). A framework for understanding sources of harm throughout the machine learning life cycle. Equity and Access in Algorithms, Mechanisms, and Optimization, 1–9. https://doi.org/10.1145/3465416.3483305
Sweeney, C., & Najafian, M. (2019). A transparent framework for evaluating unintended demographic bias in word embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1662–1667). https://doi.org/10.18653/v1/P19-1162
Sweeney, L. (2013). Discrimination in online ad delivery. Communications of the ACM, 56(5), 44–54. https://doi.org/10.1145/2447976.2447990
Syvertsen, T. (2020). Digital detox: The politics of disconnecting. Emerald Publishing.
Syvertsen, T., & Enli, G. (2020). Digital detox: Media resistance and the promise of authenticity. Convergence: The International Journal of Research into New Media Technologies, 26(5–6), 1269–1283. https://doi.org/10.1177/1354856519847325
Tauginienė, L., Butkevičienė, E., Vohland, K., Heinisch, B., Daskolia, M., Suškevičs, M., Portela, M., Balázs, B., & Prūse, B. (2020). Citizen science in the social sciences and humanities: The power of interdisciplinarity. Palgrave Communications, 6(1), 89. https://doi.org/10.1057/s41599-020-0471-y
Taylor, C. (2021). The explanation of behaviour. Routledge.
Theocharis, Y., & Jungherr, A. (2021). Computational social science and the study of political communication. Political Communication, 38(1–2), 1–22. https://doi.org/10.1080/10584609.2020.1833121
Törnberg, P., & Uitermark, J. (2021). For a heterodox computational social science. Big Data & Society, 8(2), 205395172110477. https://doi.org/10.1177/20539517211047725
Tritter, J. Q., & McCallum, A. (2006). The snakes and ladders of user involvement: Moving beyond Arnstein. Health Policy, 76(2), 156–168. https://doi.org/10.1016/j.healthpol.2005.05.008
Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other methodological pitfalls. Cornell University Library, arXiv.org. https://doi.org/10.48550/ARXIV.1403.7400
Vaidhyanathan, S. (2018). Antisocial media: How Facebook disconnects us and undermines democracy. Oxford University Press.
van Dijck, J. (2013). The culture of connectivity: A critical history of social media. Oxford University Press.
van Dijck, J., Poell, T., & de Waal, M. (2018). The platform society. Oxford University Press.
Van Otterlo, M. (2014). Automated experimentation in Walden 3.0: The next step in profiling, predicting, control and surveillance. Surveillance & Society, 12(2), 255–272. https://doi.org/10.24908/ss.v12i2.4600
Varnhagen, C. K., Gushta, M., Daniels, J., Peters, T. C., Parmar, N., Law, D., Hirsch, R., Sadler Takach, B., & Johnson, T. (2005). How informed is online informed consent? Ethics & Behavior, 15(1), 37–48. https://doi.org/10.1207/s15327019eb1501_3
Varvasovszky, Z., & Brugha, R. (2000). A stakeholder analysis. Health Policy and Planning, 15(3), 338–345. https://doi.org/10.1093/heapol/15.3.338
Viner, R. M., Gireesh, A., Stiglic, N., Hudson, L. D., Goddings, A.-L., Ward, J. L., & Nicholls, D. E. (2019). Roles of cyberbullying, sleep, and physical activity in mediating the effects of social media use on mental health and wellbeing among young people in England: A secondary analysis of longitudinal data. The Lancet Child & Adolescent Health, 3(10), 685–696. https://doi.org/10.1016/S2352-4642(19)30186-5
von Schomberg, R. (2013). A vision of responsible research and innovation. In R. Owen, J. Bessant, & M. Heintz (Eds.), Responsible Innovation (pp. 51–74). Wiley. https://doi.org/10.1002/9781118551424.ch3
von Wright, G. H. (2004). Explanation and understanding. Cornell University Press.
Wagner, C., Strohmaier, M., Olteanu, A., Kıcıman, E., Contractor, N., & Eliassi-Rad, T. (2021). Measuring algorithmically infused societies. Nature, 595(7866), 197–204. https://doi.org/10.1038/s41586-021-03666-1
Wallach, H. (2018). Computational social science ≠ computer science + social data. Communications of the ACM, 61(3), 42–44. https://doi.org/10.1145/3132698
Wang, G. A., Chen, H., Xu, J. J., & Atabakhsh, H. (2006). Automatically detecting criminal identity deception: An adaptive detection algorithm. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 36(5), 988–999. https://doi.org/10.1109/TSMCA.2006.871799
Weber, M. (1978). Economy and society: An outline of interpretive sociology (Vol. 2). University of California Press.
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from Language Models. ArXiv:2112.04359 [Cs]. http://arxiv.org/abs/2112.04359
Weinhardt, M. (2020). Ethical issues in the use of big data for social research. Historical Social Research, 45(3), 342–368. https://doi.org/10.12759/HSR.45.2020.3.342-368
Wittgenstein, L. (2009). Philosophical investigations (P. M. S. Hacker & J. Schulte, Eds.; G. E. M. Anscombe, P. M. S. Hacker, & J. Schulte, Trans.; Rev. 4th ed). Wiley-Blackwell.
Woods, H. C., & Scott, H. (2016). #Sleepyteens: Social media use in adolescence is associated with poor sleep quality, anxiety, depression and low self-esteem. Journal of Adolescence, 51(1), 41–49. https://doi.org/10.1016/j.adolescence.2016.05.008
Woolley, S. C. (2016). Automating power: Social bot interference in global politics. First Monday. https://doi.org/10.5210/fm.v21i4.6161
Woolley, S., & Howard, P. N. (Eds.). (2018). Computational propaganda: Political parties, politicians, and political manipulation on social media. Oxford University Press.
World Health Organization. (2022). Report of the WHO global technical consultation on public health and social measures during health emergencies: Online meeting, 31 August to 2 September 2021. World Health Organization. https://apps.who.int/iris/handle/10665/352096
Wright, J., Leslie, D., Raab, C., Ostmann, F., Briggs, M., & Kitagawa, F. (2021). Privacy, agency and trust in human-AI ecosystems: Interim report (short version). The Alan Turing Institute. https://www.turing.ac.uk/research/publications/privacy-agency-and-trust-human-ai-ecosystems-interim-report-short-version
Wu, T. (2019). Blind spot: The attention economy and the Law. Antitrust Law Journal, 82(3), 771–806.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Yeung, K. (2017). ‘Hypernudge’: Big data as a mode of regulation by design. Information, Communication & Society, 20(1), 118–136. https://doi.org/10.1080/1369118X.2016.1186713
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv.org. https://doi.org/10.48550/ARXIV.1707.09457
Zheng, R., Li, J., Chen, H., & Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3), 378–393. https://doi.org/10.1002/asi.20316
Ziewitz, M. (2016). Governing algorithms: Myth, mess, and methods. Science, Technology, & Human Values, 41(1), 3–16. https://doi.org/10.1177/0162243915608948
Zimmer, M. (2016, May 14). OkCupid study reveals the perils of big-data science. Wired Magazine. https://www.wired.com/2016/05/okcupid-study-reveals-perils-big-data-science/
Zuboff, S. (2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power (1st ed.). Public Affairs.
Zuckerman, E. (2020). The case for digital public infrastructure. Springer. https://doi.org/10.7916/D8-CHXD-JW34
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
Leslie, D. (2023). The Ethics of Computational Social Science. In: Bertoni, E., Fontana, M., Gabrielli, L., Signorelli, S., Vespe, M. (eds) Handbook of Computational Social Science for Policy. Springer, Cham. https://doi.org/10.1007/978-3-031-16624-2_4
DOI: https://doi.org/10.1007/978-3-031-16624-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16623-5
Online ISBN: 978-3-031-16624-2