
1 Introduction

Data has become a core asset, as well as a ‘management fashion’ [1], of our time. It brings about unprecedented opportunities for data-driven decision making and innovation in various spheres of public life. This concerns data held by governments, as well as companies, academic institutions, non-profits, and citizens. Often, collaboration is needed because data from different parties have to be combined to realize opportunities. The notion of Data Collaboratives captures this collaboration imperative and stands for “cross-sector (and public-private) collaboration initiatives aimed at data collection, sharing, or processing for the purpose of addressing a societal challenge” [31]. In the past five years the concept of data collaboratives, initially coined by The Gov Lab [34], has gained much interest (e.g. World Bank’s Development Data Partnership, EU Commission’s Expert Group on B2G Data Sharing).

There are both more and less successful examples of data collaboratives. Data collaboratives face several rather specific challenges, such as balancing data control and ownership, governance challenges, legal constraints, privacy and ethics issues, competitive risks, and technical challenges, to name a few [7, 16, 24, 32]. Different types of data intermediaries have emerged that view these challenges as opportunities. There is, however, little knowledge of how these data intermediaries operate and what business models they employ. Providing more clarity on this can create a better understanding of the roles that government can play in such data collaboratives. Therefore, our research question is: What business models are employed by intermediaries to create value in data collaboratives?

The paper is structured as follows: in Sect. 2 we review relevant literature on intermediation in data sharing and collaboration processes; in Sect. 3 we present our analytical framework for intermediary business model analysis; in Sect. 4 we outline our multiple case study method; in Sect. 5 we present a comparative analysis of the cases; in Sect. 6 we formulate generic business models from the case data; and in Sect. 7 we summarize and reflect on our findings.

2 Intermediaries in Data Collaboratives: Conceptualization

Data intermediaries have characteristics that resemble the well-known notions of infomediary, partnership broker, and innovation intermediary. Intermediaries that facilitate data sharing (also known as ‘infomediaries’) are well described in the open data literature. There are, however, different conceptualizations of the functions of these intermediaries. Open data intermediaries can develop products and services based on open data for citizens, government, or third parties [20] and can present complex datasets in a user-friendly way [22]. Meijer and Potjer [21] broaden the concept of open data intermediary to include actors facilitating the generation of data by citizens.

Inter-organizational collaboration literature puts forward the concept of ‘partnership brokers’ who have specific experience and capacity to facilitate negotiation and development of partnership arrangements [29]. Partnership brokers can create value by acting as matchmakers, connectors, facilitators, co-designers, conveners, mediators, or learning catalysts [19, 29].

Another relevant concept is that of an ‘innovation intermediary’. Overall, innovation intermediaries create value by (1) connecting actors; (2) involving, committing, and mobilising actors; (3) solving, avoiding, or mitigating potential conflicts of interests; and (4) (actively) stimulating the innovation process and innovation outcomes [2]. Holzmann, Sailer and Katzy [14] discuss the role of innovation intermediaries in multi-sided markets, whereby innovation intermediaries are tasked with the matching process between demand and supply.

More recently, the term ‘trusted data intermediary’ (TDI) entered into circulation in the practitioner community [8, 9, 31]. Stanford researchers, who recognized this as an emerging term, formulated a research agenda around it [8, 9]. They defined TDIs as entities that have “a commitment to collect, aggregate, and make available large sets of digitized data for public purpose” [9]. The key distinguishing feature of TDIs is the suite of negotiations, ranging from data ownership, storage, access, analysis and security to ensuring privacy, regulations and legal conditions, and standards and practices (Ibid.). The intermediaries conduct these negotiations for different kinds of data, across sectors, and in various organizational forms (Ibid.).

These intermediation concepts provide the necessary depth for conceptualizing and grounding the notion of intermediaries of data collaboratives in existing academic research. Previous research [30] found that, compared to collaborations in general, data collaboratives face very specific collaboration challenges: the decisive role of trust between parties, data stewardship, data-related risk mitigation, and formulating a value proposition for both sides. Intermediaries of data collaboratives, therefore, have the potential to create value by addressing these challenges. Our study aims to shed light on how they achieve this by analyzing emerging business models of select intermediaries in this context.

3 Framework for Business Model Analysis

The notion of ‘business model’ has been discussed in various streams of literature. Here, we follow the definition by Amit and Zott [4]: a business model depicts how the content, governance, and structure of transactions create value.

With the growing importance of data as raw material for the digital economy, the data-driven business model (DDBM) has emerged as one specific type of business model [11], which emphasizes data as a key resource in firms’ business models, particularly to establish the value proposition [6]. The majority of scholars have focused their research on identifying patterns and/or developing a framework/taxonomy for DDBM [11, 13, 26]. These classifications, however, apply primarily to companies in a commercial setting; they do not fully capture either the public value goals of data collaboratives or the non-profit motives of intermediaries in this context.

As for the intermediary business model, the literature in this domain is still scarce. Weill and Vitale [35] discuss the “generic business model” of electronic intermediaries. They argued that the provision of services in this business model (e.g., knowledge management, centralized management of applications, information search services) offers buyers and sellers (e.g., business, consumer, or other entities) lowered search and transaction costs. Similarly, Janssen and Zuiderwijk [15] explicate infomediary business models by formulating six types of business models based on the levels of data accessibility and dialogue between data users and data contributors. Most recently, intermediaries as marketplaces enabled by digital technologies were studied by Täuscher and Laudien [33], who developed a taxonomy of six types of web-based platform business models.

In sum, many ontologies exist for describing a business model. A business model ontology is an explicit, simple specification of a conceptualization of components of a business model and the relationships between them [12, 17]. For our study, we needed a framework that is sufficiently broad to capture the description of kinds of value (for different customer segments), resources, governance mechanisms, and financial arrangements of intermediaries. Based on this rationale, we selected the Unified Business Model Framework (UBM Framework) by Al-Debei and Avison [3]. The framework is arguably one of the most comprehensive business model frameworks, as it was developed through the systematization of 22 other business model frameworks. This framework has also been applied in other similar studies [15, 23] that focus on entities that act as intermediaries.

The framework’s ontological structure comprises four dimensions, each explained by specific constituent elements. We follow the operationalization of this framework as proposed by Janssen and Zuiderwijk [15], since they focused on open data intermediaries, which is a context similar to ours. See Table 1 for an overview of the four dimensions considered in this study.

Table 1. Framework for Business Model Analysis (adapted from Al-Debei and Avison [3] and Janssen and Zuiderwijk [15])

4 Case Study Method

For our study we conducted an exploratory multiple case study [37]. We chose to sample cases by diversity. Data collaboratives can be categorized by their expected outcome into three arenas: (a) policy intervention, (b) data science, and (c) data-driven innovation [31]. In our case sampling we included two cases per type to ensure diversity. We assumed that, based on these types, intermediaries create value for different target groups and therefore the value proposition, as well as value delivery and capture mechanisms, are likely to be different. As a result, the following cases were included in our analysis (Table 2).

Table 2. Selected cases with descriptions

The data was collected by conducting online desk research, which included analysis of the case websites, documents, and applications. The number of data sources depended on the availability of information and ranged between 2 and 8 per case. In analyzing each case, we followed the qualitative content analysis approach [18] that is driven by the structure of the analytical framework, i.e. data for every case were coded and categorized into the value proposition, value architecture, value network, and value finance dimensions. In cases where online research did not provide sufficient data, we conducted interviews with key informants of the projects to obtain clarifying information (CIP) or participated in observations during meetings and presentations (HDX, AMdEX).

5 Findings

In this section we present our analysis of the selected cases through the lens of the analytical framework found in Table 1. We thus derive the business models of the intermediaries based on the four elements of the framework: Value Proposition, Value Architecture, Value Network, and Value Finance.

Our analysis shows that the intermediary model of data collaboratives can be employed to address a wide range of needs and create value in different ways along the data value chain. Data intermediaries can facilitate access to previously closed data, provide an infrastructure for aggregating previously fragmented data, produce visualizations and targeted apps for easier use and understanding of data, and offer interactive solutions for collecting and sharing data.

In terms of Value Architecture, the majority of the cases (except for SSO) rely heavily on pooling diverse data sources together, although they create value from that via different pathways.

Regarding Value Network, our analysis of the six cases shows that several roles can be identified. The intermediaries we analyzed have diverse organizational forms and various actors can be in the lead: academic institutions, companies, inter-governmental organizations, non-profit organizations; yet, the triple helix construction (government, industry, academia) is widely leveraged.

Finally, data intermediaries can be financially supported by different actors, such as non-profit foundations, industry, research institutions, and government, and can additionally generate revenue themselves from fees for their products and services.

More nuance can be added by discussing pairs of cases according to our sampling strategy by three arenas: data science, policy intervention, and data innovation. Our analysis found that there are both similarities and differences among these types of cases.

As regards data intermediaries in the arena of data science, as evidenced by the cases SSO and Vivli, their core value proposition is facilitating easier access by researchers to valuable data from the private sector (and, in the case of Vivli, the public sector). Industry-academia data intermediaries deliver value to researchers as data users by pre-negotiating data access agreements with data providers that would otherwise cost researchers much time and effort. The rationale for the data provider differs depending on the context. In the case of SSO, data providers like Facebook may see it as a means to increase transparency and improve their reputation in the aftermath of negative data analytics publicity.

In the Vivli case, which provides a one-stop-shop for clinical trials data, the value for data providers is increased discoverability of their data and enhanced capacity for data sharing and reuse, thereby breaking down silos in medical research. Data contributors are offered services for secure data hosting, tools for anonymization and mapping of data, and the service of reviewing data requests on their behalf. Data users are offered a data search engine, a form to request data from data contributors, as well as an environment and tools for aggregating data (from various sources) and analyzing it. Both data providers and users are asked to sign a Harmonized Data User Agreement prior to data sharing. In both cases, however, the intermediaries support the scientific process by providing an organizational (SSO) and/or technical (Vivli) infrastructure for data access. Furthermore, they act as a “neutral broker” and perform the function of a data steward.

Data intermediaries in the arena of policy intervention are guided by slightly different objectives. Data resources created via these data collaboratives are a public good; therefore, openness is at the heart of the value proposition of these intermediaries. In the case of HDX, the value of the intermediary, besides providing a technical platform and standardized process for sharing data, is providing quality assurance services to ensure risks to privacy are minimal. HDX has existed since 2014, and during this time the focus has shifted from mere data aggregation to data transformations, such as visualizations. The value that this intermediary provides to data contributors is assistance with data processing and preparing data for publication, thereby tackling the data quality challenge in the humanitarian sector. Data contributors are offered services and tools that host data, impose access controls for published data sets, create metadata, and standardize, refine, statistically analyze and visualize data. For data users, services and tools for searching, following and requesting data are offered, as well as an API infrastructure to integrate the platform into users’ own developments. The HDX team verifies data contributors and evaluates contributed data sets for data quality and sensitivity levels.

For both HDX and GFW, data visualizations form one of the core elements of the value proposition. The mission of the Global Forest Watch (GFW) is to bring data insights to society, including government decision-makers, companies, journalists, researchers and the public, to drive evidence-based action in the forest management and conservation domain. This platform combines and overlays the best available data from various sources (government, research, crowdsourced, proprietary) to produce targeted applications, such as Forest Watcher and GFW Pro. There is also an open data portal for downloading datasets as open data. Both cases, HDX and GFW, add value by centralizing large volumes of data in one portal and by transforming data into actionable visualizations and insights to drive policy action. In this sense, they can be seen as ‘infomediaries’ [15] that help users manage vast amounts of information.

In the arena of data-driven innovation, the data intermediaries we analyzed were AMdEX and Civity CIP. These intermediaries aim to provide a secure data infrastructure and ‘rules’ for data sharing among diverse parties from different sectors, enabling data owners to remain in control of what data they share and with whom. Both cases are connected to international initiatives potentially feeding into the larger data innovation ecosystem in Europe. The CIP platform is limited to the smart city theme, while AMdEX has an open-ended scope potentially embracing a wide variety of data innovation use cases. Here intermediaries have to balance competition and data protection with innovation and openness; therefore, trust in the technical solution and in the members of the network plays an important role in these cases. Both intermediaries create value for data contributors by offering a trusted data infrastructure. The difference, however, is that in the case of AMdEX the business model is still a search process; it is emerging from consultations with stakeholders about their needs and challenges. Therefore, issues of standardization and interoperability are of concern, given the different demands and expectations of the wide range of stakeholders. In the CIP case, by contrast, the business model is very straightforward and determined by the company as a platform provider: ‘take it or leave it’.

6 Discussion

Overall, we find that the business models described differ based on the following discriminating variables:

  • Level of openness – stands for the extent to which data access is restricted or available. A high level of openness means that anyone can use the data provided through an intermediary; a medium level of openness means that there is a process in place to approve data requests; and a low level of openness means that the data is available to members only. Thereby, we place the business models on a continuum.

  • Added value to the data – key activities that the intermediary performs along the data value chain to realize the value proposition.

Table 3 below shows how the six cases we analyzed are classified according to these variables. We observe that the level of openness in these six intermediary cases correlates with the three arenas based on which we selected the cases. Data science collaboratives, such as SSO and Vivli, have a medium level of openness due to the data access procedures that are put in place (call for proposals to researchers, data request forms). Policy intervention collaboratives, such as HDX and GFW, have a high level of openness due to the public nature of the issue and the fact that the platforms they provide are considered a public good. Data innovation collaboratives, such as AMdEX and CIP, have a low level of openness: in the former case data access can be controlled by the data provider, and in the latter case the platform is offered on a for-profit basis.
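
The classification described in this paragraph can be written down as a small lookup table. The following Python sketch is purely illustrative: the names `Openness`, `CASES`, and `openness_by_arena` are hypothetical and not part of the analytical framework, while the arena and openness assignments are taken directly from the text.

```python
from enum import Enum


class Openness(Enum):
    """Levels of openness as defined in the text (illustrative encoding)."""
    LOW = "data available to members only"
    MEDIUM = "data requests go through an approval process"
    HIGH = "anyone can use the data"


# Arena and openness level per case, as reported in the text.
CASES = {
    "SSO": ("data science", Openness.MEDIUM),
    "Vivli": ("data science", Openness.MEDIUM),
    "HDX": ("policy intervention", Openness.HIGH),
    "GFW": ("policy intervention", Openness.HIGH),
    "AMdEX": ("data-driven innovation", Openness.LOW),
    "CIP": ("data-driven innovation", Openness.LOW),
}


def openness_by_arena(cases):
    """Collect the set of openness levels observed in each arena."""
    levels = {}
    for arena, level in cases.values():
        levels.setdefault(arena, set()).add(level)
    return levels


if __name__ == "__main__":
    for arena, levels in openness_by_arena(CASES).items():
        print(arena, sorted(level.name for level in levels))
```

In this encoding each arena maps to exactly one openness level, which mirrors the correlation observed across the six cases; it is a pattern in this sample, not a rule that additional cases would have to follow.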

Table 3. Variables for deriving generic business models from cases

In terms of added value, some cases can be placed in several (more than two) categories (e.g. GFW provides data access through its open data portal, data visualizations through the Forest Watcher app, and data sharing tools for supply chain monitoring via its GFW Pro app). Nonetheless, in Table 3 we highlight the activities that are at the very core of the value proposition of the intermediary. For the case that is placed in two cells, namely HDX, this also shows the evolution of the business model (towards more data visualizations).

Based on these variables, we distinguish the following generic business models for a data intermediary in a data collaborative setting. Since we based our analysis on six cases, we allow the possibility that other cells in this table may be filled in with other business models.

  1. Data gatekeeper model (Social Science One): intermediary serves as a trusted third party that negotiates terms of access to previously closed data by users selected through a call for proposals. Value comes from the legitimacy of the process.

  2. One-stop-shop model (Vivli, HDX): intermediary aggregates previously siloed data from multiple sources into a central data repository to ease discoverability, comparability, and analysis of data. Value comes from scale; therefore, the intermediary is dependent on data providers contributing data.

  3. Information-as-a-service model (GFW, HDX): intermediary provides data visualizations to targeted segments to make data easier for decision-makers to understand. Value comes from ease-of-use and quality of decision support; therefore, the intermediary is dependent on users.

  4. Data controls model (Civity CIP, AMdEX): intermediary offers a solution for sharing data (including sensitive data) in a secure, targeted, and controlled manner with full insight into who uses the data. Data owners are offered a menu of options for what (parts of) data to share, with whom, and for what period. The value comes from the technical excellence of the product and the data expertise of the intermediary. The intermediary is dependent on buy-in from key stakeholders and on standardization and interoperability efforts.
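
The four generic models listed above can also be captured as a simple data structure, which makes it easy to see that one case may instantiate more than one model. The Python sketch below is a hypothetical encoding for illustration only: the class `BusinessModel`, the tuple `MODELS`, and the helper `models_for_case` are assumed names, and the field values paraphrase the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessModel:
    """One generic intermediary business model (illustrative encoding)."""
    name: str
    cases: tuple          # cases from which the model was derived
    value_source: str     # where the value comes from
    depends_on: str       # who or what the intermediary depends on


MODELS = (
    BusinessModel(
        "Data gatekeeper", ("SSO",),
        "legitimacy of the process",
        "acceptance and trust of the wider (scientific) community",
    ),
    BusinessModel(
        "One-stop-shop", ("Vivli", "HDX"),
        "scale",
        "data providers contributing data",
    ),
    BusinessModel(
        "Information-as-a-service", ("GFW", "HDX"),
        "ease-of-use and quality of decision support",
        "users from different target groups",
    ),
    BusinessModel(
        "Data controls", ("CIP", "AMdEX"),
        "technical excellence and data expertise",
        "stakeholder buy-in, standardization and interoperability",
    ),
)


def models_for_case(case):
    """Return the names of all generic models a given case instantiates."""
    return [model.name for model in MODELS if case in model.cases]


if __name__ == "__main__":
    # HDX appears under two models, reflecting its evolving business model.
    print(models_for_case("HDX"))
```

Keeping the cases as a field of each model, rather than assigning one model per case, is a deliberate choice in this sketch: it accommodates cases such as HDX that sit in two cells of the classification.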

In the realization of these business models the intermediaries are dependent on interactions with other actors in the data ecosystem. In the Data Gatekeeper model, the data intermediary is dependent on the acceptance and trust of the wider (scientific) community. The role of government, at least based on the SSO case, is supporting, providing guidance and/or setting norms. In the One-stop-shop model, where value comes from scale, the intermediary is dependent on data providers contributing data to build a comprehensive repository. In both Vivli and HDX, government organizations can be data providers or users, as well as partners of the initiative. In the Information-as-a-service model, the intermediary is dependent on the users from different target groups. Governments can play multiple roles here by providing data, supporting the initiative, and consuming the data analysis products themselves. And in the Data Controls model, where the intermediary is dependent on buy-in from key stakeholders, government can be a facilitator, a data provider/client, a data user, and a funder. Overall, our study confirms and provides illustrations of the roles of government in data collaborations described in previous research [17, 27]: namely, government as a data provider, data user, facilitator of collaboration, supporter or funder, active partner in the consortium, guarantor of data quality, source of trust, and public interest promoter. Our research, however, shows that government organizations rarely assume just one role but rather a combination of roles.

When compared to the existing literature [15], there are some similarities and differences. For instance, the One-stop-shop model is an extension of three different business models, as this model aggregates siloed data from multiple sources (i.e. information aggregators) and stores those data centrally (i.e. open data repository) to make the data easier to search, compare and analyze (i.e. service platforms). Similarly, in the Information-as-a-service model, data intermediaries act as service platforms that offer data processing and analysis services that can help decision-makers make better policies. However, there are notable differences in that we also emphasize technical resources and the enabling environment to promote data transparency and sovereignty, as can be seen in the Data gatekeeper and Data controls models. In this way, we allow for the possibility of opening up data that are traditionally challenging to open, especially data possessed by private entities.

7 Conclusion

Our study investigated how intermediaries create value and what business models they use to facilitate data exchange and collaboration in the context of data collaboratives. For our analysis we selected six cases representing three different arenas of data collaboratives: data science, policy intervention, and data-driven innovation. Based on the case analysis, we derived four generic business models that can be employed by intermediaries in data collaboratives. These generic business models are distinguished by two variables – level of openness and added value to the data. We labeled these business models as follows: Data Gatekeeper model, One-stop-shop model, Information-as-a-service model, and Data Controls model.

Our study shows that the intermediary model of data collaboratives can have different configurations and can follow various pathways for value creation. As evidenced by our cases, in their offerings intermediaries aim to tackle diverse challenges that data collaboratives face, such as balancing collaborative data sharing with control over data, overcoming fragmentation and creating data pools, brokering access to controversial data, matching data supply and demand through market rules, and more. The four business models further differ based on what the value comes from and who or what the intermediary is dependent on to realize the value proposition. We further discussed the implications of these findings for governments, explicating the different roles they can take on in these initiatives.

Our research contributed empirically derived knowledge about data intermediaries through the lens of data collaboratives. This adds to our understanding of emerging data ecosystems leveraging diverse data sources for creating public value through cross-sector collaboration. The limitations of our work are that our study is exploratory and was based on a limited number of cases. Future research can test and elaborate our findings in other empirical settings. Furthermore, much literature discusses trust as an important element of intermediaries (e.g. [5, 8, 9, 10, 25, 28, 29, 36]). In our study we did not explicitly interrogate whether and to what extent the conceptualizations of trust can be different across the various intermediary business models. We propose this research question for future investigations.