
1 Introduction

Since its formal inception in 2003, when the European Union (EU) adopted the ‘Directive on the Re-use of Public Sector Information’ [1], open government data (OGD) as a free re-usable object has attracted the interest of researchers and practitioners under the notion of research efficiency and effectiveness. Governments and high-level policy makers have realised the potential of publishing public sector information as the last stand for earning back citizens’ trust, as well as the importance of the national context for government information and knowledge sharing [2, 3]. Lower-level civil servants, as always, are reluctant towards the change that this new entry will enforce in terms of new systems, new procedures and effort. Citizens are becoming more aware of the benefits that OGD may offer, by using secondary services towards accountability and transparency. Businesses develop and/or redesign their business models to align with this great development of our century, exploiting its numerous benefits and turning them into profit. For these reasons, OGD initiatives have burgeoned worldwide over recent years, both in developed and in developing countries [4,5,6].

Many studies position OGD and its exploitation as the ‘new gold’ [7, 8], resulting in the establishment of opening government datasets as a ‘political orthodoxy’ in numerous countries worldwide (e.g., in the USA [9], in the UK [10], in Australia [11] and across Europe [6, 12]).

Big investments have been made in the development of ‘OGD sources’, defined as various types of portals enabling access to government datasets by the public through the Internet. These OGD portals provide various capabilities/functionalities in this direction and are operated by a variety of government organisations with different strategies and technical capacities, under different social, political and legal conditions worldwide [13]. Extensive research has been conducted on these OGD sources over recent years to better understand their main characteristics from various perspectives and to identify their strengths and weaknesses [4, 14,15,16,17]. The authors of [8] conclude that the success of the developed OGD infrastructures requires more than the simple provision of access to data; it is necessary to make progress towards (i) the improvement of the quality of government information, (ii) the creation and institutionalisation of a culture of open government, and (iii) the provision of tools and instruments for the most beneficial data utilisation. The realisation of the ‘Open Government’ paradigm, in general, seems to be a demanding and complex task, requiring the combined efforts of multiple actors, from both the public and the private sector, and the gradual development of ‘open government ecosystems’ [18].

The contribution of this paper is the aggregation of this research effort in order to illustrate the evolutionary path of OGD portals. Our study aggregates the abovementioned characteristics and examines the development of OGD portals over time, proposing an OGD maturity model.

This paper is structured as follows. Section 2 describes the methodology followed. Section 3 presents the integrated analysis framework that our study uses to categorise the different maturity stages. Section 4 presents the maturity model for Open Government Data platforms, which is validated in Sect. 5 against the research literature concerning Greece and the EU in general. Finally, Sect. 6 concludes the paper by raising issues for further research.

2 Methodology

The paper makes use of a methodology consisting of four stages. First, a literature review was conducted in order to identify the documents containing the required information. Second, an integrated analysis framework was developed to identify the common elements of analysis in order to maintain coherence. The third stage presents the facts that have been identified in the literature and, lastly, the fourth stage concludes with the construction of the OGD maturity model. More specifically:

Stage 1: Identification of basic literature

The first stage of our research method refers to the identification of the basic literature underlying the characteristics of OGD portals through time [19,20,21,22,23,24,25]. Since there is a great diversity of analytical methods as well as types of portals (European, national, regional, local and thematic) we proceeded to the next step of our methodology.

Stage 2: Formulation of an Integrated Analysis Framework

After the necessary adaptations, we arrived at the integrated analysis framework for the construction of the OGD portals maturity model, which consists of elements categorised in four dimensions: general, information quality, system quality and service quality.

Stage 3: Analysis and presentation of facts and results

This stage, which is thoroughly analysed in Sect. 5, presents the aggregated results of the studies in terms of the IS Success model of analysis, which we consider the most efficient approach for the presentation. The case studies that could provide results in chronological order are those concerning Greece [20, 21] and the EU as a whole [19, 22]. A few more studies indicate the development of marketplaces [24] and service repositories [25].

Stage 4: Maturity model construction

At the final stage of the methodology, which is presented in detail in Sect. 4, the maturity model is constructed in terms of the analysis framework.

3 Integrated Analysis Framework

After a thorough examination of the literature on OGD evaluation metrics, stage models and portal functionality, we arrived at the following dimensions for the development of a maturity model for OGD portals. The identified OGD sources constitute a new type of Information Systems (IS), so in accordance with previous relevant research on IS Success [26,27,28,29], their success relies critically on three main characteristics: their ‘information quality’, i.e. the quality of the information they provide, their ‘system quality’, i.e. their quality viewed as technological systems, and their ‘service quality’, i.e. the support provided to their users, such as training, helpdesk, etc. The “general” category introduces characteristics from the recent literature on OGD metrics that could not be categorised in the previous ones.

General

  • Internet presence: This chronologically placed element identifies the web presence of datasets. First came the closed silos and then the open data portals, all of which are characterised by internet presence. This factor was included mostly to mark time zero.

  • Users: It specifies the different types of users according to their capabilities [29,30,31]. Collaboration spaces provide a wider range of functionalities, influenced by the principles of the new Web 2.0 paradigm [32, 33]. They support the main feature of this new paradigm: the elimination of the clear distinction between ‘passive’ content users/consumers and ‘active’ content producers (which characterises Web 1.0), and the shift towards highly active users (who assess the quality of the data they consume and intervene in order to enhance them) who are potentially data ‘pro-sumers’ (both consumers and providers of data). In particular, collaboration spaces increasingly offer data users capabilities for commenting on and rating datasets; for processing them in order to improve them, adapt them to specialised needs or link them to other datasets (public or private); and then for uploading and publishing new versions of them, or even their own datasets. In general, collaboration spaces aim at fulfilling the needs of the emerging OGD ‘pro-sumers’ [33].

  • Open Government level: It assesses the open government level of each type of OGD portal, regarding its functionality and scope, according to the study in [34]. The higher the maturity level, the higher the public engagement and thus the greater the public value of open government that is realised.

  • Value: The authors in [35,36,37] argue that four types of value can be generated from OGD, which differ based on the sector generating the value (public or private) and the kind of generated value (social or economic): (i) transparency related value (public sector organisations generate social value by offering increased transparency into government actions, which reduces misuse of public power for private benefits and corruption), (ii) efficiency related value (public sector organisations generate economic value through OGD by increasing internal efficiency and effectiveness), (iii) participation related value (individuals and private sector generate social value through participating and collaborating with government), (iv) innovation related value (private sector firms generate economic value through the creation of new products/services).
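
The two-by-two classification of value in the last bullet above can be written down as a small lookup table. The following is only a minimal sketch: the names `Sector`, `ValueKind`, `VALUE_TYPES` and `value_type` are illustrative assumptions and do not come from [35,36,37].

```python
from enum import Enum


class Sector(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


class ValueKind(Enum):
    SOCIAL = "social"
    ECONOMIC = "economic"


# (sector generating the value, kind of value) -> value type named in the text.
# The dictionary layout is an assumption made for illustration.
VALUE_TYPES = {
    (Sector.PUBLIC, ValueKind.SOCIAL): "transparency",
    (Sector.PUBLIC, ValueKind.ECONOMIC): "efficiency",
    (Sector.PRIVATE, ValueKind.SOCIAL): "participation",
    (Sector.PRIVATE, ValueKind.ECONOMIC): "innovation",
}


def value_type(sector: Sector, kind: ValueKind) -> str:
    """Return the type of OGD value for a given sector and kind of value."""
    return VALUE_TYPES[(sector, kind)]
```

For instance, `value_type(Sector.PRIVATE, ValueKind.ECONOMIC)` looks up the innovation related value.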

Information Quality

  • Thematic perspective: It includes analysis of the thematic categories of the datasets provided by the OGD sources. It has been conducted using the nine main thematic categories of OGD identified in [1, 38].

  • Format: It defines the data representation formats in which the portals publish information and their categorisation according to Berners-Lee’s 5-star rating scheme for Open Data. The authors in [41] define LOGD as “all stored data of the public sector connected by the World Wide Web which could be made accessible in a public interest without any restrictions for usage and distribution”, and argue that “the cross linking of Open Data via the Internet and the World Wide Web as “Linked Open Data” (LOD) offers the possibility of using data across domains or organisational borders for statistics, analysis, maps and publications”, which can lead to the generation of more insight, knowledge and innovation from OGD, and to the implementation of generic applications that can operate over the complete data space.

  • Metadata: It concerns (a) metadata openness, i.e. the metadata schemas provided by the portals and their categorisation according to the 5-star Maturity Scheme of Metadata Management [42,43,44], and (b) their capabilities for flat metadata descriptions (based on a specific metadata model) and/or contextual metadata descriptions and/or detailed metadata of any metadata/vocabulary model [51].

  • RDF-compliance: It concerns whether or not relevant technologies that support RDF are used (binary indicator), including technical products of open data initiatives that publish structured data in a way that allows it to be interlinked. It is quite important, both for enabling more effective browsing and discovery of datasets, and for linking and combining OGD from multiple sources [39, 41]. The use of Semantic Web technologies (such as “Uniform Resource Identifiers” (URI) for the identification of certain resources, the “Resource Description Framework” (RDF) for relating elements, and also vocabularies and ontologies that give meaning to the datasets) in OGD provides a common framework that allows various datasets to be shared and reused. Semantic Web technologies enable more effective browsing and discovery of datasets through distributed SPARQL queries, as well as the linking and combining of OGD from multiple sources across the Web, which can significantly increase the usefulness of the OGD and the value generated from them (e.g., it allows discovering new correlations and gaining deeper insights, or developing new advanced value-added e-services by combining different datasets from multiple OGD sources). Also, the value of any kind of data (including OGD) increases each time it is re-used and linked to another resource, and this can be facilitated and triggered by providing informative and explanatory data about each available dataset, i.e. metadata, which can be used as a systematic way to describe datasets, based on pre-agreed meanings, thus increasing the usefulness of the data.
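
As an illustration of the Format element above, the sketch below assigns a star level to a single dataset distribution, following a common reading of the 5-star scheme: one star for any format published under an open licence, two for structured machine-readable data, three for non-proprietary formats, four for RDF with URIs, and five for data that is also linked to other datasets. The format-to-star table and the names `BASE_STARS` and `star_rating` are assumptions made for this example; they are not taken from the studies analysed.

```python
# Hypothetical mapping from file format to the base star level it can reach.
BASE_STARS = {
    "pdf": 1, "html": 1, "rar": 1,   # on the web, but not structured data
    "xls": 2,                         # structured, proprietary format
    "csv": 3, "json": 3, "xml": 3,    # structured, non-proprietary format
    "rdf": 4, "ttl": 4,               # uses URIs to identify things
}


def star_rating(file_format: str, open_licence: bool,
                links_to_other_data: bool = False) -> int:
    """Return a 0-5 star level for one dataset distribution."""
    if not open_licence:
        return 0  # without an open licence the data does not count as open
    # Unknown formats default to one star (available on the web, open licence).
    stars = BASE_STARS.get(file_format.lower().lstrip("."), 1)
    if stars == 4 and links_to_other_data:
        stars = 5  # RDF that is also linked to other datasets
    return stars
```

Under these assumptions a PDF report with an open licence stays at one star, while the same content published as CSV reaches three.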

System Quality

  • Functionality: It includes analysis of the functionalities provided by the OGD portals [45], in terms of datasets discovery (simple document list, free text search, browsing through categories, browsing through filters, browsing through interactive map and SPARQL search), data provision (download file, online view of dataset, API), data visualisation (charts and maps) techniques, multi-linguality and data and metadata processing (e.g. enrichment, data cleansing and data format conversions).

  • Type: It contains the types of OGD portals, as they have been identified in [19]. Two distinct types of OGD sources/portals have been developed with respect to the capabilities/functionalities provided to the user: (i) OGD direct provision portals: these constitute the main category of OGD portals and are ‘primary sources’ of OGD, publishing original government datasets provided by either one government agency or a small number of similar government agencies (who are the legal owners/licensers of the data). These portals usually offer a wide range of functionalities supporting the whole lifecycle of OGD, from the creation of datasets to their update and finally their archiving. (ii) OGD aggregators: these are ‘secondary sources’ of OGD that come from a large number of government agencies, publishing and maintaining lists of other ‘primary’ OGD catalogues and links to them. They constitute single access points to multiple OGD direct provision portals, and make it easier for users to locate the OGD they are interested in. Usually they include descriptive information about datasets and sources, which helps users get a first impression of what is available. Many of them act as highly structured registries of OGD primary sources and datasets, storing structured and machine-processable information, and provide ‘index’-like features, such as automated registration and discovery of OGD.

  • Technology: It includes analysis of the technologies and products that have been used for the development of the OGD sources at the main technological layers: (i) web server, (ii) Content Management System (CMS) or platform and (iii) user interface, each of which is categorised as either open source or non-open source software.
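
Two of the dataset discovery functions listed under Functionality above, free text search and browsing through categories or filters, can be sketched over an in-memory catalogue. This is a minimal illustration only: the `Dataset` fields, the function names and the sample catalogue are all made up for the example and do not describe any particular portal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    title: str
    description: str
    category: str
    file_format: str


def free_text_search(catalogue, query):
    """Return datasets whose title or description contains every query word."""
    words = query.lower().split()
    return [
        d for d in catalogue
        if all(w in (d.title + " " + d.description).lower() for w in words)
    ]


def browse(catalogue, category=None, file_format=None):
    """Return datasets matching the given filters; a filter left as None is ignored."""
    return [
        d for d in catalogue
        if (category is None or d.category == category)
        and (file_format is None or d.file_format == file_format)
    ]


# Made-up sample catalogue, for illustration only.
CATALOGUE = [
    Dataset("Municipal spending 2014", "Payments by department", "economic", "csv"),
    Dataset("Bus stops", "Locations of bus stops", "transport", "json"),
    Dataset("Annual budget report", "Approved budget", "economic", "pdf"),
]
```

Calling `browse(CATALOGUE, category="economic", file_format="csv")` combines two filters, which is the behaviour the text describes as browsing through filters.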

Service Quality

  • License: It concerns license information related to the use of the published datasets. This is one of the most important characteristics of OGD sources, since it defines the allowed ways of OGD utilisation and exploitation for generating various types of social and economic value, and reduces all relevant legal uncertainties and risks (e.g., see [39, 41]).

  • Rating and Feedback mechanisms: It concerns capabilities that allow users to communicate to other users and to the providers the level of quality they perceive in the datasets, and to be informed of the level of quality perceived by other users through their ratings (e.g. a five-star rating system). Further feedback and discussion mechanisms that were investigated are: discussing what can be learned from data use by looking at previous uses of the data; expressing one’s own needs for additional datasets; getting informed about the needs of other users; and getting informed about dataset extensions and revisions [51].

4 The Maturity Model for OGD Portals

Based on the essential elements identified and presented in Sect. 3, and following the observations of the analysed literature, we arrived at the abstract maturity model presented in Table 1, which categorises the capabilities of OGD infrastructures through time.

Table 1. The Maturity Model for OGD Portals

5 Validation of the Maturity Model

5.1 Information Quality

Analysing the thematic perspective, we remark that the thematic category with the highest publication rate in Greece (by a significant margin over the second one) is the economic and financial one, concerning mainly public spending data for various government agencies and also data about economic activity and firms [21]. This is strongly associated with two important facts: the growing citizens’ distrust in government (so many government agencies respond by publishing data on their spending), and the existing severe economic crisis (which necessitates an increase in economic activity, so it is useful to provide data on existing economic activity/firms, which allow a better understanding of it, and support a better design and planning of its increase). Therefore, it is concluded that the first attempt at opening data was restricted to a narrow thematic range, focused mainly on the provision of economic/financial data. Next to that, statistical offices open their census and unemployment data. It should be noted that the European Union member states’ OGD portals have the highest publication rate in the thematic category of ‘Law Enforcement, Courts and Prisons’ (probably reflecting the increasing criminality and security concerns in many EU countries) [22] and then in economic and statistical data. We also remark that there are four thematic categories (social, natural resources, legal and geographic information) with a much lower publication rate, while the remaining four thematic categories (traffic/transport, meteorological/environmental, agricultural/farming/forestry/fisheries, tourism/leisure and geospatial data) have quite low frequencies, despite their importance (e.g., the importance of agriculture and tourism). In the next developments we observe an increase in the publication rate in the categories of GIS and transport data, since they are characterised by great innovation value.

For the semantic perspective, the analysis shows that currently the majority of open data providers aim to adopt an already available metadata standard that fits within their context. Data providers that are based on the CKAN engine also adopt the CKAN metadata schema for the data catalogue and data discovery. Other governmental sites adopt a custom metadata schema for data discovery and preserve the datasets in vertical-domain metadata standards. Noteworthy cases include open data initiatives that have developed detailed metadata standards which became EU recommendations (e.g., the INSPIRE directive for geospatial information and SDMX for statistical information), and which tend to be included in the current phases of development. Furthermore, the majority of longstanding OGD sites indicate their intention not to follow the Linked Data paradigm, as opposed to more recent “data gov” efforts. There is a growing rate of RDF-compliance among OGD portals aiming at connection to the linked open data cloud, and many more standardised ontologies are being used for data modelling. Additionally, the analysis indicates that almost all initiatives (with the exception of EUR-Lex) limit their internationalisation efforts (if any) to the user interface level and do not address multi-linguality in their published datasets.

For the data perspective, the data formats provided are more or less common between all initiatives, while the vast majority of OGD sites tend to provide data only in the format of the original source. Greece seems to be far behind, since the studies indicate that it has persistently published data in non-machine-processable formats (.pdf, .rar and .html instead of .csv, .json and .xls). Current developments, following the launch of the Greek open data portal, are characterised by a small increase in machine-processable formats. EU-wide, the same course has been followed, only more quickly.
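
The shift towards machine-processable formats described above can be tracked with a simple indicator: the share of published distributions whose format is machine-processable. The sketch below is illustrative only; the set `MACHINE_PROCESSABLE` merely lists the three formats the text names as examples, and the function name is an assumption.

```python
from collections import Counter

# Formats the text names as machine-processable examples; treating exactly
# this set as "machine-processable" is an assumption made for illustration.
MACHINE_PROCESSABLE = {"csv", "json", "xls"}


def machine_processable_share(formats):
    """Return the fraction of distributions in a machine-processable format."""
    counts = Counter(f.lower().lstrip(".") for f in formats)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty portal has no share to report
    good = sum(n for fmt, n in counts.items() if fmt in MACHINE_PROCESSABLE)
    return good / total
```

Computing this share for snapshots of the same portal taken at different times would show whether it is moving from the first to the later maturity stages.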

5.2 System Quality

Our analysis indicates that in Greece only a few OGD aggregators exist; all the others are OGD direct provision portals. EU-wide, and as we move to the next generation of developments, we observe an increase in OGD aggregators at the national level, since the majority of countries (with only a few exceptions) maintain a national OGD portal. In addition, the EU has launched two versions of its own OGD aggregator. Next to that, we remark some new attempts at developing collaboration spaces and marketplaces, characterised by a higher level of open government and value but not yet by great success and recognition [24, 25].

The analysis of system quality from the functional perspective identifies that only a few OGD providers offer advanced data acquisition capabilities. The majority of data providers are internally linked to the relevant data repositories and provide only interfaces for data provision. This is especially common for organisations and agencies that are responsible for the complete life cycle of data (from creation to update/archiving), such as statistical offices. Furthermore, the majority of OGD providers offer simple free-text search and theme-browsing functions for the discovery and cataloguing of datasets, whereas only recent open data initiatives have started to appreciate the advances of the Semantic Web by providing semantically enriched discovery services such as SPARQL queries. Additionally, most local public agencies limit their data provision services to a simple download functionality, whereas agencies addressing a wider audience (country-level or European level) typically include the capability to view datasets on a map or in various types of charts. Nevertheless, the range of visualisation facilities offered by each provider varies significantly. This is mainly due to the fact that in recent years visualisation engines have become more comprehensive, flexible and lightweight. The next generations of OGD platforms are characterised by the provision of more collaborative capabilities such as: Grouping and Interaction, Data Processing, Data Enhanced Modelling, Feedback and Collaboration, Data Quality Rating, Data Linking, Data Versioning, Advanced Data Visualisation and Advanced Data Search.

The analysis from the technological perspective shows that there is a strong preference for open-source and free underlying platforms and content management systems in OGD sites, with the exception of the Data.gov initiative, which is based on the proprietary platform Socrata that has received widespread adoption in the US (State of Oregon, State of Oklahoma, City of Chicago, City of Seattle, etc.). For data visualisation, OGD sites are turning from heavy and proprietary engines to free and lightweight JavaScript frameworks (Google Charts, jQuery, JavaExts). Lastly, relatively few data providers offer APIs for data and metadata interactions, whereas the paradigm of RESTful web services that output JSON objects is becoming the common approach in the new generations.
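
To make the last point concrete, the sketch below shows how a client might consume a JSON object returned by a RESTful dataset-listing service. The response body, its field names (`count`, `datasets`, `formats`) and the function `titles_with_format` are entirely hypothetical; they do not reproduce the API of any portal or platform mentioned here.

```python
import json

# Hypothetical response body from a portal's dataset-listing endpoint.
SAMPLE_RESPONSE = """
{
  "count": 2,
  "datasets": [
    {"id": "spending-2014", "title": "Municipal spending 2014", "formats": ["csv", "json"]},
    {"id": "bus-stops", "title": "Bus stops", "formats": ["json"]}
  ]
}
"""


def titles_with_format(response_body: str, wanted_format: str) -> list:
    """Return the titles of datasets that offer the wanted format."""
    payload = json.loads(response_body)
    return [d["title"] for d in payload["datasets"] if wanted_format in d["formats"]]
```

Because the payload is plain JSON, the same few lines work in any client language, which is one reason this style of API lowers the barrier to re-use.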

5.3 Service Quality

The first generations of OGD portals are characterised by the absence of service quality mechanisms. They provided neither guidance on how published data could be used nor communication channels for collecting feedback and user needs.

The analysis indicates that there is no common policy for license issues, as the licenses for use and reuse of data vary significantly. Most OGD portals do not specify their licensing mode, but there is a clear move towards open licenses and, more specifically, Creative Commons Attribution licenses.

One essential element of OGD portals concerns their service quality development “through user adaptation, feedback loops and dynamic supplier and user interactions and other interacting factors” [46]. However, discussion and feedback loops barely appear to be part of existing open data practices and infrastructures. The authors of [33] argue that, after open data have been used, the provision of feedback to data providers or a discussion with them is quite important but not facilitated by existing open data infrastructures, though such mechanisms might be useful for improving open data service quality, data release processes and policies. The authors of [47] found that such mechanisms can help users to obtain insight into how they can use and interpret open government data and generate value from them.

Only a few efforts concentrate on collecting the needs of users in a formal and systematic manner. For the majority of service providers, comments and suggestions from users are limited to general-purpose feedback web forms that typically address technical aspects of the site rather than the actual datasets. On the other hand, moving to the next generations of OGD portals there is a clear move towards the inclusion of dataset rating and commenting, as well as viewing and voting on users’ demands for specific datasets that are not yet public or that follow a strict data license [48].

6 Conclusions

This paper aggregates research outcomes and developments over time in order to illustrate the evolutionary path of open government data. It presents an analysis based on the basic characteristics identified for OGD portals and proposes a maturity model in terms of traditional and advanced OGD infrastructures. As a next step in our study we have identified the assignment of relevant best practices to each layer, thus assisting policy makers to better design the implementation of each stage. The proposed OGD portals maturity model is based on the distinction of OGD sources with respect to the capabilities/functionalities they offer, namely between the ‘traditional’ Web 1.0 paradigm and the Web 2.0 paradigm [49, 50].

The ‘traditional’ first generation OGD portals have been influenced by the Web 1.0 paradigm, in which there is a clear distinction between content producers and content users. They are characterised by publishing datasets in non-machine-processable formats (e.g. PDF), without providing any contextual information or linkage capabilities to other datasets. Also, they are limited to offering basic functionalities to data users (consumers) for downloading datasets, and to data providers for uploading datasets. They do not support improvements of their published datasets by their users (e.g., through cleaning and further processing), or feedback provision by dataset users to providers so that the latter can better understand the needs of the former.

The advanced second generation Web 2.0 OGD portals follow the advent of the Web 2.0 paradigm, which facilitates the generation of content of various types by simple and non-expert users, the development of relationships and online communities among them, and the extensive interaction, collaboration and sharing of content and information. These attributes have led to the emergence of a second generation of OGD portals, which have been influenced by the Web 2.0 principles. They provide, in addition to the basic functionalities of the traditional first generation OGD portals mentioned in the previous paragraph, functionalities for commenting on and rating datasets, forming groups around common interests, visualising and processing datasets, improving or adapting them to specialised needs and then publishing them again, and uploading new datasets, enabling OGD users to become data ‘pro-sumers’ (both consuming and producing datasets). Their main objective is to support and facilitate extensive communication between OGD users (citizens, journalists, businesses, scientists, etc.) and providers (government agencies), and also collaborative value generation from OGD.