This volume contains the proceedings of the 4th International Conference on Data Management Technologies and Applications (DATA 2015), which is sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), co-organized by the University of Haute Alsace, and held in cooperation with ACM SIGMIS (the ACM Special Interest Group on Management Information Systems).
This conference brings together researchers and practitioners interested in databases, data warehousing, data mining, data management, data security and other aspects of information systems and technology involving advanced applications of data.
The high quality of the DATA 2015 program is enhanced by the three keynote lectures, delivered by distinguished speakers who are renowned experts in their fields: Michele Sebag (Laboratoire de Recherche en Informatique, CNRS, France), John Domingue (The Open University, United Kingdom) and Paul Longley (University College London, United Kingdom).
The meeting is complemented by the Special Session on Knowledge Discovery meets Information Systems: Applications of Big Data Analytics and BI - methodologies, techniques and tools (KomIS).
DATA 2015 received 70 paper submissions, including the special session, from 32 countries on all continents, of which 44% were orally presented (20% as full papers). To evaluate each submission, the Program Committee performed a double-blind paper review.
The program for this conference required the dedicated effort of many people. Firstly, we must thank the authors, whose research efforts are herewith recorded. Next, we thank the members of the Program Committee and the auxiliary reviewers for their diligent and professional reviewing. We would also like to deeply thank the invited speakers for their invaluable contribution and for taking the time to prepare their talks. Finally, a word of appreciation for the hard work of the INSTICC team; organizing a conference of this level is a task that can only be achieved by the collaborative effort of a dedicated and highly capable team.
A successful conference involves more than paper presentations; it is also a meeting place, where ideas about new research projects and other ventures are discussed and debated. Therefore, a social event and banquet has been arranged for the evening of July 21st (Tuesday) in order to promote this kind of social networking.
We wish you all an exciting conference and an unforgettable stay in the city of Colmar. We hope to meet you again next year at DATA 2016 in Lisbon, Portugal.
Finding Maximal Quasi-cliques Containing a Target Vertex in a Graph
Many real-world phenomena such as social networks and biological networks can be modeled as graphs. Discovering dense subgraphs in these graphs may reveal interesting facts about the phenomena.
Quasi-cliques are a type of dense graph, which ...
From Static to Agile - Interactive Particle Physics Analysis in the SAP HANA DB
In order to confirm their theoretical assumptions, physicists employ Monte-Carlo generators to produce millions
of simulated particle collision events and compare them with the results of the detector experiments. The
traditional, static analysis ...
A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf
Within text categorization and other data mining tasks, the use of suitable methods for term weighting can
bring a substantial boost in effectiveness. Several term weighting methods have been presented throughout
literature, based on assumptions ...
A First Framework for Mutually Enhancing Chorem and Spatial OLAP Systems
Spatial OLAP systems aim to interactively analyze huge volumes of geo-referenced data. They allow
decision-makers to explore and visualize warehoused spatial data online using pivot tables, graphical displays
and interactive maps. On the other hand, it has ...
A Unifying Polynomial Model for Efficient Discovery of Frequent Itemsets
It is well-known that developing a unifying theory is one of the most important issues in Data Mining research.
In the last two decades, a great deal of effort has been devoted to the algorithmic aspects of the Frequent Itemset (FI)
Mining problem. We are ...
Structuring Documents from Short Texts
Nowadays, structured documents are marked up using XML. XML is the W3C standard that gives meaning to
the stored content of a document through the definition of its logical structure. A logical
structure can be exploited to have a focused ...
Exploiting Linked Data Towards the Production of Added-Value Business Analytics and Vice-versa
- Eleni Fotopoulou
- Panagiotis Hasapis
- Anastasios Zafeiropoulos
- Dimitris Papaspyros
- Spiros Mouzakitis
- Norma Zanetti
The majority of enterprises are in the process of recognizing that business data analytics have the potential
to transform their daily operations and make them extremely effective at addressing business challenges,
identifying new market trends and ...
Visual-CBIR: Platform for Storage and Effective Manipulation of a Database Images
Today, image retrieval systems have become a vital necessity for computer users. Different search systems
are increasingly invading the computing software markets, such as QBIC, Photobook and Blobworld. The
only negative point these systems have in common ...
Preserving Prediction Accuracy on Incomplete Data Streams
The model tree is a useful and convenient method for predictive analytics in data streams, combining the interpretability
of decision trees with the efficiency of multiple linear regressions. However, missing values within
the data streams are a crucial ...
Determining Top-K Candidates by Reverse Constrained Skyline Queries
Given a set of criteria, an object o is defined to dominate another object o' if o is no worse than o' in each
criterion and strictly better in at least one criterion. A skyline query returns each object that is
not dominated by any other ...
Extended Techniques for Flexible Modeling and Execution of Data Mashups
Today, a multitude of highly-connected applications and information systems hold, consume and produce huge amounts of heterogeneous data. The overall amount of data is even expected to dramatically increase in the future. In order to conduct, e.g., data ...
Database Evolution for Software Product Lines
Software product lines (SPLs) allow creating a multitude of individual but similar products based on one
common software model. Software components can be developed independently and new products can be
generated easily. Inevitably, software evolves, a ...
A Visual Technique to Assess the Quality of Datasets
Nowadays, more and more information is flowing in and is provided on the Web. Large datasets are made
available covering many fields and sectors. Open Data (OD) plays an important role in this field. Thanks to
the volumes and the variety of the released ...
Facts Collection and Verification Efforts
Geographic web portals and geospatial databases have recently emerged on the web, offering information about countries and places in the world. Digital content is increasing at a staggering rate due to community collaboration and the integration of ...
Database Architectures: Current State and Development
The paper briefly presents the history and development of database management tools in the last decade. The
movement towards higher database performance and database scalability is discussed in the context of
practical requirements. These include Big Data ...
Data Quality Assessment of Company's Maintenance Reporting: A Case Study
Businesses are increasingly using their enterprise data for strategic decision-making activities. In fact, information, derived from data, has become one of the most important tools for businesses to gain competitive edge. Data quality assessment has ...
Integrated Smart Home Services and Smart Wearable Technology for the Disabled and Elderly
Smart Home is indeed a broad concept which includes the techniques and systems applied to living spaces.
While its main goal is to reduce the consumption of energy, it provides many benefits including living in
comfort, security and increasing ...
Service for Data Retrieval via Persistent Identifiers
Persistent identifiers for research data citation have become commonplace yet current practices of minting
them need evaluation to see how the data cited can be actually discovered, contextualized and processed in
scalable eInfrastructures that serve ...
Linking Library Data for Quality Improvement and Data Enrichment
Dataset interlinking holds the potential for data quality improvement and data enrichment as demonstrated
by the Linked Open Data project. This paper explores the library domain characterized by carefully curated
datasets that require high quality ...
Towards a Context-aware Framework for Assessing and Optimizing Data Quality Projects
This paper presents an approach to clearly identify the opportunities for increased monetary and non-monetary
benefits from improved Data Quality, within an Enterprise Architecture context. The aim is to measure, in a
quantitative manner, how key ...
An Ontology for Representing and Extracting Knowledge Starting from Open Data of Public Administrations
As proposed by the European Commission through the institution of Europe's Digital Agenda, the Italian Digital Agenda has promoted the publication and the use of Open Data (OD) owned by the Public Administration (PA), providing with the appropriate ...
Open Science
The term "open data" refers to information that has been made technically and legally available for reuse. In our research, we focus on the particular case of open research data. We conducted a literature review in order to determine what are the ...
Task Clustering on ETL Systems
Usually, data warehouse populating processes are data-oriented workflows composed of dozens of
granular tasks that are responsible for the integration of data coming from different data sources. Specific
subsets of these tasks can be grouped on a ...
The Use of Extensible Markup Language (XML) to Analyse Medical Full Text Repositories - An Example from Homeopathy
Extensible Markup Language (XML) is one of the most popular web languages in the life sciences, used
for Semantic Data Analysis in various fields of clinical research. One of these fields is the processing of
medical full texts. To extract meaningful ...
The TORCIA platform has been developed as part of a project funded by the Lombardy Region. The main
goal of the project is the development of a tool that leverages social media in emergency management
processes. With a continuous and real-time ...
RDF Resource Search and Exploration with LinkZoo
The Linked Data paradigm is the most common practice for publishing, sharing and managing information in the Data Web. LinkZoo is an IT infrastructure for collaborative publishing, annotating and sharing of Data Web resources, and their publication as ...
A Model for Digital Content Management
Digital libraries work in complex and heterogeneous scenarios. The quantity and diversity of resources, together with the plurality of agents involved in this context, and the continuous evolution of user-generated content, require knowledge to be ...
Automatic Generation of Concept Maps based on Collection of Teaching Materials
The aim of this work is to demonstrate the usefulness and efficiency of statistical text processing methods for the automatic construction of concept maps for a predetermined domain. Statistical methods considered in this paper are based on the analysis ...
Efficient Exploration of Linked Data Cloud
As the size of semantic data available as Linked Open Data (LOD) increases, the demand for methods for automated exploration of data sets grows as well. A data consumer needs to search for data sets meeting their interests and look into them using suitable ...
Hypergraph-based Access Control Using Formal Language Expressions - HGAC
In all organizations, access assignments are essential in order to ensure data privacy, permission levels and the correct assignment of tasks. Traditionally, such assignments are based on total enumeration, with the consequence that constant effort has ...
Index Terms
- Proceedings of 4th International Conference on Data Management Technologies and Applications