Keyword: Data Cleaning : Search

Article

GaussTS: Towards Time Series Data Management in OpenGauss

Web and Big Data, Pages 496–501, https://doi.org/10.1007/978-981-97-2421-5_34

Abstract

The rapid advancement of sensor technology presents novel challenges in the efficient management of large-scale time series data. In this demo, we demonstrate an in-database time series data management module named GaussTS, which provides four key ...

short-paper

Open Access

Interactive Data Cleaning for Real-Time Streaming Applications

HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, Article No.: 13, Pages 1–3, https://doi.org/10.1145/3597465.3605229

The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge ...

Article

Cleaner Categories Improve Object Detection and Visual-Textual Grounding

Image Analysis, Pages 412–442, https://doi.org/10.1007/978-3-031-31435-3_28

Abstract

Object detectors are core components of multimodal models, enabling them to locate the region of interest in images which are then used to solve many multimodal tasks. Among the many extant object detectors, the Bottom-Up Faster R-CNN [39] (BUA) ...

research-article

Open Access

Data Preprocessing in Supply Chain Management Analytics - A Review of Methods, the Operations They Fulfill, and the Tasks They Accomplish.: Data Preprocessing in Supply Chain Management Analytics.

ICCMB '23: Proceedings of the 2023 6th International Conference on Computers in Management and Business, Pages 93–99, https://doi.org/10.1145/3584816.3584830

Data preprocessing is thought of as one of the most important steps in data analytics. This is especially true for the field of Supply Chain Management (SCM), in which the handling of huge data sets is the norm. Data preprocessing consists of multiple ...

research-article

STDAPM: Spatial and Temporal Distribution Analysis and Prediction Model of Bee Species based on Grey Model

ICCAI '22: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, Pages 152–157, https://doi.org/10.1145/3532213.3532236

The presence of many different species of bees in nature increases the diversity of ecosystems. However, when the original ecology of bees is destroyed, related bees will migrate. Nevertheless, large-scale migratory behavior may seriously affect local ...

research-article

Design and Implementation of Preprocessing Scheme for Massive SQL Interactive Instructions in Power Business

Xiaogang Wei

WSSE '20: Proceedings of the 2nd World Symposium on Software Engineering, Pages 55–59, https://doi.org/10.1145/3425329.3425341

With the continuous development of power business, the demand of data interaction between internal and external network is becoming more and more frequent, resulting in a large number of Structured Query Language (SQL) interactive instructions. Aiming ...

research-article

Data Wrangling for South African Smart City Crime Data

SAICSIT '20: Conference of the South African Institute of Computer Scientists and Information Technologists 2020, Pages 198–209, https://doi.org/10.1145/3410886.3410913

South Africa (S.A.) is currently facing economic and social challenges that could benefit from the implementation of international smart city guidelines. Crucial to transforming a city into a smart city is the collection and access to reliable data. One ...

demonstration

An Interactive Map-based System for Visually Exploring and Cleaning GPS Traces

SIGSPATIAL '19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Pages 572–575, https://doi.org/10.1145/3347146.3359105

It is a fact that there are tons of GPS traces generated every minute by the millions of in-road vehicles over the world. Naturally, those traces contain imprecise readings, and most of the time they include noise and outliers. Therefore, there is a ...

poster

Which One is Correct, The Map or The GPS Trace

SIGSPATIAL '19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Pages 472–475, https://doi.org/10.1145/3347146.3359099

GPS data is noisy by nature. A typical location-based service would start by filtering out the noise from the raw GPS points that are generated by moving objects. Once the locations of the objects are identified, the location-based service is provided. ...

article

A Comparative Study of Data Cleaning Tools

International Journal of Data Warehousing and Mining (IJDWM-IGI), Volume 15, Issue 4, Pages 48–65, https://doi.org/10.4018/IJDWM.2019100103

In the information era, data is crucial in decision making. Most data sets contain impurities that need to be weeded out before any meaningful decision can be made from the data. Hence, data cleaning is essential and often takes more than 80 percent ...

research-article

Research on Key Problems of Data Quality in Large Industrial Data Environment

ICRCA '18: Proceedings of the 3rd International Conference on Robotics, Control and Automation, Pages 245–248, https://doi.org/10.1145/3265639.3265680

At present, the modern manufacturing and management concepts such as digitalization, networking and intellectualization have been popularized in the industry, and the degree of industrial automation and information has been improved unprecedentedly. ...

article

A MapReduce-Based User Identification Algorithm in Web Usage Mining

International Journal of Information Technology and Web Engineering (IJITWE-IGI), Volume 13, Issue 2, Pages 11–23, https://doi.org/10.4018/IJITWE.2018040102

This article contends that in the booming era of information, analysing users' navigation behaviour is an important task. User identification is considered as one of the important and challenging tasks in the data preprocessing phase of the Web usage ...

research-article

ETDC: An Efficient Technique to Cleanse Data in the Data Warehouse

Saad B. Alotaibi

ICAIP '17: Proceedings of the International Conference on Advances in Image Processing, Pages 135–138, https://doi.org/10.1145/3133264.3133296

Data cleansing can be considered to be an activity that is performed on the data sets of the data warehouse. The cleansing is done in order to enhance and collectively maintain data consistency and quality. The quality of data has a strong impact on a ...

research-article

Data Quality Assessment and Improvement

Procedia Computer Science (PROCS), Volume 106, Issue C, Pages 32–38, https://doi.org/10.1016/j.procs.2017.03.006

As the Vrije Universiteit Brussel switched from an in-house built CRIS to Pure, a large number of data quality issues were discovered. In order to solve these, a large-scale data quality assessment and improvement program was started. The assessment ...

short-paper

Ontology Based Rewriting Data Cleaning Operations

C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, Pages 85–88, https://doi.org/10.1145/2948992.2949007

Dealing with increasing amounts of data creates the need to deal with redundant, inconsistent and/or complementary repositories which may be different in their data models and/or in their schema. Current data cleaning techniques developed to tackle data ...

Article

SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering

ICEIS 2016: Proceedings of the 18th International Conference on Enterprise Information Systems, Pages 75–80, https://doi.org/10.5220/0005868700750080

A critical task in data cleaning and integration is the identification of duplicate records representing the same real-world entity. A popular approach to duplicate identification employs similarity join to find pairs of similar records followed by a ...

Article

Identification of Organization Name Variants in Large Databases using Rule-based Scoring and Clustering

ICEIS 2016: Proceedings of the 18th International Conference on Enterprise Information Systems, Pages 182–187, https://doi.org/10.5220/0005836701820187

This research describes a general method to automatically clean organizational and business name variants within large databases, such as patent databases, bibliographic databases, databases in business information systems, or any other database ...

Article

An Ontology-based Methodology for Reusing Data Cleaning Knowledge

IC3K 2015: Proceedings of the International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Pages 202–211, https://doi.org/10.5220/0005596402020211

The organizations' demand to integrate several heterogeneous data sources and an ever-increasing volume of data is revealing the presence of quality problems in data. Currently, most of the data cleaning approaches (for detection and correction of data ...

Article

Cleaning Framework for Big Data - Object Identification and Linkage

BIGDATACONGRESS '15: Proceedings of the 2015 IEEE International Congress on Big Data, Pages 215–221, https://doi.org/10.1109/BigDataCongress.2015.38

Data is a valuable resource. The proper use of high-quality data can help make better predictions, analysis and decisions. Poor-quality data is detrimental to data analytics. Data from different sources may provide the same entities, but different ...

Article

LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data

The Semantic Web – ISWC 2014, Pages 213–228, https://doi.org/10.1007/978-3-319-11964-9_14

Abstract

It is widely accepted that proper data publishing is difficult. The majority of Linked Open Data (LOD) does not meet even a core set of data publishing guidelines. Moreover, datasets that are clean at creation can get stains over time. As a ...
