- Article, May 2024
GaussTS: Towards Time Series Data Management in OpenGauss
Abstract: The rapid advancement of sensor technology presents novel challenges in the efficient management of large scale time series data. In this demo, we demonstrate a time series data management module named GaussTS in database, which provides four key ...
- short-paper, July 2023
Interactive Data Cleaning for Real-Time Streaming Applications
HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, Article No.: 13, Pages 1–3, https://doi.org/10.1145/3597465.3605229
The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial, to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge ...
- Article, April 2023
Cleaner Categories Improve Object Detection and Visual-Textual Grounding
Abstract: Object detectors are core components of multimodal models, enabling them to locate the region of interest in images which are then used to solve many multimodal tasks. Among the many extant object detectors, the Bottom-Up Faster R-CNN [39] (BUA) ...
- research-article, June 2023
Data Preprocessing in Supply Chain Management Analytics - A Review of Methods, the Operations They Fulfill, and the Tasks They Accomplish
ICCMB '23: Proceedings of the 2023 6th International Conference on Computers in Management and Business, Pages 93–99, https://doi.org/10.1145/3584816.3584830
Data preprocessing is thought of as one of the most important steps in data analytics. This is especially true for the field of Supply Chain Management (SCM), in which the handling of huge data sets is the norm. Data preprocessing consists of multiple ...
- research-article, July 2022
STDAPM: Spatial and Temporal Distribution Analysis and Prediction Model of Bee Species based on Grey Model
ICCAI '22: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, Pages 152–157, https://doi.org/10.1145/3532213.3532236
The presence of many different species of bees in nature increases the diversity of ecosystems. However, when the original ecology of bees is destroyed, related bees will migrate. Nevertheless, large-scale migratory behavior may seriously affect local ...
- research-article, November 2020
Design and Implementation of Preprocessing Scheme for Massive SQL Interactive Instructions in Power Business
WSSE '20: Proceedings of the 2nd World Symposium on Software Engineering, Pages 55–59, https://doi.org/10.1145/3425329.3425341
With the continuous development of power business, the demand of data interaction between internal and external network is becoming more and more frequent, resulting in a large number of Structured Query Language (SQL) interactive instructions. Aiming ...
- research-article, September 2020
Data Wrangling for South African Smart City Crime Data
SAICSIT '20: Conference of the South African Institute of Computer Scientists and Information Technologists 2020, Pages 198–209, https://doi.org/10.1145/3410886.3410913
South Africa (S.A.) is currently facing economic and social challenges that could benefit from the implementation of international smart city guidelines. Crucial to transforming a city into a smart city is the collection and access to reliable data. One ...
- demonstration, November 2019
An Interactive Map-based System for Visually Exploring and Cleaning GPS Traces
- Abdeltawab Hendawi,
- Sree Sindhu Sabbineni,
- Jianwei Shen,
- Yaxiao Song,
- Peiwei Cao,
- Zhihong Zhang,
- John Krumm,
- Mohamed Ali
SIGSPATIAL '19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Pages 572–575, https://doi.org/10.1145/3347146.3359105
It is a fact that there are tons of GPS traces generated every minute by the millions of in-road vehicles over the world. Naturally, those traces contain imprecise readings, and most of the time they include noise and outliers. Therefore, there is a ...
- poster, November 2019
Which One is Correct, The Map or The GPS Trace
- Abdeltawab Hendawi,
- Sree Sindhu Sabbineni,
- Jianwei Shen,
- Yaxiao Song,
- Peiwei Cao,
- Zhihong Zhang,
- John Krumm,
- Mohamed Ali
SIGSPATIAL '19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Pages 472–475, https://doi.org/10.1145/3347146.3359099
GPS data is noisy by nature. A typical location-based service would start by filtering out the noise from the raw GPS points that are generated by moving objects. Once the locations of the objects are identified, the location-based service is provided. ...
- article, October 2019
A Comparative Study of Data Cleaning Tools
International Journal of Data Warehousing and Mining (IJDWM-IGI), Volume 15, Issue 4, Pages 48–65, https://doi.org/10.4018/IJDWM.2019100103
In the information era, data is crucial in decision making. Most data sets contain impurities that need to be weeded out before any meaningful decision can be made from the data. Hence, data cleaning is essential and often takes more than 80 percent ...
- research-article, August 2018
Research on Key Problems of Data Quality in Large Industrial Data Environment
ICRCA '18: Proceedings of the 3rd International Conference on Robotics, Control and Automation, Pages 245–248, https://doi.org/10.1145/3265639.3265680
At present, the modern manufacturing and management concepts such as digitalization, networking and intellectualization have been popularized in the industry, and the degree of industrial automation and information has been improved unprecedentedly. ...
- article, April 2018
A MapReduce-Based User Identification Algorithm in Web Usage Mining
International Journal of Information Technology and Web Engineering (IJITWE-IGI), Volume 13, Issue 2, Pages 11–23, https://doi.org/10.4018/IJITWE.2018040102
This article contends that in the booming era of information, analysing users' navigation behaviour is an important task. User identification is considered as one of the important and challenging tasks in the data preprocessing phase of the Web usage ...
- research-article, August 2017
ETDC: An Efficient Technique to Cleanse Data in the Data Warehouse
ICAIP '17: Proceedings of the International Conference on Advances in Image Processing, Pages 135–138, https://doi.org/10.1145/3133264.3133296
Data cleansing can be considered to be an activity that is performed on the data sets of the data warehouse. The cleansing is done in order to enhance and collectively maintain data consistency and quality. The quality of data has a strong impact on a ...
- research-article, April 2017
Data Quality Assessment and Improvement
Procedia Computer Science (PROCS), Volume 106, Issue C, Pages 32–38, https://doi.org/10.1016/j.procs.2017.03.006
As the Vrije Universiteit Brussel switched from an in-house built CRIS to Pure, a large number of data quality issues were discovered. In order to solve these, a large-scale data quality assessment and improvement program was started. The assessment ...
- short-paper, July 2016
Ontology Based Rewriting Data Cleaning Operations
C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, Pages 85–88, https://doi.org/10.1145/2948992.2949007
Dealing with increasing amounts of data creates the need to deal with redundant, inconsistent and/or complementary repositories which may be different in their data models and/or in their schema. Current data cleaning techniques developed to tackle data ...
- Article, April 2016
SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering
ICEIS 2016: Proceedings of the 18th International Conference on Enterprise Information Systems, Pages 75–80, https://doi.org/10.5220/0005868700750080
A critical task in data cleaning and integration is the identification of duplicate records representing the same real-world entity. A popular approach to duplicate identification employs similarity join to find pairs of similar records followed by a ...
- Article, April 2016
Identification of Organization Name Variants in Large Databases using Rule-based Scoring and Clustering
ICEIS 2016: Proceedings of the 18th International Conference on Enterprise Information Systems, Pages 182–187, https://doi.org/10.5220/0005836701820187
This research describes a general method to automatically clean organizational and business names variants within large databases, such as: patent databases, bibliographic databases, databases in business information systems, or any other database ...
- Article, November 2015
An Ontology-based Methodology for Reusing Data Cleaning Knowledge
IC3K 2015: Proceedings of the International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Pages 202–211, https://doi.org/10.5220/0005596402020211
The organizations' demand to integrate several heterogeneous data sources and an ever-increasing volume of data is revealing the presence of quality problems in data. Currently, most of the data cleaning approaches (for detection and correction of data ...
- Article, June 2015
Cleaning Framework for Big Data - Object Identification and Linkage
BIGDATACONGRESS '15: Proceedings of the 2015 IEEE International Congress on Big Data, Pages 215–221, https://doi.org/10.1109/BigDataCongress.2015.38
Data is a valuable resource. The proper use of high-quality data can help make better predictions, analysis and decisions. Poor-quality data is detrimental to data analytics. Data from different sources may provide the same entities, but different ...
- Article, October 2014
LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data
Abstract: It is widely accepted that proper data publishing is difficult. The majority of Linked Open Data (LOD) does not meet even a core set of data publishing guidelines. Moreover, datasets that are clean at creation, can get stains over time. As a ...