DOI: 10.1145/3274895.3274987
research-article

TurboReg: a framework for scaling up spatial logistic regression models

Published: 06 November 2018

Abstract

Predicting the presence or absence of spatial phenomena is of great interest to scientists in applications such as epidemic disease detection, species occurrence prediction, and earth observation. In this task, a geographical space is divided by a two-dimensional grid, and a binary prediction (i.e., either 0 or 1) is made at each grid cell. A common approach is to build spatial logistic regression models (a.k.a. autologistic models) that estimate the prediction at any location from a set of predictors (i.e., features) at that location and the predictions at neighboring locations. Unfortunately, existing methods for building autologistic models are computationally expensive and do not scale to large grid data (e.g., fine-grained satellite images). This paper introduces TurboReg, a scalable framework for building autologistic models that predict large-scale spatial phenomena. TurboReg considers both accuracy and efficiency when learning the regression model parameters. It is built on top of Markov Logic Networks (MLN), a scalable statistical learning framework, with internals and data structures optimized to process spatial data. Experiments on large real and synthetic data show that TurboReg achieves a performance gain of at least three orders of magnitude over existing methods while preserving model accuracy.
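To make the autologistic idea concrete, the following is a minimal sketch of the conditional probability at each grid cell, not TurboReg's learning procedure. It assumes a 4-connected (rook) neighborhood, treats off-grid neighbors as 0, and uses hypothetical parameter names (beta0, beta, eta) for the intercept, the feature weights, and the neighbor weight; none of these choices come from the abstract.

```python
import numpy as np

def neighbor_sum(z):
    """Sum of the 4-connected neighbors of every cell (off-grid neighbors count as 0)."""
    padded = np.pad(z, 1, mode="constant", constant_values=0)
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]      # cells above and below
            + padded[1:-1, :-2] + padded[1:-1, 2:])   # cells to the left and right

def autologistic_prob(x, z, beta0, beta, eta):
    """P(z_i = 1 | x_i, neighbors of i) for every cell.

    x: (rows, cols, n_features) predictors at each cell.
    z: (rows, cols) current 0/1 predictions.
    beta0, beta, eta: assumed model parameters (intercept, feature weights,
    neighbor weight); how they are learned is outside this sketch.
    """
    logit = beta0 + x @ beta + eta * neighbor_sum(z)
    return 1.0 / (1.0 + np.exp(-logit))

# Toy 3x3 grid with two (all-zero) features per cell.
x = np.zeros((3, 3, 2))
z = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
p = autologistic_prob(x, z, beta0=0.0, beta=np.zeros(2), eta=1.0)
```

With the features zeroed out, the probability at each cell depends only on how many of its neighbors are predicted as 1, which is the spatial dependence that separates an autologistic model from ordinary logistic regression.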


Published In

SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2018
655 pages
ISBN:9781450358897
DOI:10.1145/3274895

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Markov logic networks
  2. autologistic models
  3. factor graph
  4. first-order logic
  5. spatial regression

Qualifiers

  • Research-article

Conference

SIGSPATIAL '18

Acceptance Rates

SIGSPATIAL '18 Paper Acceptance Rate 30 of 150 submissions, 20%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%

Cited By

  • (2024) Spatial Query Optimization With Learning. Proceedings of the VLDB Endowment 17:12 (4245-4248). DOI: 10.14778/3685800.3685846. Online publication date: 8-Nov-2024
  • (2022) A Localization Method of Ant Colony Optimization in Nonuniform Space. Sensors 22:19 (7389). DOI: 10.3390/s22197389. Online publication date: 28-Sep-2022
  • (2021) Machine Learning Meets Big Spatial Data (Revised). 2021 22nd IEEE International Conference on Mobile Data Management (MDM) (5-8). DOI: 10.1109/MDM52706.2021.00014. Online publication date: Jun-2021
  • (2020) Flash. SIGSPATIAL Special 11:3 (3-6). DOI: 10.1145/3383653.3383654. Online publication date: 13-Feb-2020
  • (2020) Machine Learning Meets Big Spatial Data. 2020 IEEE 36th International Conference on Data Engineering (ICDE) (1782-1785). DOI: 10.1109/ICDE48307.2020.00169. Online publication date: Apr-2020
  • (2019) Machine learning meets big spatial data. Proceedings of the VLDB Endowment 12:12 (1982-1985). DOI: 10.14778/3352063.3352115. Online publication date: 1-Aug-2019
  • (2019) Flash in action. Proceedings of the VLDB Endowment 12:12 (1834-1837). DOI: 10.14778/3352063.3352078. Online publication date: 1-Aug-2019
  • (2019) RegRocket. ACM Transactions on Spatial Algorithms and Systems 5:4 (1-27). DOI: 10.1145/3366459. Online publication date: 25-Nov-2019
  • (2019) Towards Scalable Spatial Probabilistic Graphical Modeling. Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (606-607). DOI: 10.1145/3347146.3363461. Online publication date: 5-Nov-2019
  • (2019) NanoSIMS measurements of sub-micrometer particles using the local thresholding technique. Surface and Interface Analysis 52:5 (234-239). DOI: 10.1002/sia.6711. Online publication date: 12-Dec-2019
