Ready to unlock the power of your data? With this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You'll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).
- Store large datasets with the Hadoop Distributed File System (HDFS)
- Run distributed computations with MapReduce
- Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Load data from relational databases into HDFS, using Sqoop
- Perform large-scale data processing with the Pig query language
- Analyze datasets with Hive, Hadoop's data warehousing system
- Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
Cited By
- Meng L, Shao Y, Yuan L, Lai L, Cheng P, Li X, Yu W, Zhang W, Lin X and Zhou J (2024). A Survey of Distributed Graph Algorithms on Massive Graphs, ACM Computing Surveys, 57:2, (1-39), Online publication date: 28-Feb-2025.
- Xu Y and Yu L Cross-regional Teaching Resource Sharing Solution Based on HADOOP Architecture Proceedings of the 2024 International Symposium on Artificial Intelligence for Education, (613-620)
- Liu Q and Seshadhri C Brief Announcement: Improved Massively Parallel Triangle Counting in O(1) Rounds Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, (519-522)
- Coy S, Czumaj A, Mishra G and Mukherjee A Log Diameter Rounds MST Verification and Sensitivity in MPC Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, (269-280)
- Dory M and Matar S Massively Parallel Algorithms for Approximate Shortest Paths Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, (415-426)
- Yang X, Zhang Y, Chen H, Li F, Wang B, Fang J, Sun C and Wang Y PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory Companion of the 2024 International Conference on Management of Data, (295-308)
- Cao R, Bao L, Zhangsun P, Wu C, Wei S, Sun R, Li R and Zhang Z (2023). PTSSBench: a performance evaluation platform in support of automated parameter tuning of software systems, Automated Software Engineering, 31:1, Online publication date: 1-May-2024.
- Fernandez-Basso C, Ruiz M and Martin-Bautista M (2023). New Spark solutions for distributed frequent itemset and association rule mining algorithms, Cluster Computing, 27:2, (1217-1234), Online publication date: 1-Apr-2024.
- Wu W Learning Big Data Systems via Emulation Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, (1449-1455)
- Czumaj A, Davies-Peck P and Parter M (2024). Component stability in low-space massively parallel computation, Distributed Computing, 37:1, (35-64), Online publication date: 1-Mar-2024.
- Presser D and Siqueira F (2023). Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks, International Journal of Parallel Programming, 51:4-5, (231-255), Online publication date: 1-Oct-2023.
- Ahanchi A, Andoni A, Hajiaghayi M, Knittel M and Zhong P Massively Parallel Tree Embeddings for High Dimensional Spaces Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, (77-88)
- Zhou N, Zhou H and Hoppe D (2023). Containerization for High Performance Computing Systems: Survey and Prospects, IEEE Transactions on Software Engineering, 49:4, (2722-2740), Online publication date: 1-Apr-2023.
- Wu W Assessing Peer Correction of SQL and NoSQL Queries Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1, (535-541)
- Wu W Towards a Validated Self-Efficacy Scale for Data Management Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1, (186-192)
- Li Y and Lee B (2022). Phronesis: Efficient Performance Modeling for High-dimensional Configuration Tuning, ACM Transactions on Architecture and Code Optimization, 19:4, (1-26), Online publication date: 31-Dec-2022.
- Meddah I, Guerroudji F and Remil N (2022). Distributed Business Process Discovery in Cloud Clusters, International Journal of Distributed Artificial Intelligence, 14:1, (1-18), Online publication date: 30-Sep-2022.
- Cohen-Addad V, Epasto A, Lattanzi S, Mirrokni V, Munoz Medina A, Saulpic D, Schwiegelshohn C and Vassilvitskii S Scalable Differentially Private Clustering via Hierarchically Separated Trees Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, (221-230)
- Fischer O, Horowitz A and Oshman R Massively Parallel Computation in a Heterogeneous Regime Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, (345-355)
- Cohen-Addad V, Mallmann-Trenn F and Saulpic D A Massively Parallel Modularity-Maximizing Algorithm with Provable Guarantees Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, (356-365)
- Ghaffari M, Grunau C and Mitrović S Massively Parallel Algorithms for b-Matching Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, (35-44)
- Sethi K, Ramesh D and Trivedi M (2022). A Spark-based high utility itemset mining with multiple external utilities, Cluster Computing, 25:2, (889-909), Online publication date: 1-Apr-2022.
- Nanongkai D and Scquizzato M (2022). Equivalence classes and conditional hardness in massively parallel computations, Distributed Computing, 35:2, (165-183), Online publication date: 1-Apr-2022.
- Xiang N and Fan Y (2022). Individual Online Learning Behavior Analysis Based on Hadoop, Computational Intelligence and Neuroscience, 2022, Online publication date: 1-Jan-2022.
- Kapralov M, Lattanzi S, Nouri N and Tardos J Efficient and local parallel random walks Proceedings of the 35th International Conference on Neural Information Processing Systems, (21375-21387)
- Cohen-Addad V, Lattanzi S, Norouzi-Fard A, Sohler C and Svensson O Parallel and efficient hierarchical k-median clustering Proceedings of the 35th International Conference on Neural Information Processing Systems, (20333-20345)
- Lu P, Yue Y, Yuan L and Zhang Y AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing Algorithms and Architectures for Parallel Processing, (133-151)
- Apishev M (2021). Effective Implementations of Topic Modeling Algorithms, Programming and Computing Software, 47:7, (483-492), Online publication date: 1-Dec-2021.
- Jamil H, Umer T, Ceken C and Al-Turjman F (2021). Decision Based Model for Real-Time IoT Analysis Using Big Data and Machine Learning, Wireless Personal Communications: An International Journal, 121:4, (2947-2959), Online publication date: 1-Dec-2021.
- Mountasser I, Ouhbi B, Hdioud F and Frikh B (2021). Semantic-based Big Data integration framework using scalable distributed ontology matching strategy, Distributed and Parallel Databases, 39:4, (891-937), Online publication date: 1-Dec-2021.
- Ballas I, Tsakanikas V, Pefanis E and Tampakas V On Exploring the Optimum Configuration of Apache Spark Framework in Heterogeneous Clusters Proceedings of the 25th Pan-Hellenic Conference on Informatics, (250-253)
- Ke Y, Huang J, Lin W and Jaysawal B (2020). Finding Possible Promoter Binding Sites in DNA Sequences by Sequential Patterns Mining With Specific Numbers of Gaps, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18:6, (2459-2470), Online publication date: 1-Nov-2021.
- Laskar M, Huang J, Smetana V, Stewart C, Pouw K, An A, Chan S and Liu L (2021). Extending Isolation Forest for Anomaly Detection in Big Data via K-Means, ACM Transactions on Cyber-Physical Systems, 5:4, (1-26), Online publication date: 31-Oct-2021.
- Hu L, Zhao B, Yang S, Luo X and Zhou M Predicting Large-scale Protein-protein Interactions by Extracting Coevolutionary Patterns with MapReduce Paradigm 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), (939-944)
- Ben HajKacem M, Ben N’cir C and Essoussi N (2021). A parallel text clustering method using Spark and hashing, Computing, 103:9, (2007-2031), Online publication date: 1-Sep-2021.
- Chen Y, Lin L, Li B, Wang Q and Zhang Q (2021). Silhouette: Efficient Cloud Configuration Exploration for Large-Scale Analytics, IEEE Transactions on Parallel and Distributed Systems, 32:8, (2049-2061), Online publication date: 1-Aug-2021.
- Dory M, Fischer O, Khoury S and Leitersdorf D Constant-Round Spanners and Shortest Paths in Congested Clique and MPC Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, (223-233)
- Czumaj A, Davies P and Parter M Component Stability in Low-Space Massively Parallel Computation Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, (481-491)
- Biswas A, Dory M, Ghaffari M, Mitrović S and Nazari Y Massively Parallel Algorithms for Distance Approximation and Spanners Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, (118-128)
- Kaur K, Garg S, Kaddoum G and Kumar N (2021). Energy and SLA-driven MapReduce Job Scheduling Framework for Cloud-based Cyber-Physical Systems, ACM Transactions on Internet Technology, 21:2, (1-24), Online publication date: 23-Jun-2021.
- Wang S, Hindman B and Stoica I In reference to RPC Proceedings of the Workshop on Hot Topics in Operating Systems, (191-198)
- Stoica I and Shenker S From cloud computing to sky computing Proceedings of the Workshop on Hot Topics in Operating Systems, (26-32)
- Zou J, Das A, Barhate P, Iyengar A, Yuan B, Jankov D and Jermaine C (2021). Lachesis, Proceedings of the VLDB Endowment, 14:8, (1262-1275), Online publication date: 1-Apr-2021.
- Xiang Q, Yu H, Aspnes J, Le F, Guok C, Kong L and Yang Y (2021). Optimizing in the Dark: Learning Optimal Network Resource Reservation Through a Simple Request Interface, IEEE/ACM Transactions on Networking, 29:2, (571-584), Online publication date: 1-Apr-2021.
- Herodotou H, Chen Y and Lu J (2020). A Survey on Automatic Parameter Tuning for Big Data Processing Systems, ACM Computing Surveys, 53:2, (1-37), Online publication date: 31-Mar-2021.
- Bawankule K, Dewang R and Singh A Load Balancing Approach for a MapReduce Job Running on a Heterogeneous Hadoop Cluster Distributed Computing and Internet Technology, (289-298)
- Rajora M, Rathod M and Naik N Stroke Prediction Using Machine Learning in a Distributed Environment Distributed Computing and Internet Technology, (238-252)
- Nooraei Abadeh M and Mirzaie M (2021). DiffPageRank: an efficient differential PageRank approach in MapReduce, The Journal of Supercomputing, 77:1, (188-211), Online publication date: 1-Jan-2021.
- Khader M and Al-Naymat G (2020). Density-based Algorithms for Big Data Clustering Using MapReduce Framework, ACM Computing Surveys, 53:5, (1-38), Online publication date: 15-Oct-2020.
- Maleki N, Faragardi H, Rahmani A, Conti M and Lofstead J (2020). TMaR: a two-stage MapReduce scheduler for heterogeneous environments, Human-centric Computing and Information Sciences, 10:1, Online publication date: 7-Oct-2020.
- Ye Q, Wu C, Liu W, Hou A and Shen W Profiling-Based Big Data Workflow Optimization in a Cross-layer Coupled Design Framework Algorithms and Architectures for Parallel Processing, (197-217)
- Czumaj A, Davies P and Parter M Simple, Deterministic, Constant-Round Coloring in the Congested Clique Proceedings of the 39th Symposium on Principles of Distributed Computing, (309-318)
- Ghaffari M and Nowicki K Massively Parallel Algorithms for Minimum Cut Proceedings of the 39th Symposium on Principles of Distributed Computing, (119-128)
- da S. E. Tuy P and Nogueira Rios T Summarizer: Fuzzy Rule-Based Classification Systems for Vertical and Horizontal Big Data 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-8)
- Zeng R, Huang Z, Chen Y, Zhong J and Feng L Comparison of Different Computing Platforms for Implementing Parallel Genetic Programming 2020 IEEE Congress on Evolutionary Computation (CEC), (1-8)
- Czumaj A, Davies P and Parter M Graph Sparsification for Derandomizing Massively Parallel Computation with Low Space Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, (175-185)
- Charikar M, Ma W and Tan L Unconditional Lower Bounds for Adaptive Massively Parallel Computation Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, (141-151)
- Adoni H, Nahhal T, Krichen M, Aghezzaf B and Elbyed A (2019). A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems, Distributed and Parallel Databases, 38:2, (495-530), Online publication date: 1-Jun-2020.
- Pathak A, Pandey M and Rautaray S (2020). Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation, Cluster Computing, 23:2, (953-988), Online publication date: 1-Jun-2020.
- Su W, Aurora A, Chen M and Zadok E Supporting Transactions for Bulk NFSv4 Compounds Proceedings of the 13th ACM International Systems and Storage Conference, (75-86)
- Dupont D, Barbosa J and Alves B (2019). CHSPAM: a multi-domain model for sequential pattern discovery and monitoring in contexts histories, Pattern Analysis & Applications, 23:2, (725-734), Online publication date: 1-May-2020.
- Oh H, Cho B, Kim C, Park H and Seo J AniFilter Proceedings of the Fifteenth European Conference on Computer Systems, (1-15)
- Li C, Wang S, Hoffmann H and Lu S Statically inferring performance properties of software configurations Proceedings of the Fifteenth European Conference on Computer Systems, (1-16)
- Mazaheri Soudani N, Fatemi A and Nematbakhsh M (2019). An investigation of big graph partitioning methods for distribution of graphs in vertex-centric systems, Distributed and Parallel Databases, 38:1, (1-29), Online publication date: 1-Mar-2020.
- Yang C, Chen S, Liu J, Liu R and Chang C (2019). On construction of an energy monitoring service using big data technology for the smart campus, Cluster Computing, 23:1, (265-288), Online publication date: 1-Mar-2020.
- Uta A, Custura A, Duplyakin D, Jimenez I, Rellermeyer J, Maltzahn C, Ricci R and Iosup A Is big data performance reproducible in modern cloud networks? Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, (513-528)
- Vengadeswaran S and Balasundaram S CLUST Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, (1-9)
- Yeh T, Wang Y and Tu Y (2020). Maintaining data integrity in cloud systems through version management, International Journal of Ad Hoc and Ubiquitous Computing, 34:2, (63-73), Online publication date: 1-Jan-2020.
- Yeh T and Yu S Achieving Dynamic Resource Allocation in the Hadoop Cloud System Internet of Vehicles. Technologies and Services Toward Smart Cities, (267-283)
- Gao Z, Pansare N and Jermaine C (2019). Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization, IEEE Transactions on Knowledge and Data Engineering, 31:11, (2079-2092), Online publication date: 1-Nov-2019.
- Hu L, Yuan X, Liu X, Xiong S and Luo X (2019). Efficiently Detecting Protein Complexes from Protein Interaction Networks via Alternating Direction Method of Multipliers, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16:6, (1922-1935), Online publication date: 1-Nov-2019.
- Wang S, Liagouris J, Nishihara R, Moritz P, Misra U, Tumanov A and Stoica I Lineage stash Proceedings of the 27th ACM Symposium on Operating Systems Principles, (338-352)
- García‐Gil D, Luque‐Sánchez F, Luengo J, García S and Herrera F (2019). From Big to Smart Data, International Journal of Intelligent Systems, 34:12, (3260-3274), Online publication date: 18-Oct-2019.
- Bagui S, Mondal A and Bagui S (2020). Improving the Performance of kNN in the MapReduce Framework Using Locality Sensitive Hashing, International Journal of Distributed Systems and Technologies, 10:4, (1-16), Online publication date: 1-Oct-2019.
- Matsuba H, Matsuda M and Kawai M Pyne Workshop Proceedings of the 48th International Conference on Parallel Processing, (1-10)
- Moharrer A and Ioannidis S (2019). Distributing Frank-Wolfe via map-reduce, Knowledge and Information Systems, 60:2, (665-690), Online publication date: 1-Aug-2019.
- DeFever R, Hanger W, Sarupria S, Kilgannon J, Apon A and Ngo L Building A Scalable Forward Flux Sampling Framework using Big Data and HPC Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning), (1-8)
- Behnezhad S, Brandt S, Derakhshan M, Fischer M, Hajiaghayi M, Karp R and Uitto J Massively Parallel Computation of Matching and MIS in Sparse Graphs Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, (481-490)
- Gamlath B, Kale S, Mitrovic S and Svensson O Weighted Matchings via Unweighted Augmentations Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, (491-500)
- Mrozek D, Suwała M and Małysiak-Mrozek B (2019). High-throughput and scalable protein function identification with Hadoop and Map-only pattern of the MapReduce processing model, Knowledge and Information Systems, 60:1, (145-178), Online publication date: 1-Jul-2019.
- Kang D, Patel V, Nair A, Blanas S, Wang Y and Parthasarathy S Henosis Proceedings of the ACM International Conference on Supercomputing, (392-402)
- Azar Y, Emek Y, van Stee R and Vainstein D The Price of Clustering in Bin-Packing with Applications to Bin-Packing with Delays The 31st ACM Symposium on Parallelism in Algorithms and Architectures, (1-10)
- Gonzalez-Lopez J, Ventura S and Cano A ARFF Data Source Library for Distributed Single/Multiple Instance, Single/Multiple Output Learning on Apache Spark Computational Science – ICCS 2019, (173-179)
- Mondal A, Neogy S, Mukherjee N and Chattopadhyay S (2019). A survey of issues and solutions of health data management systems, Innovations in Systems and Software Engineering, 15:2, (155-166), Online publication date: 1-Jun-2019.
- Ben Hajkacem M, N'cir C and Essoussi N (2019). One-pass MapReduce-based clustering method for mixed large scale data, Journal of Intelligent Information Systems, 52:3, (619-636), Online publication date: 1-Jun-2019.
- Kathiravelu P, Sharma A, Galhardas H, Van Roy P and Veiga L (2019). On-demand big data integration, Distributed and Parallel Databases, 37:2, (273-295), Online publication date: 1-Jun-2019.
- Wang S and Sun Y The Design of Word Cloud Rendering Platform and Its Application on Measuring Systematic Financial Risks Proceedings of the 2019 International Conference on Data Mining and Machine Learning, (132-135)
- Al-Badarneh A Join Algorithms under Apache Spark Proceedings of the 2019 5th International Conference on Computer and Technology Applications, (56-62)
- Devasia J, Chandran P, Soman A, Mathew A and Jharwal J Graph sparsification with parallelization to optimize the identification of causal genes and dysregulated pathways Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, (747-753)
- Roy A, Qureshi S, Pande K, Nair D, Gairola K, Jain P, Singh S, Sharma K, Jagadale A, Lin Y, Sharma S, Gotety R, Zhang Y, Tang J, Mehta T, Sindhanuru H, Okafor N, Das S, Gopal C, Rudraraju S and Kakarlapudi A (2019). Performance Comparison of Machine Learning Platforms, INFORMS Journal on Computing, 31:2, (207-225), Online publication date: 1-Apr-2019.
- Lee G and Fortes J (2019). Improving Data-Analytics Performance Via Autonomic Control of Concurrency and Resource Units, ACM Transactions on Autonomous and Adaptive Systems, 13:3, (1-25), Online publication date: 28-Mar-2019.
- Li F, Waddington D and Song F Userland CO-PAGER Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications, (78-83)
- Zhang Y, Liu H, Chen T and Tang D A Distributed PCM Clustering Algorithm Based on Spark Proceedings of the 2019 11th International Conference on Machine Learning and Computing, (70-74)
- Zou J, Iyengar A and Jermaine C (2019). Pangea, Proceedings of the VLDB Endowment, 12:6, (681-694), Online publication date: 1-Feb-2019.
- Xiang Q, Yu H, Aspnes J, Le F, Kong L and Yang Y Optimizing in the dark Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (1674-1681)
- Addisie A and Bertacco V Collaborative accelerators for in-memory MapReduce on scale-up machines Proceedings of the 24th Asia and South Pacific Design Automation Conference, (747-753)
- Ghaffari M and Uitto J Sparsifying distributed algorithms with ramifications in massively parallel computation and centralized local computation Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, (1636-1653)
- (2019). Virtual cluster optimisation for MapReduce-like applications, International Journal of High Performance Computing and Networking, 13:4, (378-388), Online publication date: 1-Jan-2019.
- Sakurai S (2019). Discovery of Characteristic Sequential Patterns Based on Two Types of Constraints, International Journal of Extreme Automation and Connectivity in Healthcare, 1:1, (40-54), Online publication date: 1-Jan-2019.
- Matsuno T, Chatterjee B, Kitsuwan N, Oki E, Veeraraghavan M, Okamoto S and Yamanaka N (2019). Designing a Hadoop system based on computational resources and network delay for wide area networks, Telecommunications Systems, 70:1, (13-25), Online publication date: 1-Jan-2019.
- Casturi R and Sunderraman R Distributed Financial Calculation Framework on Cloud Computing Environment Big Data Analytics, (73-88)
- Giannakopoulos I, Konstantinou I, Tsoumakos D and Koziris N (2018). Cloud application deployment with transient failure recovery, Journal of Cloud Computing: Advances, Systems and Applications, 7:1, (1-20), Online publication date: 1-Dec-2018.
- Yousefi M and Goudarzi M (2018). A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs, Journal of Grid Computing, 16:4, (535-551), Online publication date: 1-Dec-2018.
- Makatun D, Lauret J and Rudová H (2018). Planning of distributed data production for High Energy and Nuclear Physics, Cluster Computing, 21:4, (1949-1965), Online publication date: 1-Dec-2018.
- Almasi M and Saniee Abadeh M (2018). A new MapReduce associative classifier based on a new storage format for large-scale imbalanced data, Cluster Computing, 21:4, (1821-1847), Online publication date: 1-Dec-2018.
- Yu Z, Bei Z and Qian X (2018). Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing, ACM SIGPLAN Notices, 53:2, (564-577), Online publication date: 30-Nov-2018.
- Fierro G, Pritoni M, AbdelBaky M, Raftery P, Peffer T, Thomson G and Culler D Mortar Proceedings of the 5th Conference on Systems for Built Environments, (172-181)
- Sánchez-Rada J, Pascual A, Conde E and Iglesias C A Big Linked Data Toolkit for Social Media Analysis and Visualization Based on W3C Web Components On the Move to Meaningful Internet Systems. OTM 2018 Conferences, (498-515)
- Manousakis I, Goiri Í, Bianchini R, Rigo S and Nguyen T Uncertainty Propagation in Data Processing Systems Proceedings of the ACM Symposium on Cloud Computing, (95-106)
- Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, Elibol M, Yang Z, Paul W, Jordan M and Stoica I Ray Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, (561-577)
- Chávez F, Fernández de Vega F, Lanza D, Benavides C, Villegas J, Trujillo L, Olague G and Román G (2018). Deploying massive runs of evolutionary algorithms with ECJ and Hadoop, International Journal of High Performance Computing Applications, 32:5, (706-720), Online publication date: 1-Sep-2018.
- Li S and Wang B (2018). Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce, Journal of Signal Processing Systems, 90:8-9, (1115-1121), Online publication date: 1-Sep-2018.
- Velthuis P, Schäfer M and Steinebach M New authentication concept using certificates for big data analytic tools Proceedings of the 13th International Conference on Availability, Reliability and Security, (1-7)
- Santos W, Avelar G, Ribeiro M, Guedes D and Meira W (2018). Scalable and efficient data analytics and mining with lemonade, Proceedings of the VLDB Endowment, 11:12, (2070-2073), Online publication date: 1-Aug-2018.
- Liu Q, Ma L, Fan S, Abbod M, Lu C, Lin T, Jen K, Wu S and Shieh J (2018). Design and Evaluation of a Real Time Physiological Signals Acquisition System Implemented in Multi-Operating Rooms for Anesthesia, Journal of Medical Systems, 42:8, (1-19), Online publication date: 1-Aug-2018.
- Ghaffari M, Gouleakis T, Konrad C, Mitrović S and Rubinfeld R Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, (129-138)
- Wang Y, Huang R and Xu W Authentication with User Driven Web Application for Accessing Remote Resources Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity, (1-7)
- Beckert B, Bingmann T, Kiefer M, Sanders P, Ulbrich M and Weigl A Relational Equivalence Proofs Between Imperative and MapReduce Algorithms Verified Software. Theories, Tools, and Experiments, (248-266)
- Song H, Chen G and Yang W (2018). An Image Classification Algorithm and its Parallel Implementation Based on ANL-RBM, Journal of Information Technology Research, 11:3, (29-46), Online publication date: 1-Jul-2018.
- Balasundaram S and Vengadeswaran S (2018). An Optimal Data Placement Strategy for Improving System Performance of Massive Data Applications Using Graph Clustering, International Journal of Ambient Computing and Intelligence, 9:3, (15-30), Online publication date: 1-Jul-2018.
- Fier F, Augsten N, Bouros P, Leser U and Freytag J (2018). Set similarity joins on mapreduce, Proceedings of the VLDB Endowment, 11:10, (1110-1122), Online publication date: 1-Jun-2018.
- Ayub M and Siddiqui J Efficiently finding minimal failing input in MapReduce programs Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, (177-178)
- Pandove D, Goel S and Rani R (2018). Systematic Review of Clustering High-Dimensional and Large Datasets, ACM Transactions on Knowledge Discovery from Data, 12:2, (1-68), Online publication date: 30-Apr-2018.
- Uta A and Obaseki H A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, (113-118)
- Geetha J., Uday Bhaskar N and Chenna Reddy P. (2018). An Analytical Approach for Optimizing the Performance of Hadoop Map Reduce Over RoCE, International Journal of Information Communication Technologies and Human Development, 10:2, (1-14), Online publication date: 1-Apr-2018.
- Sebaa A, Chikh F, Nouicer A and Tari A (2018). Medical Big Data Warehouse, Journal of Medical Systems, 42:4, (1-16), Online publication date: 1-Apr-2018.
- Gomes H, Barddal J, Enembreck F and Bifet A (2017). A Survey on Ensemble Learning for Data Stream Classification, ACM Computing Surveys, 50:2, (1-36), Online publication date: 31-Mar-2018.
- Yu Z, Bei Z and Qian X Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, (564-577)
- Gupta M, Patwa F and Sandhu R An Attribute-Based Access Control Model for Secure Big Data Processing in Hadoop Ecosystem Proceedings of the Third ACM Workshop on Attribute-Based Access Control, (13-24)
- Segatori A, Marcelloni F and Pedrycz W (2018). On Distributed Fuzzy Decision Trees for Big Data, IEEE Transactions on Fuzzy Systems, 26:1, (174-192), Online publication date: 1-Feb-2018.
- Hosseini B and Kiani K (2018). FWCMR, Expert Systems with Applications: An International Journal, 91:C, (198-210), Online publication date: 1-Jan-2018.
- Verma C, Pandey R and Katiyar D (2018). Performance Evaluating System Based on MapReduce in Context of Educational Big Data, International Journal of Organizational and Collective Intelligence, 8:1, (1-12), Online publication date: 1-Jan-2018.
- Ilyasova N, Kupriyanov A, Paringer R and Kirsh D (2018). Particular Use of BIG DATA in Medical Diagnostic Tasks, Pattern Recognition and Image Analysis, 28:1, (114-121), Online publication date: 1-Jan-2018.
- Carvalho O, Roloff E and Navaux P A Distributed Stream Processing based Architecture for IoT Smart Grids Monitoring Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, (9-14)
- Shwe T and Aritsugi M Proactive Re-replication Strategy in HDFS based Cloud Data Center Proceedings of the 10th International Conference on Utility and Cloud Computing, (121-130)
- Bateni M, Behnezhad S, Derakhshan M, Hajiaghayi M, Kiveris R, Lattanzi S and Mirrokni V Affinity clustering Proceedings of the 31st International Conference on Neural Information Processing Systems, (6867-6877)
- Hirchoua B, Ouhbi B and Frikh B A new knowledge capitalization framework in big data context Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services, (40-48)
- Wang Y, Li C, Li M and Liu Z (2017). HBase storage schemas for massive spatial vector data, Cluster Computing, 20:4, (3657-3666), Online publication date: 1-Dec-2017.
- Boukhris I, Elouedi Z and Ajabi M (2017). Toward intrusion detection using belief decision trees for big data, Knowledge and Information Systems, 53:3, (671-698), Online publication date: 1-Dec-2017.
- Grover A, Arya D and Venkataraman G Latency Reduction via Decision Tree Based Query Construction Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (1399-1407)
- Li Z and Shen H (2017). Measuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best Platform, IEEE Transactions on Parallel and Distributed Systems, 28:11, (3201-3214), Online publication date: 1-Nov-2017.
- Peralta D, García S, Benitez J and Herrera F (2017). Minutiae-based fingerprint matching decomposition, Information Sciences: an International Journal, 408:C, (198-212), Online publication date: 1-Oct-2017.
- Jin C, Chen J and Liu H (2017). MapReduce-based entity matching with multiple blocking functions, Frontiers of Computer Science: Selected Publications from Chinese Universities, 11:5, (895-911), Online publication date: 1-Oct-2017.
- Calimeri F, Caracciolo M, Marzullo A and Stamile C BioHIPI: Biomedical Hadoop Image Processing Interface Machine Learning, Optimization, and Big Data, (540-548)
- Vega C, Roquero P, Leira R, Gonzalez I and Aracil J (2017). Loginson, The Journal of Supercomputing, 73:9, (3879-3900), Online publication date: 1-Sep-2017.
- Erdil D (2017). Self-organized dynamic provisioning for big data, Cluster Computing, 20:3, (2749-2762), Online publication date: 1-Sep-2017.
- Bordogna G, Ciriello D and Psaila G A flexible framework to cross-analyze heterogeneous multi-source geo-referenced information Proceedings of the International Conference on Web Intelligence, (499-508)
- Oliveira G, Coutinho F, Campello R and Naldi M (2017). Improving k-means through distributed scalable metaheuristics, Neurocomputing, 246:C, (45-57), Online publication date: 12-Jul-2017.
- Rodriguez-Mier P, Mucientes M and Bugarín A Scalable modeling of thermal dynamics in buildings using fuzzy rules for regression 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-6)
- Totoni E, Anderson T and Shpeisman T HPAT Proceedings of the International Conference on Supercomputing, (1-10)
- Gupta M, Patwa F, Benson J and Sandhu R Multi-Layer Authorization Framework for a Representative Hadoop Ecosystem Deployment Proceedings of the 22nd ACM on Symposium on Access Control Models and Technologies, (183-190)
- Mattmann C and Sharan M Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, (117-120)
- Matthews S (2017). Using Phoenix++ MapReduce to introduce undergraduate students to parallel computing, Journal of Computing Sciences in Colleges, 32:6, (165-174), Online publication date: 1-Jun-2017.
- Lee J, Chung J, Ahn J and Choi K (2017). Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:6, (1793-1806), Online publication date: 1-Jun-2017.
- Kabáč M, Consel C and Volanschi N (2017). Designing parallel data processing for enabling large-scale sensor applications, Personal and Ubiquitous Computing, 21:3, (457-473), Online publication date: 1-Jun-2017.
- Zhang Y, Zhang T, Jia Y, Sun J, Xu F and Xu W DataLab Proceedings of the 39th International Conference on Software Engineering: Software Engineering and Education Track, (47-56)
- Chowdhary V and Greenwood S EMT Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning, (1-9)
- Santos W, Carvalho L, de P. Avelar G, Silva Á, Ponce L, Guedes D and Meira W Lemonade Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, (745-748)
- Anderson J, Gropp C, Ngo L and Apon A Random access in nondelimited variable-length record collections for parallel reading with Hadoop 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), (965-970)
- Hamraz H, Contreras M and Zhang J (2017). A scalable approach for tree segmentation within small-footprint airborne LiDAR data, Computers & Geosciences, 102:C, (139-147), Online publication date: 1-May-2017.
- Silva J, Silva D, Marques E, Lopes L and Silva F P3-Mobile Proceedings of the 4th Workshop on CrossCloud Infrastructures & Platforms, (1-7)
- Voinov N, Drobintsev P, Kotlyarov V and Nikiforov I Distributed OAIS-Based Digital Preservation System with HDFS Technology Proceedings of the 20th Conference of Open Innovations Association FRUCT, (491-497)
- Zhao J, Zhang F, Tu L, Xu C, Shen D, Tian C, Li X and Li Z (2017). Estimation of Passenger Route Choice Pattern Using Smart Card Data for Complex Metro Systems, IEEE Transactions on Intelligent Transportation Systems, 18:4, (790-801), Online publication date: 1-Apr-2017.
- Sebaa A, Nouicer A, Chikh F and Tari A Big Data Technologies to Improve Medical Data Warehousing Proceedings of the 2nd international Conference on Big Data, Cloud and Applications, (1-5)
- Devasia J, Chandran P, Shreya G, R. A and R. A On parallelizing graph theoretical approaches for identifying causal genes and pathways from very large biological networks Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing, (1-6)
- Cheng D, Rao J, Guo Y, Jiang C and Zhou X (2017). Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning, IEEE Transactions on Parallel and Distributed Systems, 28:3, (774-786), Online publication date: 1-Mar-2017.
- Lulli A, Carlini E, Dazzi P, Lucchese C and Ricci L (2017). Fast Connected Components Computation in Large Graphs by Vertex Pruning, IEEE Transactions on Parallel and Distributed Systems, 28:3, (760-773), Online publication date: 1-Mar-2017.
- Mera D, Batko M and Zezula P (2017). Speeding up the multimedia feature extraction, Multimedia Tools and Applications, 76:5, (7497-7517), Online publication date: 1-Mar-2017.
- Forkan A, Khalil I and Atiquzzaman M (2017). ViSiBiD, Computer Networks: The International Journal of Computer and Telecommunications Networking, 113:C, (244-257), Online publication date: 11-Feb-2017.
- Pulgar-Rubio F, Rivera-Rivas A, Pérez-Godoy M, González P, Carmona C and del Jesus M (2017). MEFASD-BD, Knowledge-Based Systems, 117:C, (70-78), Online publication date: 1-Feb-2017.
- Arias J, Gamez J and Puerta J (2017). Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark, Knowledge-Based Systems, 117:C, (16-26), Online publication date: 1-Feb-2017.
- Maillo J, Ramírez S, Triguero I and Herrera F (2017). kNN-IS, Knowledge-Based Systems, 117:C, (3-15), Online publication date: 1-Feb-2017.
- El Aboudi N and Benhlima L (2017). Parallel and Distributed Population based Feature Selection Framework for Health Monitoring, International Journal of Cloud Applications and Computing, 7:1, (57-71), Online publication date: 1-Jan-2017.
- Salah S, Akbarinia R and Masseglia F (2017). A highly scalable parallel algorithm for maximally informative k-itemset mining, Knowledge and Information Systems, 50:1, (1-26), Online publication date: 1-Jan-2017.
- Warren M, Skillman S, Chartrand R, Kelton T, Keisler R, Raleigh D and Turk M Data-intensive supercomputing in the cloud Proceedings of the 7th International Workshop on Data-Intensive Computing in the Cloud, (24-31)
- Seref B and Bostanci E Opportunities, Threats and Future Directions in Big Data for Medical Wearables Proceedings of the International Conference on Big Data and Advanced Wireless Technologies, (1-5)
- Ma Y, Zhang Y, Sheng Z, Ruan H, Wang J and Sun Y (2016). CGMP, Multimedia Tools and Applications, 75:21, (13317-13332), Online publication date: 1-Nov-2016.
- Nath A, Fox K, Agarwal P and Munagala K Massively parallel algorithms for computing TIN DEMs and contour trees for large terrains Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, (1-10)
- Desai A and Chaudhary S Distributed Decision Tree Proceedings of the 9th Annual ACM India Conference, (43-50)
- Rodríguez-Fdez I, Mucientes M and Bugarín A (2016). S-FRULER, Knowledge-Based Systems, 110:C, (255-266), Online publication date: 15-Oct-2016.
- Baer A, Casas P, D'Alconzo A, Fiadino P, Golab L, Mellia M and Schikuta E (2016). DBStream, Computer Networks: The International Journal of Computer and Telecommunications Networking, 107:P1, (5-19), Online publication date: 9-Oct-2016.
- Ríos L and Diéguez J (2016). A Big Data Test-bed for Analyzing Data Generated by an Air Pollution Sensor Network, International Journal of Web Services Research, 13:4, (19-35), Online publication date: 1-Oct-2016.
- Portela F, Lima L and Santos M (2016). Why Big Data? Towards a Project Assessment Framework, Procedia Computer Science, 98:C, (604-609), Online publication date: 1-Oct-2016.
- Gadiraju K, Verma M, Davis K and Talaga P (2016). Benchmarking performance for migrating a relational application to a parallel implementation, Future Generation Computer Systems, 63:C, (148-156), Online publication date: 1-Oct-2016.
- Liu Z, Zhang Q, Boutaba R, Liu Y and Wang B (2016). OPTIMA, Journal of Network and Systems Management, 24:4, (859-883), Online publication date: 1-Oct-2016.
- Song M, Hu Y, Xu Y, Li C, Chen H, Yuan J and Li T Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (315-326)
- Kathiravelu P and Sharma A A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes Proceedings of the Second International Workshop on Data Management and Analytics for Medicine and Healthcare - Volume 10186, (101-120)
- Bajaber F, Elshawi R, Batarfi O, Altalhi A, Barnawi A and Sakr S (2016). Big Data 2.0 Processing Systems, Journal of Grid Computing, 14:3, (379-405), Online publication date: 1-Sep-2016.
- Li P, Xu M, Wu J and Shang L Using canonical correlation analysis for parallelized attribute reduction Proceedings of the 14th Pacific Rim International Conference on Trends in Artificial Intelligence, (433-445)
- He H, Pang S and Zhao Z (2016). Dynamic Scalable Stochastic Petri Net, Scientific Programming, 2016, (10), Online publication date: 1-Aug-2016.
- Sousa M and Dillig I (2016). Cartesian hoare logic for verifying k-safety properties, ACM SIGPLAN Notices, 51:6, (57-69), Online publication date: 1-Aug-2016.
- Bilal M, Oyedele L, Qadir J, Munir K, Ajayi S, Akinade O, Owolabi H, Alaka H and Pasha M (2016). Big Data in the construction industry, Advanced Engineering Informatics, 30:3, (500-521), Online publication date: 1-Aug-2016.
- Wang J (2016). Extracting significant pattern histories from timestamped texts using MapReduce, The Journal of Supercomputing, 72:8, (3236-3260), Online publication date: 1-Aug-2016.
- Guzun G, Canahuate G and Chiu D A Two-Phase MapReduce Algorithm for Scalable Preference Queries over High-Dimensional Data Proceedings of the 20th International Database Engineering & Applications Symposium, (43-52)
- Agarwal P, Fox K, Munagala K and Nath A Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, (429-440)
- Sousa M and Dillig I Cartesian hoare logic for verifying k-safety properties Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, (57-69)
- Yuan W, Deng P, Taleb T, Wan J and Bi C (2016). An Unlicensed Taxi Identification Model Based on Big Data Analysis, IEEE Transactions on Intelligent Transportation Systems, 17:6, (1703-1713), Online publication date: 1-Jun-2016.
- Tang Z, Liu M, Ammar A, Li K and Li K (2016). An optimized MapReduce workflow scheduling algorithm for heterogeneous computing, The Journal of Supercomputing, 72:6, (2059-2079), Online publication date: 1-Jun-2016.
- Matsuzaki K and Miyazaki R (2016). Parallel Tree Accumulations on MapReduce, International Journal of Parallel Programming, 44:3, (466-485), Online publication date: 1-Jun-2016.
- Paiva E and Revoredo K Big Data and Transparency: Using MapReduce functions to increase Public Expenditure transparency Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era - Volume 1, (25-32)
- Dao T and Chiba S HPC-reuse Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (342-345)
- Guo Y, Varbanescu A, Epema D and Iosup A Design and experimental evaluation of distributed heterogeneous graph-processing systems Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (203-212)
- Zhang Y, Xu F, Frise E, Wu S, Yu B and Xu W DataLab Proceedings of the 2nd International Workshop on BIG Data Software Engineering, (12-18)
- Asad Z, Rehman Chaudhry M and Malone D (2016). Greener Data Exchange in the Cloud: A Coding-Based Optimization for Big Data Processing, IEEE Journal on Selected Areas in Communications, 34:5, (1360-1377), Online publication date: 1-May-2016.
- Honarvar A and Sami A (2016). Extracting Usage Patterns from Power Usage Data of Homes' Appliances in Smart Home using Big Data Platform, International Journal of Information Technology and Web Engineering, 11:2, (39-50), Online publication date: 1-Apr-2016.
- Memishi B, Pérez M and Antoniu G (2016). Feedback-Based Resource Allocation in MapReduce-Based Systems, Scientific Programming, 2016, Online publication date: 1-Apr-2016.
- Kumar M, Rath N and Rath S (2016). Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier, Journal of Biomedical Informatics, 60:C, (395-409), Online publication date: 1-Apr-2016.
- Marszałkowski J, Drozdowski M and Marszałkowski J (2016). Time and Energy Performance of Parallel Systems with Hierarchical Memory, Journal of Grid Computing, 14:1, (153-170), Online publication date: 1-Mar-2016.
- Dan O, Parikh V and Davison B Improving IP Geolocation using Query Logs Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, (347-356)
- Huang S, Wang B, Qiu J, Yao J, Wang G and Yu G (2016). Parallel ensemble of online sequential extreme learning machine based on MapReduce, Neurocomputing, 174:PA, (352-367), Online publication date: 22-Jan-2016.
- Dayarathna M, Wen Y and Fan R (2016). Data Center Energy Consumption Modeling: A Survey, IEEE Communications Surveys & Tutorials, 18:1, (732-794), Online publication date: 1-Jan-2016.
- Nodarakis N, Pitoura E, Sioutas S, Tsakalidis A, Tsoumakos D and Tzimas G kdANN+ Special Issue on Database- and Expert-Systems Applications on Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV - Volume 9510, (139-168)
- Hsu C, Slagter K and Chung Y (2015). Locality and loading aware virtual machine mapping techniques for optimizing communications in MapReduce applications, Future Generation Computer Systems, 53:C, (43-54), Online publication date: 1-Dec-2015.
- Xiong R, Luo J and Dong F (2015). Optimizing data placement in heterogeneous Hadoop clusters, Cluster Computing, 18:4, (1465-1480), Online publication date: 1-Dec-2015.
- Phan T, D'Orazio L and Rigaux P A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV - Volume 9620, (33-70)
- Gencer A, Bindel D, Sirer E and van Renesse R Configuring Distributed Computations Using Response Surfaces Proceedings of the 16th Annual Middleware Conference, (235-246)
- Qi L, Zhang H and Schneider M Design and representation of complex objects in database systems Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, (1-4)
- Hamdaqa M, Sabri M, Singh A and Tahvildari L Adoop Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, (26-34)
- Sundaravarathan K, Bhat A and Martin P A study of three MapReduce frameworks Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, (16-25)
- Kumar M and Kumar Rath S (2015). Classification of microarray using MapReduce based proximal support vector machine classifier, Knowledge-Based Systems, 89:C, (584-602), Online publication date: 1-Nov-2015.
- Wang W, Zhao W, Cai C, Huang J, Xu X and Li L (2015). An efficient image aesthetic analysis system using Hadoop, Image Communication, 39:PC, (499-508), Online publication date: 1-Nov-2015.
- Sharma B and Suryanarayana G Towards a catalog of performance smells for parallel computing Proceedings of the 22nd Conference on Pattern Languages of Programs, (1-9)
- Shang H, Zhao X, Kiran U and Kitsuregawa M Towards Scale-out Capability on Social Graphs Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (253-262)
- Doulkeridis C, Vlachou A, Nikitopoulos P, Tampakis P and Saouk M The RoadRunner framework for efficient and scalable processing of big data Proceedings of the 19th Panhellenic Conference on Informatics, (215-220)
- Triguero I, del Río S, López V, Bacardit J, Benítez J and Herrera F (2015). ROSEFW-RF, Knowledge-Based Systems, 87:C, (69-79), Online publication date: 1-Oct-2015.
- Nodarakis N, Sioutas S, Gerolymatos P, Tsakalidis A and Tzimas G Convex Polygon Planar Range Queries on the Cloud Revised Selected Papers of the First International Workshop on Algorithmic Aspects of Cloud Computing - Volume 9511, (114-125)
- Kendea M, Gkantouna V, Rapti A, Sioutas S, Tzimas G and Tsolis D Graph DBs vs. Column-Oriented Stores Revised Selected Papers of the First International Workshop on Algorithmic Aspects of Cloud Computing - Volume 9511, (62-74)
- Hu H, Wen Y, Gao Y, Chua T and Li X (2015). Toward an SDN-enabled big data platform for social TV analytics, IEEE Network: The Magazine of Global Internetworking, 29:5, (43-49), Online publication date: 1-Sep-2015.
- Van L and Takasu A An Efficient Distributed Index for Geospatial Databases Proceedings, Part I, of the 26th International Conference on Database and Expert Systems Applications - Volume 9261, (28-42)
- Hamid A and Tran Q (2015). Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology, 10.5555/2876104, Online publication date: 21-Aug-2015.
- Shen J, Geyik S and Dasdan A Effective Audience Extension in Online Advertising Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2099-2108)
- Xing E, Ho Q, Dai W, Kim J, Wei J, Lee S, Zheng X, Xie P, Kumar A and Yu Y Petuum Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1335-1344)
- Ekbia H, Mattioli M, Kouper I, Arave G, Ghazinejad A, Bowman T, Suri V, Tsou A, Weingart S and Sugimoto C (2015). Big data, bigger dilemmas, Journal of the Association for Information Science and Technology, 66:8, (1523-1545), Online publication date: 1-Aug-2015.
- Tolk A The next generation of modeling & simulation Proceedings of the Conference on Summer Computer Simulation, (1-8)
- Salah S, Akbarinia R and Masseglia F Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce Proceedings of the 11th International Conference on Machine Learning and Data Mining in Pattern Recognition - Volume 9166, (217-231)
- Jiang Y, Yang J, Tang L, Liu Y, Zhao X and Hao X A Distributed Data Mining System Framework for Mobile Internet Access Log Based on Hadoop Transactions on Edutainment XI - Volume 8971, (243-252)
- Reed D and Dongarra J (2015). Exascale computing and big data, Communications of the ACM, 58:7, (56-68), Online publication date: 25-Jun-2015.
- Agrawal B, Wiktorski T and Rong C Analyzing and Predicting Failure in Hadoop Clusters Using Distributed Hidden Markov Model Revised Selected Papers of the Second International Conference on Cloud Computing and Big Data - Volume 9106, (232-246)
- Zuo C, Liao Q, Gu T, Li T and Yang Y Node Capability Modeling for Reduce Phase's Scheduling in MapReduce Environment Revised Selected Papers of the Second International Conference on Cloud Computing and Big Data - Volume 9106, (217-231)
- Schuler R, Kesselman C and Czajkowski K Data Centric Discovery with a Data-Oriented Architecture Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models, (37-44)
- Zhang J, Li X, Huo Y and Li S Research on Multiple Files Input Programming Method Based on MapReduce Revised Selected Papers, Part II, of the 5th International Conference on Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques - Volume 9243, (336-342)
- Miguel J, Caballé S, Xhafa F and Prieto J (2015). A massive data processing approach for effective trustworthiness in online learning groups, Concurrency and Computation: Practice & Experience, 27:8, (1988-2003), Online publication date: 10-Jun-2015.
- Huang J, Liang X, Qin X, Xie P and Xie C (2015). Scale-RS: An Efficient Scaling Scheme for RS-Coded Storage Clusters, IEEE Transactions on Parallel and Distributed Systems, 26:6, (1704-1717), Online publication date: 1-Jun-2015.
- Lee J and Kang M (2015). Geospatial Big Data, Big Data Research, 2:2, (74-81), Online publication date: 1-Jun-2015.
- Donnelly P, Hazekamp N and Thain D Confuga Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (392-401)
- Guo Y, Varbanescu A, Iosup A and Epema D An empirical performance evaluation of GPU-enabled graph-processing systems Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (423-432)
- Gunther N, Puglia P and Tomasette K (2015). Hadoop Superlinear Scalability, Queue, 13:5, (20-42), Online publication date: 1-May-2015.
- Bux M and Leser U (2015). DynamicCloudSim, Future Generation Computer Systems, 46:C, (85-99), Online publication date: 1-May-2015.
- Ferrucci F, Salza P, Kechadi M and Sarro F A parallel genetic algorithms framework based on Hadoop MapReduce Proceedings of the 30th Annual ACM Symposium on Applied Computing, (1664-1667)
- Gunther N, Puglia P and Tomasette K (2015). Hadoop superlinear scalability, Communications of the ACM, 58:4, (46-55), Online publication date: 23-Mar-2015.
- Jiang H, Chen Y, Qiao Z, Weng T and Li K (2015). Scaling up MapReduce-based Big Data Processing on Multi-GPU systems, Cluster Computing, 18:1, (369-383), Online publication date: 1-Mar-2015.
- Rehman M, Boles J, Hammoud M and Sakr M A Cloud Computing Course Proceedings of the 46th ACM Technical Symposium on Computer Science Education, (338-343)
- Triguero I, Peralta D, Bacardit J, García S and Herrera F (2015). MRPR, Neurocomputing, 150:PA, (331-345), Online publication date: 20-Feb-2015.
- Wu Y, Zheng L, Heilig B and Gao G Design and evaluation of a novel dataflow based bigdata solution Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, (40-48)
- López V, del Río S, Benítez J and Herrera F (2015). Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data, Fuzzy Sets and Systems, 258:C, (5-38), Online publication date: 1-Jan-2015.
- Shahrivari S and Jalili S (2015). Fast parallel all-subgraph enumeration using multicore machines, Scientific Programming, 2015, (6-6), Online publication date: 1-Jan-2015.
- Costantini L and Nicolussi R (2015). Performances evaluation of a novel Hadoop and Spark based system of image retrieval for huge collections, Advances in Multimedia, 2015, (11-11), Online publication date: 1-Jan-2015.
- Cai L, Guan X, Chi P, Chen L and Luo J (2015). Big data visualization collaborative filtering algorithm based on RHadoop, International Journal of Distributed Sensor Networks, 2015, (3-3), Online publication date: 1-Jan-2015.
- Bruno R and Ferreira P SCADAMAR Proceedings of the 2nd International Workshop on CrossCloud Systems, (1-6)
- Cheng D, Rao J, Guo Y and Zhou X Improving MapReduce performance in heterogeneous environments with adaptive task tuning Proceedings of the 15th International Middleware Conference, (97-108)
- Ko B, Lee J and Jo H Toward Enhancing Block I/O Performance for Virtualized Hadoop Cluster Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, (481-482)
- del Río S, López V, Benítez J and Herrera F (2014). On the use of MapReduce for imbalanced big data using Random Forest, Information Sciences: an International Journal, 285:C, (112-137), Online publication date: 20-Nov-2014.
- Ru J, Grundy J and Keung J Software engineering for multi-tenancy computing challenges and implications Proceedings of the International Workshop on Innovative Software Development Methodologies and Practices, (1-10)
- Dayarathna M and Suzumura T Towards scalable distributed graph database engine for hybrid clouds Proceedings of the 5th International Workshop on Data-Intensive Computing in the Clouds, (1-8)
- Pippal S, Singh S and Kushwaha D Data Trasfer From MySQL To Hadoop Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies, (1-5)
- (2014). A comprehensive view of Hadoop research-A systematic literature review, Journal of Network and Computer Applications, 46:C, (1-25), Online publication date: 1-Nov-2014.
- Lahmer I and Zhang N MapReduce Proceedings of the 7th International Conference on Security of Information and Networks, (392-398)
- Yang C, Liu J, Hsu C and Chou W (2014). On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism, The Journal of Supercomputing, 69:3, (1103-1122), Online publication date: 1-Sep-2014.
- Geyik S, Saxena A and Dasdan A Multi-Touch Attribution Based Budget Allocation in Online Advertising Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, (1-9)
- Shi J, Zou J, Lu J, Cao Z, Li S and Wang C (2014). MRTuner, Proceedings of the VLDB Endowment, 7:13, (1319-1330), Online publication date: 1-Aug-2014.
- Radenski A Big data, high-performance computing, and MapReduce Proceedings of the 15th International Conference on Computer Systems and Technologies, (13-24)
- Li M, Zeng L, Meng S, Tan J, Zhang L, Butt A and Fuller N MRONLINE Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, (165-176)
- Okcan A and Riedewald M Anti-combining for MapReduce Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, (839-850)
- Andoni A, Nikolov A, Onak K and Yaroslavtsev G Parallel algorithms for geometric graph problems Proceedings of the forty-sixth annual ACM symposium on Theory of computing, (574-583)
- R K, Anwar A and Butt A hatS Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (502-511)
- Suthaharan S (2014). Big data classification, ACM SIGMETRICS Performance Evaluation Review, 41:4, (70-73), Online publication date: 17-Apr-2014.
- Fisk M and Hash C FileMap Proceedings of the Fourth International Workshop on Cloud Data and Platforms, (1-6)
- Bruno R and Ferreira P freeCycles Proceedings of the Fourth International Workshop on Cloud Data and Platforms, (1-6)
- Watanabe H and Kawarasaki M Impact of Data Transfer to Hadoop Job Performance Proceedings of the 2014 ACM Southeast Conference, (1-6)
- Guo Y, Varbanescu A, Iosup A, Martella C and Willke T Benchmarking graph-processing platforms Proceedings of the 5th ACM/SPEC international conference on Performance engineering, (289-292)
- Bumgardner V and Marek V Scalable hybrid stream and hadoop network analysis system Proceedings of the 5th ACM/SPEC international conference on Performance engineering, (219-224)
- Silva Y, Dietrich S, Reed J and Tsosie L Integrating big data into the computing curricula Proceedings of the 45th ACM technical symposium on Computer science education, (139-144)
- Buys K, Cagniart C, Baksheev A, De Laet T, De Schutter J and Pantofaru C (2014). An adaptable system for RGB-D based human body detection and pose estimation, Journal of Visual Communication and Image Representation, 25:1, (39-52), Online publication date: 1-Jan-2014.
- Donnelly P and Thain D Design of an active storage cluster file system for DAG workflows Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, (37-42)
- Griffis E, Martin P and Cheney J Semantics and provenance for processing element composition in dispel workflows Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science, (38-47)
- Prasad S, Shekhar S, McDermott M, Zhou X, Evans M and Puri S GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, (65-72)
- de Oliveira M, Alves A, Leite D, Rocha J, Filho J and de Souza Baptista C Introducing spatial context in comparative pricing and product search Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems, (127-134)
- Park H and Chung C An efficient MapReduce algorithm for counting triangles in a very large graph Proceedings of the 22nd ACM international conference on Information & Knowledge Management, (539-548)
- Sakr S, Liu A and Fayoumi A (2013). The family of mapreduce and large-scale data processing systems, ACM Computing Surveys, 46:1, (1-44), Online publication date: 1-Oct-2013.
- León X and Navarro L Incentives for Dynamic and Energy-Aware Capacity Allocation for Multi-tenant Clusters Proceedings of the 10th International Conference on Economics of Grids, Clouds, Systems, and Services - Volume 8193, (106-121)
- Ruan G, Zhang H and Plale B Exploiting MapReduce and data compression for data-intensive applications Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, (1-8)
- Suthaharan S A single-domain, representation-learning model for big data classification of network intrusion Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition, (296-310)
- Aniello L, Baldoni R and Querzoni L Adaptive online scheduling in storm Proceedings of the 7th ACM international conference on Distributed event-based systems, (207-218)
- Bux M and Leser U DynamicCloudSim Proceedings of the 2nd ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, (1-12)
- Bergen A, Yazır Y, Müller H and Coady Y RPC automation Proceedings of the 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, (175-180)
- Langhans P, Wieser C and Bry F Crowdsourcing MapReduce Proceedings of the 22nd International Conference on World Wide Web, (253-256)
- Lee Y and Lee Y (2012). Toward scalable internet traffic measurement and analysis with Hadoop, ACM SIGCOMM Computer Communication Review, 43:1, (5-13), Online publication date: 9-Jan-2013.
- Feldman D, Schmidt M and Sohler C Turning big data into tiny data Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, (1434-1453)
- Jia Z, Zhou R, Zhu C, Wang L, Gao W, Shi Y, Zhan J and Zhang L The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems Revised Selected Papers of the First Workshop on Specifying Big Data Benchmarks - Volume 8163, (44-59)
- Begoli E A short survey on the state of the art in architectures and platforms for large scale data analysis and knowledge discovery from data Proceedings of the WICSA/ECSA 2012 Companion Volume, (177-183)
- Ganchev I, Ji Z and O'Droma M A conceptual framework for building a mobile services' recommendation engine 2016 IEEE 8th International Conference on Intelligent Systems (IS), (285-289)
- Wu X and Loiseau P Algorithms for scheduling deadline-sensitive malleable tasks 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), (530-537)