  • Herodotou H and Kakoulli E. (2023). Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems. ACM Transactions on Database Systems. 48:4. (1-40). Online publication date: 31-Dec-2024.

    https://doi.org/10.1145/3625389

  • Lin Y, Tang B, Zhou S, Xie Z and Ye B. (2023). Efficient Node Selection for Coding-based Timely Computation over Heterogeneous Systems. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00065. 979-8-3503-2922-3. (246-253).

    https://ieeexplore.ieee.org/document/10491747/

  • Alamro S, Lan T and Subramaniam S. (2023). Forseti: Dynamic chunk-level reshaping for data processing on heterogeneous clusters. Journal of Parallel and Distributed Computing. 10.1016/j.jpdc.2022.09.003. 171. (14-23). Online publication date: 1-Jan-2023.

    https://linkinghub.elsevier.com/retrieve/pii/S0743731522001915

  • Cha S, Lee M, Lee S and Oh H. SymTuner. Proceedings of the 44th International Conference on Software Engineering. (2068-2079).

    https://doi.org/10.1145/3510003.3510185

  • Bei Z, Kim N, Hwang K and Yu Z. OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS. IEEE Transactions on Computers. 10.1109/TC.2021.3063278. 71:4. (809-823).

    https://ieeexplore.ieee.org/document/9369009/

  • Guo Y, Shan H, Huang S, Hwang K, Fan J and Yu Z. GML: Efficiently Auto-Tuning Flink's Configurations Via Guided Machine Learning. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2021.3081600. 32:12. (2921-2935).

    https://ieeexplore.ieee.org/document/9435010/

  • Ajibade Lukuman Saheed, Abu Bakar Kamalrulnizam, Ahmed Aliyu and Tasneem Darwish. (2021). Latency-aware Straggler Mitigation Strategy in Hadoop MapReduce Framework: A Review. Systematic Literature Review and Meta-Analysis Journal. 10.54480/slrm.v2i2.19. 2:2. (53-60).

    http://slr-m.com/index.php/home/article/view/19

  • Jinan R, Badita A, Sarvepalli P and Parag P. (2021). Low latency replication coded storage over memory-constrained servers. 2021 IEEE International Symposium on Information Theory (ISIT). 10.1109/ISIT45174.2021.9517901. 978-1-5386-8209-8. (2340-2345).

    https://ieeexplore.ieee.org/document/9517901/

  • Herodotou H and Kakoulli E. (2021). Trident. Proceedings of the VLDB Endowment. 14:9. (1570-1582). Online publication date: 1-May-2021.

    https://doi.org/10.14778/3461535.3461545

  • Bakni N and Assayad I. Survey on improving the performance of MapReduce in Hadoop. Proceedings of the 4th International Conference on Networking, Information Systems & Security. (1-5).

    https://doi.org/10.1145/3454127.3456617

  • Zhong X, Li M, Yang H, Liu Y and Qian D. swMR: A Framework for Accelerating MapReduce Applications on Sunway Taihulight. IEEE Transactions on Emerging Topics in Computing. 10.1109/TETC.2018.2881265. 9:2. (1020-1030).

    https://ieeexplore.ieee.org/document/8534314/

  • Herodotou H, Chen Y and Lu J. (2020). A Survey on Automatic Parameter Tuning for Big Data Processing Systems. ACM Computing Surveys. 53:2. (1-37). Online publication date: 31-Mar-2021.

    https://doi.org/10.1145/3381027

  • Wang B, Tang J, Zhang R, Liu J, Liu S and Qi D. (2020). A Task-Aware Fine-Grained Storage Selection Mechanism for In-Memory Big Data Computing Frameworks. International Journal of Parallel Programming. 10.1007/s10766-020-00662-2.

    http://link.springer.com/10.1007/s10766-020-00662-2

  • Liu W, Huang P and He X. (2020). StragglerHelper: Alleviating Straggling in Computing Clusters via Sharing Memory Access Patterns. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS47924.2020.00068. 978-1-7281-6876-0. (602-611).

    https://ieeexplore.ieee.org/document/9139789/

  • Badita A, Parag P and Aggarwal V. Optimal Server Selection for Straggler Mitigation. IEEE/ACM Transactions on Networking. 10.1109/TNET.2020.2973224. 28:2. (709-721).

    https://ieeexplore.ieee.org/document/9018098/

  • Trotter M, Wood T and Hwang J. (2019). Forecasting a Storm: Divining Optimal Configurations using Genetic Algorithms and Supervised Learning. 2019 IEEE International Conference on Autonomic Computing (ICAC). 10.1109/ICAC.2019.00025. 978-1-7281-2411-7. (136-146).

    https://ieeexplore.ieee.org/document/8831217/

  • Cheng D, Zhou X, Xu Y, Liu and Jiang C. (2019). Deadline-Aware MapReduce Job Scheduling with Dynamic Resource Availability. IEEE Transactions on Parallel and Distributed Systems. 30:4. (814-826). Online publication date: 1-Apr-2019.

    https://doi.org/10.1109/TPDS.2018.2873373

  • Lee G and Fortes J. (2019). Improving Data-Analytics Performance Via Autonomic Control of Concurrency and Resource Units. ACM Transactions on Autonomous and Adaptive Systems. 13:3. (1-25). Online publication date: 28-Mar-2019.

    https://doi.org/10.1145/3309539

  • Pires M, Silva N, Rocha L, Meira W and Ferreira R. (2019). Efficient Parallel Associative Classification Based on Rules Memoization. Computational Science – ICCS 2019. 10.1007/978-3-030-22747-0_3. (31-44).

    https://link.springer.com/10.1007/978-3-030-22747-0_3

  • Yu Z, Bei Z and Qian X. (2018). Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing. ACM SIGPLAN Notices. 53:2. (564-577). Online publication date: 30-Nov-2018.

    https://doi.org/10.1145/3296957.3173187

  • Wang S, Chen W, Pi A and Zhou X. Aggressive Synchronization with Partial Processing for Iterative ML Jobs on Clusters. Proceedings of the 19th International Middleware Conference. (253-265).

    https://doi.org/10.1145/3274808.3274828

  • Li J and Li B. (2018). Parallelism-Aware Locally Repairable Code for Distributed Storage Systems. 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). 10.1109/ICDCS.2018.00019. 978-1-5386-6871-9. (87-98).

    https://ieeexplore.ieee.org/document/8416282/

  • Perez T, Chen W, Ji R, Liu L and Zhou X. (2018). PETS: Bottleneck-Aware Spark Tuning with Parameter Ensembles. 2018 27th International Conference on Computer Communication and Networks (ICCCN). 10.1109/ICCCN.2018.8487324. 978-1-5386-5156-8. (1-9).

    https://ieeexplore.ieee.org/document/8487324/

  • Fu Z, Song T, Qi Z and Guan H. (2018). Efficient shuffle management with SCache for DAG computing frameworks. ACM SIGPLAN Notices. 53:1. (305-316). Online publication date: 23-Mar-2018.

    https://doi.org/10.1145/3200691.3178510

  • Yu Z, Bei Z and Qian X. Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. (564-577).

    https://doi.org/10.1145/3173162.3173187

  • Fu Z, Song T, Qi Z and Guan H. Efficient shuffle management with SCache for DAG computing frameworks. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. (305-316).

    https://doi.org/10.1145/3178487.3178510

  • Marco V, Taylor B, Porter B and Wang Z. Improving spark application throughput via memory aware task co-location. Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference. (95-108).

    https://doi.org/10.1145/3135974.3135984

  • Trotter M, Liu G and Wood T. (2017). Into the Storm: Descrying Optimal Configurations Using Genetic Algorithms and Bayesian Optimization. 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W). 10.1109/FAS-W.2017.144. 978-1-5090-6558-5. (175-180).

    http://ieeexplore.ieee.org/document/8064120/

  • Cheng D, Zhou X, Lama P, Wu J and Jiang C. Cross-Platform Resource Scheduling for Spark and MapReduce on YARN. IEEE Transactions on Computers. 10.1109/TC.2017.2669964. 66:8. (1341-1353).

    http://ieeexplore.ieee.org/document/7857034/

  • Lee G and Fortes J. (2017). Hierarchical Self-Tuning of Concurrency and Resource Units in Data-Analytics Frameworks. 2017 IEEE International Conference on Autonomic Computing (ICAC). 10.1109/ICAC.2017.45. 978-1-5386-1762-5. (49-58).

    http://ieeexplore.ieee.org/document/8005327/

  • Chen W, Rao J and Zhou X. (2017). Addressing Memory Pressure in Data-intensive Parallel Programs via Container Based Virtualization. 2017 IEEE International Conference on Autonomic Computing (ICAC). 10.1109/ICAC.2017.28. 978-1-5386-1762-5. (197-202).

    http://ieeexplore.ieee.org/document/8005349/

  • Nicolae B, Costa C, Misale C, Katrinis K and Park Y. (2017). Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics. IEEE Transactions on Parallel and Distributed Systems. 28:6. (1663-1674). Online publication date: 1-Jun-2017.

    https://doi.org/10.1109/TPDS.2016.2627558

  • Rakshith P, Manishankar S and Sushmitha P. (2017). Enterprise data analytics and processing with an integrated hadoop and R platforms. 2017 International Conference on Intelligent Computing and Control (I2C2). 10.1109/I2C2.2017.8321815. 978-1-5386-0374-1. (1-5).

    http://ieeexplore.ieee.org/document/8321815/

  • Cheng D, Chen Y, Zhou X, Gmach D and Milojicic D. (2017). Adaptive scheduling of parallel jobs in spark streaming. IEEE INFOCOM 2017 - IEEE Conference on Computer Communications. 10.1109/INFOCOM.2017.8057206. 978-1-5090-5336-0. (1-9).

    http://ieeexplore.ieee.org/document/8057206/

  • Cheng Y, Jiang H, Wang F, Hua Y and Feng D. (2017). BlitzG: Exploiting high-bandwidth networks for fast graph processing. IEEE INFOCOM 2017 - IEEE Conference on Computer Communications. 10.1109/INFOCOM.2017.8057203. 978-1-5090-5336-0. (1-9).

    http://ieeexplore.ieee.org/document/8057203/

  • Cheng D, Rao J, Guo Y, Jiang C and Zhou X. (2017). Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning. IEEE Transactions on Parallel and Distributed Systems. 28:3. (774-786). Online publication date: 1-Mar-2017.

    https://doi.org/10.1109/TPDS.2016.2594765

  • Guo Y, Rao J, Jiang C and Zhou X. (2017). Moving Hadoop into the Cloud with Flexible Slot Management and Speculative Execution. IEEE Transactions on Parallel and Distributed Systems. 28:3. (798-812). Online publication date: 1-Mar-2017.

    https://doi.org/10.1109/TPDS.2016.2587641

  • Lee S, Bae M, Eum J and Oh S. (2017). Efficient vCore Based Container Deployment Algorithm for Improving Heterogeneous Hadoop YARN Performance. Information Science and Applications 2017. 10.1007/978-981-10-4154-9_23. (191-201).

    http://link.springer.com/10.1007/978-981-10-4154-9_23

  • Dai J, Song L and Gu J. (2016). The architecture and task scheduling design for the video analysis center. 2016 International Conference on Progress in Informatics and Computing (PIC). 10.1109/PIC.2016.7949560. 978-1-5090-3484-0. (545-549).

    http://ieeexplore.ieee.org/document/7949560/

  • Feng Y and Chen H. (2016). Optimization of spark storage solutions. 2016 International Conference on Progress in Informatics and Computing (PIC). 10.1109/PIC.2016.7949547. 978-1-5090-3484-0. (473-478).

    http://ieeexplore.ieee.org/document/7949547/

  • Zhang X, Wu Y and Zhao C. (2016). MrHeter. Cluster Computing. 19:4. (1691-1701). Online publication date: 1-Dec-2016.

    https://doi.org/10.1007/s10586-016-0625-2

  • Yang H, Liu X, Chen S, Lei Z, Du H and Zhu C. (2016). Improving Spark performance with MPTE in heterogeneous environments. 2016 International Conference on Audio, Language and Image Processing (ICALIP). 10.1109/ICALIP.2016.7846627. 978-1-5090-0654-0. (28-33).

    http://ieeexplore.ieee.org/document/7846627/

  • Dai J and Wang X. (2016). Effective Task Scheduling for Large-Scale Video Processing. Security, Privacy and Anonymity in Computation, Communication and Storage. 10.1007/978-3-319-49145-5_32. (323-331).

    http://link.springer.com/10.1007/978-3-319-49145-5_32

  • Aalst W and Damiani E. Processes Meet Big Data: Connecting Data Science with Process Science. IEEE Transactions on Services Computing. 10.1109/TSC.2015.2493732. 8:6. (810-819).

    http://ieeexplore.ieee.org/document/7302592/

  • Cheng D, Rao J, Jiang C and Zhou X. Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters. Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium. (956-965).

    https://doi.org/10.1109/IPDPS.2015.36

  • Guo Y, Rao J, Cheng D, Jiang C, Xu C and Zhou X. (2015). StoreApp: A shared storage appliance for efficient and scalable virtualized Hadoop clusters. IEEE INFOCOM 2015 - IEEE Conference on Computer Communications. 10.1109/INFOCOM.2015.7218427. 978-1-4799-8381-0. (594-602).

    http://ieeexplore.ieee.org/document/7218427/