Andrew A Lamb |
Apache DataFusion PMC (Chair) | Apache Arrow PMC |
Member, Apache Software Foundation |
LinkedIn | GitHub |
Last Update: Sep, 2024 |
I am a software engineer with experience in environments ranging from two developers in a VC's office to large multinational corporations and distributed open source projects (I love small companies). I focus on systems (e.g. databases) and platform engineering, and have been both an architect and a manager/VP. I currently work in Rust on InfluxDB 3.0, focused on query processing, the Apache DataFusion query engine, and the Apache Arrow ecosystem. I am honored to serve on the Apache DataFusion PMC (2024 Chair) and the Apache Arrow PMC (2023 Chair). I actively contribute to the Apache Arrow DataFusion query engine and the Apache Arrow Rust implementation. |
Highlights (full list below)
2024-09-23 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording
2024-06-19 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (alternate download) Andrew Lamb, Yijie Shen, Daniël Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Chao Sun, and Liang-Chi Hsieh, 2024 International Conference on Management of Data (SIGMOD 2024), June 9-15, 2024, Santiago, Chile
2012-08-27 The Vertica Analytic Database: C-Store 7 Years Later. Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear. 38th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 5, No. 12 |
Blogs
2024-03-18 [InfluxData Blog] Making Most Recent Value Queries Hundreds of Times Faster
2023-08-01 [InfluxData Blog] Aggregating Millions of Groups Fast in Apache Arrow DataFusion (cross post on Arrow Blog)
2022-12-07 [InfluxData Blog] Querying Parquet with Millisecond Latency (cross post on arrow.apache.org/blog)
2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 2
2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1
2022-10-27 [ODBMS.org] On InfluxData's New Storage Engine. Q&A with Andrew Lamb
2022-10-08 [Apache Arrow Blog] Arrow and Parquet Part 2: Nested and Hierarchical Data using Structs and Lists
2022-10-05 [Apache Arrow Blog] Arrow and Parquet Part 1: Primitive Types and Nullability
2022-01-14 [InfluxData Blog] Rust Object Store Donation
2022-01-14 Using Rustlang's Async Tokio Runtime for CPU-Bound Tasks. |
Talks and Presentations
2024-10-28 Boston University: MiDAS Fall 2024 (Data Systems Seminar) Apache DataFusion: Design Choices when Building Modern Analytic Systems slides, slides(pdf), recording
2024-09-27 Belgrade Apache DataFusion Meetup Apache DataFusion: What, Why and How slides, recording
2024-09-23 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording
2024-06-26 New York City Apache DataFusion Meetup NYC Meetup slides
2024-06-26 Microsoft Gray Systems Lab: Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion slides
2024-06-25 San Francisco Bay Area Apache DataFusion Meetup DataFusion Meetup 2.0 - San Francisco slides
2024-06-14 2024 Simplicity in Management of Data (SiMOD) DataFusion: The Case for Building Open Data Systems (keynote) slides
2024-06-13 2024 ACM SIGMOD International Conference on Management of Data Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) slides, recording, paper
2023-05-09 [ODSC East 2024]: Introduction to Apache Arrow and Apache Parquet, using Python and pyarrow (updated). slides
2024-03-27 DataCouncil 2024: Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. slides, recording
2024-03-27 Apache Arrow Datafusion Meetup: Introduction, Agenda, Remarks. slides, recording
2023-09-27 MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides
2023-06-02 [Dutch Seminar on Database System Design]: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides, recording
2023-05-09 [ODSC East 2023]: Introduction to Apache Arrow and Apache Parquet, using Python and pyarrow. slides
2023-04-05 The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. slides, recording
2023-04-04 The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. slides, recording
2023-03-31 The Apache Arrow DataFusion Architecture Part 1: Query Engines. slides, recording
2023-02-15 [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust slides
2022-06-27 [DataBricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. slides, recording
2022-05-23 [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. slides, recording
2022-04-06 [EM.S20, MIT Sloan School of Management, Guest Speaker]: Managing Software Dependencies and the Supply Chain. slides
2021-10-13 [InfluxData Tech Talk]: Query Processing in InfluxDB IOx. slides, recording
2021-04-20 [USC CSE-132 Database Systems Implementation, Guest Speaker]: Apache Arrow and its impact on the database industry. slides, recording
2021-03 [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides, slides (slideshare), recording
2020-12-09 [InfluxData Tech Talk]: A Rusty Introduction to Apache Arrow and how it applies to a TimeSeries Database. slides, recording
2013-01-10 [MIT IAP Talk]: Tradeoffs in Massively Parallel Analytical Systems. slides |
Journal / Conference Papers
2024-08-26 The Five-Minute Rule for the Cloud: Caching in Analytics Systems Kira Duwe (EPFL), Angelos-Christos Anadiotis (Oracle Zurich), Andrew Lamb (InfluxData), Lucas Lersch (Amazon), Boaz Leskes (MotherDuck), Daniel Ritter (SAP), Pinar Tozun (IT University of Copenhagen) The Conference on Innovative Data Systems Research (CIDR), 2025, Amsterdam, The Netherlands
2024-08-26 POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance (alternate download) David Justen, Daniel Ritter, Campbell Fraser, Andrew Lamb, Allison Lee, Thomas Bodner, Mhd Yamen Haddad, Steffen Zeuch, Volker Markl, and Matthias Boehm, Proc. VLDB Endow. 17, 6 (February 2024), 1350-1363.
2024-06-19 Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (alternate download) Andrew Lamb, Yijie Shen, Daniël Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Chao Sun, and Liang-Chi Hsieh, 2024 International Conference on Management of Data (SIGMOD 2024), June 9-15, 2024, Santiago, Chile
2014-03-31 The Vertica Query Optimizer: The Case for Specialized Query Optimizers. (alternate download) Nga Tran, Andrew Lamb, L. Shrinivas, Sreenath Bodagala and Jaimin Dave, IEEE International Conference on Data Engineering (ICDE - 2014)
2012-08-27 The Vertica Analytic Database: C-Store 7 Years Later. (alternate download) Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear. 38th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 5, No. 12
2003-06-08 Linear analysis and optimization of stream programs. (alternate download) Andrew A. Lamb, William Thies and Saman Amarasinghe. ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI)
2002-08-05 A stream compiler for communication-exposed architectures. (alternate download) Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) |
Really Old Content |
Old Blog |
Six Hertz, Six Bytes |
Pre-github projects |
Class List |
School Projects |