The D3S Seminar is an irregular meeting event of the department members and guest speakers. It is also a course Advanced Topics in Distributed and Component-Based Systems I, II NSWI057, NSWI058. This course is recommended for Ph.D. and advanced graduate students.
Meetings take place on Tuesdays at 14:00 in S9 (if not noted otherwise in the schedule below). Each seminar is announced in advance at this webpage. We recommend subscribing to the seminar mailing list to receive announcements on special seminars, schedule updates and other important news.
Scheduled Seminars
No seminars are currently scheduled.
Passed Seminars
Grigory Fedyukovich (Florida State University)
December 02, 2024
Maximizing Branch Coverage with Constrained Horn Clauses
State-of-the-art solvers for constrained Horn clauses (CHC) are
successfully used to generate reachability facts for software using its
symbolic encoding. In this talk, I will present a new application of
CHCs to test-case generation, a problem of finding a set of tuples of
input values to a program under which the program visits as many
branches as possible. The key insight to achieve maximality is to
identify and skip blocks of code that are provably unreachable. The new
approach to test case generation called HORNTINUUM uses CHC to construct
different program unrollings incrementally and extract test cases from
models of satisfiable formulas. At the same time, a CHC solver keeps
track of CHCs that represent unreachable blocks of code, making the
unrolling process more efficient. In practice, this lets HORNTINUUM
terminate early while guaranteeing maximal coverage. HORNTINUUM exhibits
promising performance: it generates high coverage in most cases and
takes less time on average than state-of-the-art based on bounded model
checking, concolic execution, and/or fuzzing.
Bio: Grigory
Fedyukovich is an Assistant Professor at Florida State University. He
completed his Ph.D. at the University of Lugano under the supervision of
Prof Natasha Sharygina, a postdoc at the University of Washington with
Prof Rastislav Bodik, and a postdoc at Princeton University with Prof
Aarti Gupta. His main research interests are in the fields of automated
reasoning, software verification, and synthesis.
Aleksander Boruch-Gruszecki
October 15, 2024
Gradient: Gradual Compartmentalization via Object Capabilities Tracked in Types
Modern software needs fine-grained compartmentalization, i.e., intra-process isolation. A particularly important reason for it are supply-chain attacks, the need for which is aggravated by modern applications depending on hundreds or even thousands of libraries. Object capabilities (ocaps) are a particularly salient approach to compartmentalization, but they require the entire program to assume a lack of ambient authority. Most of existing code was written under no such assumption; effectively, existing applications need to undergo a rewrite-the-world migration to reap the advantages of ocap. We propose gradual compartmentalization, an approach which allows gradually migrating an application to object capabilities, component by component in arbitrary order, all the while continuously enjoying security guarantees. The approach relies on runtime authority enforcement and tracking the authority of objects the type system. We present Gradient, a proof-of-concept gradual compartmentalization extension to Scala which uses Enclosures and Capture Tracking as its key components. We evaluate our proposal by migrating the standard XML library of Scala to Gradient.
Andrej Pečimúth
October 15, 2024
An Analysis of Compiled Code Reusability in Dynamic Compilation (Rehearsal Talk)
Large applications reliant on dynamic compilation for performance often run in horizontally scaled architectures. When this is combined with frequent deployment or demand-based scaling, hardware capacity is lost to frequent warmup phases due to the need to recompile the code after each start of the virtual machine (VM). Moreover, the individual VMs waste hardware resources by repeating the same compilations. Offloading compilation jobs to a dedicated compilation server can mitigate these problems. Such a server can compile the code in a mode where the compilation result is reusable for multiple VMs. The goal is to save compilation resources, such as CPU and memory, and potentially improve the warmup time of individual VMs. This paper investigates the options to reuse previous compilation results within a high-performance VM. We present an empirical study using the GraalVM compiler and the HotSpot Java VM. To facilitate code reuse, we introduce an approach that compiles code into a reusable high-level intermediate representation (IR). This approach defers VM-specific optimizations until the time of reuse. The incurred slowdown of such code varies by workload, ranging between a negligible impact and a 6x slowdown. Although deferred optimization impacts the efficiency of particular code patterns, such reused code still performs significantly better than that compiled by a lower-tier compiler. Therefore, the presented approach can form the foundation for improving warmup times in certain workloads.
Jan Liam Verter
October 08, 2024
Don’t Call Us, We’ll Call You (Towards Mixed-Initiative Interactive Proof Assistants for Programming Language Theory)
There are two kinds of systems that programming language researchers use for their work. Semantics engineering tools let them interactively explore their definitions, while proof assistants can be used to check the proofs of their properties. The disconnect between the two kinds of systems leads to errors in accepted publications and also limits the modes of interaction available when writing proofs. When constructing a proof, one typically states the property and then develops the proof manually until an automatic strategy can fill the remaining gaps. We believe that an integrated and more interactive tool that leverages the typical structure of programming language could do better. In the early work presented in this paper, we focus on the problem of interacting with a proof assistant. Rather than starting with manual proof construction and then completing the last steps automatically, we propose a way of working where the tool starts with an automatic proof search and then breaks when it requires feedback from the user. Our early experience suggests that this way of working can make proof construction easier.
Maya Mückenschnabel
October 08, 2024
Algebraic Effect Handlers with Bidirectional Type-Checking
In the development of type systems there are multiple paths for achieving more expressiveness. On the one hand, algebraic effect handlers allow us to reason about program's side effects. On the other hand, dependent types make it possible to more precisely reason about program states and data and to prove general mathematical statements. Our work is concerned with the development of a practical Lisp-based programming language that combines dependent types with effect handlers. More specifically we focus on describing the type system that has the two following features. First it combines the algebraic effect system with the bidirectional type-checking algorithm in order to support effects in dependent types. Second, it introduces further type inference rules for greater flexibility.
Adam Šmelko
September 24, 2024
Employing Parallel Computing in Data-Intensive Tasks (PhD. defence rehearsal talk)
Michal Töpfer
May 28, 2024
Can LLMs Understand Software Architectures?
This talk will focus on two examples of software architectures and whether large language models (LLMs) can understand them. First, we will look at workflow architectures represented by experiment workflows from the ExtremeXP project. The second part of the talk will be about the DEECo ensemble-based component model and what to expect from LLMs in this context.
Andrej Pečimúth
May 21, 2024
Remote JIT compilation with code caching
Large, complex applications based on dynamic runtimes can be inefficient in deployments with frequent virtual machine (VM) restarts, such as serverless computing or deployments with demand-driven scaling. The problem is the need to recompile code on each VM startup so that the application can reach peak performance. We propose to offload the compilations to a separate server and reuse the information from previous compilations. The goal of this approach is to save resources and speed up how long it takes each VM to start.
Milad Ashqi Abdullah
May 07, 2024
Robin
Systematic literature mapping is an essential part of research methodology. Conducting a systematic literature mapping is challenging. Researchers query publications from various sources, which need to be filtered, categorized, and cleared of duplicates. It is usually the case that the number of publications ranges between hundreds to thousands. The whole process is often performed iteratively and repeatedly, especially at the start of the mapping study, which only further increases the effort. When a team of researchers conducts a mapping study, the members may have different opinions on filtering and categorizing papers, which must be resolved. To our knowledge, this problem is very poorly supported by open-source tools. To address these issues, we present a tool called Robin which facilitates managing the steps of conducting a mapping study within a team. It provides search tools, categorization, and a platform for team members to define their criteria for including and excluding papers. In addition, Robin is connected to publicly available publications search platforms such as IEEE API and Scopus API. Robin is written in Python-Django and can be installed as a web service.
Jiří Klepl
April 23, 2024
Pure C++ Approach to Optimized Parallel Traversal of Regular Data Structures
Cristina Abad Robalino (Escuela Superior Politécnica del Litoral in Guayaquil-Ecuador)
March 26, 2024
Designing Serverless Platforms to Support Emerging Applications
Serverless computing offerings from cloud providers have gained significant traction in recent years due to the advantages that these platforms bring with their flexible pricing models, built-in scalability, and minimal operational requirements. In a recent survey of serverless use cases, we found a wide variety of applications that depend on these services, including implementing the core functionality at the backend of mobile applications, automating the DevOps tasks of complex distributed applications, real-time processing of IoT streaming data, and scientific applications. To properly support these applications, the platforms should be fast, self-managing, and provide support for diverse QoS requirements. As a result, novel improvements to serverless platforms are rapidly being proposed and adopted. Evaluating these solutions necessitates application-based, workload-aware benchmarking tools that the community can rely on. This talk addresses these challenges and our research efforts on tackling them, presenting a performance engineering perspective about the current state and future challenges of serverless computing research. I will describe our solutions in resource management for serverless platforms, focusing on solutions that improve performance or reduce costs via scheduling, caching, and right-sizing of resources, along with our ongoing efforts in developing an application-driven serverless benchmark.
Jan Vitek
October 17, 2023
Two talks for the price of one!
Two rehearsal talks for the OOPSLA conference. First on “Reusing JIT compiled code” an OOPSLA talk where we show how to soundly reuse previously compiled code across programs. Second an invited talk for the Dynamic Language Symposium titled “Prof. Strangelove Or: How I stopped worrying and started to love dynamic languages” where I will tell you about my experience teaching, researching and funding work on strange languages.
Andrej Pečimúth
October 10, 2023
Remote JIT Compilation / Diagnosing Compiler Performance
This talk is a rehearsal for a conference consisting of two parts. In the first part, we introduce remote JIT compilation. Remote JIT compilation for dynamic languages addresses the challenges related to resource usage of the compiler and warmup. In the second part, we describe an approach to diagnose performance issues in dynamic compilers. We do this by capturing and comparing the compiler's optimization decisions.
Darius Blasband (CEO of Raincode)
September 21, 2023
Compiler-Integrated Meta-Programming For Legacy Languages
Most meta-programming systems operate on small volumes of code, written in idealized languages, processed in controlled and friendly environments. At the opposite end of the spectrum, this presentation describes an industrial-grade mechanism integrated in a commercial product to be deployed on tens or hundreds of unsupervised machines, and how it addresses the challenges that come with huge code bases written in antiquated and poorly defined languages. A number of real-world uses cases are described to validate the approach.
Philipp Rümmer (University of Regensburg)
May 23, 2023
Black Ostrich: Web Application Scanning with String Solvers
Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers’ limited ability to pass input validation checks when web applications require data of a certain format, such as email, phone number, or zip code. This talk presents Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint-solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to handle complex real-world regular expressions, including support for the relevant features of ECMA JavaScript regular expressions. We implement our approach by extending and combining the Ostrich constraint solver with the Black Widow web crawler. Joint work with Benjamin Eriksson, Amanda Stjerna, Riccardo De Masellis, and Andrej Sabelfeld, to appear in the ACM Conference on Computer and Communications Security (CCS) 2023.
David Georg Reichelt (Lancaster University in Leipzig)
May 04, 2023
Examination of Performance Changes
Changes to the source code of software may result in varied performance. The detection and analysis require the specification of workloads, their measurement and the analysis of measurement results. The specification of workloads, which are able to detect performance regressions, requires immense manual effort. The talk presents the Peass approach (Performance analysis of software systems). Peass is based on the assumption, that performance changes can be identified by unit tests. Therefore, Peass consists of (1) a method for regression test selection, which determines between which commits the performance may have changed based on static code analysis and analysis of the runtime behaviour, (2) a method for transforming unit tests into performance tests and for statistically reliable and reproducible measurement of the performance and (3) a method for aiding the diagnosis of root causes of performance changes. The Peass approach thereby allows us to automatically examine performance changes that are measurable by the workload of unit tests.
Alexandru Iosup
April 26, 2023
Massivizing Computer Systems: VU on the Science, Design, and Engineering of Distributed Systems and Ecosystems
Wherever we look, our society is turning digital. Science and engineering, business-critical and economic operations, and online education and gaming rely increasingly on the effective digitalization of their processes. For digitalization to succeed, societal processes must leverage efficient computer systems, effectively and efficiently integrated into larger _ecosystems_, managed primarily without application developer and even client input. However successful until now, we cannot take these ecosystems for granted: the core does not yet rely on sound principles of science and design, and there are warning signs about the scalability, dependability, and sustainability of engineered operations. This is the challenge of massivizing computer systems. In this talk, inspired by this challenge and by our experience with distributed computer systems for over 15 years, we focus on understanding, deploying, scaling, and evolving computer ecosystems successfully. We posit we can achieve this through an ambitious, comprehensive research program, which starts from the idea that we can address the grand, fundamental challenge by focusing on computer ecosystems rather than merely on (individual, small-scale) computer systems. To this end, we define (distributed) computer ecosystems and differentiate them from (distributed) computer systems. We formulate principles and introduce a reference architecture for computer ecosystems supporting diverse workloads – AI/ML, big data and graph processing, online gaming and metaverse, and business-critical and serverless – and diverse resources and backend services across the computing continuum. We synthesize a framework of resource management and scheduling (RM&S) techniques, which we argue should be explored systematically in the next decade. We show early results obtained experimentally, through controlled real-world experiments, long-term observation, and what-if analysis of short- and long-term scenarios using the OpenDC digital twin for datacenters. This is a call to the entire community: there is much to discover and achieve, and to get meaningful, long-lasting results we need to form a community spanning distributed systems, performance engineering, software engineering, and more. Our joint work could lead to holistic improvements of applications, services, and processes, together with the computing infrastructure supporting them. This vision aligns with the Manifesto on Computer Systems and Networking Research in the Netherlands, which the speaker co-leads. Many of our examples come from real-world prototyping and experimentation, grand experiments in computer systems, and/or benchmarking and performance analysis work conducted with the Cloud group of SPEC RG.
Karthik Vaidhyanathan (International Institute of Information Technology – Hyderabad)
April 11, 2023
Engineering IoT systems through the lens of a smart city living lab
Many urban cities today are facing various challenges associated with sustainability, traffic congestion, crowd management, etc. Given this context, different government and private organizations have realized the importance of building smarter cities which can offer a way to address many of these challenges. Many cities worldwide have started utilizing technology extensively to offer different smart solutions, made possible by the Internet of Things (IoT) and related technologies. However, developing and maintaining IoT systems, in general, presents different challenges arising from interoperability (communication protocols), heterogeneity (types of devices), resource constraints, performance requirements etc. Further, it is also not feasible to test an IoT solution in a large-scale city-wide setup prior to deployment. To this end, the smart city living lab, an open-innovation ecosystem has been set up at IIITH, with support from the MEITY, Smart City Mission and Government of Telangana, India and in collaboration with the technology partners EBTC and Amsterdam Innovation Arena to discover & develop cutting edge innovations with smart city use cases and enrich them with the knowledge from research. The lab has the active deployment of more than 200 IoT nodes spanning different IoT verticals such as air quality, water quality and quantity, solar power, etc. These then serve as a test bed for various stakeholders to build and test their smart city solutions. In this talk, I will provide an overview of the Smart city living lab by elaborating on the stack used to handle some of the above-mentioned challenges including details on the IoT nodes deployed, the software stack used, the data pipeline and the visualization generated. Further, the talk will provide a glimpse into some of the existing challenges, and ongoing and future research plans for the smart city living lab.
David Košťál
April 04, 2023
Insecurity Refactoring: Automated Injection of Vulnerabilities in Source Code (report on paper)
Important part of software development is to make the code safe against possible attackers. One of possible attacks is SQL injection, where if a field is not sanitized correctly, attacker may inject code resulting in leaking information to the attacker or even enabling him to modify the database. This being said, it is important to teach programmers beginners how to recognize vulnerability to this type of attack and how to correctly fix it. For this purpose usually only small example programs are made where the vulnerability is pretty obvious and artifical. Presented solution for automatically making more complex learning examples, Insecurity refactoring, takes existing code and changes its internal structure to inject a vulnerability without changing the observable behavior in a normal use case scenario. Insecurity refactoring is achieved by creating an Adversary Controlled Input Dataflow tree, which is then used to find possible injection paths. The injected vulnerabilities stem from real Common vulnerabilities and Exposures (CVE) reports which makes the learning examples unique and realistic.
Tomáš Petříček
March 28, 2023
Cultures of Programming
Computer programming originated at the intersection of logic, art, electrical engineering, business and psychology. Those disciplines brought with themselves different ways of thinking about and doing programming. In the early days, the friction between cultures could have been just a sign of the immaturity of the field. 70 years later, the idea that a unified culture will emerge as the field matures is becoming difficult to believe. Different cultures keep their strong identity, but they interact. Interesting innovations appear and revealing discussions occur when multiple cultures meet or clash. In this talk, I will illustrate the interactions between different cultures of programming using three case studies from the history of programming: the history of formal reasoning about programs, the rise of interactive programming tools and the development of software engineering.
Martin Blicha
March 14, 2023
Effective Automated Software Verification: A Multilayered Approach (Ph.D. Thesis Rehearsal)
Automated software verification decides correctness of a given program with respect to a given formal specification. Formal verification techniques based on model checking provide the necessary guarantees by exploring program’s behaviour exhaustively and automatically. Even though the general problem that automated software verification is trying to solve is undecidable, it is quite efficient on many instances that arise in practice. However, significant challenges related to scalability persist for complex modern programs. In our work, we argue that this task can be approached by providing solutions at different levels, which we identify as foundational, verification and cooperative layers of the problem. These correspond to decision and interpolation procedures, sequential model-checking algorithms, and multi-agent solving approaches. We further argue that working on the higher layers can significantly benefit from a deep understanding of the layers beneath them. Overall, we advance the field of automated software verification by contributing solutions on all three layers.
Andrej Pečimúth
February 28, 2023
Transformation Bisection Tool for the Graal Compiler (Review of the project)
The performance of generated machine code is closely linked to the applied optimizations. Conversely, compiler regressions often manifest as changes in the applied optimizations. We propose capturing optimization and inlining decisions performed by a compiler in each compilation unit. These decisions can be represented in the form of a tree. We introduce an approach utilizing tree matching to detect optimization differences in a semi-automated way. The presented techniques were employed to pinpoint the causes of performance problems in various benchmarks of the Graal compiler.
Michal Töpfer
December 06, 2022
Introducing Estimators—Abstraction for Easy ML Employment in Self-adaptive Architectures
In this talk, we will summarize our recent work in the area of using machine learning for self-adaptation of software systems. We focused on the task of providing application-friendly abstractions and tools that would allow the architects to focus on the application business logic rather than on the intricacies of integrating ML into the adaptation loop. We proposed ML-DEECo – an ensemble-based component model with Estimators, which can provide predictions on future and currently unobservable values in the self-adaptive system. The architect only needs to specify the inputs and outputs of the Estimator and the underlying ML model is trained automatically by our ML-DEECo runtime framework.
Milad Abdullah
November 29, 2022
Reducing Computation Costs in Performance Regression Detection
We will cover recent attempts to automatically detect performance regressions in benchmarking project such as GraalVM. The general aim is to reduce computation costs for benchmarking projects, by reducing number of benchmark runs which are not likely to provide useful information regarding existence of a performance regression.
Adam Šmelko
November 08, 2022
Noarr – C++ library for handling memory layouts and traversals of regular data structures
Jan Pacovský
October 25, 2022
Generalization of Machine-learning Adaptation in Ensemble-based Self-adaptive Systems
Filip Kliber
October 11, 2022
Fuzzing of Multithreaded Programs in .NET - Challenges and Solutions
Reasoning about programs using multiple computational threads is very difficult, because the behaviour of the program might depend on the specific interleaving of operations done by individual threads. Moreover some of the thread interleavings might exhibit a subtle bug in the application. This talk will demonstrate what challenges needs to be tackled when trying to control the thread interleaving of a .NET application in order to force some atomicity violations to manifest.
Michele Tucci
September 27, 2022
Code Coverage for Performance Testing
Test coverage measures what percentage of source code has been executed by a test suite, and it has been used for decades as a metric to assess the quality of tests. Despite its popularity in functional testing, the same quality criteria have seldom been applied to performance testing. Nonetheless, given the considerable costs associated with the design and execution of performance tests, metrics of code coverage could become crucial in the optimization of performance testing activities. In this talk, we will focus on how code coverage can be defined for performance testing, in a way that is compatible with current goals and practices in the field. We will introduce a method to compute code coverage in practice, and we will explore the challenges that arise when compared to functional testing.
Pavel Parízek
January 28, 2020
Rehearsal habilitation talk