[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to content
Publicly Available Published by De Gruyter December 6, 2022

A Time-Adaptive Space-Time FMM for the Heat Equation

  • Raphael Watschinger ORCID logo and Günther Of ORCID logo EMAIL logo

Abstract

We present a new time-adaptive FMM for a space-time boundary element method for the heat equation. The method extends the existing parabolic FMM by adding new operations that allow for an efficient treatment of tensor product meshes which are adaptive in time. We analyze the efficiency of the new operations and the approximation quality of the related kernel expansions and present numerical experiments that reveal the benefits of the new method.

MSC 2010: 65M38; 65M50; 41A10

1 Introduction

For the solution of initial boundary value problems of the transient heat equation or rather their numerical approximation there exists a variety of different methods. Among them are boundary element methods, see e.g. [1, 5, 14]. Boundary element methods rely on a reformulation of the initial boundary value problem as an integral equation on the boundary of the space-time domain, which is then discretized. Typically tensor product discretizations with a fixed spatial mesh and uniform time steps are considered. This is advantageous for several aspects. Due to the tensor product structure of the mesh and the causality of the problem, the BEM matrices are lower triangular block matrices. The uniformity in time leads to a block Toeplitz structure. Thus, only one block per time step has to be computed and stored. These properties allow to use an efficient blockwise forward elimination algorithm for the solution of the BEM systems.

However, the restriction to uniform meshes and the described solution procedure limit the range of applications. Firstly, the level of parallelism of the time-sequential forward elimination procedure is limited by the spatial components. This may be a bottleneck with modern hardware. In [7, 25] the authors have presented BEM implementations for the heat equation, which are parallel in space and time. Secondly, the concept prevents adaptivity. A huge number of uniform time steps might be necessary to resolve some local aspects with respect to time. Here a discretization which is adaptive in time can be advantageous. While local temporal aspects can be resolved much better, the total number of time steps may be much lower. This reduces the total number of degrees of freedom and the size of the system of linear equations. However, the block Toeplitz structure is lost for varying time step sizes and more blocks have to be computed and stored. This can efficiently be handled by a distributed parallelization.

A general difficulty when using standard BEM is that the occurring system matrices are dense. Several fast and data-sparse algorithms have been developed to overcome this problem. Typically these methods provide an efficient compression of the matrices and a fast realization of matrix vector products, which leads to efficient solvers. Some examples for the heat equation are algorithms based on Fourier series and FFT [10, 9], the parabolic fast multipole method (pFMM) [22, 23, 18, 19], or a fast sparse grid method [12]. In this work we present a time-adaptive FMM which stems from the pFMM. In its original form the pFMM combines the standard ideas and principles of FMM algorithms, like clustering the computational domain and approximating interactions in well-separated clusters by suitable truncated series expansions, with an efficient forward elimination algorithm.

Just like most other fast BEM heat solvers, the pFMM was developed for uniform time steps. While it can be applied for meshes with non-uniform time steps, the overall compression and performance of the pFMM will be worse in this setting than for a uniform time discretization with the same number of time intervals. In fact, there might be many blocks which are not compressed. Here we will present a new variant of the parabolic FMM which performs well for widely varying time step sizes. Similarly to the adaptive version [3, 4, 20] of the FMM for the Laplace kernel, we will introduce certain one-sided expansions with respect to time and related FMM operations. These operations are in the spirit of S2L (source to local, also called Q2L) and M2T (moment to target, also called M2P) operations, which turned out to be efficient in cases where only a few elements are contained in a cluster. The new operations enable the compression of additional matrix blocks and improve the performance of the method. The new method is described without the original forward elimination scheme to allow for parallelism in space and time as in [25].

The paper is organized as follows: In Section 2 we briefly describe the considered boundary element method for the heat equation. In Section 3 we discuss the kernel expansions of the time-adaptive FMM. Besides the original expansion used in the pFMM, which consists of a two-sided interpolation in time and an additional truncated Chebyshev expansion in space, we introduce new temporally one-sided expansions for a more efficient algorithm. We analyze the approximation errors of the different expansions and point out the advantages of the new temporally one-sided space-time expansions in adaptive situations. In Section 4 we present the new time-adaptive space-time FMM for the heat equation. It is based on the original FMM algorithm [22] for the heat equation and enhanced by new operations. To enable these new operations we modify the generation of the cluster tree in Section 4.1 by introducing additional clusters and identify the related matrix blocks in the hierarchical structure in Section 4.2. The original and new FMM operations are introduced in Sections 4.3 and 4.4, and the new time-adaptive space-time FMM algorithm is presented in Section 4.5. In Sections 4.6 and 4.7 we comment on the runtime complexity and storage requirements of its operations. Finally, we present two numerical examples which are well-suited for adaptive temporal meshes and demonstrate the gains of the new time-adaptive FMM for the heat equation.

2 Boundary Integral Equations and Discretization

We consider the initial Dirichlet boundary value problem of the transient heat equation with the heat capacity constant α > 0 for a bounded Lipschitz domain Ω 3 with a boundary Γ = Ω as a model problem:

(2.1)

t u ( 𝒙 , t ) - α Δ u ( 𝒙 , t ) = 0 for  ( 𝒙 , t ) Ω × ( 0 , T ) ,
u ( 𝒙 , 0 ) = u 0 ( 𝒙 ) for  𝒙 Ω ,
u ( 𝒙 , t ) = g ( 𝒙 , t ) for  ( 𝒙 , t ) Σ := Γ × ( 0 , T ) .

Detailed descriptions on how to solve such a problem by using boundary integral equations and related boundary element methods are provided in [5, 6]. Common approaches (direct via the first boundary integral equation or indirect) require the solution of a boundary integral equation of the form

(2.2) V q = f on  Σ .

Here q is the unknown quantity of interest, and V is the single layer boundary integral operator that has the integral representation

V q ( 𝒙 , t ) = 0 t Γ G α ( 𝒙 - 𝒚 , t - τ ) q ( 𝒚 , τ ) d 𝒔 𝒚 d τ for  ( 𝒙 , t ) Σ

with the fundamental solution of the heat equation

(2.3) G α ( 𝒙 - 𝒚 , t - τ ) = { 1 ( 4 π α ( t - τ ) ) 3 2 exp ( - | 𝒙 - 𝒚 | 2 4 α ( t - τ ) ) for  τ < t , 0 for  τ > t .

The right-hand side f in (2.2) depends on the chosen approach. In case of a direct approach

(2.4) f = ( 1 2 Id + K ) g - M 0 u 0 ,

where K is the double layer boundary integral operator, see e.g. [6, Section 4.6], and M 0 is the trace of the initial potential, see [6, Section 4.2]. In case of an indirect approach f = g - M 0 u 0 .

For the numerical approximation of (2.2) we consider a Galerkin method. As a first step we discretize the lateral boundary Σ by a space-time tensor product mesh Σ h = Γ h h . Here, Γ h = { γ j 𝒙 } j 𝒙 = 1 E 𝒙 is an admissible triangular surface mesh of Γ and h is a potentially adaptive mesh for the time interval ( 0 , T ) with grid points t j t , j t = 0 , 1 , , E t , sorted in ascending order such that the j t th temporal element is ( t j t - 1 , t j t ) . The elements in the space-time mesh Σ h are tensor products of the form σ j t , j 𝒙 = γ j 𝒙 × ( t j t - 1 , t j t ) .

We want to find a piecewise constant approximation

q h ( 𝒚 , τ ) = j t = 1 E t j 𝒙 = 1 E 𝒙 q j t , j 𝒙 φ j t , j 𝒙 0 , 0 ( 𝒚 , τ )

of q in (2.2) with basis functions φ j t , j 𝒙 0 , 0 which are one on a space-time boundary element σ j t , j 𝒙 and zero otherwise. The system of linear equations of the considered Galerkin method for this purpose is

(2.5) 𝖵 h 𝒒 = 𝒇 ,

where 𝒒 is the vector of the coefficients q j t , j 𝒙 and 𝒇 is the corresponding vector of the right-hand side. We assume that the coefficients are ordered by the temporal index j t first and then by the spatial index j 𝒙 . The entries of the Galerkin matrix 𝖵 h are given by

𝖵 h [ ( k t - 1 ) E 𝒙 + k 𝒙 , ( j t - 1 ) E 𝒙 + j 𝒙 ] = t k t - 1 t k t γ k 𝒙 t j t - 1 t j t γ j 𝒙 G α ( 𝒙 - 𝒚 , t - τ ) d 𝒔 𝒚 d τ d 𝒔 𝒙 d t

for k 𝒙 , j 𝒙 = 1 , , E 𝒙 and k t , j t = 1 , , E t . Due to the causality of the heat kernel this matrix has lower triangular block structure. More details on the discretization and computation of the matrix entries of 𝖵 h are given in [26].

Note that system (2.5) is huge and that the matrix 𝖵 h is dense except for its lower triangular block structure. Thus, a standard BEM is limited to small problems. In general a data-sparse method such as the FMM is necessary to solve large-scale problems. In the following sections we will introduce a new variant of the FMM which is well suited for the treatment of meshes that are adaptive in time.

3 Standard and Temporally One-Sided Kernel Expansions

The FMM presented in this paper is based on suitable separable approximations of the heat kernel (2.3) that combine an interpolation in time with a truncated Chebyshev expansion in space. In addition to the expansion described in [23, 17] we consider two new temporally one-sided expansions. All three expansions are introduced and analyzed in the following.

We start by fixing the notation. T m ( x ) = cos ( m arccos ( x ) ) are the Chebyshev polynomials of order m on [ - 1 , 1 ] . The Chebyshev nodes { ξ k ( m ) } k = 0 m of order m + 1 on [ - 1 , 1 ] are the roots of T m + 1 . For an interval I = [ a , b ] we define the transformed Chebyshev nodes ξ I , k ( m ) := φ I ( ξ k ( m ) ) with the affine map φ I ( x ) := a ( 1 - x ) + b ( 1 + x ) 2 from [ - 1 , 1 ] to I. Let 𝒫 m ( I ) be the space of polynomials on I whose degrees are less than or equal to m. When interpolating a function f C ( I ) by polynomials in 𝒫 m ( I ) we use the transformed Chebyshev nodes as interpolation points and obtain

I ( m ) [ f ] = k = 0 m f ( ξ I , k ( m ) ) L I , k ,

where I ( m ) : C ( I ) 𝒫 m ( I ) denotes the interpolation operator and { L I , k } k = 0 m are the Lagrange polynomials

L I , k ( t ) := j = 0 j k m t - ξ I , k ( m ) ξ I , j ( m ) - ξ I , k ( m ) .

When considering two intervals I 1 and I 2 and a function f C ( I 1 × I 2 ) we define the one-sided interpolation operator dir , I 1 ( m ) as the operator that interpolates f only in I 1 , i.e.

dir , I 1 ( m ) := I 1 ( m ) Id .

Similarly we set

dir , I 2 ( m ) := Id I 2 ( m ) .

In addition, we consider the two-sided interpolation operator I 1 × I 2 ( m ) : C ( I 1 × I 2 ) 𝒫 m ( I 1 ) 𝒫 m ( I 2 ) which interpolates f in both variables and is defined by

I 1 × I 2 ( m ) := I 1 ( m ) I 2 ( m ) .

We will use interpolation for the approximation of the heat kernel (2.3) in the temporal variables t and τ.

In the spatial variables 𝒙 and 𝒚 we will use a truncated Chebyshev expansion as in [23, 17]. For this purpose we define the transformed Chebyshev polynomials T [ a , b ] , k := T k φ [ a , b ] - 1 for an interval [ a , b ] . For a box [ 𝒂 , 𝒃 ] = j = 1 3 [ a j , b j ] 3 and a multi-index 𝜿 0 3 we define in addition the tensor products

(3.1) T [ 𝒂 , 𝒃 ] , 𝜿 ( 𝒙 ) := j = 1 3 T [ a j , b j ] , κ j ( x j ) .

Let X 1 = [ 𝒂 , 𝒃 ] and X 2 = [ 𝒄 , 𝒅 ] be two boxes in 3 . A function f C ( X 1 × X 2 ) can be represented by the Chebyshev series

(3.2) f ( 𝒙 , 𝒚 ) = 𝜿 0 3 𝝂 0 3 f 𝜿 , 𝝂 T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) ,

where f 𝜿 , 𝝂 are the appropriate expansion coefficients that can be computed by using the orthogonality of the Chebyshev polynomials in a properly weighted inner product as in the one-dimensional case, see e.g. [21, Section 1.5]. We will specify these coefficients in the later application. By truncating the Chebyshev series we get the approximation

𝒮 X 1 × X 2 ( m ) [ f ] ( 𝒙 , 𝒚 ) = 𝜿 : | 𝜿 | m 𝝂 : | 𝜿 + 𝝂 | m f 𝜿 , 𝝂 T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) .

For the approximation of the heat kernel (2.3) in space and time we distinguish three cases. In all three cases we consider 4D boxes Z tar = X 1 × I 1 and Z src = X 2 × I 2 , where X 1 = [ 𝒄 , 𝒄 + 2 h ~ x 𝟏 ] and X 2 = [ 𝒅 , 𝒅 + 2 h ~ x 𝟏 ] are two fixed cubes in space with edge length 2 h ~ x . We distinguish three different configurations of the time intervals I 1 and I 2 .

Case 1: I 1 and I 2 are intervals of similar length and dist ( I 1 , I 2 ) | I 1 | .

We interpolate the heat kernel in both temporal variables t and τ and expand it in both spatial variables to get the approximation

(3.3) 𝒮 X 1 × X 2 ( m 𝒙 ) I 1 × I 2 ( m t ) [ G α ] ( 𝒙 , t , 𝒚 , τ ) = a , b = 0 m t | 𝜿 + 𝝂 | m 𝒙 E 𝜿 , 𝝂 a , b T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) L I 1 , b ( t ) L I 2 , a ( τ )

for ( 𝒙 , t ) , ( 𝒚 , τ ) Z tar × Z src . The coefficients E 𝜿 , 𝝂 a , b correspond to the expansion coefficients in (3.2) when expanding the function ( 𝒙 , 𝒚 ) G α ( 𝒙 - 𝒚 , ξ I 1 , b ( m t ) - ξ I 2 , a ( m t ) ) . They are given by

(3.4)

(3.4a) E 𝜿 , 𝝂 a , b := 1 ( 4 π α ( ξ I 1 , b ( m t ) - ξ I 2 , a ( m t ) ) ) 3 2 j = 1 3 E κ j , ν j ( r j , d a , b ) , with
(3.4b) r j := c j - d j h ~ x , d a , b := 4 α ( ξ I 1 , b ( m t ) - ξ I 2 , a ( m t ) ) h ~ x 2 ,
(3.4c) E k , ( r , d a , b ) := λ k λ l π 2 - 1 1 - 1 1 exp ( - | r + x ^ - y ^ | 2 d a , b ) T ( x ^ ) T k ( y ^ ) w ( x ^ ) w ( y ^ ) d x ^ d y ^ ,

where λ 0 = 1 , λ k = 2 for all k > 0 and w ( x ) := ( 1 - x ) - 1 2 , cf. [23, Section 5.3, p. 209]. Note that there is no closed formula known for the evaluation of the integrals in (3.4c). Therefore, we approximate them by a Gauß–Chebyshev quadrature rule in the application as proposed in [24, p. 3563] which yields

E k , ( r , d a , b ) λ k λ l ( m 𝒙 + 1 ) 2 n , m = 0 m 𝒙 exp ( - | r + ξ n ( m 𝒙 ) - ξ m ( m 𝒙 ) | 2 d a , b ) T ( ξ n ( m 𝒙 ) ) T k ( ξ m ( m 𝒙 ) ) .

Case 2: The source interval I 2 is smaller than the target interval I 1 and dist ( I 1 , I 2 ) | I 2 | .

In this case we interpolate the heat kernel only in the variable τ I 2 , but still expand it in both spatial variables to get

(3.5) 𝒮 X 1 × X 2 ( m 𝒙 ) dir , I 2 ( m t ) [ G α ] ( 𝒙 , t , 𝒚 , τ ) = a = 0 m t | 𝜿 + 𝝂 | m 𝒙 E 𝜿 , 𝝂 a , ( t ) T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) L I 2 , a ( τ )

for ( 𝒙 , t ) , ( 𝒚 , τ ) Z tar × Z src . The expansion coefficients E 𝜿 , 𝝂 a , ( t ) that depend on t I 1 are given by

E 𝜿 , 𝝂 a , ( t ) = 1 ( 4 π α ( t - ξ I 2 , a ( m t ) ) ) 3 2 j = 1 3 E κ j , ν j ( r j , 4 α ( t - ξ I 2 , a ( m t ) ) h ~ x 2 ) ,

where r j and E κ j , ν j are the same as in (3.4). The superscript of E 𝜿 , 𝝂 a , indicates that the corresponding approximation involves an interpolation in the source interval, but does not involve an interpolation in the target interval.

Case 3: The target interval I 1 is smaller than the source interval I 2 and dist ( I 1 , I 2 ) | I 1 | .

Then we interpolate the heat kernel in t I 1 and expand it in both spatial variables to get

(3.6) 𝒮 X 1 × X 2 ( m 𝒙 ) dir , I 1 ( m t ) [ G α ] ( 𝒙 , t , 𝒚 , τ ) = b = 0 m t | 𝜿 + 𝝂 | m 𝒙 E 𝜿 , 𝝂 , b ( τ ) T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) L I 1 , b ( t ) ,
E 𝜿 , 𝝂 , b ( τ ) = 1 ( 4 π α ( ξ I 1 , b ( m t ) - τ ) ) 3 2 j = 1 3 E κ j , ν j ( r j , 4 α ( ξ I 1 , b ( m t ) - τ ) h ~ x 2 )

for ( 𝒙 , t ) , ( 𝒚 , τ ) Z tar × Z src , where the expansion coefficients E 𝜿 , 𝝂 , b depend on τ I 2 .

In the following subsections we investigate the one- and two-sided interpolation error in time and the Chebyshev series truncation error in space for the heat kernel. These results will reveal how the edge length of the spatial boxes X 1 and X 2 and the temporal configuration of I 1 and I 2 influence the approximation error in the three cases described above. We will use this knowledge in the construction of the new time-adaptive FMM.

3.1 Analysis of the Interpolation in Time

The approximation error of an interpolation of the heat kernel ( ( 𝒙 , t ) , ( 𝒚 , τ ) ) G α ( 𝒙 - 𝒚 , t - τ ) in two separated time intervals decays exponentially with increasing interpolation degree, see e.g. [22, Section 3.1] or [17, p. 73]. We formulate this result in Theorem 1. In Theorem 2 we consider the case of two separated time intervals of different lengths and investigate the error of an interpolation in the shorter time interval only. We will compare the resulting estimates and use them to motivate the one-sided temporal expansions in Cases 2 and 3 above.

To simplify the notation we fix the heat capacity constant α and the spatial points 𝒙 and 𝒚 , and set

(3.7) g ( t 1 , t 2 ) := G α ( 𝒙 - 𝒚 , t 1 - t 2 ) = 1 ( 4 π α ( t 1 - t 2 ) ) 3 2 exp ( - r 2 4 α ( t 1 - t 2 ) )

for all t 1 > t 2 and r 2 = | 𝒙 - 𝒚 | 2 . The following property of g, which is often denoted as asymptotic smoothness (see e.g. [11, Definition 4.14]), will play a central role in the proofs of the interpolation error estimates.

Proposition 1 (cf. [17, p. 73]).

Let g be the kernel defined in (3.7). There exists a constant c as such that

(3.8) | t 1 k t 2 g ( t 1 , t 2 ) | c as ( k + ) ! ( k + + 1 ) 3 2 α 3 2 ( t 1 - t 2 ) k + + 3 2 for all  k , 0 and all  t 1 > t 2 .

Proof.

The proof follows the lines of [17, proof of Lemma 4.1] where an analogous result is shown for a similar function. Since g depends only on the difference of the variables t 1 and t 2 it suffices to bound | t 1 n g ( t 1 , 0 ) | for all t 1 > 0 to show (3.8). For n = 0 the estimate in (3.8) is clearly satisfied for all constants c as ( 4 π ) - 3 2 . For the case n 1 we consider the function z g ~ ( z ) := g ( z , 0 ) , which is holomorphic on 0 . Hence, we can use Cauchy’s integral formula to get

d n d z n g ~ ( t 1 ) = n ! 2 π i B ( t 1 , δ ) g ~ ( ζ ) ( ζ - t 1 ) ( n + 1 ) d ζ

for all t 1 > 0 and δ < t 1 , where B ( t 1 , δ ) is the disc with center t 1 and radius δ in . For all ζ B ( t 1 , δ ) there holds | ζ | ( ζ ) > ( t 1 - δ ) and ( - r 2 4 α ζ ) < 0 , so it follows that

| g ~ ( ζ ) | = 1 ( 4 π α ) 3 2 | 1 ζ 3 2 exp ( - r 2 4 α ζ ) | 1 ( 4 π α ) 3 2 1 ( t 1 - δ ) 3 2 .

Therefore, we can estimate

| d n d z n g ~ ( t 1 ) | n ! 2 π B ( t 1 , δ ) | g ~ ( ζ ) | | ζ - t 1 | ( n + 1 ) d ζ
n ! 2 π B ( t 1 , δ ) 1 ( 4 π α ( t 1 - δ ) ) 3 2 δ ( n + 1 ) d ζ = n ! ( 4 π α ( t 1 - δ ) ) 3 2 δ n .

The optimal radius δ to minimize this bound is δ = 2 n t 1 2 n + 3 , for which we get

| d n d z n g ~ ( t 1 ) | n ! ( 4 π α ) 3 2 ( 2 n + 3 3 ) 3 2 ( 1 + 3 2 n ) n 1 t 1 n + 3 2 e 3 2 ( 4 π ) 3 2 n ! ( n + 1 ) 3 2 α 3 2 t 1 n + 3 2 .

This is the desired estimate for all n 1 with c as = e 3 2 / ( 4 π ) 3 2 . ∎

Before we proceed with the main results of this section, we recall the definition of the Lebesgue constant

(3.9) Λ m := sup u C ( J ) { 0 } J ( m ) [ u ] , J u , J ,

i.e. the operator norm of the interpolation operator J ( m ) : C ( J ) 𝒫 m ( J ) . Note that Λ m does not depend on the particular interval J as long as the same interpolation points – up to an affine transformation – are used. In this work we always use transformed Chebyshev nodes as interpolation points. Therefore, we can estimate (cf. [21, Theorem 1.2])

(3.10) Λ m 2 π log ( m + 1 ) + 1 .

Theorem 1 (Two-Sided Interpolation Error, cf. [17, p. 73]).

Let η 2 > 0 and q 2 := 1 + 3 2 η 2 , and let I 1 = [ a 1 , b 1 ] and I 2 = [ a 2 , b 2 ] be two non-empty intervals such that a 1 > b 2 . Let the admissibility criterion

(3.11) η 2 dist ( I 1 , I 2 ) max { | I 1 | , | I 2 | }

be satisfied. Then there exists a constant c 2 > 0 such that

(3.12) g - I 1 × I 2 ( m ) [ g ] , I 1 × I 2 c 2 ( α dist ( I 1 , I 2 ) ) 3 2 q 2 - ( m + 1 ) .

Proof.

According to [2, Corollary 4.21] it suffices to show that there exist C g 0 , γ g > 0 and σ such that

(3.13) t k n g , I 1 × I 2 C g γ g n ( n + σ - 1 ) ! ( σ - 1 ) ! for all  n 0  and  k { 1 , 2 }

to conclude that

(3.14) g - I 1 × I 2 ( m ) [ g ] , I 1 × I 2 4 e C g ( Λ m + 1 ) 2 ( m + 2 ) σ ( 1 + max { | I 1 | , | I 2 | } γ g ) ϱ ( 2 γ g max { | I 1 | , | I 2 | } ) - ( m + 1 ) ,

where ϱ ( r ) := r + 1 + r 2 . From estimate (3.8) and the admissibility criterion (3.11) it follows that

t k n g , I 1 × I 2 max ( t 1 , t 2 ) I 1 × I 2 c as n ! ( n + 1 ) 3 2 α 3 2 ( t 1 - t 2 ) n + 3 2 = c as n ! ( n + 1 ) 3 2 α 3 2 dist ( I 1 , I 2 ) n + 3 2
c as n ! c β β n α 3 2 dist ( I 1 , I 2 ) 3 2 ( η 2 max { | I 1 | , | I 2 | } ) n ,

where β > 1 is arbitrary and c β is chosen such that ( n + 1 ) 3 2 c β β n for all n 0 . Thus, (3.13) is satisfied for the choice σ = 1 , γ g = max { | I 1 | , | I 2 | } / ( η 2 β ) and C g = c as c β / ( α dist ( I 1 , I 2 ) ) 3 2 . Since c β as β 1 , we fix β = 4 3 and a proper constant c β . Inserting these choices of σ, γ g and C g into (3.14) yields

g - I 1 × I 2 ( m ) [ g ] , I 1 × I 2 4 e c as c β ( Λ m + 1 ) 2 ( m + 2 ) ( α dist ( I 1 , I 2 ) ) 3 2 ( 1 + 4 3 η 2 ) ϱ ( 3 2 η 2 ) - ( m + 1 ) .

Similarly as in [2, Remark 4.23] we want to simplify this estimate. We set q 2 = 1 + 3 2 η 2 , ϱ 2 = ϱ ( 3 2 η 2 ) ) and μ 2 = ϱ 2 q 2 . Since ϱ ( r ) > 1 + r for all r , we see that μ 2 > 1 . Thus, we can rewrite the right-hand side of the above estimate as

4 e c as c β ( Λ m + 1 ) 2 ( m + 2 ) ( 1 + 4 η 2 3 ) μ 2 m + 1 1 ( α dist ( I 1 , I 2 ) ) 3 2 q 2 - ( m + 1 ) .

The first fraction is a zero sequence in m and can thus be bounded by a constant c 2 . In fact, its numerator grows like log ( m + 1 ) 2 ( m + 2 ) due to (3.10), while its denominator grows exponentially with increasing m. Therefore, we end up with the estimate in (3.12). ∎

Theorem 2 (One-Sided Interpolation Error).

Let η 1 > 0 and q 1 := 1 + 3 2 η 1 , and let I 1 = [ a 1 , b 1 ] and I 2 = [ a 2 , b 2 ] be two non-empty intervals such that a 1 > b 2 . Let the admissibility criterion

(3.15) η 1 dist ( I 1 , I 2 ) min { | I 1 | , | I 2 | }

be satisfied. Let k be such that I k is the shorter time interval. Then there exists a constant c 1 > 0 such that

(3.16) g - dir , I k ( m ) [ g ] , I 1 × I 2 c 1 ( α dist ( I 1 , I 2 ) ) 3 2 q 1 - ( m + 1 ) .

Proof.

The proof is very similar to the proof of Theorem 1. This time we apply [2, Lemma 4.19] which leads to the estimate

g - dir , I k ( m ) [ g ] , I 1 × I 2 2 e C g ( Λ m + 1 ) ( m + 2 ) σ ( 1 + | I k | γ g ) ϱ ( 2 γ g | I k | ) - ( m + 1 ) ,

with the same constants σ, γ g and C g and function ϱ as in the proof of Theorem 1. With the admissibility criterion (3.15) we get

g - dir , I k ( m ) [ g ] , I 1 × I 2 2 e C g ( Λ m + 1 ) ( m + 2 ) σ ( 1 + 4 3 η 1 ) ϱ ( 3 2 η 1 ) - ( m + 1 ) ,

which can be simplified in the same way as the corresponding result in the proof of Theorem 1 to show (3.16). ∎

Remark 1.

The results in Theorems 1 and 2 regarding the two- and one-sided interpolation errors are very similar. In both situations the error decreases exponentially with increasing interpolation degree m. The convergence rate depends on the constants η 2 and η 1 from the admissibility criteria (3.11) and (3.15), respectively, and suffers if they become large. For two fixed, separate intervals I 1 , I 2 we can choose η 2 = max { | I 1 | , | I 2 } / dist ( I 1 , I 2 ) and η 1 = min { | I 1 | , | I 2 | } / dist ( I 1 , I 2 ) as the smallest constants for which (3.11) and (3.15) hold. Hence, in all situations where max { | I 1 | , | I 2 | } / dist ( I 1 , I 2 ) is considerably larger than min { | I 1 | , | I 2 | } / dist ( I 1 , I 2 ) the one-sided interpolation in the shorter time interval is expected to perform significantly better than the two-sided interpolation.

3.2 Analysis of the Expansion in Space

The approximation error for the truncated Chebyshev expansion of the heat kernel for a fixed temporal setting has been investigated in [23, 17]. Here we collect and slightly improve those results. We start by citing an estimate of the absolute value of the expansion coefficients defined in (3.4c).

Lemma 1 (cf. [23, Lemma 1]).

For the coefficients E k , ( r , δ ) in (3.4c) there holds

(3.17) | E k , ( r , δ ) | λ k λ a k + exp ( 1 δ ( a - 1 a ) 2 ) ,

where a R > 0 can be chosen arbitrarily.

Remark 2.

The estimate in (3.17) is slightly sharper than the one in the referenced paper where an additional multiplicative factor 4 is included in the argument of the exponential function. This factor can be dropped in the last step of the proof in [23], see [15, Satz 4.13].

Theorem 3 (Chebyshev Series Truncation Error, cf. [23, p. 209] and [17, Corollary 4.1]).

Let c ~ st R > 0 . Let X 1 and X 2 be two cubes with edge length 2 h ~ x . Let t , τ R such that t > τ and

(3.18) 4 α ( t - τ ) h ~ x 2 c ~ st .

Let the function σ : R > 0 R be defined by

(3.19) σ ( s ) := 1 2 ( ln ( s + 1 + s 2 ) - s 2 + 1 - 1 s ) .

Then there exists a constant c 𝐱 > 0 such that

(3.20) ( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ( 𝒙 - 𝒚 , t - τ ) ] , X 1 × X 2 c 𝒙 ( m 𝒙 + 2 ) 5 ( α ( t - τ ) ) 3 2 exp ( - ( m 𝒙 + 1 ) σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) .

Proof.

The proof is based on the proof of [17, Corollary 4.1] and the idea to minimize the right-hand side of (3.17) in [24, p. 3551], where the function κ corresponds to σ here. To estimate the error we represent it as the remainder of the Chebyshev series, namely

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ( 𝒙 - 𝒚 , t - τ ) ] ( 𝒙 , 𝒚 ) = n = m 𝒙 + 1 | 𝜿 + 𝝂 | = n E 𝜿 , 𝝂 - ( t , τ ) T X 1 , 𝝂 ( 𝒙 ) T X 2 , 𝜿 ( 𝒚 ) ,

where

E 𝜿 , 𝝂 - ( t , τ ) = ( 4 π α ( t - τ ) ) - 3 2 j = 1 3 E κ j , ν j ( r j , δ )

with δ = 4 α ( t - τ ) / ( h ~ x 2 ) and r j and E κ j , ν j ( r j , δ ) from (3.4). The polynomials T X 1 , 𝝂 and T X 2 , 𝜿 are tensor products of transformed Chebyshev polynomials as defined in (3.1) and, thus, there holds | T X 1 , 𝝂 ( 𝒙 ) | 1 for all 𝒙 X 1 and all 𝝂 0 3 and the same for T X 2 , 𝜿 in X 2 . Hence, we can estimate

(3.21) ( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ( 𝒙 - 𝒚 , t - τ ) ] , X 1 × X 2 n = m 𝒙 + 1 | 𝜿 + 𝝂 | = n | E 𝜿 , 𝝂 - ( t , τ ) | .

For a given n and 𝜿 , 𝝂 0 3 such that | 𝜿 + 𝝂 | = n the estimate in (3.17) yields

( 4 π α ( t - τ ) ) 3 2 | E 𝜿 , 𝝂 - ( t , τ ) | exp ( 1 δ ( a n - 1 a n ) 2 ) 3 j = 1 3 λ κ j λ ν j a n κ j + ν j 64 a n n exp ( 3 δ ( a n - 1 a n ) 2 ) ,

where a n > 0 can be chosen arbitrarily. To find the optimal a n , we minimize the function f n on the right-hand side, which can be rewritten in the form

f n ( a n ) = 64 exp ( - n ln ( a n ) + 3 δ ( a n - 1 a n ) 2 ) .

From the necessary condition f n ( a n ) = 0 we obtain

a n = n δ 12 + 1 + ( n δ 12 ) 2 ,

which is indeed a minimizer of f n in > 0 since f n becomes unbounded for a n tending to zero or infinity. Therefore, the minimum of f n is given by

f n ( a n ) = 64 exp ( - n ( ln ( a n ) - 12 4 n δ ( a n - 1 a n ) 2 ) ) = 64 exp ( - n σ ( n δ 12 ) )

with the function σ from (3.19) and we obtain

| E 𝜿 , 𝝂 - ( t , τ ) | 64 ( 4 π α ( t - τ ) ) 3 2 exp ( - n σ ( n δ 12 ) ) .

By using this estimate and # { ( 𝜿 , 𝝂 ) 0 3 × 0 3 : | 𝜿 + 𝝂 | = n } = ( n + 5 5 ) , we can further estimate the approximation error in (3.21) by

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ( 𝒙 - 𝒚 , t - τ ) ] , X 1 × X 2 64 ( 4 π α ( t - τ ) ) 3 2 n = m 𝒙 + 1 ( n + 5 5 ) exp ( - n σ ( n δ 12 ) ) .

Standard arguments of calculus show that the function σ is monotonically increasing. Hence, we can use (3.18), which corresponds to the estimate δ = 4 α ( t - τ ) / h ~ x 2 c ~ st , to get for all n > m 𝒙 ,

exp ( - n σ ( n δ 12 ) ) exp ( - n σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) = q n , where  q := exp ( - σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) .

The series n = m 𝒙 + 1 5 ! ( n + 5 5 ) q n is the remainder of the fifth derivative of the geometric series n = 0 q n truncated after ( m 𝒙 + 1 ) terms. It can be estimated by

n = m 𝒙 + 1 5 ! ( n + 5 5 ) q n 5 ! ( m 𝒙 + 2 ) 5 ( 1 - q ) 6 q m 𝒙 + 1 .

The variable q converges monotonically to 0 as m 𝒙 so we can bound ( 1 - q ) - 6 by a constant c for all m 𝒙 0 . We conclude

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ( 𝒙 - 𝒚 , t - τ ) ] , X 1 × X 2 64 ( 4 π α ( t - τ ) ) 3 2 c ( m 𝒙 + 2 ) 5 q m 𝒙 + 1

which is (3.20) with c 𝒙 = ( 64 c ) ( 4 π ) - 3 2 . ∎

Remark 3.

The behavior of σ in (3.19) determines the approximation error in (3.20). In the proof of Theorem 3 we have already pointed out that σ is monotonically increasing. Furthermore, there holds σ ( s ) s 4 for s 0 and σ ( s ) ln ( 2 s ) 2 for s (see [24, p. 3551]). Thus, the convergence in (3.20) is super-exponential in m 𝒙 . Note that the approximation quality depends on the constant c ~ st in (3.18), which appears in the argument of σ. For small values of c ~ st the estimate suffers. Thus, we have to ensure that (3.18) holds for a reasonably large constant c ~ st in the later application. This means that we have to adapt the size of the cubes to the temporal configuration.

3.3 The Space-Time Approximation Error

By combining the results from the previous two sections we can estimate the approximation error of the expansions (3.3), (3.5) and (3.6) in Z tar × Z src .

Theorem 4 (Full Space-Time Expansion Error, [17, cf. Lemma 7.4]).

Let c ~ st , η 2 R > 0 and q 2 := 1 + 3 2 η 2 . Let Λ m t be the Lebesgue constant defined in (3.9) and let σ be the function in (3.19). Let I 1 = [ a 1 , b 1 ] and I 2 = [ a 2 , b 2 ] be two non-empty intervals with a 1 > b 2 that satisfy the admissibility criterion (3.11). Let X 1 and X 2 be two cubes in R 3 with edge length 2 h ~ x such that

(3.22) 4 α dist ( I 1 , I 2 ) h ~ x 2 c ~ st .

Let Z tar = X 1 × I 1 and Z src = X 2 × I 2 . Then there exist constants c 2 , c 𝐱 R > 0 such that

(3.23)

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) I 1 × I 2 ( m t ) ) [ G α ] , Z tar × Z src
1 ( α dist ( I 1 , I 2 ) ) 3 2 ( c 2 q 2 - ( m t + 1 ) + c 𝒙 Λ m t 2 ( m 𝒙 + 2 ) 5 exp ( - ( m 𝒙 + 1 ) σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) ) .

Proof.

Throughout this proof we use to denote , Z tar × Z src to shorten the notation. We start to estimate the approximation error by adding and subtracting I 1 × I 2 ( m t ) [ G α ] and using the triangle inequality to get

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) I 1 × I 2 ( m t ) ) [ G α ] ( Id - I 1 × I 2 ( m t ) ) [ G α ] + ( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) I 1 × I 2 ( m t ) [ G α ] .

The first term can be further estimated by using Theorem 1. In fact, the function g defined in (3.7) which is considered in that theorem is just the heat kernel for fixed spatial points 𝒙 and 𝒚 and the estimate does not depend on these points. Thus,

( Id - I 1 × I 2 ( m t ) ) [ G α ] c 2 ( α dist ( I 1 , I 2 ) ) 3 2 q 2 - ( m t + 1 )

with c 2 from Theorem 1. For the second term we observe that

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) I 1 × I 2 ( m t ) [ G α ] sup f C ( Z tar × Z src ) { 0 } I 1 × I 2 ( m t ) [ f ] f ( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ] ,

where we used that the operators 𝒮 X 1 × X 2 ( m 𝒙 ) and I 1 × I 2 ( m t ) commute. The operator norm of I 1 × I 2 ( m t ) is bounded by Λ m t 2 , which follows from (3.9). For the other term we use assumption (3.22) and apply Theorem 3 to get

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) ) [ G α ] sup ( t , τ ) I 1 × I 2 { c 𝒙 ( m 𝒙 + 2 ) 5 ( α ( t - τ ) ) 3 2 exp ( - ( m 𝒙 + 1 ) σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) }
c 𝒙 ( m 𝒙 + 2 ) 5 ( α dist ( I 1 , I 2 ) ) 3 2 exp ( - ( m 𝒙 + 1 ) σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) .

Combining all these estimates completes the proof. ∎

By using the result from Theorem 2 one can prove the following approximation error estimates for the expansions (3.5) and (3.6) in an analogous way.

Theorem 5 (Temporally One-Sided Space-Time Expansion Errors).

Let c ~ st , η 1 R > 0 and set q 1 := 1 + 3 2 η 1 . Let I 1 = [ a 1 , b 1 ] and I 2 = [ a 2 , b 2 ] be two non-empty intervals with a 1 > b 2 that satisfy the one-sided admissibility criterion (3.15). Let X 1 and X 2 be two cubes in R 3 with edge length 2 h ~ x such that criterion (3.22) holds. Let Z tar = X 1 × I 1 and Z src = X 2 × I 2 . Let k { 1 , 2 } be such that I k is the shorter time interval. Then there exist constants c 1 , c 𝐱 R > 0 such that

(3.24)

( Id - 𝒮 X 1 × X 2 ( m 𝒙 ) dir , I k ( m t ) ) [ G α ] , Z tar × Z src
1 ( α dist ( I 1 , I 2 ) ) 3 2 ( c 1 q 1 - ( m t + 1 ) + c 𝒙 Λ m t ( m 𝒙 + 2 ) 5 exp ( - ( m 𝒙 + 1 ) σ ( ( m 𝒙 + 1 ) c ~ st 12 ) ) ) .

Theorems 4 and 5 show that we can control the approximation error of the expansions in (3.3), (3.5) and (3.6) if we can control the respective errors in time and space separately. The temporally single sided expansions (3.5) and (3.6) are favorable in those situations, where the one-sided interpolation performs better than the two-sided interpolation, see Remark 1. The spatial sizes of the boxes Z tar and Z src have to be adapted to the distance of their temporal components according to the criterion (3.22) for all three expansions.

Note that in the previous works [25, 22, 23, 17] a criterion different from (3.22) is used to determine the proper spatial size of the boxes for the expansion (3.3). There it is required that there exists a constant c st such that

(3.25) h ~ x 2 4 α h ~ t c st

holds for each space-time box, where h ~ x denotes the half length of the edges of its spatial component and h ~ t the half length of its temporal interval. However, the criterion (3.22) follows from (3.25) for two boxes Z tar = X 1 × I 1 and Z src = X 2 × I 2 if the admissibility criterion (3.11) holds in addition. Indeed,

4 α dist ( I 1 , I 2 ) h ~ x 2 4 α max { | I 1 | , | I 2 | } h ~ x 2 η 2 2 c st η 2 ,

which is (3.22) with c ~ s t = 2 ( c st η 2 ) - 1 . In the setting of the temporally one-sided expansions (3.5) and (3.6) the situation is similar. If criterion (3.25) is satisfied for the space-time cluster with the smaller time interval, the spatial sizes of the two boxes coincide, and the admissibility criterion (3.15) is satisfied, then (3.22) follows as above. Therefore, we will use the criterion (3.25) instead of (3.22) in the following sections to allow for an easier comparison of this paper with previous ones.

4 A Time-Adaptive Space-Time FMM Algorithm

In this section we describe a new variant of the FMM presented in [25] and [22, 23, 17] which improves the performance for meshes which are adaptive in time. Throughout this section we focus on the matrix 𝖵 h , but other BEM matrices of the heat equation like the double layer operator 𝖪 h can be handled as well with slight modifications. In a first step, we create a hierarchy of space-time boxes that subdivides the elements in the space-time tensor product mesh Σ h . This hierarchy, which we denote as box cluster tree, will allow us to find a suitable partition of the BEM matrices, which forms the backbone of our FMM.

4.1 A Four-Dimensional Space-Time Box Cluster Tree

The following construction of space-time box cluster trees is similar to the one which we described in [25]. The idea is to create a hierarchy of boxes that partition the space-time tensor product mesh Σ h by recursively subdividing an initial box Z ( 0 ) that contains the whole mesh. Here we introduce new purely spatial subdivisions of certain boxes that will enable adaptive FMM operations in the later algorithm. Note that from here on we consider half-open boxes Z = ( 𝒂 , 𝒃 ] × ( c , d ] 4 , since they allow for a simple, non-overlapping partition into smaller half-open boxes.

A space-time boundary element σ j t , j 𝒙 = γ j 𝒙 × ( t j t - 1 , t j t ) of a given mesh Σ h is assigned to a box Z = X × I 4 if its geometrical center is contained in Z. By Z ^ we denote the set of all indices ( j t , j 𝒙 ) corresponding to elements σ j t , j 𝒙 Σ h that are assigned to Z. We use the symbol # Z ^ for the cardinality of Z ^ and additionally introduce n t ( Z ^ ) as the number of all distinct time-indices in Z ^ . For example, if Z is a box with n t ( Z ^ ) = 1 , all space-time boundary elements that are assigned to Z share the same temporal component. Such boxes play a central role in the new time-adaptive FMM which motivates the following definition.

Definition 1.

A box Z 𝒯 Σ is called temporally indivisible if n t ( Z ^ ) = 1 .

Algorithm 1 (Construction of a 4D space-time box cluster tree T Σ for Σ h .).

(4.1)

Algorithm 1 describes the recursive construction of a space-time box cluster tree 𝒯 Σ . The concrete subdivision steps in line 8, 10 and 12 for a box Z = ( 𝒂 , 𝒃 ] × ( c , d ] are executed as follows:

  1. Purely temporal subdivision (line 8): We split the time interval ( c , d ] into ( c , c ~ ] and ( c ~ , d ] , where c ~ is the grid point t j of the time partition closest to ( c + d ) / 2 . Then we subdivide Z into the two boxes Z 1 = ( 𝒂 , 𝒃 ] × ( c , c ~ ] and Z 2 = ( 𝒂 , 𝒃 ] × ( c ~ , d ] .

  2. Purely spatial subdivision (line 12): The box ( 𝒂 , 𝒃 ] is uniformly split into eight boxes ( 𝒂 , 𝒂 ~ ] , , ( 𝒂 ~ , 𝒃 ] , where 𝒂 ~ = 1 2 ( 𝒂 + 𝒃 ) is the center of ( 𝒂 , 𝒃 ] . By combining each of these spatial boxes with the whole time interval ( c , d ] one obtains a spatial subdivision of Z.

  3. Space-time subdivision (line 10): We subdivide the time interval as in (a) and the spatial box as in (b). The sixteen children of Z are obtained by considering all possible combinations of the resulting time intervals and spatial boxes.

In general, we switch between purely temporal and space-time subdivisions of boxes in the tree construction due to criterion (3.25), at least after some initial, purely temporal subdivisions. A purely spatial subdivision of a cluster is only executed if it cannot be further subdivided in time. Note that such a cluster is not subdivided anymore in the standard algorithm in [25]. The additional spatial subdivisions here will allow for some new efficient operations in the FMM.

By construction, the temporal part of a space-time boundary element σ j t , j 𝒙 is always fully contained in a box Z if the element is assigned to this box. To guarantee the same property in space, we pad the spatial boxes in a uniform way. More details about this padding and the handling of other exceptional situations are given in [25].

In the rest of the paper we will use the terms box and cluster interchangeably when we speak about boxes in the tree 𝒯 Σ . For a box Z 𝒯 Σ we denote its parent in the tree by par ( Z ) and the set of all its children by child ( Z ) . The temporal level t ( Z ) of Z is defined as the number of performed temporal subdivisions starting from the root to obtain Z and the spatial level x ( Z ) as the number of spatial subdivisions. Due to our subdivision strategy the temporal level t ( Z ) of a box corresponds to its actual level ( level ( Z ) ) in the tree, unless Z is obtained from a purely spatial subdivision of its parent. The largest level in the tree 𝒯 Σ is denoted by depth ( 𝒯 Σ ) .

4.2 Matrix Partitioning Using Interaction Lists

The boxes in the space-time box cluster tree 𝒯 Σ are used to partition the BEM matrix 𝖵 h . For two given boxes Z tar = X tar × I tar and Z src = X src × I src in 𝒯 Σ let 𝖵 h | Z ^ tar × Z ^ src be the block of 𝖵 h formed by the rows corresponding to indices ( j t , j 𝒙 ) Z ^ tar and the columns corresponding to indices ( k t , k 𝒙 ) Z ^ src . Let Z ( 0 ) be the root of 𝒯 Σ . The block 𝖵 h | Z ^ ( 0 ) × Z ^ ( 0 ) corresponds to the full matrix 𝖵 h . We can subdivide it into n c 2 blocks 𝖵 h | Z ^ i ( 1 ) × Z ^ j ( 1 ) , where { Z j ( 1 ) } j = 1 n c are the children of Z ( 0 ) in 𝒯 Σ . For the FMM we recursively subdivide the matrix into blocks in this manner. The subdivision is stopped for a block 𝖵 h | Z ^ tar × Z ^ src if Z tar and Z src allow for one of the kernel approximations in (3.3), (3.5) or (3.6) due to a proper temporal separation. Such a block is called admissible. We will see in Section 4.3 how to approximate admissible blocks of 𝖵 h efficiently. All other blocks are called inadmissible and are further subdivided as long as Z tar and Z src are not both temporally indivisible. The preferred kernel approximation is (3.3), which is why both the source and target boxes are subdivided for a typical block refinement, if these box subdivisions include a temporal subdivision. If either the source box or the target box is temporally indivisible (recall Definition 1), the respective other box is further subdivided to detect admissible blocks related to the kernel approximations (3.5) or (3.6). These one-sided subdivisions are new compared to the previous works [25, 23, 17] in the setting of the heat equation and inspired by the adaptive FMM in [3, 4, 20]. In the following we describe the full partitioning scheme in more detail.

In Theorem 4 we have shown that the full expansion (3.3) is well-suited for Z tar and Z src if the temporal intervals I tar and I src satisfy the standard admissibility criterion (3.11) for a given constant η 2 > 0 . This is checked explicitly during the recursive construction of the blocks. For the temporally one-sided expansions (3.5) and (3.6) we have to explicitly check that the criterion (3.15) is satisfied for a given η 1 > 0 . The criterion (3.25) will be satisfied in all three cases for a suitable constant c st > 0 due to the construction of boxes of the space-time box cluster tree 𝒯 Σ in Algorithm 1 and the subdivision of blocks, which we will describe in more detail in Algorithm 2.

The partitioning of 𝖵 h is further influenced by the causality and the exponential decay of the heat kernel G α in (2.3). Due to the causality, G α ( 𝒙 - 𝒚 , t - τ ) vanishes for all ( 𝒙 , t ) Z tar and ( 𝒚 , τ ) Z src with I tar = ( a 1 , b 1 ] and I src = ( a 2 , b 2 ] if a 2 b 1 . We say that I src is causally relevant for I tar in the contrary case, i.e. if a 2 < b 1 . If I src is not causally relevant for I tar , all entries of the block 𝖵 h | Z ^ tar × Z ^ src are zero and we can ignore it. This is related to the lower triangular block structure of 𝖵 h .

The exponential decay of the heat kernel allows to neglect interactions between boxes that are sufficiently separated in space, which we identify as follows. Let Z tar be a box in 𝒯 Σ with spatial level 𝒙 = 𝒙 ( Z ) . Due to the uniform subdivision strategy employed in the construction of 𝒯 Σ , X tar is one of 8 𝒙 possible boxes in a regular grid 𝒢 𝒙 . We can label the boxes in this grid with multi-indices in { 0 , , 2 𝒙 - 1 } 3 in a regular way (ascending componentwise) and use these indices to measure the distance of boxes. For two spatial boxes X and Y in 𝒢 𝒙 with corresponding multi-indices 𝝃 and 𝜻 we define the grid distance of X and Y by

(4.2) griddist ( X , Y ) := max j { | ξ j - ζ j | } .

For a fixed truncation parameter n tr we define the interaction area 𝒜 ( X ) of X in 𝒢 𝒙 by

(4.3) 𝒜 ( X ) := { Y 𝒢 𝒙 : griddist ( X , Y ) n tr } .

In the partitioning of 𝖵 h we will ignore blocks 𝖵 h | Z ^ tar × Z ^ src if X src 𝒜 ( X tar ) . From [23, Section 5.4] it follows that a single, properly chosen truncation parameter n tr can be used for the definition of the interaction area and the related truncation for various pairs of clusters Z tar and Z src . A suitable requirement is that the ratio h ~ 𝒙 2 / max { | I tar | , | I src | } of these clusters is similar, where h ~ 𝒙 denotes the spatial half length of Z tar and Z src . This will be the case in the partitioning of 𝖵 h for clusters Z src and Z tar in the part of 𝒯 Σ where we alternate between purely temporal and space-time refinements.

To keep track of the blocks 𝖵 h | Z ^ tar × Z ^ src in the final partition of 𝖵 h , we introduce four different interaction lists for clusters Z tar in 𝒯 Σ . The standard M2L interaction list M2L ( Z tar ) will include clusters Z src such that the full space-time expansion (3.3) is admissible for Z tar and Z src but not for their parents. For the new temporally one-sided expansion (3.5) in the source interval we introduce the M2Lx interaction list M2Lx ( Z tar ) and for the temporally one-sided expansion (3.6) in the target interval the Mx2L interaction list Mx2L ( Z tar ) . The nomenclature for these lists is related to the associated FMM operations, which we introduce in Section 4.3. The last list is the nearfield 𝒩 ( Z tar ) which contains clusters Z src corresponding to blocks 𝖵 h | Z ^ tar × Z ^ src which are inadmissible and cannot be subdivided further. For a given cluster Z tar some or all of these lists might be empty. The construction of all lists of all clusters in 𝒯 Σ is done during the recursive subdivision of blocks of 𝖵 h which we described above and present in full detail in Algorithms 2 and 3.

Algorithm 2 (Recursive construction of the interaction lists in T Σ .).

(4.4)

The main routine DetermineOperationLists in Algorithm 2 realizes the recursive subdivision of blocks. If the spatial component X src is not in the interaction area 𝒜 ( X tar ) of X tar , the corresponding block can be neglected and the subdivision is stopped (line 3). Otherwise it is checked whether the admissibility criterion (3.11) is satisfied (line 4) in which case Z src is added to the proper interaction list M2L ( Z tar ) . In the contrary case, we want to subdivide the block.

The desired subdivision strategy for an inadmissible block is to consider all pairs of children of Z src and Z tar , check whether the corresponding blocks are non-zero due to causality (line 10), and call the routine DetermineOperationLists for the children if this is the case (line 11). Leaf clusters and temporally indivisible clusters, whose children are obtained by purely spatial subdivisions, are treated separately. If only one of the clusters is temporally indivisible and the other is not a leaf, we can still subdivide the block and check if the resulting blocks are admissible. To motivate this, let e.g. Z src be temporally indivisible and Z tar be neither a leaf nor temporally indivisible. We can consider the blocks of 𝖵 h corresponding to Z src and Z tar , c child ( Z tar ) for which a temporally one-sided expansion (3.6) instead of the full expansion (3.3) may be admissible, since the children of Z tar are subdivided with respect to time. Thus, such clusters are treated separately in the subroutine DetermineMx2LAndNearfieldLists in line 15, which is described in Algorithm 3. Similarly, we treat temporally indivisible clusters Z tar and non-leaf clusters Z src which are not temporally indivisible separately in the subroutine DetermineM2LxAndNearfieldLists in line 22, see Algorithm 3 as well. In all other cases, further subdivisions of the inadmissible block are not possible or would not yield efficiently approximable sub-blocks anymore, so we add Z src to the nearfield 𝒩 ( Z tar ) of Z tar (lines 17, 24 and 26).

Algorithm 3 (The subroutines to determine Mx2L and M2Lx interaction lists by asymmetric subdivisions.).

(4.5)

The routine DetermineMx2LAndNearfieldLists in Algorithm 3 is used to determine pairs of clusters Z src and Z tar and corresponding matrix blocks, for which the approximation (3.6) is applied. It is initially called in line 15 of Algorithm 2 for a temporally indivisible cluster Z src and a target cluster Z tar such that t ( Z tar ) > t ( Z src ) . The main goal of the routine is to recursively subdivide the target cluster until the admissibility criterion (3.15) is satisfied for the temporal components of the resulting clusters or until a final inadmissible block is detected. This is the content of lines 9–16 of Algorithm 3. For the temporally one-sided expansion (3.6) the spatial lengths of Z src and Z tar have to match, which is the case if their spatial levels coincide. The right spatial level is the one of Z tar . This follows from the last paragraph of Section 3.3 since the spatial and temporal half lengths h ~ x and h ~ t of Z tar satisfy (3.25) due to the construction of 𝒯 Σ . It may be necessary to adapt the spatial level 𝒙 ( Z src ) of Z src , which is done in lines 2–7 of Algorithm 3. If 𝒙 ( Z src ) does not match 𝒙 ( Z tar ) and Z src still has spatially refined children, the routine is called recursively for all these children (line 7). If Z src is a leaf in 𝒯 Σ and its spatial level does not match the one of Z tar , we add it to the nearfield of Z tar (line 4) regardless of whether the temporal components satisfy (3.15) or not. Note that we do not check for an additional spatial truncation in Algorithm 3. This is motivated by the fact, that max { | I src | , | I tar | } does not (or at least not significantly) change when only the target cluster is refined. Therefore, an additional spatial truncation is not reasonable.

The routine DetermineM2LxAndNearfieldLists in Algorithm 3 is the analogue of DetermineMx2LAndNearfieldLists when the roles of source and target clusters are interchanged, i.e. when Z tar is temporally indivisible while Z src is not. In this case we want to find pairs of clusters Z src and Z tar and corresponding matrix blocks, for which the approximation (3.5) is admissible. The recursion strategy including the choice of spatial levels is completely analogous to the one in DetermineM2LxAndNearfieldLists.

Note that a purely spatially refined cluster Z src in the tree 𝒯 Σ can only be considered as a source cluster in the subroutine DetermineMx2LAndNearfieldLists in Algorithm 3, if the corresponding target cluster Z tar has the same spatial level. In particular, there might exist spatially refined clusters, which are never considered in this routine. For the same reason such clusters might never be considered as target clusters in the subroutine DetermineM2LXandNearfieldLists in Algorithm 3. Hence, 𝒯 Σ might contain clusters that are never visited in the routine DetermineOperationLists and its subroutines. These clusters (and all their descendants) are not relevant for the FMM and can be removed from the tree 𝒯 Σ . In the rest of the paper we will assume that such clusters do not exist or have already been removed.

The difference between the standard partitioning scheme in e.g. [25, 23, 17] and the one described here is the treatment of temporally indivisible clusters. In the standard version, such clusters are leaves in the space-time cluster tree and the corresponding inadmissible blocks cannot be subdivided any further. This would correspond to a version of Algorithm 2, where Z src is added to the nearfield 𝒩 ( Z tar ) of Z tar not only in line 23, but also in lines 13 and 20. In this work we allow for a further subdivision and more efficient handling of certain blocks by introducing the purely spatial subdivisions of clusters in the construction of 𝒯 Σ , the temporally one-sided expansions (3.5) and (3.6), and the new routines in Algorithm 3. Note that the description of the interaction lists differs in the previous works [25, 23, 17]. Here we use a more algorithmic description.

4.3 Approximation of Matrix Blocks

In this subsection we describe how to use the kernel expansions (3.3), (3.5) and (3.6) to efficiently approximate the admissible sub-blocks of 𝖵 h from the partition constructed in Section 4.2. This will lead to the adaptive FMM algorithm for the matrix-vector multiplication 𝒇 = 𝖵 h 𝒒 , which we will present in Section 4.5.

Let us consider the application of a block 𝖵 h | Z ^ tar × Z ^ src to the appropriate part 𝒒 | Z ^ src of a vector 𝒒 . The result is denoted by 𝒇 ~ | Z ^ tar and is given by

(4.6) f ~ k t , k 𝒙 = ( j t , j 𝒙 ) Z ^ src q j t , j 𝒙 t k t - 1 t k t γ k 𝒙 t j t - 1 t j t γ j 𝒙 G α ( 𝒙 - 𝒚 , t - τ ) d 𝒔 𝒚 d τ d 𝒔 𝒙 d t

for all indices ( k t , k 𝒙 ) Z ^ tar . Recall that we use pairs ( k t , k 𝒙 ) as indices for vectors, see Section 2. We can approximate this operation in an efficient way if Z src is contained in the standard interaction list M2L ( Z tar ) or one of the new lists Mx2L ( Z tar ) or M2Lx ( Z tar ) . The actual approximations depend on the particular list and are discussed in the following. Most of the occurring quantities in this discussion have been introduced in Section 3.

Case 1: Let Z src M2L ( Z tar ) .

We replace the heat kernel in (4.6) by the full space-time expansion in (3.3). This is the standard approximation from [22, 23, 17]. As in [25, Section 3.4] we compute the result in three steps.

Step 1: S2M. For a { 0 , , m t } and a multi-index 𝜿 0 3 with | 𝜿 | m 𝒙 compute the moments 𝝁 ( Z src ) by

(4.7) μ a , 𝜿 ( Z src ) = ( j t , j 𝒙 ) Z ^ src q j t , j 𝒙 t j t - 1 t j t γ j 𝒙 T X src , 𝜿 ( 𝒚 ) L I src , a ( τ ) d 𝒔 𝒚 d τ .

Step 2: M2L. For b { 0 , , m t } and 𝝂 0 3 with | 𝝂 | m 𝒙 compute the related local contributions 𝝀 ( Z tar ) by

(4.8) λ b , 𝝂 ( Z tar ) = a = 0 m t | 𝜿 + 𝝂 | m 𝒙 E 𝜿 , 𝝂 a , b μ a , 𝜿 ( Z src ) .

Step 3: L2T. For all ( k t , k 𝒙 ) in Z ^ tar evaluate

(4.9) f ~ k t , k 𝒙 b = 0 m t | 𝝂 | m 𝒙 λ b , 𝝂 ( Z tar ) t k t - 1 t k t γ k 𝒙 T X tar , 𝝂 ( 𝒙 ) L I tar , b ( t ) d 𝒔 𝒙 d t .

Case 2: Let Z src M2Lx ( Z tar ) .

By construction, Z tar is temporally indivisible in this case and, therefore, Z ^ tar contains only a single time index k t . We use the temporally one-sided approximation (3.5) to replace the heat kernel in (4.6) and get

(4.10) f ~ k t , k 𝒙 t k t - 1 t k t γ k 𝒙 a = 0 m t | 𝜿 + 𝝂 | m 𝒙 E 𝜿 , 𝝂 a , ( t ) T X tar , 𝝂 ( 𝒙 ) d 𝒔 𝒙 d t μ a , 𝜿 ( Z src ) ,

where the moments 𝝁 ( Z src ) are the same as in (4.7). The temporal integral in this equation is evaluated by using a Gauß–Legendre quadrature rule with ρ t + 1 points, i.e.

(4.11) t k t - 1 t k t E 𝜿 , 𝝂 a , ( t ) d t b = 0 ρ t ω k t , b E 𝜿 , 𝝂 a , ( ζ k t , b ) ,

where { ζ k t , b } b = 0 ρ t and { ω k t , b } b = 0 ρ t are the Gauß–Legendre quadrature points and weights on the time interval ( t k t - 1 , t k t ) . The result in (4.6) is then approximated in three steps.

Step 1: S2M. For a { 0 , , m t } and 𝜿 0 3 with | 𝜿 | m 𝒙 compute the moments 𝝁 ( Z src ) as in (4.7).

Step 2: M2Lx. For 𝝂 0 3 with | 𝝂 | m 𝒙 compute the related spatial local contributions 𝝀 ( 𝒙 ) ( Z tar ) by

(4.12) λ 𝝂 ( 𝒙 ) ( Z tar ) = b = 0 ρ t ω k t , b | 𝜿 + 𝝂 | m 𝒙 a = 0 m t E 𝜿 , 𝝂 a , ( ζ k t , b ) μ a , κ ( Z src ) .

Step 3: Lx2T. For all ( k t , k 𝒙 ) in Z ^ tar evaluate

(4.13) f ~ k t , k 𝒙 | 𝝂 | m 𝒙 γ k 𝒙 T X tar , 𝝂 ( 𝒙 ) d 𝒔 𝒙 λ 𝝂 ( 𝒙 ) ( Z tar ) .

Note that the integral over the temporal target interval ( t k t - 1 , t k t ) is not part of the Lx2T operation but has moved to the M2Lx operation. The spatial local contributions 𝝀 ( 𝒙 ) ( Z tar ) in (4.12) and (4.13) depend explicitly on the time interval ( t k t - 1 , t k t ) , but since Z tar contains only a single time interval, we do not indicate this in the notation.

Case 3: Let Z src Mx2L ( Z tar ) .

In this case Z src is temporally indivisible and Z ^ src contains only a single time index j t . We approximate (4.6) in three steps by substituting the heat kernel with the temporally one-sided expansion in (3.6) and by proceeding similarly as in the last case.

Step 1: S2Mx. For 𝜿 0 3 with | 𝜿 | m 𝒙 compute the spatial moments 𝝁 ( 𝒙 ) ( Z src ) by

(4.14) μ 𝜿 ( 𝒙 ) ( Z src ) = ( j t , j 𝒙 ) Z ^ src q j t , j 𝒙 γ j 𝒙 T X src , 𝜿 ( 𝒚 ) d 𝒔 𝒚 .

Step 2: Mx2L. For b { 0 , , m t } and 𝝂 0 3 with | 𝝂 | m 𝒙 compute the related local contributions 𝝀 ( Z tar )

(4.15) λ b , 𝝂 ( Z tar ) = a = 0 ρ t | 𝜿 + 𝝂 | m 𝒙 ω j t , a E 𝜿 , 𝝂 , b ( ζ j t , a ) μ 𝜿 ( 𝒙 ) ( Z src ) ,

where { ζ j t , a } a = 0 ρ t and { ω j t , a } a = 0 ρ t are Gauß–Legendre quadrature points and weights on the time interval ( t j t - 1 , t j t ) .

Step 3: L2T. For all ( k t , k 𝒙 ) Z ^ tar evaluate the local contributions as in (4.9). Note that the integral over the temporal source interval ( t j t - 1 , t j t ) is not part of the S2Mx operation but is included in the Mx2L operation. The spatial moments 𝝁 ( 𝒙 ) ( Z src ) in (4.14) and (4.15) depend explicitly on this time interval ( t j t - 1 , t j t ) .

The FMM operations above are obtained by replacing the heat kernel in (4.6) with a suitable expansion and by reordering the sums. The related approximation error can be deduced from the results in Section 3.3, see e.g. [19, Section 4.1]. Operations like S2M in (4.7) and L2T in (4.9) involve integrals of polynomials, which can be computed exactly by suitable quadrature formulae. In Cases 2 and 3 above we additionally evaluate the integrals of the expansion coefficients E 𝜿 , 𝝂 a , ( t ) and E 𝜿 , 𝝂 , b ( τ ) over the given time intervals by some quadrature formula, see (4.11). It is not surprising that the resulting operations are similar to those in Case 1, since the additional interpolation in that case can be interpreted as an alternative quadrature formula. However, we can choose the quadrature formula in (4.11) freely and adapt it for individual time intervals to obtain a better accuracy. The results in Section 3.1 indicate that this is necessary in those situations, where we use the temporally one-sided expansions. Alternatively, one could evaluate the integrals of the expansion coefficients E 𝜿 , 𝝂 a , ( t ) and E 𝜿 , 𝝂 , b ( τ ) analytically. However, this would destroy the product structure of the coefficients and prohibit an efficient handling as in [24, Section 4.3], on which we will further comment in Section 4.6.

4.4 A Nested Computation of Moments and Local Contributions

To improve the efficiency of the FMM we use a nested computation of moments and local contributions. This is a standard approach in FMM algorithms and was used already in [25, 22, 23, 17] to compute the moments of clusters Z src from the moments of their children and pass local contributions of clusters Z tar down to their children to evaluate them in a single effort. The related operations are based on a suitable change of basis of the temporal Lagrange polynomials and spatial Chebyshev polynomials, so they do not introduce any additional approximation errors. The newly introduced spatial moments and local contributions can be computed in a nested manner as well. In addition, the spatial moments 𝝁 ( 𝒙 ) ( Z src ) can be used to compute standard moments 𝝁 ( Z src ) , if both are needed. Similarly, local contributions 𝝀 ( Z tar ) can be transformed into spatial local contributions 𝝀 ( 𝒙 ) ( Z tar ) and evaluated together. In the following we list all related FMM operations:

Mx2Mx operation.

For a temporally indivisible cluster Z = X × I with spatially refined children the spatial moments 𝝁 ( 𝒙 ) ( Z ) can be computed from the spatial moments of its children by

(4.16) μ 𝝂 ( 𝒙 ) ( Z ) = Z c child ( Z ) Z c = X c × I 𝜿 𝝂 q 𝜿 , 𝝂 ( 𝒙 ) ( X c , X ) μ 𝜿 ( 𝒙 ) ( Z c )

for all 𝝂 0 3 with | 𝝂 | m x . Here 𝜿 𝝂 is understood componentwise and for the boxes X = ( 𝒂 , 𝒃 ] and X c = ( 𝒄 , 𝒅 ] we define q 𝜿 , 𝝂 ( 𝒙 ) ( X c , X ) := j q κ j , ν j ( 𝒙 ) ( ( c j , d j ] , ( a j , b j ] ) with

q κ j , ν j ( 𝒙 ) ( ( c j , d j ] , ( a j , b j ] ) := λ κ j m 𝒙 + 1 n = 0 m 𝒙 T ( a j , b j ] , ν j ( ξ ( c j , d j ] , n ( m 𝒙 ) ) T ( c j , d j ] , κ j ( ξ ( c j , d j ] , n ( m 𝒙 ) ) ,

where λ κ j is defined as in (3.4c) and { ξ ( c j , d j ] , n ( m 𝒙 ) } n = 0 m 𝒙 are Chebyshev nodes on the intervals ( c j , d j ] of X c .

Mx2M operation.

The spatial moments 𝝁 ( 𝒙 ) ( Z ) of a cluster Z = X × I given in (4.14) can be transformed into the standard moments 𝝁 ( Z ) in (4.7) via

(4.17) μ a , 𝜿 ( Z ) = μ 𝜿 ( 𝒙 ) ( Z ) t j t - 1 t j t L I , a ( τ ) d τ

for all a { 0 , , m t } and 𝜿 0 3 with | 𝜿 | m 𝒙 , where ( t j t - 1 , t j t ) is the only time interval contained in the temporally indivisible cluster Z. Note that the conversion of the spatial moments 𝝁 ( 𝒙 ) ( Z ) of Z to the standard moments 𝝁 ( Z ) has to be executed only if Z is a temporally indivisible cluster whose parent is not temporally indivisible. The standard moments of all other temporally indivisible clusters are not needed.

M2M operation.

The operations used to compute the moments 𝝁 ( Z ) of a cluster Z from the moments of its children are the same as in the original works, cf. [22, Sections 4.2 and 4.3]. Based on the subdivision used to construct the children one distinguishes two operations:

Temporal M2M. If the children of Z = X × I were created by a purely temporal subdivision

(4.18) μ b , 𝜿 ( Z ) = Z c child ( Z ) Z c = X × I c a = 0 m t q a , b ( t ) ( I c , I ) μ a , 𝜿 ( Z c )

for all b { 0 , , m t } and 𝜿 0 3 with | 𝜿 | m 𝒙 , where q a , b ( t ) ( I c , I ) := L I , b ( ξ I c , a ( m t ) ) .

Space-time M2M. If the children of Z = X × I were created by a space-time subdivision

(4.19) μ b , 𝝂 ( Z ) = Z c child ( Z ) Z c = X c × I c a = 0 m t 𝜿 𝝂 q a , b ( t ) ( I c , I ) q 𝜿 , 𝝂 ( 𝒙 ) ( X c , X ) μ a , 𝜿 ( Z c )

for all b { 0 , , m t } and 𝝂 0 3 with | 𝝂 | m 𝒙 , where the coefficients q 𝜿 , 𝝂 ( 𝒙 ) ( X c , X ) and q a , b ( t ) ( I c , I ) are the same as in (4.16) and (4.18), respectively.

Temporal L2L operations and space-time L2L operations are used to pass down local contributions of a cluster Z tar to its children. They are the transposed operations of the corresponding M2M operations. For a temporally indivisible cluster Z tar whose parent is not temporally indivisible the local contributions 𝝀 ( Z tar ) are transformed into spatial local contributions 𝝀 ( 𝒙 ) ( Z tar ) with an L2Lx operation. If Z tar is not a leaf, its spatial local contributions are passed down to its children by Lx2Lx operations. The L2Lx and Lx2Lx operations are the transposed operations of the Mx2M and Mx2Mx operations in (4.17) and (4.16), respectively.

Algorithm 4 (Time-adaptive space-time FMM for the approximate evaluation of f = V h q .).

(4.20)

4.5 A Time-Adaptive Space-Time FMM for the Heat Equation

The full time-adaptive space-time FMM algorithm is presented in Algorithm 4. It can be described as an efficient approximation of the matrix-vector product 𝖵 h 𝒒 in a blockwise manner. The blocks are given by the partition constructed by Algorithm 2. Admissible blocks are approximated in the FMM as described in Section 4.3. In addition, the nested computation of moments and local contributions from Section 4.4 is used. Note that the moments 𝝁 ( Z src ) of a cluster Z src have to be computed only once and can then be used for all M2L or M2Lx operations of this cluster. The same is true for spatial moments 𝝁 ( 𝒙 ) ( Z src ) and Mx2L operations. The moments of all clusters in 𝒯 Σ are computed in the first phase of the FMM (lines 2–17), where an explicit distinction and transition between spatial moments and standard moments takes place. After this phase all moments and spatial moments are available, so all M2L, Mx2L and M2Lx operations can be executed (lines 18–25). The resulting local contributions and spatial local contributions of each target cluster are summed up. They are passed downwards and evaluated in a single effort for the leaves of 𝒯 Σ (lines 26–39). Again, explicit transitions between local contributions and their spatial counterparts take place when necessary. In the last phase, the inadmissible blocks are applied directly (lines 40–42).

The new time-adaptive space-time FMM algorithm can be seen as an enhancement of the original one in [22], which is presented in a similar form in [25, Algorithm 2]. The additional subdivision of formerly inadmissible blocks by the routines in Algorithm 3 and the new FMM operations related to purely spatial moments and local contributions allow for a more efficient treatment of space-time tensor product meshes which are adaptive in time. In Sections 4.6 and 4.7 we will discuss the costs for the execution of individual FMM operations and, in particular, analyze the compression obtained by the new operations. The benefits of the time-adaptive space-time FMM will be shown in the numerical examples in Section 5.

We suppose that the concept of the temporal nearfield approximation in [18] can be extended to the setting of adaptive meshes in time with some technical difficulties. This would allow for a compression of the formerly inadmissible blocks as well. However, our approach appears to fit into the present concept more easily.

In [25] the authors presented a parallel version of the standard space-time FMM for the heat equation for distributed memory systems. The parallel algorithm exploits the temporal structure of the FMM, which allows to consider groups of operations on a temporal level. These groups of operations do not have to be executed in the strictly separated phases which we see in Algorithm 4, but can be executed efficiently based on individual dependencies. This flexible execution order enables an efficient parallelization in distributed memory. The distribution of work and communication between different processes can be handled on a temporal level, which makes it simple and efficient. In the parallel algorithm, collections of moments or local contributions of clusters are communicated between processes in a non-blocking way when they are ready and required. The new time-adaptive version allows for a similar treatment by introducing new groups of operations related to the new FMM operations (S2Mx, Mx2Mx, Mx2L, M2Lx, etc.) and adjusting the dependencies of the grouped operations. The distribution of the work among computing nodes has to be adapted to the time-adaptive mesh.

4.6 Costs of Individual FMM Operations

In Table 1 we give an overview of the runtime complexity of all FMM operations. The complexity of the M2L operations, 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) 2 ) , is only achieved if they are executed as proposed in [24, Section 4.3]. A naive implementation would require 𝒪 ( ( m 𝒙 + 3 3 ) 2 ( m t + 1 ) 2 ) operations instead. The same holds for the M2Lx and Mx2L operations, where a smart implementation requires 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) ( ρ t + 1 ) ) operations instead of 𝒪 ( ( m 𝒙 + 3 3 ) 2 ( m t + 1 ) ( ρ t + 1 ) ) in the naive case. The complexity of all other FMM operations can be estimated by counting the number of multiplications in the respective equations (4.7)–(4.19) above, utilizing the tensor product structure of operations whenever possible. For example, in case of the S2M operations one can use a decomposition into operations similar to S2Mx and Mx2M operations in (4.14) and (4.17). We observe that the costs of the newly introduced FMM operations are similar to those of the standard FMM operations.

Table 1

Runtime complexities of all FMM operations in the time-adaptive space-time FMM for the heat equation. The cluster Z appearing in the estimated costs for S2M, L2T, S2Mx and Lx2T operations is the related source or target cluster.

Operations Runtime complexity
S2M and L2T 𝒪 ( # Z ^ ( m 𝒙 + 3 3 ) + n t ( Z ^ ) ( m 𝒙 + 3 3 ) ( m t + 1 ) )
Purely temporal M2M and L2L 𝒪 ( ( m 𝒙 + 3 3 ) ( m t + 1 ) 2 )
Space-time M2M and L2L 𝒪 ( ( m 𝒙 + 3 3 ) ( m t + 1 ) ( m 𝒙 4 + ( m t + 1 ) ) )
M2L 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) 2 )
S2Mx and Lx2T 𝒪 ( # Z ^ ( m 𝒙 + 3 3 ) )
Mx2Mx and Lx2Lx 𝒪 ( ( m 𝒙 + 3 3 ) m 𝒙 4 )
Mx2M and L2Lx 𝒪 ( ( m 𝒙 + 3 3 ) ( m t + 1 ) )
M2Lx and Mx2L 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) ( ρ t + 1 ) )

The numbers in Table 1 reveal why the approximation of admissible blocks of 𝖵 h is efficient. Let us consider for example a block 𝖵 h | Z ^ tar × Z ^ src corresponding to clusters Z tar and Z src in 𝒯 Σ with Z src M2L ( Z tar ) . The computation of the moments 𝝁 ( Z src ) and evaluation of the local contributions 𝝀 ( Z tar ) has to be done only once for each cluster, so it does not have to be repeated for every admissible block. Thus, we can focus on the M2L operations when estimating the costs of the application of such an admissible block in the FMM algorithm. These costs are 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) 2 ) . In comparison, a direct application of the non-approximated block would require 𝒪 ( # Z ^ tar # Z ^ src ) operations. Hence, the low rank approximation is more efficient in terms of the execution time, if Z tar and Z src contain more than 𝒪 ( ( m 𝒙 + 1 ) 2 ( m t + 1 ) ) elements.

In terms of storage requirements, the approximation of blocks in the FMM is beneficial as well. In fact, we do not store the coefficients of the FMM operations in our implementation, but only the moments and local contributions for each cluster – at most one of each kind per cluster. Since the computation of nearfield entries is rather expensive, we fully assemble and store all inadmissible blocks instead. Therefore, the required storage for admissible blocks is negligibly small compared to inadmissible blocks.

4.7 Complexity Analysis for Newly Approximated Blocks

A complete complexity analysis of the presented time-adaptive FMM highly depends on the specific adaptive decomposition of the time interval ( 0 , T ) and would require some related restrictive assumptions. Instead we present a comparison of the standard and the time-adaptive FMM when the latter one provides some additional compression.

We consider two time cluster intervals I src and I tar which are inadmissible with respect to the original criterion (3.11) but admissible with respect to the adaptive criterion (3.15). We assume that the interval I tar and all related space-time clusters Z tar = X tar × I tar , respectively, are temporally indivisible. Note that the situation related to interchanged roles of I src and I tar is similar to the considered one due to the symmetric character of operations like Mx2L and M2Lx. In addition we assume that the spatial mesh Γ h with E 𝒙 similarly sized elements is sufficiently fine and that the spatial clustering resolves the surface.

We start with the complexity analysis of the matrix blocks related to the considered two time intervals for the new time-adaptive version of the FMM. In this setting M2Lx operations are applied for related clusters Z tar and Z src = X src × I src . The combined effort of the two operations L2Lx and Lx2T is the same as the effort of the L2T operation for the same cluster. Therefore, these two operations do not create additional effort in the overall algorithm. Some additional effort is created by the Lx2Lx operations on the spatially refined clusters. A single Lx2Lx operation is rather inexpensive and there is at most one operation per cluster. Compared to the effort of M2L and M2Lx operations the effort of all Lx2Lx operations is small. There are not any related S2Mx, Mx2Mx, and Mx2M operations in this part of the FMM algorithm. Thus we neglect the effort of all these operations in this analysis and focus on the effort of the related M2Lx operations.

The effort of a M2Lx operation between the considered clusters Z src and Z tar is given in Table 1. This operation is applied for all clusters Z src M2Lx ( Z tar ) , see the lines 24f in Algorithm 4. Thus, we have to estimate their number # M2Lx ( Z tar ) . The list M2Lx ( Z tar ) is filled in line 26 of Algorithm 3 and affected by the truncation strategy described in Section 4.2. But the truncation is implemented in line 3 of Algorithm 2. At that point the interaction area 𝒜 in (4.3) typically contains ( 2 n tr + 1 ) 2 non-empty boxes of the regular grid on the surface of the domain. Due to d 𝒙 additional spatial descending steps according to lines 18–23 in Algorithm 3, their number increases by the factor 4 d 𝒙 approximately. Thus,

(4.21) # M2Lx ( Z tar ) ( 2 n tr + 1 ) 2 4 d 𝒙 .

By taking the sum over all clusters Z tar related to I tar , we can estimate the computational effort of the time-adaptive FMM related to the two time intervals by

(4.22) C new # 𝒙 ( 2 n tr + 1 ) 2 4 d 𝒙 𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) ( ρ t + 1 ) ) ,

where # 𝒙 is the number of non-empty spatial boxes of the regular grid 𝒢 𝒙 on the spatial level 𝒙 of Z tar and Z src .

In the standard FMM the related blocks are inadmissible and, thus, directly assembled and stored as dense blocks. Again we apply the truncation strategy described in Section 4.2. We have already counted the number of clusters Z src relevant for Z tar in (4.21). If we take the sum over all clusters Z tar related to I tar , which contain all E 𝒙 spatial elements, we can estimate the number of nearfield entries related to the two time cluster intervals as

(4.23) C old E 𝒙 ( 2 n tr + 1 ) 2 4 d 𝒙 n 𝒙 n t ,

where n 𝒙 is the average number of elements in the non-empty spatial boxes 𝒙 of the regular grid 𝒢 𝒙 , and n t is the number of time intervals in the cluster interval I src and Z src , respectively.

Typically we will have a small integer d 𝒙 for locally quasi-uniform temporal meshes. But n 𝒙 may be large for temporally indivisible time intervals. Thus, the number C old of nearfield entries of the considered matrix block may be large or even n t E 𝒙 2 . The new version reduces the memory requirements and the computational time significantly. C old nearfield entries do not have to be computed and stored. Instead we only need to store an additional expansion per cluster which typically requires much less memory.

For the execution time of a matrix vector product the comparison is more involved. In the original version the related effort is proportional to C old . The effort C new of the new method in (4.22) depends on the number of M2Lx operations, i.e. on the specific setting of the two time intervals I src and I tar . Since # 𝒙 = E 𝒙 n 𝒙 , the new method will provide a faster application of the related blocks, if C new < C old , i.e.

𝒪 ( ( m 𝒙 + 1 ) 4 ( m t + 1 ) ( ρ t + 1 ) ) < n 𝒙 2 n t .

This is the case if the number n t of time intervals in I src and the average number n 𝒙 of spatial elements per cluster on the spatial level 𝒙 of Z src and Z tar are large. Thus, the new method is advantageous for fine spatial meshes, in general. As a rule of thumb: If the operations take place on a relatively large spatial level 𝒙 , n 𝒙 is small and the original nearfield matrix vector product is faster. If the operations take place on a relatively small spatial level 𝒙 , n 𝒙 is large and the new FMM version will be faster. Due to the construction of the cluster tree and the operation lists M2Lx , the relevant spatial level 𝒙 is related to the cluster levels of the two time intervals I src and I tar .

5 Numerical Experiments

The time-adaptive FMM algorithm presented and analyzed in the previous sections has been implemented in the publicly available C++ library besthea [16]. The implementation provides a hybrid OpenMP-MPI parallelization of Algorithm 4 following the ideas in [25]. In this section we consider numerical experiments to show the benefits of the newly introduced time-adaptive FMM operations. The experiments in Section 5.1 were carried out using a local workstation with 384 GiB of RAM and two 16-core Intel Xeon Gold 5218 processors. For the compilation we used the Intel compiler v2021.2.0. For the experiments in Section 5.2 we used the VSC-4 cluster in Vienna, Austria, and the Intel compiler v19.1.3.304. The nodes of VSC-4 are equipped with two 24-core Intel Skylake Platinum 8174 processors. We used standard nodes providing 96 GB of RAM and fat nodes, providing 384 GB of RAM, when necessary. A guide to reproduce the experiments is given in the repository [16].

5.1 Example 1: Exponential Decay in Time

In the first example we use a direct boundary integral approach to solve the initial boundary value problem (2.1) of the heat equation with the heat capacity constant α = 1 in the space-time cylinder Q = Ω × ( 0 , T ) , where Ω is the cube ( - 0.5 , 0.5 ) 3 and T = 0.25 . We choose the exact solution of the heat equation

(5.1) u ( 𝒙 , t ) = exp ( - 3 π 2 t ) j = 1 3 sin ( π ( x j + 0.5 ) ) for all  ( 𝒙 , t ) Q

with the related initial and Dirichlet data as well as Neumann datum q. We solve the integral equation (2.2) for q with the right-hand side f in (2.4), which simplifies to f = - M 0 u 0 since g = 0 . Due to the characteristic exponential decay in time of the solution u in (5.1) it is reasonable to increase the time step size as time advances. For our numerical experiment we choose the adaptive temporal mesh h illustrated in Figure 1. For a better presentation we depict the lengths of the intervals. This mesh was constructed by an adaptive refinement algorithm that was tailored to reduce the L 2 projection error of the temporal part q t ( t ) = exp ( - 3 π 2 t ) of the exact solution (5.1). For the spatial mesh Γ h we use a uniform triangulation of the cube’s surface into 12 288 triangles. The resulting space-time tensor product mesh Σ h = Γ h h consists of 245 760 space-time boundary elements. In addition, a volume mesh of 196 608 tetrahedra conforming to Γ h is used for the initial potential.

Figure 1 
                  The time mesh 
                        
                           
                              
                                 ℐ
                                 h
                              
                           
                           
                           {\mathcal{I}_{h}}
                        
                      consisting of twenty time steps in the time interval 
                        
                           
                              
                                 (
                                 0
                                 ,
                                 0.25
                                 )
                              
                           
                           
                           {(0,0.25)}
                        
                      adapted to 
                        
                           
                              
                                 
                                    
                                       q
                                       t
                                    
                                    ⁢
                                    
                                       (
                                       t
                                       )
                                    
                                 
                                 =
                                 
                                    exp
                                    ⁡
                                    
                                       (
                                       
                                          -
                                          
                                             3
                                             ⁢
                                             
                                                π
                                                2
                                             
                                             ⁢
                                             t
                                          
                                       
                                       )
                                    
                                 
                              
                           
                           
                           {q_{t}(t)=\exp(-3\pi^{2}t)}
                        
                     .For each interval its length is depicted.
Figure 1

The time mesh h consisting of twenty time steps in the time interval ( 0 , 0.25 ) adapted to q t ( t ) = exp ( - 3 π 2 t ) .For each interval its length is depicted.

We want to compare the original FMM as implemented in [25] and our new time-adaptive variant when solving the discrete system 𝖵 h 𝒒 = 𝒇 , where 𝒇 is a discretization of the right-hand side - M 0 u 0 . For the evaluation of the initial potential we use analytical integration in time and numerical quadrature formulae for triangles and tetrahedra in space. In singular cases we apply a recursive subdivision routine of volume and surface elements. Our results indicate that we are able to compute the right-hand side 𝒇 accurately enough in this way. For our time-adaptive FMM we use the cluster parameters n max = 800 and c st = 4.5 , the spatial truncation parameter n tr = 2 , the constants η 2 = 1 and η 1 = 1 for the admissibility criteria (3.11) and (3.15), respectively, and the expansion degrees m t = 4 and m 𝒙 = 12 . For the numerical quadrature in (4.11) we use ϱ t = 3 , i.e. a 4-point rule, which proves to be sufficiently accurate in our examples. For the original version of the FMM we use the same parameters except η 1 and ϱ t , which are not needed.

Table 2

Solution of the system 𝖵 h 𝒒 = 𝒇 for the problem in Section 5.1 using the GMRES method with a diagonal preconditioner and the standard FMM and time-adaptive FMM for the matrix-vector multiplication.

Standard FMM Time-adaptive FMM
Relative L 2 approximation error 0.0530951 0.0530951
Storage for the nearfield part of 𝖵 h [GiB] 39.92 32.04
Storage for the moments and local contributions [GiB] 0.0211 0.0216
Time to assemble 𝖵 h [s] 307.32 217.56
Time per GMRES iteration [s] 1.36 1.42
Total time (assembly of 𝖵 h and solution) [s] 373.98 286.92

In Table 2 we compare the results when solving the system 𝖵 h 𝒒 = 𝒇 using the GMRES method with a diagonal preconditioner for the time-adaptive and the standard FMM. The relative L 2 approximation errors q - q h L 2 ( Σ ) / q L 2 ( Σ ) in the first line coincide. This shows that the additional kernel approximations do not affect the accuracy of the time-adaptive FMM. In both versions, we store the nearfield matrices since their computation is costly. The required storage of 39.92 GiB for the nearfield matrices of the standard FMM was reduced by roughly 20 % by introducing the new FMM operations. The storage required for the additional spatial local contributions is negligibly small as we can see from the values in the third line of the table.

In lines 4–6 of Table 2 we compare the execution times when using the standard and time-adaptive FMM. In both cases the GMRES method required 49 iterations to achieve a relative accuracy of 10 - 8 . Note that the total times do not include the computation of the right-hand side, which takes about 250 s by using an FMM. The time for a single GMRES iteration, which is dominated by the time needed for a matrix-vector multiplication, is slightly lower for the standard FMM in this example. This is not completely unexpected, since the application of nearfield matrix blocks is simple and efficient if they are already computed. The newly introduced Mx2L and M2Lx operations are more efficient than the related nearfield operations for source and target clusters containing many space-time boundary elements, but can be less efficient for clusters containing few elements, as we have seen in Section 4.7. The corresponding gains and losses seem to balance out in this example. The assembly of the nearfield matrices requires considerably more time than the GMRES solver and is significantly faster in the time-adaptive version, due to the reduced number of nearfield entries which have to be computed. Therefore, the new time-adaptive FMM outperforms the standard version in this example.

5.2 Example 2: Rapid Change of Boundary Data

We study an example with boundary data that change rapidly over time. This is another scenario where adaptive temporal meshes are better suited than uniform meshes. We consider the function

(5.2) q t ( t ) = t 4 + ( t - 1 6 ) 1 4 𝟙 ( 1 6 , ) ( t ) , t ( 0 , T ) ,

where T = 0.25 , 𝟙 ( 1 6 , ) ( t ) = 1 for t > 1 6 and 𝟙 ( 1 6 , ) ( t ) = 0 for t 1 6 . The function q t is depicted in Figure 2 (a). It is non-smooth at time t = 1 6 when the part ( t - 1 6 ) 1 4 is turned on. A suitable adaptive mesh that resolves this event is advantageous for a piecewise constant approximation of this function. The mesh h in Figure 2 (b) was built for that purpose.

Figure 2

Visualization of the rapidly changing part q t of q in (5.3) in the time interval ( 0 , 0.25 ) and a suitable mesh on this interval that resolves the non-smooth behavior of q t at time t = 1 6 well.

(a) 
                     Temporal function 
                           
                              
                                 
                                    q
                                    t
                                 
                              
                              
                              {q_{t}}
                           
                         in (5.2).
(a)

Temporal function q t in (5.2).

(b) 
                     Interval lengths of an adaptive mesh 
                           
                              
                                 
                                    ℐ
                                    h
                                 
                              
                              
                              {\mathcal{I}_{h}}
                           
                         for 
                           
                              
                                 
                                    q
                                    t
                                 
                              
                              
                              {q_{t}}
                           
                         in (5.2).
(b)

Interval lengths of an adaptive mesh h for q t in (5.2).

We consider a related indirect BEM, i.e. we want to find q such that V q = g for a given Dirichlet datum g and initial datum u 0 = 0 in the space-time cylinder Q = Ω × ( 0 , 0.25 ) , where Ω is a crankshaft geometry whose surface is discretized by a mesh Γ h consisting of 10 722 triangles. We set the exact solution q to

(5.3) q ( 𝒙 , t ) = q t ( t ) q 𝒙 ( 𝒙 ) for all  ( 𝒙 , t ) Σ ,  with  q 𝒙 ( 𝒙 ) = 2 n 1 ( 𝒙 ) x 1 - 3 n 2 ( 𝒙 ) x 2 + n 3 ( 𝒙 ) x 3 10

and q t in (5.2). The mesh h in Figure 2 (b) and the surface mesh Γ h of the crankshaft are well-suited to approximate the solution q and we can check the approximation errors to ensure the correctness of our method. Since the right-hand side g is unknown, we will use a piecewise constant approximation g ^ h on a finer mesh Σ ^ h (constructed from the original mesh Σ h = Γ h h by uniform refinements; once in space, twice in time) whose coefficients are computed by 𝒈 ^ = 𝖵 ^ h 𝒒 ^ . Here, 𝒒 ^ denotes the coefficient vector of the piecewise constant projection of q on Σ ^ h and 𝖵 ^ h is the single layer operator matrix on the fine mesh.

For all examples in this section, we choose the same FMM parameters as in Section 5.1. We compare the standard FMM and our new time-adaptive variant by solving the system 𝖵 h 𝒒 = 𝒈 on the space-time mesh Σ h = Γ h h that contains 396 714 space-time boundary elements. For each of the computations we use a single node of the VSC-4 cluster: a standard node for the time-adaptive FMM and a fat node for the standard FMM due to the higher memory demand. The results of the computations are given in Table 3. They are similar to the results in Section 5.1. The approximation quality does not suffer when introducing the new FMM operations, while the required storage for nearfield matrices and related assembly times are reduced significantly. In this example 98 GMRES iterations were needed in both cases, and a single iteration was faster for the time-adaptive FMM than for the standard FMM.

Table 3

Solution of the system 𝖵 h 𝒒 = 𝒈 for the problem in Section 5.2 and the time-adaptive mesh Σ h . The system is solved using the GMRES method with a diagonal preconditioner and the standard FMM and time-adaptive FMM for the matrix-vector multiplication.

Standard FMM Time-adaptive FMM
Relative L 2 approximation error 0.013129 0.013129
Storage for the nearfield part of 𝖵 h [GiB] 93.23 66.64
Storage for the moments and local contributions [GiB] 0.0281 0.0305
Time to assemble 𝖵 h [s] 495.34 302.27
Time per GMRES iteration [s] 1.27 1.16
Total time (assembly of 𝖵 h and solution) [s] 619.43 415.56

The level of adaptivity differs for the examples in this section and Section 5.1, as the constant related to locally quasi-uniform meshes differs. We have t j + 1 - t j t j - t j - 1 2 for Example 1 and t j - t j - 1 t j + 1 - t j 4 for Example 2. While the grading of the mesh is stronger in Example 2, the reductions of the storage requirements and the computational times are slightly better. In general, the improvement of the new version highly depends on the specific setting, please see the discussion in Section 4.7.

Finally, we compare the computations on the above time-adaptive mesh Σ h to the computation on a suitable uniform mesh. We consider the space-time mesh Σ h , u = Γ h h , u , where h , u is the uniform time mesh on ( 0 , T ) consisting of 512 uniform time steps with length 2 - 11 . Note that Σ h , u contains 5 489 664 space-time boundary elements. The large number of uniform time steps in h , u is required to resolve the rapid change of q t around the point t = 1 6 reasonably well. While the uniform mesh h , u is finer than the adaptive mesh h in Figure 2 (b) in general, it is coarser around t = 1 6 . When projecting the function q t to these two meshes the resulting L 2 approximation errors on ( 0 , T ) are similar, but the error is slightly larger around time t = 1 6 for the uniform mesh. On the uniform mesh Σ h , u we solve the system 𝖵 h , u 𝒒 u = 𝒈 u with the standard FMM. Due to the large size of the problem, we use 16 standard nodes of VSC-4 for the computation and the parallel FMM in [25]. The results are given in Table 4.

Table 4

Solution of the system 𝖵 h , u 𝒒 u = 𝒈 u for the problem in Section 5.2 and the mesh Σ h , u with 512 uniform time steps. The system is solved using the GMRES method with a diagonal preconditioner and the standard FMM using 16 nodes of VSC-4.

Standard FMM (16 nodes)
Relative L 2 approximation error 0.011851
Average storage for the nearfield part of 𝖵 h per node [GiB] 79.90
Average storage for the moments and local contributions per node [GiB] 0.0467
Time to assemble 𝖵 h [s] 406.12
Time per GMRES iteration [s] 2.29
Total time (assembly of 𝖵 h and solution) [s] 756.40

When comparing the results in Tables 3 and 4 one notices the superior performance of the time-adaptive method over the standard method with the uniform time steps. Although we use 16 nodes of VSC-4 for the computations on the uniform mesh, the computational time is higher than the computational time of the adaptive time mesh on a single node, even when using the standard FMM on the latter. The higher computation time is caused by the larger number of GMRES iterations (153 instead of 98) required for the solution and the longer duration of a single matrix vector product.

6 Conclusions and Outlook

We have presented temporally one-sided expansions of the heat kernel and related error estimates. Based on these results we have developed new FMM expansions and operations as well as a new time-adaptive version of the FMM for the heat equation. Our theoretical considerations and the numerical examples demonstrate some substantial improvements of fast BEM calculations for meshes which are adaptive with respect to time.

While the new time-adaptive FMM reduces the amount of non-compressed nearfield blocks, some of the nearfield blocks related to the identical and the previous time cluster remain uncompressed. A sophisticated compression technique for such blocks is presented in [18] for uniform time steps. We plan to develop an alternative method for the non-uniform setting. This will significantly reduce the computational times and the memory requirements of our method once more.

To further speed up the nearfield assembly time, which makes up a significant amount of the total time in our numerical experiments, one could think about using GPUs for these computations. With such devices it might even be possible to compute the nearfield entries on the fly instead of storing them, which would lead to a drastic reduction of the storage requirements. Such an approach was considered for a standard BEM (i.e. without a fast method) in [13] and could be extended for the space-time FMM presented in this paper.

The improvements of the FMM are an important step towards the development of efficient adaptive space-time boundary element methods for the heat equation. First results regarding such an adaptive BEM can be found in [8]. In that work a posteriori error estimates are introduced which drive the non-local mesh refinement of the proposed adaptive algorithm. This adaptive algorithm and the considered numerical examples cover only the spatially two-dimensional case. When transferring the ideas to the spatially three-dimensional case, a fast method is needed for the solution of the related systems because otherwise the problem size is prohibitively limited. A variant of our time-adaptive FMM adapted to meshes without tensor product structure might be suitable for such an application.

Funding source: Austrian Science Fund

Award Identifier / Grant number: 19-29698L

Funding statement: The authors acknowledge the support provided by the Austrian Science Fund (FWF) under the project I 4033-N32 in a joint project with the Czech Science Foundation (project 19-29698L). The computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC).

References

[1] D. N. Arnold and P. J. Noon, Boundary integral equations of the first kind for the heat equation, Boundary Elements IX, Vol. 3 (Stuttgart 1987), University of Southampton, Southampton (1987), 213–229. Search in Google Scholar

[2] S. Börm, Efficient Numerical Methods for Non-Local Operators, EMS Tracts Math. 14, European Mathematical Society, Zürich, 2010. 10.4171/091Search in Google Scholar

[3] J. Carrier, L. Greengard and V. Rokhlin, A fast adaptive multipole algorithm for particle simulations, SIAM J. Sci. Statist. Comput. 9 (1988), no. 4, 669–686. 10.1137/0909044Search in Google Scholar

[4] H. Cheng, L. Greengard and V. Rokhlin, A fast adaptive multipole algorithm in three dimensions, J. Comput. Phys. 155 (1999), no. 2, 468–498. 10.1006/jcph.1999.6355Search in Google Scholar

[5] M. Costabel, Boundary integral operators for the heat equation, Integral Equations Operator Theory 13 (1990), no. 4, 498–552. 10.1007/BF01210400Search in Google Scholar

[6] S. Dohr, K. Niino and O. Steinbach, Space-time boundary element methods for the heat equation, Space-Time Methods—Applications to Partial Differential Equations, Radon Ser. Comput. Appl. Math. 25, De Gruyter, Berlin (2019), 1–60. 10.1515/9783110548488-001Search in Google Scholar

[7] S. Dohr, J. Zapletal, G. Of, M. Merta and M. Kravčenko, A parallel space-time boundary element method for the heat equation, Comput. Math. Appl. 78 (2019), no. 9, 2852–2866. 10.1016/j.camwa.2018.12.031Search in Google Scholar

[8] G. Gantner and R. van Venetië, Adaptive space-time BEM for the heat equation, Comput. Math. Appl. 107 (2022), 117–131. 10.1016/j.camwa.2021.12.022Search in Google Scholar

[9] L. Greengard and P. Lin, Spectral approximation of the free-space heat kernel, Appl. Comput. Harmon. Anal. 9 (2000), no. 1, 83–97. 10.1006/acha.2000.0310Search in Google Scholar

[10] L. Greengard and J. Strain, A fast algorithm for the evaluation of heat potentials, Comm. Pure Appl. Math. 43 (1990), no. 8, 949–963. 10.1002/cpa.3160430802Search in Google Scholar

[11] W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer Ser. Comput. Math. 49, Springer, Heidelberg, 2015. 10.1007/978-3-662-47324-5Search in Google Scholar

[12] H. Harbrecht and J. Tausch, A fast sparse grid based space-time boundary element method for the nonstationary heat equation, Numer. Math. 140 (2018), no. 1, 239–264. 10.1007/s00211-018-0963-5Search in Google Scholar

[13] J. Homola, Acceleration of the space-time boundary element method using GPUs, Master’s thesis, VŠB Technical University of Ostrava, 2021. Search in Google Scholar

[14] G. C. Hsiao and J. Saranen, Boundary integral solution of the two-dimensional heat equation, Math. Methods Appl. Sci. 16 (1993), no. 2, 87–114. 10.1002/mma.1670160203Search in Google Scholar

[15] G. Kratochwill, Analyse und Vergleich verschiedener Varianten der schnellen Gauß–Transformation, Master’s thesis, Technische Universität Graz, Graz, 2018. Search in Google Scholar

[16] M. Merta, G. Of, R. Watschinger and J. Zapletal, besthea, 2020, https://github.com/zap150/besthea. Search in Google Scholar

[17] M. Meßner, A Fast Multipole Galerkin Boundary Element Method for the Transient Heat Equation, Monogr. Ser. TU Graz Comp. Eng. Sci. 23, Technische Universität Graz, Graz, 2014. Search in Google Scholar

[18] M. Messner, M. Schanz and J. Tausch, A fast Galerkin method for parabolic space-time boundary integral equations, J. Comput. Phys. 258 (2014), 15–30. 10.1016/j.jcp.2013.10.029Search in Google Scholar

[19] M. Messner, M. Schanz and J. Tausch, An efficient Galerkin boundary element method for the transient heat equation, SIAM J. Sci. Comput. 37 (2015), no. 3, A1554–A1576. 10.1137/151004422Search in Google Scholar

[20] K. Nabors, F. T. Korsmeyer, F. T. Leighton and J. White, Preconditioned, adaptive, multipole-accelerated iterative methods for three-dimensional first-kind integral equations of potential theory, SIAM J. Sci. Comput. 15 (1994), no. 3, 713–735. 10.1137/0915046Search in Google Scholar

[21] T. J. Rivlin, Chebyshev Polynomials, 2nd ed., Pure Appl. Math. (New York), John Wiley & Sons, New York, 1990. Search in Google Scholar

[22] J. Tausch, A fast method for solving the heat equation by layer potentials, J. Comput. Phys. 224 (2007), no. 2, 956–969. 10.1016/j.jcp.2006.11.001Search in Google Scholar

[23] J. Tausch, Fast Nyström methods for parabolic boundary integral equations, Fast Boundary Element Methods in Engineering and Industrial Applications, Lect. Notes Appl. Comput. Mech. 63, Springer, Heidelberg (2012), 185–219. 10.1007/978-3-642-25670-7_6Search in Google Scholar

[24] J. Tausch and A. Weckiewicz, Multidimensional fast Gauss transforms by Chebyshev expansions, SIAM J. Sci. Comput. 31 (2009), no. 5, 3547–3565. 10.1137/080732729Search in Google Scholar

[25] R. Watschinger, M. Merta, G. Of and J. Zapletal, A parallel fast multipole method for a space-time boundary element method for the heat equation, SIAM J. Sci. Comput. 44 (2022), no. 4, C320–C345. 10.1137/21M1430157Search in Google Scholar

[26] J. Zapletal, R. Watschinger, G. Of and M. Merta, Semi-analytic integration for a parallel space-time boundary element method modelling the heat equation, Comput. Math. Appl. 103 (2021), 156–170. 10.1016/j.camwa.2021.10.025Search in Google Scholar

Received: 2022-05-25
Revised: 2022-09-26
Accepted: 2022-10-08
Published Online: 2022-12-06
Published in Print: 2023-04-01

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 21.1.2025 from https://www.degruyter.com/document/doi/10.1515/cmam-2022-0117/html
Scroll to top button