
Probing the effects of broken symmetries in machine learning


Published 29 October 2024 © 2024 The Author(s). Published by IOP Publishing Ltd
Citation: Marcel F Langer et al 2024 Mach. Learn.: Sci. Technol. 5 04LT01, DOI 10.1088/2632-2153/ad86a0


Abstract

Symmetry is one of the most central concepts in physics, and it is no surprise that it has also been widely adopted as an inductive bias for machine-learning models applied to the physical sciences. This is especially true for models targeting the properties of matter at the atomic scale. Both established and state-of-the-art approaches, with almost no exceptions, are built to be exactly equivariant to translations, permutations, and rotations of the atoms. Incorporating symmetries—rotations in particular—constrains the model design space and implies more complicated architectures that are often also computationally demanding. There are indications that unconstrained models can easily learn symmetries from data, and that doing so can even be beneficial for the accuracy of the model. We demonstrate that an unconstrained architecture can be trained to achieve a high degree of rotational invariance, testing the impacts of the small symmetry breaking in realistic scenarios involving simulations of gas-phase, liquid, and solid water. We focus specifically on physical observables that are likely to be affected—directly or indirectly—by non-invariant behavior under rotations, finding negligible consequences when the model is used in an interpolative, bulk, regime. Even for extrapolative gas-phase predictions, the model remains very stable, even though symmetry artifacts are noticeable. We also discuss strategies that can be used to systematically reduce the magnitude of symmetry breaking when it occurs, and assess their impact on the convergence of observables.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Data-driven techniques are increasingly applied across the physical sciences [1, 2], with the modeling of matter at the atomic scale being a field in which they have been adopted early [3–6] and with great success [7–9]. Machine-learning models that are meant to reproduce the relationship between a structure and its properties inherit the constraints, and hence the symmetries, of the underlying physics. For instance, the potential energy, the target of so-called machine-learning interatomic potentials (MLPs), is invariant to atom label permutations, as well as translations, rotations, and reflections. Ensuring that MLPs respect the inherent symmetries of the problem has long been considered essential [4, 5, 10–12]. The simplest approach to constructing invariant models is to use invariant features from the start, for instance interatomic distances or angles; this, however, leads to models with reduced descriptive power [13]. Alternatively, models can rely on an equivariant architecture [14]: Internal features are constructed to transform with the coordinate frame, and can then be combined into invariants for the final energy prediction. Most state-of-the-art MLPs, for instance Nequip [15], Mace [16], TensorNet [17], or So3krates [18], are based on this type of architecture. However, ensuring equivariance imposes severe constraints on model architectures [19–21], and general equivariant operations can become computationally costly in practice.

For this reason, there has been growing interest in 'unconstrained' models that relax the requirement of global invariance (and internal equivariance), and that are used widely, and with great success, in computer science for tasks involving the classification of point clouds [22–24]. In the field of atomic-scale modeling, too, recent work has shown that unconstrained models can achieve competitive accuracy on benchmark datasets when compared with invariant models, both for constructing MLPs [25] and for tasks involving the prediction of the secondary structure of polypeptides [26]. However, good predictive performance on static tests is not sufficient to evaluate the practical usefulness of a given model architecture [27]. Symmetries are associated with conservation laws that are beneficial for the numerical stability of algorithms [28], and whose violation can occasionally lead to manifestly absurd simulation outcomes [29, 30]. This work investigates the impact of relaxing the rotational invariance constraint in MLPs, to what extent approximate invariance can be learned from the data, and whether it is sufficient to avoid artifacts in the results of simulations.

2. Methods

We use the Point Edge Transformer (PET) architecture [25], which is exactly invariant to translations and atom index permutations but not to rigid rotations, to train an MLP for bulk water. We use the training set from [31], which contains 1593 configurations computed at the revPBE0 [32, 33] level of theory, including D3 dispersion corrections [34]. The details of the model and the training protocols are discussed in the supplementary material, and can also be found in the data record associated with this publication. For the purpose of this study, it is important to stress that—as in [25], and as is standard practice in fields using non-symmetric architectures—rotational data augmentation is performed during training: For each epoch, a different random orientation is chosen for every structure. The resulting model achieves a smaller test-set error than a state-of-the-art equivariant model: 74 meV energy mean absolute error (MAE) and 17 meV Å−1 force MAE, against 120 meV and 21 meV Å−1 for Nequip [35] (see also the supplementary material for a more thorough comparison). It also exhibits a very small deviation from equivariant behavior (about 8 meV for energies and 2 meV Å−1 for forces).
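As a point of reference for how such augmentation is typically realized, the following is a minimal sketch of per-epoch random rotation of a training structure. The function name and the use of scipy are illustrative assumptions and do not reproduce the actual PET training code.

```python
# Minimal sketch of per-epoch rotational data augmentation (hypothetical
# helper, not the PET training code). Each structure is rotated by a fresh
# random rotation every epoch; force targets rotate with the coordinates,
# while the energy target is unchanged.
import numpy as np
from scipy.spatial.transform import Rotation

def augment(positions, forces, cell=None, rng=None):
    """Apply one uniformly random rotation to a training structure."""
    rng = np.random.default_rng() if rng is None else rng
    R = Rotation.random(random_state=rng).as_matrix()  # uniform over SO(3)
    rot_pos = positions @ R.T            # rotate atomic positions (rows)
    rot_frc = forces @ R.T               # forces transform in the same way
    rot_cell = cell @ R.T if cell is not None else None
    return rot_pos, rot_frc, rot_cell
```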

In addition, to assess the impact of this small symmetry-breaking on the model accuracy without changing its architecture, we implement an inference-time approximate symmetrization scheme, i.e. averaging predictions over multiple rotations, based on systematically-convergent grids over Euler angles [36]. Taking the base model y(A) (where A identifies an input, e.g. a structure, and y the predicted property, e.g. the energy), a rotationally averaged version is defined as:

$\bar{y}(A) = \sum_{k} w_k \, y(\hat{R}_k A)$    (1)

where the $\hat{R}_k$ indicate the uniformly-distributed rotation operators and $w_k$ the associated quadrature weights, so that $\hat{R}_k A$ is a rotated version of the structure for which we are making a prediction. For PET, this means that all interatomic displacements $\mathbf{r}_{ij}$, which are the model inputs, are multiplied with a rotation matrix $\hat{R}_k$; predicted forces are then rotated back into the original frame by multiplying with its inverse $\hat{R}_k^{-1}$. This kind of transformation is inexpensive, and can be implemented easily at the level of the model, or directly in a simulation engine such as i-PI [37]. We indicate the grid size with the notation $N[{\mathrm{i}}]$, where $N \geqslant 2$ is an integer that indicates the subdivision of the Euler angles, and ${\mathrm{i}}$ indicates that the grid is duplicated to also include the corresponding improper rotations $\{-\hat{R}_k\}$. Grids labeled by 2, $2{\mathrm{i}}$, 3, $3{\mathrm{i}}$ contain 18, 36, 75, 150 rotations respectively. By increasing the grid size, the model can be made as close to exactly equivariant as desired, at correspondingly increased computational cost. For instance, equivariant force errors are reduced 20-fold, to below 0.1 meV Å−1, when using a $2{\mathrm{i}}$ grid (see also the supplementary material). We note that several approaches [25, 38–45] achieve exactly rotationally invariant or equivariant predictions by defining coordinate systems rigidly attached to the input configuration and invoking the (possibly non-symmetric) model for each of them. Since these coordinate systems rotate synchronously with the structure, the final predictions remain exactly invariant. However, practical issues include lack of smoothness [39–42], the need for a stochastic formulation [45], or being limited to rigid bodies [38] or finite systems. Extending most of these methods to periodic bulk configurations remains an unresolved issue. The symmetrization scheme in [25] supports periodic configurations but is computationally efficient only for local models. For message-passing schemes like PET, it incurs high computational cost or requires modifications of the model architecture (see the supplementary material).
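To illustrate the structure of this inference-time averaging, here is a minimal sketch following equation (1). It assumes a hypothetical model(positions, cell) callable returning an energy and forces, and a precomputed set of rotation matrices and quadrature weights; it is not the PET or i-PI implementation.

```python
# Minimal sketch of inference-time rotational averaging (equation (1)).
# `rotations` and `weights` stand in for a quadrature grid {R_k, w_k}; the
# paper uses systematically convergent Euler-angle grids [36].
import numpy as np

def rotationally_averaged(model, positions, cell, rotations, weights):
    energy = 0.0
    forces = np.zeros_like(positions)
    for R, w in zip(rotations, weights):
        e_k, f_k = model(positions @ R.T, cell @ R.T)  # predict on rotated copy
        energy += w * e_k                              # invariant target: weighted sum
        forces += w * (f_k @ R)                        # rotate forces back (R^-1 = R^T)
    return energy, forces
```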

3. Results

3.1. Isolated molecule

As a first, and perhaps the most extreme, test, we run constant-energy molecular dynamics (MD) simulations for a water molecule in vacuum. Given that the model is trained exclusively on bulk structures, this amounts to a deep extrapolative regime, and allows us to test the most direct consequence of the lack of rotational invariance—the breakdown of angular momentum conservation. A well-known result of classical mechanics states that for a system whose energy does not change under rotation, the total angular momentum L is constant, so that its time derivative (the torque τ) is zero.

A first observation is that the potential is very stable despite the extrapolative conditions, and can be run for several nanoseconds with energy conservation consistent with the time step of 0.5 fs and the use of single-precision arithmetic. It is worth stressing that PET is exactly conservative (as the forces are computed as the derivative of the potential energy) and that it fulfills exactly the constraints of translational and permutational invariance. The symmetry breaking is however apparent in the precession of the angular momentum L (figure 1(a)), which is a consequence of the non-zero torque acting on the molecule despite the absence of an external potential (figure 1(b)). The torque τ is almost orthogonal to the angular momentum, so that the magnitude of the angular momentum, and hence the angular velocity, is almost constant. The small fluctuations of the total angular momentum are an indication of the coupling of the non-equivariant terms with the internal degrees of freedom of the molecule. Rotational averaging mitigates symmetry breaking, and systematically reduces the magnitude of $|\boldsymbol{\tau}|$—which does not eliminate precession, but slows it down dramatically, and effectively eliminates the fluctuations of $|\mathbf{L}|$.
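The quantities monitored here are straightforward to compute from a trajectory. The sketch below, with hypothetical array arguments, evaluates the total angular momentum and torque about the centre of mass, which serve as diagnostics of rotational symmetry breaking for an isolated system.

```python
# Sketch of SO(3) symmetry-breaking diagnostics for an isolated molecule:
# L = sum_i m_i r_i x v_i and tau = sum_i r_i x F_i about the centre of mass.
# For an exactly rotationally invariant potential, tau vanishes and L is conserved.
import numpy as np

def angular_momentum_and_torque(masses, positions, velocities, forces):
    com = np.average(positions, axis=0, weights=masses)   # centre of mass
    r = positions - com
    L = np.sum(masses[:, None] * np.cross(r, velocities), axis=0)
    tau = np.sum(np.cross(r, forces), axis=0)
    return L, tau
```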


Figure 1. Simulations of a water molecule using a rotationally non-equivariant PET model. (a) Trajectories of the angular momentum components Lα (dashed lines) and modulus $|L|$ (solid lines) during constant-energy molecular dynamics, for the model without symmetrization (red) and with rotational averaging over a $2{\mathrm{i}}$ grid. (b) Mean value of the torque τ acting on the molecule over a constant-temperature simulation, for different orientation grids (using the notation $N[{\mathrm{i}}]$). (c) Power spectrum computed from the autocorrelation function $c_{VV}(\tau) = \int V(t) V(t-\tau) \, \textrm{d} t$ of the potential energy, and of the non-equivariant part of the potential Δ (computed as the difference between the raw model and a $2{\mathrm{i}}$ average). (d), (e) Orientational free energy for the water molecule computed over a long constant-temperature simulation without (d) and with $2{\mathrm{i}}$ rotational averaging (e).


Another important observation is that the non-equivariant component of the potential (estimated as the difference between the single and rotationally averaged predictions of the model for each structure) shows fluctuations that are not only much smaller than those of the actual potential, but also slowly varying (figure 1(c)). This means it is possible to apply multiple time-step (MTS) methods [46] and avoid evaluating the averaged model, which is computationally more demanding, at every MD step. In all of the constant-temperature simulations performed in this work that use rotational averaging, we use the MTS implementation in i-PI [47], with an inner time step of 0.5 fs for the base model, evaluating the rotationally averaged forces every 10 steps, so that the effective overhead for the $2{\mathrm{i}}$ grid is reduced from 36× to about 4×.
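To make the idea concrete, the following is a generic reversible reference-system propagator (RESPA) sketch, not the i-PI implementation: the cheap base-model forces drive every inner step, while the slowly varying correction, taken here as the difference between the rotationally averaged and base forces, is applied only once every M inner steps.

```python
# Generic RESPA-style multiple-time-step integrator sketch (hypothetical
# helper functions; not the i-PI implementation). `base_forces` and
# `averaged_forces` are callables returning forces for a given configuration.
import numpy as np

def mts_step(pos, vel, masses, base_forces, averaged_forces, dt, M):
    m = masses[:, None]
    slow = averaged_forces(pos) - base_forces(pos)   # slowly varying correction
    vel = vel + 0.5 * (M * dt) * slow / m            # outer half-kick
    for _ in range(M):                               # M inner velocity-Verlet steps
        vel = vel + 0.5 * dt * base_forces(pos) / m
        pos = pos + dt * vel
        vel = vel + 0.5 * dt * base_forces(pos) / m
    slow = averaged_forces(pos) - base_forces(pos)
    vel = vel + 0.5 * (M * dt) * slow / m            # outer half-kick
    return pos, vel
```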

Angular momentum precession for an isolated system is a telltale sign of $\mathrm{SO}(3)$ symmetry breaking, but precise classical dynamics is only relevant in a few molecular applications, such as the study of gas-phase chemical reactions [48, 49]. A much more common scenario involves simulations that sample a thermal distribution and compute statistical averages over the trajectory. In this case, a clear signature of broken symmetry would be a preferential absolute orientation of the water molecules in space. We assess this by computing a histogram of the polar angle of the molecular orientation (defined as the vector connecting the oxygen atom with the mid-point of the two hydrogen atoms), over a long (10 ns) trajectory that is supplemented with an efficient colored-noise thermostat [50] to sample a classical Boltzmann distribution at $T = {300}\,\textrm{K}$. The histogram can then be re-cast as a free energy, which should be constant over all orientations. As shown in figure 1(d), there is indeed a statistically significant (but tiny) inhomogeneity, of the order of a fraction of $k_{\mathrm{B}} T$. Rotational averaging brings the anisotropy down to the level of statistical noise (figure 1(e)), which is consistent with the sharp reduction in the value of the torque seen in figure 1(b).
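A minimal sketch of this kind of analysis is shown below, assuming an array of molecular-axis vectors extracted from the trajectory. Here the histogram is taken over both polar and azimuthal angles (an assumption for illustration), and the free energy follows from $F = -k_{\mathrm{B}} T \ln p$ after removing the $\sin\theta$ measure of the spherical coordinates.

```python
# Sketch of the orientational free-energy analysis: histogram the polar and
# azimuthal angles of the molecular axis over a trajectory and convert the
# probability into a free energy, F = -k_B T ln p, correcting for sin(theta).
import numpy as np

KB = 8.617333e-5  # Boltzmann constant in eV/K

def orientational_free_energy(axes, T=300.0, bins=(36, 72)):
    """axes: (n_frames, 3) array of molecular orientation vectors."""
    u = axes / np.linalg.norm(axes, axis=1, keepdims=True)
    theta = np.arccos(np.clip(u[:, 2], -1.0, 1.0))      # polar angle
    phi = np.arctan2(u[:, 1], u[:, 0])                   # azimuthal angle
    p, _, _ = np.histogram2d(theta, phi, bins=bins,
                             range=[[0, np.pi], [-np.pi, np.pi]], density=True)
    centers = np.linspace(0, np.pi, bins[0], endpoint=False) + 0.5 * np.pi / bins[0]
    p = p / np.sin(centers)[:, None]                     # remove spherical measure
    return -KB * T * np.log(p + 1e-12)                   # free energy in eV
```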

3.2. Liquid water

We now move to the more typical use case of simulations of bulk water, to assess whether the small, but measurable, violation of isotropy for the isolated molecule has a more significant impact on the collective behavior of matter in the condensed phase. We run classical MD trajectories (10 ns in total) of a relatively large box (512 molecules) at room temperature, in the NVT ensemble at $T = {300}\,\textrm{K}$, using a stochastic velocity rescaling thermostat [51] that has a negligible effect on dynamical properties [52]. Given that periodic boundary conditions make it impossible to define and monitor a conserved angular momentum, we look for a signature of non-equivariant behavior in the absolute orientation of the water molecules. The free energy profile is almost perfectly isotropic even without rotational averaging (figures 2(a) and (b)), indicating not only that there are no collective effects that generate spurious molecular orientations, but also that, for thermodynamic conditions that are well represented in the training data, the PET model is even closer to being exactly equivariant. We can further assess indirect effects of a small symmetry breaking by computing structural properties of the liquid, such as the pair correlation function and dipole–dipole correlations. These quantities, which depend subtly on the relative position and orientation of pairs of water molecules, are essentially left unchanged by the application of inference-time averaging (figures 2(c) and (d))—indicating that the lack of exact equivariance in the raw PET predictions is inconsequential. Even though it appears that structural properties of water are perfectly converged without rotational averaging, one may wonder whether dynamical properties, which are strongly dependent on the height of energy barriers, would be more sensitive to the broken rotational symmetry. Figure 3 shows that this is not the case: Both translational and orientational diffusion are identical within the statistical error, regardless of whether the PET model is made more equivariant by averaging over a grid of rotations.
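For reference, the translational part of this comparison (figure 3(a)) reduces to a mean-square-displacement analysis. The sketch below, with hypothetical inputs and assuming unwrapped coordinates and consistent units, estimates the diffusion coefficient from the Einstein relation D = slope/6.

```python
# Sketch of the diffusion analysis behind figure 3(a): oxygen mean-square
# displacement versus lag time, and the Einstein estimate D = slope / 6 from
# the long-time linear regime. Positions must be unwrapped (no periodic jumps).
import numpy as np

def msd(positions):
    """positions: (n_frames, n_atoms, 3) unwrapped oxygen coordinates."""
    n = positions.shape[0]
    lags = np.arange(1, n // 2)
    out = np.empty(lags.size)
    for i, lag in enumerate(lags):
        disp = positions[lag:] - positions[:-lag]
        out[i] = np.mean(np.sum(disp**2, axis=-1))
    return lags, out

def diffusion_coefficient(time, msd_values, fit_from=0.5):
    """Fit the linear tail of the MSD (beyond fit_from * t_max); D = slope / 6."""
    mask = time >= fit_from * time[-1]
    slope, _ = np.polyfit(time[mask], msd_values[mask], 1)
    return slope / 6.0
```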


Figure 2. Structural properties of liquid water at $T = {300}\,\textrm{K}$, simulated with a PET model with and without $2{\mathrm{i}}$ rotational averaging. (a), (b) Orientational free energy for the water molecule computed over a long constant-temperature simulation without (a) and with (b) $2{\mathrm{i}}$ averaging. (c) O–O pair correlation function. (d) Molecular orientation correlation function, computed separately for the longitudinal (full lines; parallel to a vector r connecting the two molecules) and transverse (dashed lines; orthogonal to r) components.


Figure 3. Dynamical properties of liquid water at $T = {300}\,\textrm{K}$, simulated with a PET model with and without $2{\mathrm{i}}$ rotational averaging. (a) Oxygen mean-square displacement curves, whose slope is proportional to the diffusion coefficient. (b) Dipole autocorrelation function, which is indicative of the rotational dynamics of water molecules.


3.3. Hexagonal ice

As a final test, we consider the energetics of proton disorder in hexagonal ice. This amounts to an intermediate degree of extrapolation: Even though the training set contains only amorphous structures, it has been shown that models trained on liquid water are also capable of describing, with good accuracy, the solid portion of the phase diagram [53]. It is also a problem for which small energy differences matter, and a case in which a small preference for a particular orientation could easily lead to macroscopic distortions upon relaxation. We consider 9 proton-disordered cells from [54], and optimize the geometry using a raw PET model and a $2{\mathrm{i}}$ rotational average. Once again, the practical impact of approximate equivariance is negligible. The forces on the initial structures (which are of the order of 1 eV Å−1) differ by less than 1 meV Å−1 between standard and rotationally averaged PET. Even though individual proton-ordered structures have energies that differ from each other by only 0.3 meV/molecule [55], relative energies are predicted by the base model with an error that is an order of magnitude smaller, about 0.02 meV/molecule. The relaxed geometries have a minuscule root mean squared distance (RMSD) of about 0.001 Å/atom.

4. Discussion

Our tests show that applying random rotations during training, i.e. standard data augmentation, can be sufficient to achieve a very high degree of approximate equivariance. There are essentially no measurable effects on the static and dynamical properties obtained in the interpolative regime; the potential remains stable and very close to equivariant even when extrapolating to a completely different thermodynamic state point, from bulk water to a single gas-phase molecule. We suggest that rotational averaging during inference (either using a regular grid as we do here, or with exact symmetrization techniques that can restore rigorous equivariance [25, 56]) can be used as a safeguard and a sanity check. The associated overhead can be reduced by using a multiple-time-step integrator, or by only computing the symmetrized potential occasionally to monitor the discrepancy with the non-symmetric model. Furthermore, there are several strategies one could apply to obtain an equivariant description while avoiding this inference-time overhead entirely. For example, one can apply a random rotation before each PET evaluation, similar to what is done during training, so that during a simulation the potential is on average independent of the absolute orientation. This introduces a (small) noise that disrupts energy conservation, but can be controlled with gentle thermostatting—a strategy that is used routinely in atomistic simulations to control errors due to incomplete convergence of self-consistent algorithms [57] or sampling errors in quantum Monte Carlo [58]. This process of random rotations leads to simulations of bulk water that are free of preferential orientation effects. The small level of noise on the force is perfectly compensated by a mild, dynamics-preserving, stochastic velocity rescaling thermostat, which leads to structural and dynamical observables that are indistinguishable from those obtained with explicit rotational averaging, as shown in the supplementary material. Another possibility, relevant where obtaining a high level of equivariance is more important than the sheer accuracy of the energy and force predictions, is to modify the training loss to explicitly penalize symmetry breaking, e.g. by evaluating the same structure over multiple orientations and requiring each prediction to match the rotational average. This can help to push the degree of equivariance below the residual regression error, and can also be done for out-of-sample structures for which reference properties have not been computed, serving as a form of regularization [59].
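As an illustration of the last point, the sketch below shows one plausible form of such a penalty, written with a hypothetical model_energy callable and plain NumPy for readability; in an actual training loop it would be expressed in the ML framework so that it remains differentiable.

```python
# Sketch of an equivariance-penalty term of the kind discussed above
# (hypothetical, framework-agnostic): evaluate one structure in several random
# orientations and penalize the spread of the predicted energies around their
# mean. No reference labels are needed, so it can act as a regularizer on
# unlabeled structures as well.
import numpy as np
from scipy.spatial.transform import Rotation

def symmetry_breaking_penalty(model_energy, positions, cell, n_rot=8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    energies = []
    for R in Rotation.random(n_rot, random_state=rng).as_matrix():
        energies.append(model_energy(positions @ R.T, cell @ R.T))
    energies = np.array(energies)
    return np.mean((energies - energies.mean()) ** 2)   # variance over orientations
```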

Obviously, our observations are specific to the PET architecture and the systems we considered, but they add to a growing body of empirical evidence indicating that the practical impact of neglecting rotational symmetry is usually small. The success of unrestricted models in other applications of geometric deep learning and computer vision [22, 60] is a clear example, as are the minute effects resulting from the application of an exact symmetrization scheme on validation errors in a previous study of the PET architecture [25]. The fact that rotations form a compact group with a low dimension may contribute to the ease with which $\mathrm{O}(3)$ symmetry can be learned from relatively small data sets. One should however keep in mind that no amount of testing can guarantee that there are no corner cases, or adversarial examples, in which a broken-symmetry model would lead to grossly unphysical predictions. The angular momentum precession of the isolated water molecule is a clear—although perhaps contrived—example. This study provides some confidence to computational physicists investigating promising non-equivariant architectures, and demonstrates simple schemes to monitor and improve the compliance with symmetry constraints at little to no cost. Despite the unquestionable appeal of incorporating fundamental physical concepts in the architecture of machine-learning models, it might be beneficial—and it certainly is not as detrimental as one would expect—to just let the models learn.

Acknowledgments

M L and M C acknowledge funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme Grant No. 101001890-FIAMMA. S P and M C acknowledge support from the NCCR MARVEL, funded by the Swiss National Science Foundation (SNSF, Grant Number 182892) and from the Swiss Platform for Advanced Scientific Computing (PASC).

Data availability statement

The data that support the findings of this study are openly available. The PET code is freely available at https://github.com/spozdn/pet/. Templates for the different tests we present, post-processing scripts, raw figure data, and weights for the trained PET model, are available at doi: 10.24435/materialscloud:kz-3b [61] and https://github.com/sirmarcel/eqt-archive. Additional information can also be found at https://marcel.science/eqt. A tutorial introduction to the equivariance tests presented in this work is available in the Atomistic Cookbook at https://atomistic-cookbook.org.


Supplementary data (0.6 MB PDF)