CECAM Flagship Workshop: Machine Learning of First Principles Observables
Dr. Simone Köcher

Workshop Overview
Machine learning (ML) methods have recently penetrated almost all research areas in materials modelling and high-throughput materials screening. So far, however, this success has mainly centred on developing surrogate models for the potential energy surface (PES) that offer superior computational efficiency while retaining first-principles accuracy. Learning observable properties directly is only just emerging and faces several challenges, which we intend to address.
The event is meant to support the development of a new collaborative, international network that connects different fields of research and integrates the community of young researchers through a scientifically diverse, interactive workshop.
General Information
Organisers:
- Simone S. Köcher (IEK-9, Forschungszentrum Jülich GmbH)
- Angela F. Harper (Fritz-Haber-Institut der Max-Planck-Gesellschaft)
- Hanna Türk (École Polytechnique Fédérale de Lausanne)
- Elena Gelzinyte (Fritz-Haber-Institut der Max-Planck-Gesellschaft)
Topics / Sessions:
- ML of electron density and Hamiltonians
- ML of electronic observables
- ML of mechanical & magnetic observables
- ML of spectroscopic observables
- ML of reaction networks
- Theoretical and experimental databases
Objectives
The majority of materials modelling with ML methods represents the potential energy surface (PES) of a material based on the assumption that the total energy of the system can be decomposed into atomic contributions, which, to a first approximation, are described as a function of the local atomic environment [1,2]. However, several observables, such as charge transfer, dipoles, and a material's properties in an applied electric field, are inherently non-local. Initial approaches [3,4] are able to include long-range interactions in models for interatomic potentials. It remains an open question whether and how these or other methods can be adapted for ML models of non-local observables.
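As a sketch of the locality assumption described above, the decomposition can be written as follows. The symbols ($\varepsilon$, $\mathcal{X}_i$, $Z_j$, $r_c$) are notation assumed here for illustration; in particular, expressing "local" through a cutoff radius $r_c$ is an assumption of this sketch and is not specified in the text.

```latex
% Assumed notation: N atoms, positions r_i, species Z_i, cutoff radius r_c.
% epsilon is the learned atomic energy contribution (hypothetical symbol).
E \;\approx\; \sum_{i=1}^{N} \varepsilon\!\left(\mathcal{X}_i\right),
\qquad
\mathcal{X}_i = \left\{ \left(\mathbf{r}_j - \mathbf{r}_i,\; Z_j\right) \;:\;
\left|\mathbf{r}_j - \mathbf{r}_i\right| < r_c,\; j \neq i \right\}
```

Written this way, an atom farther than $r_c$ from atom $i$ has no influence on $\varepsilon(\mathcal{X}_i)$, which illustrates why quantities such as charge transfer or the response to an applied electric field are difficult to capture with a purely local model.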
In contrast to the scalar potential energy of a system, many properties are vectors, like dipoles, higher-rank tensors, like electric field gradients, or more complex quantities, like the density of states. Initial implementations encode tensorial properties in either rotationally invariant or equivariant representations [5,6,7]. Other approaches aim to learn the entire electron density in order to derive the observables from it [8,9,10]. However, the practical application to physical observables is still very limited. An open discussion of the concepts is necessary and will make an essential contribution to the dissemination of the methods within the community.
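To make the distinction between invariant and equivariant quantities concrete, the following minimal sketch rotates a toy structure and compares a scalar with a vector property. It assumes a hypothetical three-site point-charge model with illustrative numbers; the function names are made up for this example, and it does not implement any of the cited representations.

```python
import math

def rotate_z(vec, angle):
    """Rotate a 3D vector about the z axis by `angle` radians."""
    x, y, z = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def dipole(charges, positions):
    """Point-charge dipole moment mu = sum_i q_i * r_i (a vector)."""
    return tuple(
        sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)
    )

def pair_energy(charges, positions):
    """Toy Coulomb-like energy sum_{i<j} q_i * q_j / |r_i - r_j| (a scalar)."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            total += charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return total

# Hypothetical toy system: charges and positions are illustrative only.
charges = [-0.8, 0.4, 0.4]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

angle = 0.7  # arbitrary rotation angle in radians
rotated = [rotate_z(r, angle) for r in positions]

mu, mu_rot = dipole(charges, positions), dipole(charges, rotated)
e, e_rot = pair_energy(charges, positions), pair_energy(charges, rotated)

# Scalar property: unchanged when the structure is rotated (invariant).
energy_is_invariant = math.isclose(e, e_rot, rel_tol=1e-12)

# Vector property: rotates together with the structure (equivariant),
# i.e. dipole(R r) equals R applied to dipole(r).
dipole_is_equivariant = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(mu_rot, rotate_z(mu, angle))
)

print(energy_is_invariant, dipole_is_equivariant)
```

A model that predicts the dipole from rotationally invariant features alone cannot reproduce this behaviour without additional information about orientation, which is one motivation for equivariant representations.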
The majority of materials and their observables are unambiguously described by features of the atomic structure. Some properties, however, also depend on spin states, magnetic arrangement, or atomic charges. Including information such as atomic magnetic vectors or atomic charges in the feature vector is challenging, and few approaches to this problem exist [11,12]. It needs to be addressed in order to model key properties like charge transfer and spin waves.
The accuracy and performance of any ML model depend critically on the extent and diversity of its training dataset. The available databases [13,14,15,16] of 'synthetic' first-principles material properties provide a valuable wealth of information, but approaches to combine their data have yet to be developed. Other databases [17,18,19] complement this information with experimental data. The question remains whether and how experimental and theoretical data can be combined on an equal footing in order to pool the available resources. The workshop aims to tackle the challenges in the fledgling field of ML of observables, covering spectroscopic, electronic, thermodynamic, magnetic, and mechanical properties as well as ML approaches to predict the electron density.
We gratefully acknowledge the support of CECAM, the Psi-k Charity, Deutsche Forschungsgemeinschaft, and the Max-Planck-Gesellschaft.



