WASP Artificial Intelligence

There are two research programs in the artificial intelligence part of WASP: WASP-AI/MLX and WASP-AI/MATH. Their focus is on machine learning and the mathematical foundations of AI.

Today, machine learning is by far the most talked-about application of AI. It is used in, for example, self-driving cars, image recognition, and social media feeds. Together with deep learning and next-generation/explainable AI, machine learning is the research topic of WASP-AI/MLX.

In WASP-AI/MATH, the emphasis is on the mathematical foundations of AI. One aim is to provide a mathematical theory of the fundamental building blocks of AI. The rapid progress in AI and ML has been based on best practices, and mathematical understanding has not kept up with recent technological development. The objective of WASP-AI/MATH is to develop and formulate the underlying mathematical concepts and theorems. Distilling the mathematical ideas behind different successful applications leads to a deeper understanding of the field.

In addition to the two research programs, the artificial intelligence part of WASP (WASP-AI) consists of a recruitment program, an industrial PhD student program, and a track in the WASP Graduate School. The recruitment program covers the recruitment of new faculty in machine learning and the mathematical foundations of AI, as well as the Wallenberg Chairs in AI program.

Representation learning and grounding
All ML algorithms depend on data representation. Efficient and appropriate data representation can enable a better understanding of parameters that affect variations in the data. Representations can be tailored and are dependent on the domain in which classification or prediction algorithms are deployed.

Recent techniques in representation learning draw on unsupervised learning and deep learning, including advances in probabilistic models, auto-encoders, manifold learning and deep networks.
One of the focus areas of WASP-AI/MLX is the development of models that encode various levels of abstraction and are based on multimodal input data.

Sequential decision-making and reinforcement learning
Traditional ML has focused on pattern mining. Executing actions in the real world, by contrast, requires decision-making in real time based on multimodal input.

Reinforcement learning enables learning new tasks through experimentation, feedback and rewards. Most of the current reinforcement learning algorithms work well with discrete data, while continuous data still represents a challenge.
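The discrete case mentioned above can be illustrated with a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. The corridor environment, the function names `step` and `train`, and all parameter values are assumptions made for illustration only; they are not taken from WASP material.

```python
import random

N_STATES = 5        # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)  # move one cell left or right

def step(state, action):
    """Hypothetical toy environment: a 1-D corridor with the goal at the right end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit: greedy action, ties broken at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback: move the estimate towards reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

The table `q` has one entry per state-action pair, which is only possible because states and actions are discrete. With continuous states or actions such a table cannot be enumerated, which is one way to see why the continuous case is harder.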

Learning from small data sets, GANs and incremental learning
In most realistic systems, new data become available in new situations and as new users use the system. Thus, the input data can be used continuously to extend and update the existing model. This is also closely related to domains and problems where large amounts of data are not available at the start, but where the system learns, as in reinforcement learning, to balance exploration and exploitation.

Methods for adding and merging new data in the existing representation need further development in terms of multimodal inputs, a combination of continuous and discrete data, as well as assessing the performance as the system is being updated given the new data.

A significant challenge is to be able to develop systems that can provide meaningful and calibrated notions of their uncertainty, explain the decisions in meaningful ways and learn from negative examples.
In addition, the systems need to be able to pursue long-term goals and reason about which new data are needed to achieve them. In this respect, more recent methodologies such as adversarial methods, and GANs more specifically, are also of interest for addressing the data generation problem.

Multi-task and transfer learning
Transfer learning is the ability of a learning algorithm to exploit similarities between two different domains in which learning is occurring.
The knowledge or representation can be transferred between domains in order to speed up learning.

Recent work develops frameworks in which auxiliary intermediate-domain data are selected to bridge the given source and target domains, and knowledge is then transferred along that bridge.

Verifiable, rigorous methods
AI is increasingly used for important tasks, and without a thorough understanding of its inner workings it is impossible to predict a model’s behaviour on previously untested input data. Developing a firm theoretical foundation of reachability analysis for AI is a challenge for mathematics.

Reproducibility/robustness
Large-scale numerical computations that are performed on heterogeneous platforms are known to be non-deterministic, thus producing different results when repeatedly run with the same initial data. Large AI models share this problem.
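One well-known source of this non-determinism is that floating-point addition is not associative, so a sum whose evaluation order depends on scheduling can change from run to run. The following is a minimal sketch of that effect; the helper `naive_sum` and the random test data are assumptions made for illustration.

```python
import random

# The same three numbers, added in two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False: the two groupings round differently

def naive_sum(values):
    """Plain left-to-right accumulation, as in a simple sequential reduction."""
    total = 0.0
    for v in values:
        total += v
    return total

# Reordering the terms stands in for a parallel reduction whose
# order of operations is not fixed between runs.
rng = random.Random(0)
values = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)

print(naive_sum(values) - naive_sum(shuffled))  # typically a small non-zero difference
```

The difference between the two orderings is usually tiny, but in a long training run such discrepancies can accumulate and lead to visibly different results.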

How can we find an appropriate standard for reproducibility? What is the correct measure to use? Which properties will guarantee that a system produces reproducible results, and which properties will prevent this? How robust is a model to small perturbations of its parameters, or to small changes in the training data? These questions are essential to address and require fundamental research.

Optimization
Here the AI community faces significant challenges. Today, the workhorse of training algorithms is gradient descent. Despite thousands of academic papers on the topic and countless ways of applying the method, the process still relies on trial and error. Matters are not helped by the apparent disconnect between researchers in AI and large parts of the optimization community. AI seems to make very little use of the highly advanced theory of nonlinear constrained optimization already developed within that community. Promoting a transfer of this knowledge to the field of AI would certainly improve matters.
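The trial-and-error aspect can be seen even on a toy problem. The sketch below runs plain gradient descent on an assumed two-dimensional quadratic, f(x) = 0.5 * (x0^2 + 10 * x1^2), with two step sizes. The objective, the starting point and the step sizes are illustrative choices, not taken from any WASP project.

```python
def f(x):
    """Assumed toy objective: an ill-conditioned quadratic bowl."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    """Gradient of f."""
    return (x[0], 10.0 * x[1])

def gradient_descent(x0, lr, steps):
    """Repeat x <- x - lr * grad(x) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        g = grad(x)
        x = (x[0] - lr * g[0], x[1] - lr * g[1])
    return x

start = (1.0, 1.0)
small_step = gradient_descent(start, lr=0.1, steps=100)
large_step = gradient_descent(start, lr=0.25, steps=100)

print(f(small_step))  # close to the minimum value 0
print(f(large_step))  # very large: the iteration has diverged
```

For this quadratic the iteration is stable only when the step size is below 2 divided by the largest curvature, here 2/10 = 0.2. For a deep network no such bound is known in advance, which is part of why step sizes are tuned by experiment.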

Complexity
Understanding the computational complexity of a given class of problem instances helps us find – in a quantifiable manner – the right balance between training set size, model complexity, and prediction quality.

Complexity results in terms of worst-case and average-case scenarios will lead to a deeper understanding of the absolute potential of methods used for AI. Here, classical approximation theory has a significant role to play. Applying this large field of mathematics to AI will open up new possibilities and can provide the necessary tools for solving these problems.