Why are artificial neural networks black boxes?

Concept whitening brings light into the black box: a new technique gives insight into deep learning models

Deep neural networks can do wonderful things thanks to their extremely large and complex network of parameters. But their complexity is also their curse: the inner workings of neural networks are often a mystery, even to their creators.

In parallel with the expansion of deep learning in various areas and applications, there has been a growing interest in developing techniques that attempt to explain neural networks by examining their results and learned parameters. But these explanations are often flawed and misleading. In addition, they offer little information on how to correct possible errors that are embedded in deep learning models during training.

In a paper published in the journal Nature Machine Intelligence, scientists at Duke University propose "concept whitening", a technique that can help steer neural networks towards learning specific concepts without compromising performance. Concept whitening builds interpretability into deep learning models instead of searching for answers in millions of trained parameters. The technique, which can be applied to convolutional neural networks, shows promising results and could have a major impact on how we perceive future research in artificial intelligence.

Post hoc explanations of neural networks

Many deep learning explanation techniques are post hoc, i.e. they try to make sense of a trained neural network by examining its output and its parameter values. Although these methods are helpful, they still treat deep learning models like black boxes and do not paint a clear picture of how neural networks work.

The aim of Concept Whitening is to develop neural networks whose latent space is aligned with the concepts relevant to the task. This approach makes the deep learning model interpretable and makes it easier to find out the relationships between the features of an input image and the output of the neural network.

Baking concepts into neural networks

Deep learning models are usually trained on a single data set with annotated examples. Concept whitening introduces a second data set that contains examples of the concepts. These concepts are related to the main task of the AI model.

In concept whitening, the deep learning model goes through two parallel training cycles. While the neural network adjusts its overall parameters to represent the classes in the main task, concept whitening adjusts specific neurons in each layer in order to match them with the classes contained in the concept dataset.
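
One plausible way to run the two training cycles side by side is to interleave them: a few updates on the main task, then one pass over the concept examples. The following is only a schematic sketch of that schedule. The step functions are placeholders that record what would happen, and the concept names ("bed", "lamp") and the interval `align_every` are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch: interleave main-task updates with concept-alignment updates.
# Both step functions are stand-ins; a real model would update its weights here.

def main_task_step(batch, log):
    # Placeholder for one gradient step on the main classification loss.
    log.append(("main", batch))

def concept_alignment_step(concept, log):
    # Placeholder for one update that aligns a latent axis with `concept`.
    log.append(("concept", concept))

def train(main_batches, concepts, align_every=3):
    """Interleave the two training cycles and return the order of updates."""
    log = []
    for i, batch in enumerate(main_batches, start=1):
        main_task_step(batch, log)
        # After every `align_every` main-task batches, visit each concept once.
        if i % align_every == 0:
            for concept in concepts:
                concept_alignment_step(concept, log)
    return log

log = train(main_batches=range(6), concepts=["bed", "lamp"], align_every=3)
for kind, item in log:
    print(kind, item)
```

The printed log shows the order in which the two kinds of updates would be applied, which makes it easy to see how often the concept data set is consulted relative to the main one.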

The result is a disentangled latent space in which the concepts in each layer are neatly separated and the activation of the neurons coincides with their respective concepts; this also makes the neural network less prone to obvious errors.
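
The "whitening" in the name refers to a standard statistical transform: activations are centered and decorrelated so that their covariance becomes the identity matrix. An orthogonal rotation can then be applied without undoing that property, which is what leaves room to line up individual axes with concepts. The NumPy sketch below illustrates only this general idea on toy data. It is not the authors' implementation; the function name `zca_whiten`, the toy mixing matrix, and the random rotation are assumptions made for illustration.

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """Center x (n_samples, n_features) and decorrelate its features (ZCA whitening)."""
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / x.shape[0]
    # Eigendecomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Whitening matrix: rescale each eigen-direction to unit variance.
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ w

rng = np.random.default_rng(0)

# Toy "activations": 1000 samples of 3 correlated features (assumed data).
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
activations = rng.normal(size=(1000, 3)) @ mixing

whitened = zca_whiten(activations)
cov_after = whitened.T @ whitened / whitened.shape[0]

# Any orthogonal rotation keeps the data whitened. Here the rotation is random;
# in concept whitening it would instead be learned from the concept examples.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = whitened @ q
cov_rotated = rotated.T @ rotated / rotated.shape[0]

print(np.round(cov_after, 2))    # close to the identity matrix
print(np.round(cov_rotated, 2))  # still close to the identity matrix
```

Because the rotation does not disturb the whitened statistics, the choice of rotation is free to be driven entirely by the concept data.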

The architecture of Concept Whitening can be easily integrated into existing deep learning models.
