Review of the Most Intriguing Paper on Deep Learning

Posted by Mohamad Ivan Fanany

This writing summarizes and reviews the most intriguing paper on deep learning: Intriguing properties of neural networks.


  • Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks.
  • Their expressiveness is the reason they succeed, but it also causes them to learn uninterpretable solutions that can have counter-intuitive properties.

Addressed problem:

  • Report two counter-intuitive properties of deep neural networks.
  • The first property is concerned with the semantic meaning of individual units.
  • The second property is concerned with the stability of neural networks with respect to small perturbations to their inputs.

Dataset: MNIST, ImageNet (AlexNet), 10M images sampled from YouTube (QuocNet).

Previous works:

  • Previous works analyzed the semantic meaning of various units by finding the set of inputs that maximally activate a given unit.
  • The inspection of individual units makes the implicit assumption that the units of the last feature layer form a distinguished basis which is particularly useful for extracting semantic information.
  • Previous works consider that a state-of-the-art deep neural network that generalizes well on an object recognition task can be expected to be robust to small perturbations of its input, because a small perturbation cannot change the object category of an image.
  • Traditional computer vision systems rely on feature extraction: often a single feature is easily interpretable.
  • Previous works also interpret an activation of a hidden unit as a meaningful feature. They look for input images which maximize the activation value of this single feature [6, 13, 7, 4].

Inspiration from previous works:

  • Hard-negative mining, in computer vision, consists of identifying training set examples (or portions thereof) which are given low probabilities by the model, but which should have high probability instead [5].
  • A variety of recent state of the art computer vision models employ input deformations during training for increasing the robustness and convergence speed of the models [9, 13].
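As a toy illustration (not from the paper, all sizes and values invented), the selection step of hard-negative mining can be sketched as picking the training examples the current model scores worst:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model-assigned probabilities for 20 training examples
# that all carry the correct label; low values are the "hard" cases.
probs = rng.uniform(size=20)

# Hard-negative mining: select the k examples the model scores worst,
# so the next training round can focus on them.
k = 5
hard_idx = np.argsort(probs)[:k]
print(hard_idx, probs[hard_idx])
```

The selected indices would then be fed back into training with extra weight, which is the same spirit in which the paper feeds adversarial examples back into training.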

Key Ideas:

  • It is the entire space of activations, rather than the individual units, that contains the bulk of the semantic information.
  • By applying an imperceptible non-random perturbation to a test image, it is possible to arbitrarily change the network’s prediction.
  • These perturbations are found by optimizing the input to maximize the prediction error. The perturbed examples are termed as “adversarial examples”.
  • If we use one neural network to generate a set of adversarial examples, we find that these examples are still statistically hard for another neural network, even when it was trained with different hyperparameters or, most surprisingly, when it was trained on a different set of examples.
  • The paper proposes a scheme to make the input deformation process adaptive in a way that exploits the model and its deficiencies in modeling the local space around the training data.
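The perturbation search can be sketched on a toy model. The snippet below is only a minimal illustration, not the paper's procedure: it stands in a small random linear softmax classifier for a trained network (all names and sizes are made up) and ascends the cross-entropy loss with respect to the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: a linear softmax classifier.
W = rng.normal(size=(3, 4))
b = np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(x, label):
    # Cross-entropy of the model's prediction against `label`.
    return -np.log(softmax(W @ x + b)[label])

def loss_grad(x, label):
    # Gradient of the cross-entropy w.r.t. the input x.
    p = softmax(W @ x + b)
    p[label] -= 1.0              # dL/dlogits for cross-entropy
    return W.T @ p               # chain rule back to the input

x = rng.normal(size=4)
label = int(np.argmax(W @ x + b))    # the class the model assigns to x

# Ascend the prediction error in small steps: each step moves x in the
# direction that most increases the loss on its own predicted label.
x_adv = x.copy()
for _ in range(100):
    x_adv = x_adv + 0.01 * loss_grad(x_adv, label)

print(loss(x, label), loss(x_adv, label))
```

In the paper the same idea is applied to a deep network on real images, where the resulting perturbation is small enough to be imperceptible.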


Main findings:

  • According to various methods of unit analysis, there is no distinction between individual high-level units and random linear combinations of high-level units.
  • It is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
  • Deep neural networks learn input-output mappings that are discontinuous to a significant extent.
  • We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network’s prediction error.
  • The specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
  • Deep neural networks that are learned by backpropagation have nonintuitive characteristics and intrinsic blind spots, whose structure is connected to the data distribution in a non-obvious way.

Evidence that individual units have no distinguished semantic meaning:

  • Based on experiments using convolutional neural networks trained on MNIST and on AlexNet trained on ImageNet.
  • The experiments put into question the notion that neural networks disentangle variation factors across coordinates.
  • The results show that the natural basis is no better than a random basis for inspecting the properties of last-layer output units.
  • The paper visually compared images that maximize the activations in the natural basis and images that maximize the activation in random directions. In both cases the resulting images share many high-level similarities.
  • The compared images appear to be semantically meaningful for both the single unit and the combination of units.
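The comparison can be mimicked with a toy stand-in: treat a random matrix as the last-layer activations phi(x) over a held-out set, and retrieve the inputs that maximize the activation along a natural basis vector versus along a random direction (everything here is illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for last-layer activations phi(x) on 1000 held-out inputs.
features = rng.normal(size=(1000, 50))   # rows: inputs, cols: units

def top_activating(direction, k=5):
    """Indices of the k inputs maximizing <phi(x), direction>."""
    scores = features @ direction
    return np.argsort(-scores)[:k]

# Natural basis: inspect unit 0 directly.
e0 = np.zeros(50); e0[0] = 1.0
# A random direction in activation space.
v = rng.normal(size=50)

print(top_activating(e0))   # inputs that maximally excite unit 0
print(top_activating(v))    # inputs that maximally excite a random direction
```

The paper's observation is that the two retrieved sets of images look equally semantically coherent, which is what undermines the "one unit, one concept" interpretation.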

Reasoning that a deep NN is not stable to small perturbations of its input:

  • Unit-level inspection methods had relatively little utility beyond confirming certain intuitions regarding the complexity of the representations learned by a deep neural network.
  • Global, network-level inspection methods can be useful in the context of explaining classification decisions made by the model.
  • The output layer unit of a neural network is a highly nonlinear function of its input.
  • When the output layer unit is trained with the cross-entropy loss (using the softmax activation function), it represents a conditional distribution of the label given the input (and the training set presented so far).
  • It has been argued [2] that the deep stack of non-linear layers in between the input and the output unit of a neural network are a way for the model to encode a non-local generalization prior over the input space. In other words, it is possible for the output unit to assign non-significant probabilities to regions of the input space that contain no training examples in their vicinity.
  • Such regions can represent, for instance, the same objects from different viewpoints, which are relatively far (in pixel space), but which share nonetheless both the label and the statistical structure of the original inputs.
  • It is implicit in such arguments that local generalization—in the very proximity of the training examples—works as expected.
  • This kind of smoothness prior is typically valid for computer vision problems, where imperceptibly tiny perturbations of a given image do not normally change the underlying class.
  • Based on some experiments in the paper, however, the smoothness assumption that underlies many kernel methods does not hold.
  • Using a simple optimization procedure, the authors are able to find adversarial examples, which are obtained by imperceptibly small perturbations to a correctly classified input image, so that it is no longer classified correctly. This can never occur with smooth classifiers by their definition.
  • The authors found a way to traverse the manifold represented by the network efficiently (by optimization), finding adversarial examples in the input space.
  • The adversarial examples represent low-probability (high-dimensional) “pockets” in the manifold, which are hard to efficiently find by simply randomly sampling the input around a given example.
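The paper frames this search as minimizing c*||r|| + loss(x + r, l) subject to x + r staying in the [0, 1] pixel box (solved there with box-constrained L-BFGS). Below is a rough stand-in using projected gradient descent on a toy linear softmax model; the squared norm, the sizes, and the constants are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy differentiable classifier standing in for a trained network.
W, b = rng.normal(size=(3, 8)), np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def xent(x, target):
    return -np.log(softmax(W @ x + b)[target])

def xent_grad(x, target):
    # Gradient w.r.t. x of the cross-entropy toward the chosen target label.
    p = softmax(W @ x + b)
    p[target] -= 1.0
    return W.T @ p

x = rng.uniform(size=8)                        # a "pixel" vector in [0, 1]
target = (int(np.argmax(W @ x + b)) + 1) % 3   # a label other than the current one

# Minimize c*||r||^2 + loss(x + r, target) by projected gradient descent,
# keeping x + r inside the [0, 1] box.
c, lr = 0.1, 0.02
r = np.zeros(8)
for _ in range(500):
    g = 2 * c * r + xent_grad(x + r, target)
    r = np.clip(x + r - lr * g, 0.0, 1.0) - x  # step, then project into the box

print(xent(x, target), xent(x + r, target), np.linalg.norm(r))
```

The norm penalty is what keeps the perturbation r small; dropping it would still flip the label but with a visible change to the input.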


Conclusions:

  • Deep neural networks have counter-intuitive properties both with respect to the semantic meaning of individual units and with respect to their discontinuities.
  • The existence of the adversarial negatives appears to be in contradiction with the network’s ability to achieve high generalization performance.
  • Indeed, if the network can generalize well, how can it be confused by these adversarial negatives, which are indistinguishable from the regular examples?
  • The explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set; yet it is dense (much like the rational numbers), and so it is found near virtually every test case.

My notes and review:

  • The formal description of how to generate adversarial examples is given.
  • Spectral analysis on stability of deep NN is also given.
  • This paper is very enlightening for two reasons: (1) two images that we see as similar can actually be interpreted as totally different images (objects), and vice versa, two images that we see as different can actually be interpreted as the same; (2) deep NNs still do not see as humans see. It seems human vision is still more robust and error tolerant. What actually makes us better than deep NNs in this respect?
  • Even though such adversarial images are stated to be rarely observed in reality, it remains challenging to propose algorithms that can effectively handle adversarial examples.
