27.05.2020 12:15 Reinhard Heckel (TUM):
Early stopping in deep networks: Double descent and how to mitigate it (using Zoom, see http://go.tum.de/410163 for more details) (Parkring 11, 85748 Garching)

Over-parameterized models, in particular deep networks, often exhibit a "double-descent" phenomenon, where, as a function of model size, the test error first decreases, then increases, and finally decreases again. This intriguing double-descent behavior also occurs as a function of training time, and it has been conjectured that such "epoch-wise double descent" arises because training time controls the model complexity. In this paper, we show that double descent arises for a different reason: it is caused by two overlapping bias-variance tradeoffs that arise because different parts of the network are learned at different speeds.
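A minimal sketch of how one might observe epoch-wise double descent and probe the role of per-layer learning speeds. All specifics here are assumptions for illustration (synthetic noisy regression data, a small two-layer network in PyTorch, illustrative learning rates), not the experimental setup or the exact mitigation from the talk.

```python
# Minimal sketch (assumed synthetic data, architecture, and hyperparameters;
# not the paper's setup).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Noisy linear teacher: an over-parameterized student can fit the label noise.
n_train, n_test, d = 100, 1000, 20
w_true = torch.randn(d, 1)
X_train, X_test = torch.randn(n_train, d), torch.randn(n_test, d)
y_train = X_train @ w_true + 0.5 * torch.randn(n_train, 1)   # noisy labels
y_test = X_test @ w_true                                      # clean targets

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, 1))
loss_fn = nn.MSELoss()

# Plain SGD: a single step size for all layers, so different layers are
# learned at different speeds.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Hypothetical variant in the spirit of the title: per-layer step sizes chosen
# so the layers converge on a similar time scale (values are illustrative only).
# opt = torch.optim.SGD([
#     {"params": model[0].parameters(), "lr": 1e-2},
#     {"params": model[2].parameters(), "lr": 1e-3},
# ])

test_errors = []
for epoch in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()
    with torch.no_grad():
        test_errors.append(loss_fn(model(X_test), y_test).item())

# Inspect test_errors over epochs: a dip, a bump, then a second descent would
# indicate epoch-wise double descent; early stopping targets the first dip.
```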