Over-parameterized models, in particular deep networks, often exhibit a ``double-descent'' phenomenon: as a function of model size, the error first decreases, then increases, and finally decreases again. This intriguing double-descent behavior also occurs as a function of training time, and it has been conjectured that such ``epoch-wise double descent'' arises because training time controls the model complexity. In this paper, we show that epoch-wise double descent arises for a different reason: it is caused by two overlapping bias-variance tradeoffs, which occur because different parts of the network are learned at different speeds.