🔗 Source: arXiv

Align, then memorise: the dynamics of learning with feedback alignment

Mechanism: Identifies a sequential “align-then-memorize” dynamic where forward weights first adapt to random feedback weights to approximate true gradients, then shift focus to data fitting.
Nuance: Moves beyond static alignment assumptions by quantifying how matrix conditioning and layer-wise progression dictate success on Transformers versus failure on CNNs, introducing a “degeneracy breaking” convergence property absent in prior analyses.

Proves DFA naturally converges to solutions maximizing gradient alignment; derives that well-conditioned error statistics are necessary for learning; empirically validates bottom-to-top layer-wise alignment progression on MNIST and CIFAR10.

Theoretical derivations rely on the infinite-data online learning limit and linear network assumptions; empirical validation is confined to standard vision benchmarks without large-scale architectural scaling tests; performance degrades with correlated targets or label noise.