Direct Feedback Alignment Scaling
đź”— Source: arXiv
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
🚀 Technical Novelty
- Mechanism: Replaces backpropagation’s transposed weight matrix with a fixed random projection of the global error to each layer, enabling parallel credit assignment under synaptic asymmetry.
- Nuance: Overcomes prior limitations that confined DFA to simple fully-connected networks by demonstrating its viability across complex, modern architectures (Transformers, Graph Convolutions, NeRFs) in diverse domains.
đź’ˇ Yield
- Achieves backpropagation-parity performance across 8 tasks and 11 architectures; proves challenging tasks are solvable without weight transport when optimizer hyperparameters (e.g., Adam’s β2, LR schedules) are carefully adapted.
⚠️ Limitations
- Still requires an implausible global feedback pathway; maintains a performance gap with BP in complex settings like Transformers without extensive fine-tuning; retains minimal weight transport for embeddings and attention matrices.