HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Setting: sparse cost functions, where each stochastic gradient step reads and updates only a small fraction of the weight vector (e.g. sparse SVMs); see the decomposition sketched below
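For reference, such objectives decompose into terms that each touch only a few coordinates. A minimal sketch of that decomposition, assuming the hypergraph notation used in the paper (E is the collection of small index sets, x_e the coordinates of x indexed by e):

```latex
% f decomposes over a collection E of small index sets ("hyperedges");
% each term f_e reads and writes only the coordinates x_e.
f(x) = \sum_{e \in E} f_e(x_e), \qquad e \subseteq \{1, \dots, n\}, \quad |e| \ll n
```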
- Metrics of sparsity (formalized after this list):
  - Ω: the maximum size of any index set
  - Δ: the maximum fraction of index sets that contain any single variable (node)
  - ρ: the maximum fraction of index sets that intersect a given index set
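A sketch of these three metrics in the hypergraph notation above, as I recall them from the paper (nodes are the n coordinates of x, hyperedges are the index sets in E):

```latex
% Omega: size of the largest index set.
\Omega = \max_{e \in E} |e|
% Delta: largest fraction of index sets sharing any one coordinate v.
\Delta = \frac{1}{|E|} \max_{1 \le v \le n} \bigl|\{ e \in E : v \in e \}\bigr|
% rho: largest fraction of index sets intersecting a given index set.
\rho = \frac{1}{|E|} \max_{e \in E} \bigl|\{ \hat{e} \in E : \hat{e} \cap e \neq \emptyset \}\bigr|
```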
- Algorithm (each processor runs this loop in parallel, with no locking; see the sketch after this list):
  - Sample an index set e uniformly at random from E
  - Compute the gradient of the corresponding cost term f_e at the current (possibly inconsistent) state of x
  - Update only the entries of x indexed by e; races between processors are tolerated
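A minimal Python sketch of the per-processor loop on a toy sparse least-squares objective. The data layout (`rows`, `vals`), hyperparameters, and use of plain threads are illustrative assumptions, not from the paper; because of the GIL, this shows the lock-free update rule rather than real parallel speedup.

```python
import numpy as np
from threading import Thread

# Hypothetical toy problem: each example touches only nnz of n_features coords.
rng = np.random.default_rng(0)
n_samples, n_features, nnz = 1000, 200, 5
rows = [rng.choice(n_features, size=nnz, replace=False) for _ in range(n_samples)]
vals = [rng.standard_normal(nnz) for _ in range(n_samples)]
x_true = rng.standard_normal(n_features)
y = np.array([v @ x_true[r] for r, v in zip(rows, vals)])

x = np.zeros(n_features)  # shared weight vector; no lock protects it

def worker(seed, n_steps=20_000, lr=0.05):
    local = np.random.default_rng(seed)
    for _ in range(n_steps):
        i = local.integers(n_samples)   # 1. sample an index set uniformly
        r, v = rows[i], vals[i]
        g = (v @ x[r] - y[i]) * v       # 2. gradient of the i-th cost term
        x[r] -= lr * g                  # 3. lock-free sparse update (races tolerated)

threads = [Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to x_true:", np.linalg.norm(x - x_true))
```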
- Convergence guarantees depend on the sparsity metrics: when Ω, Δ, and ρ are small, lock-free updates rarely collide and HOGWILD! achieves a convergence rate close to that of serial SGD