18.
Parallel Matrix Factorization
§ Partition data into m partitions
§ For each partition, run the MCEM algorithm and get the factor estimates
§ Ensemble runs: for k = 1, …, n
– Repartition the data into m partitions with a new seed
– Run an E-step-only job for each partition, given the factor estimates
§ Average the user/item factors over all partitions and all k's to obtain the final estimate
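The divide-and-conquer step above can be sketched as follows. This is a toy, not the talk's implementation: plain SGD matrix factorization stands in for the MCEM fit, and the data sizes, learning rate, and regularization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy event data: (user, item, rating) triples. Real runs use massive
# event logs; this tiny synthetic set is an assumption for illustration.
n_users, n_items, n_factors = 20, 15, 3
U_true = rng.normal(size=(n_users, n_factors))
V_true = rng.normal(size=(n_items, n_factors))
events = [(u, i, float(U_true[u] @ V_true[i]) + 0.1 * rng.normal())
          for u in range(n_users) for i in range(n_items)]

def partition(events, m, seed):
    """Randomly assign events to m partitions."""
    rs = np.random.default_rng(seed)
    parts = [[] for _ in range(m)]
    for e in events:
        parts[int(rs.integers(m))].append(e)
    return parts

def fit_partition(part, n_epochs=30, lr=0.05, lam=0.1):
    """Stand-in for the per-partition MCEM fit: plain SGD matrix
    factorization on one partition's events (an assumption; the
    talk fits the model with MCEM)."""
    U = 0.1 * rng.standard_normal((n_users, n_factors))
    V = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in part:
            err = r - U[u] @ V[i]
            U[u] += lr * (err * V[i] - lam * U[u])
            V[i] += lr * (err * U[u] - lam * V[i])
    return U, V

# Divide and conquer: fit each partition independently, then average
# the user/item factors across partitions.
m = 4
fits = [fit_partition(p) for p in partition(events, m, seed=1)]
U_avg = np.mean([U for U, _ in fits], axis=0)
V_avg = np.mean([V for _, V in fits], axis=0)
```

The ensemble runs would then repartition with new seeds and rerun only the E-step, as the slide describes.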

19.
Key Points
§ Partitioning is tricky!
– By events? By items? By users?
§ Empirically, “divide and conquer” + averaging over partitions works well!
§ Ensemble runs: after obtaining the factor estimates, we run n E-step-only jobs and take the average, each job using a different user-item mix.
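An ensemble run can be sketched with a closed-form E-step, assuming a Gaussian matrix factorization model in which the item factors `V_hat` are held fixed; the toy data, partition count, and ridge parameter `lam` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption): item factors V_hat were already learned by
# the divide-and-conquer fit; here we simply draw them at random.
n_users, n_items, n_factors = 20, 15, 3
V_hat = rng.normal(size=(n_items, n_factors))
U_true = rng.normal(size=(n_users, n_factors))
events = [(u, i, float(U_true[u] @ V_hat[i]) + 0.1 * rng.normal())
          for u in range(n_users) for i in range(n_items)]

def partition(events, m, seed):
    """Randomly assign events to m partitions."""
    rs = np.random.default_rng(seed)
    parts = [[] for _ in range(m)]
    for e in events:
        parts[int(rs.integers(m))].append(e)
    return parts

def estep_user_factors(part, V, n_users, lam=0.1):
    """E-step for user factors with V fixed: each user's factor is a
    ridge regression of that user's ratings in this partition on the
    corresponding item factors."""
    k = V.shape[1]
    A = np.tile(lam * np.eye(k), (n_users, 1, 1))
    b = np.zeros((n_users, k))
    for u, i, r in part:
        A[u] += np.outer(V[i], V[i])
        b[u] += r * V[i]
    return np.stack([np.linalg.solve(A[u], b[u]) for u in range(n_users)])

# Ensemble runs: repartition with a new seed for each run k, run an
# E-step-only job per partition, then average over partitions and runs
# so each run sees a different user-item mix.
m, n_runs = 4, 3
U_runs = []
for k in range(n_runs):
    parts = partition(events, m, seed=k)
    U_runs.append(np.mean([estep_user_factors(p, V_hat, n_users)
                           for p in parts], axis=0))
U_final = np.mean(U_runs, axis=0)
```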

21.
Matrix Factorization For User Profile
§ Offline: during the user-profile-building period, obtain the user factor for each user i
§ Online modeling using OLR (online logistic regression)
– If a user has a profile (warm-start), use the learned user factor as the user feature
– If not (cold-start), use … as the user feature
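The warm-start/cold-start switch might look like the sketch below; `user_feature` and `cold_start_fallback` are hypothetical names, and the fallback vector stands in for whatever cold-start representation the slide's missing symbol denotes.

```python
import numpy as np

def user_feature(user_id, user_factors, cold_start_fallback):
    """Return the feature vector fed to the online logistic regression:
    the offline-learned user factor when a profile exists (warm-start),
    otherwise the fallback vector (cold-start). The fallback choice is
    an assumption; the original slide's symbol did not survive."""
    factor = user_factors.get(user_id)
    return factor if factor is not None else cold_start_fallback

# Usage: two users, one with an offline profile and one without.
factors = {"u1": np.array([0.3, -1.2, 0.7])}
fallback = np.zeros(3)  # hypothetical cold-start default
f_warm = user_feature("u1", factors, fallback)  # learned factor
f_cold = user_feature("u2", factors, fallback)  # fallback vector
```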

33.
Large Scale Logistic Regression
§ Naïve approach:
– Partition the data and run logistic regression on each partition
– Take the mean of the learned coefficients
– Problem: not guaranteed to converge to the single-machine model!
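A minimal demonstration of the naïve scheme, assuming a toy dataset and a plain gradient-descent fit (`fit_lr` is a hypothetical helper, not any particular library's API):

```python
import numpy as np

def fit_lr(X, y, lr=0.5, n_iter=2000):
    """Unregularized logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
logits = X @ np.array([1.0, -2.0, 0.5])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

w_single = fit_lr(X, y)  # the single-machine reference fit

# Naive parallelization: fit each partition separately, then average.
parts = np.array_split(np.arange(400), 4)
w_avg = np.mean([fit_lr(X[idx], y[idx]) for idx in parts], axis=0)
# w_avg is generally close to, but not equal to, w_single: averaging
# is not guaranteed to recover the single-machine solution.
```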
§ Alternating Direction Method of Multipliers (ADMM)
– Boyd et al. 2011
– Set up a constraint that each partition’s coefficient = the global consensus
– Solve the optimization problem using Lagrange Multipliers
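A sketch of consensus ADMM for logistic regression under the setup above. The toy data, penalty `rho`, and step sizes are assumptions, and each local subproblem is solved only approximately by gradient descent rather than exactly.

```python
import numpy as np

def local_solve(X, y, z, u, rho, lr=0.3, n_iter=300):
    """Approximately solve one partition's ADMM subproblem:
    local logistic loss + (rho/2) * ||w - z + u||^2."""
    w = z.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + rho * (w - z + u)
        w -= lr * grad
    return w

def admm_lr(parts, d, rho=1.0, n_rounds=50):
    """Consensus ADMM: each partition keeps a local coefficient vector
    w_i constrained to equal the global consensus z; scaled dual
    variables u_i enforce the constraint via the augmented Lagrangian."""
    m = len(parts)
    us = [np.zeros(d) for _ in range(m)]
    z = np.zeros(d)
    for _ in range(n_rounds):
        ws = [local_solve(X, y, z, u, rho) for (X, y), u in zip(parts, us)]
        z = np.mean([w + u for w, u in zip(ws, us)], axis=0)   # consensus
        us = [u + w - z for (w, u) in zip(ws, us)]             # dual ascent
    return z

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
logits = X @ np.array([1.0, -2.0, 0.5])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
parts = [(X[i::4], y[i::4]) for i in range(4)]
z = admm_lr(parts, d=3)
```

Unlike naïve averaging, the consensus iterations drive every partition's local solution toward the single global optimum.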
§ All-Reduce from Vowpal Wabbit (VW), Langford et al.
– Reducers talk to each other so that the exact gradient can be computed by aggregating the computations from every partition (reducer).
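The all-reduce idea can be sketched in-process; `allreduce_sum` is a hypothetical stand-in for VW's networked primitive, not its actual API.

```python
import numpy as np

def allreduce_sum(values):
    """Stand-in for the All-Reduce primitive: after the call, every
    node holds the element-wise sum of all nodes' inputs (simulated
    in-process here)."""
    total = np.sum(values, axis=0)
    return [total.copy() for _ in values]

def parallel_gradient_step(parts, w, lr):
    """One batch gradient step for logistic regression: each partition
    computes its local gradient contribution, All-Reduce sums them, and
    every node applies the identical update."""
    local = []
    for X, y in parts:
        p = 1.0 / (1.0 + np.exp(-X @ w))
        local.append(X.T @ (p - y))
    summed = allreduce_sum(local)
    n = sum(len(y) for _, y in parts)
    return w - lr * summed[0] / n

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 0.5).astype(float)
w0 = np.zeros(3)

# The aggregated step equals the single-machine gradient step, which
# is the point of the precise-gradient argument on the slide.
parts = [(X[i::4], y[i::4]) for i in range(4)]
w_parallel = parallel_gradient_step(parts, w0, lr=0.1)
p = 1.0 / (1.0 + np.exp(-X @ w0))
w_single = w0 - 0.1 * X.T @ (p - y) / 200
```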