The need for dressing assistance raises privacy, safety, and independence concerns in the daily lives of a vast number of individuals across the world, providing strong motivation for applying assistive robotics to these tasks. However, cloth dynamics are complex, and predicting the outcome of a planned interaction between a robot and a garment is challenging, which makes manual controller design impractical. Additionally, the close proximity required between the robot and the human during dressing introduces risk during design and testing, further increasing the cost of collecting real-world interaction data.

We propose to simulate the assistive dressing task and apply deep reinforcement learning to train both human and robot behavior policies, represented by neural networks, leveraging recent advances in physics simulation to synthesize large quantities of otherwise difficult-to-obtain interaction data. Our proposed approach is to model several human dressing strategies for each garment dressing task, either by training multiple distinct behavior policies with different reward functions, or by parameterizing a single behavior policy such that modulating its input parameters yields a space of behaviors (i.e. a universal policy). We will then use these human behavior policies while training, for each dressing task, a single assistive robot behavior policy capable of generalizing across the modeled space of human behavior. Since we have the luxury of choosing an assistive robot in simulation, we propose to aggregate several simulated LBR iiwa robot arms, fixed to a virtual gantry around the human, into a robotic system controlled by a single behavior policy.
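The universal-policy idea above can be sketched as a network whose input is the observation concatenated with a behavior parameter vector, so that changing the parameters at run time selects a different behavior from a continuous space. The following is a minimal illustrative sketch, not the proposal's actual architecture; all dimensions, names, and the meaning of the behavior parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class UniversalPolicy:
    """Minimal MLP policy conditioned on behavior parameters.

    The observation and the behavior parameters (e.g. a hypothetical
    dressing speed or limb-stiffness setting) are concatenated, so
    modulating the parameters selects a point in a space of behaviors.
    Weights are random here; in the proposal they would be trained
    with deep reinforcement learning.
    """

    def __init__(self, obs_dim, param_dim, act_dim, hidden=64):
        in_dim = obs_dim + param_dim
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.b2 = np.zeros(act_dim)

    def act(self, obs, behavior_params):
        x = np.concatenate([obs, behavior_params])
        h = np.tanh(x @ self.w1 + self.b1)
        return np.tanh(h @ self.w2 + self.b2)  # bounded actions in [-1, 1]

# Same observation, two different behavior-parameter settings:
policy = UniversalPolicy(obs_dim=10, param_dim=3, act_dim=7)
obs = rng.normal(size=10)
a_slow = policy.act(obs, np.array([0.2, 0.5, 0.5]))
a_fast = policy.act(obs, np.array([0.9, 0.5, 0.5]))
```

Training a single such policy over sampled parameter values, rather than many separate policies, gives the downstream robot policy one consistent interface to a whole family of human behaviors.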
We further propose to compute a volumetric representation of the controllability of a single LBR iiwa arm with respect to its root, and to use this representation to determine the relative placement of the arms on the gantry so as to provide maximum assistance for each individual task.
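One simple way to build such a volumetric representation is Monte Carlo sampling: draw random joint configurations, map each through forward kinematics to a workspace cell relative to the arm's root, and accumulate a score per cell. The sketch below uses a toy 3-link planar arm and plain reachability counts as the score; the actual proposal concerns a 7-DoF LBR iiwa in 3-D, and a richer controllability measure (e.g. manipulability) could be accumulated per cell instead. All link lengths and grid settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 3-link planar arm; the iiwa has 7 DoF, but the
# voxelization idea is identical in 3-D.
LINK_LENGTHS = np.array([0.4, 0.4, 0.2])

def fk(q):
    """End-effector position (relative to the root) of a planar chain."""
    angles = np.cumsum(q)
    return np.array([np.sum(LINK_LENGTHS * np.cos(angles)),
                     np.sum(LINK_LENGTHS * np.sin(angles))])

def controllability_map(n_samples=20000, cell=0.05, reach=1.0):
    """Fraction of sampled configurations landing in each workspace cell."""
    n_cells = int(2 * reach / cell)
    grid = np.zeros((n_cells, n_cells))
    for _ in range(n_samples):
        q = rng.uniform(-np.pi, np.pi, size=3)
        p = fk(q)
        i, j = ((p + reach) / cell).astype(int)
        if 0 <= i < n_cells and 0 <= j < n_cells:
            grid[i, j] += 1  # a manipulability score could be added instead
    return grid / n_samples

grid = controllability_map()
```

Because the map is expressed relative to the arm's root, placing an arm on the gantry amounts to translating (and rotating) this volume; arm placements can then be chosen so that the high-scoring regions of the aggregated arms cover the workspace each dressing task requires.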