I am Dhaval Adjodah, a 5th-year PhD candidate in the Human Dynamics group at the MIT Media Lab. My main advisor is Prof. Alex 'Sandy' Pentland, and my committee also includes Neil Lawrence, Esteban Moro and Tim Klinger. I am also working with Prof. Yoshua Bengio at MILA. I do research in cognitive science (social cognition), machine learning (reinforcement learning) and network science.

A common technique for improving the speed and robustness of learning in deep reinforcement learning (DRL), and in many other machine learning algorithms, is to run multiple learning agents in parallel. A neglected component in the development of these algorithms has been how best to arrange the learning agents to facilitate distributed search. Here we draw on results from the networked optimization and collective intelligence literatures suggesting that arranging learning agents in less-than-fully-connected topologies (agents are implicitly fully connected in most implementations) can improve learning. We explore the relative performance of four popular families of graphs and observe that one family (Erdos-Renyi random graphs) empirically outperforms the standard fully-connected communication topology across several DRL benchmark tasks. We observe that 1000 learning agents arranged in an Erdos-Renyi graph can perform as well as 3000 agents arranged in the standard fully-connected topology, showing the large improvement possible when the topology over which agents communicate is carefully designed. We complement these empirical results with a preliminary theoretical investigation of why less-than-fully-connected topologies can perform better. Overall, our work suggests that distributed machine learning algorithms could be made more efficient if the communication topology between learning agents were optimized. Paper link
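The core idea above can be sketched in a few lines: sample a sparse Erdos-Renyi graph over the agents and let each agent average its parameters only with its graph neighbours, rather than with every other agent. This is a minimal stdlib-only illustration (scalar "parameters", one gossip round, hypothetical sizes and probabilities), not the paper's actual DRL training setup.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Sample an undirected Erdos-Renyi graph as an adjacency list."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def gossip_step(params, adj):
    """One communication round: each agent replaces its parameter with the
    average over itself and its neighbours. Each new value is a convex
    combination of old values, so the spread can never increase."""
    return [(params[i] + sum(params[j] for j in adj[i])) / (1 + len(adj[i]))
            for i in sorted(adj)]

n = 50
adj = erdos_renyi(n, p=0.2, seed=1)                  # sparse ER topology
rng = random.Random(2)
params = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # each agent's scalar "parameter"

before = max(params) - min(params)
params_next = gossip_step(params, adj)
after = max(params_next) - min(params_next)
```

Replacing the fully-connected averaging step (every agent averaging with all others) with this neighbourhood averaging is the only change the topology introduces; everything else in the learning loop stays the same.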

Symbolic Relation Networks for Reinforcement Learning

In recent years, reinforcement learning techniques have enjoyed considerable success in a variety of challenging domains, but they are typically sample-inefficient and often fail to generalize well to new environments or tasks. Humans, by contrast, are able to learn robust skills with orders of magnitude less training. One hypothesis for this discrepancy is that humans view the world in terms of objects and the relations between them. Such a bias may be useful in reducing sample complexity and improving interpretability and generalization. In this paper, we present a novel relational architecture composed of multiple neural-network sub-modules, called relational units, that operate on objects and output values in the unit interval. Our model transforms the input state representation into a relational representation, which is then supplied as input to a Q-learner. Experiments on a goal-seeking game with random boards show better performance than several baselines: a multi-headed attention model, a standard MLP, a pixel MLP and a symbolic RL model. We also find that the relations learned in the network are interpretable. Paper link
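To make the relational-unit idea concrete, here is a toy sketch: a one-hidden-layer MLP that maps a pair of object feature vectors to a value in the unit interval via a sigmoid, applied over all ordered object pairs to build the relational representation fed to the Q-learner. The dimensions, initialisation and pairing scheme are illustrative assumptions, not the paper's exact architecture.

```python
import math
import random

class RelationalUnit:
    """Toy relational unit: a small MLP over a concatenated pair of object
    feature vectors, with a sigmoid output so the relation value lies in (0, 1)."""
    def __init__(self, obj_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(2 * obj_dim)]
                   for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 0.5) for _ in range(hidden)]

    def __call__(self, obj_a, obj_b):
        x = list(obj_a) + list(obj_b)
        # ReLU hidden layer
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        z = sum(w * hi for w, hi in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> unit interval

unit = RelationalUnit(obj_dim=3)
rel = unit((1.0, 0.0, 0.5), (0.0, 1.0, 0.2))

# Relational representation: the unit applied to all ordered object pairs;
# in the paper's setup this vector would be the input to the Q-learner.
objects = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.2), (0.3, 0.3, 0.9)]
relational_state = [unit(a, b) for a in objects for b in objects if a is not b]
```

Because each entry of `relational_state` is a bounded scalar tied to a specific object pair, inspecting which pairs produce high values is what makes the learned relations interpretable.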

Bayesian Models of Cognition in the Wisdom of the Crowd

There is disagreement in the literature as to the role of social learning and influence in collective accuracy. We believe that a missing piece in this line of research is a model of how humans update their belief distributions. Using a novel dataset (17K predictions from 2K people) collected in a series of live online experimental studies, we use models inspired by the cognitive science literature to investigate how individuals learn from each other and how they build belief distributions over the future prices of real assets (such as the S&P 500 and WTI oil prices). We find that individuals make strong distributional assumptions about the data used to update their beliefs, and that simple parametric learning models significantly outperform more sophisticated semi-empirical models. We then investigate what information individuals use to make such predictions, and we observe that individuals prefer to use the distribution of their peers' belief estimates rather than the past price distribution of the asset. Specifically, we find that using the social data available to users to construct the likelihood and prior distributions that predict users' posterior belief updates leads to smaller residuals than using the past price distribution. Finally, we create a metric that estimates how much each individual prefers to use peer belief distributions instead of the past price distribution to update their belief, and we use this metric to filter for more 'social-learning' individuals in the group, to see whether their aggregate estimate is more accurate than that of the whole 'crowd' in predicting the future asset price, their objective. We observe that filtering with our novel metric outperforms a previously proposed metric of resistance to social influence (which we also reproduce), which indicates that the ability to process social information is beneficial to group accuracy.
We extend our findings to a very different dataset, from a domain with no explicit exposure to social information, showing that our insights likely generalize to other collective learning tasks.
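The kind of belief update described above can be illustrated with the simplest parametric case: a conjugate normal-normal update in which the likelihood is built from peers' point estimates rather than past prices. The numbers, variances and asset level below are purely hypothetical; this is a sketch of one candidate learning model, not the specific models fit in the paper.

```python
import statistics

def posterior_normal(prior_mu, prior_var, observations, obs_var):
    """Conjugate normal-normal update: treat each observation (here, a
    peer's point estimate) as a noisy draw around the quantity of interest,
    and combine it with the individual's own prior belief."""
    n = len(observations)
    xbar = statistics.fmean(observations)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mu = post_var * (prior_mu / prior_var + n * xbar / obs_var)
    return post_mu, post_var

# Hypothetical numbers: an individual's prior about a future S&P 500 level,
# updated with five peers' estimates (the "social" likelihood).
mu, var = posterior_normal(prior_mu=2800.0, prior_var=100.0**2,
                           observations=[2850, 2870, 2840, 2860, 2855],
                           obs_var=50.0**2)
```

The posterior mean lands between the prior and the peer average, and the posterior variance shrinks below the prior variance; comparing residuals of such peer-based updates against the same update built from past prices is the comparison the abstract describes.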

There is evidence that the network structure between agents can influence their performance. In this work, 1) we investigate the emergence of network structure by observing in situ how agents choose which other agents to follow, 2) we characterize the cognitive limitations associated with such decision processes, and 3) we demonstrate how to engineer bots that could be used to overcome those limitations and improve individuals' performance. We do so using two very large and detailed social trading platform datasets (eToro and Darwinex) in which we can observe both the network structure and the performance of more than 100K individuals. We choose to study financial domains because of the clear performance signal (RoI) available at high frequency, and because of traders' need to find peers who are running novel and profitable strategies. We find that, as in social contexts, traders are limited by cognitive bounds (the Dunbar number) and follow only a limited number of other traders/strategies. Furthermore, their trading performance appears to depend on their exploration ability, i.e. how they dynamically manage this limitation by creating and destroying imitation relationships. Finally, we build synthetic strategies on the market (trading bots) that implement the same distributed Bayesian imitation model as human traders but have access to more relevant information, and we observe that if humans were to cooperate with the bots by accepting their recommendations of which agents to learn from, they would obtain higher profits. Our results show that even in domains where the timely processing of information using novel strategies is critical, cognitive limits exist, but they can be overcome with appropriate machine intervention.
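The bot-recommendation step can be sketched as a ranking problem under a cognitive budget: score each trader's RoI history and recommend only the top-k leaders to follow, with k playing the role of the Dunbar-like limit. The scoring rule here (mean RoI penalised by volatility) and the data are illustrative stand-ins, not the paper's distributed Bayesian imitation model.

```python
import statistics

def recommend_leaders(roi_history, k=5):
    """Toy recommender: rank traders by mean RoI penalised by volatility
    and return the top-k to follow. The cap k models the cognitive bound
    on how many traders/strategies a human can track at once."""
    def score(rois):
        return statistics.fmean(rois) - statistics.pstdev(rois)
    ranked = sorted(roi_history, key=lambda t: score(roi_history[t]), reverse=True)
    return ranked[:k]

# Hypothetical RoI histories for three traders on the platform.
roi_history = {
    "steady":   [0.02, 0.02, 0.03, 0.02],
    "volatile": [0.20, -0.15, 0.18, -0.12],
    "losing":   [-0.01, -0.02, -0.01, -0.03],
}
leaders = recommend_leaders(roi_history, k=2)
```

A bot with access to more relevant information (longer histories, more traders, faster updates) can evaluate this ranking over a far larger candidate pool than a human could, which is why accepting its recommendations can lift profits without raising the human's cognitive load.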