Interests

Working Experience

KCG Holdings, Inc/Virtu Financial.

Quant Intern

As a quant intern in the Client Market Making Group, I improved the speed and functionality of the group's internal client market making trading simulator, researched the simulator's price impact module, and partially implemented that module. The code was written in Python/Cython/C++.

Celect, Inc. Jun 2016–Aug 2016 Boston, MA

Data Scientist/Algorithm Developer Intern in a start-up based out of MIT.

Final Project for Performance Engineering of Software Systems (6.172): Leiserchess. The project started with a slow serial game-playing AI for Leiserchess, a chess-like game with lasers. We sped up this AI by redesigning the board and piece representation using bit hacks, implementing parallel Principal Variation Search (NegaScout) with various optimizations (synchronization with and without locks, coarsening, caching, etc.), and adding opening and endgame books. Our bot achieved roughly 23 million nodes per second (nps) and an average search depth of 8 using 8 cores on Microsoft Azure. In tournaments among 26 bots contributed by 26 teams, our bot ranked 6/26 in the first submission and 4/26 in the second. The code was written in C/C++.

Clustering U.S. Census Data. We found demographic clusters in census data by building a categorical mixture model and fitting it using the K-means, K-means++, and Expectation Maximization (EM) algorithms. We also trained a Gaussian Mixture Model (using EM) on the demographics of each U.S. state. Using these models, we developed a novel way of measuring the similarity of two states' populations. The code was written in Python (NumPy and pandas).

Serial Dynamic Memory Allocation. We implemented a binned free list structure to provide a memory management unit consisting of four major functions: init(), malloc(), free(), and realloc(). Because dynamic memory allocators involve a lot of untyped pointer manipulation, which makes them notoriously tricky to program correctly and efficiently, we also wrote a heap checker that scans the heap and checks it for consistency. Finally, we evaluated our memory allocator on space utilization and throughput against a number of memory allocation traces. Although our implementation consumes slightly more space than one based on a single free list, it is much faster. The code was written in C/C++.

Collision Detection: optimized a graphical screensaver program for multicore processors using the Cilk Plus parallel programming interface. The code was written in C/C++.

Bit Hacks: improved the performance of programs using the perf tool and experimented with word-level parallelism. The code was written in C/C++.