I graduated from IIT Kharagpur with a Master's degree majoring in Data Science and Machine Learning, under the supervision of Prof. Animesh Mukherjee. Before joining UC San Diego, I was a Research Engineer at Walmart Labs, building large-scale NLP and machine learning applications for eCommerce. I actively collaborated with CNeRG, the NLP and Complex Networks research group at IIT Kharagpur, on research in NLP and computational linguistics (CL).

I'm interested in building NLP models that tackle low-resource settings and achieve sample efficiency. I design, develop, and analyze generalized models for machine reading tasks that support domain and language adaptation. I also seek language representations that are domain-agnostic and can aid low-resource machine reading. My previous NLP research spans sequence labeling, sequence generation, and natural language parsers. I have also worked on statistical modeling, game theory, and machine learning applications.

Selected research projects are listed here. The complete list of my publications is available on my Google Scholar page.

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. We find that using a copying mechanism (Gu et al., 2016) yields a 7.69% increase in Character Recognition Rate (CRR) over the current SOTA model for monotone sequence-to-sequence tasks (Schnober et al., 2016).
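As a rough illustration of the metric (the paper's exact definition may differ), CRR can be derived from the character-level edit distance between the corrected output and the ground-truth transcription:

```python
def crr(predicted: str, reference: str) -> float:
    """Character Recognition Rate from Levenshtein distance (one common definition)."""
    m, n = len(predicted), len(reference)
    # Dynamic-programming table for edit distance.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == reference[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    # Fraction of reference characters correctly recovered.
    return max(0.0, 1.0 - dp[m][n] / max(n, 1))
```

A perfect correction gives a CRR of 1.0; each character-level error lowers it proportionally.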

This work presents the use of Adaptor Grammars, a non-parametric Bayesian approach, for learning (Probabilistic) Context-Free Grammar productions from data. We discuss the effect of using Adaptor Grammars for Sanskrit on word-level supervised tasks, such as compound type identification and identification of source and derived words from corpora for derivational nouns, as well as on sentence-level structured prediction.

We demonstrate the potential of recurrent neural architectures for product attribute extraction, improving overall F1 scores over the previous benchmarks (More et al., 2016) by at least 0.0391. This enabled Walmart's e-commerce platform to achieve significantly broader coverage of important product facets (attributes).
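Attribute extraction of this kind is typically framed as sequence labeling over product titles. As a minimal sketch (the tag names here are hypothetical, not the system's actual schema), decoding BIO-tagged model output into attribute spans might look like:

```python
def extract_attributes(tokens, tags):
    """Collect (attribute, span) pairs from BIO tags such as B-BRAND / I-BRAND."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):                     # a new attribute span begins
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)                      # continuation of the open span
        else:                                        # O tag or inconsistent I- tag
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:                                      # flush a span ending at EOS
        spans.append((label, " ".join(current)))
    return spans

# Hypothetical example: a tagged product title.
attrs = extract_attributes(
    ["Apple", "iPhone", "12", "64GB"],
    ["B-BRAND", "B-MODEL", "I-MODEL", "B-STORAGE"],
)
```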

This work was done at Walmart Labs and was followed by a US patent filing from Walmart.

We processed 18 million transactions covering 325,548 unique products from 1,551 categories to obtain vector representations that preserve product analogies. These representations proved effective in identifying substitute and complementary products.
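A minimal sketch of the general idea, under the assumption that each transaction (basket) serves as a co-occurrence context for the products it contains; here a simple co-occurrence matrix factorized with SVD stands in for the actual embedding method, and the basket data is invented:

```python
import numpy as np

# Hypothetical transaction data: each basket is a list of product IDs.
baskets = [["milk", "bread", "eggs"], ["milk", "cereal"],
           ["bread", "butter"], ["cereal", "milk", "bread"]]

# Build a symmetric product co-occurrence matrix from the baskets.
products = sorted({p for b in baskets for p in b})
idx = {p: i for i, p in enumerate(products)}
C = np.zeros((len(products), len(products)))
for b in baskets:
    for p in b:
        for q in b:
            if p != q:
                C[idx[p], idx[q]] += 1

# Factorize with truncated SVD to get dense product vectors.
U, S, _ = np.linalg.svd(C)
dim = 2
vectors = U[:, :dim] * S[:dim]          # one row per product

def most_similar(p, k=1):
    """Nearest products by cosine similarity, excluding p itself."""
    v = vectors[idx[p]]
    sims = vectors @ v / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(v) + 1e-9)
    sims[idx[p]] = -np.inf
    return [products[i] for i in np.argsort(-sims)[:k]]
```

Products that frequently share baskets end up close in the embedding space, which is the property that substitute/complement identification relies on.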

How similar are the dynamics of meme-based communities to those of text-based communities? We attempt to explain community dynamics by categorising each day based on temporal variations in user engagement.

We explore the domain adaptability of machine learning models on a sentiment classification task: we train a model on reviews from one domain and study how it performs on reviews from a different domain. Our goal is a generalized representation of the input text that captures the general sense of sentiment-discriminative words and expressions, irrespective of domain.
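The cross-domain evaluation setup can be sketched as follows, with a toy Naive Bayes classifier and invented review snippets standing in for the actual model and data; the point is the train-on-one-domain, test-on-another protocol, not the classifier:

```python
from collections import Counter
import math

# Hypothetical data: train on "books" reviews, evaluate on "electronics" reviews.
train = [("great plot and characters", 1), ("boring and dull story", 0),
         ("loved this great book", 1), ("terrible dull writing", 0)]
test = [("great battery and screen", 1), ("terrible dull display", 0)]

# Multinomial Naive Bayes with add-one smoothing.
word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())
vocab = set(word_counts[0]) | set(word_counts[1])

def predict(text):
    scores = {}
    for c in (0, 1):
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Cross-domain accuracy: trained on books, tested on electronics.
accuracy = sum(predict(t) == y for t, y in test) / len(test)
```

Here the model transfers because domain-general cues like "great" and "dull" carry the sentiment, which is exactly the kind of discriminative signal a generalized representation should preserve across domains.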