Musical Instrument Retrieval in Video

This project proposes a framework for automatically annotating musical instruments in video. Concretely, given a recording of a live performance or a music video, our system aims to output a bounding box for every musical instrument in each frame, together with the instrument's fine-grained product category. It does so in two stages: instruments are first localized with object detection, and each detected instance is then matched against an instrument dataset through similar-instance retrieval.
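As a rough illustration of this detect-then-retrieve design, the Python sketch below wires the two stages together. It assumes a detector that returns bounding boxes for a frame and an embedding function that maps a detected region to a feature vector. Both functions, the labeled gallery, and all category names are hypothetical stand-ins. Retrieval is shown as brute-force nearest-neighbour search by cosine similarity, which is one common choice and not necessarily the one used in this project.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[int, int, int, int]
Vector = List[float]


@dataclass
class Annotation:
    frame_index: int
    box: Box
    category: str  # fine-grained label of the best-matching gallery item
    score: float   # cosine similarity to that gallery item


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Vector, gallery: List[Tuple[str, Vector]]) -> Tuple[str, float]:
    """Stage 2: brute-force nearest neighbour over a non-empty labeled gallery."""
    return max(((label, cosine(query, emb)) for label, emb in gallery),
               key=lambda pair: pair[1])


def annotate_video(frames: Sequence[object],
                   detect: Callable[[object], List[Box]],
                   embed: Callable[[object, Box], Vector],
                   gallery: List[Tuple[str, Vector]]) -> List[Annotation]:
    """Run stage 1 (detection) and stage 2 (retrieval) on every frame."""
    annotations = []
    for i, frame in enumerate(frames):
        for box in detect(frame):
            category, score = retrieve(embed(frame, box), gallery)
            annotations.append(Annotation(i, box, category, score))
    return annotations


# --- Hypothetical toy stand-ins, not part of the described system ---
GALLERY = [("guitar/model-A", [1.0, 0.0]), ("keyboard/model-B", [0.0, 1.0])]
FRAMES = ["frame0", "frame1"]


def toy_detect(frame: str) -> List[Box]:
    # Pretend detector: one fixed box per frame.
    return [(10, 10, 50, 80)] if frame == "frame0" else [(0, 0, 40, 20)]


def toy_embed(frame: str, box: Box) -> Vector:
    # Pretend embedding: (height, width) of the box, so tall boxes look "guitar-like".
    x0, y0, x1, y1 = box
    return [float(y1 - y0), float(x1 - x0)]


if __name__ == "__main__":
    for ann in annotate_video(FRAMES, toy_detect, toy_embed, GALLERY):
        print(ann)
```

In a real pipeline, `toy_detect` would be replaced by a trained instrument detector and `toy_embed` by a learned image-embedding model applied to each cropped box. For a large instrument dataset, the linear scan in `retrieve` would typically give way to an approximate nearest-neighbour index.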