We have collected a database of musical features from radio broadcasts (N > 100,000). The database poses a number of hard modeling challenges, including segmentation problems and missing metadata. We describe our efforts toward cleaning the database using signal processing and machine learning tools.