The (medical) video data I have is unlike ImageNet data, so I think it is best to train the weights from scratch.

Most medical imaging data, funnily enough, actually does benefit from fine-tuning from ImageNet weights. The early layers of a pretrained network learn to detect generic features such as edges, simple geometric shapes, and textures, and these are useful for most types of image data.