MixSub, a text-based video remixer


Videos, as compared to text, have always been known to be more engaging and more easily consumable by a large section of society. With internet speeds increasing around the world, the ecosystem is ripe to exploit the real potential of videos as a tool for expression. I am a big fan of Jon Stewart and his style of juxtaposing several videos to expose the lies and hypocrisy in media and politics. It is highly entertaining as well as enlightening. I tried to do the same thing for the 2016 election; the following is an example of what I made:

As I was making this and several other videos, I realized that the video editing tools I used, Adobe Premiere Pro in the above case, were making my job really hard. For example, in the above video I had to make several cuts of people saying short sentences. It took me a long time just to cut the videos at the right moments. This was very frustrating: why can’t I just copy and paste the sentence from the video and be done with it? Another problem was having to watch many videos in full just to find out whether they contained anything I could use. Sifting through the subtitles would be so much quicker. So I decided to create a video remixer that works on subtitles.

Initial Design

Here is a quick initial sketch of what I wanted to create:

The idea was that the user would:

Bring in any video(s) he wants from YouTube.

Copy words or sentences from the subtitle file of the video, made available below the video itself.

Paste them into a text-editor-like panel where he can sequence the clips and generate a video at the end.

Proof of concept

After some initial experimentation with the YouTube Data API, I knew that I would have to download the videos in order to be able to manipulate them. So I decided to use Node.js to handle the server side. Here is how the technical process would work for the POC:

The user would enter a YouTube URL.

I will download the video and subtitles from YouTube using the npm package youtubedl. I need the subtitles in SRT format, but YouTube gives the subtitles in VTT. So I’ll use the npm package fluent-ffmpeg to convert the subtitle files from VTT to SRT.

The interface will give a preview window for viewing the downloaded video.

Here is what an srt file looks like:

Therefore I know the start and end times of sentences rather than of individual words, but that’s good enough for the purposes of the POC. The user will get a space to view and select these sentences to include in the final video.
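To make the sentence-level timing concrete, here is a minimal sketch of turning SRT text into cues with start and end times in seconds. It is not the actual MixSub code: the sample subtitle text and the function names parseSrt and srtTimeToSeconds are made up for illustration, and it assumes a well-formed file.

```javascript
// Illustrative sketch only: the sample text and all names here are hypothetical.

// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Split SRT text into cues of the form { index, start, end, text }.
// Each cue is: a counter line, a "start --> end" line, then the subtitle text.
function parseSrt(srtText) {
  return srtText
    .replace(/\r\n/g, '\n')
    .trim()
    .split(/\n{2,}/)
    .filter((block) => block.trim() !== '')
    .map((block) => {
      const lines = block.split('\n');
      const [startStamp, endStamp] = lines[1].split(' --> ');
      return {
        index: Number(lines[0]),
        start: srtTimeToSeconds(startStamp),
        end: srtTimeToSeconds(endStamp),
        text: lines.slice(2).join(' '), // join multi-line subtitles
      };
    });
}

// Made-up sample, not taken from any real video.
const sampleSrt = `1
00:00:01,000 --> 00:00:03,500
This is the first sentence.

2
00:00:03,500 --> 00:00:06,250
And this is
the second one.`;

const cues = parseSrt(sampleSrt);
console.log(cues);
```

Each cue carries one subtitle entry and its time range, which is why selection in the POC happens per sentence rather than per word.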

Each sentence selected by the user will be stored in the backend as a Clip object, which would contain information such as the video name, start time, and end time.

The user can do this for multiple videos, i.e. select multiple clips from multiple videos. He can then rearrange the order of these clips and click on “generate video”.

I will then use the information in the Clip objects to combine the clips into one video using fluent-ffmpeg.
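The last two steps can be sketched as follows. This is only an illustration under stated assumptions: the Clip fields (videoName, start, end) and the file names are hypothetical, and the sketch builds plain ffmpeg command-line arguments instead of using the fluent-ffmpeg calls the project relies on. It cuts each clip into its own segment, then joins the segments with ffmpeg's concat demuxer.

```javascript
// Sketch only: Clip fields and file names are hypothetical, and these are raw
// ffmpeg CLI arguments, not the fluent-ffmpeg API used in the actual project.

// Arguments that cut one clip out of its source video (re-encoding by default).
function cutArgs(clip, outputPath) {
  const duration = clip.end - clip.start;
  return [
    '-y',                       // overwrite the output file if it exists
    '-ss', String(clip.start),  // seek to the clip start, in seconds
    '-i', clip.videoName,       // source video
    '-t', String(duration),     // keep only the clip's duration
    outputPath,
  ];
}

// Contents of the list file read by ffmpeg's concat demuxer.
function concatListText(segmentPaths) {
  return segmentPaths.map((p) => `file '${p}'`).join('\n') + '\n';
}

// Arguments that join the listed segments into one output video.
function concatArgs(listPath, outputPath) {
  return ['-y', '-f', 'concat', '-safe', '0', '-i', listPath, '-c', 'copy', outputPath];
}

// Clips in the order the user arranged them (made-up values).
const clips = [
  { videoName: 'video1.mp4', start: 12.5, end: 15 },
  { videoName: 'video2.mp4', start: 3, end: 7.25 },
];

const segments = clips.map((_, i) => `segment-${i}.mp4`);
const cutCommands = clips.map((clip, i) => cutArgs(clip, segments[i]));

console.log(cutCommands);
console.log(concatListText(segments));
console.log(concatArgs('list.txt', 'remix.mp4'));
```

Each argument array could be passed to Node's child_process.execFile('ffmpeg', args). Joining with -c copy only works when all segments share the same codec and resolution, so clips from different sources may first need re-encoding to a common format.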

I was able to do all of the above and the editor worked great. Here is a video of how it is working now:

User testing with POC

Next Steps:

The POC was successful; now it’s time to convert this into a stable, well-designed product that I can put out for everyone to use. There are many next steps, since the scope of features that could be built on top of this is enormous.

For Design

Come up with an MVP feature list. The project has been tech-heavy up to now. But now that the tech POC is done, it’s time to go back to the sketchbook and bring the user back into focus.

Solve the user on-boarding issue. It is difficult for a first-time user to understand how to navigate the interface.

Better design for navigating the subtitle file and for adding and deleting clips. A lot of design changes will be required when the ability to do word-level selection is added.

For Tech

Design will bring in a lot of tech requirements. Apart from those, the performance of the system can be improved by cutting down on several processes. That needs to be investigated.

That’s it for the POC. Excited to make it into a fully functional product!