Philip Hodgetts’ unique blend of business and production knowledge gives him insight into the current state of the industry, and a remarkably accurate look forward. Here he shares his thinking, and points to articles of interest from other sites, with context as to why they're interesting.

Maybe 10 Years is Enough for Final Cut Pro X

On the night of the Supermeet 2011 Final Cut Pro X preview I was told that this was the “foundation for the next 10 years.” Well, as of last week, seven of those ten have elapsed. I do not, for one minute, think that Apple intended to convey a ten-year limit to Final Cut Pro X’s ongoing development, but maybe it’s smart to plan obsolescence: to limit the time an app continues to be developed before its suitability for the task is re-evaluated.

Final Cut Pro X was the outcome of a multi-year process at Apple to “reimagine” what an NLE should be in a post-tape, post-film world. I understand it was not smooth sailing during the process. Ultimately, though, some very clever thinking went into the design (as you will see in Bradley Olsen’s Off the Tracks documentary, premiering in LA this week).

In the seven years since FCP X was released we’ve had another sea change that will be as important as the digital transition away from tape and celluloid: Artificial Intelligence, or more accurately, Machine Learning.

Adobe have already built some of Adobe Sensei (their branding for applied Machine Learning across their ecosystem) into recent releases of Premiere Pro CC for seemingly mundane things. While we’ve been trying to work out whether or not “AI” is going to replace editors, Adobe have shown how it can help editors with everyday tasks, like color or audio matching.

Which raises the question: how much can be integrated before the app falls apart? I don’t mean Premiere Pro CC specifically, but in general: how much intelligent assistance can be built into an NLE in a way that integrates with existing structures?

Now, FCP X isn’t badly positioned for smart metadata extraction. Thanks to the inclusion of Content Auto Analysis in the first release, the basic plumbing for analyzing content and returning keyword ranges is there, and it could – I think – quite easily be repurposed for smarter Content Auto Analysis driven by Machine Learning models via CoreML on the GPU. For a start, it would be fast enough to be useful!
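To make the “plumbing” concrete: the essential step is collapsing per-frame ML classifications into the keyword *ranges* FCP X already understands. A minimal sketch (the `classify`-style labels here are invented for illustration; a real implementation would get them from a CoreML model):

```python
# Hypothetical sketch: turn per-frame ML classification labels into
# FCP X-style keyword ranges. The labels below are canned examples,
# standing in for the output of a CoreML image-classification model.

def frames_to_keyword_ranges(labels):
    """Collapse a per-frame label list into (keyword, start, end) ranges
    (end exclusive) - the shape keyword ranges take in an NLE."""
    ranges = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current range when the label changes or the clip ends.
        if i == len(labels) or labels[i] != labels[start]:
            ranges.append((labels[start], start, i))
            start = i
    return ranges

labels = ["interview", "interview", "b-roll", "b-roll", "b-roll", "interview"]
print(frames_to_keyword_ranges(labels))
# → [('interview', 0, 2), ('b-roll', 2, 5), ('interview', 5, 6)]
```

The point is that once analysis returns ranges rather than single tags, any smarter model can be dropped in behind the same interface.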

I also have no doubt that engineers at Adobe, Blackmagic Design and even Avid could make things work, but would it be the right approach?

As I get an increasingly clear vision of how Machine Learning will first be integrated, I see it as being more of a human/machine partnership. Trained machine models embody a certain (limited) expertise. Adobe’s color matching toolset appears way better than anything I would be capable of, but probably not better than what a professional colorist could do.

But if you can get 95% of the way with a button click (and a machine trained for the task), how many people are going to do it manually? (I will add that Adobe provides full manual override of the results.) If I can automate ducking music under dialog, why would I bother doing it manually?
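Music ducking is a good example of how simple the automatable core can be. A toy sketch, assuming dialog intervals have already been detected (the -12 dB duck amount is an arbitrary choice, not anyone’s product behavior):

```python
# Minimal sketch of automated music ducking: given the intervals
# (in seconds) where dialog was detected, compute a per-second gain
# envelope for the music track.

def ducking_envelope(duration_s, dialog_intervals, duck_db=-12.0):
    """Return a per-second music gain in dB: 0.0 dB normally,
    duck_db wherever any dialog interval covers that second."""
    gains = []
    for t in range(duration_s):
        in_dialog = any(start <= t < end for start, end in dialog_intervals)
        gains.append(duck_db if in_dialog else 0.0)
    return gains

print(ducking_envelope(8, [(2, 5)]))
# → [0.0, 0.0, -12.0, -12.0, -12.0, 0.0, 0.0, 0.0]
```

A shipping tool would add ramps and smarter dialog detection, but the editor-facing result is exactly this: a gain curve produced from one request.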

Take this a little further. Extrapolating from some of the tools working in a lab setting now, I can see a future tool that uses computer vision to recognize the content of shots, identifying people and tagging the b-roll; that has speech-to-text integrated, with full keyword, concept, product, and entity identification; and where most of the more technical processes have been automated under an intelligent assistant built into the app (and likely built on OS foundation technology).

With this (as yet imaginary) toolset the editor could (literally) ask for all shots where person x spoke about subject D in a timeline. You would be able to ask the assistant for other shots containing the person, concept, or word you’re looking for.

You would be able to ask for matching b-roll that has open space to the right (because you want to put something graphic there).
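Under the hood, both of those requests reduce to queries over shot metadata. A sketch of the idea, with every shot id, person, topic, and tag invented purely for illustration:

```python
# Hypothetical shot-metadata query of the kind the assistant would run.
# Each shot carries people identified by vision, topics from speech-to-text,
# and composition tags - all names here are made up for the example.

shots = [
    {"id": "A001", "people": {"maria"}, "topics": {"budget"},   "tags": set()},
    {"id": "A002", "people": {"maria"}, "topics": {"schedule"}, "tags": set()},
    {"id": "B001", "people": set(),     "topics": {"budget"},   "tags": {"open-space-right"}},
]

def find_shots(shots, person=None, topic=None, tag=None):
    """Return ids of shots matching every criterion that was supplied."""
    matches = []
    for shot in shots:
        if person and person not in shot["people"]:
            continue
        if topic and topic not in shot["topics"]:
            continue
        if tag and tag not in shot["tags"]:
            continue
        matches.append(shot["id"])
    return matches

# "All shots where this person spoke about this subject":
print(find_shots(shots, person="maria", topic="budget"))          # → ['A001']
# "Matching b-roll with open space to the right":
print(find_shots(shots, topic="budget", tag="open-space-right"))  # → ['B001']
```

The hard part is populating that metadata automatically; once it exists, the natural-language request is just a friendly front end on a filter like this.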

With one click or request, color would be matched across your timeline to the color setting you chose (or copied from a stock image, movie, etc.). Audio would similarly have levels and tonality matched on request.
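Even a crude version of one-click color matching is easy to sketch. This toy example only shifts each channel’s mean toward a reference image; real tools match full distributions and much more, so treat this strictly as an illustration of the principle:

```python
# Toy sketch of "match color to a reference": offset each RGB channel of
# the target shot so its mean matches the reference image's mean.
# Real color matching is far more sophisticated; this shows the concept.

def match_channel_means(target, reference):
    """target/reference: lists of (r, g, b) pixel tuples.
    Returns target pixels offset so per-channel means match reference,
    clamped to the 0-255 range."""
    def means(pixels):
        n = len(pixels)
        return [sum(p[c] for p in pixels) / n for c in range(3)]
    tm, rm = means(target), means(reference)
    offset = [rm[c] - tm[c] for c in range(3)]
    return [tuple(min(255, max(0, p[c] + offset[c])) for c in range(3))
            for p in target]

ref = [(100, 100, 100), (120, 120, 120)]  # reference means: 110, 110, 110
tgt = [(50, 60, 70), (70, 80, 90)]        # target means: 60, 70, 80
print(match_channel_means(tgt, ref))
# → [(100.0, 100.0, 100.0), (120.0, 120.0, 120.0)]
```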

There will be so much more that our built-in assistant editor will be able to help with, leaving the editor free to focus on the creative work. This will be a huge assist to most editors, and it will be completely the opposite of what “Hollywood Professional Editors” want, and I’m OK with that. It’s an important niche market that requires very specialized tools; Avid is the entrenched provider and will be for the foreseeable future. It’s not where the millions of users of Premiere Pro CC and FCP X are.

The millions of users not working in “Hollywood” will be all over this next generation of intelligently assisted professional editing tools. (Of course I expect their consumer counterparts to go even further with the intelligent assistant concept into smart templates.)

But I don’t think any current NLE is ready to have all that retrofitted. So maybe ten years is enough time for an app before it, too, needs to be reimagined for a new generation. (Final Cut Pro classic went 12 years from launch to death notice.) Our industry is evolving ever faster. Why shouldn’t our tools?


4 comments

“With this (as yet imaginary) toolset the editor could (literally) ask for all shots where person x spoke about subject D in a timeline”

100% demonstrable TODAY. Not just in a timeline, but across an entire storage topology. Not just Person X, but Person X, while in Location Z, saying “this particular sentence” somewhere close to “this other sentence”.

Matching B-roll is a little farther out, but people are working HARD on that one right now.

These tools are generally too expensive today to build into an individual NLE, but are certainly available for purchase NOW from your friendly local reseller. They are implemented at a higher level, inside of the big asset management systems that join multiple editors together into more cohesive teams.

Where this becomes more of an editor’s tool is in a SaaS model. We will see rentable time on cloud servers (actually already available as well) to scan and return massive troves of metadata that local editors will use to increase the throughput and quality of their productions (whether the rough cut is put together by an algorithm or not).

This situation pretty well mirrors an earlier transition for production professionals… the introduction of automatic circuitry in video cameras. Prior to a certain time — a very similar threshold in technology development — virtually every parameter of a professional-grade camera was adjusted manually, from the overlay alignment of the three color channels to the inherent contrast ratio (“gamma”) of the medium itself. Terms like white and black clippers, beam voltage adjustments for white “blooming”, and “image enhancer” setup to add apparent sharpness were routine on-set tweaks on every production. As automatics debuted, similar questions were asked: robbed of the rigors of “proper” engineering skills, would this new level of automatic idiot-proofing serve to dumb down a generation of camera operators, DPs and directors?

I suspect the answers to that old controversy and this new one are similar: yes, you’ll successfully complete your projects if you rely solely on automatic adjustments and AI-assisted editing. But if you learned those long-form, old-school skills, you’ll possess a depth of understanding and intuition that will carry you well beyond those who didn’t.