Poor performance: where's the bottleneck, and what should I upgrade?

I'm running Hitfilm Express 2017 on a Lenovo 320s laptop with 8 GB RAM, an i5-8250U and a GeForce 920MX. I've undervolted the CPU and raised the maximum power limit (watts) in an attempt to make it run at higher clock speeds. The 8250U's base clock is 1.6 GHz and its boost clock is 3.4 GHz. But I'm getting poor performance, and I'm not able to utilize the CPU and GPU simultaneously.

A few examples. All drivers are latest version.

1. A 52-second video. Two 1080p/30fps composite clips reduced to 50%, covering 50% of the screen and running in parallel. The rest of the screen is static text on a white background.
1.1 GPU 920MX: time 11m30s, CPU at 1.6 GHz almost the entire time.
1.2 GPU UHD 620: time 7m30s, CPU at 2.8-3.2 GHz most of the time.

2. A 5m50s film. All video is 1080p/60fps in composite shots. Quite a bit of static images and text, a 15-second sequence of 3 video clips in parallel, and 3 clips of text tracking a point.
2.1 GPU 920MX: time ~30m, CPU at 1.6 GHz almost all the time.
2.2 GPU UHD 620: time ~1h30m, CPU at about 3.0 GHz and 25% load.

When exporting simple clips, like a static background with text and raw 1080p/60fps clips, the UHD 620 performs better than the 920MX. I'm assuming it's because the overall system temperature is lower with the UHD 620, which gives the CPU the thermal headroom to run at higher clock speeds. I've undervolted the CPU by 0.105 V; that helps with the UHD 620, but it makes no difference with the 920MX. Is there any way I can optimize the hardware on my laptop so it runs above base CPU speed while also taking advantage of the dedicated GPU and its extra VRAM?

If not, then I plan to get a desktop/tower PC, and I guess I need some advice on how to balance CPU and GPU. My budget limits me to an AMD Ryzen 5 1600 or an Intel Core i5-8400.

Comments

You're focusing on the CPU and GPU and ignoring multiple other bottleneck factors.

What's your storage, SSD or HDD? This is important, especially if you're working in 4k. If you're working with 4k footage on an HDD then that's a bottleneck. Are you on a single drive? Reading multiple video files and writing to the same drive at once is another bottleneck.

What codec and container is your original footage in? If you're bringing in MP4 footage from a DSLR, screen recorder or phone, your source footage is in a format that performs poorly for editing. You'd get performance improvements from the CPU and GPU by transcoding your footage to ProRes, Cineform or DNxHD, but those files will be significantly larger, so see the paragraph above.

What are your export settings? The smaller you cram things (lower bitrate) the longer it takes to render.

Additionally, and bluntly, you're running a terrible graphics card. The 920m is basically at the same level as the Intel HD 620, and the HD 620 is noticeably slower than the UHD 620. Basically, the integrated GPU in your CPU is better than the discrete card. In fact, the 920m is built using the technology of the 600 series. Other than having dedicated VRAM, it's pretty useless. (Future tip: any Nvidia card below the xx60 of its generation is a waste of money.)

Now remember that video files are compressed, and Hitfilm (like any NLE) has to decompress the files to work with them. Uncompressed 1080p30 footage is 178 MEGABYTES of data per second of video. Double that for 60p. Multiply by every layer (after all, it's being rasterized to RAW pixels first). Suddenly your "simple" project of three video files in parallel with animated text, light flares and transitions is using a lot of resources. (Speaking of transitions, those cause slowdowns in any NLE, as your own tests indicate: read your own description of tests 3 and 4 and you'll see that the last 40% slows down because that's where your transitions, animation and composites happen.)
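
To make the "178 MEGABYTES" figure concrete, here is a back-of-envelope sketch. It assumes 8-bit RGB at 3 bytes per pixel with no alpha; Hitfilm's internal pixel format isn't stated here and may be larger (RGBA or float), which would only push the numbers up. The 178 figure works out if "megabyte" means 2^20 bytes (MiB); in decimal megabytes it is about 187 MB/s.

```python
# Back-of-envelope: raw pixel data a decoded timeline has to move per second.
# Assumption: 8-bit RGB, 3 bytes per pixel, no alpha channel.

def uncompressed_rate_mib_s(width: int, height: int, fps: float,
                            bytes_per_pixel: int = 3, layers: int = 1) -> float:
    """Raw data rate in MiB/s for `layers` simultaneous decoded streams."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * layers / (1024 ** 2)

for fps, layers in [(30, 1), (60, 1), (60, 3)]:
    rate = uncompressed_rate_mib_s(1920, 1080, fps, layers=layers)
    print(f"1080p{fps}, {layers} layer(s): {rate:.0f} MiB/s")
# 1080p30, 1 layer(s): 178 MiB/s
# 1080p60, 1 layer(s): 356 MiB/s
# 1080p60, 3 layer(s): 1068 MiB/s
```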

So, looking outside CPU/GPU for a moment, you can speed up performance and rendering by transcoding footage to one of the codecs mentioned above, keeping ingested files on one SSD and rendering to a second SSD.

Hitfilm cannot use more than one GPU at a time, so that part of your question is answered. You can't.

So. Your CPU is current generation, and upgrading it to an i7 is not going to yield big performance increases. CPUs just aren't getting THAT much faster (an i7-8550U is about 5% faster). The GPU is another story. My laptop's 980m is about 700% faster than your 920m. An Nvidia 1070 is about twice as fast as my 980m. Upgrading from that 920m to a 1070 is nearly a 15-fold increase in GPU power.
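
The "nearly 15-fold" figure comes from chaining two relative speedups, which multiply rather than add. A small sketch, using the commenter's rough percentages (not benchmark results) and reading "X% faster" as a throughput ratio of 1 + X/100:

```python
# Relative speedups compound multiplicatively.
# The percentages are rough estimates from the post, not measured benchmarks.

def ratio_from_percent_faster(percent: float) -> float:
    return 1.0 + percent / 100.0

r_980m_vs_920m = ratio_from_percent_faster(700)  # "about 700% faster" -> 8x
r_1070_vs_980m = 2.0                             # "about twice as fast"

print(f"1070 vs 920m: {r_980m_vs_920m * r_1070_vs_980m:.0f}x")  # 16x
# If "700% faster" was meant loosely as "7x as fast", the chain gives 14x;
# either reading lands near the "nearly 15-fold" figure.
print(f"alternative reading: {7.0 * r_1070_vs_980m:.0f}x")       # 14x
```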

So, here's more or less what Hitfilm does:

CPU reads files from drive. CPU decodes files. CPU passes all graphics data to GPU. GPU renders frame and passes back to CPU. CPU encodes frame and writes to drive. Repeat. This means you'll never see your CPU and GPU pegged to the wall at the same time as one is always idle.
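
A toy model of that strictly sequential loop shows why the two utilization figures can't both be high. The millisecond costs below are hypothetical placeholders, not Hitfilm measurements, and a real engine may overlap some of this work; the point is only that in a fully serial loop each component's busy fraction is its share of the per-frame time.

```python
# Toy model of the sequential per-frame loop:
# CPU (read + decode) -> GPU (render) -> CPU (encode + write), then the next frame.
# Timings are hypothetical placeholders; nothing overlaps in this model.
from dataclasses import dataclass

@dataclass
class FrameCost:
    cpu_decode_ms: float
    gpu_render_ms: float
    cpu_encode_ms: float

def serial_pipeline_stats(cost: FrameCost) -> dict:
    cpu_ms = cost.cpu_decode_ms + cost.cpu_encode_ms
    frame_ms = cpu_ms + cost.gpu_render_ms  # stages run one after another
    return {
        "fps": 1000.0 / frame_ms,
        "cpu_busy": cpu_ms / frame_ms,              # fraction of wall time the CPU stage works
        "gpu_busy": cost.gpu_render_ms / frame_ms,  # fraction of wall time the GPU works
    }

stats = serial_pipeline_stats(FrameCost(cpu_decode_ms=20, gpu_render_ms=40, cpu_encode_ms=20))
print({k: round(v, 2) for k, v in stats.items()})
# {'fps': 12.5, 'cpu_busy': 0.5, 'gpu_busy': 0.5}
```

In this model the two busy fractions always sum to 1, so a faster GPU shrinks only its own share of each frame, and the CPU share becomes the limit.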

Finally, video takes a lot of resources. Real-time rendering pretty much never happens. I've been doing this for over 20 years on machines ranging from no-GPU laptops to workstations with $30k of GPU, and once compositing starts, nothing renders in real time. Given your hardware, your render times don't seem excessively slow. Again, you just don't have good GPUs. Either one. I'd argue the GPU is usually the most important component. I actually put the CPU third, behind storage. My general advice for purchasing a PC is to get the most GPU and SSD possible. If the choice is between an i7 with an HDD and an i3 with an SSD, get the i3 and the SSD. GPU comes first, however. Otherwise, 8GB of RAM is the minimum. More is better, but RAM is cheap and can be upgraded later.

I'm tagging @NormanPCN and @Aladdin4d. They're both stronger on tech than I am, and if they contradict me, they're likely correct.

Hi, thanks for your reply. I'll read it thoroughly in a few hours. I just wanted to mention a couple of things: it's a SATA SSD, and I mostly make MP4 1080p for YouTube, though that may change. Most footage is from a GoPro 5 Black in HD.

Thanks again for your detailed reply. Yeah, the transitions, color corrections etc. are definitely what's slowing down the export: 6 mins for the first 60-65% and 25 mins for the rest. This is of course what I need to improve on.

I assumed disk performance was not an issue, since disk utilization doesn't exceed 5% according to Task Manager (Performance tab). But I guess it makes sense to have one disk for reading and one for writing, and watching disk utilization might not be a good indicator of this kind of bottleneck. My SSD is rated at an internal data rate of 530 MBps (read) and 500 MBps (write): https://www.cdw.com/shop/products/Micron-1100-solid-state-drive-256-GB-SATA-6Gb-s/4220665.aspx

My GPU is the 920MX; it scores 40% higher than the 920m according to cpubenchmark and G3D Mark. The UHD 620 scores about 9% higher than the 920MX, though. Test 2 described in my initial post, which I've run a few times, has led me to believe that the 920MX can seriously outperform the UHD 620, possibly due to the VRAM. But further testing does not confirm that performance advantage, and neither do the G3D Mark scores.

It's really interesting what you're saying about CPU vs GPU. I've read about which tasks are performed by which processing unit, but I have no idea what workload those tasks represent. After searching forums and YouTube for a few days, I came to the conclusion that the money is best spent on the CPU. Someone claimed that for video editing in general, I could go with a 6-core AMD Ryzen with SMT (12 threads) and Intel HD/UHD graphics. But I guess that depends on which software I'm running, so I guess Hitfilm will benefit from the GPU. And they don't use CUDA, so Nvidia hasn't got an edge over AMD...?

I'm seriously considering building a desktop. As I've said, I've been told CPU and cores are good, so I've considered the Ryzen 5 1600 (6 cores, 12 threads). But I could just as well find a cheaper CPU and a better GPU. Which will be better: more cores/threads or a faster clock speed? I could go with an i3-8100 (3.6 GHz, 4 cores/4 threads), an AMD Ryzen 3 1200 (3.2 GHz, 4 cores/4 threads) or an AMD Ryzen 5 1400 (3.2 GHz, 4 cores/8 threads).

I was planning on getting a Samsung 960 EVO 250GB M.2 PCIe SSD with a max read of 3200 MB/s and a max write of 1500 MB/s. But might I be better off using two disks to separate the read and write tasks?

I was initially considering the GTX 1050, both the 2GB and the Ti. But I guess the money saved on the CPU should push me towards the GTX 1060 or Radeon RX 570.

I/O bottlenecks are generally easy to figure. Windows resource monitor is probably good enough. Even simple math is probably good enough.

The system disk cache can cloud the issue, but in a good way, by improving I/O throughput in short looping situations. I once did a test with UHD 30p Cineform, compositing 4 independent streams, each from a separate file, to demonstrate the good performance of Hitfilm 2017+ with Cineform. Those 4 streams are beyond the I/O throughput capability of my 7200rpm drive, but once cached, that data is effectively coming from a "RAM" drive.

How do you speed up Hitfilm? A faster CPU, especially in clock rate, and a faster GPU. Basically, faster everything. Don't underestimate the need for a fast CPU. The CPU gets the timeline moving. If the timeline cannot make speed, then no amount of GPU power is going to help. The CPU also does file encoding (export).

Looking at utilization numbers? Well, Hitfilm has a hard time pegging those numbers in many circumstances, especially in export and ram preview. It is what it is, regardless of what reasons we might want to speculate on.

You can do simple tests on existing hardware. Don't bother with RAM preview and export for tests.

Take your representative media and make a few copies of a given example media file. Composite three or four of them, staggered so that each layer starts a little later than the one below it.

The stagger gives clear incremental steps of one item of extra work at a time. If your I/O bottlenecks at 2/3/4 composite streams, you will see it. If your CPU bottlenecks, you will see it. In this situation Hitfilm can peg the CPU to the wall if you throw enough work/layers at it. What you are looking for is when playback starts to stutter. That is the point where you have hit a limit of some kind. If you don't commonly composite that many streams, then don't bother with the test, except to understand the limits of your machine+Hitfilm+media combo.

GPU becomes a real problem to test. There is a HUGE variation in the compute overhead/speed of different effects. Simple grading tools like Curves, Wheels, HSL, Brightness, Contrast and such are ultra fast. Even a very basic GPU can do those on 4Kp30 work. Then you have something like Glow. Ouch! That thing really works your GPU hard.

For this test, just use a single media item and pile on effects until playback stutters. I've never bothered testing simple effects. I have tested just how much my machine can do with Glow, in other words, how many Glows can be added before the system is overloaded. I only did this out of curiosity and for a relative comparison with another NLE. I haven't set up a test for anything else.

I recently created the Spooky Eyeball from a @Treim23 tutorial. That thing could not play back in real time at 1080p30 on my GTX 1080 GPU. I was disappointed and curious about what was up. It turned out to be the lightning & electricity effect. I had a *lot* of branches in the lightning. Cutting the branch count back a little got the timeline up to real-time speed.

Some effects are CPU-bound and are performance killers with Long-GOP media, including most effects in the Temporal group.

The point of all this is that if you are looking for a bottleneck, you have to construct a test that isolates a specific possible bottleneck while keeping other factors free-flowing. If you just measure the export performance of some project setup, then all you can conclude is that the ProjectX+Hitfilm+machine combo produces some total time. Nothing can be concluded about the individual elements going on inside that black box.

Use playback stutter as your test signal: playback is smooth until you add one more thing, and then it breaks down. Timeline playback is clamped to the timeline frame rate. 24 fps is easier than 30 fps, which is easier than 60 fps. One second of 60p video needs twice the compute work of 30p, and it is still one second. So test at the timeline frame rate you normally use. RAM preview and export are not frame-rate clamped, but they have their own things going on that can and do limit ultimate speed. Of course, export has the obvious additional work of compressing and saving the video file.
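
The frame-rate point can be put as a per-frame time budget: at 60p there are twice as many frames per second, and each one has half the time to be decoded and composited before playback stutters. A minimal sketch of that arithmetic:

```python
# Per-frame time budget for real-time timeline playback at a given frame rate.
# Each frame must be decoded, composited and displayed within this budget,
# otherwise playback stutters.

def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

for fps in (24, 30, 60):
    print(f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame")
# 24 fps: 41.7 ms per frame
# 30 fps: 33.3 ms per frame
# 60 fps: 16.7 ms per frame
```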

All this testing cannot really tell you, buy this CPU or GPU. It can give you a better idea about your current hardware capabilities and how close your current projects are pushing it.

I'm not sure I've been coherent here. I've rambled a bit (a lot). I'll shut up now.

Thanks for another detailed reply. Using playback to determine performance is a great tip, and less time-consuming. I was hoping I'd be able to get a bit more performance from my laptop, but it really is pointless. So now I'm focusing on what parts to get for a desktop. I think this is a decent suggestion:

I will consider spending a tiny bit more on either a CPU:
- Ryzen 5 1600 (more cores/threads, lower GHz)
- Intel i3-8350K (same GHz, same cores/threads)
- Intel i5-8400 (more GHz, fewer cores/threads)
- Intel i3-8100 (same GHz, fewer cores, no HT)
or a GPU:
- GTX 1060 3GB
whichever will make the biggest (theoretical) improvement. I guess I will start editing higher-resolution clips sooner or later, but I don't expect to do 4K any time soon. I don't think I'll be adding 3D elements any time soon, but I will be using color correction, light adjustments, etc. And I will occasionally run several clips side by side.

I found this useful when I did my last build. You can do a lot of what-ifs and price comparisons before you spend anything. It really helps you see what you can get for your budget. Plus you can look at builds that other people have done, read reviews on parts, etc.

But don't scrimp on the CPU, GPU, amount of RAM or the power supply. Make sure you have plenty of fans and airflow. What you are building is a video editing workstation, not a gaming system (even though HF uses gaming features of the GPU to do most of its business).

Good (and I mean good, not great) video edit stations are not cheap. Buy the meanest screaminest components you can buy.

A difficult decision, this. The i3-8100 can of course be upgraded later on, and both the i3-8350K and i5-8400 are cheaper than even the GTX 1060.

But if more cores are better than fewer cores with more MHz, then the Ryzen 5 1600 (6c/12t) should be ideal. A similarly specced computer with a Ryzen 5 1600 and a GTX 1060 will be 6-7% more expensive than the i3-8100/GTX 1070 deal. The Ryzen build will have a bigger/faster Samsung 250 SSD, though.

How many cores do you need? Enough. Yes, that is a bit of a crappy answer. Asking how many cores you need is a vague question, so a vague answer is to be expected.

Since Hitfilm does almost all the heavy effect work on the GPU, the CPU's job comes down to media file decoding and file encoding (export). Having lots of cores does not mean Hitfilm, or any app, will just use them all. There are limits to doing real work in the real world. Just throwing extra threads at something can actually cost performance. Not many things are a win-win. The win needs to be bigger than the cost. It's a balancing act.

What this comes down to, simplistically, is that only a certain number of threads are used to decode a media file of given specs, e.g. HD vs UHD resolution. UHD will use more threads, and thus cores, than HD. I have constructed a test where my 4GHz CPU stutters on playback while the CPU is nowhere near fully utilized. Only a faster core helps there. If you want to directly edit AVC/H.264 media, then you want all the CPU clock rate you can get. Other formats like Cineform are much more forgiving on CPU performance, but they ask more of your hard disk than AVC.

In Hitfilm, to really need/use a lot of cores for media, you need to have many simultaneous media streams going at the same point in time. Note that a single NLE track with a transition is two media streams during the transition period. Composite shots are easier to figure out: one stream per layer active at the same point in time on the timeline.
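
Counting streams is just counting how many media clips overlap a given point on the timeline. A minimal sketch; the (start, end) clip lists are hypothetical examples, not a Hitfilm project format, and only layers that need decoding (video media, not text or planes) should be listed:

```python
# Count how many media streams must be decoded at one point on the timeline.
# Each clip is (start_s, end_s) and is active on the half-open interval [start, end).
# Clip lists are hypothetical; list only layers that need decoding.
from typing import List, Tuple

Clip = Tuple[float, float]

def active_streams(clips: List[Clip], t: float) -> int:
    return sum(1 for start, end in clips if start <= t < end)

# One NLE track: clip A 0-10 s, clip B 9-20 s, cross-dissolve at 9-10 s.
track = [(0.0, 10.0), (9.0, 20.0)]
print(active_streams(track, 5.0))   # 1
print(active_streams(track, 9.5))   # 2 (during the transition)

# A composite shot with three video layers stacked over the same span.
comp = [(0.0, 15.0), (0.0, 15.0), (0.0, 15.0)]
print(active_streams(comp, 7.0))    # 3
```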

I am loath to make specific recommendations, but I feel safe saying that a 4-core CPU with enough clock rate should be good for HD work, and for 4K work you really need to be looking at 8 cores.

With "NormanAVC" (fast-decode AVC files), my machine gets to about 67% CPU utilization with four streams being composited, and about 15% with a single stream. With the same sources in Cineform High, CPU utilization is about 39% with four streams being composited.

With the Cineform files, on the first pass through the media there is some small/brief stuttering due to the single hard disk seeking across the four files. These intermediates are >4x the size and therefore require 4x+ the throughput. However, on the second loop of the timeline things cleaned up, since the files were in the system disk cache.
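
Whether a drive keeps up is simple multiplication: streams times bitrate, converted to bytes, against the drive's sustained read speed. In this sketch every number is a hypothetical placeholder (the 4.5x factor just stands in for ">4x the size"); plug in your own media's bitrate and your drive's measured speed. Seeking between files on a spinning disk lowers the effective figure further.

```python
# Rough check: can a drive sustain N simultaneous streams at a given bitrate?
# All numeric inputs below are hypothetical placeholders.

def required_mb_s(streams: int, bitrate_mbit_s: float) -> float:
    """Aggregate read rate in megabytes per second (8 bits per byte)."""
    return streams * bitrate_mbit_s / 8.0

def fits(streams: int, bitrate_mbit_s: float, drive_mb_s: float) -> bool:
    return required_mb_s(streams, bitrate_mbit_s) <= drive_mb_s

avc_mbit = 60.0                  # hypothetical long-GOP AVC bitrate
cineform_mbit = avc_mbit * 4.5   # assumed factor standing in for ">4x the size"
hdd_mb_s = 120.0                 # hypothetical sustained read of a 7200 rpm HDD

for name, rate in [("AVC", avc_mbit), ("Cineform", cineform_mbit)]:
    need = required_mb_s(4, rate)
    print(f"{name}: 4 streams need {need:.0f} MB/s -> fits: {fits(4, rate, hdd_mb_s)}")
# AVC: 4 streams need 30 MB/s -> fits: True
# Cineform: 4 streams need 135 MB/s -> fits: False
```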

Assuming a linear performance scale, a flat 3GHz CPU, all else being equal, will get pretty near saturated CPU performance on the NormanAVC material. It may or may not be perfectly smooth, but it should be close. Cineform should be just fine.
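
The "linear performance scale" estimate, spelled out: if the CPU work per frame is fixed, utilization scales inversely with clock speed. This assumes the same core count and per-clock performance as the 4GHz machine and ignores turbo and memory differences, so treat it as a rough check only.

```python
# Linear clock-scaling estimate: fixed CPU work per frame, so utilization
# scales with measured_clock / target_clock. Assumes same cores and IPC.

def scaled_utilization(measured_pct: float, measured_ghz: float, target_ghz: float) -> float:
    return measured_pct * measured_ghz / target_ghz

avc_4_streams = scaled_utilization(67, 4.0, 3.0)
cf_4_streams = scaled_utilization(39, 4.0, 3.0)
print(f"NormanAVC, 4 streams at 3 GHz: ~{avc_4_streams:.0f}%")  # ~89%
print(f"Cineform, 4 streams at 3 GHz: ~{cf_4_streams:.0f}%")    # ~52%
```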

The system disk cache is one reason to have some/enough excess RAM in your system. When we edit, we are commonly looping over a region, and here the disk cache helps with a simple I/O system. My single hard disk is certainly "simple". Hitfilm itself is very stingy with RAM consumption. RAM preview is what normally eats the RAM, and you have control over that setting and the preview resolution (full/half). You can easily tell how much RAM Hitfilm wants for any of your typical projects: open Hitfilm, open the project and play the timeline a bit. Then stop and look at Hitfilm in the system Task Manager.

When it comes to export (file encoding), the encoders largely use only one or two threads, with the exception of the AVC/H.264/MP4 export option, which can use a lot of cores, probably more than 4 efficiently. However, how much the encoders can really contribute to utilization depends on how fast the Hitfilm engine can render your video frames and pass them to the encoder.

Simple video can feed frames to them fast. I recently posted some numbers on this forum where a 16-second timeline (480 frames) took a little over 5 minutes (300 sec) to export. Whether to Cineform or AVC did not matter. The export file encode was not the performance factor in that circumstance; the CPU was inconsequential. Another simple test has the AVC encoder really ramping up the CPU.
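
Turning the export numbers above into a rate shows how far the render engine, not the encoder, was setting the pace: 480 frames in roughly 300 seconds is about 1.6 frames per second, or about 0.63 s per frame, for a 30 fps timeline.

```python
# Export throughput for the 16-second, 480-frame timeline mentioned above.
frames = 480
timeline_seconds = 16
export_seconds = 300  # "a little over 5 minutes", rounded down

timeline_fps = frames / timeline_seconds
render_fps = frames / export_seconds
slowdown = export_seconds / timeline_seconds  # how much slower than real time

print(f"timeline {timeline_fps:.0f} fps, export {render_fps:.2f} fps, "
      f"{export_seconds / frames:.3f} s per frame, ~{slowdown:.0f}x slower than real time")
```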

Thanks a lot. I understand my questions can't really be answered precisely, but you're all really helping me understand how this works, so I'll be able to make a more informed decision.

Below is an example of the complexity level I'm at. It's three 1080p/60fps clips at 1/4 playback speed. So far I've only applied the auto color corrections (contrast/color/levels). Each video stream is a composite shot with 3 layers (video/plane/text). So that's a total of 3 composite shots and 9 layers, plus 3 simultaneous audio streams.

Another example: here are a few sections of tracking in a composite shot, with 4 layers simultaneously for video and one for audio.

All the video files I use are H.264 from the GoPro 5 Black, and I'm exporting them with the Youtube1080pHD preset. So it's H.264 for both decoding and encoding. When using H.264 source files with 3-4 or even more layers, will I actually benefit from having more than 4c/4t (if I understand correctly)? I can't imagine I'll ever start doing entire videos of 3D animation, by the way. So... I was just about to order the i3-8100/GTX 1070/8GB RAM, but I'm having doubts: does the balance of this setup suit my needs? Would it be better with at least a 4c/8t CPU and the GTX 1060, or even the 1050 Ti, if they can handle the complexity of my video streams? On the other hand, I could buy the i3-8100/GTX 1070 and later upgrade to the 6c/6t i5-8400/8600, if 6 cores are sufficient. RAM will most likely be upgraded after a while.

Hmm, I wrote a pretty long reply, and when I edited it for a typo it disappeared.

Thanks for your replies. I know I won't get a precise answer, but you're really helping me make an informed decision.

A few examples of what I'm doing. The first one is 1080p/60fps: three clips simultaneously at 1/4 speed, each one a composite shot with 3 layers (video/text/plane), with the auto corrections (color/contrast/levels) applied.

The other one is a composite shot with a 1080p/60fps video clip and a bit of text tracking it: 4 layers simultaneously.

All my source files come from a GoPro 5 Black, and I export using the Youtube1080pHD preset. So both decoding and encoding are H.264. If I understand correctly, will I benefit from having a CPU with more than 4c/4t? For my use, isn't the i3-8100/GTX 1070 balance really well suited? I can't see myself ever making 3D animated films. I was seriously considering getting that setup, but I'm having doubts now. I could upgrade the CPU to the 6c/6t i5-8400 (or 8600), but would I be better off getting at least 4c/8t?

You're actually doing rather complex and resource-intensive work (from the computing end) here. You're slowing 60fps footage to 15fps, which is pretty intensive, and auto color/contrast is resource-intensive since every frame is different and requires analysis and decision-making before the effects render. Of course, GoPro MP4 is a slow format to decode as well (and the time stretch makes that much worse). Now your render times really don't seem excessive to me!

Are you familiar with Hitfilm's proxies? They allow you to "prerender" a lossless-quality file to a drive to speed up editing and rendering. So, for example, as you finish up shot A (bike, text, speed, color), you can start proxying it while you create shot B, and so on. By the time you finish your edit, many of these composites have basically already been collapsed into video clips. Then, by using the proxies for the output render, the final renders will go much faster. Once a project is completed, you can clear its proxies to regain drive space.

@knut7 FWIW, I built a system to handle Hitfilm Pro 4 a few years back. I got an 8-core AMD FX 8350 at 4GHz and 32 GB RAM, but a mid-range GPU, a VisionTek AMD Radeon 6570 1GB, and when I started throwing smoke and particles at it in Hitfilm, it really started having stutters and lags, and render times were long. I upgraded to a GTX 1060 and got way better results. So for me, it seemed to be more about the GPU than the CPU.

Yeah, I've read about prerendering. Handbrake can do that, right? There's a lot to learn in Hitfilm, so I haven't gotten around to pretreating the media in any way. Up until now I've been too impatient to get started editing (: I will check out proxies for output rendering as well.

A Composite Shot is procedural media, because any layer is subject to change. In this case, a proxy/prerender means deciding the composite is done (you're not going to mess with it anymore) and rendering it to a drive. However, the proxy system doesn't replace the comp on the timeline with the video/media file; it just looks at the video instead of the comp. Making further changes to the comp invalidates and deletes the proxy. The proxy is a proprietary format that only Hitfilm can read. Proxies exist to render out a complex composite that's "locked", so it isn't being rendered on the fly, which speeds up editing performance. But the proxy is a high-quality, lossless file, so it's suitable for final rendering.
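
The invalidation behaviour described above can be modelled as a cache keyed on the exact state of the composite. This is a conceptual illustration only: Hitfilm's proxy format and internals are proprietary and not described here, and every class and name below is hypothetical.

```python
# Conceptual model of proxy invalidation: the proxy is tied to a fingerprint of
# the comp's exact state; any edit changes the fingerprint and the stale proxy is
# discarded. Hypothetical names; not Hitfilm's actual implementation.
import hashlib
import json
from typing import Dict

class ProxyCache:
    def __init__(self) -> None:
        self._proxies: Dict[str, str] = {}  # comp name -> fingerprint of rendered state

    @staticmethod
    def fingerprint(comp_state: dict) -> str:
        blob = json.dumps(comp_state, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def render_proxy(self, name: str, comp_state: dict) -> None:
        # The expensive render to disk would happen here.
        self._proxies[name] = self.fingerprint(comp_state)

    def source_for_playback(self, name: str, comp_state: dict) -> str:
        fp = self._proxies.get(name)
        if fp is not None and fp == self.fingerprint(comp_state):
            return "proxy"                 # read the prerendered file
        self._proxies.pop(name, None)      # comp was edited: discard stale proxy
        return "live comp"                 # render the composite on the fly

cache = ProxyCache()
intro = {"layers": ["logo.mov", "title text"], "glow": 0.4}
cache.render_proxy("Intro", intro)
print(cache.source_for_playback("Intro", intro))  # proxy
intro["glow"] = 0.5                               # tweak the comp
print(cache.source_for_playback("Intro", intro))  # live comp
```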

So, for my Essential Hitfilm series on YouTube, the Composite Shots for the logo animation and "Essential Hitfilm" titles are proxies. That way, when I do a new episode, the only thing Hitfilm calculates is the episode title. Everything else is done. The reason I used proxies instead of just rendering out the BG is that the channel logo incorporates all past Hitfilm logos and gets updated every time a new version of Hitfilm comes out. With v6, I have to incorporate the new Pro logo. Then I reproxy only that element.

I think you should also consider what else you plan on doing with this machine besides HitFilm. If you're going to be doing some gaming too then go for the i5 8400. Hands down it's a beast of a gaming CPU at its price point. It's arguably the best gaming CPU released in years and it's no slouch in other areas.

The 1600/1600X is probably a viable option if you're not going to be doing any gaming and the overall cost savings get you something else, like the SSD you mentioned, that wouldn't fit your budget in an i5 build.

The i3-8100 isn't necessarily a bad processor. The higher clock speed is actually pretty nice, but since it is only a quad core, it might run out of steam before the other two with things that can use more cores effectively; that's going to depend on what other apps you're using.

Ah, I get it. I think. Yeah, I was thinking of transcoding, to ease the decoding job for Hitfilm. Currently I haven't got an intro or anything like that which can be pre-rendered and reused. But I'm sure proxies have other uses as well; I will look into them.

I won't be using the computer for gaming at all. Probably. The last thing I played was Day of Defeat some 13 years ago. Oh, and Dice Wars (1). I will use GIMP though, in addition to light office applications and Notepad!

The offer on the i3/1070 will last another week, so I've got some time to consider. Another option is a prebuilt PC with the i5-8400, a GTX 1060 and presumably a faster SSD system disk. It is some 12% more expensive, though. I've got the impression that my use involves above-average CPU activity?

I ran a preview of the film in the screenshots above and monitored the CPU load with Intel Extreme Tuning Utility. Both sequences from the pictures above involved all 4 cores of my i5-8250U. I'm not able to see details for each thread there, but Task Manager (Performance tab) showed all 8 threads active simultaneously at times. The Tuning Utility gave me "Active Core Count: 4" during the heavy sections, and all cores were reading 60-75% utilization at times. So I seem to be using more than 4 threads.

So, I made a decision. I'm building a computer with a 6-core Ryzen 5 1600 + GTX 1060 3GB. I'll be overclocking from 3.2 to 3.7-3.8GHz, hopefully. It's some 8% cheaper than the i3-8100 + GTX 1070, and I get more/faster storage, RAM and CPU.

I think you overall have an amazing rig for under a grand. You have a great price/power ratio going. You could get better hardware, but at a rapidly diminishing rate of return.

The only major thing I see is storage. Do you have existing drives to add to the machine? If not, bump that SSD up to 512GB and/or put in a secondary 1TB 7200RPM HDD. Video takes a lot of space. You'll need it.

Yeah, the price was pretty great. The GPU was down 30% from the equivalent of 300 USD, and the PSU down 50%. And I found the Seagate in my spares box. 1TB should do; I've got a Seagate Personal Cloud to store projects after they're done.

That's what I'm thinking now. But I may change my mind after having used the computer. Thanks for all the help.

*Edit: I found a few more HDDs: a Samsung Spinpoint F1 1TB (7200 rpm, 3.5") and a Seagate Momentus 5400.6 512GB (5400 rpm, 2.5"). Both are SATA 3Gb/s. Hopefully SATA 3Gb/s is fast enough; I guess I'll see.