After Effects and MFR
With the release of After Effects version 22.0.0, Adobe introduced Multi-Frame Rendering (MFR) to improve performance for render-intensive projects. We have tested MFR and the results are impressive. At RE:Vision we have also been studying how to take advantage of systems with multiple GPUs, and we are pleased to announce some great results. This document explains what we have done and provides a set of materials and test projects so you can measure the benefit of MFR and Multi-GPU when using our plug-ins. There are edge cases you should be aware of, however. The Multi-Frame GPU performance is only realized when you install the latest version of our plug-ins and activate Multi-Frame Rendering in AE.
Testing the Benchmark with AE Version 22 MFR
We include a project with 4 setups to test CPU and GPU rendering when Adobe After Effects Multi-Frame Rendering (MFR) is applied. The test project uses a set of our plug-ins, but you don’t need to own a license to run these tests. Just download Effections, our bundle. Without a license the plug-ins draw a watermark, and because we cascade effects this might create visual artifacts in the final render, but it won’t really affect the render times and behavior, which is what we want to measure here.
Software download link – https://staging2.revisionfx.com/products/effections/after-effects/#downloads
Project download link: https://spaces.hightail.com/space/E7kHd35u1R
Note that some media files are big (over 1 GB), so the Sources can be downloaded individually if you have a slower internet connection. You can just execute the render queue in AE 22 with MFR enabled (in Edit | Preferences | Memory and Performance). Then turn off MFR and run it again to compare the results. It should take an hour or more to process (depending on your rig). It’s a real test. This should work with aerender as well, but we haven’t yet tested the new aerender argument that turns MFR on and off.
Note that between the MFR and no-MFR passes you should either restart AE or run Edit | Purge | Image Memory, so that frames left in the persistent cache do not falsify the render times. For the same reason, turn off Composition | Preview | Cache Frames When Idle so AE does not speculatively render frames behind your back and distort your render times. Send us a screenshot of your render queue along with your machine specs; this helps us a lot to understand what is going on.
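If you prefer to script the comparison, the two passes could be timed from the command line along these lines. This is only a sketch that we have not run against AE: it assumes aerender is on your PATH and that your version accepts the `-mfr ON|OFF <max_cpu_percent>` argument described in Adobe's aerender documentation. The function names and the project file name are hypothetical.

```python
import subprocess
import time


def build_aerender_cmd(project, mfr_on, max_cpu_percent=100):
    """Build an aerender command line for the given project file.

    Assumption: -mfr takes ON or OFF followed by a maximum CPU percentage.
    """
    return [
        "aerender",
        "-project", project,
        "-mfr", "ON" if mfr_on else "OFF", str(max_cpu_percent),
    ]


def time_command(cmd):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def compare_mfr(project):
    """Render once with MFR and once without, each in its own aerender process."""
    with_mfr = time_command(build_aerender_cmd(project, mfr_on=True))
    without_mfr = time_command(build_aerender_cmd(project, mfr_on=False))
    return with_mfr, without_mfr
```

Calling `compare_mfr("mfr_benchmark.aep")` (a made-up file name) would return the two wall-clock times in seconds. The cache precautions described above still apply between the two passes.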
Make sure you have saved everything before playing with these projects, particularly on a machine with 32 GB of RAM or less. Your system might freeze.
What we are testing: These projects were initially selected to test edge cases, based on Adobe’s initially stated intentions for addressing concurrent rendering in their application. The purpose here is not to show off how good we are or to provide a how-to tutorial for making cool-looking videos. The selected test cases might also not be the best general-purpose set for comparing one hardware setup with another. But they do show whether this works as advertised on your computer, and allow you to measure what you should expect on different hardware. We see on average over 20% speedup (1.2X, and much more on multi-GPU systems), particularly on systems with a large number of cores. Note that our CPU code is typically highly multi-threaded and vectorized, so getting even 20% less time is good. We like concurrent rendering as it usually works as long as one is not too aggressive with resources. In a sequence render context where no one controls all the parts (the IO does whatever it does, and some other tools are involved too), after a few frames everything tends to auto-balance naturally.
Also, of course, PCI-E bandwidth matters here. As you know, PCI-E 5 is 2X faster than PCI-E 4, which is 2X faster than PCI-E 3… and the number of PCI-E lanes matters too. Unless you have the big Threadripper PRO (I think up to 64 lanes), a XEON gen 3 (up to 40) or a dual-processor config (less common these days), you are likely restricted to 20 lanes or so. Moreover, such motherboards, which are the most common, automatically switch to 8X lanes per GPU when you have 2 GPUs installed, so half the bandwidth. So then it comes down to whether you are typically IO bound or process bound. But if you already have dual GPUs because your 3D renderer supports that, why not use this extra resource :)
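To put rough numbers on the lane arithmetic, here is a small sketch. It assumes only the "2X per generation" rule quoted above and measures bandwidth in units of one PCI-E 3 lane; the function name is ours, and real throughput depends on much more than this.

```python
def relative_pcie_bandwidth(generation, lanes):
    """Bandwidth relative to one PCI-E 3 lane, assuming 2X per generation."""
    if generation < 3 or lanes < 1:
        raise ValueError("expects PCI-E generation >= 3 and at least one lane")
    return lanes * 2 ** (generation - 3)


# One GPU on a full 16-lane PCI-E 4 slot.
single_gpu = relative_pcie_bandwidth(4, 16)

# Two GPUs on the same board, each dropped to 8 lanes.
per_gpu_when_dual = relative_pcie_bandwidth(4, 8)

# Each GPU now gets half the bandwidth, the same as a 16-lane PCI-E 3 slot.
assert per_gpu_when_dual * 2 == single_gpu
assert per_gpu_when_dual == relative_pcie_bandwidth(3, 16)
```

In other words, a dual-GPU setup on a typical PCI-E 4 board feeds each card about as fast as a single card on a PCI-E 3 board.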
Known Issues – Pay Attention!
It came to our attention in the final release that AE can over-thread (try to render too many frames at once), which causes issues with its memory cache recycling and creates situations where far too much RAM is requested, which can potentially bring your system to a bad place. Sometimes the only way out is to turn off MFR, but sometimes there are safe MFR workarounds. Notably, with large frame sizes and systems with less than 64 GB of RAM, we get lucky by leaving more memory to other apps (e.g. allowing only 24 GB to AE on a 32 GB system). We will work with the AE team to help this work better.
We will post workarounds here as we discover them:
Follow this album.
GPU: Our plug-ins will benefit from the Multi-GPU upgrade when rendering (not in interaction mode). It is important to note that multi-GPU only works when the GPUs are the exact same vendor and model. In practice, the best performance for us in isolation, with video IO involved, would be to cap at 2 frames at once per GPU. Beyond that it’s not very useful to our effects unless it helps other tools in your project go faster. But there is not really a way to control that, aside perhaps from the obscure setting that leaves a % of CPU to other apps. Note that our GPU tools initially need to be recompiled for your specific GPU, which creates a small delay. If you open the comp before launching the render queue, this will already have been done.
Bandwidth: Note that these projects are 32b float, which takes a lot of memory, but we do support 8 and 16 bits per channel (bpc), which can reduce the IO and internal bandwidth a lot. IO is relatively more of an issue for light-compute projects (which are more IO bound). The FRT 1080P project shows that. The results we provide include the IO times, so we are not isolating only OUR speedups (IO is not much changed by the way the processing is distributed). And we tested with QuickTime ProRes 422 LT as output so as not to add another GPU dimension to the testing. Yes, some codecs like Red and mp4/h264 might be hardware accelerated, which can of course impact the GPU comp timing results. To allow a fair comparison, do not change the Mercury Engine project setup or the Hardware Accelerated settings in the Import preferences in the middle of a batch test render. As they say, your results may vary. (smile)
Memory: A rule of thumb that seems robust, at least for the render queue, is to check how much memory is available before starting AE, subtract 2.5 GB from that (4 GB on a 32 GB system if that is an option; here we set it to leave 8 GB), and set the RAM available to AE to the result. We know it’s counterintuitive, but the more RAM is available to AE, the more additional concurrent frame renders it will try to launch, and this is when memory can get over-saturated.
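The rule of thumb is simple subtraction, sketched below. The function name and the example figure for free memory are hypothetical, and you still have to read the free memory yourself (Task Manager or Activity Monitor) before launching AE.

```python
def ram_for_ae(available_gb, reserve_gb=2.5):
    """RAM (in GB) to allow AE: memory free before launch minus a reserve.

    reserve_gb defaults to the 2.5 GB margin from the rule of thumb; pass a
    larger value (for example 4 or 8) on systems that saturate easily.
    """
    if available_gb < 0 or reserve_gb < 0:
        raise ValueError("memory sizes must be non-negative")
    return max(available_gb - reserve_gb, 0.0)


# Hypothetical reading: 28 GB free before starting AE, default margin.
print(ram_for_ae(28))        # 25.5

# Leaving 8 GB to everything else on a 32 GB budget gives AE 24 GB.
print(ram_for_ae(32, 8))     # 24
```

The second call reproduces the "24 GB to AE on a 32 GB system" setting we mention under Known Issues.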
It is a good idea to run these test projects with Task Manager (Windows) or Activity Monitor (Mac) open. Keep an eye on the Memory % counter and as soon as it turns red (Windows below), press Pause in the render queue. If a lot of apps are open, either close them before continuing or stop, save and continue.
The Test Comps:
Comp A001: This is the 8K project; note as well that the project is set to 32b float. This project can sometimes cause memory over-requests by AE, usually on the initial batch of frames. Additional factors to consider: The 8K files were generated from a Red camera (and trimmed with Cine-X by Red, which seems to pop up a warning in AE, bug?). Note that if your computer supports different GPU rendering modes (e.g. Cuda and OpenCL), AE decides which GPU to use when you have both a discrete and an embedded GPU: if you set Mercury Engine to OpenCL and have an nVidia card, it will hardware accelerate using the Intel embedded GPU, and if you select Cuda it will use the nVidia card.
Comp DEF: This comp only has one effect, but we set DEFlicker HighSpeed to the most compute-heavy setup possible. The source video might not need it in this case. This setup works much better on systems with a lot of RAM.
Comp FRT: This is a frame rate conversion with Twixtor followed by a distortion effect. The GPU speedup vs CPU here is interesting (we need to check that). This is one comp where we lost something with MFR, as we can’t cache intermediate motion vectors when slowing down. This is particularly true on smaller rigs where the added speed does not make up for it. We have some work to do here to improve that.
Comp DRS: This comp uses 4K in and 6K out. It should be a better example for comparing multi-GPU scaling. Twixtor is used, but with a speedup a bit over 100%, so caching intermediate results or not makes no difference.
Provided by nVidia: 56-core PC, 192 GB of RAM, 4 A6000 GPUs
Origin laptop: 6 cores, 16 GB of RAM, 1 RTX 2800 *: freezes with MFR on comp A001 – bug filed at Adobe
MacBook Pro 2017: 4 cores, 16 GB of RAM *: Negative impact on DEF GPU comp with MFR
Provided by Troika Film: 32-core Threadripper, 256 GB of RAM, RTX 3090 GPU
Total time to render tests – for 16 comps (8 with MFR, 8 without): 1.27 hours.
AE 21 CPU: 1969 sec, AE 22 CPU: 1407 sec.
This system averages 1969/1407 = 1.4X faster in AE 22 on CPU with our tools.
AE 21 GPU: 849 sec, AE 22 1 GPU: 784 sec, AE 22 2 GPU: 530 sec.
This system with 1 GPU averages 849/784 = 1.1X faster in AE 22 with our tools with GPU processing.
This system with 2 GPUs averages 849/530 = 1.6X faster in AE 22 with our tools with GPU processing, and over 2X faster with 3 GPUs.
Trying on an older system: 12-core PC (24 threads), 32 GB of RAM, FirePro 8100 GPU *: DEF comp crashed until we left 8 GB of RAM to other apps
Total time to render tests – for 16 comps (8 with MFR, 8 without): 2.25 hours.
No MFR: 7435, MFR: 6170 = 1.21X faster on average in AE 22 using our tools.
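The speedup figures quoted above are simply the baseline time divided by the new time. If you want to compute the same ratio from your own render queue times, a minimal sketch follows; the helper names are ours, and the sample values are the GPU and older-system timings listed above.

```python
def speedup(baseline_time, new_time):
    """How many times faster the new run is (baseline divided by new)."""
    if baseline_time <= 0 or new_time <= 0:
        raise ValueError("render times must be positive")
    return baseline_time / new_time


def time_saved_percent(baseline_time, new_time):
    """Share of the baseline render time that the new run saves, in percent."""
    return 100.0 * (1.0 - new_time / baseline_time)


# (baseline, new) pairs taken from the timings listed above.
results = {
    "AE 21 GPU vs AE 22 1 GPU": (849, 784),
    "AE 21 GPU vs AE 22 2 GPU": (849, 530),
    "Older system, no MFR vs MFR": (7435, 6170),
}

for label, (baseline, new) in results.items():
    print(f"{label}: {speedup(baseline, new):.2f}X, "
          f"{time_saved_percent(baseline, new):.0f}% less time")
```

Keep in mind that a 1.2X speedup corresponds to about 17% less render time, not 20%, so say which of the two numbers you are quoting when you send us results.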