For the APAC live event, I gave a presentation comparing VP9 vs HEVC (H.265). During the session, I discussed the fundamental differences between the two “modern codecs” and wrapped up with an early analysis of each codec’s performance. These results were obtained using the open-source encoders libvpx-vp9, x264, and x265. This blog post is a follow-up to my tech talk, in which I go into further detail about the experiment and share the results of my research.
For the test I used the libvpx-vp9 encoder (version 1.8.2) for VP9 encoding, the x264 encoder (tag 235ce6130168f4deee55c88ecda5ab84d81d125b) for H.264/AVC encoding, and the x265 encoder (version 3.2) for H.265/HEVC encoding. I also compiled libvmaf (version 1.5.1) and ffmpeg (version 4.2.3) to run the encoders and perform the PSNR, SSIM, and VMAF measurements. I built this execution environment with Docker, so you can recreate the exact same environment using my Dockerfile, which can be found here.
For the test set I used Full HD and 4K sequences from the JVET SDR test set [1], which was also used in the standardization of VVC. Some of these sequences are well known and have already been used in several prior standardization activities. All sequences are 10 seconds long and use YUV 4:2:0 chroma subsampling. The sequences are as follows:
For encoding, I used ffmpeg with its default settings. All encodings used 2-pass encoding with a set target bitrate. The corresponding ffmpeg calls look like this (br is the target bitrate and 1/2 the pass number):
ffmpeg -i input.yuv -c:v libx264 -preset veryslow -b:v br -pass 1/2 enc.mp4
ffmpeg -i input.yuv -c:v libx265 -preset slow -b:v br -pass 1/2 enc.mp4
ffmpeg -i input.yuv -c:v libvpx-vp9 -b:v br -pass 1/2 enc.mp4
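As a sketch, the two passes for one encoder can be assembled in Python. The helper name `two_pass_commands` and the raw-input options are hypothetical: ffmpeg's rawvideo demuxer needs `-pixel_format`, `-video_size` and `-framerate` to interpret a headerless .yuv file, and the values in the usage lines are placeholders, not the properties of the test sequences (10-bit content would need e.g. yuv420p10le). Sending the first pass to the null muxer is a common convention and not necessarily what was done for this test.

```python
import os
import shlex

# Codec and preset per encoder, mirroring the ffmpeg calls above (libvpx-vp9 keeps its defaults).
ENCODER_OPTS = {
    "x264": ["-c:v", "libx264", "-preset", "veryslow"],
    "x265": ["-c:v", "libx265", "-preset", "slow"],
    "vp9": ["-c:v", "libvpx-vp9"],
}


def two_pass_commands(encoder, bitrate, src="input.yuv", out="enc.mp4", raw_opts=None):
    """Return (pass1, pass2) ffmpeg argument lists for a 2-pass, bitrate-targeted encode."""
    # Input options (such as the raw YUV description) must come before -i.
    base = (["ffmpeg", "-y"] + list(raw_opts or []) + ["-i", src]
            + ENCODER_OPTS[encoder] + ["-b:v", str(bitrate)])
    # Pass 1 only gathers rate-control statistics, so its output is discarded.
    pass1 = base + ["-pass", "1", "-f", "null", os.devnull]
    pass2 = base + ["-pass", "2", out]
    return pass1, pass2


if __name__ == "__main__":
    # Hypothetical raw-input description; adjust to the actual sequence.
    raw = ["-f", "rawvideo", "-pixel_format", "yuv420p",
           "-video_size", "1920x1080", "-framerate", "50"]
    for cmd in two_pass_commands("vp9", "4M", raw_opts=raw):
        print(shlex.join(cmd))
```

Returning argument lists rather than strings lets them be passed straight to `subprocess.run` without shell quoting issues.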
I used the following presets for each encoder:
These settings were chosen from experience. While they do not yield the highest possible compression performance, they produce very high-quality encodes with a good tradeoff between encoding time and quality.
The encodings were performed under two scenarios: a fixed resolution scenario and a bitrate ladder scenario.
For each encoding, several measurements were performed. Where the encoding was performed at a lower spatial resolution, the reconstruction was upscaled back to the resolution of the source before measuring. PSNR and SSIM were measured for each of the three components (Y/U/V), and an averaged value was computed as well. VMAF was also calculated; for the 4K source files, the 4K VMAF model was applied. For the encoding time, I measured the absolute elapsed time as well as the CPU time per thread. This is a sample plot for the sequence “MarketPlace” in the fixed resolution scenario:
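To illustrate the "upscale, then measure" step, here is a minimal pure-Python sketch operating on a single plane stored as a list of rows. Nearest-neighbour upscaling by an integer factor is only a stand-in for whatever scaler was actually used (ffmpeg's scale filter defaults to bicubic), and the function names are hypothetical. `max_val` has to be 1023 for 10-bit content.

```python
import math


def upscale_nearest(plane, factor):
    """Nearest-neighbour upscaling of a 2-D plane (list of rows) by an integer factor."""
    # Repeat every sample `factor` times horizontally and every row `factor` times vertically.
    return [[v for v in row for _ in range(factor)] for row in plane for _ in range(factor)]


def psnr(ref, dist, max_val=255):
    """PSNR in dB between two equally sized planes; infinite if they are identical."""
    if len(ref) != len(dist) or any(len(a) != len(b) for a, b in zip(ref, dist)):
        raise ValueError("planes must have the same dimensions")
    sq_err = [(a - b) ** 2 for ra, rb in zip(ref, dist) for a, b in zip(ra, rb)]
    mse = sum(sq_err) / len(sq_err)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


# A 4x4 source plane compared against a 2x2 reconstruction that is off by one everywhere.
source = [[100] * 4 for _ in range(4)]
recon = upscale_nearest([[101, 101], [101, 101]], 2)
print(round(psnr(source, recon), 2))  # 48.13
```

The same per-plane function would be applied to Y, U and V separately before any averaging.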
For both scenarios I calculated BD-rate results for the average PSNR, average SSIM, and VMAF values relative to x264 [2]:
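The BD-rate [2] expresses the average bitrate difference between two rate-distortion curves at equal quality. Below is a minimal pure-Python sketch of the commonly used cubic-fit variant: fit log(bitrate) as a cubic polynomial of the quality metric, integrate both fits over the overlapping quality range, and convert the mean log difference into a percentage. It is not the script used for the results in this post, and all function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _polyfit(xs, ys, deg=3):
    """Least-squares polynomial fit via the normal equations; ascending coefficients."""
    k = deg + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(ata, aty)


def _mean_log_rate(rates, quality, lo, hi):
    """Mean of the fitted log-rate curve over the quality interval [lo, hi]."""
    # Normalise quality values so the normal equations stay well conditioned.
    mu = sum(quality) / len(quality)
    sd = math.sqrt(sum((q - mu) ** 2 for q in quality) / len(quality))
    coeffs = _polyfit([(q - mu) / sd for q in quality], [math.log(r) for r in rates])

    def antiderivative(t):
        return sum(c * t ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))

    t_lo, t_hi = (lo - mu) / sd, (hi - mu) / sd
    # t = (q - mu) / sd is affine, so the mean over q equals the mean over t.
    return (antiderivative(t_hi) - antiderivative(t_lo)) / (t_hi - t_lo)


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Bjontegaard delta rate in percent; negative means the test encoder needs less bitrate."""
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    diff = (_mean_log_rate(test_rates, test_quality, lo, hi)
            - _mean_log_rate(anchor_rates, anchor_quality, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0
```

With x264 as the anchor, each curve needs at least four (bitrate, quality) points, and the quality ranges of the two curves must overlap; the same function works for average PSNR, average SSIM or VMAF as the quality axis.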
As one can see, the libvpx-vp9 encoder competes very well with x265 in terms of coding performance. However, the PSNR- and SSIM-based BD-rate values are consistently higher for libvpx-vp9, while the VMAF-based BD-rate values are higher for x265 in the fixed resolution scenario. In the bitrate ladder scenario, both encoders show very similar results.
Surprisingly, the default x265 configuration seems to use a much lower QP for the color components (U/V) than the other two encoders. However, because of the way the average values are calculated, this does not have a large impact on the BD-rate results.
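How much chroma quality moves an averaged PSNR depends on the weighting. A minimal sketch, assuming the luma-heavy 6:1:1 (Y:U:V) weighting commonly used in JVET/MPEG test conditions; the post does not state which weights were used here, and the function name is hypothetical.

```python
def weighted_psnr(y, u, v, weights=(6, 1, 1)):
    """Combine per-plane PSNR values (dB) into one number.

    The default 6:1:1 weighting is an assumption, not necessarily the one used in this test.
    """
    wy, wu, wv = weights
    return (wy * y + wu * u + wv * v) / (wy + wu + wv)


# Raising both chroma PSNRs by 2 dB (e.g. through a lower chroma QP):
print(weighted_psnr(40.0, 42.0, 42.0), weighted_psnr(40.0, 44.0, 44.0))  # 40.5 41.0
```

Under 6:1:1 weighting, a 2 dB chroma gain shifts the average by only 0.5 dB, whereas a plain mean (weights 1:1:1) would shift it by about 1.33 dB.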
For all encodings, I also measured the overall runtime as well as the CPU time per thread. Together, these values indicate how well the encoders can utilize multiple cores. All tests were performed on an Intel 6-core (12-thread) processor. The results for x265 and libvpx-vp9 were taken relative to the values of x264 and then averaged. The following table displays the relative factors compared to x264:
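The relative factors can be sketched as the mean over sequences of each encoder's time divided by the x264 time. The data layout and function name are hypothetical, and the arithmetic mean is an assumption, since the post only says the ratios were averaged; a geometric mean is a common alternative for ratios.

```python
from statistics import mean


def relative_factors(times, anchor="x264"):
    """Per encoder, the mean over sequences of time / anchor time.

    times maps encoder name -> {sequence name -> seconds}; every encoder is
    assumed to cover the same sequences as the anchor.
    """
    ref = times[anchor]
    return {
        enc: mean(per_seq[seq] / ref[seq] for seq in ref)
        for enc, per_seq in times.items()
        if enc != anchor
    }


# Invented timings for two sequences, for illustration only.
example = {
    "x264": {"A": 100.0, "B": 200.0},
    "x265": {"A": 300.0, "B": 400.0},
}
print(relative_factors(example))  # {'x265': 2.5}
```

The same function applies to both the elapsed-time and the CPU-time measurements.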
While both x265 and libvpx-vp9 have higher runtimes than x264, x265 is much better at utilizing the available threads, which results in much lower overall runtime factors. When it comes to CPU time, libvpx-vp9 has an advantage over x265 in the tested configuration. Similar observations can be made from the table above. Whether the weaker thread utilization of libvpx-vp9 is a disadvantage depends on your application. For example, since our encoder uses multiple vectors to utilize the available threads efficiently, this behavior is not a big disadvantage for us.
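One way to quantify thread utilization is to divide total CPU time by elapsed wall-clock time, which gives the average number of busy threads. A minimal sketch, assuming the CPU time is summed over all threads; the function names are hypothetical and the numbers are invented purely for illustration, not measurements from this test.

```python
def effective_threads(total_cpu_s, wall_s):
    """Average number of busy threads: total CPU time divided by elapsed wall-clock time."""
    return total_cpu_s / wall_s


def utilization(total_cpu_s, wall_s, hw_threads=12):
    """Fraction of the machine's hardware threads kept busy (12 = 6 cores with SMT)."""
    return effective_threads(total_cpu_s, wall_s) / hw_threads


# Hypothetical encoders: A uses more CPU time in total but finishes much sooner than B.
print(effective_threads(600, 60))   # 10.0
print(effective_threads(400, 200))  # 2.0
```

Encoder B needs less total CPU time yet takes more than three times as long, which is the kind of runtime vs CPU-time trade-off discussed above.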
Finally, I would like to provide all the files needed to recreate the results. The archive also includes all the result files that my findings are based on, and I encourage everybody to double-check them. However, for legal reasons, I cannot provide the encoded video sequences or the original uncompressed YUV test sequences. The archive includes the following scripts, which should be helpful:
File:
https://drive.google.com/file/d/1wbUA56vB-LeH2H8nV-EGzhJPWW7ikkwx/view?usp=sharing
While this is just a quick and superficial encoder comparison, I tried to keep it close to practical applications. This VP9 vs HEVC test shows that libvpx-vp9 is able to take on x265 when it comes to coding performance. Please note that only these encoders were tested, and there are other AVC, HEVC, and VP9 encoders out there that may perform better.
If you have additional input for the test, please reach out to me! I am very willing to run it again with a different set of settings.
[1] – A. Segall, E. François, W. Husak, S. Iwamura, D. Rusanovskyy – JVET common test conditions and evaluation procedures for HDR/WCG video – JVET-P2011
[2] – Gisle Bjontegaard – Calculation of average PSNR differences between RD-curves – VCEG-M33 Austin, Texas, USA, 2-4 April 2001