NVIDIA card - NVENC performance spec

Hello,

Disclaimer: this is more of an NVIDIA thread.

I am trying to clear up some mystery.

Like other people here, we have to choose a card for transcoding purposes.

Some background:

Here is the list of supported cards : https://developer.nvidia.com/nvidia-video-codec-sdk#gpulist

Consumer cards will not work; in my opinion this is more of a driver limitation. They are limited to 2 sessions. Beware that mixing cards will not work either (a high-end Quadro plus a GeForce in the same system, for example, will limit you to 2 sessions!).

Source : http://developer.download.nvidia.com/compute/nvenc/v5.0_beta/NVENC_DA-06209-001_v06.pdf page 7 Table 4 and page 12 Table 7.
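A minimal sketch of the session-limit rule as described above, assuming each installed card is described only by a flag saying whether it is capped. The function name and the flag representation are hypothetical; the behaviour (one capped card limits the whole system to 2 sessions) is taken from the claim in this post, not verified against the driver.

```python
from typing import Optional, Sequence

# The cap mentioned in the post for consumer cards.
SESSION_CAP = 2


def system_session_limit(cards_capped: Sequence[bool]) -> Optional[int]:
    """Return the system-wide NVENC session cap, or None if no cap applies.

    cards_capped holds one flag per installed card: True if that card is
    session-limited (for example a GeForce), False otherwise.
    """
    if not cards_capped:
        raise ValueError("at least one card is required")
    # Per the post, a single capped card limits the whole system.
    return SESSION_CAP if any(cards_capped) else None


# Example: a high-end Quadro (uncapped) mixed with a GeForce (capped).
print(system_session_limit([False, True]))
```

If the claim holds, the mixed system is capped at 2 sessions, while a system with only uncapped cards returns None.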

From my understanding, NVENC uses a special engine on the card, a dedicated chip.

Quote: “By using dedicated hardware for the video encoding task, the GPU CUDA cores and/or the CPU are available for other compute-intensive tasks” from https://developer.nvidia.com/nvidia-video-codec-sdk

Quote: “The NVENC engine’s performance is also independent of the graphics performance.”

It is clearly stated that some new hardware has two NVENC engines.

Quote: “In order to support more number of simultaneous encoding sessions an extra NVENC had been added on certain variants of the second generation of Maxwell GPUs.”

I am unable to find precise information about the NVENC performance capabilities of the various NVIDIA cards, or to what extent more memory or more CUDA cores might improve a transcoding job.

Here are the questions:

How are we supposed to choose a card from Nvidia?

What is a low-end Quadro (which is limited to 2 sessions, see above)?

There are a “lot” of NVENC-capable Quadro and Tesla cards. The price range between the low end and the high end is huge. If everything is hardware based, it may not be necessary to buy a powerful card; a Quadro K4000 may be equivalent to a Quadro K6000, for example.

Does anyone (maybe a Wowza Transcoder dev) have some guidance or info regarding this matter?

I have already tried to contact NVIDIA (chat: nothing; contact form: waiting; dev forum: similar post unanswered: https://devtalk.nvidia.com/default/topic/774009/?comment=4306663 ).

I have found similar threads here with no clear info.

Finally, newer Maxwell cards (GM20x GPUs) will support hardware-based HEVC encoding. Is support planned in Wowza Transcoder?

Regards,

Guillaume

Hello,

I wanted to share with you some of the NVIDIA information as it pertains to Wowza’s recommendations:

Wowza Transcoder supports Intel Quick Sync and NVIDIA NVENC accelerated encoding on Windows and Linux and NVIDIA CUDA accelerated encoding on Windows. The following articles provide more information about the hardware requirements for each of these technologies:

Server specifications for Intel Quick Sync acceleration with Wowza Transcoder

Server specifications for NVIDIA NVENC and NVIDIA CUDA acceleration with Wowza Transcoder

Important: NVIDIA CUDA encoding acceleration isn’t supported in the latest NVIDIA graphics drivers (340 and greater). CUDA-based accelerated encoding is deprecated in Wowza Streaming Engine™ 4.0.5 and will be removed in a future release of the Wowza Streaming Engine software.

You should get transcoding working using the built-in default MainConcept software encoder first before trying to get accelerated transcoding to work. The MainConcept software encoder doesn’t use hardware acceleration. For more information about how to determine if hardware acceleration is available on your Wowza media server, see How to verify which Wowza Transcoder implementation is invoked.

On newer Windows operating systems, Intel Quick Sync and NVIDIA CUDA hardware acceleration may not be available when running Wowza Streaming Engine as a system service due to a security measure called Session 0 Isolation. For more information about how to work around this issue, see How to enable hardware accelerated transcoding when running as a Windows service.

When using Windows Remote Desktop, Quick Sync acceleration may not be available.

More information is available to you in this how-to article.

Hope this helps clarify some of the mystery.

Regards,

Mac

None of the Haswell X99 consumer processors have a GPU embedded in them so there’s no way Quick Sync would work on them.

Maxwell-based GPUs are supposed to support H.265 encoding; I’m not sure about decoding H.265 and then encoding to H.264.

Currently Wowza Streaming Engine supports H.265 decoding on the CPU, and encoding to H.264 can be done on hardware such as the K5000. If you are looking to encode H.265, that can only be done by Maxwell-based GPUs.

Daren

Any comments on the NVIDIA Quadro M4000?

It’s based on the GM204. I think it is the way to go: new for 880 USD, it seems the best option at the moment.

Waiting for comments,

thanks,

We have been using the Quadro M4000 for a while, it is working well.

We are looking for something with higher performance, but it doesn’t seem that such a thing exists. As mentioned earlier, NVENC seems to be the same across all cards within a generation, so we’d just have to get more of them.

I’m running 4.1.1 using the very latest Nvidia driver.

I have tested a GTX660, which costs next to nothing compared to the Quadro K5000 I’m using now. The CPU load is the same, while the GPU load is higher on the GTX660 than on the K5000, but far from 100% in my case.

GTX660 is CUDA based and K5000 is NVENC based.

The funny thing is that the Wowza team has now completely removed CUDA support as of 4.1.2. So CUDA is dead in Wowza, not because of NVIDIA, but because the code was removed from Wowza.

CUDA cards cost about 1/10 as much as NVENC cards.

Wowza claims that CUDA encoding was removed in NVIDIA driver version 340. It was removed from the WHQL drivers, but you can easily re-enable CUDA encoding capabilities, or simply install a version prior to 340.52. This way you can save $2000 per GPU and have the same quality and performance.

So dear Wowza team, please let us decide for ourselves whether we want to use a cheap GPU with a 1-year-old driver. Why remove the code?

Thanks in advance for re-adding CUDA support and letting us save huge amounts of money on NVENC, which actually doesn’t perform better than CUDA if you check CPU usage :)

Since last summer Intel Quick Sync hasn’t been working on high-performance personal computers that use Haswell X99 (LGA2011). How can Intel Quick Sync be used on newer computers?

It is getting very difficult to get proper transcoding speed as resolution goes up. Based on my tests on new hardware, Wowza maxes out the CPU at 4 x 1080p@30fps even when decoding and re-encoding are hardware accelerated, using Intel QuickSync for decoding and NVENC for re-encoding.

That is exactly my point. Wowza with hardware-accelerated decoding/encoding is getting extremely expensive. You will need a high-end CPU (Intel Xeon E5-26xx) and also high-end GPUs, which actually just idle at 5-10% when you run 4 streams.

The Wowza team has now removed CUDA support from the code, and Intel removed QuickSync from consumer processors (affordable computers).

Basically I need alternatives. NVENC has a very bad price/performance ratio.

I have decided to drop NVENC completely. I bought a K5000 2 years ago and hoped to be able to transcode 8 x 1080p@30fps. Unfortunately this didn’t work at all.

Now I’m instead using an Intel i7-6700K, which with its integrated GPU can easily do 8 x 1080p@30fps with transcoding of both video and audio. Using QuickSync, obviously.

This setup costs me less than half the price of a computer with a K5000.

Just my 2 cents.
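To put the stream counts mentioned in this post side by side, here is a rough sketch that converts "N x 1080p@30fps" into a raw pixel rate. The function name is hypothetical, and pixel rate is only a crude proxy for transcoding load: it ignores codec, bitrate, number of output renditions, and audio.

```python
def pixel_rate(streams: int, width: int = 1920, height: int = 1080, fps: int = 30) -> int:
    """Raw pixels per second for a number of identical video streams."""
    if streams < 0:
        raise ValueError("streams must be non-negative")
    return streams * width * height * fps


# The two workloads mentioned in the post.
four_streams = pixel_rate(4)   # 4 x 1080p@30fps
eight_streams = pixel_rate(8)  # 8 x 1080p@30fps

print(four_streams)   # 248832000
print(eight_streams)  # 497664000
```

By this measure the 8-stream target is exactly twice the 4-stream workload that was reported to max out the CPU.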

Hi,

We do have some Transcoder benchmarks with the Quadro M4000 (and M5000) which you can see here.

Paul

Hello,

Thank you.

I want to share some info that I have gathered (from a hardware forum and NVIDIA customer relations).

NVENC capabilities are equivalent within the same GPU class. It may not be necessary to buy the high-end card.

NVENC capabilities differ between GPU classes.

There are 3 NVENC-capable GPU classes:

• Kepler

• Maxwell Gen 1 (same as Kepler + more H.264 performance)

• Maxwell Gen 2 (same as Maxwell Gen 1 + more H.264 performance + HEVC hardware encoding)

Some platforms are limited to 2 sessions, see https://developer.nvidia.com/nvidia-video-codec-sdk#gpulist

A low-end Quadro is anything below the K4000, for example.

Some cards have 2 or more GPUs (GRID, Tesla, etc.). This means that they have 2 or more NVENC engines. 1 GPU = 1 NVENC engine.

For the moment, there is no Maxwell Gen 1 or Gen 2 card without limitations:

GRID and Tesla are Kepler based. Quadro K420, K620 and K2200 are Maxwell Gen 1 based but are “low end” Quadros.
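The class summary above can be written down as a small lookup table. This is only a sketch of what is stated in this post, not of NVIDIA's documentation; the type and function names are hypothetical, and the "1 GPU = 1 NVENC engine" rule is the post's rule of thumb.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GpuClass:
    """One NVENC-capable GPU class, as summarized in the post."""
    name: str
    h264_encode: bool
    hevc_encode: bool


# Capabilities exactly as listed in the post.
KEPLER = GpuClass("Kepler", h264_encode=True, hevc_encode=False)
MAXWELL_GEN1 = GpuClass("Maxwell Gen 1", h264_encode=True, hevc_encode=False)
MAXWELL_GEN2 = GpuClass("Maxwell Gen 2", h264_encode=True, hevc_encode=True)
ALL_CLASSES = [KEPLER, MAXWELL_GEN1, MAXWELL_GEN2]


def classes_with_hevc(classes: Iterable[GpuClass]) -> List[str]:
    """Names of the classes that offer hardware HEVC encoding."""
    return [c.name for c in classes if c.hevc_encode]


def nvenc_engines(gpu_count: int) -> int:
    """Rule of thumb from the post: 1 GPU = 1 NVENC engine."""
    if gpu_count < 1:
        raise ValueError("a card has at least one GPU")
    return gpu_count


print(classes_with_hevc(ALL_CLASSES))  # ['Maxwell Gen 2']
print(nvenc_engines(2))                # 2
```

Keeping the class, rather than the individual card model, as the key reflects the post's point that NVENC capability follows the GPU class.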

Maybe it will help others choose an NVIDIA card for hardware encoding purposes only.

Feel free to share your opinion and which card you choose.

Regards,

Guillaume

Hello,

I’m not sure if you’ve seen the Transcoder Benchmark article yet but you may find it helpful for comparison. It does not include the K2200 but does show test results with other NVIDIA models.

Best regards,

Andrew

Will this card support H.265 to H.264 transcoding?

Any comments on the NVIDIA Quadro M4000?

It’s based on the GM204. I think it is the way to go: new for 880 USD, it seems the best option at the moment.

Waiting for comments,

thanks,

Anyone using a NVIDIA Quadro K2200 for transcoding able to share some information on how many streams it can transcode concurrently?

Unfortunately your transcoder performance benchmark results don’t clearly answer the question whether an M5000 card will outperform an M4000 (transcoding) because you used two very different server hardware configs between Server2 and Server4.

Someone earlier in this thread stated they believed that NVENC capabilities are equivalent within the same GPU class, so an M4000 and an M5000 (transcoding using NVENC) should have the same performance.

We are in the process of determining how to cost-effectively scale our transcoding “capacity”. The current market price of the M4000 is ~$900 while the M5000 is ~$1800. If their performance is going to be the same (for our transcoding use), then there is no need to pay the premium for the M5000.
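The price comparison can be sketched as simple arithmetic. The prices are the approximate ones quoted above; the streams-per-card figure is a hypothetical placeholder, not a benchmark result, and the whole calculation assumes both cards really do have the same NVENC throughput.

```python
import math


def cost_per_stream(card_price_usd: float, streams_per_card: int) -> float:
    """Card price divided by the number of streams one card can transcode."""
    if streams_per_card < 1:
        raise ValueError("streams_per_card must be at least 1")
    return card_price_usd / streams_per_card


def cards_needed(target_streams: int, streams_per_card: int) -> int:
    """Number of cards required to reach a target stream count."""
    if streams_per_card < 1:
        raise ValueError("streams_per_card must be at least 1")
    return math.ceil(target_streams / streams_per_card)


# Hypothetical capacity, assumed identical for both cards.
STREAMS_PER_CARD = 8

m4000 = cost_per_stream(900, STREAMS_PER_CARD)
m5000 = cost_per_stream(1800, STREAMS_PER_CARD)

print(m4000, m5000, m5000 / m4000)
print(cards_needed(20, STREAMS_PER_CARD))
```

Under the equal-performance assumption the ratio is 2 regardless of the placeholder capacity, so scaling out with more M4000s would halve the cost per stream.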

Earlier this week I purchased both an M4000 and M5000 and will perform back to back performance testing of the two cards in the same exact server and will share our results for the benefit of the community.

Paul - While on the subject - does Wowza have any best practice documentation related to transcoder capacity scaling?

Kyle

https://live.racecast.me

Thank you. I think this is the first named graphics card that someone has mentioned to actually work for transcoding. I’ve seen the links to NVIDIA data and recommendations, I go to the NVIDIA site and I’m thoroughly confused. Wowza recommendations are just as useless, since all this info seems to assume you are an NVIDIA engineer, throwing around card identifiers Kepler, Maxwell 1, Maxwell 2, Tesla, Grid etc., then listing all these additional notes on why various cards won’t work. Don’t get me started on CUDA…, which I have no clue what it is, but appears to have been something that worked and was crippled by choice. Why do I have to become an NVIDIA product expert to do transcoding? All I want to do is buy a computer with a supported graphics card that will work. Then I look at NVIDIA card prices! There was mention of a Quadro M4000 being about $900, but I see others that are > $5K, and I can’t even tell if they will work! What does Wowza use? What do they test on? Why can’t Wowza just say buy an NVIDIA blah,blah,blah card, we know it will work. Ugh… 3 weeks of time wasted and I still had no clue what NVIDIA card to buy. So, I was sweating buying a > $3K card that might not work, but it looks like I might be able to buy (3) Quadro M4000s that might work. This should NOT have to be this hard. Pardon my rant and frustration, but thanks again for the M4000 tip.