NVCUVID decoding with multiple cards

Is it possible to assign NVCUVID decoding to a specific GPU in the same way as with NVENC?

With NVENC, we are able to set the GPUID in the transcoder template to assign a transcode to a specific GPU, e.g. 1.

But that does not appear to work with NVCUVID decoding; decoding is always done on GPU 0.

Also, we are seeing CPU utilization drop when using NVCUVID, but only to about two-thirds of the CPU used with software decoding.

E.g. a transcoding session that used 15% CPU with software decoding still uses about 10% CPU when decoding is set to NVCUVID.

I’m guessing this means a significant portion of the decoding still has to be done with software when using NVCUVID.

Is that correct?

Carl

Hi,

This question was answered in support ticket #145801.

For others viewing this thread, please find a summary below:

It’s not currently possible to choose a specific GPU for decoding; however, I have sent a feature request to our backlog for review.

By default the first GPU found (GPU ID: 0) will be used for decoding. A specific GPU can be chosen for encoding in the transcoding template.
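
For reference, here’s a trimmed Encode block showing where that setting lives in a transcoder template (a sketch based on the stock template layout in [install-dir]/transcoder/templates; other Video elements such as Bitrate and FrameSize are omitted for brevity):

```xml
<!-- Fragment of a transcoder template. GPUID pins this particular
     encode to a specific card; -1 lets Wowza choose. -->
<Encode>
	<Enable>true</Enable>
	<Name>720p</Name>
	<StreamName>mp4:${SourceStreamName}_720p</StreamName>
	<Video>
		<Codec>H.264</Codec>
		<Implementation>NVENC</Implementation> <!-- hardware encoding via NVENC -->
		<GPUID>1</GPUID> <!-- run this encode on the second GPU -->
	</Video>
</Encode>
```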

Regards,

Jason

Do you have any fresh news about this? It would be a pretty essential feature; otherwise this becomes a bottleneck when using more cards, as a single card will not have enough decoding power.

Hello @Zoltan Szabo, thank you for following up on this feature.

  • Wowza Transcoder can be used with more than one NVIDIA graphics card (both NVENC and CUDA acceleration). You’ll need to directly address the specific card that you want to use in your Wowza Transcoder template by configuring the GPU ID setting.

Wowza has an article on this: https://www.wowza.com/docs/how-to-load-balance-nvidia-cuda-accelerated-transcoding-across-gpus

Hello!

Thank you for the comment. Unfortunately, that setting only answers the “where to encode” question. That one is covered in the documentation and also works from the Java API, but the others (for scaling and source decoding) are not really documented :frowning:

But the comment below from Pooya Woodcock, with the docs link, helped us fix the distribution of the decoding and scaling processes as well.

Thanks!

Oh great, thanks for letting me know and thanks @Pooya Woodcock for providing a solution.

Thank you for the link!

From that we figured out that we have the following methods:

LiveStreamTranscoder.getTranscodingStream().getSource().setGPUID(): affects which GPU will decode the source stream.

LiveStreamTranscoder.getTranscodingStream().getScaler().setGPUID(): affects which GPU will be used for scaling.

LiveStreamTranscoder.getTranscodingStream().getDestinations()[x].setGPUID(): affects which GPU will be used for encoding. This parameter can be set up in the transcoding templates as well.

After that we managed to distribute every step across GPUs.
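
In case it helps others, below is a rough sketch of how those calls can be wired up from a custom module. The setGPUID() chain is the one listed above; the listener plumbing (ILiveStreamTranscoderNotify, LiveStreamTranscoderActionNotifyBase, onInitStop) follows the usual Wowza module examples from memory, so verify the class names against your SDK, and check the assumption that getDestinations() returns a List in your version:

```java
import java.util.List;

import com.wowza.wms.application.IApplicationInstance;
import com.wowza.wms.module.ModuleBase;
import com.wowza.wms.stream.IMediaStream;
import com.wowza.wms.stream.livetranscoder.ILiveStreamTranscoder;
import com.wowza.wms.stream.livetranscoder.ILiveStreamTranscoderNotify;
import com.wowza.wms.transcoder.model.LiveStreamTranscoder;
import com.wowza.wms.transcoder.model.LiveStreamTranscoderActionNotifyBase;
import com.wowza.wms.transcoder.model.TranscoderStream;
import com.wowza.wms.transcoder.model.TranscoderStreamDestination;

public class ModuleSpreadTranscodesAcrossGPUs extends ModuleBase
{
	private static final int GPU_COUNT = 2; // number of cards in the server (assumption)

	public void onAppStart(IApplicationInstance appInstance)
	{
		appInstance.addLiveStreamTranscoderListener(new ILiveStreamTranscoderNotify()
		{
			public void onLiveStreamTranscoderCreate(ILiveStreamTranscoder liveStreamTranscoder, final IMediaStream stream)
			{
				// Wait until the transcoder has loaded its template before overriding GPU IDs.
				((LiveStreamTranscoder)liveStreamTranscoder).addActionListener(new LiveStreamTranscoderActionNotifyBase()
				{
					public void onInitStop(LiveStreamTranscoder transcoder)
					{
						TranscoderStream ts = transcoder.getTranscodingStream();
						if (ts == null)
							return;

						// Naive static distribution: hash the stream name onto a GPU.
						// Replace with real load-balancing logic if needed.
						int gpu = Math.abs(stream.getName().hashCode()) % GPU_COUNT;

						ts.getSource().setGPUID(gpu); // decode the source stream on this GPU
						ts.getScaler().setGPUID(gpu); // scale on the same GPU to avoid copies

						// The post above indexes getDestinations() with [x]; here we assume
						// it returns a List of destinations (the encoder outputs).
						List<TranscoderStreamDestination> dests = ts.getDestinations();
						for (TranscoderStreamDestination dest : dests)
							dest.setGPUID(gpu); // encode on the same GPU as well
					}
				});
			}

			public void onLiveStreamTranscoderInit(ILiveStreamTranscoder liveStreamTranscoder, IMediaStream stream)
			{
			}

			public void onLiveStreamTranscoderDestroy(ILiveStreamTranscoder liveStreamTranscoder, IMediaStream stream)
			{
			}
		});
	}
}
```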

Hi @Rose Power-Wowza Community Manager, “the highest capacity GPU” is presumably based on the GPU’s specs rather than the current load? And thus if a server has 2 similar GPU cards, a value of -1 will always pick the first one?