Hi,
Let me address some of your comments first, and then let's get into some of the options for you.
You mentioned the following:
I setup two instances of Wowza on a single 10Gbps server. Guess what happened? Each instance would not go over 2.5Gbps each and eventually crashed. How do I know it isn’t a limitation of my server hardware? Because on the same server I installed nginx rtmp and was able to max out the 10Gbps connection without much effort. Yes I understand nginx rtmp and Wowza run on different environments, but I’ve yet to see any proof outside of this forum that there is a 5Gbps limit on Javas Virtual Machine. I’m not asking for the moon, I just want a straight up answer. That’s all.
Each Wowza instance would not know of the other, so the limit you mention being split between them - 2.5Gbps each - suggests a shared resource somewhere. By default Wowza changes only one OS-level limit - ulimit - and does not touch anything else. You may also find the number of threads (already mentioned) needs to be increased significantly.
There are many elements that can reduce performance, and they will certainly need some investigation on your part, for your hardware and your environment.
- GC tuning - At very high speeds, approaching 10Gbps, you need to make sure the garbage collector is recycling objects as efficiently as possible. That means multiple GC threads and enough heap allocated to service them. There is a tonne of reading on this subject, and many consider it a black art rather than a science. The G1 collector brings significant improvements, although its default pause-time target may be too high for this workload.
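As a sketch only, the kind of options involved look like the following. Where you set them depends on your Wowza version (check your startup script or tuning file), and every figure here is illustrative, not a recommendation - heap size and thread counts must be matched to your RAM and core count.

```shell
# Illustrative JVM options for high-throughput streaming (values are examples only):
JAVA_OPTS="-Xms8g -Xmx8g \           # fixed heap avoids resize pauses; size to YOUR RAM
  -XX:+UseG1GC \                     # G1 collector
  -XX:MaxGCPauseMillis=100 \         # lower pause target than the 200ms default
  -XX:ParallelGCThreads=16 \         # scale to your core count
  -XX:ConcGCThreads=4"
```

Turn on GC logging while you tune so you can see pause times rather than guess at them.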
- Thread counts - These will need changing manually (as outlined); you may even need to edit the XML files directly, as the manager will limit you to 1024. The right value comes down to how many cores you have available.
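From memory, the relevant elements sit in conf/VHost.xml and look roughly like the fragment below - verify the element names against your own file, and treat 2048 as an illustrative figure to be scaled to your cores, not a recommendation.

```
<HandlerThreadPool>
	<PoolSize>2048</PoolSize>
</HandlerThreadPool>
<TransportThreadPool>
	<PoolSize>2048</PoolSize>
</TransportThreadPool>
```

Restart the instance after editing, and confirm in the logs that the new pool sizes were picked up.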
- IO speeds - This can cause all sorts of weird and wonderful performance issues. For live streams it is not an obvious problem, but even so the amount of logging produced can be VERY high, so it is certainly something to be aware of (I am not advocating you turn off logging!). I have seen many deployments of file stores, local and remote, that certainly cannot approach 10Gbps. If you are serving on-demand content this will cause the most issues and be the hardest to troubleshoot.
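A quick way to sanity-check a store is a raw sequential test with dd. The path below is an example - point it at the disk you actually intend to serve from, not /tmp.

```shell
#!/bin/sh
# Rough sequential write test for a candidate media store (example path - change it).
TESTFILE=/tmp/io_speed_test.bin
# conv=fdatasync forces the data to disk before dd reports, so the MB/s figure is honest.
dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync
# For a read test, drop the page cache first (needs root), otherwise you measure RAM:
#   echo 3 > /proc/sys/vm/drop_caches
#   dd if="$TESTFILE" of=/dev/null bs=1M
rm -f "$TESTFILE"
```

If that number is nowhere near line rate, no amount of Wowza tuning will save you; fix the storage first.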
The best approach for 10Gbps delivery is to use MediaCache with multiple SSDs (not RAIDed - the performance gain from RAIDing SSDs is very small), configured with multiple stores and the file reader option, making the requests for content as fast and as cacheable as possible.
Regarding logging: for the network speeds you are targeting, consider where the logs are being stored. Check the IO wait time - if it is not effectively 0, something needs to be tweaked.
Four or five disks, even RAIDed, even 10k RPM, will not get you close to the performance you need - so make sure any file storage you have is good enough: speed test it, tune it, make sure RAIDed sets use efficient block sizes, and so on.
- OS-level changes - If you are using Linux (you have not mentioned your OS), Wowza makes one change - ulimit - which has recently been increased from 20k to 65k. You may find increasing it even further - say to 128k - provides further performance gains. For some OSs and environments we have seen this be essential.
Further OS-level changes are needed at higher speeds to get more performance:
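A minimal sketch of checking and raising the open-files limit - the 131072 figure is an example, not a recommendation:

```shell
#!/bin/sh
# What the current shell (and anything it launches, e.g. Wowza) is allowed:
ulimit -n
# Raise the soft limit up to the hard limit for this session:
ulimit -n "$(ulimit -Hn)"
# To persist a higher limit across reboots, add lines like these (example values)
# to /etc/security/limits.conf for the user Wowza runs as:
#   root soft nofile 131072
#   root hard nofile 131072
ulimit -n
```

Verify the limit from inside the running Wowza process (e.g. /proc/PID/limits), not just from your login shell - they frequently differ.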
rmem_default
rmem_max
wmem_default
wmem_max
optmem_max
tcp_rmem
tcp_wmem
The last two really need to be changed from their defaults, as they are WAY too low for 10Gbps; they will punish performance and may well be the cause of your low throughput. There is a tonne of reading around them based on speed, latency, memory available, etc., and it is not one size fits all.
Many applications do some of this tuning automatically; Wowza does not and will not. You do need some networking knowledge to tune these properly.
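As a sketch only, those settings map to sysctl keys like the following. The figures are illustrative starting points - size the buffers to your RAM and your bandwidth-delay product, not by copy-paste.

```
# /etc/sysctl.conf - illustrative 10Gbps starting points, NOT one-size-fits-all
net.core.rmem_default = 16777216
net.core.rmem_max     = 67108864
net.core.wmem_default = 16777216
net.core.wmem_max     = 67108864
net.core.optmem_max   = 65536
# tcp_rmem/tcp_wmem are min/default/max triples - sockets grow within these bounds:
net.ipv4.tcp_rmem = 4096 1048576 67108864
net.ipv4.tcp_wmem = 4096 1048576 67108864
```

Apply with `sysctl -p` (as root) and re-test; change one thing at a time so you can see which knob actually moved the needle.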
- Memory utilisation - Tune all your applications correctly. When doing high-speed delivery, efficient use of memory is what gets you the best performance. The packetizers have a default chunk duration of 10 seconds. If you are doing live streaming, make sure they are tuned to your GOP size: if it is 2 seconds, make the duration 2 or 4 seconds. That immediately reduces the memory footprint, which in turn reduces GC pressure, which reduces GC time, and so on. You can also change this for on-demand delivery, but YOU MUST HAVE A HIGH-SPEED file system. And if you are not delivering a particular HTTP protocol, turn its packetizer OFF!
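For the HLS (cupertino) packetizer, for example, the chunk duration is a property in Application.xml, set in milliseconds - the fragment below is a sketch from memory (a 2-second GOP with a 4-second target), so check the property name against your own configuration and the Wowza documentation:

```
<LiveStreamPacketizer>
	<!-- existing packetizer configuration -->
	<Properties>
		<Property>
			<Name>cupertinoChunkDurationTarget</Name>
			<Value>4000</Value>
			<Type>Integer</Type>
		</Property>
	</Properties>
</LiveStreamPacketizer>
```

The other packetizers (Smooth, DASH, etc.) have equivalent duration properties; tune each one you actually use, and remove the ones you do not from the application's packetizer list.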
Andrew.