Performance issues on a local LAN

We’re trying to track down a performance problem and isolate where it is coming from. Here is the scenario:

  • Wowza is a 4 core Windows box and has 8GB of RAM.

  • We have followed all the steps in the performance tuning guide

  • No firewalls involved for this test as it is all LAN to LAN on the same subnet

  • Wowza Java heap size is set to 5000 MB

  • Wowza has the -server option enabled

  • Java JDK version is 1.7.0_01

  • We have not enabled any garbage collection tuning options; the JVM defaults are in use

  • The Wowza application I’m talking to is a simple default application with the type set to ‘live’ and no other changes beyond the standard config.

  • Wowza server VHost.xml has the send and receive buffer size set to 16000 for low latency.
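For reference, those buffer entries live under the HostPort definition in conf/VHost.xml. A minimal sketch, assuming the element names from the stock configuration file:

<!-- conf/VHost.xml: per-connection socket buffers, set low for low latency -->
<HostPort>
    ...
    <ReceiveBufferSize>16000</ReceiveBufferSize>
    <SendBufferSize>16000</SendBufferSize>
</HostPort>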

We have a MacPro running CamTwist, sending video to Flash Media Live Encoder 3.2. We have also tested Wirecast on the same MacPro with the same video files. Both encoders show an output of 30fps with 0 dropped frames. We are connecting to Wowza using RTMP.

When I take VLC and ask it to play the live video through Wowza via RTSP, I get stuttering and really bad macroblocking (not present in the source encode), and the buffer has to grow. Wowza’s network connection shows only 9% in use, and the server shows around 5.29GB of RAM in use and 30% CPU. We’re all on the same gigabit switch, all computers connected at gigabit full duplex. If I open the stream in VLC on the same computer that is broadcasting, I get the same stuttering.
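The playback test is essentially the following; the hostname and stream name are placeholders, and Wowza serves RTSP on port 1935 by default:

# Hypothetical example: pull the live stream from Wowza over RTSP in VLC
vlc rtsp://wowza-host:1935/live/myStream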

If I play back the same video through Wowza via RTMP (JW Player), it looks better, without macroblocking, but still has a few dropped frames. The buffer is also about 5 seconds longer than what I have in VLC.

To ensure that VLC isn’t in a strange state, I reset its preferences to the defaults and still see the same issues.

In no scenario do I get fluid video playing back. So now I’m trying to figure out whether we have a network issue, a Wowza issue, or an encoder issue. The next thing I’ll try is having the Wirecast box encode a UDP stream, which I’ll pick up in VLC without Wowza in the middle, to see if it works any better (a sketch of that test follows below). Has anyone seen this before? Could it be that I need better garbage collection? Is 5GB of Java heap too much? Could my encoders be dropping frames that Wowza dislikes?
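The receive side of that test would look something like this; the port is a placeholder and must match whatever Wirecast sends to:

# Hypothetical example: receive a raw UDP stream in VLC, bypassing Wowza
vlc udp://@:5000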

Thoughts?

I have some more data.

To help rule out the encoder, I added a second MacPro running Snow Leopard (rather than Lion) and an older version of Wirecast. Interestingly enough, this seems to perform better even though it is a much older machine. Playing the exact same video through the same application yields a better, though still not perfect, result.

When watching both streams side by side, I can see the Lion-based system throwing a ton of errors in VLC (playback is all on the same Windows system). The Snow Leopard test is mostly clean, but once in a while it will stutter and drop frames. When it does stutter, both the Lion and Snow Leopard encodes stutter at the same time and throw errors in VLC. I think this points to a network or Wowza issue.

As a test I tried turning on garbage collection tuning, but that didn’t seem to help. I also tried the experimental collector, but that didn’t seem to work either. Any other ideas as to what could be causing this?
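One way to see whether collector pauses line up with the stutter is to enable GC logging; these are standard HotSpot flags for Java 7, and the log file path is a placeholder:

# Log every collection with details and timestamps (Java 7 HotSpot)
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log

Long pauses in the log that coincide with the stutter would point at the collector rather than the network.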

Didn’t you have this problem in the past and work it out? What is different about this setup?

Charlie

Hi Ben,

Sounds like a network/hardware issue to me.

What were the results of playing the stream from your encoder directly in VLC?

Ben,

Posting your analysis and insights helps very much. Some of us do not have access to 30 camera streams. So, thanks for the info about apps and chunking.

Encoder directly to VLC looks great. No errors whatsoever.

Encoder to Wowza to VLC produces errors.

All of this is on a contained 1Gbps LAN. I also tried swapping in a new switch and putting all of the nodes on their own segment with direct connections. No change: a direct connection to the encoder works, a connection through Wowza does not.

We’re looking at multiple performance issues. The item we worked out in the past was camera site A -> WAN -> Wowza site B, which we do have working on a different server. That had to do with the way the network was configured; we saw errors on TCP and UDP data flows over the WAN. Once we fixed that up, as you can imagine, the video cleaned right up. The issues we have today are:

encoder -> Wowza -> client, all on the same private LAN (no firewalls, no other boxes, just these three nodes). We tried multiple encoders and multiple switches, all failing. The only common element appears to be Wowza.

We also have an issue with:

camera site A -> Wowza site A -> WAN -> Wowza site B

This also fails when we pass the camera through two Wowza nodes. Pass it through just one and it works, and taking the camera directly over the WAN also works. We had assumed this was a network issue and not a Wowza issue, but in our testing we were able to successfully pull video over the WAN, and as long as we bypass Wowza we get beautiful video every time. Even flowing through the single Wowza at site B it looks great, with no dropped frames. Add a second Wowza box and BAM, it all goes bad. However, this is a different issue from the one mentioned in this thread.

Worth pointing out that all of our stuff is in low-latency mode, not just Wowza but VLC as well. As we increase the latency to a second or so, things start to stabilize, but ideally we would be well under 500ms. If we skip Wowza we can get latency down to around 70ms; ideally we would stay around 200ms to 250ms.
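For reference, VLC’s input buffer is set with its --network-caching option (in milliseconds); the URL is a placeholder, and the values are the ones discussed above:

# ~200ms is the target; around 1000ms is where playback stabilizes
vlc --network-caching=200 rtsp://wowza-host:1935/live/myStream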

Thought I would update this thread with what I found. Nearly all of our performance issues were related to garbage collection. The performance tuning guide helped get us started, but it was missing some garbage collection details that the forum may find useful.

Here are the settings we ended up using:

-Xmx8000M -Xms8000M -XX:+UseConcMarkSweepGC -XX:NewRatio=1 -Xss256k -XX:+AlwaysPreTouch -XX:+CMSScavengeBeforeRemark -XX:ParallelCMSThreads=4 -XX:ParallelGCThreads=4

Each server will likely need different values, but for those who would like it, here is a breakdown of each setting:

-Xmx8000M: This is the maximum amount of heap memory that the JVM is allowed to use.

-Xms8000M: This is the initial heap size of the JVM. Notice that we set it to match the maximum, so the heap never has to grow at runtime.

-XX:+UseConcMarkSweepGC: This tells the JVM to use the Concurrent Mark Sweep collector, which scans for and clears dead objects from the old generation concurrently with the application, rather than stopping the world for a full collection.

-XX:NewRatio=1: This sets the ratio of old to new (Eden) space. Think of Eden as a temporary place where short-lived objects land and the old generation as a place where longer-lived objects are stored. Since Wowza doesn’t need a whole lot of data in the old generation and the Eden space gets used quickly, we set this ratio to ‘1’; with our 8000M heap, that gives the new and old generations roughly 4000M each. The performance tuning guide suggests using the ‘NewSize’ option, but we found that the automatic sizing from ‘NewRatio’ worked better for us.

-Xss256k: This is the stack size for each thread. Since not a lot of data needs to sit on any one thread’s stack in Wowza, smaller is better, and with many connection threads the savings add up. Don’t go too small, though, or threads won’t have enough stack to run. We found that 256k worked well and are now testing 128k with pretty good results.

-XX:+AlwaysPreTouch: This commits all of the heap’s RAM up front so Java doesn’t have to page it in on the fly.

-XX:+CMSScavengeBeforeRemark: This runs a minor (young generation) collection just before the CMS remark phase. It helps shorten the garbage collector’s pauses, and as such Wowza’s pauses as well.

-XX:ParallelCMSThreads=4: We have a 4-core system, so this allows 4 threads to be used for the concurrent mark sweep work.

-XX:ParallelGCThreads=4: Same deal; we have 4 cores, so we open up 4 threads for the stop-the-world phases of garbage collection.
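To apply them, the flags go on the JVM options line of Wowza’s startup script. A sketch for a Windows install, assuming a setenv-style script (the exact file name and location vary by Wowza version, so check your bin folder):

REM Add the GC flags to the JVM options used to launch Wowza
set JAVA_OPTS=-Xmx8000M -Xms8000M -XX:+UseConcMarkSweepGC -XX:NewRatio=1 -Xss256k -XX:+AlwaysPreTouch -XX:+CMSScavengeBeforeRemark -XX:ParallelCMSThreads=4 -XX:ParallelGCThreads=4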

By doing these things we reduced our garbage collection pauses and the slow buildup of system memory. One other thing that helped: we consolidated all of our cameras into a handful of applications. We used to have 70 applications with 1 camera each; now we have 1 application with 70 cameras, and the garbage collector likes that a LOT more.
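In publishing terms the change looks something like this; the hostname, application, and stream names are placeholders:

# Before: one application per camera (70 applications)
rtmp://wowza-host/camera01/stream
rtmp://wowza-host/camera02/stream
# After: one application, one uniquely named stream per camera
rtmp://wowza-host/cams/camera01
rtmp://wowza-host/cams/camera02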

A couple of other performance tuning items we did:

-For any Wowza server talking over the WAN we use a live-lowlatency application and not a liverepeater. The liverepeater always uses TCP, typically on port 1935, whereas a live RTP session negotiates a UDP port. By using RTP we get the advantage of UDP without the overhead of TCP, and much, much smoother video (see the sketch after this list).

-We set QoS on our network to auto-tag video and make it a priority.

-We offloaded all transcoding to another box so that one bad transcoder won’t take down the entire server.
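For the server-to-server hop, a minimal sketch of the pull setup, assuming the downstream Wowza pulls the upstream stream via its RTSP URL using a .stream file (names and paths are placeholders):

# content/origin.stream -- contains only the RTSP URL of the upstream stream
rtsp://wowza-site-a:1935/live-lowlatency/myStream

The .stream file is then started with a MediaCaster of type ‘rtp’ (via the Stream Manager or a startup stream), which negotiates RTP-over-UDP transport with the origin.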

Obviously these settings will need to be tweaked a bit to fit your system, but hopefully they help others, as we spent a great deal of time tweaking, testing, and testing some more to get here. I’ll admit that we’re not at 100%, but the system works a lot better now that these settings are in place.

Thanks,

Benjamin

Hi,

I did not realize the GC could affect your system that much. May I ask if you see any lower CPU usage after tuning the GC? Thanks

We over-built the box, so CPU was never a huge concern for us. I did note that having a large number of applications radically increases garbage collection and CPU usage. So 30 different cameras in 30 different applications is bad; those same 30 cameras in 1 or 2 applications worked much, much better, and our resources were happier.

I have found that chunking the stream (cupertino, sanjose, etc.) impacts the CPU quite a bit. Also note that how you configure the GC will affect your CPU. For example, I told my GC to use all of the available cores and to always scan for old data (to prevent stop-the-world collections). This increases CPU usage but produces a more stable stream.

Not sure if that helped any.

Thanks,

Benjamin
