Bug: Wowza crashes after a period of time

Wowza crashes after running for a period of time, roughly one week. It has crashed on several edge servers in my edge-origin setup, and it has happened on at least 20 of my servers.

I am using version 2.1.2, and for now I consider this a bug unless there is other configuration tuning I should apply.

I have applied the following tuning on my Linux servers.

[PHP]JAVA_OPTS="-Xmx3000M"

JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC"

ulimit -n 20000

fs.file-max=20000

HostPort/ProcessorCount: 2x[total-core-count] (maximum of 24) Note: The HostPort/ProcessorCount field in the Admin HostPort (/Port “8086”) should not be modified.

IdleWorkers/WorkerCount: 2x[total-core-count] (maximum of 24)

NetConnections/ProcessorCount: 2x[total-core-count] (maximum of 24)

RTP/DatagramConfiguration/UnicastIncoming/ProcessorCount: [total-core-count] (maximum of 12)

RTP/DatagramConfiguration/UnicastOutgoing/ProcessorCount: 2x[total-core-count] (maximum of 24)

RTP/DatagramConfiguration/MulticastIncoming/ProcessorCount: [total-core-count] (maximum of 12)

RTP/DatagramConfiguration/MulticastOutgoing/ProcessorCount: [total-core-count] (maximum of 12)

HandlerThreadPool/PoolSize: (300x[total-core-count])/5 (maximum of 480)

TransportThreadPool/PoolSize: (200x[total-core-count])/5 (maximum of 320)

[/PHP]
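To make the formulas above concrete, here is a minimal sketch that evaluates them for a given core count. The function name `tuning_values` is only illustrative, and integer division is an assumption on my part for the two PoolSize formulas.

```python
def tuning_values(cores):
    """Evaluate the tuning formulas listed above for `cores` CPU cores."""
    return {
        "HostPort/ProcessorCount": min(2 * cores, 24),
        "IdleWorkers/WorkerCount": min(2 * cores, 24),
        "NetConnections/ProcessorCount": min(2 * cores, 24),
        "RTP/DatagramConfiguration/UnicastIncoming/ProcessorCount": min(cores, 12),
        "RTP/DatagramConfiguration/UnicastOutgoing/ProcessorCount": min(2 * cores, 24),
        "RTP/DatagramConfiguration/MulticastIncoming/ProcessorCount": min(cores, 12),
        "RTP/DatagramConfiguration/MulticastOutgoing/ProcessorCount": min(cores, 12),
        # (300 x cores) / 5, capped at 480; integer division is assumed here.
        "HandlerThreadPool/PoolSize": min(300 * cores // 5, 480),
        # (200 x cores) / 5, capped at 320; integer division is assumed here.
        "TransportThreadPool/PoolSize": min(200 * cores // 5, 320),
    }
```

For example, a 4-core machine gives a HandlerThreadPool size of 240 and a TransportThreadPool size of 160, while a 16-core machine hits every cap.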

That is all the tuning I have done on all edge servers.

Can you tell me what else I should try?

Also, how do I set up the Wowza server to email me every time it crashes or stops working?

Thanks

First, be sure you are running the latest Java VM (1.6 Update 21). Also, comment out this line:

JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC" 

Are there any errors in logs? Any errors in system logs? What do you mean by crash?

I do not know of a way to email.

Charlie

Is there anything in your service_log regarding the JVM?

You may want to watch the server using JConsole/JMX. Watch the memory usage within the VM. Be sure that it does not run out of memory. At the OS level the Java VM will grab all the memory specified as the max heap size and it will hold on to it. So watching using top or other OS tools does not give you a real picture of what is going on. I suspect that memory is not the issue. It is more likely that the server is locking up due to a thread locking issue. If this is the case then a stack trace can really help us debug the problem. When a server locks up, follow these instructions to take a stack trace:

https://www.wowza.com/docs/how-to-create-a-java-stack-trace-on-wowza-media-server

Send the stack trace with a description of the problem to support@wowza.com. Are you running any custom code or modules? This can lead to problems because the thread locking in the server is tricky. Are you doing any database requests within modules? That can lead to problems as well.
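For reference, a HotSpot JVM on Linux prints a stack trace of all its threads to its standard output when it receives SIGQUIT (the same signal as `kill -3`), and then keeps running. The sketch below triggers that from Python. The PID file path in the usage note is a hypothetical placeholder, the helper names are mine, and the linked article remains the procedure to follow for Wowza.

```python
import os
import signal


def read_pid(pid_file):
    """Read a process ID from a file that contains only the number."""
    with open(pid_file) as f:
        return int(f.read().strip())


def request_thread_dump(pid):
    """Send SIGQUIT to the process.

    A HotSpot JVM responds by writing a thread dump to its standard
    output; a process without such a handler will simply terminate.
    """
    os.kill(pid, signal.SIGQUIT)
```

Usage would look like `request_thread_dump(read_pid("/path/to/wowza.pid"))`, run as a user allowed to signal the Java process. Where the dump ends up depends on where the JVM's standard output is redirected on your install.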

Charlie

The solution is to identify and fix the problem. I need your help to do that. I am suggesting that you collect information to help debug the problem. I never suggested that you just ignore the issue. Since there are no errors in the logs and the server just seems to stop, I can only assume the server is locking up (a thread synchronization issue). A stack trace will help me to identify why this is happening. If there are errors or warnings in the Wowza logs or system logs, then please send them to me. I will see if I can identify the issue.

There is no way for me to know what is wrong without being able to collect more detailed information.

Charlie

BTW, if you want to monitor the server for failures, then I would suggest that you employ a simple watchdog script that queries an HTTPProvider on the server to test for failures. Trying to get a failing server to send email is probably not the right approach for notification since you do not know how or why it is going to fail. It seems like a better approach to test the server periodically from the outside to determine a failure.
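A rough sketch of such a watchdog, run from a separate machine. It assumes the server answers a plain HTTP GET with status 200 at the URL you pass in; the URL in the usage note is a hypothetical placeholder, and the function names, interval and failure threshold are illustrative rather than anything Wowza ships.

```python
import http.client
import time
import urllib.error
import urllib.request


def server_is_up(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts and HTTP errors all count as down.
        return False


def watch(url, notify, interval=30.0, max_failures=3, rounds=None,
          check=server_is_up):
    """Poll the URL and call notify once per run of consecutive failures.

    rounds=None polls forever; a number limits how many checks are made.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if check(url):
            failures = 0
        else:
            failures += 1
            # Notify only when the threshold is first reached, so one
            # outage produces one alert instead of one per poll.
            if failures == max_failures:
                notify(f"{url} failed {failures} checks in a row")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `watch("http://edge1.example.com:1935/", notify=print)` polls every 30 seconds and reports after three failed checks in a row. Replacing `print` with a function that sends mail (for example through `smtplib`) gives email notification without depending on the failing server itself.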

Charlie

Just this:

STATUS | wrapper | 2010/07/20 21:44:03 | Launching a JVM…

There is a lot of detailed info about Java in the startup lines in the access log.

Richard

I have the latest 64-bit Java, version 1.6 Update 21. I commented that line out as stated in the general tuning.

The Wowza crash looks something like this:

  • It is an edge server (live repeater), but it can’t get the stream from the origin server.

  • It does not send a signal to the load balancer listener, so the load balancer listener ignores this edge server. The load balancer listener shows that connections to this edge server are redirected.

  • The SSH command “service WowzaMediaServer status” shows “WowzaMediaServer started”. However, it has already stopped working. If I restart Wowza, it works again with no problem.

The error_log only shows:

[PHP]LiveMediaStreamReceiver.doWatchdog: streamTimeout: Resetting connection[/PHP]

I have checked the error_logs for some time but never found anything useful. Maybe I have to enable more detailed logging?

Wowza is a low-cost solution and it is pretty good. Maybe it needs more tweaking in a high-traffic environment. All of my servers have 1 Gbps links, and they are sometimes maxed out for 3-5 hours a day.

Thanks

The edge servers run nothing but a live repeater application.

My origin server has a module that connects to my site database, but that is the origin server and it has almost never crashed. It did crash due to a memory leak when I was using version 2.1.0, but since the upgrade to 2.1.2 the origin server has worked perfectly.

If there is no other solution, I will have to keep an eye on those servers manually. In the next release, you should have a notification system that sends out emails when Wowza crashes, or have the load balancer listener email the admin when it does not get a signal from an edge server within 10 seconds. Something like that would help us a lot in reducing downtime. I have lost lots of users and money just because I was not aware my edge servers were down.

Thanks