Maxing out a c3.8xlarge instance: 500, 1000 or 2000 simultaneous viewers?

Can you help me estimate how many connected users Amazon's most powerful instance can handle?

Use case: One encoder in Copenhagen streams to the c3.8xlarge instance (32 vCPU, 60 GB RAM and 10 Gigabit*4) in Ireland at 5000 Kbit/s. The 5000 Kbps stream is transcoded on the instance to three bitrates: 500 Kbps, 1000 Kbps and 2000 Kbps.

500 simultaneous viewers

500 simultaneous viewers residing in Denmark consume the stream in the three bitrates: one third at 500 Kbps, one third at 1000 Kbps and one third at 2000 Kbps, totalling 250 GB during this hour.

1000 simultaneous viewers

1000 simultaneous viewers residing in Denmark consume the stream in the three bitrates: one third at 500 Kbps, one third at 1000 Kbps and one third at 2000 Kbps, totalling 500 GB during this hour.

2000 simultaneous viewers

2000 simultaneous viewers residing in Denmark consume the stream in the three bitrates: one third at 500 Kbps, one third at 1000 Kbps and one third at 2000 Kbps, totalling 1000 GB during this hour.

  • When would I hit the limit? Before 500 users? 1000 users? 2000 users?
  • Does it matter if the streams are RTMP, HTTP or a mix of the two?
  • Also, does the number of simultaneous viewers actually matter a whole lot, or is it simply a matter of throughput? Is 100 viewers consuming a 2 Mbit/s stream equal to 1000 users consuming a 200 Kbit/s stream?
  • Lastly, if the number of simultaneous viewers rises above what a single server can handle, can you lay out the difference between the Dynamic Load Balancing AddOn and Elastic Load Balancing? What are the use cases for the Dynamic Load Balancing AddOn vs. the use cases for Elastic Load Balancing? And when would you combine the two?

Hi,

Wowza does not have a hard limit on the number of connections it can handle; this is determined by the hardware and bandwidth available to the server.

500 Kbps + 1000 Kbps + 2000 Kbps = 3500 Kbps | 3500 Kbps / 3 = 1166.6 Kbps
500 connections with an average bitrate of 1166.6 Kbps = 583,300 Kbps == 583.3 Mbps == 262 GB per hour
1000 connections with an average bitrate of 1166.6 Kbps = 1,166,600 Kbps == 1.166 Gbps == 524 GB per hour
2000 connections with an average bitrate of 1166.6 Kbps = 2,333,200 Kbps == 2.332 Gbps == 1048 GB per hour
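If it helps, the same arithmetic can be rerun with a small Python sketch. Assumed conventions: decimal units throughout (1 Mbps = 1000 Kbps, 1 GB = 1000 MB), so the per-hour totals land within a couple of GB of the figures above depending on rounding.

```python
# Reproduce the bandwidth/transfer arithmetic above.
# Decimal units: 1 Mbps = 1000 Kbps, 1 GB = 1000 MB.

def stream_load(connections, avg_bitrate_kbps):
    """Return (total Mbps, GB transferred over one hour)."""
    total_kbps = connections * avg_bitrate_kbps
    mbps = total_kbps / 1000
    gb_per_hour = mbps * 3600 / 8 / 1000  # Mbit/s -> MB/s -> GB over an hour
    return mbps, gb_per_hour

AVG_KBPS = 1166.6  # (500 + 1000 + 2000) / 3, as above

for viewers in (500, 1000, 2000):
    mbps, gb = stream_load(viewers, AVG_KBPS)
    print(f"{viewers} viewers: {mbps:.1f} Mbps, {gb:.0f} GB/hour")
```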

The only hard limitation I can see is the network interface, and as the numbers above show, your usage is far below what the interface can handle.

Having a large number of connections for a low-bitrate stream is marginally more load, since Wowza is handling more connections for the same bandwidth, but the difference will not be noticeable.

Having HTTP clients rather than RTMP clients is also slightly more load, but the difference is so small it wouldn’t be noticeable either.

The Dynamic Load Balancing AddOn and Elastic Load Balancing would not be used together as they are different methods of load balancing.

Dynamic Load Balancing would be used with an origin/edge configuration; the origin and edge servers would already be configured to receive the load directed from the load balancer.

Elastic Load Balancing creates instances to handle the load you have but this would be better described by a member of staff at Amazon.

Jason

These huge servers might not be the best solution. An edge cluster can be many m1.small instances. Here are the instance types to compare. The throughput on smaller sizes is not specified exactly; you may only get 150 Mbps on an m1.small, but we do not have hard numbers because AWS does not publish them.

Richard

Here is the link to EC2 instance types

Many smaller edges, if managed well, might be more cost-effective, and are certainly more flexible. It is best to have a uniform edge cluster: all edges the same size and approximate throughput.

Richard

Let us know if you find out any more from AWS, if you will.

Thanks for the report,

Richard

Hi,

What real world performance should I expect from the c3.8xlarge 10 Gigabit*4 network interfaces?

Java has a limitation of 5 Gbps, so this would be a limitation on how many concurrent connections you can have, which will vary based on the bitrate of the streams being viewed.

Am I right in assuming that the use of the Dynamic Load Balancing AddOn has the potential of escalating my Wowza license costs, as well as the number of Amazon instance hours? Instead of needing a single Wowza license like I would if I could stay on a single instance, I would need four licenses if I wanted three Edge/Load Balancer Senders:

The origin and each edge would require a license, so yes, this would increase licensing costs: a total of four licenses, with one origin and three edges.

You can use DevPay or BYOL, which is up to your personal preference.

Then there is the natural limitation of the Dynamic Load Balancing AddOn. It does balance the number of connected viewers between a fixed number of instances, and does this very well as far as I can tell from the feedback I have read and heard. It does not, however, do anything when it comes to an origin instance that’s struggling to cope with too many incoming streams that have to be transcoded on the server.

You can have an origin which is also an edge at the same time by setting the StreamType to “liverepeater-edge-origin”; this instance can have load directed to it like any of the other edges.

That’s correct. The Elastic Load Balancing solution may be better for your needs, but I recommend asking the Amazon staff for more information on this as I haven’t used it myself.

But AWS Elastic Load Balancing + Auto Scaling would, right? I’m not quite sure I understand the concepts of Amazon’s Elastic Load Balancing and Auto Scaling fully, but I imagine that it can indeed help the instance handling the incoming streams and transcoding never exceed a CPU-utilization limit set in CloudWatch.

Is that also correct?

I think questions regarding Amazon’s load balancing and network solutions are better directed at the Amazon support staff; although they are using Wowza, we do not support their network and so can’t give accurate information on server load capabilities for an individual server type.

What about cost for an AWS Elastic Load Balancing + Auto Scaling solution? I imagine I would only pay for the instance hours when they are actually needed? And what about Wowza licenses? Would the solution mean that only a single Wowza license is needed? Or would each instance that “helps” need to be preconfigured with a license?

Again, this would be better directed at the Amazon support staff, who will be able to answer these questions more accurately; I would not like to comment on what you may or may not get billed for.

Jason

Hi,

The RTMP Load Test Tool is a very good solution for estimating server capacity. I will note that when you run the tool as a large-volume test, you need to keep a close eye on the test computer/server performance. Be sure that the number of tests created/executed does not exceed the capability of the box or the network. I do see some reference to this in your earlier post. It might be a good idea to create multiple instances to be sure.

More importantly, it seems that Wowza is performing normally and that your issue is with available bandwidth. We are unable to support this element of your workflow. I would suggest that you explore network support options from AWS, who certainly can speak to this issue directly.

Link: AWS - Submit A Case

-Tim

The only hard limitation I can see is the network interface, and as the numbers above show, your usage is far below what the interface can handle.

What real world performance should I expect from the c3.8xlarge 10 Gigabit*4 network interfaces?

The Dynamic Load Balancing AddOn and Elastic Load Balancing would not be used together as they are different methods of load balancing.

Am I right in assuming that the use of the Dynamic Load Balancing AddOn has the potential of escalating my Wowza license costs, as well as the number of Amazon instance hours? Instead of needing a single Wowza license like I would if I could stay on a single instance, I would need four licenses if I wanted three Edge/Load Balancer Senders:

Instance 1: Combined origin and load balancer listener

  • 1 Wowza license
  • 1 Transcoder AddOn

Instance 2: Edge/Load Balancer Sender #1

  • 1 Wowza license

Instance 3: Edge/Load Balancer Sender #2

  • 1 Wowza license

Instance 4: Edge/Load Balancer Sender #3

  • 1 Wowza license

I would also need to quadruple the number of instance hours:

Instance 1: Combined origin and load balancer listener

  • 1 instance hour

Instance 2: Edge/Load Balancer Sender #1

  • 1 instance hour

Instance 3: Edge/Load Balancer Sender #2

  • 1 instance hour

Instance 4: Edge/Load Balancer Sender #3

  • 1 instance hour

So, while it is not entirely accurate (traffic out is not quadrupled and only a single Transcoder AddOn is needed), this would almost quadruple my cost. This is well worth the money when the solution is actually needed, but for us that might be the case on only 10 out of 251 working days. The vast majority of events we stream will not have simultaneous viewer counts exceeding 500.

Then there is the natural limitation of the Dynamic Load Balancing AddOn. It does balance the number of connected viewers between a fixed number of instances, and does this very well as far as I can tell from the feedback I have read and heard. It does not, however, do anything when it comes to an origin instance that’s struggling to cope with too many incoming streams that have to be transcoded on the server.

Is that correct?

The CPU in a c3.8xlarge instance will be taxed ~ 15 % when taking one incoming stream and transcoding it to three other bitrates. This means that, to stay below 50 % CPU utilization, which is your recommendation when using the Transcoder AddOn, the server can cope with three incoming streams. This is fine for us for the events we stream the majority of the year. But there might be these 10 out of 251 working days where we need, say, six incoming streams. That would cause a CPU load of 90 %, and the Dynamic Load Balancing AddOn won’t make the origin instance capable of decreasing the CPU load.
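As a quick sanity check on that reasoning, here is the same headroom arithmetic in a few lines of Python; the 15 % per-stream cost and the 50 % ceiling are this thread's estimates, not measured constants.

```python
# Rough CPU headroom estimate using the figures quoted in this thread.
CPU_PER_STREAM = 15  # percent per incoming stream transcoded to three bitrates
CPU_CEILING = 50     # percent, recommended max with the Transcoder AddOn

def max_incoming_streams():
    """How many transcoded ingests fit under the recommended ceiling."""
    return CPU_CEILING // CPU_PER_STREAM

def cpu_load(streams):
    """Estimated CPU utilization (percent) for a given number of ingests."""
    return streams * CPU_PER_STREAM

print(max_incoming_streams())  # 3 streams fit under the 50 % ceiling
print(cpu_load(6))             # 6 streams -> 90 % CPU
```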

But AWS Elastic Load Balancing + Auto Scaling would, right? I’m not quite sure I understand the concepts of Amazon’s Elastic Load Balancing and Auto Scaling fully, but I imagine that it can indeed help the instance handling the incoming streams and transcoding never exceed a CPU-utilization limit set in CloudWatch.

Is that also correct?

What about cost for an AWS Elastic Load Balancing + Auto Scaling solution? I imagine I would only pay for the instance hours when they are actually needed? And what about Wowza licenses? Would the solution mean that only a single Wowza license is needed? Or would each instance that “helps” need to be preconfigured with a license?

The only hard limitation I can see is the network interface, and as the numbers above show, your usage is far below what the interface can handle.

What can the 10 Gigabit*4 interface handle?

From tests such as this, showing 10 GbE speeds of up to 900.6 MB/s, would we then be able to multiply this by four and assume this throughput:

900.6 MB/s * 4 = 3.5 GB/s

3.5 GB/s amounts to 28 Gbps, so the c3.8xlarge instance with its 10 Gigabit*4 should be able to handle 25296 viewers each consuming 1166.6 Kbps:

25296 connections with an average bitrate of 1166.6 Kbps = 29,510,313.6 Kbps == 28.144 Gbps == 12665 GB per hour

This is insane. :mad: Please correct me.

Edit: I just noticed that the *4 refers to a footnote, not that the instance has four 10 Gigabit interfaces.

So I should not multiply 900.6 MB/s by four. It’s just 900.6 MB/s == 0.879 GB/s == 7.036 Gbps == 7,377,715 Kbps.

7,377,715 Kbps / 1166.6 Kbps = 6324 concurrent connections.

Thank you Jason.

Java has a limitation of 5 Gbps, so this would be a limitation on how many concurrent connections you can have, which will vary based on the bitrate of the streams being viewed.

This makes it pretty straightforward then, to calculate how many concurrent connections a c3.8xlarge instance can handle:

5 Gbps equals 5,242,880 Kbps.

Then we divide the limit of 5,242,880 Kbps by the average bitrate:

5,242,880 Kbps / 1166.6 Kbps = 4494

4494 concurrent connections.

Since there is this 5 Gbps Java limitation, it does not matter whether my instance has 5 GbE, 10 GbE or even 10 GbE*4 as in the case of the c3.8xlarge instance. Am I correct in this, and in the 4494 concurrent connections?
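Assuming that 5 Gbps cap holds, the estimate generalizes to any cap and average bitrate. A minimal Python sketch, using binary prefixes (1 Gbps = 1024 * 1024 Kbps) to match the calculations above:

```python
# Estimate the maximum concurrent viewers for a given throughput cap.
# Binary prefixes (1 Gbps = 1024 * 1024 Kbps) to match the thread's math.

def max_connections(cap_gbps, avg_bitrate_kbps):
    cap_kbps = cap_gbps * 1024 * 1024
    return int(cap_kbps / avg_bitrate_kbps)

AVG_KBPS = 1166.6  # average of the 500/1000/2000 Kbps renditions

print(max_connections(5, AVG_KBPS))     # 5 Gbps Java cap -> 4494 viewers
print(max_connections(0.89, AVG_KBPS))  # ~0.89 Gbps measured cap -> 799 viewers
```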

These huge servers might not be the best solution.

Why is that? Because of the fact that the Java limitation prevents the saturation of 10 GbE?

Here is "]’ to compare

Should there be a link here?

… if managed well …

And that’s one of the reasons I would prefer to stay on a single instance: I would then only have to monitor and maintain a single Wowza server. Another reason is that I would avoid buying an excessive number of Wowza licenses.

If one high performance instance like the c3.8xlarge can handle 4494 concurrent connections (see the calculations in #6), that is actually enough for us. But can it?

No one seems to have experience with the largest instances such as the c3.8xlarge, so I will use your Load Testing Tool to find out when it maxes out.

I will simulate 4494 simultaneous connections to 1166.6 Kbps streams, to see if a c3.8xlarge can handle a load of 5,242,880 Kbps == 5120 Mbps == 5 Gbps. If it can’t cope, I will lower the number of connections until I’m not overloading the instance.

I have already tested the c1.medium and c3.2xlarge and they both top out at ~ 800 simultaneous connections to 1166.6 Kbps streams, meaning that these instances can handle a load of 933280 Kbps == 911 Mbps == 0.89 Gbps. This seems to be pretty consistent with Ian’s findings here.

The maximum bandwidth for an c3.8xlarge instance seems to be 1.7 Gbps.

2000 connections @ 5160 Kbps (simulated with Load Testing Tool)

CloudWatch reports 14,041,551,213 bytes
WMSPanel reports 1775958353 bps

As 1775958353 bps equals only 345.5 connections @ 5160 Kbps, I tried simulating just 345 connections and got similar numbers as when simulating 2000 connections.

This is far from the limit of the 10 GbE interface, and it’s not even close to the 5 Gbps Java limitation earlier mentioned in this thread. Shouldn’t I be able to simulate 1000 connections @ 5160 Kbps and see a 4.9 Gbps throughput?

you need to keep a close eye on the test computer/server performance

The Java heap size on the c3.8xlarge client and host has been set to 8000M. The core count has been set to 32.

At the time the bandwidth is capped at 1.7 Gbps, the CPU load on the c3.8xlarge client is 15 % max. The CPU load on the c3.8xlarge host is 6 % max.

Be sure that the number of tests created/executed do not exceed the capability of the box or the network.

It is actually the capability of c3.8xlarge instances that I’m trying to measure. From the specs alone, I see nothing that should prevent me from reaching the 5 Gbps Java limitation.

It might be a good idea to create multiple instances to be sure.

I have tried splitting the connections between two clients instead of just one. I still hit the same limit.

it seems that Wowza is performing normally

To verify that my Wowza setup/configuration isn’t to blame for this 1.7 Gbps cap, maybe I should try a bandwidth/throughput test that bypasses Wowza altogether. Any suggestions on how best to do that?
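One dependency-free option is a raw TCP probe in plain Python sockets, with no Wowza or Java in the path. This is only a sketch (the host and port are placeholders, and a dedicated tool like iperf is the more rigorous choice):

```python
# Minimal TCP throughput probe: run run_server() on the instance under
# test, then run_client("<server-ip>") from a second machine.
import socket
import time

CHUNK = 64 * 1024  # bytes pushed per send call

def run_server(host="0.0.0.0", port=5201):
    """Accept one connection and discard everything it sends."""
    with socket.create_server((host, port)) as srv:
        conn, _addr = srv.accept()
        with conn:
            while conn.recv(CHUNK):
                pass

def run_client(host, port=5201, duration=10.0):
    """Stream zero-bytes at the server for `duration` seconds; return Gbps."""
    payload = b"\x00" * CHUNK
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            sock.sendall(payload)
            sent += CHUNK
        elapsed = time.monotonic() - start
    return sent * 8 / elapsed / 1e9  # bits per second -> Gbps
```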

… it seems that Wowza is performing normally and that your issue is with available bandwidth.

I believe you’re right, as I have just tried a couple of iperf tests, thus bypassing Wowza settings and Java limitations altogether:

Iperf server/host

c3.8xlarge

Iperf clients

t1.micro = 141 Mbits/sec
c1.medium = 950 Mbits/sec
c3.8xlarge = 1.73 Gbits/sec

The same limit of ~ 1.7 Gbps applies outside of the Wowza environment. I found this thread where other users are trying to saturate a 10 GbE link but experience limits of around 3 Gbps. There are mentions of increasing a Linux ulimit as well as modifying the network driver. Does it seem likely that the same factors are holding the c3.8xlarge back?

I have now created an Amazon support ticket and will update this thread if Amazon finds a solution.

So far, support has suggested that I create the instances in the same placement group. I tried that without any improvement in throughput. They also suggested testing several times at different times of the day. I have already tried that, and it doesn’t change the 1.73 Gbps limit.

Amazon has finally recognized that something caps the throughput at 1.73 Gbps. My findings were initially received with a fair amount of skepticism, but accepted after they agreed to test themselves. Support has promised to perform further tests to find out why the instance cannot achieve a higher throughput when connecting to its public IP.

There is something to note, however. This limit is only seen when testing against an instance’s public IP. When testing against an instance’s private IP, which of course cannot be tested from outside Amazon’s environment, you will not see the cap.

Support performed these tests connecting to the private IP of an instance:

For ami-8e987ef9 (Ubuntu 12.04 PV based AMI), we tested iperf with the private IP of an instance. We received 6.74 Gbits/sec of bandwidth. Below are the results of iperf:
CLIENT
iperf -c 172.31.46.167
------------------------------------------------------------
Client connecting to 172.31.46.167, TCP port 5001
TCP window size: 96.7 KByte (default)
------------------------------------------------------------
[ 3] local 172.31.46.168 port 59347 connected with 172.31.46.167 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 7.85 GBytes 6.74 Gbits/sec
SERVER
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.31.46.167 port 5001 connected with 172.31.46.168 port 59347
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 7.85 GBytes 6.73 Gbits/sec
=================================================================================
For ami-8c987efb (Ubuntu 12.04 LTS for HVM instances), we had to enable enhanced networking, and update the driver to ixgbevf. We tested iperf with the private IP of an instance. We received 8.25 Gbits/sec of bandwidth. Below are the results of iperf:
CLIENT
iperf -c 10.0.0.124
------------------------------------------------------------
Client connecting to 10.0.0.124, TCP port 5001
TCP window size: 96.7 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.45 port 38101 connected with 10.0.0.124 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 9.60 GBytes 8.25 Gbits/sec
SERVER
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.0.0.124 port 5001 connected with 10.0.0.45 port 38101
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 9.60 GBytes 8.24 Gbits/sec
For ami-360bea41 (Ubuntu 13.10 for HVM instances), we enabled enhanced networking only as there is no need to update the driver. We tested iperf with the private IP of an instance. We received 9.65 Gbits/sec of bandwidth. Below are the results of iperf:
CLIENT
iperf -c 172.31.33.1
------------------------------------------------------------
Client connecting to 172.31.33.1, TCP port 5001
TCP window size: 96.1 KByte (default)
------------------------------------------------------------
[ 3] local 172.31.33.2 port 55845 connected with 172.31.33.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.2 GBytes 9.65 Gbits/sec
SERVER
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.31.33.1 port 5001 connected with 172.31.33.2 port 55845
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 11.2 GBytes 9.65 Gbits/sec

Yes, of course. I had a dialogue with AWS Support and they confirmed the 1.73 Gbps limit. There seems to be no way around it, so when it comes to actual throughput, the c3.8xlarge (or other 10 GbE instances) offers terrible value compared to instances with “High” network capabilities, which perform very close to 1 Gbps.

Below are the most relevant snippets from our dialogue:

ME - When I test my own ‘10 Gigabit’ instances (c3.8xlarge) with iperf, I won’t see transfer rates exceeding 1.73 Gbps. I have tested at various times of the day over the span of seven days. I have run the tests multiple times, but each time I hit the very same limit of 1.73 Gbps. I have tried setting the window size to 64KB, 128KB and 512KB. I also tried setting the number of parallel client streams to 2 and 10. These settings offered no real improvement to the measured throughput. I have also tried testing with the Wowza Load Testing Tool. I have simulated 2000 connections to a 5160 Kbps stream, but at 345 connections traffic out maxes out. 345 x 5160 Kbps equals ~ 1.7 Gbps, so I basically hit the same roof. This is at least four times worse than what a blogger at scalablelogic reports, where tests show results of 7 Gbps and 9.5 Gbps. I’m testing between two c3.8xlarge instances located in the same zone and region, so these should be optimal benchmarking conditions. One c3.8xlarge acts as the iperf server and the other as an iperf client. I have tried with instances launched with Amazon Linux AMI 2013.09.2 (64-bit) as well as Ubuntu Server 13.10 (64-bit). I have also tried launching instances in the same placement group, but I still hit the same limit.

Why am I seeing such poor results? What should I look at if I want to improve throughput? Is this a limit that can be dealt with? There are plenty of forum members who would be interested in knowing this. Also, it would be great to know what causes this limit. If this limit is indeed known to Amazon and maybe even artificially imposed, I think you should specify that a 10 GbE instance is only capable of 1.7 Gbps (just 1.8x more than a “High” instance) when not using internal IPs.

SUPPORT - I appreciate your patience on this issue. I reached out to our operations folks and asked them about what you are seeing. They advised me that there are no specific caps on the amount of bandwidth that are enforced by AWS; speed is based on a number of variables including network equipment, regional considerations, etc. Additionally, they advised me that the 10 gig network as sold is based on transfer rates within the data center/local network to the instance. They did similar testing, and their tests showed varying transfer rates depending on the datacenter and availability zone they used.

One suggestion: in order to sustain higher rates of transfer to the public Internet, scaling horizontally with more instances might perform better than scaling vertically with a larger instance. As this is not “bandwidth as a service”, speeds to the public Internet cannot be guaranteed.

I understand that this might not be the solution you are looking for, and I apologize that this product doesn’t meet your needs for this particular case. Please let us know if you have further questions.

ME - Of those variables, what exactly is thought to be the most prominent bottleneck? Since we both see the 1.73 Gbps limit consistently, it should be possible to find out. Do you have any plans for improving external throughput from these 10 GbE instances? The 10 GbE may refer to internal transfer rate, but it is worth noting that network performance for “Low” to “High” instances has a 1:1 relationship between internal and external throughput. External throughput on 10 GbE instances seems to be 5.5 times slower than internal throughput. Very noticeable.

SUPPORT - The placement group’s bandwidth applies to the private subnet network. What you are doing there causes the connection to leave the subnet, go out of the network and re-enter it, which is suboptimal at best. The problem is that since it is routed outside of our network and then back in again, it traverses different network equipment than it would in a private local subnet.

Unfortunately there is nothing we can do here but advise you to use the private IP addresses to get the benefit of the full 10G link. Everything else will simply not work. Furthermore, we keep information about our internal network confidential, and we cannot possibly troubleshoot further or explain why you consistently see 1.73G. It actually makes a lot of sense that a local dedicated subnet would achieve higher throughput than shared pipes, as the latter are routed outside the placement group, out of our network and back in again. So you are basically asking why you get lower bandwidth over a public pipe through a router than within the same subnet under optimal conditions with no shared resources. Also, please keep in mind that different network equipment uses different medium characteristics as well as configurations.

While I understand this answer may not be satisfactory and may not explain a lot in detail, we certainly cannot give you any information about the internals of our network. In case you are interested, enterprise customers have the benefit of Non-Disclosure Agreements under which we can share more information about our internal systems.

If this seriously impacts your business logic, we can certainly work with you towards a solution (e.g. Direct Connect) and point you in the right direction.

Hello,

As Tim D. pointed out earlier in this thread, if you’d like to test the capabilities of your system configuration, we recommend that you configure the AMI and use the Wowza RTMP Load Test Tool to determine playback performance.

Best regards,

Andrew