Transcoding an incoming RTP stream

Hello,

I’m trying to build an application that receives an RTP stream and makes it available to Flash Player clients. The RTP source sends ulaw-encoded PCM data, and I want to transcode it to Speex before sending it to the Flash Player client.

I’m using VLC to send the RTP stream (I’m capturing from my microphone). So far, I’ve managed to set up VLC, send the stream to Wowza and play it in my Flash Player client, provided that the stream is encoded in a format Flash Player can decode.

Now, the problem is that I don’t know how to access the samples in the RTP stream, transcode them and republish. I have tried the VideoPassThru example (https://www.wowza.com/forums/showthread.php?t=464), but I can only get this sample to gain access to streams that are captured by Flash Player, not to RTP streams.

Can anyone help me gain access to the audio data in an RTP stream?

Since you already have a process that is using VLC, you should do the transcoding in VLC and send AAC or MP3 to Wowza. There is no need to do the transcoding in Wowza.

Charlie

It is going to be a bit tricky to implement. The publishing part is easy:

https://www.wowza.com/forums/showthread.php?t=7100

The packet interception part is not too hard either:

https://www.wowza.com/downloads/forums/videopassthru/VideoPassThru.zip

Putting them together might be a bit tricky.

Charlie

This is just not going to be very easy to do. It would be better to do the transcoding outside of Wowza. You should be able to create a simple RTP re-streaming application that receives RTP, transcodes the audio and sends the stream back out over RTP to Wowza. This is going to be a much better architecture.

Charlie

Yes and yes.

Charlie

RTP packets are received, unpacked and injected into a stream. If you transcode on injection then you don’t really need to know any more about the chain below this. There is no need to intercept the packets at any other location in processing. You simply need to unpack the RTP audio stream, transcode and inject the Speex packets into the server. The Speex packetizer should give you a good idea of how it is done. Just know that Flash only supports Speex wideband 16 kHz streams.

This discussion is starting to move beyond what is available through free support.

Charlie
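
The 16 kHz requirement matters for a ulaw source, because G.711 telephony audio is normally sampled at 8 kHz, so the decoded PCM has to be upsampled before it is handed to a wideband Speex encoder. A minimal sketch, assuming 16-bit mono PCM input and using plain linear interpolation (a production resampler would apply a proper low-pass filter). The class name Upsampler2x is hypothetical and not part of Wowza:

```java
import java.util.Arrays;

/** Hypothetical helper (not a Wowza class): doubles the sample rate of 16-bit mono PCM. */
final class Upsampler2x {
    private Upsampler2x() {}

    /** Returns twice as many samples: each input sample plus one interpolated sample. */
    static short[] upsample(short[] in) {
        short[] out = new short[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            int cur = in[i];
            // The last sample has no successor, so it is simply repeated.
            int next = (i + 1 < in.length) ? in[i + 1] : cur;
            out[2 * i] = (short) cur;
            out[2 * i + 1] = (short) ((cur + next) / 2); // midpoint between neighbours
        }
        return out;
    }

    public static void main(String[] args) {
        short[] pcm8k = {0, 100, 200};
        System.out.println(Arrays.toString(upsample(pcm8k)));
    }
}
```

Interpolating per packet ignores the boundary between consecutive packets; carrying the last sample of the previous packet forward would be the obvious refinement.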

Another approach is to intercept the RTP stream earlier in the process. You can write your own RTP depacketizer for ulaw streams, depacketize the audio stream, transcode and inject Speex packets into the stream. This is the cleanest solution. Here is the source code for the Speex depacketizer:

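import java.net.*;
import com.wowza.util.*;
import com.wowza.wms.logging.*;
// Needed only when this class is compiled outside Wowza's own depacketizer package
// (assumed here to be com.wowza.wms.rtp.depacketizer).
import com.wowza.wms.rtp.depacketizer.*;
import com.wowza.wms.rtp.model.*;
import com.wowza.wms.rtp.packetizer.RTPPacketizerBase;
import com.wowza.wms.vhost.*;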
public class RTPDePacketizerSpeex extends RTPDePacketizerAudioBase implements IRTPDePacketizer, IRTPTimecodeProvider
{
	private RTCPEventHandlerGeneric rtcpEventHandler = new RTCPEventHandlerGeneric();
	private RTPSequence seq = new RTPSequence();
	
	private long lastTimecode = -1;
	private RolloverLong timecode = new RolloverLong(32);
	private RTPPacket workingPacket = null;
	private int packetCount = 0;
	private boolean bitrateWarning = true;
	
	public void init(RTPContext rtpContext, RTPDePacketizerItem rtpDePacketizerItem)
	{
		super.init(rtpContext, rtpDePacketizerItem);
		if (debugLog)
			WMSLoggerFactory.getLogger(null).debug("RTPDePacketizerSpeex.init");
	}
	
	public boolean canHandle(RTPTrack rtpTrack)
	{
		if (rtpTrack.isAudio())
		{
			while (true)
			{
				String sampleType = rtpTrack.getSampleType();
				if (sampleType == null)
					break;
				
				if (sampleType.toLowerCase().startsWith("speex"))
					return true;
				break;
			}
		}
		
		return false;
	}
	public void handleRTCPPacket(SocketAddress socketAddr, RTPTrack rtpTrack, byte[] bytes, int offset, int len)
	{
		if (debugLog)
		{
			int dsize = Math.min(len, 16);
			WMSLoggerFactory.getLogger(null).debug("rtcp["+rtpTrack.getTrackId()+":"+len+"] {"+DebugUtils.formatBytesShort(bytes, offset, dsize)+"}");
		}
		if (!checkRTCPSSRC(socketAddr, rtpTrack, bytes, offset, len))
			return;
		rtcpHandler.handleRTCPPacket(socketAddr, rtpTrack, bytes, offset, len);
	}
		
	public void handleRTPPacket(SocketAddress socketAddr, RTPTrack rtpTrack, byte[] bytes, int offset, int len)
	{
		if (debugLog)
		{
			int dsize = Math.min(len, 16);
			WMSLoggerFactory.getLogger(null).debug("rtp["+rtpTrack.getTrackId()+":"+len+"] {"+DebugUtils.formatBytesShort(bytes, offset, dsize)+"}");
		}
			
		if (!checkRTPSSRC(socketAddr, rtpTrack, bytes, offset, len))
			return;
				
		seq.handleRTPPacket(rtpTrack, bytes, offset, len);
		// Byte accounting is handled at a lower level: rtpTrack.getRTPStream().incrementMediaInBytes(len);
		// The 32-bit RTP timestamp occupies bytes 4-7 of the fixed header.
		long timeval = BufferUtils.byteArrayToLong(bytes, offset+4, 4);
		timecode.set(timeval);
		int timescale = rtpTrack.getTimescale();
		int channels = rtpTrack.getChannelCount();
		
		setAudioCodecId(rtpTrack, IVHost.CODEC_AUDIO_SPEEX);
		try
		{
			
			int rtpHeaderSize = skipRTPExtensions(bytes, offset, len, RTPPacketizerBase.RTPHEADERSIZE);
			int index = 0;
			if (workingPacket == null)
			{
				workingPacket = new RTPPacket();
				workingPacket.setType(IVHost.CONTENTTYPE_AUDIO);
				workingPacket.setCodec(IVHost.CODEC_AUDIO_SPEEX);
	
				int frameType = 2;
				frameType += channels-1;
				
				if (timescale != 16000 && bitrateWarning)
				{
					WMSLoggerFactory.getLogger(null).warn("RTPDePacketizerSpeex.handleRTPPacket: Flash only supports SPEEX at a sample rate of 16000 Hz");
					bitrateWarning = false;
				}
	
				workingPacket.setFrameType(frameType);
				workingPacket.setTimecode(timecode.get());
			}
			
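			// Count up to two trailing 0x7f/0xff bytes so they are excluded from the fragment
			// (these look like Speex end-of-payload padding; that reading is an assumption).
			int tailOffset = 0;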
			for(int i=1;i<3;i++)
			{
				if (bytes[offset+len-i] == (byte)0xff || bytes[offset+len-i] == (byte)0x7f)
				{
					tailOffset++;
					if (bytes[offset+len-i] == (byte)0x7f)
						break;
					else
						continue;
				}
				break;
			}
			RTPPacketFragment packetFragment = new RTPPacketFragment(bytes, offset+rtpHeaderSize+index, len-(rtpHeaderSize+index+tailOffset));
			workingPacket.addFragment(packetFragment);
			packetCount++;
			
			while(true)
			{
				boolean timeSyncReady = rtcpHandler.isTimeSyncReady(rtpTrack, workingPacket.getTimecode());
				if (!timeSyncReady)
				{
					checkRTCPMissingWarning();
					break;
				}
				long adjTimecode = rtcpHandler.convertTimeSyncTimecode(timecode.get(), timescale);
				
				if (debugLog)
					workingPacket.setDebugLog(true);
				workingPacket.write(rtpTrack, adjTimecode);
				break;
			}
			
			workingPacket = null;
			packetCount = 0;
		}
		catch (Exception e)
		{
			WMSLoggerFactory.getLogger(null).debug("RTPDePacketizerSpeex.handleRTPPacket: "+e.toString());
		}
	}
	
	public void startup(RTPTrack rtpTrack)
	{
		rtcpEventHandler.setTimecodeProvider(this);
		rtcpHandler.addEventListener(rtcpEventHandler);
		setupAppInstanceRTCPEventHandler(this, rtcpHandler, rtpTrack);
	}
	public void shutdown(RTPTrack rtpTrack)
	{
	}
	
	public long getAdjTimecode(RTPTrack rtpTrack)
	{
		if (lastTimecode != -1 && rtcpHandler.isTimeSyncReady(rtpTrack, lastTimecode))
		{
			long adjTimecode = rtcpHandler.convertTimeSyncTimecode(lastTimecode, rtpTrack.getTimescale());
			return adjTimecode;
		}
		
		return -1;
	}
}

You will need to add your class to the depacketizer list in [install-dir]/conf/RTP.xml. You should be able to play around with this class to figure out how to intercept the stream. If you return true from the canHandle method, you will receive that stream’s RTP packets.

Charlie
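
For the ulaw case, the transcoding step starts by expanding each 8-bit sample to 16-bit linear PCM, which a Speex encoder can then consume. A minimal sketch of the standard G.711 mu-law expansion, independent of Wowza. The class and method names are hypothetical, and the Speex encoding itself is not shown:

```java
/** Hypothetical helper (not part of Wowza): G.711 mu-law to 16-bit linear PCM. */
final class MuLawDecoder {
    private static final int BIAS = 0x84; // bias added by the mu-law encoder

    private MuLawDecoder() {}

    /** Expands one mu-law code word to a signed 16-bit sample. */
    static short decodeSample(byte muLaw) {
        int u = ~muLaw & 0xFF;              // code words are stored bit-inverted
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) (sign != 0 ? -magnitude : magnitude);
    }

    /** Expands len bytes of payload starting at offset (one sample per byte). */
    static short[] decode(byte[] payload, int offset, int len) {
        short[] pcm = new short[len];
        for (int i = 0; i < len; i++) {
            pcm[i] = decodeSample(payload[offset + i]);
        }
        return pcm;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xFF, (byte) 0x00, (byte) 0x80};
        for (short s : decode(payload, 0, payload.length)) {
            System.out.println(s);
        }
    }
}
```

In a depacketizer, offset and len would be the payload position and size after the RTP header has been skipped, as the Speex class does with rtpHeaderSize.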

Dear Charlie,

Thank you for your reply. We have tried putting the VLC example and the VideoPassThru example together, but as Etienne notes, while a stream coming from Flash Player correctly runs through the MediaStreamPassThru object, an incoming RTP stream does not.

It is unclear to me how Wowza decides when a class is instantiated and a stream is fed to it. In the VideoPassThru example, the videopassthru application’s Application.xml file tells the system there’s a StreamType called passthru, which is defined in Streams.xml and tells Wowza to use MediaStreamPassThru as ClassBase. Of course, this only works if the application is loaded, which would explain why the Flash Player client stream is ‘seen’ in the logs.

An incoming RTP stream (as far as I can tell) wouldn’t launch any application. So how can we tell Wowza what the stream type is? Or, how can we restream an RTP stream within the context of an application?

Dear Charlie,

Is the list of depacketizers traversed in the order of occurrence in RTP.xml until one of them returns true from canHandle?

So we’d need to create a new class, put it at the top of the list in RTP.xml and make sure its canHandle method returns true for ulaw streams?

Sander.
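
The matching logic for such a class could mirror the canHandle method of the Speex depacketizer posted earlier in the thread. A minimal sketch of the string test only, written without Wowza types so it can be run standalone. The prefixes "pcmu" and "pcma" are the SDP encoding names for G.711 and are only an assumption about what rtpTrack.getSampleType() returns, so it is worth logging the actual value before relying on them:

```java
import java.util.Locale;

/** Hypothetical helper (not part of Wowza): decides whether a sample type names G.711 audio. */
final class G711SampleTypes {
    private G711SampleTypes() {}

    /**
     * ASSUMPTION: the sample type starts with the SDP encoding name
     * ("PCMU" for mu-law, "PCMA" for A-law). Verify against real logs.
     */
    static boolean isG711(String sampleType) {
        if (sampleType == null) {
            return false; // same null guard as the Speex canHandle
        }
        String s = sampleType.toLowerCase(Locale.ROOT);
        return s.startsWith("pcmu") || s.startsWith("pcma");
    }

    public static void main(String[] args) {
        System.out.println(isG711("PCMU/8000"));
        System.out.println(isG711("speex"));
    }
}
```

Inside a real depacketizer this test would sit behind the rtpTrack.isAudio() check, just as in the Speex class.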

Thanks Charlie, I’m starting to get a better understanding of the route of a stream through Wowza. Would it be possible for you to give me an idea of the whole chain that an RTP stream follows?

Do I understand correctly that the depacketizer receives byte[] and then writes the unpacked data to the RTPTrack object?

It would be interesting then to know how the data in the RTPTrack object ends up in a call to addAudioData on a MediaStreamLive object or in an RTPStream.

Could you tell me how the chain of objects that handle a stream is built up and how this chain is influenced by the conf files?

Regards,

Sander.
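
On the byte[] question: the Speex depacketizer above treats each buffer passed to handleRTPPacket as a raw RTP packet, reading the timestamp at offset+4 and skipping the header before wrapping the payload in a fragment. A minimal, Wowza-independent sketch of the RFC 3550 fixed header layout it relies on. The class name RtpHeader is hypothetical, and padding (the P bit) is ignored for brevity:

```java
/** Hypothetical helper (not part of Wowza): parses the RFC 3550 RTP header. */
final class RtpHeader {
    final int version;
    final int payloadType;    // static types from RFC 3551: 0 = PCMU, 8 = PCMA
    final int sequenceNumber;
    final long timestamp;     // unsigned 32-bit value, in units of the track timescale
    final long ssrc;
    final int headerLength;   // bytes to skip to reach the payload

    private RtpHeader(int version, int payloadType, int sequenceNumber,
                      long timestamp, long ssrc, int headerLength) {
        this.version = version;
        this.payloadType = payloadType;
        this.sequenceNumber = sequenceNumber;
        this.timestamp = timestamp;
        this.ssrc = ssrc;
        this.headerLength = headerLength;
    }

    static RtpHeader parse(byte[] buf, int offset, int len) {
        if (len < 12) {
            throw new IllegalArgumentException("shorter than the 12-byte fixed header");
        }
        int b0 = buf[offset] & 0xFF;
        int b1 = buf[offset + 1] & 0xFF;
        int headerLength = 12 + 4 * (b0 & 0x0F);   // fixed part plus CSRC list
        if ((b0 & 0x10) != 0) {                    // X bit: one header extension follows
            if (len < headerLength + 4) {
                throw new IllegalArgumentException("truncated extension header");
            }
            int words = u16(buf, offset + headerLength + 2);
            headerLength += 4 + 4 * words;
        }
        if (headerLength > len) {
            throw new IllegalArgumentException("header exceeds packet length");
        }
        return new RtpHeader(b0 >> 6, b1 & 0x7F, u16(buf, offset + 2),
                u32(buf, offset + 4), u32(buf, offset + 8), headerLength);
    }

    private static int u16(byte[] b, int i) {
        return ((b[i] & 0xFF) << 8) | (b[i + 1] & 0xFF);
    }

    private static long u32(byte[] b, int i) {
        return ((long) u16(b, i) << 16) | u16(b, i + 2);
    }
}
```

For a ulaw packet, the bytes from offset + headerLength up to offset + len are the 8-bit samples that would be decoded and transcoded.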

Dear Charlie,

Back in 2010, we managed to create a depacketizer that transcodes mulaw and alaw to Speex. We recently upgraded our server to Wowza 3.0.3, and since then we have had problems with the transcoding. We never updated our code because it compiled cleanly against the new 3.0.3 API, but the audio we get now contains terrible noise, probably an encoding artefact.

Can you tell me if the code for the Speex depacketizer has changed from version 2 to 3? If so, would you be so kind as to post the v3 code here so I can check whether I need to update anything in my code?

Sander.
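
When chasing noise like this, one thing that can be checked in isolation is the G.711 expansion step, by feeding known code words through it and comparing against reference values. A minimal sketch of the standard A-law expansion with a few such checks. The class name is hypothetical, and this is not a diagnosis of the problem described above:

```java
/** Hypothetical helper (not part of Wowza): G.711 A-law to 16-bit linear PCM. */
final class ALawDecoder {
    private ALawDecoder() {}

    /** Expands one A-law code word to a signed 16-bit sample. */
    static short decodeSample(byte aLaw) {
        int a = (aLaw & 0xFF) ^ 0x55;     // undo the alternate-bit inversion
        int segment = (a >> 4) & 0x07;
        int t = (a & 0x0F) << 4;
        if (segment == 0) {
            t += 8;
        } else {
            t = (t + 0x108) << (segment - 1);
        }
        // In A-law a set sign bit means a positive sample.
        return (short) ((a & 0x80) != 0 ? t : -t);
    }

    public static void main(String[] args) {
        // Smallest and largest magnitudes, both signs.
        byte[] codeWords = {(byte) 0xD5, (byte) 0x55, (byte) 0xAA, (byte) 0x2A};
        for (byte b : codeWords) {
            System.out.println(decodeSample(b));
        }
    }
}
```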

Hello Charlie,

Thanks for your fast answer, and sorry for stepping into the thread, but I work with skruger on this project.

In fact, the transcoding needs to happen inside Wowza, because the VLC process is only used to mock up another machine (a phone gateway feeding Wowza with multiple sound channels coming from PSTN devices, mu-law coded).

We do not want to modify that machine, so somehow the sound must be transcoded inside Wowza.

Thanks again for your support!

Etienne

Thanks again for your answer,

I had already spotted those two samples.

I managed to deploy the videopassthru application and make it work using the Flash client contained in the VideoPassThru.zip file. I can “see” the samples flowing into the stream (a lot of log file entries).

(By the way: videopassthru.html tries to load videochat.swf, but the file is named videopassthru.swf in the zip file.)

I set up a test bed like this (goal: obtaining log entries showing that the samples are available):

  • VLC streams an MP4 video over RTP to the Wowza server.

  • I published this stream in the videopassthru application via the streammanager web application (using an SDP file).

  • Using the client from the videopassthru application, after connecting to the stream, I can see the video coming from the RTP stream, but in this case I have no log entries for the samples (e.g. addVideoData is not called).

Is this clear enough for you to understand? (Sorry for my poor “esperanto” English.)

Thanks!

Etienne