RTSP Live Streaming

Frank Wang  |  Software Engineer

ABSTRACT – With the rapid evolution of the IP camera industry, almost every IP surveillance camera now supports Real Time Streaming Protocol (RTSP) video streaming, which means a user can watch live views from anywhere with a media player. RTSP provides a way for users to control video and audio. RTSP does not itself transfer the video and audio signals; it controls how they are delivered.

I – Introduction

RTSP is a network control protocol designed for watching a live feed and controlling media sessions between endpoints. Transmitting the streaming data itself is not a task of RTSP. Instead, an RTSP-based system combines reliable transmission over TCP (used for control) with best-effort delivery over UDP (used for content) to stream content to users. Most RTSP servers use the Real-time Transport Protocol (RTP) in conjunction with the RTP Control Protocol (RTCP) for media stream delivery.

II – Analysis of Network Packets

Let’s consider an interaction where the client and server will use a combination of TCP-based RTSP and UDP-based RTP and RTCP to deliver a video stream.

We captured the network traffic between our IP camera and the client with Wireshark. The parameters of the experimental scenario are shown in the table below.


RTSP is a protocol used for controlling the transfer of real-time multimedia data (e.g., audio and video) between a server and a client. It is a streaming protocol: RTSP facilitates scenarios in which the multimedia data is transferred and rendered at the same time (i.e., video is displayed and audio is played while the data is still arriving).

The server needs to maintain session state to be able to correlate RTSP requests with a stream. The state transitions are depicted in Fig. 1.

Figure 1: State machine transition diagram of RTSP server.
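The session state described above can be sketched as a small transition table. This is a minimal illustration based on the server state machine in RFC 2326, Appendix A, and it may not match Fig. 1 in every detail; the state names, the PAUSE method, and the function names are assumptions that do not come from this paper.

```python
# Simplified RTSP server state machine (after RFC 2326, Appendix A).
# State names and the PAUSE method come from the RFC, not from this paper.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "PLAY"): "Playing",
    ("Playing", "PAUSE"): "Ready",
    ("Playing", "TEARDOWN"): "Init",
}

# Requests that do not change the session state.
STATELESS_METHODS = {"OPTIONS", "DESCRIBE"}


def next_state(state, method):
    """Return the session state after `method`, or raise if it is not allowed."""
    if method in STATELESS_METHODS:
        return state
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise ValueError(f"{method} is not valid in state {state}") from None


def run_session(methods, state="Init"):
    """Apply a sequence of request methods and return the final state."""
    for method in methods:
        state = next_state(state, method)
    return state
```

A real server would answer an invalid request with an error status rather than raising an exception; the exception here only keeps the sketch short.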

The basic RTSP requests are described as follows:

1. Options

The client will establish a TCP connection to port 554 on the server. An OPTIONS request returns the request types that the server will accept.

2. Describe

A DESCRIBE request includes an RTSP URL and the types of reply data that the client can handle. The response includes the presentation description, typically in Session Description Protocol (SDP) format.

3. Setup

A SETUP request specifies how a single media stream must be transported. The request contains the media stream URL and a transport specifier. This specifier typically includes a local port for receiving RTP data (audio or video) and another for RTCP data (meta information). The server reply usually confirms the chosen parameters and fills in the missing parts, such as the server’s chosen ports.

track1 (audio):

– Client RTP port is 59842, client RTCP port is 59843
– Server RTP port is 6970, server RTCP port is 6971

track2 (video):

– Client RTP port is 59844, client RTCP port is 59845
– Server RTP port is 6972, server RTCP port is 6973

4. Play

A PLAY request will cause one or all media streams to be played.

5. Teardown

A TEARDOWN request is used to terminate the session. It stops all media streams and frees all session-related data on the server.
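The requests above are plain-text messages sent over the TCP control connection. The sketch below shows how a client might format one request and read the port pairs out of a Transport header. The camera URL and the helper names are hypothetical; the port numbers are the track1 values listed above.

```python
def build_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as bytes ready to send over TCP."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_transport(value):
    """Split a Transport header value into a dict of its parameters."""
    params = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep and key.endswith("_port"):
            # Port parameters look like "59842-59843" (RTP port, RTCP port).
            params[key] = tuple(int(p) for p in val.split("-"))
        elif sep:
            params[key] = val
        else:
            params[key] = True  # flag such as "unicast"
    return params


# Hypothetical camera URL (documentation address range).
URL = "rtsp://192.0.2.10:554/stream/track1"

setup = build_request(
    "SETUP", URL, 3, {"Transport": "RTP/AVP;unicast;client_port=59842-59843"}
)
reply = parse_transport(
    "RTP/AVP;unicast;client_port=59842-59843;server_port=6970-6971"
)
```

Here `reply["server_port"]` holds the server's RTP and RTCP ports as a tuple, which is the information the client needs before it sends PLAY.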


RTP is designed to carry the encoded audio and video data. RTP adds a header to each packet, which is then passed to UDP for further processing.
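To make the header concrete, the sketch below decodes the 12-byte fixed RTP header defined in RFC 3550 from the start of a UDP payload. The field layout follows the RFC; the names `RtpHeader` and `parse_rtp_header` are illustrative only.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc",
)


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the fixed header")
    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

The encoded media follows this header, after any CSRC entries and header extension that the flags announce.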

RTP also provides a time-stamping function that allows multiple streams from the same source to be synchronized. Each form of payload (i.e., video and audio) has a specific way of being mapped into RTP.

Each source inserts time stamps into outgoing packet headers which can be processed by the receiver to recover the stream’s clock signal that is needed to correctly play video and audio clips.
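As a rough illustration of how a receiver turns these time stamps back into media time, the sketch below converts the difference between two RTP timestamps into seconds. The 90 kHz default is the clock rate commonly used for video payloads and is an assumption here; the actual rate for each stream is announced in the SDP description.

```python
def elapsed_seconds(first_ts, current_ts, clock_rate=90000):
    """Media time between two 32-bit RTP timestamps, allowing one wraparound."""
    # RTP timestamps are 32-bit counters, so take the difference modulo 2**32.
    ticks = (current_ts - first_ts) % (1 << 32)
    return ticks / clock_rate
```

For an audio stream sampled at 8 kHz, the same function would be called with `clock_rate=8000`.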

As can be seen in Fig. 2, an entire frame can be identified as a sequence of packets ending with a packet having the RTP marker bit set.

Figure 2: Video and audio RTP packets.
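The marker-bit rule for video can be sketched as a small generator that collects payloads until it sees a packet with the marker bit set. It assumes the packets have already been sorted by sequence number and that none were lost; the function name is illustrative.

```python
def group_frames(packets):
    """Yield one list of payloads per video frame, split on the marker bit.

    `packets` is an iterable of (marker, payload) pairs in sequence order.
    """
    frame = []
    for marker, payload in packets:
        frame.append(payload)
        if marker:
            # The marker bit closes the current frame.
            yield frame
            frame = []
    # Packets after the last marker belong to an incomplete frame
    # and are not yielded.
```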

For voice packets, the marker bit indicates the beginning of a talkspurt. Beginnings of talkspurts are good opportunities to adjust the playout delay at the receiver to compensate for differences between the sender and receiver clock rates as well as changes in the network delay jitter.


RTCP is a bidirectional UDP-based mechanism that allows the client to communicate stream-quality information back to the server. By convention, RTCP uses the UDP port immediately above the corresponding RTP port, as in the port pairs listed above (for example, 6970 for RTP and 6971 for RTCP). Fig. 3 shows how the three protocols work together in our experimental scenario.

Figure 3: The three main application protocols used in real-time streaming.
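The stream-quality feedback mentioned above is carried in RTCP receiver reports. The sketch below reads the first report block of such a packet, following the layout in RFC 3550; it is a simplified illustration that ignores compound packets, and the function name is an assumption.

```python
import struct

RTCP_RECEIVER_REPORT = 201  # packet type for a receiver report in RFC 3550


def parse_receiver_report(packet):
    """Return the first report block of an RTCP receiver report as a dict."""
    b0, packet_type, _length, sender_ssrc = struct.unpack("!BBHI", packet[:8])
    report_count = b0 & 0x1F
    if packet_type != RTCP_RECEIVER_REPORT or report_count == 0:
        raise ValueError("not a receiver report with at least one report block")
    source_ssrc, loss_word, highest_seq, jitter = struct.unpack(
        "!IIII", packet[8:24]
    )
    return {
        "sender_ssrc": sender_ssrc,
        "source_ssrc": source_ssrc,
        "fraction_lost": (loss_word >> 24) / 256,  # top 8 bits, fixed point
        "cumulative_lost": loss_word & 0xFFFFFF,   # low 24 bits, read as unsigned
        "highest_sequence": highest_seq,
        "jitter": jitter,  # interarrival jitter in RTP timestamp units
    }
```

A server could use values such as `fraction_lost` and `jitter` to judge how well the stream is reaching the client.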

III – Conclusion

RTSP provides a means for the user to control video and audio. It does not actually transport the video and audio signals; instead, it allows the user to control them. Like a dispatcher for a delivery service, RTSP does not deliver packages itself; it controls when and how packages are delivered by other protocols, such as RTP.


