ZeePedia

TECHNOLOGICAL BACKGROUND: Components, Terminal, Protocols, SIP

[IP Telephony Cookbook] / Technological Background
2 Technological Background
This chapter provides technical background information about the protocols and components
used in IP Telephony. It introduces the relevant component types, gives detailed information
about H.323, SIP and RTP as well as information about media gateway control and
vendor-specific protocols.
2.1 Components
An IP Telephony infrastructure usually consists of different types of components. This section
gives an overview of typical components without describing them in a protocol-specific context.
2.1.1 Terminal
A terminal is a communication endpoint that terminates calls and their media streams. Most
commonly, this is either a hardware or a software telephone or videophone, possibly enhanced
with data capabilities. Some terminals are intended for user interaction, while others are
automated, e.g., answering machines.
An IP Telephony terminal is reachable at one or more IP addresses. There may well be multiple
terminals on the same IP address, but they are treated independently. Most of the time, a terminal
has been assigned one or more addresses (see Section 2.1.5), which others use to call it.
If IP Telephony servers are used, a terminal registers these addresses with its server.
2.1.2 Server
Placing an IP Telephony call requires at least two terminals, and the knowledge of the IP address
and port number of the terminal to call. Obviously, forcing the user to remember and use IP
addresses for placing calls is not ideal and dynamic IP addressing schemes (DHCP) make this
requirement even more intolerable.
As mentioned before, terminals usually register their addresses with a server. The server stores
these telephone addresses along with the IP addresses of the respective terminals, and is thus able
to map a telephone address to a host.
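The mapping kept by such a server can be pictured as a simple lookup table. The following is a minimal sketch, not a real registrar: the class name `TelephonyServer`, its methods and the example address and port are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TelephonyServer:
    """Toy registrar (hypothetical): telephone address -> (IP address, port)."""
    bindings: dict = field(default_factory=dict)

    def register(self, address: str, ip: str, port: int) -> None:
        # A terminal announces under which telephone address it can be reached.
        self.bindings[address] = (ip, port)

    def resolve(self, address: str):
        # Returns the terminal's network address, or None if nobody registered it.
        return self.bindings.get(address)


server = TelephonyServer()
server.register("+4212345678", "192.0.2.10", 1720)  # example values only
print(server.resolve("+4212345678"))
```

A real server would additionally authenticate the registration and expire stale bindings.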
When a telephone user dials an address, the server tries to resolve the given address into a
network address. To do so, the server may interact with other telephony servers or services.
It may also provide further call routing mechanisms like CPL (Call Processing Language) scripts
or skill-based routing (e.g., routing calls for 'WWW-Support' to a list of persons who are tagged
as responsible for this subject).
Finally, a telephony server is responsible for authenticating registrations, authorising calling parties
and performing the accounting.
2.1.3 Gateway
Gateways are telephony endpoints that facilitate calls between endpoints that usually would not
interoperate. Usually this means that a gateway translates one signalling protocol into another (e.g.
SIP/ISDN signalling gateways), but translating between different network addresses (IPv4/IPv6)
or codecs (media gateways) can be considered gatewaying as well. Of course, it is possible that
multiple functionalities exist in a single gateway.
Finding gateways between VoIP and a traditional PBX is usually quite simple. Gateways that
translate different VoIP protocols are harder to find. Most of them are limited to basic call
functionality.
2.1.4 Conference bridge
Conference bridges provide the means to have 3-point or multi-point conferences that can either
be ad-hoc or scheduled. Because of the high resource requirements, conference bridges are
usually dedicated servers with special media hardware.
2.1.5 Addressing
A user who wants to use a communication service needs an identifier to describe himself and the
called party. Ideally, such an identifier should be independent of the user's physical location. The
network should then be responsible for finding the current location of the called party. A given
user may choose to be reachable under multiple contact addresses.
Regular telephony systems use E.164 numbers (the international public telecommunication
numbering plan). An identifier is composed of up to fifteen digits with a leading plus sign, for
example, +1234565789123. When dialling, the leading plus is normally replaced by the
international access code, usually double zero (00). This is followed by a country code and a
subscriber number.
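The replacement of the leading plus can be sketched in a few lines. The function name `to_dial_string` is an assumption made for illustration, and the default access code of 00 is only the common case mentioned above.

```python
def to_dial_string(e164: str, access_code: str = "00") -> str:
    """Turn '+<digits>' into the digit string a user would actually dial."""
    if not e164.startswith("+"):
        raise ValueError("expected a leading plus sign")
    digits = e164[1:]
    # E.164 allows at most fifteen digits after the plus sign.
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 15:
        raise ValueError("expected 1 to 15 digits after the plus sign")
    return access_code + digits


print(to_dial_string("+1234565789123"))
```

Countries that use a different international access code would pass it as `access_code`.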
The first IP Telephony systems used the IP addresses of end-point devices as user identifiers.
Sometimes they are still used now. However, IP addresses are not location-independent (even if
IPv6 is used) and they are hard to remember (especially if IPv6 is used) so they are not suitable as
user identifiers.
Current IP Telephony systems use two kinds of identifiers:
- URIs (RFC2396);
- Numbers (E.164).
Some systems tried to use names (alpha-numeric strings), but this led to a flat naming space and
thus limited zones of applicability.
A Uniform Resource Identifier (URI) uses a registered naming space to describe a resource in a
location-independent way. Resources are available under a variety of naming schemes and access
methods including e-mail addresses (mailto), SIP identifiers (sip), H.323 identifiers (h323,
RFC3508) or telephone numbers (draft-ietf-iptel-rfc2806bis-02). E-mail-like identifiers have
several advantages. They are easy to remember, nearly every Internet user already has an e-mail
address and a new service can be added using the same identifier. The user location can be found
using the Domain Name System (DNS). The disadvantage of URIs is that they are difficult or
impossible to dial on some user devices (phones).
If we want to integrate a regular telephony system with IP Telephony, we must deal with phone
number identifiers even on the IP Telephony side. The numbers are not well suited for an
Internet world relying on domain names. Therefore, the ENUM system was invented, using
adapted phone numbers as domain names. ENUM is described in Chapter 7.
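As a preview of how a phone number is adapted into a domain name, the sketch below follows the convention of the ENUM specification (digits reversed, separated by dots, under `e164.arpa`). The function name is an assumption, and Chapter 7 remains the reference for the actual lookup procedure.

```python
def e164_to_enum_domain(e164: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 number such as '+4312345'."""
    # Keep only the digits, dropping the plus sign and any separators.
    digits = [c for c in e164 if c.isascii() and c.isdigit()]
    if not digits:
        raise ValueError("no digits found")
    # The least significant digit becomes the leftmost DNS label.
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+4312345"))
```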
2.2 Protocols
2.2.1 H.323
The H.323 Series of Recommendations evolved out of the ITU-T's work on video telephony
and multimedia conferencing. After completing standardisation on video telephony and
videoconferencing for ISDN at up to 2 Mbit/s in the H.320 series, the ITU-T took on work on
similar multimedia communication over ATM networks (H.310, H.321), over the analogue Public
Switched Telephone Network (PSTN) using modem technology (H.324), and over the stillborn
Isochronous Ethernet (H.322). The most widely-adopted and hence most promising network
infrastructure, and the one posing the greatest difficulties in achieving well-defined Quality of
Service, was addressed from the beginning of 1995 in H.323: Local Area Networks, with the focus
on IP as the network layer protocol. The primary goal was to interface multimedia
communication equipment on LANs to the reasonably well-established base on circuit-switched
networks.
The initial version of H.323 was approved by the ITU-T about one year later, in June 1996,
thereby providing a base on which the industry could converge. The initial focus was clearly on
local network environments, because QoS mechanisms for IP-based wide area networks, such as
the Internet, were not well established at this point. In early 1996, Internet-wide deployment of
H.323 was already explicitly included in the scope, as was the aim to support voice-only
applications and, thus, the foundations to use H.323 for IP Telephony were laid. H.323 has
continuously evolved towards becoming a technically sound and functionally rich protocol
platform for IP Telephony applications. The first major additions to this end were included in
H.323 version 2, approved by the ITU-T in January 1998. In September 1999, H.323v3 was
approved by the ITU-T, incorporating numerous further functional and conceptual extensions to
enable H.323 to serve as a basis for IP Telephony on a global scale as well as to meet
requirements in enterprise environments. Moreover, many new enhancements were introduced
into the H.323 protocol. Version 4 was approved on November 17, 2000 and contains
enhancements in a number of important areas, including reliability, scalability, and flexibility.
New features help facilitate more scalable Gateway and MCU solutions to meet the growing
market requirements. H.323 has been the undisputed leader in voice, video, and data conferencing
on packet networks, and Version 4 endeavours to keep H.323 ahead of the competition.
2.2.1.1 Scope
As stated before, the scope of H.323 encompasses multimedia communication in IP-based
networks, with significant consideration given to gatewaying to circuit-switched networks (in
particular to ISDN-based video telephony and to PSTN/ISDN/GSM for voice communication).
[Figure: H.323 terminals, gatekeeper, MCU and gateway on the Internet/intranet, with the gateway
linking to ISDN (H.320), PSTN (H.324), ATM (H.310, H.321) and SIP networks]
Figure 2.1 Scope and components defined in H.323
H.323 defines a number of functional / logical components as shown in Figure 2.1:
- Terminal
Terminals are H.323-capable endpoints, which may be implemented in software on
workstations or as stand-alone devices (such as telephones). They are assigned one or more
aliases (e.g. a user's name/URI) and/or telephone number(s);
- Gateway
Gateways interconnect H.323 entities (such as endpoints, MCUs, or other gateways) to other
network/protocol environments (such as the telephone network). They are also assigned one or
more aliases and/or telephone number(s). The H.323 Series of Recommendations provides
detailed specifications for interfacing H.323 to H.320, ISDN/PSTN, and ATM-based networks.
Recent work also addresses control and media gateway specifications for telephony trunking
networks such as SS7/ISUP;
- Gatekeeper
The gatekeeper is the core management entity in an H.323 environment. It is, among other
things, responsible for access control, address resolution and H.323 network (load) management
and provides the central hook to implement any kind of utilisation / access policies. An H.323
environment is subdivided into zones (which may, but need not be congruent with the
underlying network topology); each zone is controlled by one primary gatekeeper (with
optional backup gatekeepers). Gatekeepers may also provide added value, e.g., act as a
conferencing bridge or offer supplementary call services. An H.323 Gatekeeper can also be
equipped with the proxy feature. Such a feature enables the routing through the gatekeeper of
the RTP traffic (audio and video) and the T.120 traffic (data), so no traffic is directly exchanged
between endpoints. (It could be considered a kind of IP-to-IP gateway that can be used for
security and QoS purposes);
- Multipoint Controller (MC)
A Multipoint Controller is a logical entity that interconnects the call signalling and conference
control channels of two or more H.323 entities in a star topology. MCs coordinate the (control
aspects of) media exchange between all entities involved in a conference. They also provide the
endpoints with participant lists, exercise floor control, etc. MCs may be embedded in any H.323
entity (terminals, gateways, gatekeepers) or implemented as stand-alone entities. They can be
cascaded to allow conferences spanning multiple MCs;
- Multipoint Processor (MP)
For multipoint conferences with H.323, an optional Multipoint Processor may be used that
receives media streams from the individual endpoints, combines them through some
mixing/switching technique, and transmits the resulting media streams back to the endpoints;
- Multipoint Control Unit (MCU)
In the H.323 world, an MCU is simply a combination of an MC and an MP in a single device.
The term originates in the ISDN videoconferencing world where MCUs were needed to
create multipoint conferences out of a set of point-to-point connections.
2.2.1.2 Signalling protocols
H.323 resides on top of the basic Internet Protocols (IP, IP Multicast, TCP, and UDP) in a similar
way as the IETF protocols discussed in the next subsection, and can make use of integrated and
differentiated services along with resource reservation protocols.
[Figure: protocol stack with audio and video (RTP/RTCP), gatekeeper RAS (H.225.0), conference
control (H.225.0, H.245), data applications (T.120) and RSVP, carried over UDP, reliable multicast
and TCP + RFC 1006, on top of IP / IP Multicast with Integrated / Differentiated Services forwarding]
Figure 2.2 H.323 protocol architecture
For basic call signalling and conference control interactions with H.323, the aforementioned
components communicate using three control protocols:
- H.225.0 Registration, Admission, and Status (RAS)
The RAS channel is used for communication between H.323 endpoints and their gatekeeper
and for some inter-gatekeeper communication. Endpoints use RAS to register with their
gatekeeper, to request permission to utilise system resources, to have addresses of remote
endpoints resolved, etc. Gatekeepers use RAS to keep track of the status of their associated
endpoints and to collect information about actual resource utilisation after call termination.
RAS provides mechanisms for user/endpoint authentication and call authorisation;
- H.225.0 Call Signalling
The call signalling channel is used to signal call setup intention, success, failures, etc., as well as
to carry operations for supplementary services (see below). Call signalling messages are derived
from Q.931 (ISDN call signalling); however, simplified procedures and only a subset of the
messages are used in H.323. The call signalling channel is used end-to-end between calling
party and called party and may optionally run through one or more gatekeepers (the call
signalling models are described later in the 'Signalling models' section).
Optimisations: Since version 3, H.225.0 supports the following enhancements:
- Multiple Calls - To avoid using a dedicated TCP connection for each call, gateways can
be built to handle multiple calls on each connection.
- Maintain Connection - Similar to Multiple Calls, this enhancement reduces the need
to open new TCP connections. After the last call has ended, the endpoint may decide to
maintain the TCP connection to provide a better call setup time for the next call.
The primary use of both enhancements is in communication between servers (gatekeeper,
MCU) or gateways. While, in theory, both mechanisms were possible before, beginning with
H.323v3, the messages contain fields to indicate support for the mechanisms;
- H.245 Conference Control
The conference control channel is used to establish and control two-party calls (as well as
multiparty conferences). Its functionality includes determining possible modes for media
exchange (e.g., select media encoding formats that both parties understand) and configuring
actual media streams (including exchanging transport addresses to send media streams to and
receive them from). H.245 can be used to carry user input (such as DTMF), enables
confidential media exchange, and defines syntax and semantics for multipoint conference
operation (see below). Finally, it provides a number of maintenance messages. This logical
channel may also (optionally) run through one or more gatekeepers, or directly between calling
party and called party (please refer to the 'Signalling models' section for details).
It should be noted that H.245 is a legacy protocol inherited from the collective work on
multimedia conferencing over ATM, PSTN and other networks. Hence it carries a lot of fields
and procedures that do not apply to H.323 but make the protocol specification quite
heavyweight.
Optimisations:
The conference control channel is also subject to optimisations. By default, it is transported
over an exclusive TCP connection but it may also be tunnelled within the signalling connection
(H.245 tunnelling). Other optimisations deal with the call setup time. The last chance to start an
H.245 channel is on receipt of the CONNECT message, which implies that no media is
transmitted during the first seconds after the user accepts the call. H.245 may also start in parallel
with the setup of the H.225 call signalling, which is not really a new feature but another way of
dealing with H.245. Vendors often call this Early Connect or Early Media. Since H.323v2, it is
possible to start a call using a less powerful but sufficient capability exchange by simply offering
possible media channels that just have to be accepted. This procedure, called FastConnect or
FastStart, requires fewer round trips and is transported over the H.225 channel. After the
FastConnect procedure is finished or when it fails, the normal H.245 procedures start.
A number of extensions to H.323 include mechanisms for more efficient call setup (H.323 Annex
E) and reduction of protocol overhead, e.g., for simple telephones (SETs, Simple Endpoint Types,
H.323 Annex F).
2.2.1.3 Gatekeeper discovery and registration
An H.323 endpoint usually registers with a gatekeeper that provides basic services like address
resolution for calling other endpoints. There are two possibilities for an endpoint to find its
gatekeeper:
- Multicast discovery
The endpoint sends a gatekeeper request (GRQ) to a well-known multicast address
(224.0.1.41) and port (1718). Receiving gatekeepers may confirm their responsibility for the
endpoint (GCF) or ignore the request;
- Configuration
The endpoint knows the IP address of the gatekeeper by manual configuration. While there is
no need for a gatekeeper request (GRQ) to be sent to the preconfigured gatekeeper, some
products need this protocol step. If a gatekeeper receives a GRQ via unicast, it must either
confirm (GCF) the request or reject it (GRJ).
When trying to discover the gatekeeper via multicast, an endpoint may request any gatekeeper or
narrow the request by adding a gatekeeper identifier to it. Only the gatekeeper that has
the requested identifier may reply positively (see Figure 2.3).
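[Figure: an endpoint (h323:prelle) sends GRQ:id1 to two gatekeepers, id1 and id2; gatekeeper id1
answers with GCF and gatekeeper id2 with GRJ; the endpoint then sends RRQ:prelle and receives RCF]
Figure 2.3 Discovery and registration process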
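The gatekeeper's side of the discovery rules can be sketched as a small decision function. This is a simplified model under stated assumptions: the function name `handle_grq` and its parameters are hypothetical, and a non-matching gatekeeper is modelled as staying silent on multicast requests, although it may equally answer with an explicit GRJ as Figure 2.3 shows.

```python
from typing import Optional

GK_MULTICAST_ADDR = "224.0.1.41"  # well-known discovery address from the text
GK_DISCOVERY_PORT = 1718          # well-known discovery port from the text


def handle_grq(own_id: str, requested_id: Optional[str],
               via_multicast: bool) -> Optional[str]:
    """Return 'GCF', 'GRJ', or None when the request is ignored."""
    # A request without an identifier addresses any gatekeeper.
    if requested_id is None or requested_id == own_id:
        return "GCF"
    # Multicast requests may be ignored; unicast requests must be answered.
    return None if via_multicast else "GRJ"


print(handle_grq("id1", "id1", via_multicast=True))
```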
After the endpoint discovers the location of the gatekeeper, it tries to register itself (RRQ). Such
a registration includes (among other information):
- The addresses of the endpoint - for a terminal, this may be the user ids or telephone numbers.
An endpoint may have more than one address. In theory it is possible that addresses belong to
different users to enable multiple users to share a single phone - in practice, this depends on the
phones and the gatekeeper implementation;
- Prefixes - if the registering endpoint is a gateway it may register number prefixes instead of
addresses;
- Time to live - an endpoint may request how long the registration will last. This value can be
overridden by gatekeeper policies.
The gatekeeper checks the requested registration information and confirms the (possibly
modified) values (RCF). It may also reject a registration request because of, for example, invalid
addresses. In the case of a confirmation, the gatekeeper assigns a unique identifier to the endpoint,
which will be used in subsequent requests to indicate that the endpoint is still registered.
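The registration exchange can be illustrated with a toy gatekeeper. This is a minimal sketch, not an H.225.0 implementation: the class, the dictionary-shaped replies, the `max_ttl` policy and the identifier format are assumptions. RRJ (registration reject) is the counterpart of RCF, and the rule that an alias already in use is rejected is one possible policy.

```python
import itertools


class Gatekeeper:
    """Toy model (hypothetical) of RRQ handling."""

    def __init__(self, max_ttl: int = 300):
        self.max_ttl = max_ttl          # policy limit, in seconds
        self.by_alias = {}              # alias -> endpoint identifier
        self._ids = itertools.count(1)

    def rrq(self, aliases, requested_ttl: int) -> dict:
        # Reject empty requests and aliases that another endpoint already holds.
        if not aliases or any(a in self.by_alias for a in aliases):
            return {"type": "RRJ"}
        endpoint_id = f"ep-{next(self._ids)}"
        for alias in aliases:
            self.by_alias[alias] = endpoint_id
        # The gatekeeper may shorten the time to live the endpoint asked for.
        ttl = min(requested_ttl, self.max_ttl)
        return {"type": "RCF", "endpoint_id": endpoint_id, "ttl": ttl}


gk = Gatekeeper(max_ttl=300)
print(gk.rrq(["prelle"], requested_ttl=3600))
```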
2.2.1.3.1 Addresses and registrations
H.323 defines and utilises several address types. The one most commonly used and derived from
the PSTN world is the dialled digit address, which is defined as a number dialled by the endpoint.
It does not include further information (e.g., about the dial plan) and needs to be interpreted by
the server. The server might convert the dialled number into a party number that includes
information about the type of number and the dial plan.
To provide alphanumeric or name dialling, H.323 supports H.323-IDs that represent either
usernames or e-mail-like addresses, or the more general approach of URL-IDs, which represent
any kind of URL.
Unlike SIP addresses, an H.323 address can only be registered by one endpoint (per zone), so a
call to that address only resolves to a single endpoint. Reaching multiple destinations with one
call in H.323 requires a gatekeeper that actively maps a single address to multiple different
addresses and tries to contact them in sequence.
2.2.1.3.2 Updating registrations
A registration expires after a defined time and must therefore be refreshed, i.e., kept alive by
subsequent registrations which include the previously-assigned endpoint identifier. To reduce the
overhead of these regular registrations, H.323 supports KeepAlive registrations that contain
only the previously-assigned endpoint identifier. Of course, these registrations may only be sent if
the registration information is unchanged.
A request registering a large number of addresses could exceed the size of a
UDP packet, so H.323v4 supports Additive Registration, a mechanism that allows an endpoint
to send multiple registration requests (RRQ) in which the addresses do not replace existing
registrations but are submitted in addition to them.
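The difference between a KeepAlive and an additive registration can be sketched as two operations on a stored registration. All names here are illustrative assumptions, time is passed in explicitly as seconds, and the choice to let an additive registration also refresh the expiry is an assumption rather than something stated above.

```python
class Registration:
    """Toy model (hypothetical) of one endpoint's registration state."""

    def __init__(self, endpoint_id: str, aliases, ttl: int, now: float):
        self.endpoint_id = endpoint_id
        self.aliases = set(aliases)
        self.ttl = ttl
        self.expires_at = now + ttl

    def keep_alive(self, endpoint_id: str, now: float) -> bool:
        # Only the previously assigned identifier is sent; nothing else changes.
        if endpoint_id != self.endpoint_id or now >= self.expires_at:
            return False
        self.expires_at = now + self.ttl
        return True

    def additive(self, endpoint_id: str, more_aliases, now: float) -> bool:
        # New aliases are added to the existing ones instead of replacing them.
        if not self.keep_alive(endpoint_id, now):
            return False
        self.aliases |= set(more_aliases)
        return True


reg = Registration("ep-1", ["100"], ttl=60, now=0)
reg.additive("ep-1", ["101", "102"], now=40)
print(sorted(reg.aliases))
```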
2.2.1.4 Signalling models
Call signalling messages and H.245 control messages may be exchanged either end-to-end
between calling party and called party or through a gatekeeper. Depending on the role the
gatekeeper plays in the call signalling and in the H.245 signalling, the H.323 specification foresees
three different types of signalling models:
- Direct signalling
With this signalling model, only H.225.0 RAS messages are routed through the gatekeeper
while the other logical channel messages are directly exchanged between the two endpoints;
- Gatekeeper-routed call signalling
With this signalling model, H.225.0 RAS and H.225.0 call signalling messages are routed
through the gatekeeper, while the H.245 Conference Control messages are directly exchanged
between the two endpoints;
- Gatekeeper-routed H.245 control
With this signalling model, H.225.0 RAS, H.225.0 call signalling and H.245 Conference Control
messages are routed through the gatekeeper and
only the media streams are directly exchanged between the two endpoints.
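The three models differ only in which logical channels pass through the gatekeeper, which can be written down as a small table. The enum and the model keys below are illustrative names, not identifiers from the H.323 specification.

```python
from enum import Enum


class Channel(Enum):
    RAS = "H.225.0 RAS"
    CALL_SIGNALLING = "H.225.0 call signalling"
    H245 = "H.245 conference control"
    MEDIA = "media streams"


# Channels that are routed through the gatekeeper in each signalling model.
GATEKEEPER_ROUTED = {
    "direct": {Channel.RAS},
    "gk_routed_call_signalling": {Channel.RAS, Channel.CALL_SIGNALLING},
    "gk_routed_h245": {Channel.RAS, Channel.CALL_SIGNALLING, Channel.H245},
}


def goes_through_gatekeeper(model: str, channel: Channel) -> bool:
    return channel in GATEKEEPER_ROUTED[model]


print(goes_through_gatekeeper("gk_routed_call_signalling", Channel.H245))
```

In all three models the media streams themselves are exchanged directly between the endpoints.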
The following sub-sections detail each signalling model. The figures displayed in this section apply
both to the use of a single gatekeeper and to the use of a gatekeeper network. Since the signalling
model is decided by the configuration of the endpoint's gatekeeper and applies to all the messages
the gatekeeper handles, the extension to multiple gatekeepers is straightforward (each gatekeeper
involved simply applies the signalling model defined in the list above), except for the location of
zone-external targets (described later in the 'Locating zone external targets' section). Message
exchanges between gatekeepers are not shown in the figures in this section; they are assumed to
remain within the ellipse in which the H.323 Gatekeeper is depicted and are also described in the
'Locating zone external targets' section. Please note that call termination is not covered in the
sub-section of each signalling model. Please refer to the 'Communication phases' section for details.
2.2.1.4.1 Direct signalling model
The direct signalling model is depicted in Figure 2.4. In this model, the H.225.0 Call Signalling
and H.245 Conference Control messages are exchanged directly between the call terminals. As
shown in the figure, the communication starts with an ARQ (Admission ReQuest) message
sent by the calling party (which may be either a terminal or a gateway) to the gatekeeper. The
ARQ message is used by the endpoint to request access to the packet-based network from the
gatekeeper, which either grants the request with an ACF (Admission ConFirm) or denies it
with an ARJ (Admission ReJect). If an ARJ is issued, the call is terminated. After this first step,
the call signalling part of the call begins with the transmission of the SET UP message from the
calling party to the called party. The transport address of the SET UP message (and of all the
H.225.0 call signalling messages) is retrieved by the calling party from the destCallSignalAddress
field carried inside the ACF received. In the case of the direct signalling model, it is the address of
the destination endpoint. Upon receiving the SET UP message, the called party starts its H.225.0
RAS procedure with the gatekeeper. If successful, a CONNECT message is sent back to the
calling party to indicate acceptance of the call. Before sending the CONNECT message, two
other messages may be sent from the called party to the calling party (those two messages are not
depicted in the figure since only mandatory messages are shown):
- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been
initiated (in everyday terms, the `phone is ringing');
- CALL PROCEEDING message
This message may be sent by the called user to indicate that requested call establishment has
been initiated and no more call-establishment information will be accepted.
Figure 2.4 Direct signalling model
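The message sequence of the direct model can be listed programmatically. This is a sketch under stated assumptions: the function name is hypothetical, the called party's RAS procedure is shown as its own ARQ/ACF exchange, and the optional messages are placed just before CONNECT, which is only one possible ordering.

```python
def direct_call_flow(include_optional: bool = False):
    """Return (sender, receiver, message) tuples for a successful call setup."""
    flow = [
        ("caller", "gatekeeper", "ARQ"),
        ("gatekeeper", "caller", "ACF"),
        ("caller", "callee", "SET UP"),   # sent directly to the called party
        ("callee", "gatekeeper", "ARQ"),  # called party's own admission request
        ("gatekeeper", "callee", "ACF"),
    ]
    if include_optional:
        flow.append(("callee", "caller", "CALL PROCEEDING"))
        flow.append(("callee", "caller", "ALERTING"))
    flow.append(("callee", "caller", "CONNECT"))
    return flow


for sender, receiver, message in direct_call_flow(include_optional=True):
    print(f"{sender:>10} -> {receiver:<10} {message}")
```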
The CONNECT message closes the H.225.0 call signalling part of the call and prompts the
terminals to start the H.245 conference control part. In this call model, the H.245 Conference
Control messages are exchanged directly between the two endpoints (the correct 'h245Address'
was retrieved from the CONNECT message itself). The procedures started with the H.245
Conference Control channel are used to:
- allow the exchange of audiovisual and data capabilities, with the TERMINAL CAPABILITY
messages;
- request the transmission of a particular audiovisual and data mode, with the LOGICAL
CHANNEL SIGNALLING messages;
- manage the logical channels used to transport the audiovisual and data information;
- establish which terminal is the master terminal and which is the slave terminal for the purposes
of managing logical channels, with the MASTER SLAVE DETERMINATION messages;
- carry various control and indication signals;
- control the bit rate of individual logical channels and the whole multiplex, with the
MULTIPLEX TABLE SIGNALLING messages;
- measure the round trip delay, from one terminal to the other and back, with the ROUND
TRIP DELAY messages.
Once the H.245 conference control messages are exchanged, the two endpoints have all the
necessary information to open the media streams.
2.2.1.4.2 Gatekeeper-routed call signalling model
The gatekeeper-routed call signalling model is depicted in Figure 2.5. In this model, the H.245
Conference Control messages are exchanged directly between the call termination clients. With
each call, the communication starts with an ARQ message (Admission ReQuest) sent by the
calling party to its gatekeeper. The ARQ message is used by the endpoint to request access to the
packet-based network from the gatekeeper, which either grants the request with an ACF
(Admission ConFirm) or denies it with an ARJ (Admission ReJect). After this first step, the
call signalling part of the call begins with the transmission of the SET UP message from the calling
party to its gatekeeper. The transport address of the SET UP message (and of all the H.225.0 call
signalling messages) is retrieved by the calling party from the destCallSignalAddress field, carried
inside the ACF received. In the case of the gatekeeper-routed call signalling model, it is the
address of the gatekeeper itself. The SET UP message is then forwarded by the gatekeeper (or by
the gatekeeper network) to the called endpoint. Upon receiving the SET UP message, the called
party starts its H.225.0 RAS procedure with its gatekeeper. If successful, a CONNECT message is
sent to indicate acceptance of the call. Because of the call model, this message is also sent to the
called endpoint's gatekeeper, which is in charge of forwarding it to the calling party endpoint
(either directly or using the gatekeeper network). Before sending the CONNECT message, two
other messages may be sent from the called party to its gatekeeper (those two messages are not
depicted in the figure since only mandatory messages are shown):
- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been
initiated (in everyday terms, the `phone is ringing');
- CALL PROCEEDING message
This message may be sent by the called user to indicate that requested call establishment has
been initiated and no more call establishment information will be accepted.
Figure 2.5 Gatekeeper-routed call signalling model
The two optional messages listed above are then forwarded by the gatekeeper (or by the
gatekeeper network) to the calling party. After receiving the CONNECT message, the calling
party starts the H.245 Conference Control channel procedures directly with the called party (the
correct h245Address was retrieved from the CONNECT message itself). The scope of the H.245
Conference Control channel procedure is the same as detailed above. Please refer to the 'Direct
signalling model' section for details.
2.2.1.4.3 Gatekeeper-routed H.245 control model
The gatekeeper-routed H.245 control model is depicted in Figure 2.6. In this model, only the
media streams are exchanged directly between the call termination clients. For each call, the
communication starts with an ARQ (Admission ReQuest) message sent by the calling party to
its gatekeeper. The ARQ message is used by the endpoint to request access to the packet-based
network from the gatekeeper, which either grants the request with an ACF (Admission ConFirm)
or denies it with an ARJ (Admission ReJect). After this first step, the call signalling part of the
call begins with the transmission of the SET UP message from the calling party to its gatekeeper.
The transport address of the SET UP message (and of all the H.225.0 call signalling messages) is
retrieved by the calling party from the destCallSignalAddress field carried inside the ACF
received. In the case of the gatekeeper-routed H.245 control model, it is the address of the gatekeeper
itself. The SET UP message is then forwarded by the gatekeeper (or by the gatekeeper network)
to the called endpoint. Upon receiving the SET UP message, the called party starts its H.225.0
RAS procedure with its gatekeeper. If successful, a CONNECT message is sent to indicate
acceptance of the call. Because of the call model, this message is also sent to the called endpoint's
gatekeeper, which is in charge of forwarding it to the calling party endpoint (either directly or
using the gatekeeper network). Before sending the CONNECT message, two other messages may
be sent from the called party to its gatekeeper (those two messages are not depicted in the figure
since only mandatory messages are shown):
- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been
initiated (in everyday terms, the `phone is ringing');
- CALL PROCEEDING message
This message may be sent by the called user to indicate that requested call establishment has
been initiated and no more call establishment information will be accepted.
Figure 2.6 Gatekeeper-routed H.245 control model
The two optional messages listed above are then forwarded by the gatekeeper (or by the
gatekeeper network) to the calling party. After receiving the CONNECT message, the calling
party starts the H.245 Conference Control channel procedures with its gatekeeper (the correct
h245Address was retrieved from the CONNECT message itself). All of the H.245 channel
messages are then exchanged by the endpoints with their gatekeeper (or gatekeepers). It is the
gatekeeper (or gatekeeper network) which takes care of forwarding them up to the remote
endpoint as foreseen by the gatekeeper-routed H.245 control model. The scope of the H.245
Conference Control channel procedure is the same as detailed above. Please refer to the `Direct
signalling model' section for details.
{ 2.2.1.5 Communication phases
In H.323, a communication can be divided into five phases:
- Call set up;
- Initial communication and capability exchange;
- Establishment of audiovisual communication;
- Call services;
- Call termination.
2.2.1.5.1 Call setup
Recommendation H.225.0 defines the call setup messages and procedures detailed here. The
recommendation foresees that requests for bandwidth reservation should take place at the earliest
possible phase. Unlike other protocols, there is no explicit synchronisation between two endpoints
during the call setup procedure (two endpoints can send a SET UP message to each other at
exactly the same time). Actions to be taken when problems of synchronisation during the
exchange of SET UP messages arise are resolved by the application itself. Applications not
supporting multiple simultaneous calls should issue a busy signal when they have an outstanding
SET UP message, while applications supporting multiple simultaneous calls issue a busy signal
only to the same endpoint to which they sent an outstanding SET UP message. Moreover, an
endpoint should be capable of sending the ALERTING messages. ALERTING means that the
called party has been alerted of an incoming call (`phone ringing', in the language of the old
telephony). Only the ultimate called endpoint originates the ALERTING message and only when
the application has already alerted the user. If a gateway is involved, the gateway sends
ALERTING when it receives a ring indication from the Switched Circuit Network (SCN).
The sending of an ALERTING message is not required if an endpoint can respond to a SET UP
message with a CONNECT, CALL PROCEEDING, or RELEASE COMPLETE within four
seconds. Conversely, after successfully transmitting a SET UP message, an endpoint can expect to
receive an ALERTING, CONNECT, CALL PROCEEDING, or RELEASE COMPLETE message
within four seconds. Finally, to maintain the consistency of the meaning of the
CONNECT message between packet-based networks and circuit-switched networks, the
CONNECT message should be sent only if it is certain that the capability exchange will
successfully take place and a minimum level of communications can be performed.
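The busy-signal rule for simultaneous SET UP messages can be illustrated with a minimal Python sketch. The function name, its parameters and the use of plain strings as endpoint identifiers are assumptions made for this illustration only; they are not taken from H.225.0.

```python
def should_signal_busy(supports_multiple_calls, outstanding_setup_to, incoming_setup_from):
    """Decide whether an incoming SET UP is answered with a busy signal.

    outstanding_setup_to: set of endpoints to which this application has
    sent a SET UP that is still unanswered (hypothetical representation).
    """
    if not outstanding_setup_to:
        # No outstanding SET UP, so there is no synchronisation problem.
        return False
    if not supports_multiple_calls:
        # Single-call applications are busy while any SET UP is outstanding.
        return True
    # Multi-call applications are busy only towards the endpoint they
    # are themselves trying to call.
    return incoming_setup_from in outstanding_setup_to
```

For example, a multi-call application with an outstanding SET UP to endpoint "B" would accept a SET UP from "C" but signal busy to "B".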
The call setup phase may have different realisations:
- basic call setup when neither endpoint is registered
In this call setup the two endpoints communicate directly;
- both endpoints registered to the same gatekeeper
In this call setup the communication is decided by the signalling model configured on the
gatekeeper;
- only calling endpoint has gatekeeper
In this call setup only the calling party sends messages to the gatekeeper depending on the
signalling models configured while the called party sends the messages directly to the calling
party endpoint;
- only called endpoint has gatekeeper
In this call setup only the called party sends messages to the gatekeeper depending on the
signalling models configured while the calling party sends the messages directly to the called
endpoint;
- both endpoints registered to different gatekeepers
Each of the two endpoints communicates with its gatekeeper depending on the signalling
model configured. Additional H.225.0 RAS messages may be exchanged between the gatekeepers in
order to retrieve location information (see `Locating zone-external targets' section for more
details);
- call setup with Fast Connect procedure
In this call setup, the media channels are established using the Fast Connect procedure. The
Fast Connect procedure speeds up the establishment of a basic point-to-point call (only one
round-trip message exchange is needed), enabling immediate media stream delivery upon call
connection. The Fast Connect procedure is started if the calling endpoint initiates it by
sending a SET UP message containing the FastStart element (to advise it is going to use the
Fast Connect procedure).
This element contains, among other things, a sequence of all of the parameters
necessary to immediately open and begin transferring media on the channels. The Fast Connect
procedure may be refused by the called endpoint (either because it wants to
use features requiring H.245 or because it does not implement Fast Connect). The Fast Connect
procedure may be refused with any H.225.0 call signalling message, up to and including the
CONNECT one. Refusing the Fast Connect procedure (or not initiating it) requires that H.245
procedures be used for the exchange of capabilities and the opening of media channels. Moreover,
the Fast Connect procedure allows more information for the scope of H.323/SIP gatewaying
(further details to be found in Chapter 4);
- call setup via gateways
When a gateway is involved, the call setup between it and the network endpoint is the same as
the endpoint-to-endpoint call setup;
- call setup with an MCU
When an MCU is involved, all endpoints exchange call signalling with the MCU (and with the
interested gatekeepers, if any). No changes are foreseen between an endpoint and the MCU call
setup since it proceeds the same as the endpoint-to-endpoint;
- broadcast call setup
This kind of call setup follows the procedures defined in Recommendation H.332.
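The choice between Fast Connect and the regular H.245 procedures, described in the Fast Connect item above, can be summarised in a small Python sketch. The function and parameter names are hypothetical and only mirror the prose; a real stack would inspect the FastStart element itself.

```python
def media_setup_procedure(setup_has_fast_start, callee_accepts_fast_connect):
    """Return the procedure used to open the media channels of a call."""
    if setup_has_fast_start and callee_accepts_fast_connect:
        # Caller offered FastStart and the callee did not refuse it.
        return "Fast Connect"
    # Fast Connect was not initiated or was refused: H.245 procedures are
    # needed for capability exchange and for opening media channels.
    return "H.245"
```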
2.2.1.5.2 Initial communication and capability exchange
After exchanging call setup messages, the endpoints, if they plan to use H.245, establish the H.245
Control Channel. The H.245 Control Channel is used for the capability exchange and to open
the media channels. The H.245 Control Channel can also be opened on reception of
ALERTING or CALL PROCEEDING messages, rather than waiting for CONNECT. If CONNECT
does not arrive, or when an endpoint sends RELEASE COMPLETE, the H.245 Control Channel is
closed. H.323 endpoints support the capabilities exchange procedure of H.245. The H.245
TERMINALCAPABILITYSET message is used for the exchange of endpoint system capabilities.
This message is the first H.245 message sent.
The master-slave determination procedure of H.245 must be supported by H.323-compliant
endpoints. In multipoint conferencing, where Multipoint Controller (MC) capability is present in
more than one endpoint, master-slave determination is used for determining which MC will play an
active role. The H.245 Control Channel procedure also provides master-slave determination for
opening bi-directional channels for data.
After Terminal Capability Exchange has been initiated, a master-slave determination
procedure (consisting of either MASTERSLAVEDETERMINATION or
MASTERSLAVEDETERMINATIONACK) has to be started as the first H.245 Conference
Control procedure. Upon failure of initial capability exchange or master-slave determination
procedures, a maximum of two retries are performed before the endpoint passes to the Call
Termination phase. Normally, after successful completion of the requirements of this phase, the
endpoints proceed directly to establishment of the audiovisual communication phase.
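The retry rule for this phase (one initial attempt plus at most two retries before moving to call termination) can be sketched in Python as follows. The callable `attempt` is a hypothetical stand-in for running capability exchange and master-slave determination once; it is assumed to return True on success.

```python
def initial_h245_phase(attempt, max_retries=2):
    """Run the initial H.245 procedures and return the next call phase."""
    for _ in range(1 + max_retries):      # first try plus the retries
        if attempt():
            return "establishment of audiovisual communication"
    # All attempts failed: the endpoint passes to the Call Termination phase.
    return "call termination"
```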
2.2.1.5.2.1 Encapsulation of H.245 messages within H.225.0 call signalling messages
Encapsulation of H.245 messages inside H.225.0 call signalling messages instead of establishing a
separate H.245 channel is possible in order to save resources, synchronise call signalling and
control and reduce call setup time. This process is called `encapsulation' or `tunnelling' of H.245
messages. This procedure allows the terminal to copy the encoded H.245 message using one
structure inside the data of the Call Signalling Channel. If tunnelling is used, any H.225.0 call
signalling message may contain one or more H.245 messages. If there is no need to send an
H.225.0 call signalling message when an H.245 message has to be transmitted, a FACILITY
message is sent detailing (with appropriate fields inside) the reason for such a message.
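As a rough illustration of the tunnelling rule, the following Python sketch picks the H.225.0 message that carries the encoded H.245 messages. The dictionary layout and the field names `message_type` and `tunnelled_h245` are assumptions for this example and do not reproduce the actual H.225.0 encoding.

```python
def tunnel_h245(h245_messages, pending_signalling_message=None):
    """Attach encoded H.245 messages to an H.225.0 call signalling message.

    If no call signalling message is waiting to be sent, a FACILITY
    message is used as the carrier.
    """
    carrier = pending_signalling_message or "FACILITY"
    return {"message_type": carrier, "tunnelled_h245": list(h245_messages)}
```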
2.2.1.5.3. Establishment of audiovisual communication
The establishment of audiovisual communication follows the procedures of Recommendation
H.245. Logical channels for the various information streams are opened using the H.245
procedures. The audio and video streams are transported using an unreliable protocol, while data
communications are transported using a reliable protocol. The transport address that the receiving
endpoint has assigned to a specific logical channel (audio, video or data) is transported by the
OPENLOGICALCHANNELACK message (an example is given in Figure 2.7). That transport
address is used to transmit the information stream associated with that logical channel.
Figure 2.7 OPENLOGICALCHANNELACK message content
2.2.1.5.4. Call services
When the call is active, the terminal may request additional call services. Among the services
reported here are the Bandwidth Change Services and Supplementary Services.
With Bandwidth Change Services, during a conference, the endpoints or gatekeeper (if involved) may, at any time,
request an increase or decrease in the call bandwidth. If the aggregate bit rate of all transmitted
and received channels does not exceed the current call bandwidth, then an endpoint may change
the bit rate of a logical channel without requesting a bandwidth change. After requesting a
bandwidth change, the endpoint waits for confirmation prior to actually changing the bit rate
(confirmation usually comes from the gatekeeper). Asking for call bandwidth changes is
performed using a BANDWIDTH CHANGE REQUEST (BRQ) message. If the request is not
accepted, a BANDWIDTH CHANGE REJECT (BRJ) message is returned to the endpoint. If
the request is accepted, a BANDWIDTH CHANGE CONFIRM (BCF) is sent back to the
endpoint.
Support for Supplementary Services is optional. The H.450 Series of
Recommendations describes a method of providing Supplementary Services in the H.323
environment. Figure 2.8 reports some of the supplementary services defined so far and their
number in the series.
Recommendation number   Recommendation title
H.450.1                 Supplementary Service Framework
H.450.2                 Call Transfer Supplementary Service
H.450.3                 Call Diversion Supplementary Service
H.450.4                 Call Hold Supplementary Service
H.450.5                 Call Park and Pickup Supplementary Service
H.450.6                 Call Waiting Supplementary Service
H.450.7                 Message Waiting Supplementary Service
H.450.8                 Name Identification Supplementary Service
H.450.9                 Call Completion Supplementary Service
H.450.10                Call Offer Supplementary Service
H.450.11                Call Intrusion Supplementary Service
Figure 2.8 Supplementary services of the H.450-Series
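Returning to the Bandwidth Change Services, the rule for when a BRQ is needed can be sketched in Python. The function name, the dictionary of per-channel bit rates and the unit (kbit/s) are assumptions for this illustration.

```python
def needs_bandwidth_request(channel_bitrates, channel, new_bitrate, call_bandwidth):
    """Return True if changing one channel's bit rate requires a BRQ.

    channel_bitrates maps each transmitted or received logical channel
    to its current bit rate (assumed here to be in kbit/s).
    """
    updated = dict(channel_bitrates)
    updated[channel] = new_bitrate
    # A BRQ is needed only if the aggregate rate would exceed the
    # bandwidth currently granted to the call.
    return sum(updated.values()) > call_bandwidth
```

With channels of 64 and 320 kbit/s inside a 512 kbit/s call, raising the second channel to 384 kbit/s stays within the call bandwidth, while raising it to 512 kbit/s would require a BRQ and a BCF before the change.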
2.2.1.5.5. Call termination
A call may be terminated by either endpoint or by the gatekeeper. Call termination is
defined using the following procedure:
- video should be terminated after a complete picture and then all logical channels for video
closed;
- data transmission should be terminated and then all logical channels for data closed;
- audio transmission should be terminated and then all logical channels for audio closed;
- the H.245 ENDSESSIONCOMMAND message (H.245 Control Channel) should be sent by
the endpoint/gatekeeper. This message indicates that the call has to be disconnected; then the
H.245 message transmission should be terminated;
- the ENDSESSIONCOMMAND message should be sent back to the sending endpoint and
then the H.245 Control Channel should be closed;
- a RELEASE COMPLETE message should be sent closing the Call Signalling Channel if this is
still open.
An endpoint receiving an ENDSESSIONCOMMAND message does not need to receive it back
again after replying to it in order to clear a call. Terminating a call within a conference does not
mean that the whole conference needs to be terminated. In order to terminate a conference, an
H.245 message (DROPCONFERENCE) is used. Then the MC should terminate the calls with
the endpoints as described above.
A call may be terminated differently depending on the gatekeeper presence and on the party
issuing the call termination:
- call clearing without a gatekeeper
No further action is required;
- call clearing with a gatekeeper
The gatekeeper needs to be informed about the call termination. After RELEASE
COMPLETE is sent, an H.225.0 DISENGAGE REQUEST (DRQ) message should be sent by
each endpoint to its gatekeeper. A DISENGAGE CONFIRM (DCF) message is sent back to the
endpoints to acknowledge the reception;
- call clearing issued by the gatekeeper
A call may be terminated by the gatekeeper by sending a DRQ to an endpoint. The procedure
described above for call termination should be followed immediately by the endpoint up to the
RELEASE COMPLETE message. Then a reply to the gatekeeper should be sent using a DCF
message. The other endpoint should follow the same call termination procedures upon
receiving the ENDSESSIONCOMMAND message. Moreover, if a multipoint conference is
taking place, in order to close the entire conference, the gatekeeper should send a DRQ to each
endpoint in the conference.
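The ordering of the clearing steps for an endpoint that initiates call termination can be summarised in a short Python sketch. The step descriptions are free-form strings and the function signature is hypothetical; the sketch only reflects the order given in the text.

```python
def termination_steps(media_open, gatekeeper_present, signalling_channel_open=True):
    """Return the ordered clearing steps for an endpoint ending a call.

    media_open is a set drawn from {"video", "data", "audio"}.
    """
    steps = []
    for medium in ("video", "data", "audio"):   # order used in the text
        if medium in media_open:
            steps.append(f"stop {medium} and close its logical channels")
    steps.append("send H.245 ENDSESSIONCOMMAND")
    steps.append("receive ENDSESSIONCOMMAND, close H.245 Control Channel")
    if signalling_channel_open:
        steps.append("send RELEASE COMPLETE")
    if gatekeeper_present:
        # With a gatekeeper, the DRQ follows RELEASE COMPLETE.
        steps.append("send DRQ to gatekeeper, expect DCF")
    return steps
```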
{ 2.2.1.6 Locating zone-external targets
When calling an address that is registered at the same gatekeeper as the calling party, the
gatekeeper just needs to look up its internal tables to resolve the target address. Complexity enters
the picture if the destination address is registered with another gatekeeper. While Chapter 7 will
cover this topic in more detail, the most basic mechanism that H.323 provides is explained here.
A gatekeeper may explicitly request the resolution of an address from other gatekeepers. On receipt
of a request to call an address for which the gatekeeper has no registration, it can send out a
location request (LRQ) to other gatekeepers (see Figure 2.9). The receiving gatekeeper, assuming it
knows the address, will reply with a Transport Service Access Point (TSAP, a combination of IP
address and port number): either that of the requested address or its own call signalling TSAP.
[Figure 2.9: message sequence between two endpoints (x@tzi.org and ubik@cesnet.cz) and their gatekeepers (tzi.org and cesnet.cz): RRQ/RCF registrations, followed by ARQ, LRQ, LCF + IP, ACF + IP and Setup for x@tzi.org]
Figure 2.9 External address resolution using LRQs
A location request can be sent via unicast or multicast. If sent via multicast, only the gatekeeper
that can resolve the address replies. If a gatekeeper receives a unicast LRQ, it either confirms or
rejects the request.
A gatekeeper can be configured with a list of peer gatekeepers to ask, in parallel or sequentially. It is also
possible to assign a domain suffix or number prefix to each peer so that an address with a
matching number prefix of a neighbouring institution will result in a request to the gatekeeper of
that institution. By defining default peers, one could also build a hierarchy of gatekeepers (see
Chapter 7 for further details).
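How a gatekeeper might choose which peers receive an LRQ can be sketched in Python. The function, the two lookup tables and the gatekeeper host names are hypothetical; real gatekeepers have their own configuration syntax for this.

```python
def select_peer_gatekeepers(address, suffix_peers, prefix_peers, default_peers=()):
    """Return the peer gatekeepers that should receive an LRQ for address.

    suffix_peers maps domain suffixes to gatekeepers, prefix_peers maps
    number prefixes to gatekeepers, default_peers is the fallback list.
    """
    matches = [gk for suffix, gk in suffix_peers.items() if address.endswith(suffix)]
    matches += [gk for prefix, gk in prefix_peers.items() if address.startswith(prefix)]
    # Without a matching suffix or prefix, fall back to the default peers,
    # which is how a hierarchy of gatekeepers could be formed.
    return matches or list(default_peers)
```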
{ 2.2.1.7 A sample call scenario
Figure 2.10 depicts an example of an inter-zone call setup using H.323 with one gatekeeper (A)
using direct signalling while the other uses routed signalling. The calling party in zone A contacts
its gatekeeper to ask for permission to call the called party in zone B (1). The gatekeeper of zone
A confirms this request and provides the calling party with the address of zone B's gatekeeper
(2). The calling party establishes a call signalling channel (and subsequently/in parallel the
conference control channel) to the gatekeeper of zone B (3), which determines the location of the
called party and forwards the request to the called party (4).
[Figure 2.10: zones A and B, each with a gatekeeper; the caller in zone A and the callee in zone B exchange the numbered steps (1) to (9); the legend distinguishes H.225.0 RAS, H.225.0 Call Signaling + H.245, and Media Streams]
Figure 2.10 A sample H.323 call setup scenario
The called party explicitly confirms with its gatekeeper that it is allowed to accept the call (5, 6)
and, if so, alerts the recipient of the call, returns an alerting indication and (once the receiving user
picks up the call) eventually an indication of successful connection setup back to the calling party
(7, 8). In parallel to this exchange, capability negotiation and media stream configuration take
place. When the setup has completed, both parties start sending media streams directly to each
other.
{ 2.2.1.8 Additional (call) services
It is well known from our daily interaction with PBXs that telephony service comprises far more
than just call setup and teardown: n-way conferencing and various supplementary services (such as
call transfer, call waiting, etc.) are available. Similar features, at least the more commonly known
and used ones, need to be provided by IP Telephony systems as well in order to be accepted by
customers. Additional call services in H.323 can be grouped into three categories:
- Conferencing
H.323 inherently supports multipoint tightly-coupled conferencing, i.e., conferences with
access control, optional support for conference chairs, and close synchronisation of conference
state among all participants from the outset, through the concept of a Multipoint Controller
and an optional Multipoint Processor (MP). While control is centralised in the MC, in theory, data
exchange may be either via IP multicast, multi-unicast (i.e., peer-wise fan-out between
endpoints without MP), or through an MP. (There seems to be practically no H.323 equipment
supporting media multicast.) The distribution mode may be selected per media and per
endpoint peer and is controlled by the MC;
- Broadcast conferencing
H.323 also provides an interface to support large loosely-coupled conferences as are frequently
used in the Mbone to multicast seminars, events, etc. In this case, the MC defines a session
description (using the Session Description Protocol, SDP, see below) for the H.323 media
sessions (which have to operate using multicast) and announces this description by some means
(e.g., the Session Announcement Protocol, SAP). Details are defined in ITU-T H.332;
- Supplementary services
H.323 provides a variety of supplementary services with additional ones continuously being
defined. While some services can be accomplished using the basic H.323 specifications, the H.450.x
Recommendations define a framework (derived from QSIG, the ECMA/ISO/ETSI standard
for supplementary service signalling in PBXs) and a number of services (call transfer, call
diversion, call hold, call park & pickup, call waiting, message waiting indication and call completion).
Further extensions to supplementary services and other functional enhancements are on the way.
In particular, an HTTP-based extension framework is being defined at the time of writing to
enable rapid introduction of new services without the need for standardisation.
{ 2.2.1.9 H.235 Security
The H.235 recommendation defines elements of security for H.323:
- Authentication
Authentication can be achieved by using a shared secret (password) or digital signatures. The
RAS messages include a token that was generated using either the shared secret or the
signature. A receiving entity authenticates the sender by comparing the received token with a
self-generated token;
- Message Integrity
Integrity is achieved by generating password-based checks on the message;
- Privacy
Mechanisms are provided to set up encryption on the media streams. They must be used in
conjunction with the H.245 protocol and employ DES, Triple DES or RC2. The use of SRTP is
not supported yet (in H.235v2).
These mechanisms are grouped into Security Profiles: the Baseline Security Profile
provides authentication and message integrity, making it suitable for subscription-based
environments, and the Voice Encryption Profile provides confidential end-to-end media
channels.
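The shared-secret authentication idea (the receiver recomputes the token and compares it with the received one) can be illustrated with Python's standard `hmac` module. This is only a sketch of the principle: the choice of HMAC-SHA1 and the function names are assumptions, and the actual H.235 token format is not reproduced here.

```python
import hashlib
import hmac


def make_token(shared_secret: bytes, message: bytes) -> bytes:
    """Compute a keyed check over a message using the shared secret."""
    return hmac.new(shared_secret, message, hashlib.sha1).digest()


def authenticate(shared_secret: bytes, message: bytes, received_token: bytes) -> bool:
    """Compare the received token with a self-generated one."""
    expected = make_token(shared_secret, message)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, received_token)
```

Because the token depends on the message content, the same check also detects modification of the message, which corresponds to the message integrity element above.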
{ 2.2.1.10 Protocol Profiles
H.323 has its origin, as mentioned before, in the area of multimedia conferencing. This implies
that a vast number of options are available, which are not necessary for simply providing
telephony services. The TIPHON project of the European Telecommunication Standards Institute
(ETSI) has defined a telephony profile for H.323 that specifies which combination of options
should be implemented.
Similarly, H.323 contains a security framework (H.235) that describes a collection of algorithms
and protocol mechanisms but lacks, because of international political constraints, a precise
specification of a mandatory baseline. This is accounted for by the ETSI TIPHON security
profile: this specification fills in the gaps and provides the foundation for inter-operable
implementations.
In summary, it can be said that the H.323 family of standards provides a mature basis for
commercial products in the field of IP Telephony. While the details of the protocol are often
dominated by their legacy from various earlier ITU protocols, there is an active effort to profile
and simplify the protocol to reduce the complexity.
{ 2.2.2 SIP
{ 2.2.2.1 The purpose of SIP
SIP stands for Session Initiation Protocol. It is an application-layer control protocol that has been
developed and designed within the IETF. The protocol has been designed with easy
implementation, good scalability, and flexibility in mind.
The specification is available in the form of several RFCs. The most important one is RFC3261,
which contains the core protocol specification. The protocol is used for creating, modifying and
terminating sessions with one or more participants. By sessions, we understand a set of senders
and receivers that communicate and the state kept in those senders and receivers during the
communication. Examples of a session can include Internet telephone calls, distribution of
multimedia, multimedia conferences, distributed computer games, etc.
SIP is not the only protocol that the communicating devices will need. It is not meant to be a
general purpose protocol. The purpose of SIP is just to make the communication possible. The
communication itself must be achieved by other means (and possibly another protocol). Two
protocols that are most often used along with SIP are RTP and SDP. The RTP protocol is used to
carry the real-time multimedia data (including audio, video and text). The protocol makes it
possible to encode and split the data into packets and transport these packets over the Internet.
Another important protocol is SDP, Session Description Protocol, which is used to describe and
encode capabilities of session participants. Such a description is then used to negotiate the
characteristics of the session so that all of the devices can participate, including, for example,
negotiation of codecs used to encode media so all the participants will be able to decode it,
negotiation of transport protocol used and so on.
SIP has been designed in conformance with the Internet model. It is an end-to-end-oriented
signalling protocol, which means that all the logic is stored in end-devices (except
routing of SIP messages). State is also stored only in end-devices. There is no single point of
failure and networks designed this way scale well. The price we have to pay for the
`distributiveness' and scalability is higher message overhead, caused by the messages being sent
end-to-end.
It is worth mentioning that the end-to-end concept of SIP is a significant divergence from a
regular PSTN (Public Switched Telephone Network) where all the state and logic is stored in the
network and the end-devices (telephones) are very primitive. The aim of SIP is to provide the
same functionality that the traditional PSTNs have, but the end-to-end design makes SIP
networks much more powerful and open to the implementation of new services that can hardly
be implemented in the traditional PSTNs.
SIP is based on the HTTP protocol. The HTTP protocol inherited the format of its message headers from
RFC822. HTTP is probably the most successful and widely used protocol in the Internet.
SIP tries to combine the best of both. In fact, HTTP can be classified as a signalling protocol too,
because user agents use the protocol to tell an HTTP server which documents they are interested
in. SIP is used to carry the description of session parameters. The description is encoded into a
document using SDP. Both protocols (HTTP and SIP) have inherited the encoding of message
headers from RFC822. The encoding has proven to be robust and flexible over the years.
2.2.2.1.1 SIP URI
SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A SIP URI has the form
sip:username@domain, for instance, sip:joe@company.com. It consists of a username part and
a domain name part, delimited by the @ (at) character. SIP URIs are similar to e-mail addresses
and it is, for instance, possible to use the same URI for e-mail and SIP communication. Such
URIs are easy to remember.
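The structure of such a URI can be shown with a deliberately simplified Python parser. It is a sketch that assumes only the sip:username@domain form with an optional port, as used in this chapter; URI parameters, headers and IPv6 literals allowed by RFC3261 are not handled, and the function name is hypothetical.

```python
def parse_sip_uri(uri):
    """Split a simple SIP URI into (username, host, port or None)."""
    scheme, colon, rest = uri.partition(":")
    if scheme.lower() != "sip" or not colon:
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, at, hostport = rest.partition("@")
    if not (user and at and hostport):
        raise ValueError(f"expected sip:username@domain, got {uri!r}")
    host, colon, port = hostport.partition(":")
    return user, host, int(port) if colon else None
```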
{ 2.2.2.2 SIP network elements
Although, in the simplest configuration, it is possible to use just two user agents that send SIP
messages directly to each other, a typical SIP network will contain more than one type of SIP
element. Basic SIP elements are user agents, proxies, registrars and redirect servers. They are
described briefly in this section.
Note that the elements, as presented in this section, are often only logical entities. It is often
profitable to co-locate them, for instance, to increase the speed of processing, but that depends on
the particular implementation and configuration.
2.2.2.2.1. User agents
Internet endpoints that use SIP to find each other and to negotiate a session's characteristics are
called user agents. User agents usually, but not necessarily, reside on a user's computer in the form of
an application. This is currently the most widely-used approach, but user agents can also be
cellular phones, PSTN gateways, PDAs, automated IVR systems and so on.
User agents are often referred to as User Agent Server (UAS) and User Agent Client (UAC). UAS
and UAC are logical entities and each user agent contains a UAC and UAS. UAC is the part of
the user agent that sends requests and receives responses. UAS is the part of the user agent that
receives requests and sends responses.
Because a user agent contains both a UAC and a UAS, it behaves like either a UAC or a UAS,
depending on the situation. For
instance, a calling party's user agent behaves like a UAC when it sends an INVITE request and
receives responses to the request. A called party's user agent behaves like a UAS when it receives
the INVITE and sends responses.
But this situation changes when the called party decides to send a BYE and terminate the session.
In this case the called party's user agent (sending BYE) behaves like a UAC and the calling party's
user agent behaves like a UAS.
[Figure 2.11: a calling party, a stateful forking proxy and two called parties; each element is annotated with its UAC or UAS role for the forwarded INVITE requests and for the later BYE]
Figure 2.11 UAC and UAS
Figure 2.11 shows three user agents and one stateful forking proxy. Each user agent contains a UAC
and a UAS. The part of the proxy that receives the INVITE from the calling party, in fact, acts as a
UAS. When forwarding the request statefully, the proxy creates two UACs, each of them
responsible for one branch.
In the example, called party B picked up and later, when he wants to tear down the call, he sends
a BYE. At this time, the user agent that was previously UAS becomes a UAC and vice versa.
2.2.2.2.2 Proxy servers
SIP allows the creation of an infrastructure of network hosts called proxy servers. User agents
can send messages to a proxy server. Proxy servers are very important entities in the SIP
infrastructure. They perform routing of session invitations according to the invitee's current
location, authentication, accounting and many other important functions.
The most important task of a proxy server is to route session invitations `closer' to a called party.
The session invitation will usually traverse a set of proxies until it finds one which knows the
actual location of the called party. Such a proxy will forward the session invitation directly to the
called party and the called party will then accept or decline the session invitation.
There are two basic types of SIP Proxy Servers, stateless and stateful.
2.2.2.2.2.1 Stateless servers
Stateless servers are simple message forwarders. They forward messages independently of
each other. Although messages are usually arranged into transactions (see Section 2.2.2.4),
stateless proxies do not take care of transactions.
Stateless proxies are simple and faster than stateful proxy servers. They can be used as simple load
balancers, message translators and routers. One of the drawbacks of stateless proxies is that they are
unable to absorb re-transmissions of messages or perform more advanced routing, for instance,
forking or recursive traversal.
2.2.2.2.2.2 Stateful servers
Stateful proxies are more complex. Upon reception of a request, stateful proxies create a state and
keep the state until the transaction finishes. Some transactions, especially those created by
INVITE, can last quite long (until the called party picks up or declines the call). Because stateful
proxies must maintain the state for the duration of the transactions, their performance is limited.
The ability to associate SIP messages into transactions gives stateful proxies some interesting
features. Stateful proxies can perform forking; that means that upon reception of a message, two or
more messages will be sent out.
Stateful proxies can absorb re-transmissions because they know from the transaction state if they
have already received the same message (stateless proxies cannot do the check because they keep
no state).
Stateful proxies can perform more complicated methods of finding a user. It is, for instance,
possible to try to reach a user's office phone and, when he does not pick up, redirect the call to his
cell phone. Stateless proxies cannot do this because they have no way of knowing how the
transaction targeted to the office phone finished.
Most SIP Proxies today are stateful because their configuration is usually very complex. They
often perform accounting, forking and some sort of NAT traversal aid and all those features
require a stateful proxy.
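The difference that transaction state makes can be illustrated with a toy Python model of a stateful proxy. The class, its methods and the use of an opaque `transaction_id` string are assumptions for this sketch; a real proxy identifies transactions from fields of the SIP message and also handles responses and timers.

```python
class StatefulProxy:
    """Toy stateful proxy that forks requests and absorbs retransmissions."""

    def __init__(self, locations):
        self.locations = locations   # user URI -> list of contact URIs
        self.transactions = {}       # transaction id -> branches in progress

    def receive_request(self, transaction_id, target_uri):
        """Return the contacts to which the request is forwarded."""
        if transaction_id in self.transactions:
            # Same transaction seen before: a retransmission, absorb it.
            return []
        branches = list(self.locations.get(target_uri, []))
        self.transactions[transaction_id] = branches
        return branches              # forking: one copy per contact

    def transaction_finished(self, transaction_id):
        """Drop the state once the transaction has completed."""
        self.transactions.pop(transaction_id, None)
```

A stateless proxy corresponds to the same class without the `transactions` dictionary: it would forward every copy of the request again because it cannot tell a retransmission from a new request.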
2.2.2.2.2.3 Proxy server usage
In a typical configuration, each centrally-administered entity (a company, for instance) has its own
SIP Proxy Server, which is used by all user agents in the entity. Suppose that there are two
companies, A and B, and each of them has its own proxy server. Figure 2.12 shows how a session
invitation from employee Joe in company A will reach employee Bob in company B.
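[Figure 2.12: Joe in company A sends an INVITE to proxy.a.com (1); the proxy asks the DNS server for the SIP SRV record of b.com (2) and receives proxy.b.com (3); the INVITE is forwarded to proxy.b.com (4) and from there to Bob (5); a BYE follows later (6)]
Figure 2.12 Session invitation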
User Joe uses address sip:bob@b.com to call Bob. Joe's user agent does not know how to route
the invitation itself but it is configured to send all outbound traffic to the company SIP Proxy
Server proxy.a.com. The proxy server figures out that user sip:bob@b.com is in a different
company so it will look up B's SIP Proxy Server and send the invitation there. B's proxy server
can be either pre-configured at proxy.a.com or the proxy will use DNS SRV records to find B's
proxy server. The invitation reaches proxy.b.com. The proxy knows that Bob is currently sitting
in his office and is reachable through the phone on his desk, which has IP address 1.2.3.4, so the
proxy will send the invitation there.
2.2.2.2.3 Registrar
It has been mentioned that the SIP Proxy at proxy.b.com knows Bob's current location, but it
has not been mentioned yet how a proxy can learn the current location of a user. Bob's user agent (SIP
phone) must register with a registrar. The registrar is a special SIP entity that receives registrations
from users, extracts information about their current location (IP address, port and username in
this case) and stores the information into a location database. The purpose of the location database
is to map sip:bob@b.com to something like sip:bob@1.2.3.4:5060. The location database is
then used by B's proxy server. When the proxy receives an invitation for sip:bob@b.com it will
search the location database. It finds sip:bob@1.2.3.4:5060 and will send the invitation there.
A registrar is very often a logical entity only. Because of their tight coupling with proxies,
registrars are usually co-located with proxy servers.
Figure 2.13 shows a typical SIP registration. A REGISTER message containing the Address of
Record sip:jan@iptel.org and the contact address sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP
address of the phone, is sent to the registrar. The registrar extracts this information and stores it
into the location database. If everything went well, the registrar sends a 200 OK response to
the phone and the process of registration is finished.
Figure 2.13 Overview of Registrar
[Diagram: the user agent sends 1. REGISTER (sip:jan@iptel.org, 1.2.3.4:5060) to the registrar; the
registrar performs 2. STORE into the location database (record: user sip:jan@iptel.org is
reachable at sip:jan@1.2.3.4:5060) and returns 3. 200 OK to the user agent.]
Each registration has a limited life span. The Expires header field or the expires parameter of the
Contact header field determines for how long the registration is valid. The user agent must refresh
the registration within the life span. Otherwise it will expire and the user will become
unavailable.
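The interplay between the location database and the registration life span can be sketched in a few lines of Python. This is only an illustrative, in-memory model: the class name LocationDatabase, its methods and the injectable clock are assumptions made for this example and do not correspond to any particular registrar implementation.

```python
import time


class LocationDatabase:
    """Hypothetical in-memory location database keyed by Address of Record."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._bindings = {}  # AoR -> {contact URI: absolute expiry time}

    def register(self, aor, contact, expires):
        """Store or refresh a binding; `expires` is the life span in seconds.

        A life span of zero (or less) removes the binding instead.
        """
        if expires <= 0:
            self._bindings.get(aor, {}).pop(contact, None)
            return
        self._bindings.setdefault(aor, {})[contact] = self._clock() + expires

    def lookup(self, aor):
        """Return the contacts whose registration has not expired yet."""
        now = self._clock()
        live = {c: t for c, t in self._bindings.get(aor, {}).items() if t > now}
        self._bindings[aor] = live  # forget expired bindings
        return sorted(live)
```

A proxy would call lookup("sip:jan@iptel.org") when an invitation arrives; once the life span has passed without a refreshing REGISTER, the lookup returns an empty list and the user is unreachable.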
2.2.2.2.4 Redirect server
The entity that receives a request and sends back a reply containing a list of the current locations
of a particular user is called a redirect server. A redirect server receives requests and looks up the
intended recipient of the request in the location database created by a registrar. It then creates a
list of current locations of the user and sends it to the request originator in a response from the
SIP 3xx (redirection) response class.
The originator of the request then extracts the list of destinations and sends another request
directly to them. Figure 2.14 shows a typical redirection.
Figure 2.14 SIP Redirection
[Diagram: User Agent A, User Agent B and a Redirect Server, with the messages INVITE #1,
302 Moved Temporarily and INVITE #2.]
2.2.2.3 SIP messages
Communication using SIP (often called signalling) consists of a series of messages. Messages
can be transported independently by the network. Usually they are each transported in a separate
UDP datagram. Each message consists of a `first line', a message header and a message body. The
first line identifies the type of the message. There are two types of messages: requests and responses.
Requests are usually used to initiate some action or inform the recipient of the request of
something. Responses (also called replies) are used to confirm that a request was received and
processed and contain the status of the processing.
A typical SIP request looks like this:
INVITE sip:7170@iptel.org SIP/2.0
Via: SIP/2.0/UDP 195.37.77.100:5040;rport
Max-Forwards: 10
From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
To: <sip:jiri@bat.iptel.org>
Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
CSeq: 2 INVITE
Contact: <sip:213.20.128.35:9315>
User-Agent: Windows RTC/1.0
Proxy-Authorization: Digest username="jiri", realm="iptel.org",
  algorithm="MD5", uri="sip:jiri@bat.iptel.org",
  nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c",
  response="53fe98db10e1074b03b3e06438bda70f"
Content-Type: application/sdp
Content-Length: 451

v=0
o=jku2 0 0 IN IP4 213.20.128.35
s=session
c=IN IP4 213.20.128.35
b=CT:1000
t=0 0
m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
a=rtpmap:97 red/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:6 DVI4/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:4 G723/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
The first line tells us that this is an INVITE message, which is used to establish a session. The URI
on the first line, sip:7170@iptel.org, is called the Request URI and contains the URI of the next
hop of the message. In this case, it will be host iptel.org.
A SIP request can contain one or more Via header fields, which are used to record the path of the
request. They are later used to route SIP responses back along exactly the same path. The INVITE
message contains just one Via header field, which was created by the user agent that sent the
request. From the Via field we can tell that the user agent is running on host 195.37.77.100 and
port 5040.
The From and To header fields identify the initiator (calling party) and the recipient (called party)
of the invitation (just like in SMTP where they identify sender and recipient of a message).
The From header field contains a tag parameter which serves as a dialogue identifier and will be
described in Section 2.2.2.5.
The Call-ID header field is a dialogue identifier and its purpose is to identify messages belonging
to the same call. Such messages have the same Call-ID identifier. CSeq is used to maintain the order
of requests. Because requests can be sent over an unreliable transport that can re-order messages,
sequence numbers must be present in the messages so that the recipient can identify
re-transmissions and out-of-order requests.
The Contact header field contains the IP address and port on which the sender is awaiting
further requests sent by the called party. Other header fields are not important and will not be
described here.
The message header is delimited from the message body by an empty line. The message body of
the INVITE request contains a description of the media types accepted by the sender, encoded
in SDP.
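The three-part structure described above (first line, message header, message body) can be illustrated with a small Python sketch. It is a simplification, not a complete SIP parser: the function names parse_sip_message and is_request are invented for this example, and compact header names, multiple header values on one line and Content-Length checking are all ignored.

```python
def parse_sip_message(raw):
    """Split a SIP message into its first line, a header list and the body."""
    # The message header is delimited from the body by an empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return first_line, headers, body


def is_request(first_line):
    """Responses start with the protocol version, requests with a method."""
    return not first_line.startswith("SIP/")
```

Applied to the INVITE above, the first line would be "INVITE sip:7170@iptel.org SIP/2.0", the header list would contain pairs such as ("CSeq", "2 INVITE"), and the body would be the SDP description starting with "v=0".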
2.2.2.3.1. SIP requests
An INVITE request has been described. The request is used to invite a called party to a session.
Other important requests are:
- ACK
This message acknowledges receipt of a final response to INVITE. Establishing a session
utilises 3-way hand-shaking due to the asymmetric nature of the invitation. It may take a while
before the called party accepts or declines the call, so the called party's user agent periodically
re-transmits a positive final response until it receives an ACK (which indicates that the calling
party is still there and ready to communicate);
- BYE
BYE messages are used to tear down multimedia sessions. A party wishing to tear down a
session sends a BYE to the other party;
- CANCEL
CANCEL is used to cancel a not yet fully-established session. It is used when the called party
has not replied with a final response yet but the calling party wants to abort the call (typically
when a called party does not respond for some time);
- REGISTER
The purpose of REGISTER is to let the registrar know the user's current location. Information
about the current IP address and port on which a user can be reached is carried in REGISTER
messages. The registrar extracts this information and puts it into a location database. The database
can later be used by SIP Proxy Servers to route calls to the user. Registrations are time-limited
and need to be periodically refreshed.
The listed requests usually have no message body because it is not needed in most situations (but
they can have one). In addition, many other request types have been defined but their descriptions
are out of the scope of this document.
2.2.2.3.2 SIP responses
When a user agent or proxy server receives a request, it sends a reply. Each request must be replied
to except ACK requests which trigger no replies.
A typical reply looks like this:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
From: sip:sip2@iptel.org
To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
Call-ID: 2443936363@192.168.1.30
CSeq: 63629 REGISTER
Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
Content-Length: 0
Warning: 392 195.37.77.101:5060 "Noisy feedback tells:
  pid=5110 req_src_ip=66.87.48.68 req_src_port=5060
  in_uri=sip:iptel.org
  out_uri=sip:iptel.org via_cnt==1"
Responses are very similar to the requests, except for the first line. The first line of a response
contains the protocol version (SIP/2.0), a reply code and a reason phrase. The reply code is an
integer from 100 to 699 and indicates the type of the response. There are 6 classes of responses:
- 1xx are provisional responses. A provisional response is a response that tells its recipient that
the associated request was received but the result of the processing is not known yet. Provisional
responses are sent only when the processing does not finish immediately. The sender must stop
re-transmitting the request upon reception of a provisional response.
Typically, proxy servers send responses with code 100 when they start processing an INVITE and
user agents send responses with code 180 (Ringing), which means that the called party's phone is
ringing;
- 2xx responses are positive final responses. A final response is the ultimate response that the
originator of the request will ever receive. Therefore, final responses express the result of the
processing of the associated request. Final responses also terminate transactions. Responses with
code from 200 to 299 are positive responses. That means that the request was processed
successfully and accepted. For instance, a 200 OK response is sent when a user accepts the
invitation to a session (INVITE request).
A UAC may receive several 200 messages to a single INVITE request. This is because a forking
proxy (described later) can fork the request so it will reach several UAS and each of them will
accept the invitation. In this case, each response is distinguished by the tag parameter in the To
header field. Each response represents a distinct dialogue with an unambiguous dialogue identifier;
- 3xx responses are used to redirect a calling party. A redirection response gives information about
the user's new location or an alternative service that the calling party might use to satisfy the
call. Redirection responses are usually sent by proxy servers. When a proxy receives a request
and does not want to or cannot process it for any reason, it will send a redirection response to the
calling party and put another location into the response which the calling party might want to
try. It can be the location of another proxy or the current location of the called party (from the
location database created by a registrar). The calling party is then supposed to re-send the
request to the new location. 3xx responses are final;
- 4xx are negative final responses. A 4xx response means that the problem is on the sender's side.
The request could not be processed because it contains bad syntax or cannot be fulfilled at that
server;
- 5xx means that the problem is on the server's side. The request is apparently valid but the server
failed to fulfil it. Clients should usually retry the request later;
- 6xx reply code means that the request cannot be fulfilled at any server. This response is usually
sent by a server that has definitive information about a particular user. User agents usually send
a 603 Decline response when the user does not want to participate in the session.
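As an illustration of how the first line of a response and the six classes above might be handled in software, consider the following Python sketch. The function names and the class labels in RESPONSE_CLASSES are chosen for this example only; they paraphrase the descriptions in the list and are not taken from any SIP library.

```python
# Labels paraphrase the class descriptions given in the text above.
RESPONSE_CLASSES = {
    1: "provisional",
    2: "positive final",
    3: "redirection (final)",
    4: "negative final, problem on the sender's side",
    5: "negative final, problem on the server's side",
    6: "negative final, cannot be fulfilled at any server",
}


def parse_status_line(line):
    """Split 'SIP/2.0 200 OK' into protocol version, reply code and reason."""
    version, code, reason = line.split(" ", 2)
    code = int(code)
    if not 100 <= code <= 699:
        raise ValueError("reply code out of range: %d" % code)
    return version, code, reason


def response_class(code):
    """Map a reply code to the description of its class."""
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """Everything except 1xx terminates the transaction."""
    return code >= 200
```

For the reply shown earlier, parse_status_line("SIP/2.0 200 OK") yields the code 200, which response_class() maps to "positive final".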
In addition to the response class, the first line also contains the reason phrase. The code number is
intended to be processed by machines. It is not very human-friendly but it is very easy for machines
to parse and understand. The reason phrase usually contains a human-readable message
describing the result of the processing. A user agent should render the reason phrase to the user.
The request to which a particular response belongs is identified using the CSeq header field. In
addition to the sequence number, this header field also contains the method of the corresponding
request. In our example it was a REGISTER request.
2.2.2.4 SIP transactions
Although we said that SIP messages are sent independently over the network, they are usually
arranged into transactions by user agents and certain types of proxy servers. Therefore SIP is said
to be a transactional protocol.
A transaction is a sequence of SIP messages exchanged between SIP network elements. A
transaction consists of one request and all responses to that request. That includes zero or more
provisional responses and one or more final responses (remember that an INVITE might be
answered by more than one final response when a proxy server forks the request).
If a transaction was initiated by an INVITE request, then the same transaction also includes ACK,
but only if the final response was not a 2xx response. If the final response was a 2xx response, then
the ACK is not considered part of the transaction.
As we can see, this is quite asymmetric behaviour: ACK is part of transactions with a negative final
response but is not part of transactions with positive final responses. The reason for this separation
is the importance of delivery of all 200 OK messages. Not only do they establish a session, but
a 200 OK can also be generated by multiple entities when a proxy server forks the request, and all
of them must be delivered to the calling user agent. Therefore, user agents take responsibility in
this case and retransmit 200 OK responses until they receive an ACK. Also note that only
responses to INVITE are retransmitted.
SIP entities that have a notion of transactions are called stateful. Such entities usually create a state
associated with a transaction that is kept in the memory for the duration of the transaction. When
a request or response comes, a stateful entity tries to associate the request (or response) with an
existing transaction. To be able to do this, it must extract a unique transaction identifier from the
message and compare it to the identifiers of all existing transactions. If such a transaction exists,
then its state gets updated from the message.
In the previous SIP RFC2543, the transaction identifier was calculated as a hash of all important
message header fields (that included To, From, Request-URI and CSeq). This proved to be very
slow and complex. During interoperability tests, such transaction identifiers were a common
source of problems.
In the new RFC3261, the way of calculating transaction identifiers was completely changed.
Instead of the complicated hashing of important header fields, a SIP message now includes the
identifier directly. The branch parameter of Via header fields directly contains the transaction
identifier. This is a significant simplification, but there still exist old implementations that do not
support the new way of calculating the transaction identifier, so even new implementations
have to support the old way. They must be backwards-compatible.
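The following Python sketch illustrates the RFC3261 approach. It relies on one detail that is not spelled out above: RFC3261 requires branch values generated by compliant implementations to start with the string "z9hG4bK", which lets a receiver tell new-style identifiers from old ones. The function names and the (branch, method) key are simplifications made for this example; a complete implementation also compares the sent-by part of the Via header field.

```python
MAGIC_COOKIE = "z9hG4bK"  # prefix of branch values created per RFC3261


def via_branch(via_value):
    """Return the branch parameter of a Via header field value, or None."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            return value.strip()
    return None


def transaction_key(top_via, cseq):
    """Build a simplified transaction key from the topmost Via and CSeq.

    Returns None when the branch is missing or lacks the magic cookie, in
    which case the caller has to fall back to the old RFC2543 matching.
    """
    branch = via_branch(top_via)
    method = cseq.split()[1]
    if method == "ACK":
        # An ACK for a negative final response belongs to the INVITE
        # transaction and re-uses its branch value.
        method = "INVITE"
    if branch and branch.startswith(MAGIC_COOKIE):
        return (branch, method)
    return None
```

A stateful entity could keep its transactions in a dictionary indexed by this key, which replaces the hashing of several header fields with a single lookup.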
Figure 2.15 shows what messages belong to what transactions during a conversation of two user
agents.
Figure 2.15 SIP transactions
[Message sequence between calling party and called party: INVITE, 100 Trying, 180 Ringing,
200 OK (Transaction #1); ACK; BYE, 200 OK (Transaction #2).]
2.2.2.5 SIP Dialogues
We have seen what transactions are: one transaction includes an INVITE and its responses,
and another transaction includes a BYE and its responses when a session is being torn down. Those
two transactions should be somehow related: both of them belong to the same dialogue. A
dialogue represents a peer-to-peer SIP relationship between two user agents. A dialogue persists
for some time and it is a very important concept for user agents. Dialogues facilitate the proper
sequencing and routing of messages between SIP endpoints.
Dialogues are identified using Call-ID, From tag, and To tag. Messages that belong to the same
dialogue must have these fields equal. We have shown that the CSeq header field is used to order
messages. In fact, it is used to order messages within a dialogue. The number must be
monotonically increased for each new request sent within a dialogue. Otherwise the peer will
handle it as an out-of-order request or retransmission. In fact, the CSeq number identifies a
transaction within a dialogue, because we have said that requests and associated responses are called
transactions. This means that only one transaction in each direction can be active within a
dialogue. One could also say that a dialogue is a sequence of transactions. Figure 2.16 extends
Figure 2.15 to show which messages belong to the same dialogue.
Figure 2.16 SIP dialogue
[Message sequence between calling party and called party: INVITE, 100 Trying, 180 Ringing,
200 OK (Transaction #1); ACK; BYE, 200 OK (Transaction #2); the messages are grouped under the
label Dialog.]
Some messages establish a dialogue and some do not. This makes it possible to express the
relationship between messages explicitly and also to send messages that are not related to other
messages outside a dialogue. The latter is easier to implement because user agents do not have to
maintain the dialogue state.
For instance, an INVITE message establishes a dialogue, because it will later be followed by a
BYE request, which will tear down the session established by the INVITE. This BYE is sent
within the dialogue established by the INVITE.
But, if a user agent sends a MESSAGE request, such a request does not establish any dialogue. Any
subsequent messages (even MESSAGE) will be sent independently of the previous one.
2.2.2.5.1. Dialogues facilitate routing
Dialogues are also used to route the messages between user agents, as briefly described here.
Suppose that user sip:bob@a.com wants to talk to user sip:pete@b.com. He knows the SIP
address of the called party (sip:pete@b.com) but this address does not say anything about the
current location of the user, i.e., the calling party does not know to which host to send the
request. Therefore, the INVITE request will be sent to a proxy server.
The request will be sent from proxy to proxy until it reaches one that knows the current location
of the called party. This process is called routing. Once the request reaches the called party,
the called party's user agent will create a response that will be sent back to the calling party.
The called party's user agent will also put a Contact header field into the response, which will
contain the current location of the user. The original request also contained a Contact header field,
which means that both user agents know the current location of the peer.
Because the user agents know the location of each other, it is not necessary to send further
requests to any proxy. They can be sent directly from user agent to user agent. That is exactly how
dialogues facilitate routing.
Further messages within a dialogue are sent directly from user agent to user agent. This is a
significant performance improvement because proxies do not see all the messages within a
dialogue. They are used to route just the first request that establishes the dialogue. The direct
messages are also delivered with much smaller latency because a typical proxy usually implements
complex routing logic. Figure 2.17 contains an example of a message within a dialogue (BYE)
that bypasses the proxies.
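Figure 2.17 SIP trapezoid
[Diagram: the INVITE travels from the calling party through Proxy 1 and Proxy 2 to the called
party (three INVITE hops); the BYE is sent directly between calling party and called party.]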
2.2.2.5.2 Dialogue identifiers
Dialogue identifiers consist of three parts, Call-ID, From tag and To tag, but it is not obvious
why dialogue identifiers are created exactly this way and who contributes which part.
Call-ID is the call identifier. It must be a unique string that identifies a call. A call consists of
one or more dialogues. Multiple user agents may respond to a request when a proxy along the
path forks the request. Each user agent that sends a 2xx response establishes a separate dialogue
with the calling party. All such dialogues are part of the same call and have the same Call-ID.
A From tag is generated by the calling party and it uniquely identifies the dialogue in the calling
party's user agent.
A To tag is generated by the called party and, just like the From tag, it uniquely identifies the
dialogue in the called party's user agent.
This hierarchical dialogue identifier is necessary because a single call invitation can create several
dialogues and the calling party must be able to distinguish them.
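A short Python sketch may make the three-part identifier more concrete. The names DialogueId, tag_of and dialogue_id_at_caller are hypothetical and exist only for this illustration; the tag extraction is deliberately naive and assumes that header parameters follow the closing angle bracket when one is present.

```python
from collections import namedtuple

# Seen from the calling party: the From tag is local, the To tag is remote.
DialogueId = namedtuple("DialogueId", "call_id local_tag remote_tag")


def tag_of(header_value):
    """Extract the tag parameter from a From or To header field value."""
    # Parameters after the closing '>' (if any) belong to the header field.
    params = header_value.rpartition(">")[2]
    for param in params.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def dialogue_id_at_caller(call_id, from_value, to_value):
    """Combine Call-ID, From tag and To tag into one dialogue identifier."""
    return DialogueId(call_id, tag_of(from_value), tag_of(to_value))
```

If a forked INVITE is answered by two user agents, the two 200 OK responses carry the same Call-ID and From tag but different To tags, so the calling party obtains two distinct DialogueId values that belong to the same call.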
2.2.2.6 Typical SIP scenarios
This section gives a brief overview of typical SIP scenarios that usually make up the SIP traffic.
2.2.2.6.1 Registration
Users must register themselves with a registrar to be reachable by other users. A registration
comprises a REGISTER message followed by a 200 OK sent by the registrar if the registration
was successful. Registrations are usually authorised, so a 407 reply can appear if the user did
not provide valid credentials. Figure 2.18 shows an example of a registration.
Figure 2.18 REGISTER message flow
[Message sequence between user agent and registrar: REGISTER w/o credentials, 407,
REGISTER w/ credentials, 200 OK.]
2.2.2.6.2 Session invitation
A session invitation consists of one INVITE request which is usually sent to a proxy. The proxy
immediately sends a 100 Trying reply to stop re-transmissions and forwards the request further.
All provisional responses generated by the called party are sent back to the calling party. See the
180 Ringing response in the call flow. The response is generated when the called party's phone
starts ringing.
A 200 OK is generated once the called party picks up the phone and it is re-transmitted by the
called party's user agent until it receives an ACK from the calling party. The session is established
at this point.
2.2.2.6.3 Session termination
Session termination is accomplished by sending a BYE request within the dialogue established by
INVITE. BYE messages are sent directly from one user agent to the other, unless a proxy on the
path of the INVITE request has indicated that it wishes to stay on the path by using record
routing (see Section 2.2.2.6.4).
A party wishing to tear down a session sends a BYE request to the other party involved in the
session. The other party sends a 200 OK response to confirm the BYE and the session is
terminated. See Figure 2.20, left message flow.
Figure 2.19 INVITE message flow
[Message sequence between calling party, SIP Proxy and called party: INVITE, 100 Trying, INVITE,
100 Trying, 180 Ringing, 180 Ringing, 200 OK, 200 OK, ACK, followed by RTP streams.]
2.2.2.6.4 Record routing
All requests sent within a dialogue are, by default, sent directly from one user agent to the other.
Only requests outside a dialogue traverse SIP proxies. This approach makes a SIP network more
scalable because only a small number of SIP messages hit the proxies.
There are certain situations in which a SIP Proxy needs to stay on the path of all further
messages. For instance, proxies controlling a NAT box, or proxies doing accounting need to stay
on the path of BYE requests.
The mechanism by which a proxy can inform user agents that it wishes to stay on the path of all
further messages is called record routing. Such a proxy would insert a Record-Route header
field, which contains the address of the proxy, into SIP messages. Messages sent within a dialogue
will then traverse all SIP proxies that put a Record-Route header field into the message.
The recipient of the request receives a set of Record-Route header fields in the message. It must
mirror all the Record-Route header fields into responses because the originator of the request
also needs to know the set of proxies.
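How a user agent might use the recorded proxies when sending a request within a dialogue is sketched below in Python. The function names route_set and next_hop are invented for this example. The sketch assumes the ordering rules of RFC3261 (the called party keeps the Record-Route entries in the order received, the calling party reverses them) and ignores details such as strict versus loose routing.

```python
def route_set(record_route, is_caller):
    """Derive the list of proxies to traverse from Record-Route entries.

    `record_route` is the list of Record-Route values in the order in which
    they appear in the message (topmost first).
    """
    if is_caller:
        # The caller sees the proxies in reverse order of traversal.
        return list(reversed(record_route))
    return list(record_route)


def next_hop(route, remote_contact):
    """Pick the destination of the next request sent within the dialogue."""
    # Without record routing the request goes straight to the peer's contact.
    return route[0] if route else remote_contact
```

With an empty Record-Route list, next_hop() returns the peer's contact address, which corresponds to the direct BYE described above; with recorded proxies, the request is handed to the first proxy of the route set instead.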
Figure 2.20 BYE message flow (with and without record routing)
[Two message sequences between UA1, SIP Proxy and UA2. Without record routing: BYE and 200 OK
pass directly between the user agents. With record routing: BYE and 200 OK pass through the
SIP Proxy.]
The left-hand message flow of Figure 2.20 shows how a BYE (a request within the dialogue
established by INVITE) is sent directly to the other user agent when there is no Record-Route
header field in the message. The right-hand message flow shows how the situation changes when
the proxy puts a Record-Route header field into the message.
2.2.2.6.5 Event subscription and notification
The SIP specification has been extended to support a general mechanism allowing subscription to
events. Such events can include changes to SIP Proxy statistics, presence information, session
changes and so on.
The mechanism is used mainly to convey information on presence (the willingness to
communicate) of users. Figure 2.21 shows the basic message flow.
Figure 2.21 Event subscription and notification
[Message sequence between user agent and server: SUBSCRIBE, 200 OK, NOTIFY, 200 OK; after an
event occurs: NOTIFY, 200 OK.]
A user agent interested in an event notification sends a SUBSCRIBE message to a SIP server. The
SUBSCRIBE message establishes a dialogue and is immediately replied to by the server using a
200 OK response. At this point, the dialogue is established. The server sends a NOTIFY request to
the user every time the event to which the user subscribed changes. NOTIFY messages are sent
within the dialogue established by the SUBSCRIBE.
Note that the first NOTIFY message in Figure 2.21 is sent regardless of any event that triggers
notifications.
Subscriptions, as well as registrations, have a limited life span and therefore must be periodically
refreshed.
2.2.2.6.6 Instant messages
Instant messages are sent using a MESSAGE request. MESSAGE requests do not establish a
dialogue and therefore they will always traverse the same set of proxies. This is the simplest form
of sending instant messages. The text of the instant message is transported in the body of the SIP
request.
Figure 2.22 Instant Messages
[Message sequence between two user agents and a proxy: each MESSAGE is forwarded by the proxy
and answered with a 200 OK that is forwarded back; the exchange is shown twice.]
2.2.3 Media Gateway Control Protocols
In a traditional telephone network, the infrastructure consists of large telephone switches which
interconnect with each other to create the backbone network and which also connect to
customer equipment (PBXs, telephones). While the internal network today is based upon digital
communication, links to customers may be either analogue (PSTN) or digital (ISDN). The links
to customers are shared between call signalling (for dialling, invocation of supplementary services,
etc.) and carriage of voice/data. In the backbone, dedicated (virtual) links interconnecting
switches are reserved for call signalling (de-facto creation of a dedicated network of its own)
whereas voice/data traffic is carried on separate links. The Signalling System No. 7 (SS7) or
variants of it are used as the call signalling protocol between switches; this protocol is used to
route voice/data channels across the backbone network by instructing each switch on the way
which incoming `line' is to be forwarded to which outgoing `line' and which other processing
(such as simple voice compression, in-band signalling detection to customer premise equipment,
etc.) is to be applied. Voice/data channels themselves are plain bit pipes, identified roughly by a
trunk and line identifier at each switch.
Figure 2.23 Application scenario for Media Gateway Control Protocols
A similar construction is now considered by a number of telecom companies for IP-based
backbone networks that may successively replace parts of their overall switched-network
infrastructure, as depicted in Figure 2.23. Instead of voice switches, IP routers are used to build up a
backbone network which employs IP routing, possibly MPLS, and, most likely, some explicit form
of QoS support to carry voice and data packets from any point in the network to any other. In
contrast to voice switches, this does not require explicit configuration of the individual routers
per voice connection. Instead, only the entry and exit points need to be configured with each
other's addresses, so that they know where to send their voice/data packets. Two types of gateways
are used at the edges of the IP network to connect to the conventional telephone network:
signalling gateways to convert SS7 signalling into IP-based call control (which may make use of
H.323 or SIP or simply provide a transport to carry SS7 signalling in IP packets [SIGTRAN])
and media gateways that perform voice transcoding. Some central entity (or more probably, a
number of co-operating entities) forms the intelligent core of the backbone, the Media Gateway
Controller(s). They interpret call signalling and decide how to route calls and they provide
supplementary services, etc. Having decided on how a call is to be established, they inform the
(largely passive and `dumb') media gateways at the edges (ingress and egress gateways) how and
where to transmit the voice packets. The Media Gateway Controllers also re-configure the
gateways in case of any changes in the call, invocation of supplementary services, etc. The media
gateways may be capable of detecting invocation of control features in the media channel (e.g.,
through DTMF tones) and notify the Media Gateway Controller(s), which then initiate the
appropriate actions.
A number of protocols have been defined for communication between Media Gateway
Controllers and media gateways. Initial versions were developed by multiple camps, some of
which merged to create the Media Gateway Control Protocol (MGCP), the only one of the
proprietary protocols that is documented as an Informational RFC (RFC 2705). An effort was
launched to make the two remaining camps cooperate and develop a single protocol to be
standardised, which resulted in work groups in the ITU-T (rooted in Study Group 16, Q.14) and
in the IETF (Media Gateway Control, MEGACO WG). The protocol being jointly developed is
referred to as H.248 in the ITU-T and as MEGACO in the IETF.
One particular protocol extension currently discussed in the IETF is the definition of a protocol
for communication with an IP telephone at the customer premises that fits seamlessly with the
Media Gateway Control architecture. Such a telephone would be a rather simple entity, essentially
capable of transmitting and receiving events and reacting to them, while the call services are
provided directly by the network infrastructure.
2.2.4 Proprietary signalling protocols
Today nearly every vendor that offers VoIP products uses its own VoIP protocol, e.g., Cisco's
Skinny or Siemens's CorNet. They were invented by the vendors to be able to provide more
specific supplementary services in the Voice over IP world, in order to offer customers all the
features they already know from their classic PBX. The enterprise solutions usually feature such
proprietary protocols at the core and provide minimalist support for standardised protocols (until
now usually H.323) with only basic call functionality.
Giving detailed information about those protocols is out of the scope of this document and is
usually difficult to provide because most protocols are not publicly available.
2.2.5 Real-time Transport Protocol (RTP) and RTP Control Protocol (RTCP)
RTP and RTCP are the transport protocols used for IP Telephony media streams. Both of them
were defined in RFC1889: the former as a protocol to carry data that has real-time properties, the
latter to monitor the quality of service and to convey information about the participants in an
on-going session. The services provided by the RTP protocol are:
- identification of the carried information (audio and video codecs);
- checking packet in-order delivery and, if necessary, re-ordering the out-of-sequence blocks;
- transport of the coder/decoder synchronisation information;
- monitoring of the information delivery.
RTP relies on the underlying User Datagram Protocol (UDP) to multiplex several streams
between two entities (port numbers) and to check for data integrity (checksum). An important
point to stress is that RTP neither provides any means of guaranteeing QoS nor assumes that the
underlying network delivers packets in order.
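Because the network may deliver packets out of order, a receiver has to restore the order itself from the 16-bit RTP sequence number, which wraps around from 65535 to 0. The following Python fragment is a minimal sketch of that idea, not a complete jitter buffer. The function names `seq_distance` and `reorder` are hypothetical, and the sketch assumes that the receiver already knows the first sequence number it expects (`base_seq`) and that all packets in a batch lie within half the sequence space of it.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def seq_distance(a: int, b: int) -> int:
    """Signed distance from a to b modulo 2**16 (positive if b is newer)."""
    diff = (b - a) % SEQ_MOD
    return diff - SEQ_MOD if diff >= SEQ_MOD // 2 else diff


def reorder(batch, base_seq):
    """Sort (seq, payload) pairs into sending order and list the gaps.

    batch    -- packets in arrival order, as (sequence_number, payload)
    base_seq -- first sequence number expected (assumption of this sketch)
    """
    # Map each packet to its offset from base_seq, which handles wraparound.
    by_offset = {seq_distance(base_seq, seq): (seq, payload)
                 for seq, payload in batch}
    ordered = [by_offset[k] for k in sorted(by_offset)]
    # Every offset up to the newest packet that is absent counts as lost.
    newest = max(by_offset)
    lost = [(base_seq + k) % SEQ_MOD
            for k in range(newest + 1) if k not in by_offset]
    return ordered, lost


if __name__ == "__main__":
    arrived = [(65535, b"b"), (1, b"d"), (65534, b"a")]
    print(reorder(arrived, base_seq=65534))
```

A real receiver would additionally bound how long it waits for a missing packet before playing out the data it already has.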
The RTCP protocol uses the same underlying protocols as RTP to periodically send control packets
to all session participants. Every RTP channel using port number N has its own RTCP channel
with port number N+1. The services provided by RTCP are:
- providing feedback on the quality of the data distribution, which is used to keep control of the
active codecs;
- transporting a constant identifier for the RTP source (CNAME), used, for example, to associate
the audio and video streams of the same participant;
- advertising the number of session participants, which is used to adjust the rate at which
control packets are sent;
- carrying session control information used to identify the session participants.
The next two subsections describe the RTP and RTCP header and the different types of packets
that the two protocols use.
{ 2.2.5.1 RTP header
Figure 2.24 shows the RTP header. The first twelve bytes are present in all RTP packets.
The last bytes, containing the list of CSRC (Contributing SouRCe) identifiers, are present only
when a mixer is crossed (a mixer is a system which receives two or more RTP flows, combines
them and forwards the resulting flow).
Figure 2.24 RTP header
The header fields are detailed here:
- version (V - 2 bits) contains the RTP protocol version;
- padding (P - 1 bit): if set to 1, the packet contains one or more additional padding bytes after
the data field;
- extension (X - 1 bit): if set to 1, the fixed header is followed by a header extension;
- CSRC count (CC - 4 bits) contains the number of CSRC identifiers that follow the fixed header;
- marker (M - 1 bit) is a field available to the application;
- payload type (PT - 7 bits) identifies the data field format of the RTP packet and determines its
interpretation by the application;
- sequence number (16 bits) is incremented by one for each RTP packet sent and is used by the
receiver to detect losses and to restore the right sequence;
- RTP timestamp (32 bits) is the sampling time of the first byte of the RTP data, used for
synchronisation and jitter calculation;
- SSRC ID (32 bits) identifies the synchronisation source and is chosen randomly so as to be
unique within an RTP session;
- CSRC ID list (from 0 to 15*32 bits) is an optional field identifying the sources which
contribute to the data in the packet. The number of CSRC IDs is written in the CSRC
count field.
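The field layout listed above can be made concrete with a small decoder. The following Python fragment is a minimal sketch that unpacks the twelve fixed bytes and the optional CSRC list according to the bit widths given in the list. The names `RtpHeader` and `parse_rtp_header` are hypothetical and not part of any library, and the sketch does not handle the header extension or the padding bytes themselves.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7); then seq, timestamp, SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("CSRC list is truncated")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc=csrc,
    )


if __name__ == "__main__":
    # Hand-built example: version 2, one CSRC, marker set, payload type 8.
    example = bytes([0x81, 0x88]) + struct.pack("!HII", 7, 160, 0xDEADBEEF)
    example += struct.pack("!I", 42)
    print(parse_rtp_header(example))
```

The payload starts at byte offset 12 + 4*CC when the extension bit is not set.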
{ 2.2.5.2 RTCP packet-types and format
In order to transport the session control information, RTCP defines a number of packet-types:
- SR, Sender Report, sent by the active transmitters to inform the other participants of what
they should have received (number of bytes, number of packets, etc.);
- RR, Receiver Report, to carry the statistics of the session participants which are not active
transmitters;
- SDES, Source DEScription, to carry the source description (including the CNAME
identifier);
- BYE, to notify the intention of leaving the session;
- APP, to carry application-specific functions, intended for experimental use by new applications.
Every RTCP packet begins with a fixed part similar to that of RTP packets, and this part is
then followed by structural elements of variable length. More than one RTCP packet may be
concatenated to build a COMPOUND PACKET. Moreover, in order to maximise the
resolution of the statistics, the SR and RR packet-types are to be sent more often than the
other packet-types.
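To illustrate how a compound packet is built from several RTCP packets, the following Python fragment walks through such a packet and splits it into its parts. It is a minimal sketch under stated assumptions: the fixed part is taken to be the four-byte common header of RFC 1889 (version, padding, a 5-bit count, an 8-bit packet type and a 16-bit length expressed in 32-bit words minus one), and the numeric packet-type values 200 to 204 are those assigned in that RFC; neither is spelled out in the text above. The name `split_compound` is hypothetical.

```python
import struct

# Packet-type values as assigned in RFC 1889 (assumption, not from the text).
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound(data: bytes):
    """Split a compound RTCP packet into (type name, count, body) tuples."""
    packets = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        # Byte 0: V(2) P(1) count(5); byte 1: packet type; bytes 2-3: length.
        b0, packet_type, length_words = struct.unpack_from("!BBH", data, offset)
        size = 4 * (length_words + 1)  # length counts 32-bit words minus one
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        name = RTCP_TYPES.get(packet_type, "unknown")
        packets.append((name, b0 & 0x1F, data[offset + 4:offset + size]))
        offset += size
    return packets


if __name__ == "__main__":
    # Hand-built example: an empty RR followed by a BYE for the same SSRC.
    rr = struct.pack("!BBHI", 0x80, 201, 1, 0x11111111)
    bye = struct.pack("!BBHI", 0x81, 203, 1, 0x11111111)
    for name, count, body in split_compound(rr + bye):
        print(name, count, body.hex())
```

The body of each part (report blocks, SDES items and so on) still has to be decoded according to its packet-type.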