[IP Telephony Cookbook] / Technological Background

Technological Background

This chapter provides technical background information about the protocols and components

used in IP Telephony. It introduces the relevant component types, gives detailed information

about H.323, SIP and RTP as well as information about media gateway control and

vendor-specific protocols.

2.1 Components

An IP Telephony infrastructure usually consists of different types of components. This section

gives an overview of typical components without describing them in a protocol-specific context.

2.1.1 Terminal

A terminal is a communication endpoint that terminates calls and their media streams. Most

commonly, this is either a hardware or a software telephone or videophone, possibly enhanced

with data capabilities. There are terminals that are intended for user interaction and others that are

automated, e.g., answering machines.

An IP Telephony terminal is located on at least one IP address. There may well be multiple

terminals on the same IP address but they are treated independently. Most of the time, a terminal

has been assigned one or more addresses (see Section 2.1.5), which others use to call it.

If IP Telephony servers are used, a terminal registers the addresses with its server.

2.1.2 Server

Placing an IP Telephony call requires at least two terminals, and the knowledge of the IP address

and port number of the terminal to call. Obviously, forcing the user to remember and use IP

addresses for placing calls is not ideal and dynamic IP addressing schemes (DHCP) make this

requirement even more intolerable.

As mentioned before, terminals usually register their addresses with a server. The server stores

these telephone addresses along with the IP addresses of the respective terminals, and is thus able

to map a telephone address to a host.
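
The mapping described above can be pictured as a simple lookup table. The following is a minimal sketch only, not taken from any real telephony server: the class name `AddressRegistry`, its methods and the sample address, IP address and port are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# (IP address, port) of a registered terminal
HostPort = Tuple[str, int]


class AddressRegistry:
    """Toy registry mapping telephone addresses to terminal hosts (hypothetical)."""

    def __init__(self) -> None:
        self._table: Dict[str, HostPort] = {}

    def register(self, address: str, ip: str, port: int) -> None:
        # A terminal announces one of its telephone addresses.
        self._table[address] = (ip, port)

    def resolve(self, address: str) -> Optional[HostPort]:
        # Returns None when the address is unknown to this server; a real
        # server might then ask other telephony servers or services.
        return self._table.get(address)


registry = AddressRegistry()
registry.register("+49421555010", "192.0.2.17", 1720)  # sample values only
print(registry.resolve("+49421555010"))
```

Because the terminal re-registers whenever its IP address changes, callers only ever need the telephone address.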

When a telephone user dials an address, the server tries to resolve the given address into a

network address. To do so, the server may interact with other telephony servers or services.

It may also provide further call routing mechanisms like CPL (Call Processing Language) scripts

or skill-based routing (e.g., route calls to `WWW-Support' to a list of persons who are tagged to

be responsible for this subject).

Finally, a telephony server is responsible for authenticating registrations, authorising calling parties

and performing the accounting.

2.1.3 Gateway

Gateways are telephony endpoints that facilitate calls between endpoints that usually would not

interoperate. Usually this means that a gateway translates one signalling protocol into another (e.g.

SIP/ISDN signalling gateways), but translating between different network addresses (IPv4/IPv6)

or codecs (media gateways) can be considered gatewaying as well. Of course, it is possible that

multiple functionalities exist in a single gateway.

Finding gateways between VoIP and a traditional PBX is usually quite simple. Gateways that

translate different VoIP protocols are harder to find. Most of them are limited to basic call

functionality.

2.1.4 Conference bridge

Conference bridges provide the means to have 3-point or multi-point conferences that can either

be ad-hoc or scheduled. Because of the high resource requirements, conference bridges are

usually dedicated servers with special media hardware.

2.1.5 Addressing

A user who wants to use a communication service needs an identifier for himself and one for the

called party. Ideally, such an identifier should be independent of the user's physical location. The

network should then be responsible for finding the current location of the called party. A user

may choose to be reachable under multiple contact identifiers.

Regular telephony systems use E.164 numbers (the international public telecommunication

numbering plan). An identifier is composed of up to fifteen digits with a leading plus sign, for

example, +1234565789123. When dialling, the leading plus is normally replaced by the

international access code, usually double zero (00). This is followed by a country code and a

subscriber number.
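
The replacement of the leading plus sign can be sketched in a few lines. This is an illustration under stated assumptions: the function name `to_dial_string` is hypothetical, and the default access code `00` is simply the common value mentioned above; other access codes are passed in explicitly.

```python
import re


def to_dial_string(e164: str, access_code: str = "00") -> str:
    """Replace the leading plus of an E.164 number by an access code."""
    # A leading plus sign followed by at most fifteen digits.
    if not re.fullmatch(r"\+[1-9][0-9]{0,14}", e164):
        raise ValueError(f"not an E.164 number: {e164!r}")
    return access_code + e164[1:]


print(to_dial_string("+1234565789123"))  # 001234565789123
```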

The first IP Telephony systems used the IP addresses of end-point devices as user identifiers.

Sometimes they are still used now. However, IP addresses are not location-independent (even if

IPv6 is used) and they are hard to remember (especially if IPv6 is used) so they are not suitable as

user identifiers.

Current IP Telephony systems use two kinds of identifiers:

- URIs (RFC2396);

- Numbers (E.164).

Some systems tried to use names (alpha-numeric strings), but this led to a flat naming space and

thus limited zones of applicability.

A Uniform Resource Identifier (URI) uses a registered naming space to describe a resource in a

location-independent way. Resources are available under a variety of naming schemes and access

methods including e-mail addresses (mailto), SIP identifiers (sip), H.323 identifiers (h323,

RFC3508) or telephone numbers (draft-ietf-iptel-rfc2806bis-02). E-mail-like identifiers have

several advantages. They are easy to remember, nearly every Internet user already has an e-mail

address and a new service can be added using the same identifier. The user location can be found

using the Domain Name System (DNS). The disadvantage of URIs is that they are difficult or

impossible to dial on some user devices (phones).

If we want to integrate a regular telephony system with IP Telephony, we must deal with phone

number identifiers even on the IP Telephony side. The numbers are not well suited for an

Internet world relying on domain names. Therefore, the ENUM system was invented, using

adapted phone numbers as domain names. ENUM is described in Chapter 7.
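
As a preview of what "adapted phone numbers" means, the sketch below follows the mapping rule of the ENUM specification (digits reversed, separated by dots, placed under `e164.arpa`). The rule is not stated in this chapter, and the function name `enum_domain` and the sample number are hypothetical.

```python
def enum_domain(e164: str, suffix: str = "e164.arpa") -> str:
    """Turn an E.164 number into the domain name queried by ENUM."""
    # Keep only the digits: the plus sign and any separators are dropped.
    digits = [c for c in e164 if c in "0123456789"]
    if not digits:
        raise ValueError("number contains no digits")
    # Reverse the digits so that the most significant part (the country
    # code) ends up closest to the DNS root, as usual for domain names.
    return ".".join(reversed(digits)) + "." + suffix


print(enum_domain("+4319793321"))  # 1.2.3.3.9.7.9.1.3.4.e164.arpa
```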

2.2 Protocols

2.2.1 H.323

The H.323 Series of Recommendations evolved out of the ITU-T's work on video telephony

and multimedia conferencing. After completing standardisation on video telephony and

videoconferencing for ISDN at up to 2 Mbit/s in the H.320 series, the ITU-T took on work on

similar multimedia communication over ATM networks (H.310, H.321), over the analogue Public

Switched Telephone Network (PSTN) using modem technology (H.324), and over the stillborn

Isochronous Ethernet (H.322). The most widely adopted and hence most promising network

infrastructure - and the one in which a well-defined Quality of Service is hardest to

achieve - was addressed at the beginning of 1995 in H.323: Local Area Networks, with the focus

on IP as the network layer protocol. The primary goal was to interface multimedia

communication equipment on LANs to the reasonably well-established base on circuit-switched

networks.

The initial version of H.323 was approved by the ITU-T about one year later, in June 1996,

thereby providing a base on which the industry could converge. The initial focus was clearly on

local network environments, because QoS mechanisms for IP-based wide area networks, such as

the Internet, were not well established at this point. In early 1996, Internet-wide deployment of

H.323 was already explicitly included in the scope, as was the aim to support voice-only

applications and, thus, the foundations to use H.323 for IP Telephony were laid. H.323 has

continuously evolved towards becoming a technically sound and functionally rich protocol

platform for IP Telephony applications. The first major additions to this end were included in

H.323 version 2, approved by the ITU-T in January 1998. In September 1999, H.323v3 was

approved by the ITU-T, incorporating numerous further functional and conceptual extensions to

enable H.323 to serve as a basis for IP Telephony on a global scale as well as to meet

requirements in enterprise environments. Moreover, many new enhancements were introduced

into the H.323 protocol. Version 4 was approved on November 17, 2000 and contains

enhancements in a number of important areas, including reliability, scalability, and flexibility.

New features help facilitate more scalable Gateway and MCU solutions to meet the growing

market requirements. H.323 has been the undisputed leader in voice, video, and data conferencing

on packet networks, and Version 4 endeavours to keep H.323 ahead of the competition.

2.2.1.1 Scope

As stated before, the scope of H.323 encompasses multimedia communication in IP-based

networks, with significant consideration given to gatewaying to circuit-switched networks (in

particular to ISDN-based video telephony and to PSTN/ISDN/GSM for voice communication).

[Figure: H.323 terminals, gatekeeper, MCU and gateway on the Internet/intranet, with links to ISDN (H.320), PSTN (H.324), ATM (H.310, H.321) and SIP]

Figure 2.1 Scope and components defined in H.323

H.323 defines a number of functional / logical components as shown in Figure 2.1:

- Terminal

Terminals are H.323-capable endpoints, which may be implemented in software on

workstations or as stand-alone devices (such as telephones). They are assigned one or more

aliases (e.g. a user's name/URI) and/or telephone number(s);

- Gateway

Gateways interconnect H.323 entities (such as endpoints, MCUs, or other gateways) to other

network/protocol environments (such as the telephone network). They are also assigned one or

more aliases and/or telephone number(s). The H.323 Series of Recommendations provides

detailed specifications for interfacing H.323 to H.320, ISDN/PSTN, and ATM-based networks.

Recent work also addresses control and media gateway specifications for telephony trunking

networks such as SS7/ISUP;

- Gatekeeper

The gatekeeper is the core management entity in an H.323 environment. It is, among other

things, responsible for access control, address resolution and H.323 network (load) management

and provides the central hook to implement any kind of utilisation / access policies. An H.323

environment is subdivided into zones (which may, but need not be congruent with the

underlying network topology); each zone is controlled by one primary gatekeeper (with

optional backup gatekeepers). Gatekeepers may also provide added value, e.g., act as a

conferencing bridge or offer supplementary call services. An H.323 Gatekeeper can also be

equipped with the proxy feature. Such a feature enables the routing through the gatekeeper of

the RTP traffic (audio and video) and the T.120 traffic (data), so no traffic is directly exchanged

between endpoints. (It could be considered a kind of IP-to-IP gateway that can be used for

security and QoS purposes);

- Multipoint Controller (MC)

A Multipoint Controller is a logical entity that interconnects the call signalling and conference

control channels of two or more H.323 entities in a star topology. MCs coordinate the (control

aspects of) media exchange between all entities involved in a conference. They also provide the

endpoints with participant lists, exercise floor control, etc. MCs may be embedded in any H.323

entity (terminals, gateways, gatekeepers) or implemented as stand-alone entities. They can be

cascaded to allow conferences spanning multiple MCs;

- Multipoint Processor (MP)

For multipoint conferences with H.323, an optional Multipoint Processor may be used that

receives media streams from the individual endpoints, combines them through some

mixing/switching technique, and transmits the resulting media streams back to the endpoints;

- Multipoint Control Unit (MCU)

In the H.323 world, an MCU is simply a combination of an MC and an MP in a single device.

The term originates in the ISDN videoconferencing world where MCUs were needed to

create multipoint conferences out of a set of point-to-point connections.

2.2.1.2 Signalling protocols

H.323 resides on top of the basic Internet Protocols (IP, IP Multicast, TCP, and UDP) in a similar

way as the IETF protocols discussed in the next subsection, and can make use of integrated and

differentiated services along with resource reservation protocols.

[Figure: protocol stack with the labels Audio, Video, Data Applications, Conference Control, Gatekeeper; RTP/RTCP, RAS, H.225.0, H.245, T.120, RSVP; UDP, Reliable MC, TCP + RFC 1006; IP / IP Multicast; Integrated / Differentiated Services Forwarding]

Figure 2.2 H.323 protocol architecture

For basic call signalling and conference control interactions with H.323, the aforementioned

components communicate using three control protocols:

- H.225.0 Registration, Admission, and Status (RAS)

The RAS channel is used for communication between H.323 endpoints and their gatekeeper

and for some inter-gatekeeper communication. Endpoints use RAS to register with their

gatekeeper, to request permission to utilise system resources, to have addresses of remote

endpoints resolved, etc. Gatekeepers use RAS to keep track of the status of their associated

endpoints and to collect information about actual resource utilisation after call termination.

RAS provides mechanisms for user/endpoint authentication and call authorisation;

- H.225.0 Call Signalling

The call signalling channel is used to signal call setup intention, success, failures, etc., as well as

to carry operations for supplementary services (see below). Call signalling messages are derived

from Q.931 (ISDN call signalling); however, simplified procedures and only a subset of the

messages are used in H.323. The call signalling channel is used end-to-end between calling

party and called party and may optionally run through one or more gatekeepers (the call

signalling models are later described in the `Signalling models' Section).

Optimisations: Since version 3, H.225.0 supports the following enhancements:

Multiple Calls - To prevent using a dedicated TCP connection for each call, gateways can

be built to handle multiple calls on each connection.

Maintain Connection - Similar to Multiple Calls, this enhancement will reduce the need

to open new TCP connections. After the last call has ended, the endpoint may decide to

maintain the TCP connection to provide a better call setup time for the next call.

The primary use of both enhancements is in communication between servers (gatekeeper,

MCU) or gateways. While, in theory, both mechanisms were possible before, only since

H.323v3 have the messages contained fields to indicate support for the mechanisms;

- H.245 Conference Control

The conference control channel is used to establish and control two-party calls (as well as

multiparty conferences). Its functionality includes determining possible modes for media

exchange (e.g., select media encoding formats that both parties understand) and configuring

actual media streams (including exchanging transport addresses to send media streams to and

receive them from). H.245 can be used to carry user input (such as DTMF), enables

confidential media exchange and defines the syntax and semantics for multipoint conference

operation (see below). Finally, it provides a number of maintenance messages. Also, this logical

channel may (optionally) run through one or more gatekeepers, or directly between calling

party and called party (please refer to the `Signalling models' Section for details).

It should be noted that H.245 is a legacy protocol inherited from the collective work on

multimedia conferencing over ATM, PSTN and other networks. Hence it carries a lot of fields

and procedures that do not apply to H.323 but make the protocol specification quite

heavyweight.

Optimisations:

The conference control channel is also subject to optimisations. By default, it is transported

over an exclusive TCP connection but it may also be tunnelled within the signalling connection

(H.245 tunnelling). Other optimisations deal with the call setup time. The last chance to start an

H.245 channel is on receipt of the CONNECT message, which implies that no media is transmitted

during the first seconds after the user accepts the call. H.245 may also start in parallel with the

setup of the H.225 call signalling, which is not really a new feature but another way of dealing

with H.245. Vendors often call this Early Connect or Early Media. Since H.323v2, it is

possible to start a call using a less powerful but sufficient capability exchange by simply offering

possible media channels that just have to be accepted. This procedure, called FastConnect or

FastStart, requires fewer round trips and is transported over the H.225 channel. After the

FastConnect procedure is finished or when it fails, the normal H.245 procedures start.

A number of extensions to H.323 include mechanisms for more efficient call setup (H.323 Annex

E) and reduction of protocol overhead, e.g., for simple telephones (SETs, simple endpoint types;

H.323 Annex F).

2.2.1.3 Gatekeeper discovery and registration

An H.323 endpoint usually registers with a gatekeeper that provides basic services like address

resolution for calling the other endpoints. There are two possibilities for an endpoint to find its

gatekeeper:

- Multicast discovery

The endpoint sends a gatekeeper request (GRQ) to a well-known multicast address

(224.0.1.41) and port (1718). Receiving gatekeepers may confirm their responsibility for the

endpoint (GCF) or ignore the request;

- Configuration

The endpoint knows the IP address of the gatekeeper by manual configuration. While there is

no need for a gatekeeper request (GRQ) to be sent to the preconfigured gatekeeper, some

products need this protocol step. If a gatekeeper receives a GRQ via unicast, it must either

confirm (GCF) the request or reject it (GRJ).

When trying to discover the gatekeeper via multicast, an endpoint may request any gatekeeper or

specify the request by adding a gatekeeper identifier to the request. Only the gatekeeper that has

the requested identifier may reply positively (see Figure 2.3).

[Figure: message sequence between the endpoint h323:prelle and the gatekeepers id1 and id2, with the messages GRQ:id1, GCF, GRJ, RRQ:prelle and RCF]

Figure 2.3 Discovery and registration process
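
The discovery rules described above can be summarised as a small decision function. This is a sketch of one possible gatekeeper policy, not a protocol implementation: the function name `answer_grq` and its parameters are hypothetical, and the choice to stay silent on every multicast request that is not confirmed is an assumption of this sketch.

```python
from typing import Optional

DISCOVERY_GROUP = "224.0.1.41"  # well-known multicast address from the text
DISCOVERY_PORT = 1718           # well-known discovery port from the text


def answer_grq(own_id: str, requested_id: Optional[str],
               via_multicast: bool, willing: bool = True) -> Optional[str]:
    """Return "GCF", "GRJ" or None (request ignored) for a received GRQ."""
    # A GRQ without gatekeeper identifier addresses any gatekeeper.
    addressed_to_us = requested_id is None or requested_id == own_id
    if addressed_to_us and willing:
        return "GCF"
    # A multicast GRQ may simply be ignored; a unicast GRQ must be answered.
    return None if via_multicast else "GRJ"


print(answer_grq("id1", "id1", via_multicast=True))   # GCF
print(answer_grq("id2", "id1", via_multicast=True))   # None
print(answer_grq("id2", "id1", via_multicast=False))  # GRJ
```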

After the endpoint discovers the location of the gatekeeper, it tries to register itself (RRQ). Such

a registration includes (among other information):

- The addresses of the endpoint - for a terminal, this may be the user ids or telephone numbers.

An endpoint may have more than one address. In theory it is possible that addresses belong to

different users to enable multiple users to share a single phone - in practice, this depends on the

phones and the gatekeeper implementation;

- Prefixes - if the registering endpoint is a gateway it may register number prefixes instead of

addresses;

- Time to live - an endpoint may request how long the registration will last. This value can be

overridden by gatekeeper policies.

The gatekeeper checks the requested registration information and confirms the (possibly

modified) values (RCF). It may also reject a registration request because of, for example, invalid

addresses. In the case of a confirmation, the gatekeeper assigns a unique identifier to the endpoint,

which will be used in subsequent requests to indicate that the endpoint is still registered.
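
The registration handling described above might be sketched as follows. Everything here is illustrative: `ToyGatekeeper`, `Registration`, the `max_ttl` policy value and the `ep-N` identifier format are hypothetical, and the address check is deliberately simplistic.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Registration:
    endpoint_id: str      # unique identifier assigned by the gatekeeper
    addresses: List[str]  # user ids or telephone numbers
    ttl: int              # confirmed time to live in seconds


class ToyGatekeeper:
    def __init__(self, max_ttl: int = 300) -> None:
        self.max_ttl = max_ttl
        self._ids = itertools.count(1)
        self.registrations: Dict[str, Registration] = {}

    def handle_rrq(self, addresses: List[str],
                   requested_ttl: int) -> Optional[Registration]:
        # Reject the request when an address is missing or empty.
        if not addresses or any(not a.strip() for a in addresses):
            return None
        # Gatekeeper policy may override the requested time to live.
        ttl = min(requested_ttl, self.max_ttl)
        reg = Registration(f"ep-{next(self._ids)}", list(addresses), ttl)
        self.registrations[reg.endpoint_id] = reg
        return reg  # stands for an RCF carrying the confirmed values


gk = ToyGatekeeper(max_ttl=300)
print(gk.handle_rrq(["prelle"], requested_ttl=3600))
```

The endpoint has to use the values from the confirmation, not the ones it asked for.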

2.2.1.3.1 Addresses and registrations

H.323 defines and utilises several address types. The one most commonly used and derived from

the PSTN world is the dialled digit address, which is defined as a number dialled by the endpoint.

It does not include further information (e.g., about the dial plan) and needs to be interpreted by

the server. The server might convert the dialled number into a party number that includes

information about the type of number and the dial plan.

To provide alphanumeric or name dialling, H.323 supports H.323-IDs that represent either

usernames or e-mail-like addresses, or the more general approach of URL-IDs, which represent

any kind of URL.

Unlike SIP addresses, an H.323 address can only be registered by one endpoint (per zone), so a

call to that address only resolves to a single endpoint. To call multiple destinations simultaneously

in H.323 requires a gatekeeper that actively maps a single address to multiple different addresses

and tries to contact them in sequence.

2.2.1.3.2 Updating registrations

A registration expires after a defined time and must therefore be refreshed, i.e., kept alive by

subsequent registrations which include the previously-assigned endpoint identifier. To reduce the

registration overhead of regular registrations, H.323 supports KeepAlive registrations that contain

only the previously-assigned endpoint identifier. Of course, these registrations may only be sent if

the registration information is unchanged.

A registration request carrying a large number of addresses could exceed the size of a

UDP packet, so H.323v4 supports Additive Registration, a mechanism that allows an endpoint

to send multiple registration requests (RRQ) in which the addresses do not replace existing

registrations but are submitted in addition to them.
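
The difference between a full registration, an additive registration and a KeepAlive can be illustrated with a small table that tracks expiry times. This is a sketch under stated assumptions: `RegistrationTable` and its methods are hypothetical names, and time is passed in explicitly as `now` to keep the example deterministic.

```python
from typing import Dict, Iterable, Set


class RegistrationTable:
    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._addresses: Dict[str, Set[str]] = {}
        self._expires: Dict[str, float] = {}

    def is_registered(self, endpoint_id: str, now: float) -> bool:
        return self._expires.get(endpoint_id, float("-inf")) > now

    def register(self, endpoint_id: str, addresses: Iterable[str],
                 now: float, additive: bool = False) -> None:
        if additive and self.is_registered(endpoint_id, now):
            # Additive registration: extend the existing set of addresses.
            self._addresses[endpoint_id] |= set(addresses)
        else:
            # Full registration: replace whatever was registered before.
            self._addresses[endpoint_id] = set(addresses)
        self._expires[endpoint_id] = now + self.ttl

    def keep_alive(self, endpoint_id: str, now: float) -> bool:
        # A KeepAlive carries only the endpoint identifier and is accepted
        # only while the registration has not yet expired.
        if not self.is_registered(endpoint_id, now):
            return False
        self._expires[endpoint_id] = now + self.ttl
        return True

    def addresses(self, endpoint_id: str) -> Set[str]:
        return set(self._addresses.get(endpoint_id, set()))


table = RegistrationTable(ttl=60)
table.register("ep-1", ["100"], now=0)
table.register("ep-1", ["101", "102"], now=10, additive=True)
print(sorted(table.addresses("ep-1")))  # ['100', '101', '102']
```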

2.2.1.4 Signalling models

Call signalling messages and H.245 control messages may be exchanged either end-to-end

between calling party and called party or through a gatekeeper. Depending on the role the

gatekeeper plays in the call signalling and in the H.245 signalling, the H.323 specification foresees

three different types of signalling models:

- Direct signalling

With this signalling model, only H.225.0 RAS messages are routed through the gatekeeper

while the other logical channel messages are directly exchanged between the two endpoints;

- Gatekeeper-routed call signalling

With this signalling model, H.225.0 RAS and H.225.0 call signalling messages are routed

through the gatekeeper, while the H.245 Conference Control messages are directly exchanged

between the two endpoints;

- Gatekeeper-routed H.245 control

With this signalling model, H.225.0 RAS, H.225.0 call signalling and H.245 Conference Control messages are routed through the gatekeeper and

only the media streams are directly exchanged between the two endpoints.
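
The three models listed above differ only in which channels pass through the gatekeeper, which can be written down as a lookup table. The enum and function names below are hypothetical and serve illustration only.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Channel(Enum):
    RAS = "H.225.0 RAS"
    CALL_SIGNALLING = "H.225.0 call signalling"
    H245 = "H.245 conference control"
    MEDIA = "media streams"


class Model(Enum):
    DIRECT = "direct signalling"
    GK_ROUTED_CALL_SIGNALLING = "gatekeeper-routed call signalling"
    GK_ROUTED_H245 = "gatekeeper-routed H.245 control"


# Channels that are routed through the gatekeeper in each model.
VIA_GATEKEEPER: Dict[Model, FrozenSet[Channel]] = {
    Model.DIRECT: frozenset({Channel.RAS}),
    Model.GK_ROUTED_CALL_SIGNALLING: frozenset(
        {Channel.RAS, Channel.CALL_SIGNALLING}),
    Model.GK_ROUTED_H245: frozenset(
        {Channel.RAS, Channel.CALL_SIGNALLING, Channel.H245}),
}


def is_end_to_end(model: Model, channel: Channel) -> bool:
    """True if the channel runs directly between the two endpoints."""
    return channel not in VIA_GATEKEEPER[model]


for model in Model:
    direct = [c.value for c in Channel if is_end_to_end(model, c)]
    print(f"{model.value}: direct = {', '.join(direct)}")
```

In all three models, the media streams stay outside the gatekeeper and RAS always goes through it.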

The following sub-sections detail each signalling model. The figures displayed in this section apply

both to the use of a single gatekeeper and to the use of a gatekeeper network. Since the signalling

model is decided by the configuration of the endpoint's gatekeeper and applies to all the messages

the gatekeeper handles, the extension to multiple gatekeepers is straightforward (each gatekeeper

involved simply applies the signalling model as defined in the itemised list above), except for the

location of zone-external targets (described later in the `Locating zone external targets' section).

The figures in this section do not show message exchanges between gatekeepers; these remain

inside the ellipse in which the H.323 Gatekeeper is depicted and are also described in the

`Locating zone external targets' section. Please note that the sub-section on each signalling

model does not cover call termination. Please refer to the `Communication phases' Section for details.

2.2.1.4.1 Direct signalling model

The direct signalling model is depicted in Figure 2.4. In this model, the H.225.0 Call Signalling

and H.245 Conference Control messages are exchanged directly between the call terminals. As

shown in the figure, the communication starts with an ARQ (Admission ReQuest) message

sent by the calling party (which may be either a terminal or a gateway) to the gatekeeper. The

ARQ message is used by the endpoint to request access to the packet-based network from the

gatekeeper, which either grants the request with an ACF (Admission ConFirm) or denies it

with an ARJ (Admission ReJect). If an ARJ is issued, the call is terminated. After this first step,

the call signalling part of the call begins with the transmission of the SET UP message from the

calling party to the called party. The transport address of the SET UP message (and of all the

H.225.0 call signalling messages) is retrieved by the calling party from the destCallSignalAddress

field carried inside the ACF received. In the case of the direct signalling model, it is the address of

the destination endpoint. Upon receiving the SET UP message, the called party starts its H.225.0

RAS procedure with the gatekeeper. If successful, a CONNECT message is sent back to the

calling party to indicate acceptance of the call. Before sending the CONNECT message, two

other messages may be sent from the called party to the calling party (those two messages are not

depicted in the figure since we have reported only mandatory messages):

- ALERTING message

This message may be sent by the called user to indicate that called user alerting has been

initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message

This message may be sent by the called user to indicate that requested call establishment has

been initiated and no more call-establishment information will be accepted.

Figure 2.4 Direct signalling model

The CONNECT message closes the H.225.0 call signalling part of the call and makes the

terminals start the H.245 conference control part. In this call model, the H.245 Conference

Control messages are exchanged directly between the two endpoints (the correct `h245Address'

was retrieved from the CONNECT message itself). The procedures started with the H.245

Conference Control channel are used to:

- allow the exchange of audiovisual and data capabilities, with the TERMINAL CAPABILITY

messages;

- request the transmission of a particular audiovisual and data mode, with the LOGICAL

CHANNEL SIGNALLING messages;

- manage the logical channels used to transport the audiovisual and data information;

- establish which terminal is the master terminal and which is the slave terminal for the purposes

of managing logical channels, with the MASTER SLAVE DETERMINATION messages;

- carry various control and indication signals;

- control the bit rate of individual logical channels and the whole multiplex, with the

MULTIPLEX TABLE SIGNALLING messages;

- measure the round trip delay, from one terminal to the other and back, with the ROUND

TRIP DELAY messages.

Once the H.245 conference control messages are exchanged, the two endpoints have all the

necessary information to open the media streams.
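
The message sequence of the direct signalling model can be written out as a list of steps. This is a simplified sketch: the function name `direct_call_flow` is hypothetical, the called party's RAS procedure is assumed to be an ARQ/ACF exchange, the position of the two optional messages relative to that exchange is an assumption, and the case in which the called party is refused admission is left out.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (sender, receiver, message)


def direct_call_flow(caller_admitted: bool = True,
                     with_optional: bool = False) -> List[Step]:
    steps: List[Step] = [("caller", "gatekeeper", "ARQ")]
    if not caller_admitted:
        # Admission rejected: the call is terminated.
        steps.append(("gatekeeper", "caller", "ARJ"))
        return steps
    steps.append(("gatekeeper", "caller", "ACF"))
    # The ACF carries destCallSignalAddress, here the callee's own address.
    steps.append(("caller", "callee", "SET UP"))
    # RAS procedure of the called party (assumed to be ARQ/ACF).
    steps.append(("callee", "gatekeeper", "ARQ"))
    steps.append(("gatekeeper", "callee", "ACF"))
    if with_optional:
        steps.append(("callee", "caller", "CALL PROCEEDING"))
        steps.append(("callee", "caller", "ALERTING"))
    steps.append(("callee", "caller", "CONNECT"))
    # The CONNECT carries the h245Address used for the next phase.
    steps.append(("caller", "callee", "H.245 conference control"))
    return steps


for sender, receiver, message in direct_call_flow():
    print(f"{sender:>10} -> {receiver:<10} {message}")
```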

2.2.1.4.2 Gatekeeper-routed call signalling model

The gatekeeper-routed call signalling model is depicted in Figure 2.5. In this model, the H.245

Conference Control messages are exchanged directly between the call termination clients. With

each call, the communication starts with an ARQ message (Admission ReQuest) sent by the

calling party to its gatekeeper. The ARQ message is used by the endpoint to request access to the

packet-based network from the gatekeeper, which either grants the request with an ACF

(Admission ConFirm) or denies it with an ARJ (Admission ReJect). After this first step, the

call signalling part of the call begins with the transmission of the SET UP message from the calling

party to its gatekeeper. The transport address of the SET UP message (and of all the H.225.0 call

signalling messages) is retrieved by the calling party from the destCallSignalAddress field, carried

inside the ACF received. In the case of the gatekeeper-routed call signalling model, it is the

address of the gatekeeper itself. The SET UP message is then forwarded by the gatekeeper (or by

the gatekeeper network) to the called endpoint. Upon receiving the SET UP message, the called

party starts its H.225.0 RAS procedure with its gatekeeper. If successful, a CONNECT message is

sent to indicate acceptance of the call. Because of the call model, this message is also sent to the

called endpoint's gatekeeper which is in charge of forwarding it to the calling party endpoint

(either directly or using the gatekeeper network). Before sending the CONNECT message, two

other messages may be sent from the called party to its gatekeeper (those two messages are not

depicted in the figure since only mandatory messages are reported):

- ALERTING message

This message may be sent by the called user to indicate that called user alerting has been

initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message

This message may be sent by the called user to indicate that requested call establishment has

been initiated and no more call establishment information will be accepted.

Figure 2.5 Gatekeeper-routed call signalling model

The two optional messages listed above are then forwarded by the gatekeeper (or by the

gatekeeper network) to the calling party. After receiving the CONNECT message, the calling

party starts the H.245 Conference Control channel procedures directly with the called party (the

correct h245Address was retrieved from the CONNECT message itself). The scope of the H.245

Conference Control channel procedure is the same as is detailed above. Please refer to the `Direct

signalling model' Section for details.
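
The only thing that changes for the calling party between the direct and the gatekeeper-routed model is the address it finds in the destCallSignalAddress field of the ACF. The sketch below illustrates this together with the resulting message paths; the function names, host labels and sample addresses are hypothetical.

```python
from typing import List, Sequence, Tuple

Address = Tuple[str, int]  # (IP address, port)


def dest_call_signal_address(gatekeeper_routed: bool,
                             gatekeeper: Address,
                             callee: Address) -> Address:
    """Address a gatekeeper would place in the ACF for the calling party."""
    return gatekeeper if gatekeeper_routed else callee


def setup_path(gatekeeper_routed: bool, caller: str, callee: str,
               gatekeepers: Sequence[str]) -> List[str]:
    """Hops taken by the SET UP message from caller to callee."""
    if gatekeeper_routed:
        return [caller, *gatekeepers, callee]
    return [caller, callee]


def connect_path(gatekeeper_routed: bool, caller: str, callee: str,
                 gatekeepers: Sequence[str]) -> List[str]:
    """CONNECT travels the same hops in the opposite direction."""
    return list(reversed(
        setup_path(gatekeeper_routed, caller, callee, gatekeepers)))


gk_addr = ("192.0.2.1", 1720)       # sample values only
callee_addr = ("192.0.2.17", 1720)
print(dest_call_signal_address(True, gk_addr, callee_addr))
print(setup_path(True, "caller", "callee", ["gk-a", "gk-b"]))
print(connect_path(True, "caller", "callee", ["gk-a", "gk-b"]))
```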

2.2.1.4.3 Gatekeeper-routed H.245 control model

The gatekeeper-routed H.245 control model is depicted in Figure 2.6. In this model, only the

media streams are exchanged directly between the call termination clients. For each call, the

communication starts with an ARQ (Admission ReQuest) message sent by the calling party to

its gatekeeper. The ARQ message is used by the endpoint to be allowed to access the packet-based

network by the gatekeeper, which either grants the request with an ACF (Admission ConFirm)

or denies it with an ARJ (Admission ReJect). After this first step, the call signalling part of the

call begins with the transmission of the SET UP message from the calling party to its gatekeeper.

The transport address of the SET UP message (and of all the H.225.0 call signalling messages) is

retrieved by the calling party from the destCallSignalAddress field carried inside the ACF

received. In the case of gatekeeper-routed H.245 control model, it is the address of the gatekeeper

itself. The SET UP message is then forwarded by the gatekeeper (or by the gatekeeper network)

to the called endpoint. Upon receiving the SET UP message, the called party starts its H.225.0

RAS procedure with its gatekeeper. If successful, a CONNECT message is sent to indicate

acceptance of the call. Because of the call model, this message is also sent to the called endpoint's

gatekeeper, which is in charge of forwarding it to the calling party endpoint (either directly or

using the gatekeeper network). Before sending the CONNECT message, two other messages may

be sent from the called party to its gatekeeper (those two messages are not depicted in the figure

since only mandatory messages are reported):

- ALERTING message

This message may be sent by the called user to indicate that called user alerting has been

initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message

This message may be sent by the called user to indicate that requested call establishment has

been initiated and no more call establishment information will be accepted.

Figure 2.6 Gatekeeper-routed H.245 control model

The two optional messages listed above are then forwarded by the gatekeeper (or by the

gatekeeper network) to the calling party. After receiving the CONNECT message, the calling

party starts the H.245 Conference Control channel procedures with its gatekeeper (the correct

h245Address was retrieved from the CONNECT message itself). All of the H.245 channel

messages are then exchanged by the endpoints with their gatekeeper (or gatekeepers). It is the

gatekeeper (or gatekeeper network) which takes care of forwarding them up to the remote

endpoint as foreseen by the gatekeeper-routed H.245 control model. The purpose of the H.245

Conference Control channel procedures is the same as in the direct signalling model. Please refer to the `Direct

signalling model' Section for details.
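
The message order described above can be sketched as a short Python function. This is only an illustration under simplifying assumptions: both endpoints are served by a single gatekeeper, admission is always granted, and the relative order of the two optional messages is chosen arbitrarily. The function name and the tuple layout are not part of H.323.

```python
def gatekeeper_routed_call(include_optional=False):
    """Return the setup messages as (sender, receiver, message) tuples."""
    sequence = [
        ("caller", "gatekeeper", "ARQ"),
        ("gatekeeper", "caller", "ACF"),      # carries destCallSignalAddress
        ("caller", "gatekeeper", "SET UP"),
        ("gatekeeper", "callee", "SET UP"),   # forwarded by the gatekeeper
        ("callee", "gatekeeper", "ARQ"),      # callee's own RAS procedure
        ("gatekeeper", "callee", "ACF"),
    ]
    # Optional messages are sent by the callee before CONNECT (order assumed).
    optional = ["CALL PROCEEDING", "ALERTING"] if include_optional else []
    for message in optional + ["CONNECT"]:
        sequence.append(("callee", "gatekeeper", message))
        sequence.append(("gatekeeper", "caller", message))
    return sequence


if __name__ == "__main__":
    for sender, receiver, message in gatekeeper_routed_call(include_optional=True):
        print(f"{sender:>10} -> {receiver:<10} {message}")
```

Every call signalling message appears twice in the output because the gatekeeper relays it, which is the defining property of the routed model.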

{ 2.2.1.5 Communication phases

In H.323, a communication may be divided into five different phases:

- Call setup;

- Initial communication and capability exchange;

- Establishment of audiovisual communication;

- Call services;

- Call termination.

2.2.1.5.1 Call setup

Recommendation H.225.0 defines the call setup messages and procedures detailed here. The

recommendation foresees that requests for bandwidth reservation should take place at the earliest

possible phase. Unlike other protocols, there is no explicit synchronisation between two endpoints

during the call setup procedure (two endpoints can send a SET UP message to each other at

exactly the same time). Resolving such collisions between SET UP messages is left to the

application itself. Applications not

supporting multiple simultaneous calls should issue a busy signal when they have an outstanding

SET UP message, while applications supporting multiple simultaneous calls issue a busy signal

only to the same endpoint to which they sent an outstanding SET UP message. Moreover, an

endpoint should be capable of sending the ALERTING message. ALERTING means that the

called party has been alerted of an incoming call (`phone ringing', in the language of traditional

telephony). Only the ultimate called endpoint originates the ALERTING message and only when

the application has already alerted the user. If a gateway is involved, the gateway sends

ALERTING when it receives a ring indication from the Switched Circuit Network (SCN).

The sending of an ALERTING message is not required if an endpoint can respond to a SET UP

message with a CONNECT, CALL PROCEEDING, or RELEASE COMPLETE within four

seconds. After successfully sending a SET UP message, an endpoint can expect to receive an

ALERTING, CONNECT, CALL PROCEEDING, or RELEASE COMPLETE message within

four seconds. Finally, to maintain the consistency of the meaning of the

CONNECT message between packet-based networks and circuit-switched networks, the

CONNECT message should be sent only if it is certain that the capability exchange will

successfully take place and a minimum level of communications can be performed.
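
The busy-signal rule and the four-second response window described above can be expressed as two small checks. The following Python sketch is illustrative only; the function names, the constant and the use of a set to track outstanding SET UP messages are assumptions made for the example.

```python
RESPONSE_TIMEOUT_S = 4.0
EXPECTED_RESPONSES = {"ALERTING", "CONNECT", "CALL PROCEEDING", "RELEASE COMPLETE"}


def should_signal_busy(supports_multiple_calls, outstanding_setups, incoming_from):
    """Decide whether an incoming SET UP is answered with a busy signal.

    outstanding_setups is the set of endpoints to which this application
    has sent a SET UP message that is still unanswered.
    """
    if not outstanding_setups:
        return False
    if not supports_multiple_calls:
        return True
    # Multi-call applications are busy only towards the endpoint they are calling.
    return incoming_from in outstanding_setups


def is_timely_response(message, elapsed_seconds):
    """Check that a reply to SET UP is of an expected type and arrived in time."""
    return message in EXPECTED_RESPONSES and elapsed_seconds <= RESPONSE_TIMEOUT_S
```

For example, `should_signal_busy(True, {"B"}, "C")` is false, because an application that supports several calls and is currently calling B can still accept a call from C.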

The call setup phase may have different realisations:

- basic call setup when neither endpoint is registered

In this call setup the two endpoints communicate directly;

- both endpoints registered to the same gatekeeper

In this call setup the communication is decided by the signalling model configured on the

gatekeeper;

- only calling endpoint has gatekeeper

In this call setup only the calling party sends messages to the gatekeeper depending on the

signalling models configured while the called party sends the messages directly to the calling

party endpoint;

- only called endpoint has gatekeeper

In this call setup only the called party sends messages to the gatekeeper depending on the

signalling models configured while the calling party sends the messages directly to the called

endpoint;

- both endpoints registered to different gatekeepers

Each of the two endpoints communicates with its gatekeeper depending on the signalling

model configured. Additional H.225.0 RAS messages may be exchanged between the gatekeepers in

order to retrieve location information (see `Locating zone-external targets' Section for more

details);

- call setup with Fast Connect procedure

In this call setup, the media channels are established using the Fast Connect procedure. The

Fast Connect procedure speeds up the establishment of a basic point-to-point call (only one

round-trip message exchange is needed) enabling immediate media stream delivery upon call

connection. The Fast Connect procedure is started if the calling endpoint initiates it by

sending a SET UP message containing the FastStart element (to advise it is going to use the

Fast Connect procedure).

This element contains, among other things, a sequence of all of the parameters

necessary to immediately open and begin transferring media on the channels. The Fast Connect

procedure may be refused by the called endpoint (either because it wants to

use features requiring H.245 or because it does not implement Fast Connect). The Fast Connect

procedure may be refused with any H.225.0 call signalling message, up to and including the

CONNECT message. Refusing the Fast Connect procedure (or not initiating it) requires that H.245

procedures be used for the exchange of capabilities and the opening of media channels. Moreover,

the Fast Connect procedure provides more information for the purpose of H.323/SIP gatewaying

(further details to be found in Chapter 4);

- call setup via gateways

When a gateway is involved, the call setup between it and the network endpoint is the same as

the endpoint-to-endpoint call setup;

- call setup with an MCU

When an MCU is involved, all endpoints exchange call signalling with the MCU (and with the

interested gatekeepers, if any). The call setup between an endpoint and the MCU proceeds in the

same way as the endpoint-to-endpoint call setup;

- broadcast call setup

This kind of call setup follows the procedures defined in Recommendation H.332.
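
Of the realisations listed above, the Fast Connect case involves a decision that can be written down compactly: Fast Connect is used only if the caller offers it and the called endpoint neither lacks the feature nor needs H.245. The Python sketch below is a minimal illustration; the function and parameter names are hypothetical.

```python
def media_setup_procedure(setup_has_fast_start,
                          callee_implements_fast_connect,
                          callee_needs_h245_features):
    """Return which procedure ends up opening the media channels."""
    if not setup_has_fast_start:
        # The caller did not initiate Fast Connect.
        return "H.245"
    if not callee_implements_fast_connect or callee_needs_h245_features:
        # The callee refuses Fast Connect, so H.245 procedures are required.
        return "H.245"
    return "Fast Connect"
```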

2.2.1.5.2 Initial communication and capability exchange

After exchanging call setup messages, the endpoints, if they plan to use H.245, establish the H.245

Control Channel. The H.245 Control Channel is used for the capability exchange and to open

the media channels. The H.245 Control Channel procedures are either not started or closed if

CONNECT does not arrive or when an endpoint sends RELEASE COMPLETE. An H.245 Control

Channel can also be opened on reception of ALERTING or CALL PROCEEDING

messages. H.323 endpoints support the capabilities exchange procedure of H.245. The H.245

TERMINALCAPABILITYSET message is used for the exchange of endpoint system capabilities.

This message is the first H.245 message sent.

The master-slave determination procedure of H.245 must be supported by H.323-compliant

endpoints. In cases where multipoint controller (MC) capability is present in more than one

endpoint, the master-slave determination is used for determining which MC will play an

active role. The H.245 Control Channel procedure also provides master-slave determination for

opening bi-directional channels for data.

After Terminal Capability Exchange has been initiated, a master-slave determination

procedure (consisting of either MASTERSLAVEDETERMINATION or

MASTERSLAVEDETERMINATIONACK) has to be started as the first H.245 Conference

Control procedure. Upon failure of initial capability exchange or master-slave determination

procedures, a maximum of two retries are performed before the endpoint passes to the Call

Termination phase. Normally, after successful completion of the requirements of this phase, the

endpoints proceed directly to establishment of the audiovisual communication phase.
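
The retry rule in this phase (one initial attempt, at most two retries, then call termination) can be sketched as follows. The helper name and the returned phase labels are assumptions for illustration; `procedure` stands for either the capability exchange or the master-slave determination and is expected to return True on success.

```python
MAX_RETRIES = 2


def run_with_retries(procedure):
    """Run an H.245 start-up procedure, retrying at most MAX_RETRIES times."""
    for _attempt in range(1 + MAX_RETRIES):
        if procedure():
            return "establishment of audiovisual communication"
    # All attempts failed: the endpoint passes to the Call Termination phase.
    return "call termination"
```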

2.2.1.5.2.1 Encapsulation of H.245 messages within H.225.0 call signalling messages

Encapsulating H.245 messages inside H.225.0 call signalling messages, instead of establishing a

separate H.245 channel, is possible in order to save resources, synchronise call signalling and

control and reduce call setup time. This process is called `encapsulation' or `tunnelling' of H.245

messages. With this procedure, the terminal copies the encoded H.245 message into a

structure inside the data of the Call Signalling Channel. If tunnelling is used, any H.225.0 call

signalling message may contain one or more H.245 messages. If there is no need to send an

H.225.0 call signalling message when an H.245 message has to be transmitted, a FACILITY

message is sent detailing (with appropriate fields inside) the reason for such a message.
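
The tunnelling rule can be illustrated with a small Python sketch: pending H.245 messages ride on the next H.225.0 call signalling message if there is one, and on a FACILITY message otherwise. Messages are modelled as plain dictionaries; the key names and the reason text are invented for this example and do not reproduce the actual H.225.0 field names or encoding.

```python
def tunnel_h245(pending_h245, h225_message=None):
    """Attach queued H.245 messages to an H.225.0 call signalling message.

    If no call signalling message is due, a FACILITY message is created
    solely to carry the H.245 messages.
    """
    if h225_message is None:
        h225_message = {
            "type": "FACILITY",
            "reason": "transport of tunnelled H.245 messages",  # illustrative
        }
    carrier = dict(h225_message)          # do not modify the caller's message
    carrier["h245_messages"] = list(pending_h245)
    return carrier
```

Calling `tunnel_h245(["TERMINALCAPABILITYSET"], {"type": "CONNECT"})` returns a CONNECT message with the capability set attached, whereas omitting the second argument yields a FACILITY carrier.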

2.2.1.5.3 Establishment of audiovisual communication

The establishment of audiovisual communication follows the procedures of Recommendation

H.245. Logical channels for the various information streams are opened using the H.245

procedures. The audio and video streams are transported using an unreliable protocol, while data

communications are transported using a reliable protocol. The transport address that the receiving

endpoint has assigned to a specific logical channel (audio, video or data) is transported by the

OPENLOGICALCHANNELACK message (an example is given in Figure 2.7). That transport

address is used to transmit the information stream associated with that logical channel.

Figure 2.7 OPENLOGICALCHANNELACK message content

2.2.1.5.4 Call services

When the call is active, the terminal may request additional call services. Among the services

reported here are the Bandwidth Change Services and Supplementary Services. With Bandwidth

Change Services, the endpoints or gatekeeper (if involved) may, at any time during a conference,

request an increase or decrease in the call bandwidth. If the aggregate bit rate of all transmitted

and received channels does not exceed the current call bandwidth, then an endpoint may change

the bit rate of a logical channel without requesting a bandwidth change. After requesting a

bandwidth change, the endpoint waits for confirmation prior to actually changing the bit rate

(confirmation usually comes from the gatekeeper). Asking for call bandwidth changes is

performed using a BANDWIDTH CHANGE REQUEST (BRQ) message. If the request is not

accepted, a BANDWIDTH CHANGE REJECT (BRJ) message is returned to the endpoint. If

the request is accepted, a BANDWIDTH CHANGE CONFIRM (BCF) is sent back to the

endpoint. Support for Supplementary Services is optional. The H.450 Series of

Recommendations describes a method of providing Supplementary Services in the H.323

environment. Figure 2.8 reports some of the supplementary services defined so far and their

number in the series.

Recommendation number    Recommendation title

H.450.1                  Supplementary Service Framework
H.450.2                  Call Transfer Supplementary Service
H.450.3                  Call Diversion Supplementary Service
H.450.4                  Call Hold Supplementary Service
H.450.5                  Call Park and Pickup Supplementary Service
H.450.6                  Call Waiting Supplementary Service
H.450.7                  Message Waiting Supplementary Service
H.450.8                  Name Identification Supplementary Service
H.450.9                  Call Completion Supplementary Service
H.450.10                 Call Offer Supplementary Service
H.450.11                 Call Intrusion Supplementary Service

Figure 2.8 Supplementary services of the H.450-Series
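
Returning to the Bandwidth Change Services described earlier in this section, the rule for when a BRQ is needed can be sketched in Python. This is an illustration under assumptions: bit rates are plain numbers in the same unit as the call bandwidth, and `gatekeeper_accepts` is a hypothetical callback standing in for the gatekeeper's decision.

```python
def change_bit_rate(channel_rates, channel, new_rate, call_bandwidth,
                    gatekeeper_accepts):
    """Try to change one logical channel's bit rate.

    Returns (rates_after, messages), where messages lists the RAS messages
    that were exchanged with the gatekeeper.
    """
    proposed = dict(channel_rates)
    proposed[channel] = new_rate
    aggregate = sum(proposed.values())

    if aggregate <= call_bandwidth:
        # Still within the current call bandwidth: no request is needed.
        return proposed, []
    if gatekeeper_accepts(aggregate):
        # The endpoint changes the bit rate only after confirmation.
        return proposed, ["BRQ", "BCF"]
    # Request rejected: the bit rates stay as they were.
    return dict(channel_rates), ["BRQ", "BRJ"]
```

With rates of 64 (audio) and 320 (video) and a call bandwidth of 512, raising video to 384 needs no request, while raising it to 768 triggers a BRQ.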

2.2.1.5.5 Call termination

A call may be terminated either by one of the endpoints or by the gatekeeper. Call termination is

defined using the following procedure:

- video should be terminated after a complete picture and then all logical channels for video

closed;

- data transmission should be terminated and then all logical channels for data closed;

- audio transmission should be terminated and then all logical channels for audio closed;

- the H.245 ENDSESSIONCOMMAND message (H.245 Control Channel) should be sent by

the endpoint/gatekeeper. This message indicates that the call has to be disconnected; then the

H.245 message transmission should be terminated;

- an ENDSESSIONCOMMAND message should be sent back by the receiving endpoint to the sending endpoint and

then the H.245 Control Channel should be closed;

- a RELEASE COMPLETE message should be sent closing the Call Signalling Channel if this is

still open.

An endpoint receiving an ENDSESSIONCOMMAND message does not need to receive it back

again after replying to it in order to clear a call. Terminating a call within a conference does not

mean that the whole conference needs to be terminated. In order to terminate a conference, an

H.245 message (DROPCONFERENCE) is used. Then the MC should terminate the calls with

the endpoints as described above.

A call may be terminated differently depending on the gatekeeper presence and on the party

issuing the call termination:

- call clearing without a gatekeeper

No further action is required;

- call clearing with a gatekeeper

The gatekeeper needs to be informed about the call termination. After RELEASE

COMPLETE is sent, an H.225.0 DISENGAGE REQUEST (DRQ) message should be sent by

each endpoint to its gatekeeper. A DISENGAGE CONFIRM (DCF) message is sent back to the

endpoints to acknowledge the reception;

- call clearing issued by the gatekeeper

A call may be terminated by the gatekeeper by sending a DRQ to an endpoint. The procedure

described above for call termination should be followed immediately by the endpoint up to the

RELEASE COMPLETE message. Then a reply to the gatekeeper should be sent using a DCF

message. The other endpoint should follow the same call termination procedures upon

receiving the ENDSESSIONCOMMAND message. Moreover, if a multipoint conference is

taking place, in order to close the entire conference, the gatekeeper should send a DRQ to each

endpoint in the conference.
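
The three clearing variants differ only in the RAS messages that surround the common termination procedure. The Python sketch below lists the messages seen by one endpoint; closing the media channels is omitted, and the function name and tuple layout are assumptions made for the example.

```python
def clearing_messages(has_gatekeeper, initiated_by_gatekeeper=False):
    """Return (sender, receiver, message) tuples for clearing one call."""
    messages = []
    if initiated_by_gatekeeper:
        messages.append(("gatekeeper", "endpoint", "DRQ"))

    # Common part: H.245 session end, then release of call signalling.
    messages += [
        ("endpoint", "peer", "ENDSESSIONCOMMAND"),
        ("peer", "endpoint", "ENDSESSIONCOMMAND"),
        ("endpoint", "peer", "RELEASE COMPLETE"),
    ]

    if initiated_by_gatekeeper:
        # The endpoint answers the gatekeeper's DRQ once it is done.
        messages.append(("endpoint", "gatekeeper", "DCF"))
    elif has_gatekeeper:
        # The endpoint informs its gatekeeper about the termination.
        messages.append(("endpoint", "gatekeeper", "DRQ"))
        messages.append(("gatekeeper", "endpoint", "DCF"))
    return messages
```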

{ 2.2.1.6 Locating zone-external targets

When calling an address that is registered at the same gatekeeper as the calling party, the

gatekeeper just needs to look up its internal tables to resolve the target address. Complexity enters

the picture if the destination address is registered with another gatekeeper. While Chapter 7 will

cover this topic in more detail, the most basic mechanism that H.323 provides is explained here.

A gatekeeper may explicitly request the resolution of an address from other gatekeepers. On receipt

of a request to call an address for which the gatekeeper has no registration, it can send out a

location request (LRQ) to other gatekeepers (see Figure 2.9). The receiving gatekeeper, assuming it

knows the address, will reply with a Transport Service Access Point (TSAP, a combination of IP

address and port number): either that of the requested address or its own call signalling TSAP.

[Diagram: endpoint x@tzi.org with gatekeeper tzi.org and endpoint ubik@cesnet.cz with gatekeeper cesnet.cz. Message labels: RRQ: x@tzi.org, RRQ: ubik@cesnet.cz, RCF, ARQ: x@tzi.org, LRQ: x@tzi.org, LCF + IP, ACF + IP, Setup: x@tzi.org]

Figure 2.9 External address resolution using LRQs

A location request can be sent via unicast or multicast. If sent via multicast, only the gatekeeper

that can resolve the address replies. If a gatekeeper receives a unicast LRQ, it either confirms or

rejects the request.

A gatekeeper can be configured with a list of peer gatekeepers to ask, in parallel or sequentially. It is also

possible to assign a domain suffix or number prefix to each peer so that an address with a

matching number prefix of a neighbouring institution will result in a request to the gatekeeper of

that institution. By defining default peers, one could also build a hierarchy of gatekeepers (see

Chapter 7 for further details).
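
The resolution logic described in this section (local lookup first, then LRQs to peers selected by number prefix or domain suffix, with default peers as a fallback) can be sketched in Python. This is a simplified illustration: peers are asked sequentially, `send_lrq` is a hypothetical callback that returns a TSAP tuple for an LCF and None for a rejection, and the rule format is invented for the example.

```python
def select_peers(alias, peer_rules, default_peers=()):
    """Pick the peer gatekeepers whose prefix or suffix rule matches the alias."""
    matching = []
    for kind, pattern, gatekeeper in peer_rules:
        if kind == "prefix" and alias.startswith(pattern):
            matching.append(gatekeeper)
        elif kind == "suffix" and alias.endswith(pattern):
            matching.append(gatekeeper)
    return matching or list(default_peers)


def resolve(alias, registrations, peer_rules, send_lrq, default_peers=()):
    """Resolve an alias to a TSAP (IP address, port), or None if unknown."""
    if alias in registrations:
        # Registered at this gatekeeper: an internal table lookup suffices.
        return registrations[alias]
    for gatekeeper in select_peers(alias, peer_rules, default_peers):
        tsap = send_lrq(gatekeeper, alias)   # LCF -> TSAP, rejection -> None
        if tsap is not None:
            return tsap
    return None
```

A rule such as `("suffix", "@cesnet.cz", "gk.cesnet.cz")` would direct a call to ubik@cesnet.cz to the gatekeeper of that institution; the gatekeeper host name is a placeholder.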

{ 2.2.1.7 A sample call scenario

Figure 2.10 depicts an example of an inter-zone call setup using H.323 in which one gatekeeper (A)

uses direct signalling while the other (B) uses routed signalling. The calling party in zone A contacts

its gatekeeper to ask for permission to call the called party in zone B (1). The gatekeeper of zone

A confirms this request and provides the calling party with the address of zone B's gatekeeper

(2). The calling party establishes a call signalling channel (and subsequently/in parallel the

conference control channel) to the gatekeeper of zone B (3), which determines the location of the

called party and forwards the request to the called party (4).

[Diagram: Zone A with the caller, Zone B with the callee, and the gatekeepers; numbered steps (1) to (9). Legend: H.225.0 RAS, H.225.0 Call Signaling + H.245, Media Streams]

Figure 2.10 A sample H.323 call setup scenario

The called party explicitly confirms with its gatekeeper that it is allowed to accept the call (5, 6)

and, if so, alerts the recipient of the call, returns an alerting indication and (once the receiving user

picks up the call) eventually an indication of successful connection setup back to the calling party

(7, 8). In parallel to this exchange, capability negotiation and media stream configuration take

place. When the setup has completed, both parties start sending media streams directly to each

other.

{ 2.2.1.8 Additional (call) services

It is well known from our daily interaction with PBXs that telephony service comprises far more

than just call setup and teardown: n-way conferencing and various supplementary services (such as

call transfer, call waiting, etc.) are available. Similar features, at least the more commonly known

and used ones, need to be provided by IP Telephony systems as well in order to be accepted by

customers. Additional call services in H.323 can be grouped into three categories:

- Conferencing

H.323 inherently supports multipoint tightly-coupled conferencing, i.e., conferences with

access control, optional support for conference chairs, and close synchronisation of conference

state among all participants from the outset, through the concept of a Multipoint Controller

and an optional Multipoint Processor. While control is centralised in the MC, in theory, data

exchange may be either via IP multicast, multi-unicast (i.e., peer-wise fan-out between

endpoints without MP), or through an MP. (There seems to be practically no H.323 equipment

supporting media multicast.) The distribution mode may be selected per media and per

endpoint peer and is controlled by the MC;

- Broadcast conferencing

H.323 also provides an interface to support large loosely-coupled conferences as are frequently

used in the Mbone to multicast seminars, events, etc. In this case, the MC defines a session

description (using the Session Description Protocol, SDP, see below) for the H.323 media

sessions (which have to operate using multicast) and announces this description by some means

(e.g., the Session Announcement Protocol, SAP). Details are defined in ITU-T H.332.

- Supplementary services

H.323 provides a variety of supplementary services with additional ones continuously being

defined. While some services can be accomplished using the basic H.323 specifications, the H.450.x

Recommendations define a framework (derived from QSIG, the ECMA/ISO/ETSI standard

for supplementary service signalling in PBXs) and a number of services (call transfer, call diversion,

call hold, call park & pickup, call waiting, message waiting indication and call completion).

Further extensions to supplementary services and other functional enhancements are on the way.

In particular, an HTTP-based extension framework is being defined at the time of writing to

enable rapid introduction of new services without the need for standardisation.

{ 2.2.1.9 H.235 Security

The H.235 recommendation defines elements of security for H.323:

- Authentication

Authentication can be achieved by using a shared secret (password) or digital signatures. The

RAS messages include a token that was generated using either the shared secret or the

signature. A receiving entity authenticates the sender by comparing the received token with a

self-generated token;

- Message Integrity

Integrity is achieved by generating password-based checks on the message;

- Privacy

Mechanisms are provided to set up encryption on the media streams. They must be used in

conjunction with the H.245 protocol and employ DES, Triple DES or RC2. The use of SRTP is

not supported yet (in H.235v2).

These mechanisms are grouped into Security Profiles: the Baseline Security Profile

provides authentication and message integrity, making it suitable for subscription-based

environments, while the Voice Encryption Profile provides confidential end-to-end media

channels.
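
The shared-secret authentication described above (the receiver recomputes the token and compares it with the received one) can be illustrated with Python's standard `hmac` module. This is a sketch of the general principle only: the choice of HMAC-SHA1 over the raw message bytes is an assumption made for the example and does not reproduce the token format or message encoding that H.235 actually specifies.

```python
import hashlib
import hmac


def make_token(shared_secret: bytes, message: bytes) -> bytes:
    """Compute a keyed hash over the message (illustrative construction)."""
    return hmac.new(shared_secret, message, hashlib.sha1).digest()


def authenticate(shared_secret: bytes, message: bytes, received_token: bytes) -> bool:
    """Recompute the token locally and compare it with the received one."""
    expected = make_token(shared_secret, message)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, received_token)
```

Because the token depends on the message content as well as on the secret, the same comparison also detects a modified message, which is the idea behind the password-based integrity check.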

{ 2.2.1.10 Protocol Profiles

H.323 has its origin, as mentioned before, in the area of multimedia conferencing. This implies

that a vast number of options are available, which are not necessary for simply providing

telephony services. The TIPHON project of the European Telecommunications Standards Institute

(ETSI) has defined a telephony profile for H.323 that specifies which combination of options

should be implemented.

Similarly, H.323 contains a security framework (H.235) that describes a collection of algorithms

and protocol mechanisms but lacks, because of international political constraints, a precise

specification of a mandatory baseline. This is accounted for by the ETSI TIPHON security

profile: this specification fills in the gaps and provides the foundation for inter-operable

implementations.

In summary, it can be said that the H.323 family of standards provides a mature basis for

commercial products in the field of IP Telephony. While the details of the protocol are often

dominated by their legacy from various earlier ITU protocols, there is an active effort to profile

and simplify the protocol to reduce the complexity.

{ 2.2.2 SIP

{ 2.2.2.1 The purpose of SIP

SIP stands for Session Initiation Protocol. It is an application-layer control protocol that has been

developed and designed within the IETF. The protocol has been designed with easy

implementation, good scalability, and flexibility in mind.

The specification is available in the form of several RFCs. The most important one is RFC3261,

which contains the core protocol specification. The protocol is used for creating, modifying and

terminating sessions with one or more participants. By session, we mean a set of senders

and receivers that communicate and the state kept in those senders and receivers during the

communication. Examples of a session can include Internet telephone calls, distribution of

multimedia, multimedia conferences, distributed computer games, etc.

SIP is not the only protocol that the communicating devices will need. It is not meant to be a

general purpose protocol. The purpose of SIP is just to make the communication possible. The

communication itself must be achieved by other means (and possibly another protocol). Two

protocols that are most often used along with SIP are RTP and SDP. The RTP protocol is used to

carry the real-time multimedia data (including audio, video and text). The protocol makes it

possible to encode and split the data into packets and transport these packets over the Internet.

Another important protocol is SDP, Session Description Protocol, which is used to describe and

encode capabilities of session participants. Such a description is then used to negotiate the

characteristics of the session so that all of the devices can participate, including, for example,

negotiation of codecs used to encode media so all the participants will be able to decode it,

negotiation of transport protocol used and so on.

SIP has been designed in conformance with the Internet model. It is an end-to-end-oriented

signalling protocol, which means that all the logic is stored in end-devices (except

routing of SIP messages). State is also stored only in end-devices. There is no single point of

failure and networks designed this way scale well. The price we have to pay for the

`distributiveness' and scalability is higher message overhead, caused by the messages being sent

end-to-end.

It is worth mentioning that the end-to-end concept of SIP is a significant divergence from a

regular PSTN (Public Switched Telephone Network) where all the state and logic is stored in the

network and the end-devices (telephones) are very primitive. The aim of SIP is to provide the

same functionality that the traditional PSTNs have, but the end-to-end design makes SIP

networks much more powerful and open to the implementation of new services that can hardly

be implemented in the traditional PSTNs.

SIP is based on the HTTP protocol. The HTTP protocol inherited the format of message headers from

RFC822. HTTP is probably the most successful and widely used protocol in the Internet.

SIP tries to combine the best of both. In fact, HTTP can be classified as a signalling protocol too,

because user-agents use the protocol to tell an HTTP server which documents they are interested

in. SIP is used to carry the description of session parameters. The description is encoded into a

document using SDP. Both protocols (HTTP and SIP) have inherited the encoding of message

headers from RFC822. The encoding has proven to be robust and flexible over the years.

2.2.2.1.1 SIP URI

SIP entities are identified using a SIP URI (Uniform Resource Identifier). A SIP URI has the form

sip:username@domain, for instance, sip:joe@company.com. A SIP URI consists of a username part and

a domain name part, delimited by the @ (at) character. SIP URIs are similar to e-mail addresses

and it is, for instance, possible to use the same URI for e-mail and SIP communication. Such

URIs are easy to remember.
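
Splitting a URI of the simple form described above into its two parts takes only a few lines of Python. The sketch below handles just the sip:username@domain form discussed here; it is not a complete SIP URI parser, and anything following the @ character (such as a port number) is returned unchanged as part of the domain. The function name is an assumption.

```python
def parse_sip_uri(uri):
    """Split 'sip:username@domain' into (username, domain)."""
    scheme, colon, rest = uri.partition(":")
    if scheme != "sip" or not colon:
        raise ValueError(f"not a SIP URI: {uri!r}")
    username, at, domain = rest.partition("@")
    if not at or not username or not domain:
        raise ValueError(f"expected sip:username@domain, got {uri!r}")
    return username, domain
```

For example, `parse_sip_uri("sip:joe@company.com")` yields the username `joe` and the domain `company.com`.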

{ 2.2.2.2 SIP network elements

Although, in the simplest configuration, it is possible to use just two user agents that send SIP

messages directly to each other, a typical SIP network will contain more than one type of SIP

element. Basic SIP elements are user agents, proxies, registrars and redirect servers. They are

described briefly in this section.

Note that the elements, as presented in this section, are often only logical entities. It is often

beneficial to co-locate them, for instance, to increase the speed of processing, but that depends on

the particular implementation and configuration.

2.2.2.2.1 User agents

Internet endpoints that use SIP to find each other and to negotiate a session's characteristics are

called user agents. User agents usually, but not necessarily, reside on a user's computer in the form of

an application. This is currently the most widely-used approach, but user agents can also be

cellular phones, PSTN gateways, PDAs, automated IVR systems and so on.

User agents are often referred to as User Agent Server (UAS) and User Agent Client (UAC). UAS

and UAC are logical entities and each user agent contains a UAC and UAS. UAC is the part of

the user agent that sends requests and receives responses. UAS is the part of the user agent that

receives requests and sends responses.

Because a user agent contains both a UAC and a UAS, it behaves either like a UAC or like a UAS. For

instance, a calling party's user agent behaves like UAC when it sends an INVITE request and

receives responses to the request. A called party's user agent behaves like a UAS when it receives

the INVITE and sends responses.

But this situation changes when the called party decides to send a BYE and terminate the session.

In this case the called party's user agent (sending BYE) behaves like UAC and the calling party's

user agent behaves like UAS.

[Diagram: a calling party, a stateful forking proxy and two called parties, each with UAC and UAS parts. Message labels: INVITE (three times), BYE]

Figure 2.11 UAC and UAS

Figure 2.11 shows three user agents and one stateful forking proxy. Each user agent contains a UAC

and a UAS. The part of the proxy that receives the INVITE from the calling party, in fact, acts as a

UAS. When forwarding the request statefully, the proxy creates two UACs, each of them

responsible for one branch.

In the example, called party B picks up and later, when he wants to tear down the call, he sends

a BYE. At this time, the user agent that was previously a UAS becomes a UAC and vice versa.
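
The point that the UAC and UAS roles belong to a transaction rather than to a device can be made explicit in a few lines of Python. The function name and the party labels are illustrative only.

```python
def transaction_roles(request_sender, request_receiver):
    """Whoever sends the request acts as UAC; whoever receives it acts as UAS."""
    return {request_sender: "UAC", request_receiver: "UAS"}


# The calling party sends INVITE, later called party B sends BYE.
invite_roles = transaction_roles("calling party", "called party B")
bye_roles = transaction_roles("called party B", "calling party")
```

Here `invite_roles["called party B"]` is `"UAS"` while `bye_roles["called party B"]` is `"UAC"`, mirroring the role swap in the example.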

2.2.2.2.2 Proxy servers

SIP allows the creation of an infrastructure of network hosts called proxy servers. User agents

can send messages to a proxy server. Proxy servers are very important entities in the SIP

infrastructure. They perform routing of session invitations according to the invitee's current

location, authentication, accounting and many other important functions.

The most important task of a proxy server is to route session invitations `closer' to a called party.

The session invitation will usually traverse a set of proxies until it finds one which knows the

actual location of the called party. Such a proxy will forward the session invitation directly to the

called party and the called party will then accept or decline the session invitation.

There are two basic types of SIP Proxy Servers, stateless and stateful.

2.2.2.2.2.1 Stateless servers

Stateless servers are simple message forwarders. They forward messages independently of

each other. Although messages are usually arranged into transactions (see Section 2.2.2.4),

stateless proxies do not take care of transactions.

Stateless proxies are simple and faster than stateful proxy servers. They can be used as simple load

balancers, message translators and routers. One of the drawbacks of stateless proxies is that they are

unable to absorb re-transmissions of messages or perform more advanced routing, for instance,

forking or recursive traversal.

2.2.2.2.2.2 Stateful servers

Stateful proxies are more complex. Upon reception of a request, stateful proxies create a state and

keep the state until the transaction finishes. Some transactions, especially those created by

INVITE, can last quite long (until the called party picks up or declines the call). Because stateful

proxies must maintain the state for the duration of the transactions, their performance is limited.

The ability to associate SIP messages into transactions gives stateful proxies some interesting

features. Stateful proxies can perform forking; that means that upon reception of a message, two or

more messages will be sent out.

Stateful proxies can absorb re-transmissions because they know from the transaction state if they

have already received the same message (stateless proxies cannot do the check because they keep

no state).

Stateful proxies can perform more complicated methods of finding a user. It is, for instance,

possible to try to reach a user's office phone and, when he does not pick up, redirect the call to his

cell phone. Stateless proxies cannot do this because they have no way of knowing how the

transaction targeted to the office phone finished.

Most SIP Proxies today are stateful because their configuration is usually very complex. They

often perform accounting, forking and some sort of NAT traversal aid and all those features

require a stateful proxy.
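
How transaction state enables both forking and the absorption of re-transmissions can be shown with a toy model in Python. This is a deliberately simplified sketch: transactions are identified by an opaque string supplied by the caller, state is never cleaned up, and responses are not modelled at all. The class and attribute names are assumptions.

```python
class StatefulProxy:
    """Toy model of a stateful forking proxy."""

    def __init__(self, targets):
        self.targets = targets       # request URI -> list of contact addresses
        self.transactions = {}       # transaction id -> branches already created

    def receive(self, transaction_id, request_uri):
        """Return the contacts to which the request is forwarded."""
        if transaction_id in self.transactions:
            # Known transaction: this is a re-transmission, so absorb it.
            return []
        # New transaction: create state and fork one branch per contact.
        branches = list(self.targets.get(request_uri, []))
        self.transactions[transaction_id] = branches
        return branches
```

A stateless proxy has no `transactions` table, so it would forward every copy of the request again and could not tell how an earlier branch finished.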

2.2.2.2.2.3 Proxy server usage

In a typical configuration, each centrally-administered entity (a company, for instance) has its own

SIP Proxy Server, which is used by all user agents in the entity. Suppose that there are two

companies, A and B, and each of them has its own proxy server. Figure 2.12 shows how a session

invitation from employee Joe in company A will reach employee Bob in company B.

[Diagram: Joe and proxy.a.com in Company A, Bob and proxy.b.com in Company B, and a DNS server; IP addresses 5.6.7.8 and 1.2.3.4. Steps: 1. INVITE, 2. SIP SRV for b.com, 3. proxy.b.com, 4. INVITE, 5. INVITE, 6. BYE]

Figure 2.12 Session invitation

User Joe uses address sip:bob@b.com to call Bob. Joe's user agent does not know how to route
the invitation itself but it is configured to send all outbound traffic to the company SIP Proxy
Server proxy.a.com. The proxy server figures out that user sip:bob@b.com is in a different
company so it will look up B's SIP Proxy Server and send the invitation there. B's proxy server
can be either pre-configured at proxy.a.com or the proxy will use DNS SRV records to find B's
proxy server. The invitation reaches proxy.b.com. The proxy knows that Bob is currently sitting
in his office and is reachable through the phone on his desk, which has IP address 1.2.3.4, so the
proxy will send the invitation there.
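The routing decision made by proxy.a.com can be sketched as follows. This is an illustrative Python outline only: the dictionaries stand in for the proxy's static configuration and for DNS SRV lookups, and the port number in the SRV entry is an assumed value.

```python
LOCAL_DOMAIN = "a.com"

# Stand-in for DNS SRV data (_sip._udp.<domain>); values are illustrative.
SRV_RECORDS = {"b.com": ("proxy.b.com", 5060)}

# Optional static routes configured at the proxy (empty in this example).
STATIC_ROUTES = {}


def domain_of(sip_uri):
    """Return the host part of a simple 'sip:user@host[:port]' URI."""
    hostport = sip_uri.split("@", 1)[1]
    return hostport.split(":", 1)[0].split(";", 1)[0]


def next_hop(request_uri):
    domain = domain_of(request_uri)
    if domain == LOCAL_DOMAIN:
        return "local"  # handled via the location database
    if domain in STATIC_ROUTES:
        return STATIC_ROUTES[domain]  # pre-configured peer proxy
    if domain in SRV_RECORDS:
        return SRV_RECORDS[domain]  # discovered through DNS SRV
    raise LookupError(f"no route to {domain}")
```

For example, `next_hop("sip:bob@b.com")` yields the entry for proxy.b.com, while a URI in a.com is handled locally.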

2.2.2.2.3 Registrar

It has been mentioned that the SIP Proxy at proxy.b.com knows Bob's current location, but it
has not yet been explained how a proxy can learn the current location of a user. Bob's user agent
(SIP phone) must register with a registrar. The registrar is a special SIP entity that receives
registrations from users, extracts information about their current location (IP address, port and
username in this case) and stores the information into a location database. The purpose of the
location database is to map sip:bob@b.com to something like sip:bob@1.2.3.4:5060. The location
database is then used by B's proxy server. When the proxy receives an invitation for
sip:bob@b.com it will search the location database. It finds sip:bob@1.2.3.4:5060 and will send
the invitation there.

A registrar is very often a logical entity only. Because of their tight coupling with proxies,

registrars are usually co-located with proxy servers.


Figure 2.13 shows a typical SIP registration. A REGISTER message containing the Address of
Record sip:jan@iptel.org and the contact address sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP
address of the phone, is sent to the registrar. The registrar extracts this information and stores it
into the location database. If everything went well, the registrar sends a 200 OK response to
the phone and the process of registration is finished.

[Figure: the User Agent sends 1. REGISTER (sip:jan@iptel.org, 1.2.3.4:5060) to the Registrar; the
Registrar performs 2. STORE into the Location Database and returns 3. 200 OK. Record in Location
Database: user sip:jan@iptel.org is reachable at sip:jan@1.2.3.4:5060.]

Figure 2.13 Overview of Registrar

Each registration has a limited life span. The Expires header field or the expires parameter of the
Contact header field determines for how long the registration is valid. The user agent must refresh
the registration within the life span. Otherwise it will expire and the user will become
unavailable.
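A location database with expiring registrations can be sketched as below. This is a simplified Python illustration: the class and method names are invented for the example, and the clock is injectable only so that the expiry can be demonstrated without waiting.

```python
import time


class LocationDatabase:
    """Maps an Address of Record to contact URIs with an expiry time."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._bindings = {}  # AoR -> {contact URI: absolute expiry time}

    def register(self, aor, contact, expires):
        # A repeated REGISTER for the same contact simply refreshes the expiry.
        self._bindings.setdefault(aor, {})[contact] = self._clock() + expires

    def lookup(self, aor):
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]


now = [1000.0]                      # fake clock so the example is repeatable
db = LocationDatabase(clock=lambda: now[0])
db.register("sip:jan@iptel.org", "sip:jan@1.2.3.4:5060", expires=120)
fresh = db.lookup("sip:jan@iptel.org")
now[0] += 121                       # no refresh within the life span
stale = db.lookup("sip:jan@iptel.org")
```

After 121 seconds without a refresh, the lookup returns no contacts and the user is unreachable.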

2.2.2.2.4 Redirect server

The entity that receives a request and sends back a reply containing a list of the current locations
of a particular user is called a redirect server. A redirect server receives requests and looks up the
intended recipient of the request in the location database created by a registrar. It then creates a
list of current locations of the user and sends it to the request originator in a response of the SIP
3xx (redirection) class.

The originator of the request then extracts the list of destinations and sends another request

directly to them. Figure 2.14 shows a typical redirection.
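The behaviour of a redirect server can be sketched as follows. This is a minimal Python outline: a plain dictionary stands in for the registrar's location database, and the 404 reply for unknown users is an assumption about how such a server might answer.

```python
def redirect_response(request_uri, location_db):
    """Return (status line, Contact header values) for a redirect server.

    `location_db` is a plain dict standing in for the registrar's database.
    """
    contacts = location_db.get(request_uri, [])
    if not contacts:
        return "SIP/2.0 404 Not Found", []
    return "SIP/2.0 302 Moved Temporarily", [f"<{c}>" for c in contacts]


location_db = {"sip:bob@b.com": ["sip:bob@1.2.3.4:5060"]}
status, contact_headers = redirect_response("sip:bob@b.com", location_db)
```

The originator would then send a new request to each address listed in the returned Contact values.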


[Figure: User Agent A sends INVITE #1 to the Redirect Server, receives 302 Moved Temporarily, and
then sends INVITE #2 to User Agent B.]

Figure 2.14 SIP Redirection

{ 2.2.2.3 SIP messages

Communication using SIP (often called signalling) comprises a series of messages. Messages
can be transported independently by the network. Usually they are each transported in a separate
UDP datagram. Each message consists of a 'first line', a message header and a message body. The
first line identifies the type of the message. There are two types of messages: requests and
responses. Requests are usually used to initiate some action or inform the recipient of the request
of something. Replies are used to confirm that a request was received and processed and contain
the status of the processing.

A typical SIP request looks like this:

    INVITE sip:7170@iptel.org SIP/2.0
    Via: SIP/2.0/UDP 195.37.77.100:5040;rport
    Max-Forwards: 10
    From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
    To: <sip:jiri@bat.iptel.org>
    Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
    CSeq: 2 INVITE
    Contact: <sip:213.20.128.35:9315>
    User-Agent: Windows RTC/1.0
    Proxy-Authorization: Digest username="jiri", realm="iptel.org",
      algorithm="MD5", uri="sip:jiri@bat.iptel.org",
      nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c",
      response="53fe98db10e1074b03b3e06438bda70f"
    Content-Type: application/sdp
    Content-Length: 451

    v=0
    o=jku2 0 0 IN IP4 213.20.128.35
    s=session
    c=IN IP4 213.20.128.35
    b=CT:1000
    t=0 0
    m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
    a=rtpmap:97 red/8000
    a=rtpmap:111 SIREN/16000
    a=fmtp:111 bitrate=16000
    a=rtpmap:112 G7221/16000
    a=fmtp:112 bitrate=24000
    a=rtpmap:6 DVI4/16000
    a=rtpmap:0 PCMU/8000
    a=rtpmap:4 G723/8000
    a=rtpmap:3 GSM/8000
    a=rtpmap:101 telephone-event/8000
    a=fmtp:101 0-16

The first line tells us that this is an INVITE message, which is used to establish a session. The URI
on the first line, sip:7170@iptel.org, is called the Request-URI and contains the URI of the next
hop of the message. In this case, it will be host iptel.org.

A SIP request can contain one or more Via header fields, which are used to record the path of the
request. They are later used to route SIP responses exactly the same way. The INVITE message
contains just one Via header field, which was created by the user agent that sent the request. From
the Via field we can tell that the user agent is running on host 195.37.77.100 and port 5040.

The From and To header fields identify the initiator (calling party) and the recipient (called party)
of the invitation (just like in SMTP, where they identify the sender and recipient of a message).

The From header field contains a tag parameter which serves as a dialogue identifier and will be

described in Section 2.2.2.5.

The Call-ID header field is a dialogue identifier and its purpose is to identify messages belonging
to the same call. Such messages have the same Call-ID identifier. CSeq is used to maintain the
order of requests. Because requests can be sent over an unreliable transport that can re-order
messages, sequence numbers must be present in the messages so that the recipient can identify
re-transmissions and out-of-order requests.

The Contact header field contains the IP address and port on which the sender is awaiting
further requests sent by the called party. Other header fields are not important and will not be
described here.

The message header is delimited from the message body by an empty line. The message body of
the INVITE request contains a description of the media types accepted by the sender, encoded
in SDP.
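The message layout described above (first line, header, empty line, body) can be illustrated with a small parser. This is a simplified Python sketch, not a complete SIP parser: it assumes CRLF line endings, treats lines that start with whitespace as continuations of the previous header field, and ignores compact header names.

```python
def parse_sip_message(raw):
    """Split a SIP message into (first line, header list, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line, headers = lines[0], []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return first_line, headers, body


def is_request(first_line):
    # Responses start with the protocol version, requests with a method.
    return not first_line.startswith("SIP/")


raw = (
    "INVITE sip:7170@iptel.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 195.37.77.100:5040;rport\r\n"
    "CSeq: 2 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
first_line, headers, body = parse_sip_message(raw)
```

The shortened message used here is taken from the example request above; everything after the first empty line is handed over unchanged as the SDP body.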

2.2.2.3.1. SIP requests

An INVITE request has been described. The request is used to invite a called party to a session.
Other important requests are:


- ACK
This message acknowledges receipt of a final response to INVITE. Establishing a session
uses 3-way hand-shaking due to the asymmetric nature of the invitation. It may take a while
before the called party accepts or declines the call, so the called party's user agent periodically
re-transmits a positive final response until it receives an ACK (which indicates that the calling
party is still there and ready to communicate);

- BYE
BYE messages are used to tear down multimedia sessions. A party wishing to tear down a
session sends a BYE to the other party;

- CANCEL
CANCEL is used to cancel a session that is not yet fully established. It is used when the called
party has not replied with a final response yet but the calling party wants to abort the call
(typically when a called party does not respond for some time);

- REGISTER
The purpose of REGISTER is to let the registrar know the user's current location. Information
about the current IP address and port on which a user can be reached is carried in REGISTER
messages. The registrar extracts this information and puts it into a location database. The
database can later be used by SIP Proxy Servers to route calls to the user. Registrations are
time-limited and need to be periodically refreshed.

The listed requests usually have no message body because it is not needed in most situations (but

can have one). In addition, many other request-types have been defined but their descriptions are

out of the scope of this document.

2.2.2.3.2 SIP responses

When a user agent or proxy server receives a request, it sends a reply. Each request must be replied

to except ACK requests which trigger no replies.

A typical reply looks like this:

    SIP/2.0 200 OK
    Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
    From: sip:sip2@iptel.org
    To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
    Call-ID: 2443936363@192.168.1.30
    CSeq: 63629 REGISTER
    Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
    Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
    Content-Length: 0
    Warning: 392 195.37.77.101:5060 "Noisy feedback tells:
      pid=5110 req_src_ip=66.87.48.68 req_src_port=5060
      in_uri=sip:iptel.org
      out_uri=sip:iptel.org via_cnt==1"

Responses are very similar to the requests, except for the first line. The first line of a response
contains a protocol version (SIP/2.0), a reply code and a reason phrase. The reply code is an
integer number from 100 to 699 and indicates the type of the response. There are 6 classes of
responses:


- 1xx are provisional responses. A provisional response tells its recipient that the associated
request was received but the result of the processing is not known yet. Provisional responses are
sent only when the processing does not finish immediately. The sender must stop re-transmitting
the request upon reception of a provisional response. Typically, proxy servers send responses with
code 100 when they start processing an INVITE, and user agents send responses with code 180
(Ringing), which means that the called party's phone is ringing;

- 2xx responses are positive final responses. A final response is the ultimate response that the
originator of the request will ever receive. Therefore, final responses express the result of the
processing of the associated request. Final responses also terminate transactions. Responses with
a code from 200 to 299 are positive responses. That means that the request was processed
successfully and accepted. For instance, a 200 OK response is sent when a user accepts the
invitation to a session (INVITE request). A UAC may receive several 200 messages to a single
INVITE request. This is because a forking proxy (described later) can fork the request so it will
reach several UAS and each of them will accept the invitation. In this case, each response is
distinguished by the tag parameter in the To header field. Each response represents a distinct
dialogue with an unambiguous dialogue identifier;

- 3xx responses are used to redirect a calling party. A redirection response gives information about
the user's new location or an alternative service that the calling party might use to satisfy the
call. Redirection responses are usually sent by proxy servers. When a proxy receives a request
and does not want to or cannot process it for any reason, it will send a redirection response to the
calling party and put another location into the response which the calling party might want to
try. It can be the location of another proxy or the current location of the called party (from the
location database created by a registrar). The calling party is then supposed to re-send the
request to the new location. 3xx responses are final;

- 4xx are negative final responses. A 4xx response means that the problem is on the sender's side.
The request could not be processed because it contains bad syntax or cannot be fulfilled at that
server;

- 5xx means that the problem is on the server's side. The request is apparently valid but the server
failed to fulfil it. Clients should usually retry the request later;

- 6xx reply code means that the request cannot be fulfilled at any server. This response is usually
sent by a server that has definitive information about a particular user. User agents usually send
a 603 Decline response when the user does not want to participate in the session.
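The classification above maps directly onto the first digit of the reply code, as this small Python sketch shows. The function names are illustrative; the class names in the table are the ones used by RFC 3261.

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line):
    """Split 'SIP/2.0 180 Ringing' into (version, code, reason phrase)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def response_class(code):
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP reply code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    # Every class except 1xx terminates the transaction.
    return code >= 200
```

A machine only needs the numeric code to decide what to do; the reason phrase is kept for display to the user.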

In addition to the response class, the first line also contains the reason phrase. The code number is
intended to be processed by machines. It is not very human-friendly but it is very easy for
machines to parse and understand. The reason phrase usually contains a human-readable message
describing the result of the processing. A user agent should render the reason phrase to the user.

The request to which a particular response belongs is identified using the CSeq header field. In
addition to the sequence number, this header field also contains the method of the corresponding
request. In our example it was a REGISTER request.


{ 2.2.2.4. SIP transactions

Although we said that SIP messages are sent independently over the network, they are usually
arranged into transactions by user agents and certain types of proxy servers. Therefore SIP is said
to be a transactional protocol.

A transaction is a sequence of SIP messages exchanged between SIP network elements. A
transaction consists of one request and all responses to that request. That includes zero or more
provisional responses and one or more final responses (remember that an INVITE might be
answered by more than one final response when a proxy server forks the request).

If a transaction was initiated by an INVITE request, then the same transaction also includes ACK,

but only if the final response was not a 2xx response. If the final response was a 2xx response, then

the ACK is not considered part of the transaction.

As we can see, this is quite asymmetric behaviour: ACK is part of transactions with a negative
final response but is not part of transactions with positive final responses. The reason for this
separation is the importance of delivery of all 200 OK messages. Not only do they establish a
session, but a 200 OK can also be generated by multiple entities when a proxy server forks the
request, and all of them must be delivered to the calling user agent. Therefore, user agents take
responsibility in this case and retransmit 200 OK responses until they receive an ACK. Also note
that only responses to INVITE are retransmitted.

SIP entities that have a notion of transactions are called stateful. Such entities usually create a state
associated with a transaction that is kept in memory for the duration of the transaction. When
a request or response arrives, a stateful entity tries to associate the request (or response) with
existing transactions. To be able to do this, it must extract a unique transaction identifier from the
message and compare it to the identifiers of all existing transactions. If such a transaction exists,
then its state gets updated from the message.

In the previous SIP RFC2543, the transaction identifier was calculated as a hash of all important
message header fields (that included To, From, Request-URI and CSeq). This proved to be very
slow and complex. During interoperability tests, such transaction identifiers were a common
source of problems.

In the new RFC3261, the way of calculating transaction identifiers was completely changed.
Instead of the complicated hashing of important header fields, a SIP message now includes the
identifier directly. The branch parameter of Via header fields directly contains the transaction
identifier. This is a significant simplification, but there still exist old implementations that do not
support the new way of calculating the transaction identifier, so even new implementations
have to support the old way. They must be backwards-compatible.
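A sketch of the new-style matching is given below in Python. It is deliberately simplified: it only builds a key from the branch parameter of the topmost Via header field and the method in CSeq, and returns None for messages without an RFC 3261 branch (which in that RFC always starts with the string z9hG4bK), leaving the old header-based matching out. The branch value used in the example is made up.

```python
import re

MAGIC_COOKIE = "z9hG4bK"  # RFC 3261 prefix marking a new-style branch value


def branch_of(via_value):
    """Extract the branch parameter from a Via header value, if present."""
    match = re.search(r";\s*branch=([^;,\s]+)", via_value)
    return match.group(1) if match else None


def transaction_key(top_via, cseq):
    """Return (branch, method), or None when the old rules would be needed."""
    branch = branch_of(top_via)
    if branch is None or not branch.startswith(MAGIC_COOKIE):
        return None  # RFC 2543 style message: fall back to header matching
    method = cseq.split()[1]
    return branch, method


request_key = transaction_key(
    "SIP/2.0/UDP 195.37.77.100:5040;branch=z9hG4bK74bf9", "2 INVITE")
response_key = transaction_key(
    "SIP/2.0/UDP 195.37.77.100:5040;branch=z9hG4bK74bf9;received=1.2.3.4",
    "2 INVITE")
old_style = transaction_key("SIP/2.0/UDP 195.37.77.100:5040;rport", "2 INVITE")
```

Because the request and its response yield the same key, a stateful entity can find the transaction with a single dictionary lookup instead of hashing several header fields.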

Figure 2.15 shows what messages belong to what transactions during a conversation of two user

agents.


[Figure: message sequence between the calling party and the called party. Transaction #1: INVITE,
100 Trying, 180 Ringing, 200 OK. Then ACK. Transaction #2: BYE, 200 OK.]

Figure 2.15 SIP transactions

{ 2.2.2.5 SIP Dialogues

It has been shown what transactions are, that one transaction includes INVITE and its responses
and another transaction includes BYE and its responses when a session is being torn down. Those
two transactions should be somehow related: both of them belong to the same dialogue. A
dialogue represents a peer-to-peer SIP relationship between two user agents. A dialogue persists
for some time and it is a very important concept for user agents. Dialogues facilitate the proper
sequencing and routing of messages between SIP endpoints.

Dialogues are identified using Call-ID, From tag, and To tag. Messages that belong to the same
dialogue must have these fields equal. We have shown that the CSeq header field is used to order
messages. In fact, it is used to order messages within a dialogue. The number must be
monotonically increased for each new request sent within a dialogue. Otherwise the peer will
handle it as an out-of-order request or retransmission. In fact, the CSeq number identifies a
transaction within a dialogue, because we have said that requests and associated responses are
called transactions. This means that only one transaction in each direction can be active within a
dialogue. One could also say that a dialogue is a sequence of transactions. Figure 2.16 extends
Figure 2.15 to show which messages belong to the same dialogue.
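The CSeq check performed by the receiving side of a dialogue can be sketched as follows. This is a minimal Python illustration with an invented class name; it tracks one direction only and leaves out special cases such as ACK and CANCEL, which reuse the number of the INVITE they relate to.

```python
class DialogueCSeq:
    """Tracks the highest CSeq number received from the peer in a dialogue."""

    def __init__(self):
        self.remote_cseq = None

    def accept(self, cseq_number):
        """Return True for a new in-order request, False otherwise."""
        if self.remote_cseq is not None and cseq_number <= self.remote_cseq:
            return False  # re-transmission or out-of-order request
        self.remote_cseq = cseq_number
        return True


tracker = DialogueCSeq()
results = [tracker.accept(n) for n in (1, 2, 2, 1, 3)]
```

Only the requests numbered 1, 2 and 3 are accepted as new; the repeated 2 and the late 1 are recognised as a re-transmission and an out-of-order request.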


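[Figure: the same message sequence as in Figure 2.15 between the calling party and the called party.
Transaction #1: INVITE, 100 Trying, 180 Ringing, 200 OK. Then ACK. Transaction #2: BYE, 200 OK.
A bracket labelled Dialog spans the messages.]

Figure 2.16 SIP dialogue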

Some messages establish a dialogue and some do not. This makes it possible to express the
relationship of messages explicitly and also to send messages that are not related to other messages
outside a dialogue. That is easier to implement because user agents do not have to maintain the
dialogue state.

For instance, an INVITE message establishes a dialogue, because it will later be followed by a
BYE request, which will tear down the session established by the INVITE. This BYE is sent
within the dialogue established by the INVITE.

But if a user agent sends a MESSAGE request, such a request does not establish any dialogue. Any
subsequent messages (even MESSAGE) will be sent independently of the previous one.

2.2.2.5.1. Dialogues facilitate routing

Dialogues are also used to route the messages between user agents, as this section briefly describes.

Suppose that user sip:bob@a.com wants to talk to user sip:pete@b.com. He knows the SIP
address of the called party (sip:pete@b.com) but this address does not say anything about the
current location of the user, i.e., the calling party does not know to which host to send the
request. Therefore, the INVITE request will be sent to a proxy server.

The request will be sent from proxy to proxy until it reaches one that knows the current location
of the called party. This process is called routing. Once the request reaches the called party,
the called party's user agent will create a response that will be sent back to the calling party.
The called party's user agent will also put a Contact header field into the response, which will
contain the current location of the user. The original request also contained a Contact header
field, which means that both user agents know the current location of the peer.


Because the user agents know the location of each other, it is not necessary to send further
requests to any proxy. They can be sent directly from user agent to user agent. That is exactly how
dialogues facilitate routing.

Further messages within a dialogue are sent directly from user agent to user agent. This is a
significant performance improvement because proxies do not see all the messages within a
dialogue. They are used to route just the first request that establishes the dialogue. The direct
messages are also delivered with much smaller latency because a typical proxy usually implements
complex routing logic. Figure 2.17 contains an example of a message within a dialogue (BYE)
that bypasses the proxies.

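[Figure: calling party, Proxy 1, Proxy 2 and called party. The INVITE travels through the proxies;
the BYE goes directly between the calling party and the called party.]

Figure 2.17 SIP trapezoid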

2.2.2.5.2 Dialogue identifiers

Dialogue identifiers consist of three parts, Call-ID, From tag and To tag, but it is not that clear
why dialogue identifiers are created exactly this way and who contributes which part.

Call-ID is the call identifier. It must be a unique string that identifies a call. A call consists of
one or more dialogues. Multiple user agents may respond to a request when a proxy along the
path forks the request. Each user agent that sends a 2xx response establishes a separate dialogue
with the calling party. All such dialogues are part of the same call and have the same Call-ID.

A From tag is generated by the calling party and it uniquely identifies the dialogue in the calling
party's user agent.

A To tag is generated by the called party and, just like the From tag, it uniquely identifies the
dialogue in the called party's user agent.

This hierarchical dialogue identifier is necessary because a single call-invitation can create several

dialogues and the calling party must be able to distinguish them.
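The effect of forking on dialogue identifiers can be shown with a short Python sketch. The tag values are made up for the illustration, and the tuple layout is only one possible way for a user agent to represent the identifier.

```python
def dialogue_id(call_id, local_tag, remote_tag):
    """A dialogue is identified by Call-ID plus the tags of both parties."""
    return (call_id, local_tag, remote_tag)


# One forked INVITE, answered with 200 OK by two different phones.
call_id = "d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35"
from_tag = "76ff7a07"                      # chosen by the calling party
to_tags = ["a1b2c3", "d4e5f6"]             # hypothetical tags, one per callee

dialogues = {dialogue_id(call_id, from_tag, tag) for tag in to_tags}
same_call = {d[0] for d in dialogues}
```

The calling party ends up with two distinct dialogues that share a single Call-ID, which is exactly why the To tag is needed to tell them apart.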

{ 2.2.2.6 Typical SIP scenarios

This section gives a brief overview of typical SIP scenarios that usually make up the SIP traffic.


2.2.2.6.1 Registration

Users must register themselves with a registrar to be reachable by other users. A registration
comprises a REGISTER message followed by a 200 OK sent by the registrar if the registration
was successful. Registrations are usually authorised, so a 407 reply can appear if the user did
not provide valid credentials. Figure 2.18 shows an example of a registration.

[Figure: message sequence between the User Agent and the Registrar: REGISTER (w/o credentials),
407, REGISTER (w/ credentials), 200 OK.]

Figure 2.18 REGISTER message flow
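The challenge and retry sequence can be sketched in Python as follows. The sketch assumes HTTP Digest authentication with MD5 and without the optional qop parameter, as suggested by the Digest credentials in the example request earlier in this chapter. The account data, the nonce and the helper names are hypothetical.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response without the optional qop parameter (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def registrar_reply(credentials, password_db, realm, nonce):
    """Return the reply code for a REGISTER; `credentials` may be None."""
    if credentials is None:
        return 407  # challenge: the client must retry with credentials
    username, uri, response = credentials
    expected = digest_response(
        username, realm, password_db.get(username, ""), "REGISTER", uri, nonce)
    return 200 if response == expected else 407


# Hypothetical account data and nonce, for illustration only.
passwords = {"jan": "secret"}
realm, nonce, uri = "iptel.org", "3cef7539", "sip:iptel.org"

first = registrar_reply(None, passwords, realm, nonce)
answer = digest_response("jan", realm, "secret", "REGISTER", uri, nonce)
second = registrar_reply(("jan", uri, answer), passwords, realm, nonce)
```

The first REGISTER is rejected with 407 and the second, which carries a response computed from the password and the nonce, is accepted with 200. The password itself never travels over the network.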

2.2.2.6.2 Session invitation

A session invitation consists of one INVITE request, which is usually sent to a proxy. The proxy
immediately sends a 100 Trying reply to stop re-transmissions and forwards the request further.
All provisional responses generated by the called party are sent back to the calling party. See the
180 Ringing response in the call flow (Figure 2.19). The response is generated when the called
party's phone starts ringing.

A 200 OK is generated once the called party picks up the phone and it is re-transmitted by the
called party's user agent until it receives an ACK from the calling party. The session is established
at this point.

2.2.2.6.3 Session termination

Session termination is accomplished by sending a BYE request within the dialogue established by

INVITE. BYE messages are sent directly from one user agent to the other, unless a proxy on the

path of the INVITE request has indicated that it wishes to stay on the path by using record

routing (see Section 2.2.2.6.4).

A party wishing to tear down a session sends a BYE request to the other party involved in the
session. The other party sends a 200 OK response to confirm the BYE and the session is
terminated. See Figure 2.20, left message flow.


[Figure: message sequence between the calling party, the SIP Proxy and the called party: INVITE,
100 Trying, INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, followed by RTP streams.]

Figure 2.19 INVITE message flow

2.2.2.6.4 Record routing

All requests sent within a dialogue are, by default, sent directly from one user agent to the other.
Only requests outside a dialogue traverse SIP proxies. This approach makes a SIP network more
scalable because only a small number of SIP messages hit the proxies.

There are certain situations in which a SIP Proxy needs to stay on the path of all further

messages. For instance, proxies controlling a NAT box, or proxies doing accounting need to stay

on the path of BYE requests.

The mechanism by which a proxy can inform user agents that it wishes to stay on the path of all
further messages is called record routing. Such a proxy would insert a Record-Route header
field, which contains the address of the proxy, into SIP messages. Messages sent within a dialogue
will then traverse all SIP proxies that put a Record-Route header field into the message.

The recipient of the request receives a set of Record-Route header fields in the message. It must

mirror all the Record-Route header fields into responses because the originator of the request

also needs to know the set of proxies.
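How the two user agents derive their routes from the Record-Route header fields can be sketched in Python as below. The proxy names are hypothetical. The ordering follows RFC 3261: each proxy adds its entry on top of the existing ones, the called party uses the list as received, and the calling party uses the mirrored list in reverse order.

```python
def uas_response_headers(request_record_routes):
    """The called party copies Record-Route values into its response as-is."""
    return list(request_record_routes)


def uac_route_set(response_record_routes):
    """The calling party uses the mirrored list in reverse order."""
    return list(reversed(response_record_routes))


def uas_route_set(request_record_routes):
    """The called party uses the list in the order it was received."""
    return list(request_record_routes)


# Each proxy adds its Record-Route on top, so the last proxy comes first.
# The proxy names are hypothetical; ';lr' marks a loose-routing proxy.
received = ["<sip:proxy2.b.com;lr>", "<sip:proxy1.a.com;lr>"]

mirrored = uas_response_headers(received)
caller_routes = uac_route_set(mirrored)   # path for requests from the caller
callee_routes = uas_route_set(received)   # path for requests from the callee
```

A BYE sent by the caller therefore visits proxy1 and then proxy2, while a BYE sent by the callee visits them in the opposite order.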


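[Figure: two message flows between UA1, the SIP Proxy and UA2. Without record routing: BYE and
200 OK are exchanged directly between the user agents. With record routing: BYE and 200 OK pass
through the SIP Proxy.]

Figure 2.20 BYE message flow (with and without record routing)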

The left-hand message flow of Figure 2.20 shows how a BYE (a request within the dialogue
established by INVITE) is sent directly to the other user agent when there is no Record-Route
header field in the message. The right-hand message flow shows how the situation changes when
the proxy puts a Record-Route header field into the message.

2.2.2.6.5 Event subscription and notification

The SIP specification has been extended to support a general mechanism allowing subscription to
events. Such events can include changes to SIP Proxy statistics, presence information, session
changes and so on.

The mechanism is used mainly to convey information on presence (the willingness to

communicate) of users. Figure 2.21 shows the basic message flow.

[Figure: message sequence between the User Agent and the Server: SUBSCRIBE, 200 OK, NOTIFY,
200 OK; then, after an event occurs, NOTIFY, 200 OK.]

Figure 2.21 Event subscription and notification


A user agent interested in an event notification sends a SUBSCRIBE message to a SIP server. The
SUBSCRIBE message establishes a dialogue and is immediately replied to by the server using a
200 OK response. At this point, the dialogue is established. The server sends a NOTIFY request to
the user every time the event to which the user subscribed changes. NOTIFY messages are sent
within the dialogue established by the SUBSCRIBE.

Note that the first NOTIFY message in Figure 2.21 is sent regardless of any event that triggers

notifications.

Subscriptions, as well as registrations, have a limited life span and therefore must be periodically

refreshed.
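The server side of this mechanism can be sketched in Python as follows. The class is purely illustrative: it records which NOTIFY messages would be sent instead of sending anything, and time is passed in explicitly so that expiry can be shown.

```python
class EventServer:
    """Keeps subscribers for one event and records the NOTIFYs it sends."""

    def __init__(self, state):
        self.state = state
        self.subscribers = {}   # subscriber -> absolute expiry time
        self.sent = []          # (subscriber, state) for each NOTIFY

    def subscribe(self, subscriber, now, expires):
        self.subscribers[subscriber] = now + expires
        # The first NOTIFY carries the current state, without any change.
        self.sent.append((subscriber, self.state))

    def change(self, new_state, now):
        self.state = new_state
        for subscriber, expiry in self.subscribers.items():
            if expiry > now:  # expired subscriptions get no notification
                self.sent.append((subscriber, new_state))


server = EventServer(state="offline")
server.subscribe("sip:joe@a.com", now=0, expires=3600)
server.change("online", now=10)
server.change("offline", now=4000)   # subscription was never refreshed
```

The subscriber receives the initial NOTIFY and the one for the first change, but not the last one, because the subscription had expired by then.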

2.2.2.6.6 Instant messages

Instant messages are sent using a MESSAGE request. MESSAGE requests do not establish a
dialogue and therefore they will always traverse the same set of proxies. This is the simplest form
of sending instant messages. The text of the instant message is transported in the body of the SIP
request.

[Figure: message sequence between a User Agent, the Proxy and a second User Agent: MESSAGE,
200 OK, MESSAGE, 200 OK.]

Figure 2.22 Instant Messages

{ 2.2.3. Media Gateway Control Protocols

In a traditional telephone network, the infrastructure consists of large telephone switches which
interconnect with each other to create the backbone network and which also connect to
customer equipment (PBXs, telephones). While the internal network today is based upon digital
communication, links to customers may be either analogue (PSTN) or digital (ISDN). The links
to customers are shared between call signalling (for dialling, invocation of supplementary services,
etc.) and carriage of voice/data. In the backbone, dedicated (virtual) links interconnecting
switches are reserved for call signalling (de-facto creation of a dedicated network of its own)
whereas voice/data traffic is carried on separate links. The Signalling System No. 7 (SS7) or


variants of it are used as the call signalling protocol between switches; this protocol is used to
route voice/data channels across the backbone network by instructing each switch on the way
which incoming 'line' is to be forwarded to which outgoing 'line' and which other processing
(such as simple voice compression, in-band signalling detection to customer premises equipment,
etc.) is to be applied. Voice/data channels themselves are plain bit pipes, identified roughly by a
trunk and line identifier at each switch.

Figure 2.23 Application scenario for Media Gateway Control Protocols

A similar construction is now considered by a number of telecom companies for IP-based
backbone networks that may successively replace parts of their overall switched-network
infrastructure, as depicted in Figure 2.23. Instead of voice switches, IP routers are used to build up a
backbone network which employs IP routing, possibly MPLS, and, most likely, some explicit form
of QoS support to carry voice and data packets from any point in the network to any other. In
contrast to voice switches, this does not require explicit configuration of the individual routers
per voice connection. Instead, only the entry and exit points need to be configured with each
other's addresses, so that they know where to send their voice/data packets. Two types of gateways
are used at the edges of the IP network to connect to the conventional telephone network:
signalling gateways to convert SS7 signalling into IP-based call control (which may make use of
H.323 or SIP or simply provide a transport to carry SS7 signalling in IP packets [SIGTRAN])
and media gateways that perform voice transcoding. Some central entity (or more probably, a
number of co-operating entities) forms the intelligent core of the backbone, the Media Gateway
Controller(s). They interpret call signalling and decide how to route calls and they provide
supplementary services, etc. Having decided on how a call is to be established, they inform the
(largely passive and 'dumb') media gateways at the edges (ingress and egress gateways) how and
where to transmit the voice packets. The Media Gateway Controllers also re-configure the
gateways in case of any changes in the call, invocation of supplementary services, etc. The media
gateways may be capable of detecting invocation of control features in the media channel (e.g.,
through DTMF tones) and notify the Media Gateway Controller(s), which then initiate the
appropriate actions.

A number of protocols have been defined for communication between Media Gateway

Controllers and media gateways. Initial versions were developed by multiple camps, some of

which merged to create the Media Gateway Control Protocol (MGCP), the only one of the

proprietary protocols that is documented as an Informational RFC (RFC 2705). An effort was

launched to make the two remaining camps cooperate and develop a single protocol to be

standardised, which resulted in work groups in the ITU-T (rooted in Study Group 16, Q.14) and


in the IETF (Media Gateway Control, MEGACO WG). The protocol being jointly developed is
referred to as H.248 in the ITU-T and as MEGACO in the IETF.

One particular protocol extension currently discussed in the IETF is the definition of a protocol

for communication with an IP telephone at the customer premises that fits seamlessly with the

Media Gateway Control architecture. Such a telephone would be a rather simple entity, essentially

capable of transmitting and receiving events and reacting to them, while the call services are

provided directly by the network infrastructure.

{ 2.2.4 Proprietary signalling protocols

Today nearly every vendor that offers VoIP products uses its own VoIP protocol, e.g., Cisco's

Skinny or Siemens's CorNet. These protocols were invented by the vendors to provide more

specific supplementary services in the Voice over IP world, in order to offer customers all the

features they already know from their classic PBX. Enterprise solutions usually feature such

proprietary protocols at the core and provide only minimal support for standardised protocols

(until now usually H.323), limited to basic call functionality.

Giving detailed information about those protocols is out of the scope of this document and is

usually difficult to provide because most protocols are not publicly available.

{ 2.2.5 Real-time Transport Protocol (RTP) and RTP Control Protocol (RTCP)

RTP and RTCP are the transport protocols used for IP Telephony media streams. Both of them

were defined in RFC 1889 (since superseded by RFC 3550): the former as a protocol to carry data

that has real-time properties, the latter to monitor the quality of service and to convey

information about the participants in an ongoing session. The services provided by RTP are:

- identification of the carried information (audio and video codecs);

- detection of out-of-order delivery, so that the receiver can re-order out-of-sequence packets if necessary;

- transport of the coder/decoder synchronisation information;

- monitoring of the information delivery.

RTP uses the underlying User Datagram Protocol (UDP) to multiplex several streams between

two entities (by means of port numbers) and to check data integrity (checksum). An important

point to stress is that RTP neither provides any means of guaranteeing QoS nor assumes that the

underlying network delivers packets in order.
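
Because the network may deliver packets out of order, a receiver typically restores the order itself from the 16-bit sequence number. The following is a minimal sketch of that idea, not a full jitter buffer; the function names `seq_delta` and `reorder` are illustrative assumptions, and the sketch assumes the receiver already knows the first expected sequence number.

```python
from typing import Dict, List, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a modulo 2**16 (positive if a is newer)."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def reorder(packets: List[Tuple[int, bytes]],
            first_seq: int) -> Tuple[List[bytes], List[int]]:
    """Return payloads in sequence order and the sequence numbers never seen.

    `packets` holds (sequence number, payload) pairs in arrival order.
    Packets older than `first_seq` are ignored in this simplified sketch.
    """
    by_seq: Dict[int, bytes] = {seq: payload for seq, payload in packets}
    if not by_seq:
        return [], []
    # Distance to the newest packet, taking wrap-around into account.
    span = max(seq_delta(seq, first_seq) for seq in by_seq) + 1
    ordered: List[bytes] = []
    missing: List[int] = []
    for offset in range(span):
        seq = (first_seq + offset) % SEQ_MOD
        if seq in by_seq:
            ordered.append(by_seq[seq])
        else:
            missing.append(seq)
    return ordered, missing
```

For example, `reorder([(65535, b"b"), (0, b"c"), (65534, b"a"), (2, b"e")], 65534)` yields the payloads in the order a, b, c, e and reports sequence number 1 as missing, even though the counter wrapped from 65535 to 0.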

RTCP uses the same underlying protocols as RTP to periodically send control packets to all

session participants. Every RTP channel using port number N has its own RTCP channel

with port number N+1. The services provided by RTCP are:

- providing feedback on the quality of the data distribution, which can be used to control the

active codecs;

- transporting a constant identifier for the RTP source (CNAME), used, for example, to associate

the audio and video streams of one participant;

- advertising the number of session participants, which is used to adjust the rate at which

RTCP packets are sent;

- carrying session control information used to identify the session participants.
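
The port pairing and the use of the participant count can be illustrated with a short sketch. It is deliberately simplified: RFC 3550 recommends limiting RTCP traffic to about 5% of the session bandwidth and using a minimum reporting interval of 5 seconds, but the full algorithm also distinguishes senders from receivers and randomises the interval, which is omitted here. The function names and parameters are assumptions made for illustration.

```python
def rtcp_port(rtp_port: int) -> int:
    """Return the RTCP port paired with an RTP port N, i.e. N+1."""
    if not 0 < rtp_port < 65535:
        raise ValueError("RTP port must leave room for the RTCP port N+1")
    return rtp_port + 1


def rtcp_interval(members: int,
                  session_bw_bps: float,
                  avg_rtcp_size_bytes: float,
                  rtcp_fraction: float = 0.05,
                  min_interval_s: float = 5.0) -> float:
    """Simplified RTCP reporting interval in seconds.

    The share of bandwidth reserved for RTCP is divided among all
    participants, so the interval grows with the number of members.
    Sender/receiver split and randomisation are intentionally left out.
    """
    if members < 1:
        raise ValueError("a session has at least one member")
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8.0
    interval = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
    return max(interval, min_interval_s)
```

With a 64 kbit/s session and 100-byte RTCP packets, two participants stay at the 5-second minimum, whereas 1000 participants would each report only about every 250 seconds, which keeps the total control traffic roughly constant.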

The next two subsections describe the RTP and RTCP header and the different types of packets

that the two protocols use.

{ 2.2.5.1 RTP header

Figure 2.24 shows the RTP header. The first twelve bytes are present in every RTP packet.

The last bytes, containing the list of CSRC (Contributing SouRCe) identifiers, are present only

when the packet has passed through a mixer (a mixer is a system which receives two or more

RTP flows, combines them and forwards the resulting flow).

Figure 2.24 RTP header

The header fields are as follows:

- version (V - 2 bits) contains the RTP protocol version (currently 2);

- padding (P - 1 bit): if set to 1, the packet contains one or more additional padding bytes

after the data field;

- extension (X - 1 bit): if set to 1, the fixed header is followed by a header extension;

- CSRC count (CC - 4 bits) contains the number of CSRC identifiers that follow the fixed header;

- marker (M - 1 bit) is available to the application; its interpretation depends on the profile in use;

- payload type (PT - 7 bits) identifies the format of the data field of the RTP packet and

determines its interpretation by the application;

- sequence number (16 bits) is incremented by one for each RTP packet sent and is used by the

receiver to detect losses and to restore the original packet order;

- RTP timestamp (32 bits) is the sampling instant of the first byte of the RTP data field, used for

synchronisation and jitter calculation;

- SSRC ID (32 bits) identifies the synchronisation source; it is chosen randomly within an RTP

session;

- CSRC ID list (from 0 to 15 entries of 32 bits each) is an optional field identifying the sources

which contribute to the data in the packet. The number of CSRC IDs is given in the CSRC

count field.
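
The field layout listed above can be decoded with a few lines of code. The following sketch unpacks the twelve-byte fixed header and the optional CSRC list using Python's standard `struct` module; the names `RtpHeader` and `parse_rtp_header` are illustrative assumptions, and header extensions and padding are detected but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list of one packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                     # CSRC count (4 bits)
    end = 12 + 4 * cc                  # each CSRC identifier is 32 bits
    if len(packet) < end:
        raise ValueError("packet too short for the announced CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,               # V (2 bits)
        padding=bool(b0 & 0x20),       # P (1 bit)
        extension=bool(b0 & 0x10),     # X (1 bit)
        csrc_count=cc,
        marker=bool(b1 & 0x80),        # M (1 bit)
        payload_type=b1 & 0x7F,        # PT (7 bits)
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

A packet that has not crossed a mixer has a CSRC count of 0, so `csrc` is an empty tuple and the data field starts directly at byte 12 (unless the extension bit is set).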

{ 2.2.5.2 RTCP packet-types and format

In order to transport the session control information, RTCP defines a number of packet types:

- SR, Sender Report, to carry the statistics of the active transmitters, informing the

other participants of what they should have received (number of bytes, number of

packets, etc.);

- RR, Receiver Report, to carry the statistics of the session participants which are not active

transmitters;

- SDES, Source DEScription, to carry source description items (including the CNAME

identifier);

- BYE, to notify the intention of leaving the session;

- APP, to carry application-specific functions, intended for experimental use by new applications.

Every RTCP packet begins with a fixed part similar to that of RTP packets, which is

followed by structural elements of variable length. Several RTCP packets may be

concatenated to build a compound packet. Moreover, in order to maximise the

resolution of the statistics, the SR and RR packet types are to be sent more often than the other

packet types.
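
How a compound packet is taken apart can be sketched as follows. The sketch relies on the common RTCP header defined in RFC 3550 (version, padding bit, a 5-bit count field, an 8-bit packet type, and a 16-bit length expressed in 32-bit words minus one) and on the packet type values 200 to 204 assigned there. The function name `split_compound` is an illustrative assumption, and the packet bodies are returned undecoded.

```python
import struct
from typing import List, Tuple

# Packet type values assigned in RFC 3550.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound(data: bytes) -> List[Tuple[str, int, bytes]]:
    """Split a compound RTCP packet into (type name, count, body) tuples."""
    packets: List[Tuple[str, int, bytes]] = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        b0, ptype, length_words = struct.unpack(
            "!BBH", data[offset:offset + 4])
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        # The length field counts 32-bit words minus one, header included.
        size = 4 * (length_words + 1)
        if offset + size > len(data):
            raise ValueError("RTCP packet exceeds the datagram")
        name = RTCP_TYPES.get(ptype, "unknown(%d)" % ptype)
        count = b0 & 0x1F              # e.g. number of report blocks
        packets.append((name, count, data[offset + 4:offset + size]))
        offset += size
    return packets
```

For instance, an empty Receiver Report followed by a BYE, each carrying a single SSRC, would be returned as two entries named "RR" and "BYE".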

Table of Contents: