SIP is only involved in the signaling portion of a media communication session and is primarily used to set up and terminate voice or video calls. SIP can be used to establish two-party (unicast) or multiparty (multicast) sessions. It also allows modification of existing calls. The modification can involve changing addresses or ports, inviting more participants, and adding or deleting media streams. SIP has also found use in messaging applications, such as instant messaging, and in event subscription and notification.

SIP works in concert with several other protocols that specify the media format and coding and that carry the media once the call is set up. For call setup, the body of a SIP message contains a Session Description Protocol (SDP) data unit, which specifies the media format, codec and media communication protocol. Voice and video media streams are typically carried between the terminals using the Real-time Transport Protocol (RTP) or the Secure Real-time Transport Protocol (SRTP).[7][8]
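To make the role of the SDP body more concrete, the following minimal Python sketch parses a small SDP offer and lists the media streams it describes. The sample addresses, ports and codecs, and the helper name `parse_sdp_media`, are illustrative assumptions rather than part of any particular implementation, and only the `m=` and `a=rtpmap:` lines are interpreted.

```python
# Hypothetical SDP body, as it might appear in an INVITE (all values are illustrative).
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "m=video 51372 RTP/SAVP 96",
    "a=rtpmap:96 H264/90000",
]) + "\r\n"


def parse_sdp_media(sdp: str) -> list:
    """Return one dict per m= line: media type, port, transport and codecs."""
    streams = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <transport> <format list>
            media, port, transport, *formats = line[2:].split()
            streams.append({
                "media": media,
                "port": int(port),
                "transport": transport,  # RTP/AVP is plain RTP, RTP/SAVP is SRTP
                "formats": formats,
                "codecs": {},
            })
        elif line.startswith("a=rtpmap:") and streams:
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            streams[-1]["codecs"][payload_type] = encoding
    return streams


for stream in parse_sdp_media(SAMPLE_SDP):
    print(stream["media"], stream["port"], stream["transport"], stream["codecs"])
```

In this sketch the transport token on each `m=` line is what distinguishes an RTP stream from an SRTP stream.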

SIP employs design elements similar to the HTTP request/response transaction model.[12] Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP, providing a readable text-based format.

SIP-enabled telephony networks often implement call processing features of Signaling System 7 (SS7), although the two protocols themselves are very different. SS7 is a centralized protocol, characterized by a complex central network architecture and dumb endpoints (traditional telephone handsets). SIP is a client-server protocol of equipotent peers. SIP features are implemented in the communicating endpoints, while many traditional SS7 architectures use the suite only between switching centers.

The network elements that use the Session Initiation Protocol for communication are called SIP user agents. Each user agent (UA) performs the function of a user agent client (UAC) when it is requesting a service function, and that of a user agent server (UAS) when responding to a request. Thus, any two SIP endpoints may in principle operate without any intervening SIP infrastructure. However, for network operational reasons, for provisioning public services to users, and for directory services, SIP defines several specific types of network server elements. Each of these service elements also communicates within the client-server model implemented in user agent clients and servers.

A user agent is a logical network endpoint used to create or receive SIP messages. The user agent manages SIP sessions. As a client (UAC), it sends SIP requests, and as a server (UAS), it receives requests and returns SIP responses. Unlike protocols that fix the roles of client and server, such as HTTP, in which a web browser acts only as a client and never as a server, SIP requires both peers to implement both roles. The roles of UAC and UAS only last for the duration of a SIP transaction.[5]

A SIP phone is an IP phone that implements client and server functions of a SIP user agent and provides the traditional call functions of a telephone, such as dial, answer, reject, call hold, and call transfer.[15][16] SIP phones may be implemented as a hardware device or as a softphone. As vendors increasingly implement SIP as a standard telephony platform, the distinction between hardware-based and software-based SIP phones is blurred and SIP elements are implemented in the basic firmware functions of many IP-capable devices.

In SIP, as in HTTP, the user agent may identify itself using a message header field (User-Agent), containing a text description of the software, hardware, or the product name. The User-Agent field is sent in request messages, which means that the receiving SIP server can evaluate this information to perform device-specific configuration or feature activation. Operators of SIP network elements sometimes store this information in customer account portals,[17] where it can be useful for diagnosing SIP compatibility problems or displaying service status.
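As an illustration of how a receiving server might read this field, the sketch below extracts a header value from the text of a SIP request. The function name `get_header`, the sample REGISTER message and the product string `ExamplePhone/1.2.3` are hypothetical, and the sketch ignores header line folding and compact header forms.

```python
from typing import Optional

# Hypothetical REGISTER request; the addresses and product name are illustrative.
SAMPLE_REQUEST = (
    "REGISTER sip:example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK74bf9\r\n"
    "From: <sip:alice@example.com>;tag=9fxced76sl\r\n"
    "To: <sip:alice@example.com>\r\n"
    "Call-ID: 843817637684230@192.0.2.10\r\n"
    "CSeq: 1 REGISTER\r\n"
    "User-Agent: ExamplePhone/1.2.3\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def get_header(message: str, name: str) -> Optional[str]:
    """Return the first value of header `name` (case-insensitive), or None."""
    head = message.split("\r\n\r\n", 1)[0]      # drop the message body
    for line in head.split("\r\n")[1:]:         # skip the start line
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == name.lower():
            return value.strip()
    return None


print(get_header(SAMPLE_REQUEST, "User-Agent"))
```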

A proxy server is a network server with UAC and UAS components that functions as an intermediary entity for the purpose of performing requests on behalf of other network elements. A proxy server's primary role is routing: its job is to ensure that a request is sent to another entity closer to the targeted user. Proxies are also useful for enforcing policy, such as determining whether a user is allowed to make a call. A proxy interprets and, if necessary, rewrites specific parts of a request message before forwarding it.

A registrar is a SIP endpoint that provides a location service. It accepts REGISTER requests, recording the address and other parameters from the user agent. For subsequent requests it provides an essential means to locate possible communication peers on the network. The location service links one or more IP addresses to the SIP URI of the registering agent. Multiple user agents may register for the same URI, with the result that all registered user agents receive the calls to the URI.
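The binding between a SIP URI and its registered contacts can be pictured as a simple lookup table. The following sketch is a deliberately minimal model, assuming an in-memory store and a hypothetical `LocationService` class; a real registrar would also handle authentication and the expiry of registrations.

```python
from collections import defaultdict


class LocationService:
    """Toy location service: maps a SIP URI to its registered contact addresses."""

    def __init__(self):
        self._bindings = defaultdict(set)

    def register(self, uri: str, contact: str) -> None:
        # Record a contact learned from a REGISTER request.
        self._bindings[uri].add(contact)

    def unregister(self, uri: str, contact: str) -> None:
        self._bindings[uri].discard(contact)

    def lookup(self, uri: str) -> list:
        # Every registered contact is returned, so a call can reach all of them.
        return sorted(self._bindings.get(uri, ()))


service = LocationService()
service.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060")
service.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060")
print(service.lookup("sip:alice@example.com"))
```

Because two contacts are bound to the same URI here, a lookup returns both, which models the case in which all registered user agents receive calls to that URI.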

SIP registrars are logical elements, and are often co-located with SIP proxies. To improve network scalability, location services may instead be located with a redirect server.

A redirect server is a user agent server that generates 3xx (redirection) responses to requests it receives, directing the client to contact an alternate set of URIs. A redirect server allows proxy servers to direct SIP session invitations to external domains.
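A redirect response can be sketched as follows. This is a simplified illustration, assuming a hypothetical helper `build_redirect_response` and sample addresses; it copies the headers that tie the response to the request and lists the alternate URIs in Contact headers, but omits details such as adding a tag to the To header.

```python
# Hypothetical INVITE request; addresses and identifiers are illustrative.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

# Headers that a response echoes so the client can match it to its request.
COPIED_HEADERS = ("via", "from", "to", "call-id", "cseq")


def build_redirect_response(request: str, alternates: list) -> str:
    """Build a 302 response that points the client at the alternate URIs."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    head = request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:          # skip the request line
        if line.partition(":")[0].strip().lower() in COPIED_HEADERS:
            lines.append(line)
    lines.extend(f"Contact: <{uri}>" for uri in alternates)
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_redirect_response(SAMPLE_INVITE, ["sip:bob@other.example.net"]))
```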

SIP is a text-based protocol with syntax similar to that of HTTP. There are two different types of SIP messages: requests and responses. The first line of a request has a method, defining the nature of the request, and a Request-URI, indicating where the request should be sent.[18] The first line of a response has a response code.
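The difference between the two message types is visible in the first line alone. The sketch below, which assumes a hypothetical helper named `parse_start_line` and illustrative sample messages, classifies a message as a request or a response and extracts the fields described above.

```python
def parse_start_line(message: str) -> dict:
    """Classify a SIP message as a request or a response from its first line."""
    first_line = message.split("\r\n", 1)[0]
    if first_line.startswith("SIP/"):
        # Response: SIP-Version, response code, reason phrase
        version, code, reason = first_line.split(" ", 2)
        return {"kind": "response", "version": version,
                "code": int(code), "reason": reason}
    # Request: method, Request-URI, SIP-Version
    method, request_uri, version = first_line.split(" ", 2)
    return {"kind": "request", "method": method,
            "request_uri": request_uri, "version": version}


print(parse_start_line("INVITE sip:bob@example.com SIP/2.0\r\nMax-Forwards: 70\r\n\r\n"))
print(parse_start_line("SIP/2.0 180 Ringing\r\nContent-Length: 0\r\n\r\n"))
```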

Example: User1’s UAC uses an Invite Client Transaction to send the initial INVITE (1) message. If no response is received after a timer-controlled wait period, the UAC may choose to terminate the transaction or retransmit the INVITE. Once a response is received, User1 is confident the INVITE was delivered reliably. User1’s UAC must then acknowledge the response. On delivery of the ACK (2), both sides of the transaction are complete. In this case, a dialog may have been established.[20]

SIP defines a transaction mechanism to control the exchanges between participants and deliver messages reliably. A transaction is a state of a session, which is controlled by various timers. Client transactions send requests and server transactions respond to those requests with one or more responses. The responses may include provisional responses with a response code in the form 1xx, and one or multiple final responses (2xx – 6xx).
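The distinction between provisional and final responses follows directly from the first digit of the response code. The sketch below uses the class names defined in RFC 3261; the helper name `classify_response` is an assumption made for illustration.

```python
# Response classes by first digit of the code, as named in RFC 3261.
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify_response(code: int) -> tuple:
    """Return (class name, is_final) for a SIP response code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    response_class = code // 100
    # Only 1xx responses are provisional; 2xx to 6xx end the transaction.
    return RESPONSE_CLASSES[response_class], response_class >= 2


print(classify_response(180))
print(classify_response(200))
```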

Transactions are further categorized as either type Invite or type Non-Invite. Invite transactions differ in that they can establish a long-running conversation, referred to as a dialog in SIP, and so include an acknowledgment (ACK) of any non-failing final response, e.g., 200 OK.

Because of these transactional mechanisms, unreliable transport protocols, such as the User Datagram Protocol (UDP), are sufficient for SIP operation.
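The timer-driven retransmission that makes UDP usable can be sketched numerically. The example below assumes the default values from RFC 3261 for an INVITE client transaction, which are not stated in the text above: a base interval T1 of 500 ms, a retransmission interval that doubles after each attempt (Timer A), and a transaction timeout of 64 × T1 (Timer B). The function name is hypothetical.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip estimate (assumed for this sketch)


def invite_retransmit_times(t1: float = T1) -> list:
    """Elapsed times (s) at which an unanswered INVITE is retransmitted over UDP."""
    timeout = 64 * t1   # Timer B: the client transaction gives up at this point
    interval = t1       # Timer A: starts at T1 and doubles after each retransmission
    elapsed = interval
    times = []
    while elapsed < timeout:
        times.append(elapsed)
        interval *= 2
        elapsed += interval
    return times


print(invite_retransmit_times())
```

With the assumed defaults, the request is retransmitted six times before the transaction times out after 32 seconds.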

The SIP developer community meets regularly at conferences organized by SIP Forum to test interoperability of SIP implementations.[21] The TTCN-3 test specification language, developed by a task force at ETSI (STF 196), is used for specifying conformance tests for SIP implementations.[22]

When developing SIP software or deploying a new SIP infrastructure, it is important to test the capability of servers and IP networks to handle a given call load, measured as the number of concurrent calls and the number of calls per second. SIP performance tester software is used to simulate SIP and RTP traffic to see if the server and IP network are stable under the call load.[23] The software measures performance indicators such as answer delay, answer/seizure ratio, RTP jitter, packet loss, and round-trip delay time.
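Two of these indicators are simple to compute from per-call results. The following sketch assumes a hypothetical record type `CallAttempt` and takes the answer/seizure ratio to be the percentage of call attempts that were answered; it does not reflect the output format of any particular test tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallAttempt:
    # Seconds from sending INVITE to receiving the answer; None if never answered.
    answer_delay: Optional[float]


def summarize(attempts: list) -> dict:
    """Compute answer/seizure ratio (percent) and mean answer delay (seconds)."""
    delays = [a.answer_delay for a in attempts if a.answer_delay is not None]
    asr = 100.0 * len(delays) / len(attempts) if attempts else 0.0
    mean_delay = sum(delays) / len(delays) if delays else None
    return {"asr_percent": asr, "mean_answer_delay": mean_delay}


results = [CallAttempt(1.0), CallAttempt(2.0), CallAttempt(None), CallAttempt(3.0)]
print(summarize(results))
```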

SIP trunking is a similar marketing term, preferred when the service is used to simplify a telecom infrastructure by sharing the carrier access circuit for voice, data, and Internet traffic while removing the need for Primary Rate Interface (PRI) circuits.[24][25]

SIP-enabled video surveillance cameras can initiate calls to alert the operator of events, such as motion of objects in a protected area.

SIP is used in audio over IP for broadcasting applications where it provides an interoperable means for audio interfaces from different manufacturers to make connections with one another.[26]

SIP-I, or the Session Initiation Protocol with encapsulated ISUP, is a protocol used to create, modify, and terminate communication sessions based on ISUP using SIP and IP networks. Services using SIP-I include voice, video telephony, fax and data. SIP-I and SIP-T[28] are two protocols with similar features, notably to allow ISUP messages to be transported over SIP networks. This preserves all of the detail available in the ISUP header, which is important as there are many country-specific variants of ISUP that have been implemented over the last 30 years, and it is not always possible to express all of the same detail using a native SIP message. SIP-I was defined by the ITU-T, whereas SIP-T was defined via the IETF RFC route.[29]

Concerns about the security of calls via the public Internet have been addressed by encryption of the SIP protocol for secure transmission. The URI scheme sips is used to mandate that each hop over which the request is forwarded up to the target domain must be secured with Transport Layer Security (TLS). The last hop from the proxy of the target domain to the user agent has to be secured according to local policies. TLS protects against attackers who try to listen on the signaling link, but it does not provide end-to-end security to prevent espionage and law enforcement interception, as the encryption is only hop-by-hop and every intermediate proxy has to be trusted.

End-to-end security may also be achieved with secure tunneling and IPsec, but most service providers that offer secure connections use TLS for securing signaling. The relationship between SIP (port 5060) and SIPS (port 5061) is similar to that between HTTP and HTTPS, and SIPS uses URIs of the form sips:user@example.com. The media streams, which travel on different connections from the signaling stream, may be encrypted with SRTP. The key exchange for SRTP is performed with SDES (RFC 4568), or with ZRTP (RFC 6189), which can automatically upgrade RTP to SRTP using dynamic key exchange and a verification phrase. One may also add a MIKEY (RFC 3830) exchange to SIP to determine session keys for use with SRTP.
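The parallel with HTTP and HTTPS can be illustrated by resolving the default port from the URI scheme. The sketch below assumes a hypothetical helper `parse_sip_uri` and handles only the simple `scheme:user@host:port` form; URI parameters and headers are discarded, and bracketed IPv6 hosts are not supported.

```python
DEFAULT_PORTS = {"sip": 5060, "sips": 5061}


def parse_sip_uri(uri: str) -> dict:
    """Split a simple SIP or SIPS URI and fill in the default port for its scheme."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a SIP or SIPS URI: {uri!r}")
    rest = rest.split(";", 1)[0].split("?", 1)[0]   # drop parameters and headers
    user, _, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    return {
        "scheme": scheme,
        "user": user or None,
        "host": host,
        "port": int(port) if colon else DEFAULT_PORTS[scheme],
        "secure": scheme == "sips",   # sips mandates TLS on each hop to the target domain
    }


print(parse_sip_uri("sips:user@example.com"))
print(parse_sip_uri("sip:alice@example.com:5080;transport=udp"))
```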