This chapter introduces the SOAP protocol, describes the SOAP object model, and discusses the serialization of data.

This pre-publication chapter is from Applied SOAP: Implementing .NET Web Services, by Kenn Scribner and Mark Stiver (0672321114). Content is based on the Beta 2 version of Microsoft's .NET technology.

As you've learned in Chapter 1, "Web Service Fundamentals,"
SOAP is a critical technological component in the .NET Web Service scheme. SOAP
in general defines a mechanism for encoding information into an XML wrapper. For
Web Services, SOAP gets a bit more specific when it facilitates the mappings
between method signatures and the XML document.

The basic idea behind the use of the SOAP protocol is to interpret a remote
method's parameter values at runtime and stuff those values into an XML
document, at least as SOAP was originally envisioned. The XML data is then
transported to the remote server using the HTTP protocol (other transport
protocols are also used, albeit currently outside .NET). If you use SOAP in this
manner, you are using the SOAP protocol as an implementation of a more general
concept, the invocation of some remote method (implemented as a Web Service).
The Remote Procedure Call (RPC) protocol has the same objective: to carry local
computer information to a remote computer, even information that might not make
sense to a remote system without some conversion (for example, addresses of data
to their textual equivalents). This allows the remote computer to execute the
remote method on your behalf and return a result.
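
To make the idea concrete, here is a minimal Python sketch of stuffing a
method's parameter values into a SOAP 1.1 envelope. It is an illustration only,
not how .NET does it internally. The method name Add, the namespace
urn:example:calculator, and the helper build_envelope are all hypothetical
names chosen for this example.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(method, namespace, params):
    """Wrap a method name and its parameter values in a SOAP 1.1 envelope.

    `method`, `namespace`, and `params` are illustrative inputs; a real
    toolkit would derive them from the method signature.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The element named after the method holds one child per parameter.
    call = ET.SubElement(body, f"{{{namespace}}}{method}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{namespace}}}{name}")
        child.text = str(value)  # values travel as text inside the XML
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: Add(a=2, b=3) on a made-up calculator service.
xml_text = build_envelope("Add", "urn:example:calculator", {"a": 2, "b": 3})
print(xml_text)
```

The receiving side reverses the process: it reads the method element, converts
each child's text back into a typed value, and invokes the real method.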

In this chapter, you'll explore SOAP in some detail to see how it acts
as a messaging protocol. To SOAP, the concept of an RPC protocol has a specific
meaning. .NET, however, actually uses the SOAP protocol in a dual fashion. It
uses SOAP to carry the information back and forth as SOAP messages or as true
SOAP RPC calls, depending on how you configure your Web method.

After learning why SOAP is quickly becoming so successful in the industry,
you'll dive into the protocol itself to see specifically how SOAP carries
remote method information to and from a Web Service. You'll also learn how
.NET employs the protocol.

Why Is SOAP Needed?

This innocent-looking question is actually a very good one to ask. RPC
protocols grew from research in the mid-1980s that had roots all the way back to
Vint Cerf and Bob Kahn and their description of TCP/IP, and even to the invention of
Ethernet itself, circa 1973 by Bob Metcalfe, later of 3Com fame. The concept of
distributed computing dates back farther than that. The creators of the ENIAC
envisioned that one day large numbers of computers would be linked to solve very
complex problems.

In the case of RPC and SOAP, the distributed computing issue is simply one of
consuming resources on a remote computer as if the remote computer and the
calling (local) machine were the same machine. The goal is to seamlessly tie the
distributed systems together so that when you call a given method, you
don't know (and presumably don't care) whether the call is actually
handled by a remote system. In real-world situations, you very often do care, if
only because the call latency is greatly increased. It simply takes longer for
the method to complete its task. For the purposes of discussing SOAP as a
protocol, however, let's ignore those issues and imagine that SOAP, as an
RPC protocol, actually does seamlessly integrate distributed systems. In critical
cases when this model breaks down, you'll find the issue noted in the
text.

Why Do You Need to Understand SOAP?

This is also a very good question. After all, given the power of .NET, should
you be concerned about underlying protocols .NET uses? As an analogy, when you
bring up your email client, do you care how your email is sent or how
attachments to your email messages are encoded?

You could make an excellent argument that you don't need to understand
SOAP to program Web Services today. .NET handles the details for you. You write
the code that .NET requires to handle the Web Service, or the invocation of the
Web Service, if you're writing client-side code. Then .NET takes care of
the serialization aspects as well as the transmission of the information back
and forth. As you'll see in Chapter 5, "Web Services and Description
and Discovery," you don't require the more complex aspects of the SOAP
protocol because you tell the world what your packets look like using the Web
Service Description Language (WSDL).

Those complex SOAP encoding practices were required when client and server
had to agree on a protocol using static code. WSDL allows for dynamic packet
layout and description—at least, from an early binding
perspective—rendering the deeper encoding structures less necessary and
even obsolete. This is so because the RPC style of encoding is going out of
fashion in favor of the wider range of encoding possibilities that WSDL
document/literal encoding offers. You are free to encode information as you see
fit rather than blindly follow the SOAP specification itself, at least as far as
Section 5 (the SOAP encoding rules) is concerned.
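
As a rough illustration of the difference, the following Python sketch compares
two hand-written message fragments for the same call. The fragments, the
namespace urn:example:calculator, and the helper declared_types are assumptions
made up for this example. They are not taken from .NET output. The first
fragment follows the Section 5 style, in which each parameter announces its own
type. The second is a literal fragment whose shape would be described by a
schema in the WSDL instead.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# RPC/encoded style: every parameter carries an xsi:type annotation.
RPC_ENCODED = """\
<m:Add xmlns:m="urn:example:calculator"
       xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <a xsi:type="xsd:int">2</a>
  <b xsi:type="xsd:int">3</b>
</m:Add>"""

# Document/literal style: plain XML, with types defined in the WSDL schema.
DOC_LITERAL = """\
<Add xmlns="urn:example:calculator">
  <a>2</a>
  <b>3</b>
</Add>"""


def declared_types(fragment):
    """Return the xsi:type attribute (or None) of each parameter element."""
    root = ET.fromstring(fragment)
    return [child.get(f"{{{XSI}}}type") for child in root]


print(declared_types(RPC_ENCODED))
print(declared_types(DOC_LITERAL))
```

The encoded fragment describes itself, which matters when client and server
share nothing but static code. The literal fragment is leaner because the WSDL
already tells both sides what to expect.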

NOTE

In this case, the term early binding refers to tying the client code
to the Web Service when the code is compiled. With .NET, the WSDL is read to
create proxy source code that is compiled into the client. Dynamic proxy
generation, to be used for truly late bound Web Services at runtime, is
certainly a possibility but is not currently supported by .NET. It is an
alternative offered by the SOAP Toolkit, however, if you require this
capability.

But if you really examine what you intend to do, the "why do I
care" façade breaks down. Returning to our analogy, an email consumer
doesn't need to understand the details associated with the Internet email
protocols—or email attachment encoding, for that matter. But developers
need to understand these protocols to write code that uses them directly.
Blindly trusting infrastructure might get you 80% of the features you require.
After all, the infrastructure was designed to satisfy the needs of the general
populace. That other 20% or so requires the true ingenuity that comes from
understanding the lower levels of the technology.

This chapter won't make you an expert on SOAP, but it will give you the
understanding that you'll require to write professional grade Web Services
and clients. So back to the initial question: why SOAP? Let's explain
it this way.

The SOAP Advantage

Probably the best-known RPC protocol is DCE-RPC, which is the Distributed
Computing Environment's implementation. Many Unix environments use DCE-RPC,
as does Microsoft Windows (which modified it slightly to handle object
references across machines to support DCOM). DCE-RPC requires the use of a port
mapper, which you'll find listening (monitoring network traffic) on TCP/UDP
Port 135. Whenever you want to access a remote computer using DCE-RPC, you
access the remote system's port mapper and request a socket address. The
actual distributed communication then takes place over the assigned port.

The issue here is actually one of security, when your business-critical
servers are safely tucked behind a firewall. For DCE-RPC to work, not only do
you have to open Port 135 to the world for port-mapping purposes, but you also
need to have a range of other socket addresses available for the general public
to use for RPC communications. This very often leads to an opening through which
some 13-year-old will ruin your crucial data as well as your day. So it
isn't surprising to find nearly every business IT guru locks Port 135 and
almost every other port. The one universal exception is Port 80, or Port 443 for
secure sockets.

Port 80 is the network socket port used by HTTP, at least as it is nominally
configured (Port 8080 is often used to manage the Web server, and HTTP is also
spoken there). As you probably already know, the Hypertext Transfer Protocol
(HTTP) is the lower-level network protocol used to shuttle Hypertext Markup
Language (HTML) documents around the Internet. HTTP is the transport protocol
for Web pages, and because you can bet that practically every corporate vice
president likes to surf the Net, you'll probably find Port 80 open through
any firewall you'll likely encounter.

This is SOAP's secret weapon and one of the sources of its power.
It's almost unheard of to find someone blocking Port 80 with a corporate
firewall, so SOAP (as bound to HTTP) should pass through corporate firewalls
untouched.

The other source of SOAP's power is the fact that the information
transported by the HTTP protocol is actually XML (which is why Chapter 3,
"Web Services and XML," dealt so heavily with XML within the .NET
Framework). To be more specific, the content-type of the HTTP packet is
text/xml. The remainder of this chapter is dedicated to uncovering the
XML format that SOAP uses to serialize method parameter information, starting
with the SOAP XML object model.
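
Before turning to the object model, here is a small Python sketch of what
"SOAP bound to HTTP" amounts to: an ordinary HTTP POST whose body is the XML
envelope and whose content type is text/xml. The URL, the SOAPAction value, and
the helper make_soap_request are hypothetical. The SOAPAction header comes from
the SOAP 1.1 HTTP binding rather than from the discussion above. The request is
only constructed here, never sent.

```python
import urllib.request

# A tiny, hand-written envelope standing in for a real serialized call.
ENVELOPE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/>"
    "</soap:Envelope>"
)


def make_soap_request(url, envelope_xml, soap_action):
    """Build (but do not send) an HTTP POST carrying a SOAP envelope."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",  # SOAP 1.1 travels as text/xml
        "SOAPAction": f'"{soap_action}"',           # SOAP 1.1 HTTP binding header
    }
    return urllib.request.Request(
        url,
        data=envelope_xml.encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical endpoint and action, for illustration only.
req = make_soap_request(
    "http://example.com/calc.asmx", ENVELOPE, "urn:example:calculator/Add"
)
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```

To a firewall, this looks like any other Web request on Port 80, which is
exactly why it tends to pass through untouched.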