URLEncode

I use these functions to prepare POST strings containing XML data. The first function, URLEncode1, uses less memory but is slower than the second, URLEncode2. Both functions take a CString as the input parameter and return a CString.

The demo project contains sample usage and measures the execution time of both functions.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

I recently had to figure this stuff out for a project.
From reading countless questions and requests, there is apparently some confusion, probably due to the progressive manner in which the web standards were developed and standardized over time.

The terms "URL" and "encoding" are also pretty ambiguous.
There is URL encoding for an actual address/location, and then there is encoding for form data, commonly of the "application/x-www-form-urlencoded" type.
The related terms in scope are "URL", "URI", and "URN" (search for and study these on Wikipedia).

In either case certain text characters are not allowed; these are known as "unsafe" or "reserved" characters.
They are "percent encoded": each one is written as '%' followed by its two-digit hexadecimal value. E.g. the '$' (US dollar sign) character must be encoded as "%24".
Reference: http://en.wikipedia.org/wiki/Percent-encoding
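
To make the '%' plus hex rule concrete, here is a minimal sketch in plain C++ (std::string rather than CString, so it stands alone). The helper name percentEncodeByte is my own for illustration and is not taken from the article's code.

```cpp
#include <string>

// Percent-encode one byte as '%' followed by two uppercase hex digits.
// percentEncodeByte is a hypothetical helper name used for illustration only.
std::string percentEncodeByte(unsigned char byte)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out += '%';
    out += hex[byte >> 4];    // high nibble
    out += hex[byte & 0x0F];  // low nibble
    return out;
}
```

Calling percentEncodeByte('$') gives "%24", matching the example above.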

Unfortunately, with mixed and sometimes ambiguous standards, the ' ' (space) character can be represented either as a single '+' or percent encoded as %20. Apparently either way will work for URL locations/addresses, but for forms it needs to be of the '+' variety.

Unfortunately, on Windows most if not all of the base API functions (all that I tried anyhow), like the WinINet API InternetCanonicalizeUrl(), only do the percent encoding type. And even then the results and behavior are pretty inconsistent.
This probably comes from the fact that in an actual URL location/address you don't want to encode the '/' path character, since it's part of the address, but in form data you do.

So when working with a URL location/address you want to encode one way, and with the form parts you want to encode a slightly different way.
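
The two flavors could be sketched roughly like this. It is only an illustration under my own assumptions: the EncodeMode enum and the urlEncode name are made up for this post, it uses std::string instead of CString, and it treats only '/' as the character to keep in the address case.

```cpp
#include <string>

// Hypothetical names: EncodeMode and urlEncode are not from the article's code.
enum class EncodeMode { Path, Form };

std::string urlEncode(const std::string& in, EncodeMode mode)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 3);  // worst case: every byte becomes %XX
    for (unsigned char c : in) {
        const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                           (c >= '0' && c <= '9');
        if (alnum || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);   // unreserved: copy as-is
        } else if (c == '/' && mode == EncodeMode::Path) {
            out += '/';                    // keep the path separator in an address
        } else if (c == ' ' && mode == EncodeMode::Form) {
            out += '+';                    // form data: space becomes '+'
        } else {
            out += '%';                    // everything else: percent encode
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With the input "a b/c", the Path mode produces "a%20b/c" and the Form mode produces "a+b%2Fc".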

I verified these behaviors (September 2012) between the standard documents and the behavior of the MS C# facilities like "System.Web.HttpUtility.UrlEncode()" for HTML forms (although I won't be using C# myself).

The OP's code is a little broken. The isalnum() check covers most of the "unreserved characters" but misses the four punctuation chars '-', '.', '_', and '~' (see the Wikipedia page above for reference).
Some people have attempted to fix this in code posted here, but they have one or more of the four characters wrong.
Also, some of the code here incorrectly replaces the line break chars ('\r' and '\n') with the space '+' encoding. Maybe that is "filtering", if that was what was desired?
The standard says these should just be percent encoded (%0D and %0A).
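
For reference, the unreserved test described above could look something like the following. The function name isUnreserved is my own, and I spell out the ASCII ranges instead of calling isalnum() so the result does not depend on the locale or on char being signed.

```cpp
// Returns true for the "unreserved" characters: ASCII letters, digits,
// and the four punctuation chars '-', '.', '_', '~'.
// isUnreserved is a hypothetical name used for illustration only.
bool isUnreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}
```

Since '\r' and '\n' fail this test, an encoder built on it percent encodes them like any other byte rather than turning them into '+'.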

Also, the term "Unicode aware" is ambiguous. In discussion one should say "UTF-8".
Anything else, like attempts at UTF-16 style encoding, is not standard and not used (for obvious reasons such as endianness).
As stated at http://w3techs.com/technologies/overview/character_encoding/all, statistically ~73% of web pages are UTF-8, and the share is growing steadily each year.
The original code already covers this since any other character (in particular those in the 128 to 255 range) is correctly percent encoded.
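
To show what that means for UTF-8 text, here is a small sketch that only handles the high bytes (128 to 255) and leaves ASCII untouched, so it is not a full encoder. The name encodeHighBytes is made up for this example, and the input is assumed to already be UTF-8.

```cpp
#include <string>

// Percent-encode only the bytes in the 128..255 range; ASCII passes through.
// encodeHighBytes is a hypothetical name; the input is assumed to be UTF-8.
std::string encodeHighBytes(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        if (c >= 0x80) {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

For example, 'é' is the two UTF-8 bytes 0xC3 0xA9, so "caf\xC3\xA9" comes out as "caf%C3%A9". Each byte of a multi-byte sequence is encoded on its own.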

The rules for encoding form data for "application/x-www-form-urlencoded" encoding (for both ASCII and the close cousin UTF-8) are:
1) If a character is unreserved just copy it.

As a learner, I want to add a dialog that verifies whether a user is legitimate before Windows 98's Explorer starts working, similar to NT's verification dialog.
But I don't know how to do it. Can anyone help me? Thanks!

I use the presented functions in a WinSock application. The InternetCanonicalizeUrl function is from wininet.dll. I encode XML data and I need fast functions. My function is faster than the standard InternetCanonicalizeUrl.

When doing fast conversions like this I usually use the _alloca() routine. It allocates memory on the stack instead of the heap, which usually makes it faster. Also, when the function goes out of scope the stack is freed automatically, and with it the allocated item.