How Do I Get Rid of Extraneous Spaces in a String?

I know that I can use the Split command to separate a sentence like this - VBScript is fun! - into an array of individual words. However, what happens if I have a sentence like this: VBScript is fun! I can’t use a blank space as the delimiter because I might have lots of blank spaces. And I can’t use a specific number of blank spaces as the delimiter, because the number might vary. Any suggestions?

-- SC

Hey, SC. You might find this difficult to believe, but we Scripting Guys actually have standards, at least when it comes to this column. We get scores of questions each day, and we can’t respond to each one. So how do we decide which questions get published? Well, first we look for questions that seem reasonably easy to do. Here’s a hint: if your question includes lots of ands - “And I want the script to do this and I want the script to do that and ....” - well, then there’s much less chance it will find its way to the top of the pile. We realize that that’s a bit unfair, but for better or worse this column is a very small part of our daily responsibilities. Thus we can’t spend too much time on any one question.

The second thing we look for is mass appeal: will the answer to this question be of interest and of use to most of our audience. Again, that’s not always fair, but questions dealing with “mainstream” technologies will usually win out over more esoteric questions. Sorry.

To tell you the truth, though, we’re not sure if your script fits either category; we decided to answer it simply because we found it interesting. After all, how the heck do you get rid of all those extraneous spaces, and thus turn VBScript is fun! into VBScript is fun! ? Here’s the solution we came up with:

The problem here is not that we have spaces in the string; if that was the case, we could just use VBScript’s Replace function to get rid of all the spaces. However, we want a few spaces in the string; in fact, we want to separate each word with a space. We just want to get rid of all the extra spaces. In the end, we decided to go ahead and use the Replace function, replacing all multi-space instances with a single space. Thus if we found 7 consecutive blank spaces, we’d replace those with a single space.

Pretty simple, except for one problem. The Replace function needs a specific string to search for; you can’t just tell it, “OK, search for multi-space instances and replace them with a single space.” Instead, you have to use code similar to this, which replaces seven blank spaces with one space:

strStarter = Replace(strStarter, “ “, " ")

That’s great, except that we don’t know how many blank spaces to search for; on top of that, while we might have 7 consecutive blank spaces in the string, we might also have 5 consecutive blank spaces in that same string. How do we deal with that?

We dealt with it using this script. Here’s the code, which we’ll explain in a second:

What we ended up doing was saying, “OK, we have to search for strings consisting of blank spaces, but we don’t know how many blank spaces might be in one of these strings.” That was a bit of a stumbling block until we realized, hey, let’s say the string has a total of 37 characters in it. That means the most consecutive blank spaces we could find would be 37 (assuming the string was nothing but blank spaces). Therefore, we can start by searching for 37 consecutive blank spaces; if we find them, we replace them with a single blank space. We then search for 36 consecutive blank spaces, then 35, then 34. We continue working our way down until we search for 2 blank spaces, and replace those with a single space. At that point, we should have eliminated all the extraneous blank spaces.

Pretty slick, huh? And, surprisingly enough, it’s easy to do. Notice in our script we start by assigning a string to the variable strStarter. We then use this code to determine how many characters are in strStarter:

intStarter = Len(strStarter)

Again, let’s say there are 37 characters. What we need now is a loop that will start at 37 and work its way down to 2. And guess what? That’s exactly what this loop does:

The loop starts at the number of characters in the string, then uses the Step -1 parameter to work its way down to 2. What’s Step -1? Well, by default, a For Next loop works its way up one number at a time. For example, in this loop i starts off at 1, and keeps incrementing by 1 until it gets to 10:

For i = 1 to 10

In our loop, we’re starting with the big number (37), and working our way down to 2, decrementing the value of i by 1 each time. Make sense? And that’s what Step -1 does; it basically subtracts 1 each time the loop runs.

Inside the loop itself we do two things. First, we have to create a string consisting of i number of blank spaces. Fortunately, we can do that with one line of code thanks to the Space function:

strChars = Space(i)

Second, we need to see if the substring consisting of i number of blank spaces can be found anywhere in strStarter. If it can, we need to replace it with a single blank space. And that’s what this code does:

strStarter = Replace(strStarter, strChars, " ")

From there we continue looping until we’ve checked for 2 consecutive blank spaces. We then exit the loop, use the Split command to divide the string into an array of individual words, and then - just to show you that everything worked - we echo back those individual words.

Now, this isn’t totally foolproof; for example, what if you had a few extraneous spaces in front of the word VBScript? We didn’t bother with situations like that, but you can always use VBScript’s LTrim and RTrim functions to get rid of leading and trailing spaces. If you’d like more information about string manipulation, check out this portion of the Microsoft Windows 2000 Scripting Guide.

We also don’t know the context in which a string like VBScript is fun! might arise. We suspect you might encounter strings like this when trying to read a fixed-width log file. If that’s the case, the code above will work, but you might find it easier to use ADO (ActiveX Database Objects) to parse the file instead.

I know this is a really old question, but as I just came across it in a search, I think it's still relevant, and I thought I'd post a response.

Your solution definitely works well, but I thought I would point out that there's a more optimal way to do this, and arguably alot simpler. Instead of checking for all the possible different lengths of whitespace, just loop through the string and replace all instances of Space(2) with Space(1) until there are no more instances of Space(2).

<Code>

strStarter = "VBScript is fun!"

Do Until InStr(strStarter , Space(2)) = 0

strStarter = Replace(strStarter , Space(2), Space(1))

Loop

arrStarter = Split(strStarter, " ")

For Each strUnit in arrStarter

Wscript.Echo strUnit

Next

</Code>

Using this method and the above example, I was able to get rid of all extra Whitespace in only 5 iterations rather than the 52 iterations which your method required. Just some food for thought. Keep up the good work!