Question to the pros about deconstructing NSStrings

I'm new to the Mac Programming forum but I've been reading MacRumors for years.

My question is about how to take apart an NSString. I'm reading the Apple reference docs and there's some confusing stuff about NSRange, scanners, and blocks, not to mention that all of the "substring" methods start with "range".

So what I'd like to do is look at my NSString and count the number of non-numerical characters. Then, I'd like to look at the first character of my NSString, determine if it's a number or not, make note of that, and move on to the next and so on, to the end of the string.

What NSObjects/methods/scanners/ranges should I be looking at to do that?

characterAtIndex: should be fine for your case, but Apple recommends (see WWDC 2011's "Advanced Text Processing" session) using the string enumeration method enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByComposedCharacterSequences as the option. The main reason is that characterAtIndex: requires extra effort to handle composed character sequences correctly, which I guess is what you mean by "character".
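As a rough illustration of that enumeration method applied to the original question (counting non-numerical characters), here is a minimal sketch. The helper name CountNonDigits is made up for this example, and it assumes "numerical" means membership in decimalDigitCharacterSet, which also includes non-ASCII digits.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): counts the user-visible
// characters in a string that are not decimal digits.
static NSUInteger CountNonDigits(NSString *string) {
    NSCharacterSet *digits = [NSCharacterSet decimalDigitCharacterSet];
    __block NSUInteger count = 0;

    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // substring holds one composed character sequence, such as @"1" or @":".
        if ([substring rangeOfCharacterFromSet:digits].location == NSNotFound) {
            count++;
        }
    }];

    return count;
}
```

For a string like @"01:02:03:04" this returns 3, one for each colon. The substringRange argument gives the position of each piece, if that needs to be recorded as well.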

Seems to me that NSScanner would be the most efficient way to go. Just use -scanUpToCharactersFromSet:intoString: to find the first digit, then you could use one of the number-scanning methods (like -scanDecimal:) to capture a numeric value, or -scanCharactersFromSet:intoString: to collect the digits into a string. You could use -scanLocation with accumulating variables or an NSMutableIndexSet if you need to keep track of character counts. I have taken the attitude that the less you muck around directly with NSString contents, the better off you are.

Actually, what is a composed character sequence? In my case I want to check for colons in a string of digits. This may be a dumb question, but what exactly does that string enumeration method do?

Does -scanUpToCharactersFromSet:intoString: mean it will read the characters sequentially (into another string?) up until it hits my colon?

An accented character, for example á, could be encoded in two different ways in Unicode. One way would be to encode the precomposed character U+00E1 (Latin Small Letter A with Acute). Another way would be to use the composed character sequence of U+0061 (Latin Small Letter A) followed by U+0301 (Combining Acute Accent). They are visually the same, but they are not equal.
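To make the two encodings concrete, here is a small sketch. The helper names are invented for illustration; the Foundation calls used are isEqualToString:, precomposedStringWithCanonicalMapping, and rangeOfComposedCharacterSequenceAtIndex:.

```objc
#import <Foundation/Foundation.h>

// Illustrative helper: literal, unit-by-unit equality.
static BOOL LiterallyEqual(NSString *a, NSString *b) {
    return [a isEqualToString:b];
}

// Illustrative helper: normalize both strings to precomposed form first.
static BOOL EqualAfterNormalizing(NSString *a, NSString *b) {
    return [[a precomposedStringWithCanonicalMapping]
            isEqualToString:[b precomposedStringWithCanonicalMapping]];
}

static void LogAccentExample(void) {
    NSString *precomposed = @"\u00E1";   // U+00E1
    NSString *decomposed  = @"a\u0301";  // U+0061 followed by U+0301

    // length counts UTF-16 units, so the two strings differ in length.
    NSLog(@"lengths: %lu and %lu",
          (unsigned long)precomposed.length, (unsigned long)decomposed.length);
    NSLog(@"literally equal: %d", LiterallyEqual(precomposed, decomposed));
    NSLog(@"equal after normalizing: %d",
          EqualAfterNormalizing(precomposed, decomposed));

    // The whole "a + accent" pair is reported as one composed sequence.
    NSRange whole = [decomposed rangeOfComposedCharacterSequenceAtIndex:0];
    NSLog(@"composed sequence range: %@", NSStringFromRange(whole));
}
```

With Apple's Foundation I would expect the lengths to be 1 and 2, the literal comparison to fail, and the normalized comparison to succeed. For a timecode made only of ASCII digits and colons none of this comes into play, since those characters never combine.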

If you use NSCharacterSet's decimalDigitCharacterSet with -scanCharactersFromSet:intoString:, it will fill the string with the characters it finds in the set until it runs into a character that is not in the set. Then you can get to the next numeric digit with -scanUpToCharactersFromSet:intoString:, back and forth until you run out of string. Read the docs on those methods and on NSCharacterSet.
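The back-and-forth described above could look something like the following. This is only a sketch: DigitRuns is a hypothetical helper name, and setting charactersToBeSkipped to nil is my own choice so that the scanner does not silently skip whitespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns each run of consecutive digits, in order.
static NSArray *DigitRuns(NSString *string) {
    NSCharacterSet *digits = [NSCharacterSet decimalDigitCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:string];
    scanner.charactersToBeSkipped = nil;  // treat every character explicitly

    NSMutableArray *runs = [NSMutableArray array];
    while (![scanner isAtEnd]) {
        // Move past anything that is not a digit (colons, for example).
        [scanner scanUpToCharactersFromSet:digits intoString:NULL];

        // Collect the digits that follow, if there are any.
        NSString *run = nil;
        if ([scanner scanCharactersFromSet:digits intoString:&run]) {
            [runs addObject:run];
        }
    }
    return runs;
}
```

Given @"01:02:03:04", this yields the four strings @"01", @"02", @"03", and @"04"; calling integerValue on each gives the numbers themselves.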

I think in a case like this, it would help to know what the end goal is so we can give you the best solution. As you can see from the previous posts, there are a few ways to do this, each with its own strengths. So what do you want to end up with, and what do you want to do with it? So far I would vote for NSScanner as your best bet.

Thanks for asking about that. I'm trying to work with video timecode. As far as I know the only way to input timecode is into a string with digits and colons. Once I have the string, I want to see what the timecode is. So I need to read the digits out of the string and discard the colons. When I need to display the timecode, I'll put the colons back in.

Reading over the suggestions so far, I realize that I don't know what a range or scanner is. Could someone be so kind as to explain? Especially the range. I've read the class references but I'm still clueless. Sounds like range is the actual memory range? Why would I ever want to know that just to figure out what kind of character it is? *head explodes*

I noticed in C# (unrelated, I know) there are some nice methods like

char.IsDigit(char)
char.IsPunctuation(char)

Wow, how useful that would be right now! Is there anything that simple in Objective-C?
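For what it's worth, something close to those C# calls can be sketched with NSCharacterSet's characterIsMember:. The wrapper names below are invented for this example and are not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrappers around NSCharacterSet membership tests.
static BOOL IsDigit(unichar c) {
    return [[NSCharacterSet decimalDigitCharacterSet] characterIsMember:c];
}

static BOOL IsPunctuation(unichar c) {
    return [[NSCharacterSet punctuationCharacterSet] characterIsMember:c];
}

// Walks the string one UTF-16 unit at a time and counts the digits.
static NSUInteger CountDigitsByIndex(NSString *string) {
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < string.length; i++) {
        if (IsDigit([string characterAtIndex:i])) {
            count++;
        }
    }
    return count;
}
```

Here IsDigit('7') and IsPunctuation(':') are both true, and CountDigitsByIndex(@"01:02:03:04") gives 8. Walking by index like this is fine for plain digits and colons; it is the accented-character cases discussed earlier in the thread where it needs more care.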

Well, if you are working with timecodes, it might be easier (and faster in the code) to just get the raw timecode data and convert it mathemagically to usable numbers. According to Wikipedia, timecodes are stored in BCD, meaning each byte is two decimal digits, which you can convert to an int or whatever with a little simple math

Code:

timeValueComponent = ( timeByte >> 4 ) * 10 + ( timeByte & 0x0F );

though the frame number might require more conversion. QuickTime can provide you with this raw data in a QTTime record, not sure how it works in AV Foundation.
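As a worked example of that arithmetic, here is the same expression wrapped in a function. DecodeBCD is a hypothetical name, and the sketch assumes the byte holds exactly two packed decimal digits with no extra flag bits mixed in.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts one packed-BCD byte to its decimal value.
// The high nibble is the tens digit, the low nibble is the ones digit.
static unsigned int DecodeBCD(uint8_t timeByte) {
    return (unsigned int)(timeByte >> 4) * 10 + (timeByte & 0x0F);
}
```

For the byte 0x59, the high nibble is 5 and the low nibble is 9, so the result is 5 * 10 + 9 = 59.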

This gets you an array of strings from your timecode.
Then you'd get [timeElements objectAtIndex:0] for your hours, [timeElements objectAtIndex:1] for your minutes, and so on.
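The snippet that produced timeElements is not shown in this post. One plausible way to get such an array, assuming the timecode is colon-delimited like 11:22:33:44, is componentsSeparatedByString:. The function names below are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits "11:22:33:44" into @"11", @"22", @"33", @"44".
static NSArray *TimecodeElements(NSString *timecode) {
    return [timecode componentsSeparatedByString:@":"];
}

// Hypothetical helper: puts the colons back in for display.
static NSString *TimecodeFromElements(NSArray *timeElements) {
    return [timeElements componentsJoinedByString:@":"];
}
```

With timeElements = TimecodeElements(@"11:22:33:44"), [[timeElements objectAtIndex:0] integerValue] is 11 for the hours, and TimecodeFromElements(timeElements) rebuilds the original string.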

In the future, if you're not sure how to ask about what you want to do, it helps to describe the overall problem first, because we might be able to suggest easier ways to accomplish your task. "I have video timecode strings and want to break them up into pieces; they're delimited by colons like 11:22:33:44" versus "I want to read a string character by character, determine if number or punctuation, and then record that and move on" will get you very different results.
