How do you remove duplicate characters in a string without converting the string to a list? I want to delete all duplicate characters in a string (i.e. get the unsorted union of its characters). For example, a string like "113233454766" should give me the string "1324576". Note the order: each character is kept at the position of its first occurrence.

Why do you not want to convert the String to a list of characters? Since it is probably the most natural approach, as your comment implicitly acknowledges, I think you should explain why you find it unacceptable.
– Mr.Wizard♦ Sep 21 '13 at 13:16

@Vajira See my answer for timings. Unless someone finds a fast way of deleting duplicates in a String, it looks like splitting to character codes then deleting is faster.
– Michael E2 Sep 21 '13 at 19:52

@Vajira No, it's a good question, even if now you decide you want to do something else. You could ask another question, if it's sufficiently different.
– Michael E2 Sep 21 '13 at 20:26
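
The approach mentioned in the comments (convert to character codes, then delete duplicates) can be sketched roughly as follows. The helper name `dedupeString` is made up for illustration and is not taken from any of the answers; this is only a minimal sketch, not necessarily the exact code that was timed.

```mathematica
(* dedupeString is a hypothetical name used only for this illustration *)
(* ToCharacterCode gives a list of integer codes; DeleteDuplicates keeps
   the first occurrence of each code and preserves the original order *)
dedupeString[s_String] :=
  FromCharacterCode[DeleteDuplicates[ToCharacterCode[s]]]

dedupeString["113233454766"]
(* "1324576" *)
```

Strictly speaking this still builds a list internally (of integers rather than one-character strings), so it sidesteps `Characters` but not lists altogether.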

One can see that ToCharacterCode uses roughly eight times the memory of the original String. (It converts the string to a packed array.) Characters is very wasteful of memory, using roughly 45 times as much as the String.
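
A comparison of this kind can be reproduced along the following lines. The test string here is an arbitrary placeholder, not the text used for the figures quoted above, and the exact ratios will likely depend on the Mathematica version and platform, so no expected output is shown.

```mathematica
(* Placeholder test string; any long string will do *)
str = StringJoin[ConstantArray["113233454766", 10000]];

ByteCount[str]                       (* the string itself *)
ByteCount[ToCharacterCode[str]]      (* list of integer character codes *)
ByteCount[Characters[str]]           (* list of one-character strings *)

(* Check that the character codes are stored as a packed array *)
Developer`PackedArrayQ[ToCharacterCode[str]]
```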

On the other hand, if I start from a fresh kernel, MaxMemoryUsed[] returns about 56MB. We can compare how much memory each method uses by evaluating MaxMemoryUsed[] after running it. For this test, I joined 100 copies of Beowulf. My method used 241MB and István's used 629MB. @belisarius' method had reached about 113MB when I aborted it, as it would have taken too long to run to completion.
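
The measurement procedure described above might look something like this when run in a fresh kernel. The input is a placeholder string rather than the Beowulf text, and the method being measured is just the character-code approach as a stand-in, so the numbers will not match those quoted.

```mathematica
(* Run in a fresh kernel so earlier work does not inflate the peak *)
baseline = MaxMemoryUsed[];

(* Placeholder input; the original test joined 100 copies of Beowulf *)
text = StringJoin[ConstantArray["113233454766", 10^5]];

(* Stand-in for the method under test *)
result = FromCharacterCode[DeleteDuplicates[ToCharacterCode[text]]];

(* Peak memory of the session so far, in bytes, and its growth *)
peak = MaxMemoryUsed[];
{baseline, peak, peak - baseline}
```

Because MaxMemoryUsed[] reports the peak for the whole session, each method needs its own fresh kernel for the figures to be comparable.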

When I tried this with Michael E2's test, it came in at 72 seconds, although I was not able to find the full character range. Had I found the full character range it would have been slower, but then again the character range of a typical text is not the same as that of Old English.
