Using Substring is the best option in combination with Convert.ToByte. See this answer for more information. If you need better performance, you must avoid Convert.ToByte before you can drop Substring.

Honestly - until it tears down performance dramatically, I would tend to ignore this and trust the Runtime and the GC to take care of it.
– Tomalak, Mar 6 '09 at 17:11

Because a byte is two nibbles, any hex string that validly represents a byte array must have an even character count. A 0 should not be added anywhere - to add one would be making an assumption about invalid data that is potentially dangerous. If anything, the StringToByteArray method should throw a FormatException if the hex string contains an odd number of characters.
– David Boike, Mar 9 '10 at 19:01

@00jt You must make an assumption that F == 0F. Either it is the same as 0F, or the input was clipped and F is actually the start of something you have not received. It is up to your context to make those assumptions, but I believe a general purpose function should reject odd characters as invalid instead of making that assumption for the calling code.
– David Boike, Jan 28 '13 at 15:35

@DavidBoike The question had NOTHING to do with "how to handle possibly clipped stream values". It's talking about a String. String myValue = 10.ToString("X"); myValue is "A", not "0A". Now go read that string back into bytes: oops, you broke it.
– 00jt, Jan 30 '13 at 19:25

Performance Analysis

Note: new leader as of 2014-07-31.

I ran each of the various conversion methods through some crude Stopwatch performance testing: one run with a random sentence (n=61, 1000 iterations) and one run with a Project Gutenberg text (n=1,238,957, 150 iterations). Here are the results, roughly from fastest to slowest. All measurements are in ticks (10,000 ticks = 1 ms) and all relative notes are compared to the [slowest] StringBuilder implementation. For the code used, see below or the test framework repo where I now maintain the code for running this.

Disclaimer

WARNING: Do not rely on these stats for anything concrete; they are simply a sample run of sample data. If you really need top-notch performance, please test these methods in an environment representative of your production needs with data representative of what you will use.

Lookup tables have taken the lead over byte manipulation. Basically, there is some form of precomputing what any given nibble or byte will be in hex. Then, as you rip through the data, you simply look up the next portion to see what hex string it would be. That value is then added to the resulting string output in some fashion. For a long time, byte manipulation, which some developers find harder to read, was the top-performing approach.
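To make the lookup-table idea concrete, here is a minimal sketch. It is not the exact code from any of the benchmarked answers; the class and method names (HexLookupSketch, ToHex) are made up for illustration, and it assumes uppercase output.

```csharp
using System;

public static class HexLookupSketch
{
    // Precomputed once: Table[b] holds the two hex characters for byte value b.
    private static readonly string[] Table = BuildTable();

    private static string[] BuildTable()
    {
        var table = new string[256];
        for (int i = 0; i < 256; i++)
            table[i] = i.ToString("X2"); // e.g. 10 -> "0A"
        return table;
    }

    public static string ToHex(byte[] bytes)
    {
        var chars = new char[bytes.Length * 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            string pair = Table[bytes[i]];   // one lookup per byte
            chars[2 * i] = pair[0];
            chars[2 * i + 1] = pair[1];
        }
        return new string(chars);
    }
}
```

The table costs 256 small strings up front; whether that trade is worth it depends on your data and memory constraints, so measure before adopting it.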

Your best bet is still going to be finding some representative data and trying it out in a production-like environment. If you have different memory constraints, you may prefer a method with fewer allocations to one that would be faster but consume more memory.

Testing Code

Feel free to play with the testing code I used. A version is included here but feel free to clone the repo and add your own methods. Please submit a pull request if you find anything interesting or want to help improve the testing framework it uses.

Add the new static method (Func<byte[], string>) to /Tests/ConvertByteArrayToHexString/Test.cs.

Add that method's name to the TestCandidates return value in that same class.

Make sure you are running the input version you want, sentence or text, by toggling the comments in GenerateTestInput in that same class.

Hit F5 and wait for the output (an HTML dump is also generated in the /bin folder).
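As a rough illustration of the steps above, a candidate only needs the Func<byte[], string> shape. The sketch below is not taken from the repo: CandidateSketch, ViaBitConverter and Time are hypothetical names, and the timing loop is only a crude stand-in for what the test framework does.

```csharp
using System;
using System.Diagnostics;

public static class CandidateSketch
{
    // A candidate matching Func<byte[], string>.
    public static string ViaBitConverter(byte[] bytes)
    {
        // BitConverter.ToString yields "0A-FF-..."; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Crude timing loop: returns elapsed time in 100 ns ticks (10,000 ticks = 1 ms).
    public static long Time(Func<byte[], string> candidate, byte[] input, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            candidate(input);
        watch.Stop();
        return watch.Elapsed.Ticks;
    }
}
```

A call such as CandidateSketch.Time(CandidateSketch.ViaBitConverter, data, 1000) gives one number per candidate; treat it with the same caution as the results above.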

Despite making the code available for you to do the very thing you requested on your own, I updated the testing code to include Waleed's answer. All grumpiness aside, it is much faster.
– patridge, Jan 13 '10 at 16:29

@CodesInChaos Done. And it won in my tests by quite a bit as well. I don't pretend to fully understand either of the top methods yet, but they are easily hidden from direct interaction.
– patridge, Jan 15 '13 at 18:01

This answer has no intention of answering the question of what is "natural" or commonplace. The goal is to give people some basic performance benchmarks since, when you need to do these conversions, you tend to do them a lot. If someone needs raw speed, they can just run the benchmarks with some appropriate test data in their desired computing environment. Then, tuck that method away into an extension method where you never look at its implementation again (e.g., bytes.ToHexStringAtLudicrousSpeed()).
– patridge, Apr 8 '13 at 20:37

1. bytes[i] >> 4 extracts the high nibble of a byte.
   bytes[i] & 0xF extracts the low nibble of a byte.

2. b - 10
   is < 0 for values b < 10, which will become a decimal digit;
   is >= 0 for values b >= 10, which will become a letter from A to F.

3. Using i >> 31 on a signed 32-bit integer extracts the sign, thanks to sign extension.
   It will be -1 for i < 0 and 0 for i >= 0.

4. Combining 2) and 3) shows that (b-10)>>31 will be 0 for letters and -1 for digits.

5. Looking at the case for letters, the last summand becomes 0, and b is in the range 10 to 15. We want to map it to A(65) to F(70), which implies adding 55 ('A'-10).

6. Looking at the case for digits, we want to adapt the last summand so it maps b from the range 0 to 9 to the range 0(48) to 9(57). This means it needs to become -7 ('0' - 55).
   Now we could just multiply by 7. But since -1 is represented by all bits being 1, we can instead use & -7, since (0 & -7) == 0 and (-1 & -7) == -7.

Some further considerations:

I didn't use a second loop variable to index into c, since measurement shows that calculating it from i is cheaper.

Using exactly i < bytes.Length as upper bound of the loop allows the JITter to eliminate bounds checks on bytes[i], so I chose that variant.
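Putting the explanation together, a sketch of the byte-manipulation conversion could look like the following. The expression 55 + b + (((b - 10) >> 31) & -7) follows directly from the steps above; the class and method names are placeholders chosen for this example.

```csharp
using System;

public static class HexBitSketch
{
    public static string ToHex(byte[] bytes)
    {
        char[] c = new char[bytes.Length * 2];
        int b;
        // i < bytes.Length as the bound; the index into c is computed from i.
        for (int i = 0; i < bytes.Length; i++)
        {
            b = bytes[i] >> 4;   // high nibble
            // (b - 10) >> 31 is -1 for digits and 0 for letters,
            // so the last summand is -7 for digits and 0 for letters.
            c[i * 2] = (char)(55 + b + (((b - 10) >> 31) & -7));
            b = bytes[i] & 0xF;  // low nibble
            c[i * 2 + 1] = (char)(55 + b + (((b - 10) >> 31) & -7));
        }
        return new string(c);
    }
}
```

For b = 9 the expression gives 55 + 9 - 7 = 57 ('9'); for b = 10 it gives 55 + 10 + 0 = 65 ('A').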

The accepted answer provides 2 excellent HexToByteArray methods, which represent the other half of the question. Waleed's solution answers the running question of how to do this without creating a huge number of strings in the process.
– Brendten Eickstaedt, Oct 10 '12 at 16:08

Comparison

Note

During decoding, IOException and IndexOutOfRangeException could occur (if a character has too high a value, > 256). Methods for de/encoding streams or arrays should be implemented; this is just a proof of concept.

This is a great post. I like Waleed's solution. I haven't run it through patridge's test but it seems to be quite fast. I also needed the reverse process, converting a hex string to a byte array, so I wrote it as a reversal of Waleed's solution. Not sure if it's any faster than Tomalak's original solution. Again, I did not run the reverse process through patridge's test either.

Not to pile on to the many answers here, but I found a fairly optimal (~4.5x better than accepted), straightforward implementation of the hex string parser. First, output from my tests (first batch is my impl.):

BTW
For benchmark testing, initializing the alphabet every time the convert function is called is wrong; the alphabet must be const (for string) or static readonly (for char[]). Then alphabet-based conversion of byte[] to string becomes as fast as the byte manipulation versions.

And of course the test must be compiled in Release (with optimization) and with the debug option "Suppress JIT optimization" turned off (same for "Enable Just My Code" if the code must be debuggable).
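A minimal sketch of the alphabet point, assuming an uppercase alphabet; HexAlphabetSketch and its members are illustrative names, not code from the benchmark.

```csharp
using System;

public static class HexAlphabetSketch
{
    // Declared once as a constant rather than rebuilt on every call.
    private const string Alphabet = "0123456789ABCDEF";

    public static string ToHex(byte[] bytes)
    {
        var chars = new char[bytes.Length * 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[2 * i] = Alphabet[bytes[i] >> 4];      // high nibble
            chars[2 * i + 1] = Alphabet[bytes[i] & 0xF]; // low nibble
        }
        return new string(chars);
    }
}
```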

I did not get the code you suggested to work, Olipro. hex[i] + hex[i+1] apparently returned an int.

I did, however, have some success by taking some hints from Waleed's code and hammering this together. It's ugly as hell, but it seems to work and, depending on input size, runs in about 1/3 of the time of the others according to my tests (using patridge's testing mechanism). Switching around the ?:s to separate out 0-9 first would probably yield a slightly faster result since there are more numbers than letters.
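The poster's code is not reproduced here. As a stand-in, the sketch below shows the general shape of a ?:-based decoder with the 0-9 test first, as suggested above. The names are hypothetical, the odd-length check is my own assumption, and the throw expressions need C# 7 or later. It also sidesteps the char + char problem: in C#, adding two chars yields an int, not a string.

```csharp
using System;

public static class HexDecodeSketch
{
    // Maps one hex character to its 4-bit value; digits are tested first.
    private static int Nibble(char c)
    {
        return c >= '0' && c <= '9' ? c - '0'
             : c >= 'A' && c <= 'F' ? c - 'A' + 10
             : c >= 'a' && c <= 'f' ? c - 'a' + 10
             : throw new FormatException("Invalid hex character: " + c);
    }

    public static byte[] FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even length.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = (byte)((Nibble(hex[2 * i]) << 4) | Nibble(hex[2 * i + 1]));
        return bytes;
    }
}
```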

I'll enter this bit fiddling competition as I have an answer that also uses bit-fiddling to decode hexadecimals. Note that using character arrays may be even faster as calling StringBuilder methods will take time as well.

For performance I would go with drphrozen's solution. A tiny optimization for the decoder could be to use a separate table for each of the two chars to get rid of the "<< 4".
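One way to read the two-table suggestion is sketched below: the table for the first character stores values already shifted by 4, so the loop only needs an OR. This is my interpretation, not drphrozen's code; the names and the use of -1 as an "invalid" marker are assumptions.

```csharp
using System;

public static class HexTwoTableSketch
{
    private static readonly int[] Low = Build(0);   // value of the second character
    private static readonly int[] High = Build(4);  // value of the first character, pre-shifted

    private static int[] Build(int shift)
    {
        var table = new int[128];
        for (int i = 0; i < table.Length; i++)
            table[i] = -1; // marks characters that are not hex digits

        const string digits = "0123456789ABCDEF";
        for (int d = 0; d < 16; d++)
        {
            table[digits[d]] = d << shift;
            table[char.ToLowerInvariant(digits[d])] = d << shift;
        }
        return table;
    }

    public static byte[] FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even length.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            char hi = hex[2 * i], lo = hex[2 * i + 1];
            if (hi >= 128 || lo >= 128)
                throw new FormatException("Invalid hex character.");

            int value = High[hi] | Low[lo]; // -1 from either table makes the result negative
            if (value < 0)
                throw new FormatException("Invalid hex character.");
            bytes[i] = (byte)value;
        }
        return bytes;
    }
}
```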

Clearly the two method calls are costly. If some kind of check is made either on input or output data (could be CRC, checksum or whatever) the if (b == 255)... could be skipped and thereby also the method calls altogether.

Using offset++ and offset instead of offset and offset + 1 might give some theoretical benefit but I suspect the compiler handles this better than me.

I'll make the case that this edit is wrong, shouldn't have been approved, and should be reverted. Along the way, you might learn a thing or two about some internals, and see yet another example of what premature optimization really is and how it can bite you.

tl;dr: Just use Convert.ToByte and String.Substring if you're in a hurry ("Original code" below); it's the best combination if you don't want to re-implement Convert.ToByte. Use something more advanced (see other answers) that doesn't use Convert.ToByte if you need performance. Do not use anything other than String.Substring in combination with Convert.ToByte, unless someone has something interesting to say about this in the comments of this answer.
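For reference, the Convert.ToByte plus String.Substring combination can be sketched as follows. The wrapper class name is made up for this example; the body is the straightforward loop the tl;dr describes, with no validation added.

```csharp
using System;

public static class SubstringSketch
{
    public static byte[] StringToByteArray(string hex)
    {
        int numberChars = hex.Length;
        byte[] bytes = new byte[numberChars / 2];
        for (int i = 0; i < numberChars; i += 2)
        {
            // Substring allocates a two-character string; Convert.ToByte parses it as base 16.
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        }
        return bytes;
    }
}
```

With an odd-length input the final Substring(i, 2) call throws ArgumentOutOfRangeException, so callers that may receive such input should check the length first.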

warning: This answer may become obsolete if a Convert.ToByte(char[], Int32) overload is implemented in the framework. This is unlikely to happen soon.

As a general rule, I don't much like to say "don't optimize prematurely", because nobody knows when "premature" is. The only thing you must consider when deciding whether to optimize or not is: "Do I have the time and resources to investigate optimization approaches properly?". If you don't, then it's too soon; wait until your project is more mature or until you need the performance (if there is a real need, then you will make the time). In the meantime, do the simplest thing that could possibly work instead.

It does allocate a new string, but you need to allocate one to pass to Convert.ToByte anyway. Ironically, the solution provided in the revision allocates yet another object on every iteration (the two-char array); you can safely put that allocation outside the loop and reuse the array to avoid that.

What you're left with is a string reader whose only added "value" is a parallel index (internal _pos) which you could have declared yourself (as j for example), a redundant length variable (internal _length), and a redundant reference to the input string (internal _s). In other words, it's useless.

If you wonder how Read "reads", just look at the code, all it does is call String.CopyTo on the input string. The rest is just book-keeping overhead to maintain values we don't need.

What does the solution look like now? Exactly like it was at the beginning, only instead of using String.Substring to allocate the string and copy the data to it, you're using an intermediary array to which you copy the hexadecimal numerals, then allocating the string yourself and copying the data again from the array into the string (when you pass it to the string constructor). The second copy might be optimized out if the string is already in the intern pool, but then String.Substring will also be able to avoid it in these cases.

In fact, if you look at String.Substring again, you see that it uses some low-level internal knowledge of how strings are constructed to allocate the string faster than you could normally do it, and it inlines the same code used by CopyTo directly in there to avoid the call overhead.
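To show what the manual approach boils down to once the reader is stripped away, here is a sketch with the two-char array hoisted out of the loop. It is a reconstruction from the description above, not the code from the revision, and the names are hypothetical.

```csharp
using System;

public static class ManualCopySketch
{
    public static byte[] FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        var pair = new char[2];              // allocated once, reused on every iteration
        for (int i = 0; i < bytes.Length; i++)
        {
            hex.CopyTo(2 * i, pair, 0, 2);   // first copy: string -> array
            string s = new string(pair);     // allocation plus second copy: array -> string
            bytes[i] = Convert.ToByte(s, 16);
        }
        return bytes;
    }
}
```

Each iteration still allocates a string for Convert.ToByte, so nothing is saved compared with calling hex.Substring(2 * i, 2) directly. Note that this sketch silently ignores a trailing odd character.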

String.Substring

Worst-case: One fast allocation, one fast copy.

Best-case: No allocation, no copy.

Manual method

Worst-case: Two normal allocations, one normal copy, one fast copy.

Best-case: One normal allocation, one normal copy.

Conclusion? If you want to use Convert.ToByte(String, Int32) (because you don't want to re-implement that functionality yourself), there doesn't seem to be a way to beat String.Substring; all you do is run in circles, re-inventing the wheel (only with sub-optimal materials).

Note that using Convert.ToByte and String.Substring is a perfectly valid choice if you don't need extreme performance. Remember: only opt for an alternative if you have the time and resources to investigate how it works properly.

If there were a Convert.ToByte(char[], Int32), things would be different of course (it would be possible to do what I described above and completely avoid String).

I suspect that people who report better performance by "avoiding String.Substring" also avoid Convert.ToByte(String, Int32), which you should really be doing if you need the performance anyway. Look at the countless other answers to discover all the different approaches to do that.

Disclaimer: I haven't decompiled the latest version of the framework to verify that the reference source is up-to-date; I assume it is.

Now, it all sounds good and logical, hopefully even obvious if you've managed to get so far. But is it true?