Machine code binary buffer searching regardless of NULL

No, I didn't do any work; I just knew that \0 is the escape for the null character and that a PCRE pattern can match regardless of what precedes it. The problem you discovered with FileRead has, as you noticed, nothing to do with RegEx but with how AutoHotkey stores data.
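As a rough illustration of matching at and past null bytes, here is a minimal sketch in Python. It uses Python's `re` module, not the PCRE engine behind RegExMatch, so it only demonstrates the general idea; the buffer contents and variable names are made up for the example.

```python
import re

# Hypothetical buffer: text fragments separated by embedded NUL bytes.
haystack = b"head\x00\x00mid\x00tail42"

# \x00 in the pattern matches a literal NUL byte, and matching continues past it.
m = re.search(rb"\x00tail(\d+)", haystack)
nul_offset = m.start()   # offset of the NUL that precedes "tail"
digits = m.group(1)      # the digits captured after "tail"

# '.' consumes NUL bytes like any other byte; DOTALL additionally lets it
# match 0x0A, which binary data may contain.
span = re.search(rb"head.{2}mid", haystack, re.DOTALL).span()

print(nul_offset, digits, span)
```

The point is only that the regex engine itself has no trouble with nulls; whether the host language hands the whole buffer to the engine is a separate question.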

Your example indicates that in the search string we have to replace each special character ([]()"\'.*?<>^$|null…) with a hex (or octal, as in your example) escape sequence (or precede them with ""), which is not a very elegant solution.

You misunderstood. I used \x61 as a control in the benchmark test so you could easily replace it with any other hex value and observe an identical result. In the previous script I had written expressions that used '\d' and '.' to similar effect; any valid regex pattern can be used. Again, it doesn't hurt to try it yourself before making such presumptions.

So try to search for a 4MB binary needle inside a 60MB binary buffer (or bigger, e.g. a memory block produced by CreateFileMapping, say 1GB, is that enough :-D) using RegExMatch and compare that to InBuf, hehe

First, how do you think you will read a 1GB file? Using a chunked read loop? Nah, that's bad! Is there any way to feed a 1GB mapped memory region to RegExMatch? If not, then there is not much sense in comparing, IMO, since CreateFileMapping + MapViewOfFile + InBuf would be the most suitable and *elegant* solution, not requiring pre-escaping of a large binary buffer. BTW, it would take a long time to convert a large binary buffer to a format RegExMatch accepts using pure AHK, wouldn't it :-) ?
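To make the mapped-file approach concrete, here is a minimal sketch in Python rather than AHK. It stands in for the CreateFileMapping + MapViewOfFile + InBuf combination using Python's `mmap` module; `find_in_file` is a hypothetical helper and the demo file is generated on the spot, so none of this is the actual InBuf code.

```python
import mmap
import os
import tempfile

def find_in_file(path, needle):
    """Return the offset of needle in the file at path, or -1 if absent.

    The file is memory-mapped read-only, so it is searched in place
    instead of being read chunk by chunk into a variable.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view.find(needle)

# Demo with a small generated file; the needle is raw bytes, used unescaped.
needle = bytes([0x00, 0x28, 0xFF, 0x5C])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + needle + b"\x00" * 16)
    demo_path = tmp.name

offset = find_in_file(demo_path, needle)
os.remove(demo_path)
print(offset)
```

The needle contains a null, a parenthesis and a backslash, yet needs no preprocessing, which is the convenience being claimed for a raw buffer search.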

Different algorithms are suited for specific situations. Even if your function is specialized to parse extremely large sets of data it lacks the features of regular expressions. Conversely, regex was designed for manipulating complex strings and requires additional steps to read large variables.

I don't mean to disparage your work; it's certainly most impressive to see assembly used in AutoHotkey. I just find that regex can satisfy most requirements, including searching for and past null characters.

Your example indicates that in the search string we have to replace each special character ([]()"\'.*?<>^$|null…) with a hex (or octal, as in your example) escape sequence (or precede them with ""), which is not a very elegant solution.

You misunderstood. I used \x61 as a control in the benchmark test so you could easily replace it with any other hex value and observe an identical result. In the previous script I had written expressions that used '\d' and '.' to similar effect; any valid regex pattern can be used. Again, it doesn't hurt to try it yourself before making such presumptions.

I did try it myself. I was speaking about the scenario where you search for a binary pattern, like a piece of code in a program, an embedded MD5 signature, or descriptors of images or videos. These patterns are normally taken from other files. You cannot use those patterns directly as the search string, can you? You have to "replace each special character" and convert the non-printables to escape sequences; that is, you first have to process the search string. If I misunderstood, and you can use a binary file or buffer directly as the search string in another buffer, please tell me the trick. I tried the following:

I got the following message:

---------------------------
test.ahk
---------------------------
1006 : Compile error 14 at offset 4: missing )
---------------------------
OK
---------------------------

It shows that some 4-byte binary search strings lead to errors.
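The failure mode and the preprocessing step under discussion can be sketched as follows. This is an illustration in Python, not the AHK script from the post: it uses Python's `re` module instead of PCRE, and the 4-byte needle is a made-up example that happens to contain 0x28, the byte for "(".

```python
import re

# Hypothetical 4-byte binary needle; 0x28 is '(' and opens a group in a regex.
needle = bytes([0x12, 0x28, 0xAB, 0x00])
haystack = b"\x00\x01" + needle + b"\xff"

# Using the raw bytes directly as a pattern fails to compile.
try:
    re.compile(needle)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)

# Option 1: let the library escape the regex metacharacters.
escaped = re.escape(needle)
match = re.search(escaped, haystack)

# Option 2: turn every byte into a \xHH escape, as discussed in the thread.
hex_pattern = b"".join(b"\\x%02x" % b for b in needle)
hex_match = re.search(hex_pattern, haystack)

print(raw_error)
print(match.start(), hex_match.start())
```

Either way the needle has to be processed once before it can serve as a pattern. The cost of that step grows with the needle size, not with the size of the buffer being searched.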

If the search string contains \0, other tricks are needed in AHK (as I noted earlier): VarSetCapacity and a DllCall to RtlMoveMemory, or NumPut calls, to get the desired StrLen. You can make a wrapper function, but it is still ugly.

That is only true if your program is poorly designed, without fault tolerance or control. That is often a cause of bugs and security holes, which AutoHotkey's uniquely simple syntax aims to prevent in the first place. It's ironic that you find the need to replace a single \E so mundane and 'ugly', knowing the overheads of a machine code function.

I never expected that you would be so adamant about suppressing alternative techniques. Raw buffer searching and regex have their trade-offs, and each is suited to different applications.