If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

C-Like Strings and Arrays - a PHP Emulation

C doesn't have strings - it has arrays of type char: "foo" is actually shorthand for {'f', 'o', 'o', '\0x00'}
C doesn't have arrays - it has pointers into allocated regions of memory: foo[4] is actually shorthand for *(foo+4)

PHP doesn't have pointers into allocated regions of memory - it has arrays and strings.

They don't work the same way.

But if you're porting a C implementation of some algorithm, as an intermediate step on the way to translating it into PHP, it would be nice to emulate the mechanics of the former using those of the latter.

Sometimes C's strings and arrays can be replaced by PHP's strings and arrays without too much grief; but as soon as the C program starts using pointer manipulations - which is usually quite quickly - it's not so simple, especially once they start being passed from function to function. Sometimes passing them by reference is enough, but all too often it's not.

We can model C's "allocated region of memory" as a PHP array; and we can model a pointer into that region as an array index. When we pass a "pointer", we need to pass both the array index, and the array the index indexes. And we have to pass the array as a reference, because any element within it is supposed to represent a specific location in memory, and all pointers to that location should see the same value, even if it changes.

So here's the trick. A C pointer is emulated in PHP by an array of two elements; the zeroth element is a reference to an array, and the first is an integer that indexes that array.

Allocating memory becomes creating an array of suitable size, and then putting it into an array of the above type; the index is initialised to zero.

(Attentive readers will note that this is more like calloc, since it initialises the contents of the array to 0 - when we ask PHP for an array, it has to use something for the elements if they're to exist).
Getting an element of the array:

Code:

a = foo[4]; b = *foo;

becomes

PHP Code:

$a = $foo[0][$foo[1] + 4];
$b = $foo[0][$foo[1]];

And that's probably the big drawback to this scheme.
Advancing the array pointer:

Code:

foo += 3;

is just

PHP Code:

$foo[1] += 3;

Copying a pointer

Code:

c = foo;

is just

PHP Code:

$c = $foo;

Copying an offset pointer is probably best done by copying the pointer and then offsetting the copy.

Code:

d = foo + 3;

=>

PHP Code:

$d = $foo; $d[1] += 3;

. The alternative is somewhat messier and basically involves taking the emulated pointer apart and putting it back together again:

PHP Code:

$d = array(&$foo[0], $foo[1] + 3);

This is necessary so that the new pointer continues to refer to the same allocated memory - that the first element is a reference, the second is a value.
For strings, a couple of utility functions are generally called for, especially with regard to PHP's interpreting the integer 0 as the one-character string '0' instead of an ASCII NUL.

C doesn't have strings - it has arrays of type char: "foo" is actually shorthand for {'f', 'o', 'o', '\0x00'}

False. C does not have a string type in the core language or standard library, but C does indeed have strings, where "a string is a contiguous sequence of characters terminated by and including the first null character".

Originally Posted by Weedpacket

C doesn't have arrays - it has pointers into allocated regions of memory: foo[4] is actually shorthand for *(foo+4)

False. C does have arrays. An array is not a pointer, although in most contexts an array is converted to a pointer to its first element.

That said, these would not be important if you are translating C code to PHP since the former is a matter of definition, and if the C code suffers from the pitfalls of failing to understand that arrays are not pointers, then the C program would likely be incorrect to begin with.

C does indeed have strings, where "a string is a contiguous sequence of characters terminated by and including the first null character".

Well, that's pretty much what I said (and you quoted); strings in C are syntactic sugar. The only difference is whether you choose to describe it in English as "a contiguous sequence of characters terminated by and including the first null character" or illustrate it by example in C as {'f', 'o', 'o', \0x00}.

Originally Posted by laserlight

C does have arrays. An array is not a pointer, although in most contexts an array is converted to a pointer to its first element.

True, but the exceptions don't apply for the emulation, since they mainly result from considerations of the byte size of the array's base type - something which in PHP is neither relevant to the size of the array nor well-defined anyway (since PHP's arrays don't have a "base type"). So that makes C's sizeof operator useless here (something I hinted at in my mention of malloc) - even more useless is the C idiom sizeof(arr)/sizeof(arr[0]) for finding out how many elements an array has. (Incidentally, that's not the only housekeeping issue that drops out during the conversion; although you can implement free($foo) as $foo[0]=null;, there isn't really much point in the long run, and you might just as well set $foo=null directly (yep, a null pointer is best implemented as a null) and let the runtime scrub up iff it was the last live pointer into its chunk of memory.)

The downside is that you can't declare an array of long and then turn around and treat it as an array of int; in PHP you have to code that conversion directly. (Of course, if you do that in C anyway you're either (a) looking for a smack, or (b) also using it as an environment test.)

Similarly, the & operator is also useless because in PHP we again don't actually have that information; and as for string literals, that case is one I addressed explicitly. Otherwise an array is as I described, and a variable of type "array of int" is indeed a "pointer to int". Equivalent to a pointer to int, if you insist, but the whole point of emulation is to construct something (functionally) equivalent.

Thank you!

I signed up just to say "Thank you!" to Weedpacket for posting this slick bit of code. It was exactly what I needed. I'm linking some keywords in a block of text, but I only want to link them if they aren't already in a link. This method works perfectly.

Alternate sort method

Just a quick note...

PHP's custom sort functions take a comparator function as callback (of two given elements, decide which is larger). Sometimes though it's easier to give each array element a score and then order by that (it's often somewhat faster too, since sorting with a comparator function means that each element basically gets examined twice - this way it only needs to be scored once).

Consider it a refinement of my earlier "Mutant Schwartzian transforms".

Repeatable concurrent RNGs

I needed to have several independent (pseudo-)random number generators, where "independent" meant that the numbers yielded by one generator weren't affected by calls to any of the other generators (as would happen if they all ran off PHP's internal RNG).

I didn't need Mersenne Twister levels of randomness for my purpose, so just picked the first algorithm to hand; specifically, the RNG that appears in the GNU Scientific Library under the name of "zuf".

The generator (actually, the generator generator) is implemented as a single function. As usual, examples of its use follow.

Stable sorting

Like most real-world libraries, PHP uses Quicksort for its sorting algorithm. Done right, this is a nice and efficient algorithm for general-purpose use (in specific circumstances other algorithms are better choices).

If Quicksort has one big drawback, it's the fact that it's not stable. If two elements in the input array are considered "equal" for the purposes of comparison, the order in which they appear in the output is undefined. This can be a drawback, because it interferes with constructing complex sort criteria.

When I was faced with this problem I started out implementing Mergesort (which is stable and its worst-case performance is comparable to Quicksort's average-case performance), but then I realised I could stabilise Quicksort instead. I wrote this as a drop-in replacement for usort; comparable "ssort", "srsort", "sasort", etc. can be written using the same model.