Parameters

str

The string to extract the substring from.

start

If start is non-negative, the returned string
will start at position start in
str, counting from zero. For instance,
in the string 'abcdef', the character at
position 0 is 'a', the
character at position 2 is
'c', and so forth.

If start is negative, the returned string
will start at the start'th character
from the end of str.
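As a short illustration of the two cases above, here is a minimal sketch. It assumes the mbstring extension is loaded and relies on the default internal encoding; the variable names are only for illustration.

```php
<?php
$str = 'abcdef';

// Non-negative start: count from the beginning, starting at zero.
$fromThird = mb_substr($str, 2);

// Negative start: count back from the end of the string.
$lastTwo = mb_substr($str, -2);

echo $fromThird, "\n"; // cdef
echo $lastTwo, "\n";   // ef
```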

length

Maximum number of characters to use from str. If
omitted or NULL is passed, extract all characters to
the end of the string.
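The following sketch shows how length acts as an upper bound rather than an exact count. It assumes PHP 5.4.8 or later for the NULL case; the variable names are illustrative.

```php
<?php
$str = 'abcdef';

// Take at most three characters starting at position 1.
$middle = mb_substr($str, 1, 3);

// Asking for more characters than remain simply stops at the end.
$capped = mb_substr($str, 4, 10);

// NULL (or omitting the argument) extracts everything up to the end.
$rest = mb_substr($str, 3, null);

echo $middle, "\n"; // bcd
echo $capped, "\n"; // ef
echo $rest, "\n";   // def
```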

encoding

The encoding
parameter is the character encoding. If it is omitted, the internal character
encoding value will be used.
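To see why the encoding matters, the sketch below compares mb_substr() with the byte-oriented substr() on a UTF-8 string. The sample string is an assumption chosen for illustration; it is written with byte escapes so that the example does not depend on the encoding of the source file.

```php
<?php
// "héllo" in UTF-8: the character é occupies two bytes (0xC3 0xA9).
$str = "h\xC3\xA9llo";

// Character-aware: returns the first two characters, "hé".
$chars = mb_substr($str, 0, 2, 'UTF-8');

// Byte-oriented: returns the first two bytes, cutting é in half.
$bytes = substr($str, 0, 2);

var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
```

Passing the encoding explicitly, as done here, avoids depending on whatever internal encoding happens to be configured.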

Return Values

mb_substr() returns the portion of
str specified by the
start and
length parameters.

Changelog

Version    Description
5.4.8      Passing NULL as length extracts all characters to the end of the string. Prior to this version, NULL was treated the same as 0.

See Also

User Contributed Notes

Before PHP 5.4.8, passing null as length does not make mb_substr() use its default; instead, null is interpreted as 0.
<?php
mb_substr($str, $start, null, $encoding); // Returns '' (empty string), just like substr()
?>
Instead use:
<?php
mb_substr($str, $start, mb_strlen($str, $encoding), $encoding);
?>

As you often need to iterate over the UTF-8 characters in a string, you might be tempted to use mb_substr($text,$i,1).

The problem is that there is no "magic" way to find the $i-th character inside a UTF-8 string other than reading it byte by byte from the beginning. Thus a loop that calls mb_substr($text,$i,1) for every $i from 0 to N-1 takes much longer than expected: the larger $i gets, the longer the search for the $i-th letter. As characters are 1 to 4 bytes long (up to 6 in the original UTF-8 definition), one can convince oneself that the execution time of such a loop is actually Theta(N^2), which can be really slow even for moderately long texts.

One way to work around it is to first split your text into an array of letters using some smart preprocessing, and only then iterate over the array.

Here is the idea:
<?php
class Strings
{
    // Number of characters (not bytes) in a UTF-8 string.
    public static function len($a)
    {
        return mb_strlen($a, 'UTF-8');
    }

    // Single character at position $i.
    public static function charAt($a, $i)
    {
        return self::substr($a, $i, 1);
    }

    // mb_substr() wrapper that treats a null length as "to the end".
    public static function substr($a, $x, $y = null)
    {
        if ($y === null) {
            $y = self::len($a);
        }
        return mb_substr($a, $x, $y, 'UTF-8');
    }

    // Split the string into an array of characters by recursive halving.
    public static function letters($a)
    {
        $len = self::len($a);
        if ($len == 0) {
            return array();
        } else if ($len == 1) {
            return array($a);
        } else {
            // array_merge() replaces the Arrays::concat() helper, which is not defined here.
            return array_merge(
                self::letters(self::substr($a, 0, $len >> 1)),
                self::letters(self::substr($a, $len >> 1))
            );
        }
    }
}
?>
As you can see, Strings::letters($text) splits the text recursively into two parts. Each level of the recursion requires time linear in the length of the string, and there is a logarithmic number of levels, so the total runtime is O(N log N), which is still more than the theoretically optimal O(N), but sadly this is the best idea I've got.
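A possible alternative, not part of the note above, is to let the PCRE engine do the splitting: with the u modifier, an empty pattern matches between every pair of code points, so the string should be split in a single pass. The function name utf8_letters below is hypothetical, and the sketch assumes valid UTF-8 input.

```php
<?php
// Split a UTF-8 string into an array of single characters (code points).
// utf8_letters is a hypothetical helper name used only for this sketch.
function utf8_letters($text)
{
    // The empty pattern with /u matches at every code point boundary;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
    $letters = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($letters === false) {
        // preg_split() fails when the subject is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $letters;
}

// "naïve" written with byte escapes; ï is the two bytes 0xC3 0xAF.
$letters = utf8_letters("na\xC3\xAFve");
foreach ($letters as $i => $letter) {
    echo $i, ': ', $letter, "\n";
}
```

On PHP 7.4 and later, mb_str_split($text, 1, 'UTF-8') gives the same kind of array without a regular expression. Both approaches split by code point, so a character built from several combining code points ends up in more than one array element.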

I'm trying to capitalize only the first character of the string and tried some of the examples above but they didn't work. It seems mb_substr() cannot calculate the length of the string in multi-byte encoding (UTF-8) and it should be set explicitly. Here is the corrected version:
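A minimal sketch of the approach this note describes, with the length and the encoding passed explicitly, might look like the following. It is an illustration only, not the note author's code; the function name ucfirst_utf8 is hypothetical, and UTF-8 is assumed as the default encoding.

```php
<?php
// Upper-case only the first character of a multi-byte string.
// ucfirst_utf8 is a hypothetical name chosen for this sketch.
function ucfirst_utf8($str, $encoding = 'UTF-8')
{
    $first = mb_substr($str, 0, 1, $encoding);
    // Pass an explicit length instead of null so that the call also
    // behaves as intended on PHP versions before 5.4.8.
    $rest = mb_substr($str, 1, mb_strlen($str, $encoding), $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

// "école" written with byte escapes; é is the two bytes 0xC3 0xA9.
$result = ucfirst_utf8("\xC3\xA9cole");
echo $result, "\n";
```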

A serious pitfall when using mb_substr() with the HTML-ENTITIES encoding is that the function performs a number of conversions before returning the value, the worst one being that HTML special characters are not just counted but decoded.