User Contributed Notes 70 notes

Since I feel the documentation here is rather vague and unhelpful, I thought I'd write a post detailing the mechanics of the glob regex.

glob uses two special symbols that act as a sort of blend between a metacharacter and a quantifier: * and ?.

The ? matches 1 of any character except a /
The * matches 0 or more of any character except a /

If it helps, think of * as roughly the PCRE equivalent of .* and ? as the PCRE equivalent of the dot (.), except that neither of them matches a /.

Note: * and ? work independently of the previous character. For instance, if you run glob("a*.php") on the following list of files, every file starting with an 'a' will be returned, with the * matching whatever comes between the 'a' and '.php':

It does not match only a.php and aa.php as a 'normal' regex would, because * matches 0 or more of any character, not 0 or more of the character/class/group before it.

Executing glob("a?.php") on the same list of files will only return aa.php and ab.php because as mentioned, the ? is the equivalent of pcre's dot, and is NOT the same as pcre's ?, which would match 0 or 1 of the previous character.

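glob's regex also supports character classes and negated character classes, using the syntax [] and [^]. [] matches any one character inside the brackets, and [^] matches any one character that is not inside them. (The POSIX spelling of a negated class is [!...]; glibc also accepts [^...], but [!...] is the more portable choice.)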

glob("[^ab]*.php") on the same list of files will return nothing, because the character class fails to match the first character of any of them.

You can also use ranges of characters inside a character class by putting a hyphen between a starting and an ending character. For example, [a-z] will match any letter from a to z, [0-9] will match any single digit, and so on.
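To see *, ? and a negated class side by side, here is a small sketch that builds a scratch directory and runs the patterns from this note against it. It assumes a writable sys_get_temp_dir() and a glibc-style glob() (typical Linux builds); the file list is the one used further down in this note, and the directory name and matching_names() helper are made up for the example.

```php
<?php
// Scratch directory holding the example file names (directory name is made up).
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['a.php', 'aa.php', 'aaa.php', 'ab.php', 'abc.php', 'b.php', 'bc.php'] as $name) {
    touch("$dir/$name");
}

// Hypothetical helper: return only the base names that $pattern matches in $dir.
function matching_names(string $dir, string $pattern): array
{
    // glob() can return false instead of an array; treat that as "no matches".
    return array_map('basename', glob("$dir/$pattern") ?: []);
}

$star     = matching_names($dir, 'a*.php');     // * = any run of characters, even none
$question = matching_names($dir, 'a?.php');     // ? = exactly one character
$negated  = matching_names($dir, '[^ab]*.php'); // first character must be neither a nor b

print_r($star);     // the five names starting with 'a'
print_r($question); // aa.php, ab.php
print_r($negated);  // empty: every name starts with a or b
```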

glob also supports limited alternation with {n1,n2,...}. You have to pass GLOB_BRACE as the second argument to glob() for it to work. So, for example, if you executed glob("{a,b,c}.php", GLOB_BRACE) on the following list of files:

a.php
b.php
c.php

all three of them would be returned. Note: using alternation with single characters like that is the same as just doing glob("[abc].php"). A more interesting example would be glob("te{xt,nse}.php", GLOB_BRACE) on:

tent.php
text.php
test.php
tense.php

text.php and tense.php would be returned from that glob.
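The te{xt,nse}.php example can be reproduced with a scratch directory. This sketch assumes GLOB_BRACE is defined (it is missing on some non-GNU C libraries) and throws if it is not; the directory name is made up.

```php
<?php
// GLOB_BRACE is not defined everywhere, so fail loudly if it is missing.
if (!defined('GLOB_BRACE')) {
    throw new RuntimeException('GLOB_BRACE is not available on this platform');
}

$dir = sys_get_temp_dir() . '/glob_brace_' . uniqid();
mkdir($dir);
foreach (['tent.php', 'text.php', 'test.php', 'tense.php'] as $name) {
    touch("$dir/$name");
}

// te{xt,nse}.php expands to two alternatives: text.php and tense.php.
$found = array_map('basename', glob("$dir/te{xt,nse}.php", GLOB_BRACE) ?: []);
print_r($found); // text.php and tense.php; tent.php and test.php are not matched
```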

glob's regex does not offer any kind of quantification of a specified character or character class or alternation. For instance, if you have the following files:

a.php
aa.php
aaa.php
ab.php
abc.php
b.php
bc.php

with pcre regex you can do ~^a+\.php$~ to return

a.php
aa.php
aaa.php

This is not possible with glob. If you are trying to do something like this, you can first narrow the list down with glob and then get exact matches with a full-featured regex engine. For example, if you wanted only the PHP files in the previous list whose names consist of one or more 'a' characters, you can do this:
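A minimal sketch of that two-step approach: glob() does the coarse filtering, then preg_grep() applies the exact PCRE pattern from above. The scratch directory only exists to make the example self-contained; its name is made up.

```php
<?php
$dir = sys_get_temp_dir() . '/glob_regex_' . uniqid();
mkdir($dir);
foreach (['a.php', 'aa.php', 'aaa.php', 'ab.php', 'abc.php', 'b.php', 'bc.php'] as $name) {
    touch("$dir/$name");
}

// Step 1: let glob() narrow the candidates to names starting with "a".
$candidates = array_map('basename', glob("$dir/a*.php") ?: []);

// Step 2: keep only names made of one or more "a" followed by ".php".
// preg_grep() preserves keys, so array_values() reindexes the result.
$onlyAs = array_values(preg_grep('~^a+\.php$~', $candidates));
sort($onlyAs); // predictable order regardless of how glob() sorted

print_r($onlyAs); // a.php, aa.php, aaa.php
```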

Do note that this command has a security hole. If the $path or $pattern given includes special characters, for example a command substitution such as "`rm index.php`", the shell will process it and execute that command.

You can fix the problem either by escaping the arguments properly (use escapeshellarg()) or by writing a function that calls glob(), or opendir()/readdir()/closedir(), recursively.
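A minimal sketch of the glob()-only option, assuming you just need the matching paths back. glob_recursive() is a hypothetical name; hidden (dot) entries are skipped because glob('*') skips them, and symlinked directories are followed without any loop protection.

```php
<?php
/**
 * Hypothetical helper: collect paths matching $pattern in $dir and in all of
 * its subdirectories, using only glob() (no shell command involved).
 */
function glob_recursive(string $dir, string $pattern): array
{
    // Matches in the current directory (glob() may return false: use []).
    $files = glob($dir . '/' . $pattern) ?: [];

    // Recurse into every subdirectory; GLOB_ONLYDIR limits results to directories.
    foreach (glob($dir . '/*', GLOB_ONLYDIR) ?: [] as $subdir) {
        $files = array_merge($files, glob_recursive($subdir, $pattern));
    }
    return $files;
}

// Demo tree: x.php, sub/y.php and sub/deeper/z.txt
$root = sys_get_temp_dir() . '/glob_rec_' . uniqid();
mkdir("$root/sub/deeper", 0777, true);
touch("$root/x.php");
touch("$root/sub/y.php");
touch("$root/sub/deeper/z.txt");

$phpFiles = array_map('basename', glob_recursive($root, '*.php'));
print_r($phpFiles); // x.php and y.php; z.txt does not match
```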

Don't use glob() to list files in a directory that contains a very large number of files (more than 100,000). You will get an "Allowed memory size of XYZ bytes exhausted ..." error. You may try increasing memory_limit in php.ini; mine is set to 128MB and the script still reaches this limit when glob()ing over 500,000 files.

The more stable way to handle very large numbers of files is to use readdir():
<?php
if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        // note that '.' and '..' are returned as well, so skip them
        if ($file === '.' || $file === '..') {
            continue;
        }
        // do something with the file
    }
    closedir($handle);
}
?>
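Going one step further, the readdir() loop can be wrapped in a generator so callers can still use foreach without a full array of names ever being built. A sketch, assuming PHP 7 or later; dir_entries() is a hypothetical name and the demo directory is made up.

```php
<?php
/**
 * Hypothetical helper: yield directory entries one at a time instead of
 * building one big array, so memory use stays flat for huge directories.
 */
function dir_entries(string $path): Generator
{
    $handle = opendir($path);
    if ($handle === false) {
        return; // could not open the directory: yield nothing
    }
    try {
        while (false !== ($entry = readdir($handle))) {
            if ($entry === '.' || $entry === '..') {
                continue; // skip the special entries
            }
            yield $entry;
        }
    } finally {
        closedir($handle); // runs even if the caller stops iterating early
    }
}

// Usage: count entries without holding all names in memory at once.
$demo = sys_get_temp_dir() . '/readdir_demo_' . uniqid();
mkdir($demo);
for ($i = 0; $i < 3; $i++) {
    touch("$demo/file$i.txt");
}
$count = 0;
foreach (dir_entries($demo) as $name) {
    $count++;
}
echo $count, "\n";
```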

If you have open_basedir set in php.ini to limit which files PHP can access, glob(...) will return false when there are no matching files. If open_basedir is not set, the very same code will return an empty array in the same situation.

This is unfortunate, as a seemingly innocuous configuration change alters glob()'s behavior and breaks code like:
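A minimal defensive pattern, assuming you want "false" and "no matches" treated the same: normalise the return value with ?: [] before iterating, so foreach never receives false. The pattern below is made up so that it matches nothing.

```php
<?php
// glob() may return false (e.g. with open_basedir set and nothing matching)
// or an empty array; ?: [] turns both into an empty array.
$pattern = sys_get_temp_dir() . '/no_such_prefix_' . uniqid() . '*';
$files = glob($pattern) ?: [];

foreach ($files as $file) {
    echo $file, "\n";
}
echo count($files), " match(es)\n";
```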

Just be careful about spaces around the comma when using GLOB_BRACE: {includes/*.php,core/*.php} works as expected, but {includes/*.php, core/*.php}, with a leading space, will only match the former, not the latter, unless you have a directory named " core" (with a leading space) on your machine. PHP can create such directories quite easily, like so: mkdir(" core");
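One way to avoid the stray-space trap when building brace patterns from a list is to trim each part before joining. brace_pattern() is a hypothetical helper; the glob() call only runs where GLOB_BRACE is defined, and it is relative to the current directory.

```php
<?php
/**
 * Hypothetical helper: build a GLOB_BRACE pattern from sub-patterns,
 * trimming whitespace so " core/*.php" cannot silently become a search
 * for a directory named " core".
 */
function brace_pattern(array $parts): string
{
    return '{' . implode(',', array_map('trim', $parts)) . '}';
}

$pattern = brace_pattern(['includes/*.php', ' core/*.php']);
echo $pattern, "\n"; // {includes/*.php,core/*.php}

// Only call glob() with GLOB_BRACE where the constant exists.
$files = defined('GLOB_BRACE') ? (glob($pattern, GLOB_BRACE) ?: []) : [];
```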

alan at ridersite dot org (18-Mar-2007 03:26) stated that '*.*' is the same as '*'. This is not true: * alone will return directories too, while *.* will only return entries whose names contain a dot, typically files with an extension such as .pdf, .doc or .php (a directory with a dot in its name would match as well).

A couple of notes: glob() follows symbolic links, handles './' and '../' nicely, and copes with an extra '/' character, at least on Unix-like systems. For example, glob("../*") will list the parent directory.

This is handy because warnings and errors then show paths like "../foo" rather than your system's full path.

Several of the examples use a notation "*.*" when just plain "*" does the same thing. The "*.*" notation is misleading as it implies foo.ext will not be found with "*" because the "." is not present.
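The two notes above disagree, so here is a quick check. This sketch (scratch directory, made-up names) shows that "*" returns every non-hidden entry, while "*.*" returns only entries whose names contain a dot, which can include a directory such as v1.2.

```php
<?php
$dir = sys_get_temp_dir() . '/glob_dots_' . uniqid();
mkdir($dir);
touch("$dir/report.pdf"); // file with an extension
touch("$dir/README");     // file without an extension
mkdir("$dir/images");     // directory without a dot
mkdir("$dir/v1.2");       // directory whose name contains a dot

$all  = array_map('basename', glob("$dir/*") ?: []);
$dots = array_map('basename', glob("$dir/*.*") ?: []);

print_r($all);  // all four entries
print_r($dots); // only report.pdf and v1.2
```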

Watch out: the flags must not be strings; they are predefined constants. Thus glob("../*", GLOB_ONLYDIR) works, but glob("../*", "GLOB_ONLYDIR") does not.
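Since the flags are integer constants, several of them can be combined with the bitwise OR operator. A small sketch, using a made-up scratch directory, that asks for directories only and has each one marked with a trailing separator:

```php
<?php
$dir = sys_get_temp_dir() . '/glob_flags_' . uniqid();
mkdir("$dir/sub", 0777, true);
touch("$dir/file.txt");

// Combine flags with |, never pass them as strings.
$dirs = glob("$dir/*", GLOB_ONLYDIR | GLOB_MARK) ?: [];

// GLOB_MARK appends a directory separator to each directory returned.
print_r($dirs); // only .../sub/ ; file.txt is filtered out by GLOB_ONLYDIR
```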

First off, it's nice to see all of the different takes on this. Thanks for all of the great examples.

Fascinated by the foreach usage, I was curious how it might work with a for loop. I found that glob was well suited to this, especially compared to opendir. A for loop bounded by the size of the array is a simple way to protect against a potentially endless loop.
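A minimal sketch of walking glob() results with a for loop, assuming a scratch directory with made-up file names. Because count() is evaluated once before the loop starts, the number of iterations is fixed in advance.

```php
<?php
$dir = sys_get_temp_dir() . '/glob_for_' . uniqid();
mkdir($dir);
foreach (['one.txt', 'two.txt', 'three.txt'] as $name) {
    touch("$dir/$name");
}

// glob() returns a 0-indexed list, so numeric indexes are safe here.
$files = glob("$dir/*.txt") ?: [];

// The bound $n is computed once, so the loop always terminates.
$names = [];
for ($i = 0, $n = count($files); $i < $n; $i++) {
    $names[] = basename($files[$i]);
}
print_r($names);
```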

After fiddling with GLOB_BRACE a bunch, I have found that the most items that can be included in the braces is about 10; beyond that, glob no longer returns any matches.

I have a scenario where there can be a thousand or more files to check for where I can't pattern match and need to check specific names. I was hoping to batch them in large groups to see if it would help performance. However, if I include more than 10 in a GLOB_BRACE the function will return FALSE.
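If exact names are all you need, calling file_exists() on each name sidesteps glob() entirely. To stay with the batching idea instead, here is a sketch that splits the names into groups of at most 10 (the limit reported above, not a documented one) and runs one GLOB_BRACE pattern per group. existing_names() is a hypothetical helper; it assumes GLOB_BRACE is available and that the names contain no glob metacharacters or commas.

```php
<?php
/**
 * Hypothetical helper: return which of $names exist in $dir, checking them
 * with GLOB_BRACE in batches of at most $batchSize names.
 */
function existing_names(string $dir, array $names, int $batchSize = 10): array
{
    $found = [];
    foreach (array_chunk($names, $batchSize) as $batch) {
        $pattern = $dir . '/{' . implode(',', $batch) . '}';
        foreach (glob($pattern, GLOB_BRACE) ?: [] as $path) {
            $found[] = basename($path);
        }
    }
    return $found;
}

$dir = sys_get_temp_dir() . '/glob_batch_' . uniqid();
mkdir($dir);
touch("$dir/f3.txt");
touch("$dir/f17.txt");

// 25 candidate names, so three batches (10, 10 and 5 names) are run.
$wanted = [];
for ($i = 1; $i <= 25; $i++) {
    $wanted[] = "f$i.txt";
}
$existing = existing_names($dir, $wanted);
print_r($existing); // f3.txt and f17.txt
```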

I needed a function to create an arbitrarily deep multidimensional array, with the names of the folders/files intact (no realpath()s, although that is easily possible). This is so I can simply loop through the array and create an expandable link on each folder name, with all the files inside it.

This is the correct way to recurse I believe (no static, return small arrays to build up the multidimensional array), and includes a check for files/folders beginning with dots.
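A minimal sketch of such a recursive function, assuming files should map to their own name and folders to a nested array; dir_tree() is a hypothetical name, not the author's. Because glob('*') already skips names beginning with a dot on typical Unix-like systems, this version needs no separate dot check.

```php
<?php
/**
 * Hypothetical helper: nested array describing $dir. Keys are entry names;
 * files map to their name, folders map to an array of their contents.
 */
function dir_tree(string $dir): array
{
    $tree = [];
    foreach (glob($dir . '/*') ?: [] as $path) {
        $name = basename($path);
        // Recurse into folders; no static state, each call returns its own array.
        $tree[$name] = is_dir($path) ? dir_tree($path) : $name;
    }
    return $tree;
}

$root = sys_get_temp_dir() . '/glob_tree_' . uniqid();
mkdir("$root/docs/old", 0777, true);
touch("$root/index.php");
touch("$root/docs/a.txt");
touch("$root/.hidden"); // skipped by glob('*')

$tree = dir_tree($root);
print_r($tree);
```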

The explanation for the discrepancy in the dirsize function by "management at twilightus dot net":

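glob('*') ignores all 'hidden' files by default. This means it does not return files that start with a dot (e.g. ".file"). If you want to match those files too, you can use "{,.}*" as the pattern with the GLOB_BRACE flag. Note that the ".*" half of that pattern can also match the special "." and ".." entries, so you will usually want to filter those out.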

<?php
// Search for all files that match .* or *
$files = glob('{,.}*', GLOB_BRACE);
?>
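A sketch of that pattern with the "." and ".." entries filtered out, assuming GLOB_BRACE is available and PHP 7.4+ (for the arrow function); the scratch directory and file names are made up.

```php
<?php
$dir = sys_get_temp_dir() . '/glob_hidden_' . uniqid();
mkdir($dir);
touch("$dir/.env");
touch("$dir/visible.txt");

// "{,.}*" expands to "*" and ".*"; drop the special "." and ".." entries.
$all = glob("$dir/{,.}*", GLOB_BRACE) ?: [];
$names = array_values(array_filter(
    array_map('basename', $all),
    fn ($name) => $name !== '.' && $name !== '..'
));
print_r($names); // visible.txt and .env
```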

Here is an alternative to this glob-based function. As edogs [at] dogsempire.com said, opendir should be faster than glob. I have not timed this function, but it works perfectly for me on my PHP 5.2.2 server.

I've written a function that I've been using quite a lot over the past year or so. I've built whole websites and their file-based CMSs on this one function, mostly because (I think) databases are not as portable as groups of files and folders. In previous versions I used opendir and readdir to get the contents, but now I can do in one line what used to take several. How? Most of the work in the whole script is done by calling

glob("$dir/*")

This gives me an array containing the names of the items in the folder, minus the ones beginning with '.' and the ones I specify.
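A minimal sketch of the behaviour just described (not the author's alpharead code): list_names() and its $exclude parameter are hypothetical names, and the demo file names are made up. It strips the directory prefix that glob() adds, removes excluded names, and sorts in natural order.

```php
<?php
/**
 * Hypothetical sketch in the spirit of alpharead(): names in $dir (default:
 * current directory), without dot files (glob('*') already skips them) and
 * without anything listed in $exclude, in natural order.
 */
function list_names(string $dir = '.', array $exclude = []): array
{
    $names = array_map('basename', glob("$dir/*") ?: []);
    $names = array_diff($names, $exclude); // drop explicitly excluded names
    natsort($names);                       // "file2" sorts before "file10"
    return array_values($names);           // reindex after diff and sort
}

$dir = sys_get_temp_dir() . '/glob_list_' . uniqid();
mkdir($dir);
foreach (['file10.txt', 'file2.txt', 'file1.txt', 'Thumbs.db', '.hidden'] as $name) {
    touch("$dir/$name");
}
$names = list_names($dir, ['Thumbs.db']);
print_r($names); // file1.txt, file2.txt, file10.txt
```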

<?php

/* alpharead version 3: This function returns an array containing the names of the files inside any given folder, excluding files that start with a '.', as well as the filenames listed in the '$killit' array. This array is sorted using the 'natural alphabetical' sorting manner. If no input is given to the function, it lists items in the script's interpreted folder. Version 3 fixes a MAJOR bug in version 2 which corrupted certain arrays with greater than 5 keys and one of the supposedly removed filenames. Written by Admiral at NuclearPixel.com */

Note that on Windows, glob distinguishes between uppercase and lowercase extensions, so if the directory contains a file "test.txt" and you glob for "*.TXT", the file will not be found! This bug only happens when you use patterns containing "*", like the example above. If you instead search for the full filename "test.TXT", everything works correctly.
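One workaround, sketched here as an assumption rather than something verified on Windows, is to spell each letter as a two-case character class so the match no longer depends on case. glob_nocase() is a hypothetical helper, and it assumes the pattern contains no bracket expressions or escapes of its own; the demo directory is made up.

```php
<?php
/**
 * Hypothetical helper: turn every ASCII letter of a glob pattern into a
 * two-case character class, e.g. "*.TXT" -> "*.[tT][xX][tT]".
 */
function glob_nocase(string $pattern): string
{
    return preg_replace_callback(
        '/[a-z]/i',
        fn ($m) => '[' . strtolower($m[0]) . strtoupper($m[0]) . ']',
        $pattern
    );
}

$dir = sys_get_temp_dir() . '/glob_case_' . uniqid();
mkdir($dir);
touch("$dir/test.txt");

echo glob_nocase('*.TXT'), "\n"; // *.[tT][xX][tT]

// Only the file-name part is converted; $dir is used as-is.
$found = array_map('basename', glob($dir . '/' . glob_nocase('*.TXT')) ?: []);
print_r($found); // test.txt
```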

On Windows, a path starting with a slash resolves fine for most file functions, but NOT for glob. If the server is LAUNCHED (or chdir()ed) to W:, then file_exists("/temp/test.txt") returns true for the file "W:/temp/test.txt", but glob("/temp/*.txt") FAILS to find it!

A solution (if you want to avoid putting drive letters into your code) is to chdir() first, then just look for the file:
<?php
$glob = "/temp/*.txt";
chdir(dirname($glob));
// getcwd() is now actually "W:\temp" or whatever
$files = glob(basename($glob));
?>

glob caused me some real pain in the bottom on Windows because of the DOS convention for paths (backslashes instead of slashes)...

This was my own fault because I "forgot" that the backslash, when used in strings, needs to be escaped, but well, it can cause a lot of confusion, even for people who are not exactly newbies anymore...

For some reason, I didn't have this problem with other file operations (chdir, opendir, etc...), which was the most confusing of all...

So, for people running scripts on Windows machines (Windows 95, 98, NT or XP), just remember this:

glob('c:\temp\*.*');   // works correctly, returns an array of files
glob("c:\temp\*.*");   // does NOT work: inside double quotes "\t" becomes a tab character
glob("c:\\temp\\*.*"); // escaping the backslashes makes it work again

This is especially confusing when temporary writable directories are returned as an unescaped string.

$tempdir = getenv('TEMP'); // this returns "C:\DOCUME~1\user\LOCALS~1\Temp"

So in order to scan that directory I need to do:

glob($tempdir . "\\*.*");

Or perhaps it's easier to replace all backslashes with slashes in order to avoid these kinds of confusions...

glob("c:/temp/*.*"); // works fine too...

I know I'm not contributing anything new here, but I just hope this post may avoid some unnecessary headaches...

Here is a way I used glob() to browse a directory, pull the file names out, sort them by date with the most recent first, and format the dates using date(). I called the function inside a <select> and had it go directly to the PDF file:
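A minimal sketch of that idea, assuming the PDFs sit directly in $dir and that <option> elements are what you want back; pdf_options() is a hypothetical name, and the file names and dates in the demo are made up. Newest files come first because usort() compares filemtime() values in descending order.

```php
<?php
/**
 * Hypothetical helper: build <option> elements for the PDFs in $dir,
 * newest first, each labelled with its modification date.
 */
function pdf_options(string $dir): string
{
    $files = glob("$dir/*.pdf") ?: [];

    // Sort by modification time, most recent first.
    usort($files, fn ($a, $b) => filemtime($b) <=> filemtime($a));

    $html = '';
    foreach ($files as $path) {
        $name = basename($path);
        $label = date('Y-m-d', filemtime($path)) . ' - ' . $name;
        $html .= sprintf(
            "<option value=\"%s\">%s</option>\n",
            htmlspecialchars($name),
            htmlspecialchars($label)
        );
    }
    return $html;
}

$dir = sys_get_temp_dir() . '/glob_pdf_' . uniqid();
mkdir($dir);
touch("$dir/old.pdf", strtotime('2020-01-15 12:00'));
touch("$dir/new.pdf", strtotime('2021-06-01 12:00'));

$options = pdf_options($dir);
echo "<select>\n", $options, "</select>\n";
```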