In this tutorial, we are going to build a command-line app with PHP. The app takes a set of keywords and, optionally, a folder name and a list of file extensions. It then downloads the files of the requested types that appear on the first page of Google search results for those keywords and saves them to your storage drive. Along the way, we will also show you how to work with PHP’s built-in DOMDocument class. We called the script Getty because it GETs files.

Getty

First, we print some instructions on how to use the app from the command line:

<?php
// Print the usage instructions as one concatenated string.
echo "--------- This program will search Google for files matching certain keywords and file types and extract the first result page ---------\r\n" .
    "Files are saved into a general folder where your script is by default. By default, the script extracts pdf and pptx files.\r\n" .
    "Mandatory input:\r\nKeywords - text that you want the files to contain.\r\n" .
    "Optional input:\r\nFolder to save - separated from the keywords with a | (pipe) sign\r\nFile types to extract separated from the folder with | and each file type separated with another through &\r\n";

We read a single line from the input stream and split it into an array, with as many values as there are pipe-separated parts in the input:

$line = explode("|", trim(fgets(STDIN)));

Then we assign the array elements to variables according to our expected input order. If a value is missing, we set it to null:

$folder = isset($line[1]) ? $line[1] : null;
$keywords = isset($line[0]) ? $line[0] : null;
$fileTypes = isset($line[2]) ? $line[2] : null;
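As a minimal sketch of the same parsing step, the three assignments can be wrapped in one function that also trims whitespace around each part and treats empty parts as missing. The function name parseInputLine and the returned array keys are hypothetical and not part of Getty; the trimming is an addition, on the assumption that users may type spaces around the pipe character.

```php
<?php
// Hypothetical helper (not part of the original script): parse one raw input
// line of the form "keywords|folder|type1&type2" into named values.
function parseInputLine(string $rawLine): array
{
    // Split on the pipe and trim whitespace around each part.
    $parts = array_map('trim', explode('|', $rawLine));

    // Treat missing or empty parts as null.
    $value = function (int $index) use ($parts) {
        return (isset($parts[$index]) && $parts[$index] !== '') ? $parts[$index] : null;
    };

    return [
        'keywords'  => $value(0),
        'folder'    => $value(1),
        'fileTypes' => $value(2),
    ];
}

print_r(parseInputLine("it security | slides | ppt&pptx"));
print_r(parseInputLine("it security"));
```

With only keywords supplied, the folder and fileTypes entries come back as null, so the defaults described later in the tutorial still apply.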

If the user has provided any keywords, we call the function that will store the files from Google:

if ($keywords) {
    getFrontPageFiles($keywords, $folder, $fileTypes);
}

Inside the body of the getFrontPageFiles function, we set the $fileTypes variable to “pdf&pptx” if it is not already filled, and we set the folder where the files will be saved to general (only if the user has not provided a folder name). Additionally, we create an array of file types by splitting the string on the & (ampersand) character.

function getFrontPageFiles($keywords, $folder, $fileTypes) {
    $fileTypes = (!$fileTypes) ? "pdf&pptx" : $fileTypes;
    $fileTypes = explode("&", $fileTypes);
    $folder = (!$folder) ? "general" : $folder;

We loop over each file type and perform a Google search query for the provided keywords and each of the specific file types.
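The code for this search step is not shown here, so the following is only a sketch of how it might look. It assumes that the query goes to Google's /search?q= endpoint and uses the filetype: search operator to restrict results; the helper name buildSearchUrl is hypothetical.

```php
<?php
// Hypothetical helper: build a Google search URL restricted to one file type.
// The "filetype:" operator and the /search?q= endpoint are assumptions about
// how the fetch step works; the original code for it is not shown.
function buildSearchUrl(string $keywords, string $fileType): string
{
    $query = $keywords . ' filetype:' . $fileType;

    // urlencode() turns spaces into "+" and escapes characters such as ":".
    return 'https://www.google.com/search?q=' . urlencode($query);
}

foreach (explode('&', 'pdf&pptx') as $fileType) {
    echo buildSearchUrl('it security', $fileType), "\n";
    // In the script, the page could then be fetched, for example:
    // $pdf = file_get_contents(buildSearchUrl($keywords, $fileType));
}
```

Keeping the URL construction in its own function makes it possible to check the generated query without sending any request.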

We create a new DOMDocument object and load the HTML string with the Google search results (the $pdf variable) into it. Then, we get the search results container div and save it in a variable.

$doc = new DOMDocument();
$doc->loadHTML($pdf);
$container = $doc->getElementById("center_col");
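To see the DOMDocument calls in isolation, here is a small sketch that runs them against a hard-coded HTML string instead of a live results page. The sample markup is an assumption made for illustration. The calls to libxml_use_internal_errors() and libxml_clear_errors() are additions that keep parser warnings about imperfect HTML from being printed, and the null check covers the case where the container id is not present.

```php
<?php
// Sample markup standing in for a results page; the structure is an assumption.
$html = '<html><body><div id="center_col">'
      . '<h3><a href="/url?q=http://example.com/a.pdf&amp;sa=U">First</a></h3>'
      . '<h3><a href="/url?q=http://example.com/b.pdf&amp;sa=U">Second</a></h3>'
      . '</div></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// getElementById() returns null when no element has the given id.
$container = $doc->getElementById('center_col');
if ($container === null) {
    echo "No results container found\n";
} else {
    echo $container->getElementsByTagName('h3')->length, " result headings found\n";
}
```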

If the directory where the user wants to save the files does not exist, we create it.

if (!is_dir($folder)) {
    mkdir($folder);
}
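If the user supplies a nested path such as slides/security, a plain mkdir() call fails when the parent directory is missing. A possible variant, shown here as a sketch, passes true as the third argument so that parent directories are created as well. The helper name ensureDirectory and the temporary demo path are hypothetical.

```php
<?php
// Hypothetical helper: create the target directory, including any missing
// parent directories, and report whether it exists afterwards.
function ensureDirectory(string $folder): bool
{
    if (is_dir($folder)) {
        return true;
    }

    // The third argument enables recursive creation (e.g. "slides/security").
    return mkdir($folder, 0777, true) || is_dir($folder);
}

// Demo path inside the system temp directory so nothing else is touched.
$target = sys_get_temp_dir() . '/getty_demo_' . uniqid() . '/slides/security';
var_dump(ensureDirectory($target));
```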

Now, we loop over each h3 element within our container div, get its first child (which is an anchor, or <a>, tag) and read its href attribute. This is the link to a file of the type that we searched for. We download its contents, derive the file name by parsing the href attribute (the link), and save the file in the folder the user selected or in the default folder.

After all files of a particular file type are saved, we print to the output stream how many files of that type were saved and where they were saved.

$totalFiles = 0;
foreach ($container->getElementsByTagName("h3") as $urlContainer) {
    $url = $urlContainer->firstChild;
    $href = $url->getAttribute("href");
    $file = file_get_contents("https://google.com" . $href);
    $fileName = explode("/", $href);
    $fileName = $fileName[count($fileName) - 1];
    $fileName = explode("&", $fileName)[0];
    file_put_contents("$folder/$fileName", $file);
    ++$totalFiles;
}
echo "Saved $totalFiles files of type $fileType into the path $folder that matched the keywords: $keywords\r\n";
}
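The file-name parsing inside the loop can also be tried on its own. The sketch below repeats the same two explode() steps in a hypothetical function, fileNameFromHref, and adds a rawurldecode() call so that names containing %20 and similar sequences are saved in readable form. The href format used in the examples (/url?q=<target>&sa=...) is an assumption about what the result links look like.

```php
<?php
// Hypothetical helper mirroring the parsing steps from the loop.
// The href format ("/url?q=<target>&sa=...") is an assumption about the
// result links, not something the tutorial specifies.
function fileNameFromHref(string $href): string
{
    // Take everything after the last slash.
    $segments = explode('/', $href);
    $lastSegment = $segments[count($segments) - 1];

    // Drop any query parameters that follow the file name.
    $name = explode('&', $lastSegment)[0];

    // Decode percent-encoded characters such as %20.
    return rawurldecode($name);
}

echo fileNameFromHref('/url?q=http://example.com/docs/report.pdf&sa=U&ved=abc'), "\n";
echo fileNameFromHref('/url?q=http://example.com/it%20security.pptx&sa=U'), "\n";
```

The first call prints report.pdf and the second prints it security.pptx.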

Now we can run the script by opening our command line or terminal, navigating to the folder where it is placed, and typing php Getty.php (or whatever name you gave the script):

Figure 1: Using the Getty script

We searched for PowerPoint presentations related to IT security and downloaded 10 ppt and 10 pptx files on the topic into our chosen folder.

Figure 2: Downloaded presentations relating to IT security

Conclusion

I hope Getty will prove useful to you. It certainly helps me when I need to skim through files on a particular topic quickly. What do you think of Getty? What kind of command-line applications do you want to make or have made?

Ivan is a student of IT, a freelance web designer/developer and a tech writer. He deals with both front-end and back-end stuff. Whenever he is not in front of an Internet-enabled device he is probably reading a book or traveling. You can find more about him at: http://www.dimoff.biz.