[Script] phpFox - Community Project

I want to start a new project with the help of the community, and of Sven if we get stuck at some point.

I know that the platform trainer will be released some day, but I hope this step-by-step guide will help you understand how GSA works. It could help anyone who wants to fine-tune existing scripts, or scripts built with the platform trainer one day. Don't hesitate to ask questions and assist me as well as you can.

As a side note: I've tried to script some engines and none of them worked for me without Sven's help, lol.

Google: "powered by phpfox". This is the standard keyword to begin a search for any platform ("powered by" plus the platform name). If you don't find anything related to your platform, search for a demo site for that platform and try to figure out some unique footprints on those sites.

In this case I've noticed that phpFox has unique footprints for all releases they publish.

Now we have a good chunk of footprints to begin with:

"powered by phpfox version"

"Powered By phpFoX Version 1.5.1."

"powered by phpfox version 1.6"

"Powered by phpFoX Version 1.6.20."

"Powered by phpFoX Version 1.6.21."

"Powered by phpfox version 2.0"

"Powered By Phpfox Version 2.0.4"

"Powered by phpfox version 2.0.5"

"Powered By phpFox Version 2.0.6."

"Powered By phpFox Version 2.0.7."

"powered By Phpfox Version 2.1.0"

"Powered By phpFox Version 2.1.0beta2"

"Powered By Phpfox Version 2.1.1"

"Powered By phpFox Version 3.0.0."

"Powered By phpFox Version 3.0.0beta1."

"Powered By phpFox Version 3.0.0beta5."

"Powered By phpFox Version 3.0.0rc1."

"Powered By phpFox Version 3.0.0rc3."

"Powered By phpFox Version 3.0.1."

"Powered By phpFox Version 3.0.2"

"Powered By phpFox Version 3.2.0."

"Powered By phpFox Version 3.2.0beta1."

"Powered By phpFox Version 3.2.0rc1."

"Powered By phpFox Version 3.3.0"

"Powered By phpFox Version 3.3.0beta1."

"Powered By phpFox Version 3.3.0beta2."

"Powered By phpFox Version 3.3.0rc1."

The next thing I like to do is take a look at the URLs and the registration form of these websites. Footprints like "Sign up for phpFox" or "/index.php?do=/user/browse/" (= inurl:"/index.php?do=/user/browse/") would be just fine if you get good results in Google. As both footprints gave me fewer results than I had hoped for, I leave them out for now.

Once we have figured out some footprints, we use the most common one, "powered by phpfox", for detection, as GSA needs to know what kind of platform it is dealing with. As some sites hide this footprint, or the footprint isn't as unique as it should be, it is always a good idea to investigate the source code of the platform. The best way to identify a platform is to find a code snippet that is unique and part of every homepage driven by that platform engine.

You can open the source code with "ctrl+u" (in chrome). I have found "content="phpFox"" and "Phpfox.init" on some of those sites, and these snippets will do for a start.
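To make the idea of detection a bit more tangible, here is a minimal Python sketch of checking a page for several identifiers. It is not how GSA does it internally. The function name, the case-insensitive comparison, the "any one match is enough" rule and the sample HTML are all my assumptions.

```python
# Identifiers taken from this post; everything else is hypothetical.
IDENTIFIERS = ["powered by phpfox", 'content="phpFox"', "Phpfox.init"]


def looks_like_phpfox(html, identifiers=IDENTIFIERS):
    """Return True if at least one identifier occurs in the page source.

    Assumption: matching is case-insensitive and one hit is enough.
    """
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in identifiers)


# Made-up page source, only for illustration.
sample = '<html><head><meta name="generator" content="phpFox"></head></html>'
print(looks_like_phpfox(sample))
```

The more unique the identifiers are, the fewer false positives such a check produces, which is why it pays off to dig through the source code.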

Now that we've done our research for footprints and identifiers we can compile our first very basic script:

[setup]

enabled=1

default checked=0

page must have=powered by phpfox|content="phpFox"|Phpfox.init

;as you can see these are our code snippets and the most common footprint, without quotes

;page must have=!PAGE SHOULD NOT HAVE

engine type=MyEngines

description=Social Network

search term="powered by phpfox version"|"Powered By phpFoX Version 1.5.1."|"powered by phpfox version 1.6"|"Powered by phpFoX Version 1.6.20."|"Powered by phpFoX Version 1.6.21."|"Powered by phpfox version 2.0"|"Powered By Phpfox Version 2.0.4"|"Powered by phpfox version 2.0.5"|"Powered By phpFox Version 2.0.6."|"Powered By phpFox Version 2.0.7."|"powered By Phpfox Version 2.1.0"|"Powered By phpFox Version 2.1.0beta2"|"Powered By Phpfox Version 2.1.1"|"Powered By phpFox Version 3.0.0."|"Powered By phpFox Version 3.0.0beta1."|"Powered By phpFox Version 3.0.0beta5."|"Powered By phpFox Version 3.0.0rc1."|"Powered By phpFox Version 3.0.0rc3."|"Powered By phpFox Version 3.0.1."|"Powered By phpFox Version 3.0.2"|"Powered By phpFox Version 3.2.0."|"Powered By phpFox Version 3.2.0beta1."|"Powered By phpFox Version 3.2.0rc1."|"Powered By phpFox Version 3.3.0"|"Powered By phpFox Version 3.3.0beta1."|"Powered By phpFox Version 3.3.0beta2."|"Powered By phpFox Version 3.3.0rc1."

;in "search term=" you have to fill in your footprints separated with a "|"

Copy and paste this script into your text editor and save it as "phpFox.ini". This little script is already useful, as it helps to find and identify websites powered by phpFox.

Next, paste phpFox.ini into your GSA engines folder (...GSA Search Engine Ranker\Engines). You can now scrape for phpFox sites with the "Search online for URLs" tool (Options -> Tools -> Search online..). Just choose your footprints via "Add predifined footprints" -> MyEngines -> phpFox. Once scraping is done, it is helpful to save the results to a custom file, and the unknown sites as well ("Save Unknown"). The unknown sites can help to improve your script later and to identify more sites of this platform.
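Typing the long "search term=" line of the script by hand is error-prone, so here is a small Python sketch that builds it from a list of footprints. The helper name and the duplicate removal are my own additions. Only the "|" separator and the "search term=" key come from the script above.

```python
def build_search_term(footprints):
    """Join footprints into one "search term=" line, separated by "|"."""
    # Drop exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(footprints))
    return "search term=" + "|".join(unique)


# A short excerpt of the footprint list, quotes included.
footprints = [
    '"powered by phpfox version"',
    '"Powered By phpFox Version 3.3.0"',
    '"Powered By phpFox Version 3.3.0rc1."',
]
print(build_search_term(footprints))
```

The footprints keep their quotes here because they are meant as exact-phrase searches for the search engines.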

Good idea on a community project. I'm in, and will add my own projects as and when.

I'm having some success digging into Sick Submitter templates as a starting point. Unfortunately, these templates have a logic element (e.g. if/then), so they are not an easy translation, but it's a good start.

I will start a thread on it when I have a bit more luck, and maybe we can pool Sick Submitter templates to work on (simply because it's been around for a couple of years and has a large community, and thus many, many platforms).

@Sven: I would like it if you let us play with the script up to some point, help out if you see any problems, comment on some variables (what they do, when to use them, etc.) and share your experience on some topics (like where to look for identifiers in the source code) with us.

What I want to do next is to explain how to find the fields GSA needs to fill out, the [Registration] and [Log In] steps and debugging with the tools we've got. I think that I can cover all these things within the next few days.

If you want to add new engines, then I will post my scripts for TWiki (FosWiki), WikkaWakki, Confluence (all wikis) and the bbPress forum this evening. These are all projects where I got stuck at some point, and the code will be a total mess, but it should still have some value, I think.

As we now know how to identify and crawl some of the phpFox sites, we should have a basic URL list to work with. Next is to register at one or two of these sites manually and take notes for the script.

For this tutorial I use 3 example sites, as each of them has some merit for us.

First take a look at these sites. Each of them has a "Log In" button, but only one of them has a registration form on its root url (-> Facenexus). There is no "Register" button on the other two sites.

Facenexus already has a registration form on its root url. This looks like a good thing at first glance, but it is not standard AFAIK (until now). So I try to find a way that is compatible with most phpFox sites we would like to register at, and I try to keep the registration process as simple as I can.

The "Log In" button looks like it wants to get clicked, right? We click it and a new page is loading up.

This page of "Alliance Weaver" holds a lot of useful information for us.

It shows us:

a) the url of the login form "/index.php?do=/user/login/"

b) the name of the link to get to the registration form: "Sign up for" OR "Registriere dich auf" (german site)

c) if you hover over the "Sign up..." link you'll see the actual url of the registration page.

If we click "Log In" on the other 2 phpFox sites we see that it opens basically the same pages.

Facenexus differs again from the "Alliance" site and the german site, as you see in this image.

We notice that the URL of the login form ("/user/login/") and the link to the registration page ("/user/register/") are slightly different from those of the other two sites. This is valuable information for us.

Up next we click on that "Sign Up..." button to open the registration page of each site.

The alliance site won't let us register to their site. That's OK for us. Just write down the reason for that (..."is an invite only community. Enter your email below if you have received an invitation.")

FaceNexus and the german site look fine though.

They are slightly different from each other again and we notice this:

URL = /user/register/ OR /index.php?do=/user/register/

Sign up Button = Sign Up OR Registrieren

There are also some different fields to investigate, which we do next.

For this we have to take a look at the source code of that registration page. To do this, press "CTRL+U" (in chrome) and the source code opens. To make a long story short, press "CTRL+F", search for "name="" in your browser window and scroll down to the form field names.

What you will notice here is the name, type and some values of each field.
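If you don't want to scroll through the source by hand, a few lines of Python can list the fields for you. This is only a sketch with the standard library. The class and function names are made up and the sample form is a simplified invention, not copied from a real phpFox site (only the field names are the ones used in this thread).

```python
from html.parser import HTMLParser


class FieldLister(HTMLParser):
    """Collect (tag, name, type, value) for every named form field."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "select", "textarea"):
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.fields.append(
                (tag, name, attributes.get("type"), attributes.get("value"))
            )


def list_fields(html):
    parser = FieldLister()
    parser.feed(html)
    return parser.fields


# Invented, simplified registration form for illustration only.
sample = """
<form method="post" action="/user/register/">
  <input type="text" name="val[full_name]" value="">
  <input type="password" name="val[password]">
  <select name="val[gender]"><option value="1">Male</option></select>
  <input type="checkbox" name="val[agree]" value="1">
</form>
"""
for field in list_fields(sample):
    print(field)
```

The names printed this way are what later goes on the left side of the "val[...]=" lines in the script.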

search term="powered by phpfox version"|"Powered By phpFoX Version 1.5.1."|"powered by phpfox version 1.6"|"Powered by phpFoX Version 1.6.20."|"Powered by phpFoX Version 1.6.21."|"Powered by phpfox version 2.0"|"Powered By Phpfox Version 2.0.4"|"Powered by phpfox version 2.0.5"|"Powered By phpFox Version 2.0.6."|"Powered By phpFox Version 2.0.7."|"powered By Phpfox Version 2.1.0"|"Powered By phpFox Version 2.1.0beta2"|"Powered By Phpfox Version 2.1.1"|"Powered By phpFox Version 3.0.0."|"Powered By phpFox Version 3.0.0beta1."|"Powered By phpFox Version 3.0.0beta5."|"Powered By phpFox Version 3.0.0rc1."|"Powered By phpFox Version 3.0.0rc3."|"Powered By phpFox Version 3.0.1."|"Powered By phpFox Version 3.0.2"|"Powered By phpFox Version 3.2.0."|"Powered By phpFox Version 3.2.0beta1."|"Powered By phpFox Version 3.2.0rc1."|"Powered By phpFox Version 3.3.0"|"Powered By phpFox Version 3.3.0beta1."|"Powered By phpFox Version 3.3.0beta2."|"Powered By phpFox Version 3.3.0rc1."

This is my first attempt at the script with the values we've got so far. It is far from perfect and Sven has to correct me if I'm wrong.

Anyway, what we know right now is that we have to use a username and a password for registration. It is also our goal to post a link somewhere on that site.

GSA has to know that these fields are needed for a successful submission. Because of that we have to create

[URL] => this variable is for our %url% we like to promote on the target site somewhere

[Login] => GSA has to create a username for us. The variable is called %login% later on

[Password] => Same thing for %password%

As the variables %login% and %password% must always be the same whenever GSA tries to log in to the site, they have to be set to "static=1".

The password/username length has to be specified in "min length=". We may have to change the password or username length afterwards if the platform has other minimum requirements for these variables. We leave it for now as we don't know yet.

[REGISTER_STEP1]

find link=Login

find url=*/index.php?do=/user/login/|*/user/login/

alternative url=/index.php?do=/user/login/

just download=1

If you remember correctly, the first step we have to take to find the registration form is to click the "Log In" button. This is covered in the script by [REGISTER_STEP1].

find link = name of the button or rather the name of the anchor

find url = part of the url the anchor is linking to. The "*" is a wildcard for the rest of the url, which changes from site to site

alternative url = if GSA finds nothing it tries to add this snippet to the root url of that homepage

just download = this command tells GSA just to click the link and move on to the next page
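To illustrate how I understand "find url" and "alternative url", here is a rough Python sketch. It is my reading of the behaviour, not GSA's actual code. I assume that "*" is the only wildcard, that patterns separated by "|" are alternatives, and that the alternative url is simply appended to the root url. The function names and example.com are invented.

```python
import re


def wildcard_match(pattern, text):
    """Match text against a pattern where only "*" is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def pick_url(links, url_patterns, root_url, alternative_url):
    """links is a list of (anchor_text, href) pairs found on the page."""
    patterns = url_patterns.split("|")
    for _text, href in links:
        if any(wildcard_match(pattern, href) for pattern in patterns):
            return href
    # Nothing matched: fall back to root url + alternative url.
    return root_url.rstrip("/") + alternative_url


links = [
    ("Home", "http://example.com/"),
    ("Log In", "http://example.com/index.php?do=/user/login/"),
]
print(pick_url(links, "*/index.php?do=/user/login/|*/user/login/",
               "http://example.com/", "/index.php?do=/user/login/"))
```

Escaping everything except "*" matters here, because the "?" in "index.php?do=" should be taken literally and not as a wildcard.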

[REGISTER_STEP2]

find link=Sign up for*|Registriere dich auf*

find url=*/index.php?do=/user/register/|*/user/register/

alternative url=/index.php?do=/user/register/

just download=1

form name=*Sign Up|*Registrieren

form url=*/index.php?do=/user/register/|*/user/register/

;form id=

;submit success=

submit failed=is an invite only community. Enter your email below if you have received an invitation.

;page must have=!is an invite only community. Enter your email below if you have received an invitation.

val[full_name]=%spinfile-names.dat% %spinfile-lnames.dat%

val[user_name]=%login%

val[email]=%your e-mail%

val[password]=%password%

val[month]=%random_option%

val[day]=%random_option%

val[year]=%random_option%

val[gender]=%random_option%

val[country_iso]=%random_option%

val[agree]=1

As in the step above, GSA needs to find a link/url to the registration page, or it has to build the url itself (alternative url) if nothing is found. If a link matches "Sign up for*", GSA will click it and just download the page to get to the registration page.

form name = as form name we take the text of the button. Please keep in mind that it may differ from language to language. For further optimisation we have to find the button text in other languages as well

form url = this is the URL of the registration page. Different URLs can be separated by a "|"

submit failed = we put in the "fail" text of the alliance site in this line. This is the only fail text we know so far

val[full_name]=%spinfile-names.dat% %spinfile-lnames.dat% = the spinfiles belong to GSA itself. They are stored in your GSA directory and there are other spinfiles as well (address spinfiles etc.)

val[user_name]=%login% = the variable %login% we have created in the header ([setup]) of our script

val[email]=%your e-mail% = a standard variable. GSA will use the email we have entered in our project settings

val[password]=%password% = %password% which we defined in [setup]

val[agree]=1 = the checkbox of the ToC. Value "1" means that we accept the ToC, as we have learnt from the source code of the registration page

val[*****]=%random_option% = random option is used if we have to select a value, like the month of our birth date. GSA will choose a random value

I hope I can finish this guide within the next 3-4 steps. 1st step will be to finish registration and log in, 2nd step will be to post our link, 3rd step will be debugging, 4th step will be Sven correcting all the nonsense I've done.
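For those who wonder what "choose a random value" could look like in practice, here is a small Python sketch that picks a random option from a select field. Again, this is just an illustration and not GSA's implementation. The class and function names, the sample HTML and the rule of skipping options with an empty value (placeholders like "Month:") are my assumptions.

```python
import random
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the option values of one named <select> field."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "select":
            self.inside = attributes.get("name") == self.select_name
        elif tag == "option" and self.inside:
            value = attributes.get("value")
            if value:  # assumption: skip placeholder options without a value
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def random_option(html, select_name, rng=random):
    collector = OptionCollector(select_name)
    collector.feed(html)
    return rng.choice(collector.values) if collector.values else None


# Invented sample markup for illustration.
sample_html = (
    '<select name="val[month]">'
    '<option value="">Month:</option>'
    '<option value="1">January</option>'
    '<option value="2">February</option>'
    '</select>'
)
print(random_option(sample_html, "val[month]"))
```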

On with the registration. I have to clarify that this is a "live" tutorial and I have no clue what to expect next when I write this script. Of course I have tested the registration on a random site first to see where I can post a link, but registration can differ from site to site, as I've learnt so far with this engine, and phpFox supports different languages as well. This makes it more difficult for me, but on the other hand I can make more errors that you can avoid in the future.

This is a good example. When I filled out the registration form of the german site, I saw that it is a 2-step registration. Right after I clicked the "Registrieren" button, a second form loaded.

Now that we have completed the registration process, we have to code the [LOGIN_STEP] and find a way to post our link somewhere on the site.

The [LOGIN_STEP] works similarly to the [REGISTER_STEP]. First we have to find a way to get to the "log in" page, then we have to log in with our username/email and password. On phpFox sites you log in with %your e-mail% and %password%.

Once we have logged in, we have to find an indicator that tells GSA the login was successful. This can be a welcome message on the homepage or something in the source code. In the source code I've found some good code snippets:

>Logout<|>Abmelden<|*/index.php?do=/user/logout/|*/user/logout/

">Logout<" is very common on almost any site. This is a very good indicator of success for GSA.

To also find some indicators for a failed submission, I just tried to log in with an invalid email and password to see what error messages we get. You should also search the source code for "error".

Right after we logged in successfully we have to inspect the site for possible link targets.

There are some possible fields to post our link to. We can add a (nofollow) link to our profile as well as post a (dofollow) link to the community. In this tutorial I only show you how to update your profile, because... I simply don't know how to post in the link field yet.

Luckily there is a link to edit your profile ("Profil ändern"|"Edit profile" <=> */index.php?do=/user/profile/|*/user/profile/) on the welcome page. Sadly, for now our profile is only shown to the community and not to everyone. We have to change that first in [STEP1].

To do that we have to change our privacy settings. Fortunately the url of the settings page is easy for GSA to find, as we can search for a url containing "/user/privacy/" or "/index.php?do=/user/privacy/" ("Privacy Settings"|"Privatspähre-Einstellungen").

Once opened, that url shows us a lot of different fields.

I've cut most of those fields out of the image to show you just the important one:

val[privacy][profile.view_profile]

... and its values to choose from:

0 - show everyone

1 - show community only

2 - show friends only

3 - we don't know yet

4 - show no one

Since there is no need to give each field its own variable, it is smart to instruct GSA to leave all other fields as they are with "set unknown variable=%leave%" (used in [STEP2] further down).

If you were very attentive in the previous post, you have noticed that the facenexus site doesn't allow public profiles. This means that the profile page is kinda useless for us there and we have to find another way to publish our link on that site.

Whatever, I'll just stick to the plan and continue with [STEP2] -> get to the profile page, post my link and verify my submission.

To do that we have to find the "Edit Profile|Profil bearbeiten"-link first.

Fortunately we can get to the edit profile page directly.

[STEP2]

find link=Edit Profile|Profil bearbeiten

find url=*/index.php?do=/user/profile/|*/user/profile/

form name=Update|Aktualisieren

form url=*/index.php?do=/user/profile/|*/user/profile/

;form url=*/index.php?do=/user/profile/#|*/user/profile/#

set unknown variable=%leave%

custom[1]=%url%

Now that we have (hopefully) posted our link on that profile page, we have to find a way for GSA to verify the submission.

OK, some good news and some bad news.

Good news first: contrary to my presumption, the url we have posted is a dofollow link on both test pages. This is not always the case, as I know from another phpfox site.

Bad news: this script gets more and more complex. You might have noticed that the profile url differs between both pages. On the facenexus site we see our %login% name in the url (../hott69/info/), on the german site we see a predefined %userid% ("/index.php?do=/profile-208/info/").

I don't know if my solution works for this problem, but I will show you how to get that userid and how GSA can use it to verify the submission. To do that we have to specify the variables [userid] and [profile_link] in the [setup] header.

To extract the %userid% we have to define one or more "front" parts of a URL and one or more "back" parts. The URL in question is "community/index.php?do=/profile-208/". So you have to define the part in front of the profile number and the part behind it. I have defined a second back part with a regular expression ("/n" -> regex for "line break"), which I hope will work just in case it is needed.

[userid]

type=extract

front=/profile-

back1=/

back2=/n
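The front/back idea is easy to try out in Python. The following is only a sketch of how I imagine the extraction, not GSA's real logic. The function name is made up, and I wrote the line break as Python's "\n" because I'm not sure how GSA itself expects a line break to be written.

```python
def extract_between(text, front, backs):
    """Return the text between front and the nearest back delimiter."""
    start = text.find(front)
    if start == -1:
        return None
    start += len(front)
    # Use whichever back delimiter appears first after the front part.
    ends = [pos for pos in (text.find(back, start) for back in backs) if pos != -1]
    if not ends:
        return None
    return text[start:min(ends)]


url = "community/index.php?do=/profile-208/"
print(extract_between(url, "/profile-", ["/", "\n"]))
```

With the URL from this post the sketch returns the number between "/profile-" and the next "/". If a site puts the id at the very end of a line, the second back delimiter catches it instead.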

I hope that the %profile_link% variable will solve the problem with the different profile page urls. I don't know yet, as I've never used it and copied it from the "moodle" script.

[profile_link]

type=extract

find link=Profile Views|Profil anzeigen

find url=*/index.php?do=/profile-%userid%|*/%login%

The last thing before I assemble all variables into the script is to show you how GSA has to verify the backlink.

verify submission=1

verify by=url

verify url=%profile_link%/info

verify interval=30

verify timeout=300

verify on unknown status=1
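Roughly speaking, verifying by url means: build the verify url from the variables, load that page and look for our link. Here is a tiny Python sketch of those two parts, without the download itself. The function names, the case-insensitive check and the example urls are my assumptions. Only the "%profile_link%/info" template comes from the lines above.

```python
def expand(template, values):
    """Replace each %name% placeholder in template with its value."""
    for name, value in values.items():
        template = template.replace("%" + name + "%", value)
    return template


def link_present(html, url):
    """Assumption: a case-insensitive substring check is good enough."""
    return url.lower() in html.lower()


# Hypothetical values for illustration.
verify_url = expand("%profile_link%/info",
                    {"profile_link": "http://example.com/index.php?do=/profile-208"})
print(verify_url)

page = '<a href="http://mysite.example/">my site</a>'
print(link_present(page, "http://mysite.example/"))
```

If the profile is not public (like on the facenexus site), the page at the verify url won't show the link to a visitor who is not logged in, so a check like this would fail there.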

That's all for now. The next step is to assemble the script, followed by debugging/optimisation.

I'm gonna be honest, this may as well be written in Chinese to me as I have no idea what you're doing, but appreciate it all the same. As long as when it's all sorted there's a big clear button that says 'Push This For Magic Stuff To Happen' then I'm interested.

It depends on the way, not only the end result. With your explanations a lot of people should be able to create new engines/scripts. You show them how you work. Of course doing such stuff is not a thing for everyone. Basic programming experience (and knowing what a regex is... http://xkcd.com/208/) gives a huge advantage.

For me, what you do is a symbolic thing. You give INPUT and you CONTRIBUTE and SHARE. You improve the community and perhaps a few other people will follow your example. That's the real value! SER is strong, but such a community and people like you add huge extra value.

Before I post the script, let me mention that I've not tested it so far. However, I changed some variables (%about_yourself% instead of %url%), added some variables such as "submit success skip verify=" and "set unknown variable=", and also optimized it for Italian, as phpfox makes it easy to change the language. I also changed the footprints for the search engines to some more general search terms ("search term=").

Right after I've posted the script I'm going to test it, and afterwards show you how to use the debugger, hopefully with some useful results.

A word on the debugger. If you have activated the debugger in Options -> Advanced -> Run in debug mode:

- if you hover over a project you'll see the log. You can also open the log for that submission with Right Click - View Log

- if you double-click the URL, a mirror of the site in question pops up in your browser. This helps you to identify the problem

@Sven: I'm stopping this project for now as it was very time consuming. Feel free to add it to GSA once you have time for debugging this. You should be aware that it is also possible to post a link here:

I've noticed that at least one site needed a profile picture to complete registration. Maybe it is possible to add some random image like for the "video"-platforms.

To those who haven't noticed: this engine was added to the latest release of the software. However, it was a bit complex, as there was a lot of ajax interaction and I had to change a lot of things. In fact it was one of the more complicated engines. You can study the ini file from the release to see what I mean.

But don't worry, most of the engines are really easy to do, so don't give up.