Trying to access difficult website with CURL or simple_html_dom

I have twice posted a job on Freelancer for someone to write a script that will do a simple query on a website but in spite of several attempts nobody has succeeded. Not sure why it is so difficult. I just want to pass some search values to the script from my PHP code and get a response.

I need to be able to search births, deaths or marriages (chosen from a dropdown). I will be supplying a first name and surname (e.g. John Smith), a date range and the registration number (Regn no.). I simply want a response of 0 (not found) or 1 (found).

(To test it, search deaths for John Smith for any year from 1800 to 1983. You have to change the date dropdown to Yes and enter a date range, e.g. 01 01 1920 to 31 12 1920.)

Is anyone able to explain why the site is so difficult to access, please? I believe it uses $_POST, session variables and cookies. I have tried disabling cookies in my browser and the search still works, so cookies can presumably be ignored.

As an experiment, try disabling Javascript ;-)

I wouldn't say it's impossible, but having done this sort of thing before @work, this isn't necessarily a walk in the park. Not to mention, it may be against the site's TOS (I'm not a lawyer and haven't even looked to see if they *have* a TOS, but that's a consideration with some sites).

Does the organization have an API available?

The right approach is to approach the site owners (in this case it seems to be your government: a department of the civil service) to request access to an API that will enable you to perform the search properly and efficiently. If no such API exists, perhaps one can be agreed upon and implemented.

Originally Posted by pm1306

Disabling javascript prevents the website from working. The initial dropdown does not work.

And that partly answers the question of "why the site is so difficult to access" via a script: it is not necessarily a simple matter of mimicking the form and submitting it, since the Javascript on the site must be taken into account. There does not appear to be cross-site request forgery (CSRF) protection, but if there were (e.g., "hidden" within the Javascript), you would also need to fetch the page and parse it for the CSRF token before each search (not necessarily difficult, but definitely inefficient).
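To illustrate the "access the page and parse for the token" step, here is a minimal sketch in PHP using the built-in DOM extension. It only covers the simple case where the token sits in a hidden form input; a token generated by the Javascript itself would not be found this way. The function name and the sample field names (csrf_token, step) are made up for illustration and are not taken from the site in question.

```php
<?php
/**
 * Collect every named hidden <input> on a page so that its value
 * (for example a CSRF token, if the site uses one) can be sent back
 * along with the search request.
 */
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Hypothetical markup, standing in for the page fetched from the site.
$sample = '<form method="post">'
        . '<input type="hidden" name="csrf_token" value="abc123">'
        . '<input type="text" name="surname" value="Smith">'
        . '</form>';
$hidden = extractHiddenFields($sample);
```

The returned array can be merged into the search fields before posting, so whatever hidden values the form carries are sent back unchanged.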

It has taken about 10 years of requests from thousands of users to get them to implement the current "enhanced" version of their website (2 months ago), which is full of bugs and totally user-unfriendly. A typical government system designed by a committee (e.g. to search one year you now have to click a dropdown and then type the year twice, versus just entering the year once in their previous system).

In that case, you'll need to explore all the AJAX calls (if any) and Javascript that run to figure out how the site works. Also, you should know that intentionally violating a website's Terms of Service can be considered fraud and therefore criminal behavior. You should read the terms of service and make sure you are not in violation.
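Once the browser's network tab shows which request the search actually sends, replaying it from PHP could look roughly like the sketch below. Everything specific here is an assumption: the field names (recordType, firstName and so on), the date format and the URL are placeholders, and the real ones have to be copied from the request you observe. The cookie jar is included because the site may rely on a session even if the search appears to work with cookies disabled.

```php
<?php
/**
 * Build the POST fields for one search.
 * All keys are hypothetical placeholders for the site's real field names.
 */
function buildSearchFields(
    string $recordType,
    string $firstName,
    string $surname,
    string $dateFrom,
    string $dateTo,
    string $regNo
): array {
    return [
        'recordType' => $recordType,   // births, deaths or marriages
        'firstName'  => $firstName,
        'surname'    => $surname,
        'dateFrom'   => $dateFrom,
        'dateTo'     => $dateTo,
        'regNo'      => $regNo,
    ];
}

/**
 * Send the fields as a form POST and return the response body.
 * Requires the cURL extension.
 */
function postSearch(string $url, array $fields, string $cookieJar): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects after the POST
        CURLOPT_COOKIEJAR      => $cookieJar, // save session cookies here
        CURLOPT_COOKIEFILE     => $cookieJar, // and send them back on later requests
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

$fields = buildSearchFields('deaths', 'John', 'Smith', '1920-01-01', '1920-12-31', '12345');
// $html = postSearch('https://example.invalid/search', $fields, sys_get_temp_dir() . '/cookies.txt');
```

Turning the returned HTML into a 0 or 1 depends on how the site marks up an empty result versus a match, which you can only determine by comparing real responses. And, as noted above, check the terms of service before pointing anything like this at the live site.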