How do I read document headers with PHP/cURL?

RedBMedia

Proficient

Posts: 315

3+ Months Ago

I am using cURL to grab data from several different sites. However, the pages that I am crawling are selected dynamically based on user input. Because of this, I need a way to check whether a page exists before extracting data from the markup. Is there a way to get the HTTP status code from the response headers with cURL so I can check for 404 errors? Here's the cURL implementation that I am executing:

CURLOPT_HEADERFUNCTION: The name of a callback function where the callback function takes two parameters. The first is the cURL resource, the second is a string with the header data to be written. The header data must be written when using this callback function. Return the number of bytes written.
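The description above matches the PHP manual's entry for CURLOPT_HEADERFUNCTION. A minimal sketch of such a callback might look like the following. The names make_header_collector and fetch_status are made up for illustration, and the parsing assumes well-formed "Name: value" header lines.

```php
<?php
// Build a callback for CURLOPT_HEADERFUNCTION. cURL calls it once per
// header line, and it must return the number of bytes it was handed.
function make_header_collector(array &$headers, ?int &$status): callable
{
    return function ($ch, string $line) use (&$headers, &$status): int {
        $trimmed = trim($line);
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $trimmed, $m)) {
            // A status line starts a new header block (also after redirects).
            $status = (int) $m[1];
            $headers = [];
        } elseif (strpos($trimmed, ':') !== false) {
            [$name, $value] = explode(':', $trimmed, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
        return strlen($line);
    };
}

// Hypothetical helper: returns the status code for $url, or null when
// no status line was received at all.
function fetch_status(string $url): ?int
{
    $headers = [];
    $status = null;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // ask for headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not echo output
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, make_header_collector($headers, $status));
    curl_exec($ch);
    return $status;
}
```

If only the status code is needed, calling curl_getinfo($ch, CURLINFO_HTTP_CODE) after curl_exec() should give the same number without any header parsing, so a check like `fetch_status($url) === 404` is just one way to do it.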

RedBMedia

Proficient

Posts: 315

3+ Months Ago

Thanks man! I love you like a fat kid loves cake!

joebert

Fart Bubbles

Posts: 13506

Loc: Florida

3+ Months Ago

If you're going to spoof the user-agent, you might as well give it a pool of agents to select from randomly.

That is, unless you like contributing to artificially inflating the popularity of one browser to the point that nobody really knows what people are using.
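A rough sketch of the pool idea, in case it helps. The function name pick_user_agent and the strings in the pool are placeholders made up for illustration, not real browser User-Agents.

```php
<?php
// Pick one User-Agent string at random from a non-empty pool.
function pick_user_agent(array $pool): string
{
    if ($pool === []) {
        throw new InvalidArgumentException('User-Agent pool is empty.');
    }
    return $pool[array_rand($pool)];
}

// Placeholder strings for illustration only; a real pool would hold
// actual browser User-Agent strings.
$pool = [
    'ExampleBrowser/1.0',
    'ExampleBrowser/2.0',
    'AnotherExample/3.1',
];

$agent = pick_user_agent($pool);
// With cURL this would be applied as:
// curl_setopt($ch, CURLOPT_USERAGENT, $agent);
```

array_rand() returns a random key, so this picks each entry with roughly equal probability on every request.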

RedBMedia

Proficient

Posts: 315

3+ Months Ago

Have a list of user agents that I can add to a pool?

joebert

Fart Bubbles

Posts: 13506

Loc: Florida

3+ Months Ago

http://www.useragentstring.com/pages/Browserlist/

You know, I can't help but wonder how much of the audience that appears to be browsing on things like IE6 is actually old tools using spoofed User-Agents, where nobody has updated the User-Agent the tool uses because "it ain't broke".

RedBMedia

Proficient

Posts: 315

3+ Months Ago

Ha, I had never thought of that. It might be funny to only use IE4 on everything you do for that very reason! BTW, thanks for the list!

Rabid Dog

Web Master

Posts: 3243

Loc: South Africa

3+ Months Ago

Why not create your own agent? I am gonna make one called "turbo charged monkey".

Product tokens are used to allow communicating applications to identify themselves by software name and version. Most fields using product tokens also allow sub-products which form a significant part of the application to be listed, separated by whitespace. By convention, the products are listed in order of their significance for identifying the application.

product         = token ["/" product-version]
product-version = token

Examples:

User-Agent: CERN-LineMode/2.15 libwww/2.17b3
Server: Apache/0.8.4

Product tokens should be short and to the point -- use of them for advertising or other non-essential information is explicitly forbidden. Although any token character may appear in a product-version, this token SHOULD only be used for a version identifier (i.e., successive versions of the same product SHOULD only differ in the product-version portion of the product value).
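To make the grammar above concrete, here is a small sketch that builds a User-Agent value from product tokens. The function names are hypothetical, and the character set in is_token assumes the usual HTTP definition of "token" (visible ASCII minus separators), which the quoted passage does not spell out.

```php
<?php
// True when $s is a token: one or more visible ASCII characters,
// excluding separators such as space, "/", "(" and ";".
function is_token(string $s): bool
{
    return preg_match('/\A[!#$%&\'*+\-.^_`|~0-9A-Za-z]+\z/', $s) === 1;
}

// Build one product value: token ["/" product-version].
function build_product(string $name, ?string $version = null): string
{
    if (!is_token($name) || ($version !== null && !is_token($version))) {
        throw new InvalidArgumentException('Name and version must be tokens.');
    }
    return $version === null ? $name : $name . '/' . $version;
}

// Join product values with single spaces, most significant first.
function build_user_agent(array $products): string
{
    return implode(' ', $products);
}
```

Under this reading, "turbo charged monkey" is not a single token because of the spaces, while something like build_product('TurboChargedMonkey', '1.0') would be accepted.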

Rabid Dog

Web Master

Posts: 3243

Loc: South Africa

3+ Months Ago

OH MY! Someone referred to an RFC for clarification! Joebert, you are close to becoming my favourite person!

Surely this info is available on any HTTP response? Or do you need cURL to read that response? I've never used it.

joebert

Fart Bubbles

Posts: 13506

Loc: Florida

3+ Months Ago

I'm confused about what you're asking RD.

Rabid Dog

Web Master

Posts: 3243

Loc: South Africa

3+ Months Ago

Just asking whether cURL handles the HTTP response, or whether you could just use straight PHP to retrieve the header values.

Oh, and congratulating you on the RFC link. Nice to see.

joebert

Fart Bubbles

Posts: 13506

Loc: Florida

3+ Months Ago

The manual page for fsockopen seems to suggest that cURL isn't required if you want to deal with headers of a response.
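For what it's worth, a bare-bones sketch of the fsockopen route could look like this. The helper names parse_status_line and head_status are made up for illustration, and it assumes plain HTTP on port 80 with no redirects or TLS.

```php
<?php
// Extract the numeric status code from a line such as "HTTP/1.1 404 Not Found".
function parse_status_line(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: send a HEAD request over a plain socket and return
// the status code, or null if the connection or the read failed.
function head_status(string $host, string $path = '/', int $port = 80, float $timeout = 5.0): ?int
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    fwrite($fp, "HEAD {$path} HTTP/1.1\r\nHost: {$host}\r\nConnection: close\r\n\r\n");
    $statusLine = fgets($fp); // the first line of the response is the status line
    fclose($fp);
    return $statusLine === false ? null : parse_status_line($statusLine);
}
```

PHP's built-in get_headers($url) is another option without cURL; it returns the response headers as an array with the status line as the first element, which could be fed to the same parser.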

And yes, RFC documents are nice.

Rabid Dog

Web Master

Posts: 3243

Loc: South Africa

3+ Months Ago

Yeah, I figured you wouldn't need additional libraries. After all, it is the Hypertext Preprocessor, isn't it? LOL