Pulling a div from remote site

I would like to provide a generic way to pull DIVs from remote sites.
Let's say, for argument's sake, a DIV with a specific ID.

Technically, it should be feasible, starting with a normal, server-side HTTP request.

Any legal issues are also worth discussing.
I would argue that anything available over HTTP is "for free".
(That may be terribly wrong)

Thanks in advance for any thoughts on this idea. It is most definitely a common issue, although my brief research on the web only turned up contributions about AJAX requests, which will not work anyway.
(A server-side pull should work, though)

Any legal issues are also worth discussing.
I would argue that anything available over HTTP is "for free".
(That may be terribly wrong)

You're right, it is terribly wrong. Publishing something on the web is not equivalent to saying, "Take this and use it as you please." Copyrights have taken a beating in the Internet age, but they are necessary and useful, even though they're subject to abuse by all involved. Intellectual property is ethereal, but property nonetheless. And you can get into serious trouble by ignoring that fact. If nothing else, common courtesy dictates that if you want to copy something, you ask for permission and respect the owner's wishes if that permission is denied.

Thanks for the rapid reply and the legal feedback.
I thought so, to be honest.
That's if one treats this cleanly and not "laissez-faire".
I want to be straight and not promote abuse.

However, I will only be in the position of a middle-man.
The idea is to provide the generic framework for setting up "New HTML tags" that are capable of pulling a piece of data from a remote site.
That would mean that the ultimate responsibility lies with the end user.

I understand that it is my job to make clear that copyright, and possibly other legal issues, must be respected. It is then up to the end user to treat the technology responsibly.
Would you agree on that?

Reading between the lines of your post, I infer that you don't think it is much of an issue technically.

In PHP, I can think of at least one way:

- Pull complete HTML of the page
- Feed this into the DOMDocument class
- Filter out the DIV by ID and pull its "innerHTML"
(I already use a small add-on class in conjunction with DOMDocument for "innerHTML")

Does anyone agree, even vaguely, that the above pseudo-code would work?
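
For what it's worth, the three steps above can be sketched roughly as follows. This is not the PHP DOMDocument route described in the post; it is a Python standard-library analogue, offered only as an illustration. The names fetch_page, inner_html and DivById are made up for this sketch, and the URL and ID in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Step 1: pull the complete HTML of the page with a server-side request."""
    with urlopen(url, timeout=timeout) as resp:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class DivById(HTMLParser):
    """Steps 2 and 3: collect the markup inside the first <div> with a given id."""

    def __init__(self, div_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.div_id = div_id
        self.depth = 0      # nesting depth of <div>s; 0 means outside the target
        self.done = False   # True once the target's closing </div> was seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div" and dict(attrs).get("id") == self.div_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def inner_html(page, div_id):
    """Return the inner markup of the div, or None if it is missing or never closed."""
    parser = DivById(div_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Usage would be along the lines of inner_html(fetch_page("https://example.com/"), "quote"). Note that this sketch drops HTML comments inside the div and only counts nested div tags to find the matching close, so it is more forgiving of sloppy markup than a strict DOM parser but also less exact.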

The legal side definitely needs to be treated more carefully.
As a rule of thumb, for most use cases, it would probably be sufficient to provide attribution of copyright to the source page, as a footer note, right?
(Say I would like to provide a couple of example tags, e.g. "Dow Jones!"; please see my website http://4nf.org/ for details on these tags.)

No, providing attribution does not circumvent the rules of copyright. It is a part of them. You can use copyrighted material without explicit permission within the "fair use" rules, but determining fair use is often difficult. A short quote would likely be permissible. Copying a significant portion of a document may well not be. The purpose and usage enter into the issue, and one size never fits all in this regard. If the copyright owner decides the issue is worth pursuing, the least you risk is a nasty letter from his attorney and probably a DMCA take-down request filed with your hosting service and/or the search engines, along with increased scrutiny of your website by all involved. Or, as I say, you could ask for permission.

From a technical standpoint, grabbing a specific <DIV> from another website is fairly trivial using either JavaScript or a server-side script. Resolving any relative URIs contained in the <DIV> for links or <img>s is not exactly trivial, but certainly doable as well. I just don't think there's any great need for a general purpose tool for this function, especially in light of the pitfalls.
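
To illustrate the relative-URI point, here is a rough Python sketch that rewrites quoted href and src values in an extracted fragment against the URL of the page it came from. The function name absolutize and the example.com URLs are assumptions for illustration only. It is a regex shortcut rather than a real HTML rewrite, so it ignores unquoted attribute values, srcset, CSS url(...) references and any <base> tag on the source page.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...'; the lookbehind skips names like data-src.
URL_ATTR = re.compile(
    r"""(?<![\w-])(href|src)\s*=\s*(["'])(.*?)\2""",
    re.IGNORECASE | re.DOTALL,
)


def absolutize(fragment, base_url):
    """Resolve quoted href/src values in an HTML fragment against base_url."""

    def fix(match):
        attr, quote, value = match.groups()
        # urljoin leaves values that are already absolute untouched.
        return f"{attr}={quote}{urljoin(base_url, value)}{quote}"

    return URL_ATTR.sub(fix, fragment)
```

For example, with a base of https://example.com/markets/index.html, a value of chart.png would resolve to https://example.com/markets/chart.png and /about to https://example.com/about.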

I can't think of a faster way. You could write your own code to step through the HTML one line at a time, but I don't think you'd find any efficiency gains that way, and you would probably find that it is slower or more resource-intensive.

I agree. It's probably necessary to pull in the whole HTML whether one uses DOMDocument or not...

I don't see any way to pull only the div you want without pulling the whole HTML document, unless the site you are pulling from provides a service you can call for the specific content directly.