Giacomo Fiorentini writes:
> I'd like to know what the semantic difference is between the following two
> fields in the HTRequest structure:
>
> struct _HTRequest {
> .....
> HTParentAnchor * anchor ; /* The client anchor for this
> request */
> HTParentAnchor * parentAnchor ; /* For refere fields */
> .....
> } HTRequest ;
>
> My doubt comes from the current implementation of the Robot: in
> the HText_beginAnchor definition, every call to the method HTRequest_parent
> returns a null pointer. The problem is that this method returns the second
> field of the HTRequest structure, which is always NULL. The solution is
> trivial: we must simply read the first field. That is:
Correct! Let me start by showing the full function as implemented in HTRobot.c,
which is one of the example applications available from
http://www.w3.org/pub/WWW/Distribution.html
It uses the HText interface a bit differently than the Line Mode Browser does,
in that it doesn't present anything to the user. Instead, the HTML parser
calls out to the HText interface whenever it finds an anchor element in an HTML
document. This version of the HText interface then starts a new request unless
the URL is already being loaded.
PUBLIC void HText_beginAnchor (HText * text, HTChildAnchor * anchor)
{
    if (text && anchor) {
        /* Get the application context for this request */
        Robot * mr = (Robot *) HTRequest_context(text->request);
        /* Get the main destination of this anchor */
        HTAnchor * dest = HTAnchor_followMainLink((HTAnchor *) anchor);
        /* Take the parent of the main destination (may be itself) */
        HTParentAnchor * dest_parent = HTAnchor_parent(dest);
        /* Get the state of the parent anchor */
        HyperDoc * hd = HTAnchor_document(dest_parent);
        /* Test whether we already have a hyperdoc for this document
        ** We also check if we are at the specified bottom level of the tree
        ** for example 3 hops from the start page
        */
        if (mr->flags & MR_LINK && dest_parent && !hd) {
            /* Find the parent anchor which started this request */
            HTParentAnchor * parent = HTRequest_parent(text->request);
            /* See if this parent has a hyperdoc associated with it */
            HyperDoc * last = HTAnchor_document(parent);
            /* If it has then get the depth else assume top node (depth = 0) */
            int depth = last ? last->depth+1 : 0;
            /* Create a new request and put it into context */
            HTRequest * newreq = Thread_new(mr, METHOD_GET);
            /* This is where we need to make the link between this request
            ** and the previous request
            */
            /* Create a new hyperdoc object and associate it with this anchor */
            HyperDoc_new(mr, dest_parent, depth);
            if (SHOW_MSG) {
                char * uri = HTAnchor_address((HTAnchor *) dest_parent);
                TTYPrint(TDEST, "Robot....... Loading `%s\'\n", uri);
                free(uri);
            }
            /* Start the load */
            if (HTLoadAnchor((HTAnchor *) dest_parent, newreq) != YES) {
                if (SHOW_MSG) TTYPrint(TDEST, "Robot...... URI Not tested!\n");
                Thread_delete(mr, newreq);
            }
        }
    }
}
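To make the difference between the two fields concrete, here is a minimal
stand-alone sketch. It does not use the Library at all: ToyDoc, ToyAnchor,
ToyRequest and the toy_* functions are hypothetical stand-ins for HyperDoc,
HTParentAnchor, HTRequest and the accessor calls above. It only illustrates
why the depth stays at 0 as long as nobody sets the parent link, and how
pointing it at the anchor of the previous request lets the depth grow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the Library's real types */
typedef struct { int depth; } ToyDoc;            /* plays the role of HyperDoc */
typedef struct { const char *uri; ToyDoc *doc; } ToyAnchor;
typedef struct {
    ToyAnchor *anchor;        /* the document this request is loading */
    ToyAnchor *parentAnchor;  /* the document that contained the link */
} ToyRequest;

/* Same rule as in HText_beginAnchor: depth of the parent's document + 1,
** or 0 when the request has no parent (or the parent has no document)
*/
static int toy_next_depth(const ToyRequest *req)
{
    assert(req != NULL);
    const ToyAnchor *parent = req->parentAnchor;
    const ToyDoc *last = parent ? parent->doc : NULL;
    return last ? last->depth + 1 : 0;
}

/* Link a new request to the request whose document contained the link */
static void toy_link(ToyRequest *newreq, const ToyRequest *prev)
{
    assert(newreq != NULL && prev != NULL);
    newreq->parentAnchor = prev->anchor;
}
```

With a request that was never linked, toy_next_depth() keeps returning 0.
After toy_link(), it returns the parent's depth plus one.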
> Here I have another question :
> In the Robot, why is the start HyperDoc object not linked to
> the first anchor created in the Main function ? Perhaps for question of
> flexibility ?
You're correct that this is indeed missing - good point! It means that if
the start page contains a link to itself, the start page is loaded twice.
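The effect can be sketched with a toy registry of documents that have already
been scheduled. Everything here (ToyRegistry, toy_seen, toy_schedule) is
hypothetical and only mimics the "do we already have a hyperdoc for this
anchor?" test. It is not how the Library stores this information.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DOCS 16

/* Hypothetical registry of documents already scheduled for loading */
typedef struct {
    const char *uri[TOY_MAX_DOCS];
    int count;
} ToyRegistry;

/* Returns 1 if uri is already registered, 0 otherwise */
static int toy_seen(const ToyRegistry *reg, const char *uri)
{
    for (int i = 0; i < reg->count; i++)
        if (strcmp(reg->uri[i], uri) == 0) return 1;
    return 0;
}

/* Registers uri and returns 1 if it should be loaded;
** returns 0 if it is a duplicate or the registry is full
*/
static int toy_schedule(ToyRegistry *reg, const char *uri)
{
    assert(reg != NULL && uri != NULL);
    if (toy_seen(reg, uri) || reg->count >= TOY_MAX_DOCS) return 0;
    reg->uri[reg->count++] = uri;
    return 1;
}
```

If the start page is never registered, the first link back to it is scheduled
like any other new document. Registering it up front turns that link into a
duplicate.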
> Congratulations. You've made a very good job.
Thanks - so have you! I have included a set of patches below which fix the
problems and which add the ability to use depth as a means of specifying how
many hops from the start page the robot will follow. You can also get the full
updated version directly from
http://www.w3.org/pub/WWW/Robot/Implementation/HTRobot.c
It compiles without changes against the version of the Library that I
announced two days ago with the new set of Windows make files.
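As a rough stand-alone illustration of the two depth-related pieces of the
patch below, here is a sketch of the argument handling and of the choice
between GET and HEAD. The toy_* names and ToyMethod are hypothetical and are
not part of the Library or of HTRobot.c. The name of the command line option
is not visible in the patch, so the sketch only assumes that argv[*arg] is
that option.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_DEFAULT_DEPTH 0

typedef enum { TOY_GET, TOY_HEAD } ToyMethod;   /* hypothetical */

/* argv[*arg] is assumed to be the depth option. If the next argument
** exists and does not look like another option, consume it as the value;
** otherwise fall back to the default depth.
*/
static int toy_parse_depth(int argc, char **argv, int *arg)
{
    assert(argv != NULL && arg != NULL);
    if (*arg + 1 < argc && *argv[*arg + 1] != '-')
        return atoi(argv[++*arg]);
    return TOY_DEFAULT_DEPTH;
}

/* At or beyond the maximum depth only the headers are requested */
static ToyMethod toy_method_for(int depth, int max_depth)
{
    return depth >= max_depth ? TOY_HEAD : TOY_GET;
}
```

Note that with the default depth of 0 every link is at or beyond the limit, so
in this sketch they would all be checked with HEAD only.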
Henrik
0
\ /
-- CLIP -- CLIP -- CLIP -- x -- CLIP -- CLIP -- CLIP -- CLIP -- CLIP -- CLIP --
/ \
0
===================================================================
RCS file: /afs/w3.org/hypertext/WWW/Robot/Repository/Implementation/HTRobot.c,v
retrieving revision 1.6
diff -r1.6 HTRobot.c
34a35
> #define DEFAULT_DEPTH 0
270a272,274
> **
> ** BUG: This doesn't work as we don't get the right request object
> ** back from the event loop
275a280
> #if 0
277a283
> #endif
312a319
> char * uri = HTAnchor_address((HTAnchor *) dest_parent);
314a322,323
> if (SHOW_MSG) TTYPrint(TDEST, "Robot....... Found `%s\' - ", uri ? uri : "NULL");
>
322,325c331,338
< if (SHOW_MSG) {
< char * uri = HTAnchor_address((HTAnchor *) dest_parent);
< TTYPrint(TDEST, "Robot....... Loading `%s\'\n", uri);
< free(uri);
---
> HTRequest_setParent(newreq, HTRequest_anchor(text->request));
> if (depth >= mr->depth) {
> if (SHOW_MSG)
> TTYPrint(TDEST, "loading at depth %d using HEAD\n", depth);
> HTRequest_setMethod(newreq, METHOD_HEAD);
> HTRequest_setOutputFormat(newreq, WWW_MIME);
> } else {
> if (SHOW_MSG) TTYPrint(TDEST, "loading at depth %d\n", depth);
328c341
< if (SHOW_MSG) TTYPrint(TDEST, "Robot...... URI Not tested!\n");
---
> if (SHOW_MSG) TTYPrint(TDEST, "not tested!\n");
330a344,345
> } else {
> if (SHOW_MSG) TTYPrint(TDEST, "duplicate\n");
331a347
> FREE(uri);
464a481,482
> mr->depth = (arg+1 < argc && *argv[arg+1] != '-') ?
> atoi(argv[++arg]) : DEFAULT_DEPTH;
506a525
> HyperDoc_new(mr, mr->anchor, 0);