lynx-dev cache control (was: Keeping browsers from caching)

From: Klaus Weide
Subject: lynx-dev cache control (was: Keeping browsers from caching)
Date: Sat, 11 Sep 1999 21:42:50 -0500 (CDT)

On Fri, 10 Sep 1999, Philip Webb wrote:
> 990910 Klaus Weide wrote:
> > On Fri, 10 Sep 1999, someone wrote:
> > > how can I tell Lynx to ignore Cache-Control?
> > You can't, unless you want to change the source
>
> yes i do, at least for my to-do list,
> if i ever find time to start exploring Lynx source:
> i'ld like to be able to say, eg in lynx.cfg ,
> IGNORE_CACHE_CONTROL:YES
> do you have any pointers, comments or suggestions?
Well I'd have lots if you were going to start doing something now...
There should be much more detailed control; just one binary flag
doesn't cut it...
Lynx should implement the right checks for when to reload a document
from the network. It should check whether its copy is 'fresh' according
to the right logic, which is in the HTTP specs. Even doing only part
of all that (there are numerous Cache-control directives and other
relevant headers) would be a worthy improvement.
The current logic isn't very good when it comes to looking at Expires
headers (when there is no 'no-cache' in other headers). Other
relevant info is completely ignored (Cache-control directives other
than no-cache, Last-Modified headers...). All cacheability is just
collapsed by lynx into one no_cache flag, which is set to either
TRUE or FALSE at the time when a document is received (or sometimes
before, when we create the request). That's a poor model of what
should happen.
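
As a rough illustration of keeping more than one flag, the sketch below parses a Cache-Control value into its separate directives and stores them next to the other relevant headers. It is written in Python for brevity and is not Lynx code; `parse_cache_control` and `CacheInfo` are hypothetical names, and the comma split is a simplification that ignores quoted arguments containing commas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}.

    Simplification: splits on every comma, so a quoted argument that
    itself contains commas is not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


@dataclass
class CacheInfo:
    """Hypothetical per-document record replacing a single no_cache flag."""
    received_at: float                   # local time the response arrived
    date: Optional[str] = None           # raw Date header
    expires: Optional[str] = None        # raw Expires header
    last_modified: Optional[str] = None  # raw Last-Modified header
    cache_control: Dict[str, Optional[str]] = field(default_factory=dict)


info = CacheInfo(received_at=0.0,
                 cache_control=parse_cache_control("max-age=120, must-revalidate"))
```

With a record like this, the decision to reload can be made later, each time the document is requested again, instead of once at receipt.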
For example, if Expires is present, the expiration time is compared
to the current _local_ time (after taking account of the timezone,
of course). If it appears to be in the past, the doc is marked
no_cache. If it is in the future, it isn't. For small time differences
this depends on clocks being "right"; if they aren't exactly
synchronized, it can lead to arbitrary results. Or, if a server sets
a small lifetime like "expires 2 minutes in the future", lynx should
really honor that (or at least have a mode where it does). But it
doesn't: the copy stays non-no_cache for as long as we keep it. Such
"softer" expirations, as well as the common pre-expiration (Expires has
a time in the past), are in general more worth honoring than the
blatant "no-cache" - somebody has thought about it, there may even be
a good reason.
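
One way to sidestep the clock problem, sketched here in Python rather than as Lynx code, is to take the lifetime as Expires minus Date (both stamped by the server's clock) and measure the age only with the local clock. The function names are hypothetical, and treating an unparsable Expires as already expired is an assumption in line with the HTTP/1.1 spec.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(date_hdr: str, expires_hdr: str) -> float:
    """Seconds the server considers the response fresh (Expires - Date).

    Both values come from the server, so a skewed local clock does not
    matter. Unparsable or pre-expired values yield 0.
    """
    try:
        served = parsedate_to_datetime(date_hdr)
        expires = parsedate_to_datetime(expires_hdr)
        return max(0.0, (expires - served).total_seconds())
    except (TypeError, ValueError):
        return 0.0


def is_fresh(date_hdr: str, expires_hdr: str,
             received_at: float, now: float) -> bool:
    """Compare locally measured age against the server-defined lifetime."""
    age = now - received_at  # elapsed local time since we got the copy
    return age < freshness_lifetime(date_hdr, expires_hdr)


DATE = "Sat, 11 Sep 1999 21:42:50 GMT"
EXPIRES = "Sat, 11 Sep 1999 21:44:50 GMT"   # "2 minutes in the future"
```

Here `freshness_lifetime(DATE, EXPIRES)` is 120.0, so a copy received at local time 1000.0 is still fresh at 1060.0 but not at 1130.0. A pre-expired document (Expires earlier than Date) gets a lifetime of 0 and is never fresh.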
And so on. As well as a more detailed mechanism, there should then
be more options to control the behavior. Not just whether to
honor no-cache and Expires or not, but possibly which of them,
ignore them only in META (where the more clueless authors put them),
specify a max-stale or min-fresh time (by how much do I want to
keep a document longer than the server said it is "fresh"), adjustable
heuristics for documents with Last-Modified...
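
A minimal sketch of what such options could look like, again in Python and not taken from Lynx: `CachePolicy` and its fields are hypothetical names, and the 10% default for the Last-Modified heuristic is the fraction the HTTP/1.1 spec mentions as typical. Reusing a copy when there is no lifetime information at all is an assumption that mirrors the keep-as-long-as-cached behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachePolicy:
    """Hypothetical user-settable knobs (e.g. from a config file)."""
    honor_expires: bool = True        # respect server-given lifetimes
    max_stale: float = 0.0            # extra seconds of staleness tolerated
    heuristic_fraction: float = 0.1   # share of time since Last-Modified


def effective_lifetime(policy: CachePolicy,
                       explicit_lifetime: Optional[float],
                       since_modified: Optional[float]) -> Optional[float]:
    """Pick the lifetime to use, or None if there is nothing to go on."""
    if explicit_lifetime is not None and policy.honor_expires:
        return explicit_lifetime
    if since_modified is not None:
        # Heuristic: a document unchanged for a long time probably
        # stays unchanged a while longer.
        return policy.heuristic_fraction * max(0.0, since_modified)
    return None


def may_reuse(policy: CachePolicy, age: float,
              explicit_lifetime: Optional[float] = None,
              since_modified: Optional[float] = None) -> bool:
    """True if the cached copy may be shown without refetching."""
    lifetime = effective_lifetime(policy, explicit_lifetime, since_modified)
    if lifetime is None:
        return True  # assumption: no information means keep the copy
    return age <= lifetime + policy.max_stale


lenient = CachePolicy(max_stale=60.0)
```

With `lenient`, a document whose server lifetime is 120 seconds is still reused at age 150 but refetched at age 200.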
That's only part of the complex. In addition to controlling when
_we_ refetch the document, we also need to determine which headers
and directives to put in the request. Sending no-cache or not makes
an important difference for users of proxy caches (not to mention other
more subtle Cache-control request directives).
Then there's If-Modified-Since, which we don't send, and 304 responses.
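
The request side can be sketched as follows, in Python and with hypothetical function names. The sketch assumes a forced reload sends no-cache (as both Cache-Control and the older Pragma form) so that proxy caches pass the request through, while an ordinary revisit sends If-Modified-Since with the stored Last-Modified value and reuses the cached body on a 304.

```python
from typing import Dict, Optional


def build_request_headers(last_modified: Optional[str] = None,
                          force_reload: bool = False) -> Dict[str, str]:
    """Choose the cache-related request headers for one fetch."""
    headers: Dict[str, str] = {}
    if force_reload:
        # Ask proxy caches not to answer from their own stored copy.
        headers["Cache-Control"] = "no-cache"
        headers["Pragma"] = "no-cache"
    elif last_modified:
        # Conditional GET: the server may reply 304 with no body.
        headers["If-Modified-Since"] = last_modified
    return headers


def select_body(status: int, cached_body: bytes, response_body: bytes) -> bytes:
    """On 304 Not Modified keep the cached copy, otherwise use the new one."""
    return cached_body if status == 304 else response_body


revisit = build_request_headers(last_modified="Fri, 10 Sep 1999 12:00:00 GMT")
reload_ = build_request_headers(force_reload=True)
```

The point of the split is that a plain revisit costs only a header exchange when the document has not changed, while an explicit reload still reaches the origin server.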
There has been rather detailed discussion of this area before.
(Very roughly around Oct - Dec 98, IIRC - and probably other times)
Is that enough "pointers, comments or suggestions" for now? :)
Klaus