std blocks vs blocks+cache (was: block behavior)

Astro#1 / 71

Quote:

>> >> > 4IM is intentionally not ANS compliant in many ways. BUFFER is a
>> >> > constant that points to the beginning of the only memory area where
>> >> > a BLOCK is read into. BLOCK takes a number but doesn't leave anything
>> >> > on the stack. There's a separate cache that keeps a block from being
>> >> > read from disk more than once, but you can only access the current
>> >> > block, and only at BUFFER.

>> >> Oh dear.

>> > Please elaborate. Do you think it is worse than the classic behaviour
>> > or worse than the behaviour of the standard?

>Yes, much worse than both (they aren't mutually inconsistent).

>> Given that the standard more or less codifies the classic behaviour
>> this is certainly worse, yes. Having only a single buffer results in
>> some very {*filter*} behaviour. Load blocks don't work too well, for
>> example. Database indexes too.

From the standard:

7.3.3 Block buffer regions

The address of a block buffer returned by BLOCK or BUFFER is transient. A call to BLOCK or BUFFER may render a previously-obtained block-buffer address invalid, as may a call to any word that:

- parses [...]
- displays characters on the user output device [...]
- controls the user output device, [...]
- receives or tests for the presence of characters from the user input device such as [...]
- waits for a condition or event, such as [...]
- manages the block buffers, such as [...]
- performs any operation on a file or file-name directory that implies I/O, such as [...]
- implicitly performs I/O, such as text interpreter nesting and un-nesting when files are being used (including un-nesting implied by THROW [...]

If I interpret this correctly, this means that you have, among other things, to save a copy of the buffer if you want to print it out. A standard system should also not rely on the availability of multiple buffers in the system. In other words, the user has to copy the buffer to a safe place because it may vanish if the washing machine turns on. If one takes advantage of particular knowledge of one's system, in particular the conditions under which the buffer actually goes fishing and how many buffers can be accessed simultaneously, things are better, because informed assumptions spare the cost of the copy.

Quote:

>The classic/standard behavior is the result of efforts over the years
>to produce optimal performance and utility. Actual reads and writes

I'm not sensitive to this kind of argument. There are several obsolete words in the standard.

Quote:

>are minimized (because that's what determines performance), and
>provision is made for a number of variations that support further
>optimization based on knowledge of how a system will be used.
>For example, in a large multiuser system, a large number of buffers
>will ensure that more blocks can remain in memory longer. When
>there will be many sequential operations, however, a small number
>of buffers (e.g. 2) gives the best performance. A single buffer
>such as you have pretty much ensures the worst possible performance.

Is my interpretation of the standard wrong, then? "A call to BLOCK or BUFFER may render a previously-obtained block-buffer address invalid, as may a call to any word that: [wash dishes]"

Quote:

>Cheers,
>Elizabeth

The interface I have adopted for BUFFER and BLOCK is designed as if only one buffer is available. It greatly simplifies user applications because the block buffer is at a constant location. On the system side, this does not prevent having a block cache to guarantee good performance. I have implemented such a cache in the standalone version. It features write-through and read-ahead, and it works pretty well. In my experience it also simplifies the cache code. For the DOS version I did not bother with a disk cache, because an external one (SmartDrive, for instance) can be used.

Besides, I think that the cost of the buffer copy needed to make up for the lack of multiple block buffers is offset by the fact that one doesn't have to drag addresses in and out, or make multiple calls to BUFFER. If someone could be so kind as to provide me with sample code for simple database management, I could verify this myself.

Regards, Astrobe

Sat, 15 Oct 2005 20:06:10 GMT

Michael L Gassanenk#2 / 71

Quote:

> The interface I have adopted for BUFFER and BLOCK is designed as if only
> one buffer is available. It greatly simplifies user applications
> because the block buffer is at a constant location.

If your code depends on a single buffer at a constant location, your code will most likely break on other systems.

I myself (years ago) found it valuable to have at least 3 buffers: 2 are needed to copy one block to the other and 1 more is needed for input source interpretation.

As to writing portable code, you have to call BLOCK before [...] you cannot rely on 2 blocks being in memory at the same time. BTW, even if you have 2 buffers, you can UPDATE only one block.

> On the system side this does not prevent having a block cache to
> guarantee good performance. I have implemented such a cache on the
> standalone version.

I think the manufacturer of your HDD has already implemented a 2 MB cache.

Quote:

> If someone could be so kind as to provide me with sample code for
> simple database management, I could verify it myself.

Do you mean block or DB code? If you are asking for a sample BLOCK word set implementation, there are many around.

Sat, 15 Oct 2005 23:45:48 GMT

andre..#3 / 71

Quote:

>> The interface I have adopted for BUFFER and BLOCK is designed as if only
>> one buffer is available. It greatly simplifies user applications
>> because the block buffer is at a constant location.
> If your code depends on a single buffer at a constant location,
> your code will most likely break on other systems.

Not if it's correct.

Quote:

> I myself (years ago) found it valuable to have at least 3 buffers:
> 2 are needed to copy one block to the other and 1 more is needed
> for input source interpretation.
> As to writing portable code, you have to call BLOCK before
> [...] you cannot rely on 2 blocks being in memory
> at the same time. BTW, even if you have 2 buffers, you can UPDATE
> only one block.
> For example, the code for exchanging contents of 2 blocks becomes:
> [...]

Above you say "you cannot rely on 2 blocks being in memory at the same time", so you know this can't work.

Andrew.

Sat, 15 Oct 2005 23:07:38 GMT

Jonah Thoma#4 / 71

Quote:

> From the standard:
> 7.3.3 Block buffer regions
> The address of a block buffer returned by BLOCK or BUFFER is transient.
> A call to BLOCK or BUFFER may render a previously-obtained block-buffer
> address invalid, as may a call to any word that:
> parses [...]
> displays characters on the user output device [...]
> controls the user output device, [...]
> receives or tests for the presence of characters from the user
> input device such as [...]
> waits for a condition or event, such as [...]
> manages the block buffers, such as [...]
> performs any operation on a file or file-name directory that
> implies I/O, such as [...]
> implicitly performs I/O, such as text interpreter nesting and
> un-nesting when files are being used (including un-nesting implied
> by THROW [...]
> If I interpret this correctly this means that you have, among other
> things, to save a copy of the buffer if you want to print it out.

No, it doesn't.

For example, you can do 26 LIST and it does BLOCK and displays the buffer as text. Say you want to do it yourself, for some reason you only want to display lines 3 and 12.

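: .LINE ( block# line# -- )
   64 CHARS *                \ offset of the line within the block
   SWAP BLOCK + 64 TYPE ;    \ fetch the buffer address fresh, then type 64 chars

: MY-LIST ( block# -- )
   DUP CR 3 .LINE CR 12 .LINE ;   \ show lines 3 and 12 only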

You don't save the buffer contents, you just get them fresh just before you use them. Any time you do something that might mess up the buffer, grab it again.

The list of things that might mess up the buffer are the same things that will do PAUSE and a task switch on a traditional cooperative multitasker. When you do BLOCK you get a task switch, and when your task starts up again your buffer is ready. Then you can use it for awhile, and the first time you do any sort of I/O some other task starts up and maybe replaces your block. (I'm not sure that's how it works but it looks plausible.) So you get your block back and use it. No big deal.

No matter how many block buffers you have, you can't be sure that some block buffer you care about won't get overwritten. So it makes sense to do BLOCK before you use the buffer, each time.
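As a rough illustration of that habit (a sketch only: the counter layout and the word names COUNTER@, COUNTER! and BUMP are made up for the example), words that take a block number and call BLOCK at the point of use never need to hold on to a buffer address:

```forth
\ Hypothetical layout: one cell-sized counter stored at offset 0 of a block.
: COUNTER@ ( block# -- n )  BLOCK @ ;          \ fetch the address, use it at once
: COUNTER! ( n block# -- )  BLOCK ! UPDATE ;   \ store, then mark the buffer modified
: BUMP     ( block# -- )    DUP COUNTER@ 1+ SWAP COUNTER! ;
```

BUMP calls BLOCK twice for the same block number, so it never depends on an address obtained earlier still being valid.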

Quote:

> A standard system should also not rely on the availability of > multiple buffers in the system. In other words the user has to copy > in a safe place it's buffer because it may vanish if the washing > machine turns on.

The washing machine won't turn on until you do PAUSE . Your buffer will last until after you've used it, unless you mistakenly do a CR or something before you use it.

The Korean hForth was the first one I saw that used a single block buffer. I was surprised to find it can work with just one buffer, but it can. The thing that gave me the most trouble with a single block buffer was that sometimes I wanted to MOVE data from one block to another; that just doesn't work with one buffer, although it works when you have 2 buffers.
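One way around the block-to-block MOVE problem, sketched here under stated assumptions rather than taken from any system discussed in the thread, is to go through a private scratch area so that only one block buffer is needed at a time. The names B/BLK, SCRATCH and COPY-BLOCK are hypothetical, and the sketch assumes 1024-character blocks with one character per address unit:

```forth
\ Assumes 1024-character blocks and that one character occupies one address unit.
1024 CONSTANT B/BLK                 \ hypothetical name for the block size
CREATE SCRATCH  B/BLK ALLOT         \ private copy area, outside the buffer pool

: COPY-BLOCK ( src# dst# -- )
   SWAP BLOCK  SCRATCH B/BLK MOVE   \ read the source, copy it out of the buffer
   BUFFER                           \ claim a buffer for dst# without reading it
   SCRATCH SWAP B/BLK MOVE          \ copy the saved data into that buffer
   UPDATE ;                         \ mark it so it gets written back
```

For example, `10 20 COPY-BLOCK FLUSH` would copy block 10 over block 20 and force the write. The price is one extra in-memory copy of the block.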

Sat, 15 Oct 2005 23:31:38 GMT

Bill Parke#5 / 71

Quote:

> The technique is not to keep track of the address of the buffer but to > call BLOCK with the block number whenever the address is needed.

I believe I understand now what to expect when using 4IM. Your statement above makes me curious about what happens in other FORTHs which are more aligned to the ANSI standard.

If one uses BLOCK to retrieve the address of the buffer, what happens to changes you have made to the buffer but have not necessarily decided to commit via UPDATE yet? Assuming no other disk I/O is done in-between, is each call to BLOCK guaranteed to return the same buffer address? Will that buffer still contain your uncommitted changes or will it always be reloaded with the original disk contents?

Quote:

> > On the system side this does not prevent to have a block cache to
> > guarantee good performance.

> Sure it does. If there is only one buffer, you have no cache.
> Unless, of course, you want to do yet another layer of buffering
> beneath the Forth buffers, and what's the point of that? You've made
> the system more complex for no gain.

For what it is worth, I'm thinking there may be an advantage for my intended use. It appears that Astrobe has minimized the required footprint in the x86 64K code/data segment. For me, this is going to be a big deal and had I started with a system which allocated multiple buffers then I would have ended up removing them anyway sooner or later. On the other hand, I would have gladly lived without a disk cache. Now, 4IM has me thinking that the disk cache buffers can just live up in all those other unused 64K segments similar to what would happen if I was running DOS and a RAMDISK in HIGH memory.

I could be wrong about this...I'm still real early in the process here.

Sun, 16 Oct 2005 00:02:36 GMT

andre..#6 / 71

Quote:

> If one uses BLOCK to retrieve the address of the buffer, what
> happens to changes you have made to the buffer but have not
> necessarily decided to commit via UPDATE yet?

You'll lose them. Or rather, you might lose them.

Quote:

> Assuming no other disk I/O is done in-between, is each call to BLOCK
> guaranteed to return the same buffer address? Will that buffer
> still contain your uncommitted changes or will it always be reloaded
> with the original disk contents?

It depends on what other system activity is going on.

Andrew.

Sun, 16 Oct 2005 00:14:15 GMT

andre..#7 / 71

Quote:

> No, it doesn't.
> For example, you can do 26 LIST and it does BLOCK and displays the
> buffer as text. Say you want to do it yourself, for some reason you
> only want to display lines 3 and 12.
> : .LINE ( block# line# -- )
>    64 CHARS *
>    SWAP BLOCK + 64 TYPE ;
> : MY-LIST ( block# -- )
>    DUP CR 3 .LINE CR 12 .LINE ;
> You don't save the buffer contents, you just get them fresh just
> before you use them. Any time you do something that might mess up the
> buffer, grab it again.

I don't think it's safe to TYPE out of a buffer.

Andrew.

Sun, 16 Oct 2005 00:15:33 GMT

Elizabeth D Rathe#8 / 71

Quote:

>> ...
>>For example, you can do 26 LIST and it does BLOCK and displays the
>>buffer as text. Say you want to do it yourself, for some reason you
>>only want to display lines 3 and 12.

>>: .LINE ( block# line# -- )
>>   64 CHARS *
>>   SWAP BLOCK + 64 TYPE ;

>>: MY-LIST ( block# -- )
>>   DUP CR 3 .LINE CR 12 .LINE ;

>>You don't save the buffer contents, you just get them fresh just
>>before you use them. Any time you do something that might mess up the
>>buffer, grab it again.

> I don't think it's safe to TYPE out of a buffer.

To amplify Andrew's comment a little, the whole issue here revolves around the historical principle that in a multiuser Forth the users share the buffer pool. A buffer address is valid so long as you don't do anything that can let another user request a buffer (and maybe get yours). The list of prohibitions in the standard includes words that may involve a task swap.

In the case of TYPE, a multitasking system would certainly relinquish the CPU for the typing. Therefore, your buffer address is no longer valid.

Our approach has been to make a word >TYPE that can be used to type from a buffer, moving the string to be typed to a temporary location.
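A minimal sketch of such a word follows. This is not FORTH, Inc.'s actual definition, which is not shown in the thread; it simply assumes the string fits in the PAD area (at least 84 characters on a Standard system) and copies it there before typing:

```forth
\ Sketch only: copy the string to PAD first, then TYPE from the copy.
\ Assumes u is no larger than the PAD area.
: >TYPE ( c-addr u -- )
   >R  PAD R@ CHARS MOVE    \ copy u characters out of the block buffer
   PAD R> TYPE ;            \ a task switch inside TYPE now only sees the copy
```

With this, the earlier .LINE could end in `64 >TYPE` instead of `64 TYPE`, since one 64-character line fits comfortably in PAD.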

Jonah is correct in observing that one should call BLOCK for every component you want, rather than assuming your buffer address will persist over time.

"Forth-based products and Services for real-time applications since 1973." ==================================================

Sun, 16 Oct 2005 01:30:23 GMT

Elizabeth D Rathe#9 / 71

Quote:

> [summary of Standard cautions snipped]
>>If I interpret this correctly this means that you have, among other
>>things, to save a copy of the buffer
>>if you want to print it out.

As noted above, a better approach is to buffer each string you're typing.

Quote:

>>A standard system should also not rely on
>>the availability of multiple buffers in the system.

> This isn't a standards compliance issue. This is a quality of
> implementation issue. A system with only one block buffer will in
> many cases perform poorly.

Moreover, a _system_ can depend on its own implementation characteristics. It's only _programs_ that wish to be portable that need to be sensitive about portability issues. A system should provide whatever resources it thinks can guarantee good performance for users.

Quote:

>>>The classic/standard behavior is the result of efforts over the years
>>>to produce optimal performance and utility. Actual reads and writes

>>I'm not sensitive to this kind of argument.

A Standard isn't a tutorial. Many books on Forth (including mine) explain the assumptions around the block I/O concept.

Quote:

>>>are minimized (because that's what determines performance), and
>>>provision is made for a number of variations that support further
>>>optimization based on knowledge of how a system will be used.
>>>For example, in a large multiuser system, a large number of buffers
>>>will ensure that more blocks can remain in memory longer. When
>>>there will be many sequential operations, however, a small number
>>>of buffers (e.g. 2) gives the best performance. A single buffer
>>>such as you have pretty much ensures the worst possible performance.

>>My interpretation of the standard is wrong, then ?
>>"A call to BLOCK or BUFFER may render a previously-obtained
>>block-buffer address invalid, as may a call to any word that:

>>[wash dishes]"

The key word here is "may". In a multiuser system, it's a statistical game that depends on what other users are doing. A block will stay in a buffer until it is reused for another block; therefore, repeated calls to BLOCK for a block# which is already in a buffer won't cause I/O. In a single-user system, there's no competition for buffers.

Quote:

>>The interface I have adopted for BUFFER and BLOCK is design as if only
>>one buffer is available. It simplifies greatly user applications
>>because the block buffer is at a constant location.

> It doesn't, unless the application is exceedingly badly written. The
> technique is not to keep track of the address of the buffer but to
> call BLOCK with the block number whenever the address is needed.

Exactly. The application can't assume the address is constant without having a dependency on the unusual implementation strategy of having only one buffer. And if you're doing something like copying data from one block to another, having only one buffer will slow you down a lot. As I noted above, repeated calls to BLOCK for a block already in a buffer don't cause more I/O; they should be extremely quick.

Quote:

>>On the system side this does not prevent to have a block cache to
>>guarantee good performance.

> Sure it does. If there is only one buffer, you have no cache.
> Unless, of course, you want to do yet another layer of buffering
> beneath the Forth buffers, and what's the point of that? You've made
> the system more complex for no gain.

The buffer scheme is intended as a cache. That's its primary purpose.

Quote:

>>I have implemented such a cache on the standalone version.

> So why not have a few block buffers and be done with it? Really, this
> is ridiculous. Have you ever written an application on a system that
> uses the standard scheme?

Having a single buffer and a separate, invisible cache is an added layer of complexity and inefficiency that you can avoid by understanding the main purpose of the buffer scheme to begin with.

Sun, 16 Oct 2005 01:51:04 GMT

Elizabeth D Rathe#10 / 71

Quote:

>>If one uses BLOCK to retrieve the address of the buffer, what
>>happens to changes you have made to the buffer but have not
>>necessarily decided to commit via UPDATE yet?

> You'll lose them. Or rather, you might lose them.

Exactly. One should use UPDATE immediately following any change (!, C!, MOVE, etc.) that you wish to ensure is recorded permanently.

UPDATE merely sets a flag; it isn't expensive, since it doesn't trigger a write until the buffer is needed for a different block.
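To make the pattern concrete, here is a small sketch of a word that changes one character in a block and marks the buffer immediately afterwards. The name BLOCK-C! is hypothetical, and the offset arithmetic assumes one character per address unit:

```forth
\ Hypothetical helper: store one character at a byte offset within a block.
: BLOCK-C! ( char offset block# -- )
   BLOCK +      \ address of the target character in the buffer
   C!           \ change the buffer contents
   UPDATE ;     \ flag the most recently referenced buffer as modified
```

Typing `CHAR A 0 26 BLOCK-C!` would put an "A" at the start of block 26; a later FLUSH (or reuse of the buffer) is what actually writes it out.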

Quote:

>>Assuming no other disk I/O is done in-between, is each call to BLOCK
>>guaranteed to return the same buffer address?

Yes. In a single-user system, you know whether or not you're doing other disk I/O. In a multi-user system, other tasks may be accessing the buffer pool. Subsequent calls to BLOCK for a block already in a buffer should return that address with no I/O done.

Quote:

>>Will that buffer
>>still contain your uncommitted changes or will it always be reloaded
>>with the original disk contents?

> It depends on what other system activity is going on.

Exactly. So long as your buffer has not been needed for another block, everything is unchanged (including your updates). However, if you make changes and do _not_ mark the block as UPDATED, if the buffer is reused changes will not be recorded and the next time you request the block you'll get the last recorded version.

Sun, 16 Oct 2005 02:05:54 GMT

Bill Parke#11 / 71

Quote:

> >>Assuming no other disk I/O is done in-between, is each call to BLOCK
> >>guaranteed to return the same buffer address?

> Yes. In a single-user system, you know whether or not you're
> doing other disk I/O. In a multi-user system, other tasks may
> be accessing the buffer pool. Subsequent calls to BLOCK for
> a block already in a buffer should return that address with
> no I/O done.

> >>Will that buffer
> >>still contain your uncommitted changes or will it always be reloaded
> >>with the original disk contents?

> > It depends on what other system activity is going on.

> Exactly. So long as your buffer has not been needed for another
> block, everything is unchanged (including your updates). However,
> if you make changes and do _not_ mark the block as UPDATED, if
> the buffer is reused changes will not be recorded and the next
> time you request the block you'll get the last recorded version.

Think I've got it - it all seems to make sense. I hadn't even considered the issue of a multi-user/multi-tasking environment. (Which is why you won't ever see me on a standards committee... Lord knows that it's hard enough just doing a good job solving today's problem for today's customer - I can't imagine trying to pull off the balancing job required to address almost everybody's concerns across the spectrum of hardware and environments. More power to those of you who can contribute in that way!)

Thanks, Bill

Sun, 16 Oct 2005 02:52:43 GMT

Jonah Thoma#12 / 71

Quote:

> > No, it doesn't.
> > For example, you can do 26 LIST and it does BLOCK and displays
> > the buffer as text. Say you want to do it yourself, for some
> > reason you only want to display lines 3 and 12.
> > : .LINE ( block# line# -- )
> >    64 CHARS *
> >    SWAP BLOCK + 64 TYPE ;
> > : MY-LIST ( block# -- )
> >    DUP CR 3 .LINE CR 12 .LINE ;
> > You don't save the buffer contents, you just get them fresh just
> > before you use them. Any time you do something that might mess
> > up the buffer, grab it again.
> I don't think it's safe to TYPE out of a buffer.

You could be right. If TYPE is a high-level word that uses EMIT and EMIT does PAUSE then you're right. If TYPE is a primitive that does PAUSE when it's done, then it works.

I figured it *ought* to be safe because it gets too inconvenient if it isn't. I could be wrong.
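For illustration, here is what the first case might look like: a high-level TYPE written in terms of EMIT. It is named SLOW-TYPE so it does not shadow the system's own TYPE, and whether EMIT really does PAUSE is an assumption about the particular multitasker, not something the Standard specifies:

```forth
\ Illustration only: a high-level TYPE built on EMIT.
: SLOW-TYPE ( c-addr u -- )
   0 ?DO
      DUP C@ EMIT    \ if EMIT yields to another task, c-addr may go stale here
      CHAR+          \ step to the next character
   LOOP DROP ;
```

If c-addr points into a block buffer and another task claims that buffer during one of the EMITs, the rest of the string is read from whatever block now occupies it.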

Sun, 16 Oct 2005 02:54:43 GMT

Elizabeth D Rathe#13 / 71

...

Quote:

>>I don't think it's safe to TYPE out of a buffer.

> You could be right. If TYPE is a high-level word that uses EMIT and
> EMIT does PAUSE then you're right. If TYPE is a primitive that does
> PAUSE when it's done, then it works.

> I figured it *ought* to be safe because it gets too inconvenient if it
> isn't. I could be wrong.

Among the special circumstances called out in the Standard & quoted at the beginning of this thread is:

> ...
>>> I don't think it's safe to TYPE out of a buffer.
>> You could be right. If TYPE is a high-level word that uses EMIT and
>> EMIT does PAUSE then you're right. If TYPE is a primitive that does
>> PAUSE when it's done, then it works.
>> I figured it *ought* to be safe because it gets too inconvenient if it
>> isn't. I could be wrong.
> Among the special circumstances called out in the Standard & quoted at
> the beginning of this thread is:
> "displays characters on the user output device [...]"
> Sounds like TYPE to me.

Yes, I wanted to hope that TYPE would do PAUSE after it gets the string, and not before.