NX: When to update cached data

Hugh Johns

Ranch Hand

Posts: 36

posted 13 years ago

Hi. After reading some posts on using cached data to avoid the problems with exception throwing within the confines of the DBAccess interface, I now cache the database records in a collection, and read, update etc. from that collection. My question is when to update the database file with the contents of the cached collection. I am doing it at the end of updating a record, but should I also do it on shutting down the server? My plan was to implement a finalize() method in the Data class which updates the database file when the Data object is destroyed (would this cover server shutdown?). But I have read in some Java texts that the finalize() method should not be relied upon. Any ideas on when to update the database file from the cache?
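[A sketch for reference: finalize() is indeed unreliable — the JVM may never call it. A shutdown hook is the more dependable alternative for a normal server shutdown. The persist() method below is hypothetical, standing in for whatever flushes the cache to the database file.]

```java
// Minimal sketch of the shutdown-hook alternative to finalize().
public class Data {
    boolean persisted = false;   // exposed only so the example can be checked

    // hypothetical method that would flush the cached records to the file
    public void persist() {
        persisted = true;        // real code: write every cached record out
    }

    public static void main(String[] args) {
        final Data data = new Data();
        // Runs on normal JVM exit (last thread ends, System.exit, Ctrl-C).
        // It does NOT run on a power cut or kill -9, so it is a safety net,
        // not a substitute for writing updates as they happen.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                data.persist();
            }
        });
    }
}
```

Note the hook only covers orderly shutdowns — which is exactly why the posts below argue for writing updates as they occur rather than deferring everything to shutdown.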

Hi Hugh, Personally, if I were using a cache, I would have a separate thread writing the updates to the file as records were updated by clients. This way you still have the performance benefit of caching, the time required to shut down is minimised, and if there is a real problem with writing to the database you will become aware of it sooner - so there is less chance of losing hundreds of transactions. Regards, Andrew
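[For reference, a minimal sketch of what Andrew describes: clients drop updated records onto a queue and a single background thread drains it to the file. The class and method names are illustrative, and the actual file I/O is stubbed out.]

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a background writer: clients enqueue records, one thread writes.
public class AsyncWriter {
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    final AtomicInteger written = new AtomicInteger();  // counts completed writes

    public AsyncWriter() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String[] record = pending.take();  // blocks until work arrives
                    writeToFile(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // allow a clean shutdown
            }
        });
        writer.setDaemon(true);   // do not keep the JVM alive on exit
        writer.start();
    }

    // called by client threads: returns immediately, no file I/O on this thread
    public void update(String[] record) {
        pending.add(record);
    }

    // stand-in for the real RandomAccessFile seek-and-write
    void writeToFile(String[] record) {
        written.incrementAndGet();
    }
}
```

The catch, as the rest of the thread explores, is what happens to writes still in the queue when the JVM exits, and where an IOException from writeToFile() can go.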

Hi Andrew and Tony,

Andrew: Personally if I were using a cache, I would have a separate thread writing the updates to the file as records were updated by clients.

That's what I was doing in my first design: a read cache and a write cache (just a linked list of pending writes), with records still to be written being "fixed" while not yet written by my writer thread. In the end I changed my mind for simplicity's sake: I have only a read cache, and create / update / delete update the cache and perform the write to the database file directly. The problem I had with a writer thread was finding an elegant way of handling IOExceptions: from its run() method it was of course impossible to throw them, there being nobody to catch them. Best, Phil. [ September 09, 2003: Message edited by: Philippe Maquet ]
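[One common workaround for the problem Phil raises — not from this thread, just a sketch under that assumption: since run() cannot throw a checked exception, the writer thread records the failure in a shared field, and the next client call fails fast instead of writes being lost silently. All names here are illustrative.]

```java
import java.io.IOException;

// Records a failure seen by the writer thread so client threads can see it.
public class WriteFailureLatch {
    private volatile IOException lastFailure;  // set by the writer thread

    // the writer thread calls this when a file write fails inside run()
    void recordFailure(IOException e) {
        lastFailure = e;
    }

    // client threads call this before queueing new work
    public void checkHealthy() throws IOException {
        IOException failed = lastFailure;
        if (failed != null) {
            throw new IOException("a background write failed earlier", failed);
        }
    }
}
```

This surfaces the error eventually, but to a later caller than the one whose write failed — which is the heart of Max's objection below.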

Max Habibi

town drunk( and author)
Sheriff

Posts: 4118

posted 13 years ago

Originally posted by Andrew Monkhouse: Hi Hugh, Personally if I were using a cache, I would have a separate thread writing the updates to the file as records were updated by clients.

I hate to disagree with Andrew, but I think I have to here. For this assignment, I think the most straightforward approach is to update the cache directly. For one thing, if there's a problem, you want to crash right away, not at some later point when the client thinks all's well. For another, it's simpler to follow. All best, M

Wow, I get to agree with Max for once. In addition to his stated reasons, I'll add that even if you're inclined to worry about performance, I really don't think the performance of update() is a big concern. How often is it going to get called, really? The methods that concern me are find() and read() - because every time find() is called you read() every single record. The big benefit to caching all records is that it allows you to perform a find() in a fraction of the time it takes if you read each record from a file. If the update method is "slow" by comparison, because it blocks until the update is written to a file - so what? Updates are a tiny fraction of the total server load; as long as caching speeds up find() we've already achieved 99% of the benefit of caching. Writing updates to file as they occur is a very simple and reliable technique, and has negligible performance impact, IMO.
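[A minimal sketch of the combination Max and Jim describe — read cache plus write-on-change: find() is served entirely from memory, while update() changes the cache and then writes the record to disk before returning, so a failed write surfaces to the caller immediately. Names and the record layout are illustrative; the disk write is stubbed.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Read cache with write-through updates.
public class CachedData {
    private final Map<Integer, String[]> cache = new ConcurrentHashMap<>();

    // the expensive operation: no file I/O at all, just a scan of the cache
    public List<Integer> find(String prefix) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, String[]> e : cache.entrySet()) {
            if (e.getValue()[0].startsWith(prefix)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    // write-through: blocks until the record is on disk, throws on failure,
    // so the client finds out about a bad write right away
    public void update(int recNo, String[] record) {
        cache.put(recNo, record);
        writeRecord(recNo, record);
    }

    // stand-in for the real seek-and-write to the database file
    void writeRecord(int recNo, String[] record) { }
}
```

This is the "half the job" Jim defends: the find() scan that used to read N records from disk now touches only memory, while the rare update() pays the small, synchronous I/O cost.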

"I'm not back." - Bill Harding, Twister

Max Habibi

town drunk( and author)
Sheriff

Posts: 4118

posted 13 years ago

Originally posted by Jim Yingst: Wow, I get to agree with Max for once.

Personally I think caching would be very good for a database, but the assignment does not require caching. If you implement caching, you also need to implement a change log. All update statements are written to the change log so you will not lose any changes. Another thread would then write all changes from the change log to the database file in a timely batch or something. This is probably the only way to assure the ACID properties of your database. When your JVM crashes (power interrupt), the first thing to do on startup is to commit the change log to the database. That's ACID. Implementing a change log adds a lot of complexity to the database that was not required in the assignment, so I doubt it would be rewarded by the Sun judges.
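[To make the idea concrete, a hedged sketch of a change log as described above: every update is appended to a log file first, and on startup any surviving entries are replayed (later entries for the same record winning) before the server accepts clients. The file format and class names are illustrative, not from the assignment.]

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Append-only change log with crash-recovery replay.
public class ChangeLog {
    private final File logFile;

    public ChangeLog(File logFile) {
        this.logFile = logFile;
    }

    // one "recNo|value" line per update, appended before the DB write
    public void append(int recNo, String value) throws IOException {
        try (FileWriter out = new FileWriter(logFile, true)) {
            out.write(recNo + "|" + value + "\n");
        }
    }

    // run at startup after a crash: replays the log in order,
    // so the latest entry for each record wins
    public Map<Integer, String> replay() throws IOException {
        Map<Integer, String> state = new LinkedHashMap<>();
        if (!logFile.exists()) return state;
        try (BufferedReader in = new BufferedReader(new FileReader(logFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                int sep = line.indexOf('|');
                state.put(Integer.parseInt(line.substring(0, sep)),
                          line.substring(sep + 1));
            }
        }
        return state;
    }
}
```

Even this toy version shows why the thread concludes it is beyond the assignment's scope: a real log would also need fsync, truncation after checkpoint, and corrupt-tail handling.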

Hi everyone, I can agree with Max's comment about wanting immediate notification if the booking cannot complete - for this reason you may want to have only read caching. But "easier to follow"? Surely no caching at all is easiest to follow. Sorry Jim - I don't buy your "limited performance gain" reasoning. I think the only reason you have caching is to improve performance, in which case why would you only do half the job? Sure, there are going to be more reads than writes. But how do we know how many bookings per day / hour / minute / second are going to occur? So how do we know how big an impact the writes are going to have? Personally I don't think that caching is required for this assignment. If we really want (and need) performance, we would probably go for a commercial database. Arjan - regarding your comment about the change log: everyone here seems to be saying that they are only having a read cache, in which case there is no need for a change log. It is only me that feels that if you are doing caching you might want a write cache as well. If we did have a write cache and we implemented a change log, then we might overcome Max's concern about not getting immediate notification of update failure. But you are quite right - this is going into a level of complexity way beyond requirements, with no benefit to our possible score. Regards, Andrew

But "easier to follow"? Surely no caching at all is easiest to follow.

Yes, but only marginally. Regardless, Max wasn't comparing caching vs. no caching - he was comparing write-immediately-on-change vs. writing updates from a separate thread. Write-on-change is easily understandable even to a junior programmer with no understanding of threads whatsoever. Well, they may wonder what the synch blocks do, but if they just ignore those, they can understand the flow fine - it's nice and linear. Writing from a separate thread is less linear, and introduces issues like what happens if the JVM exits while the write queue is not yet empty. There's a bigger window of opportunity for something to go wrong, unless you implement change logs, etc., and I doubt anyone wants to do that for this assignment.

Sorry Jim - I don't buy your "limited performance gain" reasoning. I think the only reason you have caching is to improve performance. In which case why would you only do half the job?

Because I believe I did the half that buys me big performance gains at negligible complexity cost. The other half would be negligible performance gains at moderate complexity cost. Seemed like a good place to stop.

Sure, there are going to be more reads than writes. But how do we know how many bookings per day / hour / minute / second are going to occur? So how do we know how big an impact the writes are going to have?

Well, it's an assumption, true, one which I document in my design doc. But I think it's a pretty good one. The key issue for me here is that a single find() has N times as much impact as any other method call (N being the number of records). If N is small, well, performance issues are negligible anyway. It's when N is large that we might care about performance at all - and even if users perform only one find() per 1000 update()/create()/delete()s, once N exceeds 1000 the find()s are more important.

Personally I don't think that caching is required for this assignment.

I think we're all agreed on this point.

If we really want (and need) performance, we would probably go for a commercial database.

Sure, if the need is great enough, and the company feels like spending the money later. But if we can get notable performance benefits now without incurring increased complexity, why not?

Arjan - regarding your comment about the change log. Everyone here seems to be saying that they are only having a read cache. In which case there is no need for a change log. It is only me that feels that if you are doing caching that you might want a write cache as well.

Well, I'll agree that if you do caching and you're writing updates from a separate thread, you should at least consider a change log. You've got a substantially increased chance of an error or shutdown preventing the DB from being updated properly (without the client being aware of the problem). With write-on-change the level of risk is nearly the same as for any no-cache solution. E.g. what if the power goes out in the middle of a write? Well, that record is probably broken now - but at least the client got an error message about it. If that's not good enough, then all solutions need change logging - not just those that use caching.

If we did have a write cache, and we implemented a change log, then we might overcome Max's concern about not getting immediate notification of update failure. But you are quite right - this is going into a level of complexity way beyond requirements, with no benefit to our possible score.

Agreed, for write caching. Read caching is quite unlikely to benefit my score either, true, but the complexity is very low, so what the heck. [ September 10, 2003: Message edited by: Jim Yingst ]