A different slant on persistence?

Hi.

The Java people write a lot of the enterprise patterns books. That's OK, because most of these patterns work fine regardless. One difference, though, is in how much gets loaded on each request. In Java it's worth loading a big chunk of the database because it will still be useful on future requests; we win by cutting down database traffic. With a Java persistence layer you feel as if you are working with a real object that happens to have the common sense to save itself to the database every now and again. It really does feel like a "persistent object".

In PHP that is a difficult illusion to sustain.

For one thing we usually want to load a minimal amount of data, and for another we are going to have to reconstruct a lot of objects in each request. Java doesn't have everything its own way though. Multiple transactions will break the illusion by forcing multiple copies to exist in different states. Transactions are difficult to marry up with the persistence illusion, and to manage this we have the UnitOfWork pattern.

Suppose we drop the persistence illusion and adopt another metaphor. Suppose we think instead of editing a file in a text editor. There is no attempt to pretend we have the one true file, and we have to manually click "save" at the end. The saving is explicit. That's not much extra work and fits the hit-and-run model of PHP a little more easily. In fact that is the UnitOfWork pattern in a nutshell. If we swap the persistence idea for the local-copy idea, but keep the useful UnitOfWork pattern, does it all still work?
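The "explicit save" idea is easy to sketch in a few lines. This is a generic illustration of the UnitOfWork pattern, not the library's actual API (shown in Python for brevity since the idea isn't PHP-specific; all class and method names here are invented):

```python
# Minimal UnitOfWork sketch: objects register with the unit of work
# when they change, and nothing touches the database until commit()
# is called explicitly - like clicking "save" in a text editor.
class UnitOfWork:
    def __init__(self):
        self._dirty = []

    def register_dirty(self, obj):
        if obj not in self._dirty:
            self._dirty.append(obj)

    def commit(self):
        # One explicit "save" at the end of the request.
        for obj in self._dirty:
            obj.save()          # in real code: issue INSERT/UPDATE here
        self._dirty = []

class Record:
    def __init__(self, uow, name):
        self._uow, self.name, self.saved = uow, name, False

    def rename(self, name):
        self.name = name
        self._uow.register_dirty(self)   # record that we changed

    def save(self):
        self.saved = True

uow = UnitOfWork()
r = Record(uow, "draft")
r.rename("final")
assert not r.saved      # nothing written yet
uow.commit()
assert r.saved          # written only at commit
```

The point of the metaphor is that forgetting to call commit() loses your edits, just as closing an editor without saving does; there is no pretence that the object and the database row are the same thing.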

Well, we've tried it and it seems to work a treat. The code is included.

A warning: this code is prototype quality. There are some known issues, and although it will soon find its way into our main code base it's missing essential components such as back-up and restore and schema migration. It's a proof of concept only and there is still a lot of refactoring to do. Some of it is really icky.

The features are multiple transactions, protection against double loading, an XML schema leading to an object model that is completely database-ID blind, many-to-one and one-to-many relations, and a (partially implemented) type system. The downside is that it's MySQL only right now, although that's not difficult to change, and very much untested. It uses DataAccessors (DAO) rather than DataMappers. This was a conscious decision to make lazy loading easier. Also, the DB is not 100% isolated from the object schema (a couple of names dribble through). It is really designed for systems where you have complete control of the schema, rather than for mapping to existing schemas. Querying is currently very limited, but easy to expand.

You can see the hardcore stuff from the changes_test.php script. You will need the C++ Xalan to run the tests without editing.

Basically the Change class is the UnitOfWork with knobs on. Here are the patterns...
1) UnitOfWork: The Change class.
2) IdentityMap: The Scope class.
3) DataAccessor: The Local class. Only slightly more than a RowDataGateway because of the collections.
4) Iterator: Localiser class.
5) MetadataMapping: See the test/support/changes_test.xml for an example schema.
6) IdentityField/Proxy: DeferredId class.
7) GenerationGap: You can inherit from the generated classes and have yours substituted.

To create a persistent object you create a schema that looks something like...
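The schema example itself didn't survive the forum move. Going by the `<field key="...">` syntax quoted later in the thread, a simple object might be described with something like the following; the `<schema>` and `<object>` wrappers and their attributes are my guesses, not the library's actual format:

```xml
<!-- Hypothetical sketch only: just the <field key="..."> form
     appears later in the thread; the rest is invented. -->
<schema>
    <object class="Person" table="people">
        <field key="name"><string/></field>
        <field key="age"><integer/></field>
    </object>
</schema>
```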

For such simple objects you don't win much, of course. The fun begins when you have collections and joined rows. You might be able to have tree structures, but I haven't really tested this. You can see this in the tests, but I'll be happy to post an example if someone can't run the tests.

Just to say again: if you were to use this code in a project you would be completely mad. It's a proof of concept only.

Although I am responsible for throwing most of the code together, the design was a joint effort between myself, Peter Brown and Mike Mindel, all of Wordtracker.

yours, Marcus

p.s. Why doesn't SitePoint allow .tar.gz files? You need to strip off the .doc before unpacking.

Persistence is the weak point of PHP against Java. The one issue I have is that script A can run, load up an object and do some stuff. Meanwhile, script instance B will load the same object, modify it and then save it back. Script A will then do some modifications and save the same object, but totally lose the changes made by script instance B due to the "race conditions" created by the shared-nothing architecture.

Anyway, I guess I'll try and find some time to look at this if I can. It would make a nice change from playing about with putting some sort of shared memory or MySQL HEAP support near all the data access code.

Hmm, but I think the problem is present in Java also, when you use threading. In both cases (Java and PHP) you could implement a concurrency pattern; they are just implemented differently for each technology.

Btw, the API seems very interesting. I downloaded it but haven't had enough time to give it a proper look.

The one issue I have is that script A can run, load up an object and do some stuff. Meanwhile, script instance B will load the same object, modify it and then save it back. Script A will then do some modifications and save the same object, but totally lose the changes made by script instance B due to the "race conditions" created by the shared-nothing architecture.

Depending on the implementation of IdentityMap, this could be prevented.
An approach I took on a project was to store the actual data of the ActiveRecord in a global array, using tablename+pkey as hash identifier. The only data actually stored inside the class would be the pkey (which is unchangeable, btw) and the tablename. This gives two benefits: 1) it solves the issue described in the above quote, since the two objects have shared data, and 2) it counteracts unintentional object cloning under PHP 4.

Edit:

I didn't even read your post, did I?
I guess race conditions between scripts aren't something we can really solve before we get an application-server structure into PHP.
Table locking could do it of course, but without having solid proof, I suspect it would be inefficient.
Anyway - even though it's a recognizable problem, it's fundamentally not caused by the persistence layer. The exact same problem exists even without one.

"I didn't even read your post, did I?"

I always seem to do the same trick of misreading. Yeah, I was thinking table locking would work, but that's far from ideal if you have a longish request. You would want to make sure locking happens for the least amount of time, as transactions do, in order to avoid deadlocks.

And even worse - if a script terminates prematurely, it may leave the lock hanging there.

I did like your idea of using a HEAP table for shared memory, though in this case it kind of misses the point, since that would mean you have to fetch from the DB anyway. In regard to building an application server, it's an interesting idea though. Is that your own invention, or did you get it from somewhere?

Lol, you might want to read that sentence again (hint: querying without criteria is pretty much worthless).

Lastcraft: if this is a PHP-version of something like Java's Hibernate: congratulations, you seem to have done a good job! I tried something similar a year or two ago, got pretty far, but I never finished it because it became too complex.

Is there a logic behind the class naming, other than your own personal preferences? Not that I dislike them, but they are different from what I've seen elsewhere. Just curious really.

They are deliberately different. When designing it we wanted to get as far from the conventional wisdom as possible and really focus on the UnitOfWork idea. It started very much as a proof of concept, that perhaps UnitOfWork is a more productive tool than persistence. It's worked out so well we now want to use it...and that means I'll have to tidy up the code.

if this is a PHP-version of something like Java's Hibernate: congratulations, you seem to have done a good job!

It doesn't come close. The Change class is analogous to the Hibernate Session class. Hibernate uses a text-based object query language, supports inheritance and uses DataMappers.

DataMappers always seem to pull too much data too early, it seems to me. A full object query language with lots of string manipulation would be too slow. Inheritance is something we could live without. The library is deliberately lightweight. We just wanted to get to an object schema by the most direct route possible whilst still being transactional.

Regarding more sophisticated querying, I am not sure what we would add that wouldn't complicate the interface. The Description is a QueryObject pattern (forgot that one) and hasn't been filled out yet. We would add date ranges and other useful "business"-like queries, as well as a raw-query get-out clause, in due course. I am open to suggestions though, especially regarding constraint clauses.

The main upcoming fix is to allow proper type returns. At the moment they are all string coercions, but I would like to have...

Code:

<field key="price"><money class="Money"/></field>

...and know that my choice of Money object was coming back. Not difficult, but time hasn't been available.

Also the library suffers from the bouncing-ball phenomenon. For example, a cascading delete involves hopping around five classes or so, with everybody delegating to everybody else before anything happens. Needs a clean-up.

As you can see, there are some horrible static methods floating around, but overall it's functional.

With regards to two clients editing the same object, what I'm planning to implement is object locking, whereby if you loadObjectById you have an entry placed in a lock table with your id and the object you're editing. However, this requires some way of refreshing the lock while editing, and I'm still trying to come up with an invisible way of doing that.
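A lock table along those lines might look something like this. This DDL is entirely hypothetical (the poster doesn't show theirs); the `refreshed_at` column is one way to let a cleanup job expire locks left behind by scripts that died, which is the hanging-lock problem raised later in the thread:

```sql
-- Hypothetical lock table: one row per object being edited.
-- refreshed_at is bumped on each edit action, so stale locks
-- from crashed scripts can be expired by a periodic cleanup.
CREATE TABLE object_locks (
    object_type  VARCHAR(64) NOT NULL,   -- e.g. table or class name
    object_id    INT         NOT NULL,   -- primary key of the row
    locked_by    INT         NOT NULL,   -- id of the editing user
    refreshed_at TIMESTAMP   NOT NULL,
    PRIMARY KEY (object_type, object_id)
);
```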

It sounds like you've discovered OGWYN (Only Get What You Need). It doesn't just apply to data; it applies to source code and configuration as well: if you need to rebuild all objects on every request, doing the minimum amount of work and fetching the minimum amount of code/data/configuration becomes rather important for performance.

There's another big difference between PHP objects and Java objects: their life expectancy. Java objects can exist for hours or even days, and that's great, because it means their data and behaviour are available without having to be reloaded. But in PHP the whole point is to keep the lifetime of objects as short as possible, because objects only exist for the time it takes to process a single request, and we want to respond to each request ASAP. For most BusinessDomain-based objects, this often means they only use a small percentage of their behaviour during each lifetime, and every bit of behaviour they don't use during a request is basically redundant code (bloat).

This is one of the reasons why I moved away from the BusinessDomain and decided to base my application objects on the ApplicationDomain (remember this post?).

I've taken a look at your example code, and I have two questions/comments:

Would it make sense to rename your Change class to 'Action'? Because not every action is a change, yet every change is an action. And personally, I don't really see the change in fetching a list of users, but I can see the action there...

Most of the classes in changes_test.php appear to be a collection of getters/setters, and a few SQL/Storage related methods. If you refactored the latter into a form of datatype/table 'Manager' class, your data objects could be reduced to value objects. This would move you closer to a model of the ApplicationDomain.

Regarding DataMappers, yes. I like mappers when you have to get things in one go, such as from sessions or DBM constructs. With SQL you can dribble the pieces out as needed.

Originally Posted by Azmo

This is one of the reasons why I moved away from the BusinessDomain and decided to base my application objects on the ApplicationDomain (remember this post?).

The application/domain divide is one that is not frequently discussed (everyone talks about 3-tier rather than 4-tier). Could you provide some examples of where you are going? I probably agree, but I think I am cutting my app layer a little higher.

Originally Posted by Azmo

I've taken a look at your example code, and I have two questions/comments:

Would it make sense to rename your Change class to 'Action'? Because not every action is a change, yet every change is an action. And personally, I don't really see the change in fetching a list of users, but I can see the action there...

Well Action is overused. It used to be called Edit, but we may settle back on the more conventional Transaction once it gets rolled out. I've never liked Session.

Originally Posted by Azmo

Most of the classes in changes_test.php appear to be a collection of getters/setters, and a few SQL/Storage related methods. If you refactored the latter into a form of datatype/table 'Manager' class, your data objects could be reduced to value objects. This would move you closer to a model of the ApplicationDomain.

One of the advantages of this set-up is that only the Change has to be committed, not the objects. This means that the resulting objects can be passed around. I am thinking of making the Descriptions fetch the objects rather than having to go back to the Change. This would mean that the Descriptions could be passed around as well, always maintaining an invisible thread back to the Change.

Value objects get us back into DataMappers, which seem difficult to manage with incremental fetching of dependents.

I like the UOW for persistence as well. I think in an INSERT/UPDATE/DELETE-heavy environment, if you can leave the call to commit() until after output is sent to the browser, there are big advantages. Of course for this to happen you need a way to get keys from the database, which is no easy task.

Could you comment on the decision not to put deletes in the UOW? I assume it's because you usually won't have the whole object in memory already?

Originally Posted by lastcraft

They are deliberately different. When designing it we wanted to get as far from the conventional wisdom as possible and really focus on the UnitOfWork idea.

When naming things I take the opposite approach and find it much easier to stick to conventional wisdom where naming is concerned. I would find the code a bit easier to read if objects were named after their patterns, such as Change named UnitOfWork, Description named Query, etc.

I like this approach as well. I think what you have in the Description class (functions like must equal) would probably be refactored into a Constraint (Criteria) class after a little while anyway. A simple interface on Description is important though.

For anyone who wants to see what the code generated by this looks like and doesn't have Xalan but has PHP 5/XSL, put this code in a file in the test directory and run it. The files will be created in the temp dir.

The Change class (UnitOfWork) has deletes, but not the DAOs, which I think is what you mean, yes? I didn't add it to the DataAccessors because I have never understood the logic of loading an object just to delete it. This allows block deletes as well.

Deleting an object which is being worked on is an excellent way to bring the system to its knees though. I might have a separate class for that, so as to force deletes and updates into separate transactions.

Originally Posted by Brenden Vickery

When naming things I take the opposite approach and find it much easier to stick to conventional wisdom where naming is concerned.

We just wanted to take a fresh look. As it gets factored into our system I am sure the names will get a lot less interesting .

The Change class (UnitOfWork) has deletes, but not the DAOs, which I think is what you mean, yes? I didn't add it to the DataAccessors because I have never understood the logic of loading an object just to delete it. This allows block deletes as well.

Deleting an object which is being worked on is an excellent way to bring the system to its knees though. I might have a separate class for that, so as to force deletes and updates into separate transactions.

Yup, makes sense.

I haven't fully thought this through, but I kind of think the advantage of UOW is a way to put the transaction after the output of the HTML, so users wait the smallest amount of time possible. Let the UOW have a preOutputCommit() and a postOutputCommit() or something.

PreOutputCommit takes all the new objects, inserts them and fills in their ids. Some sort of key generator could reduce the preOutputCommit load, so you could move inserts into the postOutputCommit.

PostOutputCommit takes all the objects to be updated or deleted, and happens after the output has been sent to the browser.

Am I making sense? Would the gains be noticeable? This may be a more framework specific application of the UOW.
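The split being proposed could be sketched roughly like this (a generic illustration, in Python for brevity; the two method names come from the post above, everything else is invented):

```python
# Sketch of splitting a UnitOfWork commit around page output:
# inserts run first, because the page may need the generated ids;
# updates and deletes wait until after the page has been sent.
class SplitUnitOfWork:
    def __init__(self):
        self.new, self.dirty, self.deleted = [], [], []
        self.log = []                    # stands in for issued SQL

    def pre_output_commit(self):
        for obj in self.new:             # ids needed while rendering
            self.log.append(("insert", obj))

    def post_output_commit(self):
        for obj in self.dirty:           # nobody is waiting on these
            self.log.append(("update", obj))
        for obj in self.deleted:
            self.log.append(("delete", obj))

uow = SplitUnitOfWork()
uow.new.append("a")
uow.dirty.append("b")
uow.pre_output_commit()
# ... send the page to the browser here ...
uow.post_output_commit()
assert uow.log == [("insert", "a"), ("update", "b")]
```

As the reply below points out, the catch is error reporting: by the time post_output_commit() fails, the user has already seen a page implying success.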

PostOutputCommit takes all the objects to be updated or deleted, and happens after the output has been sent to the browser.

At the moment, all of the action happens when the Change is committed. The DataAccessors (Local derivatives) stack up their changes until the write is due. Change::commit() broadcasts (via Scope) the actual commit message. It shuffles the commit order a little to minimise failures due to missing ids, but that's still roughly it. So I think it already does this.

The motivation was simply one of not forgetting to commit() an object. If everything comes from the same transaction (same database connection) anyway, why not use that to remember what has to be committed? I hadn't thought of it as a way to optimise the web page.

What happens if the transaction fails and you have already sent a page signalling all is well?

Could you post some sample code showing how it would work with this set-up?

I think Brenden pretty much hits the spot - nevertheless I mocked up something here. I haven't actually tried executing the code since I couldn't get your tests up and running, and you won't be able to either, since I presuppose some methods in the connection class which you haven't got. They are pretty trivial though, so I won't bore you with the details. It should make the point though.

At the moment, all of the action happens when the Change is committed. The DataAccessors (Local derivatives) stack up their changes until the write is due. Change::commit() broadcasts (via Scope) the actual commit message. It shuffles the commit order a little to minimise failures due to missing ids, but that's still roughly it. So I think it already does this.

The motivation was simply one of not forgetting to commit() an object. If everything comes from the same transaction (same database connection) anyway, why not use that to remember what has to be committed? I hadn't thought of it as a way to optimise the web page.

What happens if the transaction fails and you have already sent a page signalling all is well?

yours, Marcus

I'm not saying don't use the UOW to keep track of changes that need to be made to the database. I'm saying that a possible extension of its usefulness would be to optimise the web page. Just food for thought.

The point about transactions failing is a good one. The easy answer is to send a redirect, but that's not ideal. What happens now when the transaction fails? I don't work much with transactions so I'm not even sure what types of things would cause this, but I can see how it would be a problem.

I don't work much with transactions so I'm not even sure what types of things would cause this, but I can see how it would be a problem.

Basically, another request has done something so that your own view of the data is incorrect at the time of commit (optimistic locking or the highest isolation level). Unfortunately MySQL is pessimistic and will blow you out early, complaining of locked rows.

The library is not yet battle tested with this, but it should eventually return false from the commit() allowing the display of a different page. There are heaps of things that could go wrong.

For example, after the failure I have a feeling that MySQL issues an automatic rollback. That means subsequent queries will go out directly. As everything happens in the Change object, it should be a doddle to manage this.

It should also be possible to retry the commit in case the conflict was temporary. I have to make some changes to Local to effect this, but in practice you only get a few extra percent of successes on retries. Thus I skipped it, especially as all of the insertion and updating is done at the end in one go, which should eliminate this type of conflict.
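For readers unfamiliar with the optimistic-locking case mentioned above: the usual mechanism is a version (or timestamp) column, where an update only succeeds if nobody else has bumped the version since you read the row. A rough illustration (in Python for brevity, not the library's code; the in-memory `rows` dict stands in for a database table):

```python
# Optimistic locking sketch: each row carries a version number.
# An update only applies if the version still matches what the
# caller originally read; otherwise the commit is rejected.
rows = {1: {"name": "alice", "version": 3}}

def update(pkey, read_version, **changes):
    row = rows[pkey]
    if row["version"] != read_version:
        return False                 # someone else committed in between
    row.update(changes)
    row["version"] += 1              # invalidate other readers' versions
    return True

v = rows[1]["version"]               # scripts A and B both read version 3
update(1, v, name="bob")             # script B commits first: succeeds
assert update(1, v, name="carol") is False   # script A's commit rejected
```

In SQL this is typically `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?`, with a zero-row result meaning the commit lost the race.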

Ah OK, I think I get it. We have the extra flex point of being able to extend the WHERE clauses by creating new Criteria classes, rather than being restricted to what is available in the Description class. It effectively makes the Description open for modification.

That makes sense if this were any kind of official library. As an in-house job we can just add methods to Description, so we don't actually need it.

The code published here is public domain, by the way. Feel free to steal it and write something much better...