PHP Hibernate, OQL and other goodies

In a recent thread (which, due to various circumstances understood by some, I shan't link to), the idea of creating a PHP port of Hibernate came up. I thought I would kick things off by starting a new thread to do some brainstorming.

Just to get anyone interested started, I'll post a few links so y'all (pardon my vernacular, my wife is from Texas) can get up to speed. I would like to offer more information than just links, but I honestly would have to read up on the technologies before I would purport to make any kind of authoritative statement on any of them. I just pulled these off a few quick searches in Google; they'll probably save some time for those of you in the same boat as me.

It has already been noted that one of the harder things to tackle would be parsing the OQL. The best solution to this would probably be a lexer, as Marcus aptly pointed out. Fortunately for us, Marcus is quite versed in this area, so he'll be able to give a lot more information on that. I'll repost when I have some knowledge worth sharing :P

I would like to contribute to such a project if my help is wanted.
I have already written some parts of a non-intrusive O/R mapper like Hibernate/OJB, including metadata parsing (read from an XML file and cached on disk) and generating SQL from a Query Object/Criteria, including 1:1, 1:n and m:n associations.

I'd like to share my code and experiences here if anybody wants to see more.

interesting thread... for my personal object persistence framework i used a relatively simple regular expression to split the oql (well sort of) into an array and use this for further processing with good results. i was able to query rdbms and xml files by transforming the "token array" to the target language (sql or xpath).
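A minimal sketch of the regex-splitting idea, for anyone who wants something concrete to poke at. The token grammar and the function name `tokenizeOql` are assumptions made for illustration; this is not the expression used in the framework described above.

```php
<?php
// Hypothetical sketch: split a small OQL-like string into a flat token array.
// The grammar (quoted strings, dotted identifiers, numbers, a few operators)
// is an assumption, not the poster's actual regular expression.
function tokenizeOql(string $oql): array
{
    // One alternation, anchored (A modifier) at the current offset.
    $pattern = '/\s*(\'[^\']*\'|[A-Za-z_][A-Za-z0-9_.]*|\d+(?:\.\d+)?|<=|>=|<>|!=|[=<>(),?])/A';
    $oql = rtrim($oql);
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($oql)) {
        if (!preg_match($pattern, $oql, $match, 0, $offset)) {
            throw new InvalidArgumentException("Unexpected input at offset $offset");
        }
        $tokens[] = $match[1];           // the token without leading whitespace
        $offset += strlen($match[0]);    // skip whitespace plus token
    }
    return $tokens;
}

$tokens = tokenizeOql("from Employee where Employee.Category.Wage > 10");
```

The resulting array could then be walked once per target language (SQL or XPath), which is the "transform the token array" step mentioned in the post.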

How do you generate joins if there's no direct relation between the entities, e.g. entity A has a relation to B and B to C, but only A and C exist in the query? Could this be done automatically or would the client have to supply more information?

i think it should be possible to traverse the relations and see if we can supply enough information to build the appropriate joins. in its current stage only direct mappings are allowed.

cheers
Chris.

ps. what i actually have is a working prototype.. currently i am refactoring the whole stuff and writing unit tests.
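One way the "traverse the relations" idea could look, sketched under assumptions: relations are held as a simple directed adjacency list, and the function name `findJoinPath` is invented here. It returns the first shortest chain of entities between two endpoints and does not try to detect the case where two different routes exist.

```php
<?php
// Hypothetical sketch: breadth-first search over entity relations to find
// the intermediate entities needed to join $from to $to.
// $relations maps an entity name to the entities it relates to directly.
function findJoinPath(array $relations, string $from, string $to): ?array
{
    $queue = [[$from]];
    $visited = [$from => true];
    while ($queue) {
        $path = array_shift($queue);
        $last = end($path);
        if ($last === $to) {
            return $path;                        // e.g. ['A', 'B', 'C']
        }
        foreach ($relations[$last] ?? [] as $next) {
            if (!isset($visited[$next])) {
                $visited[$next] = true;
                $queue[] = array_merge($path, [$next]);
            }
        }
    }
    return null;                                 // no chain of relations found
}

$relations = ['A' => ['B'], 'B' => ['C'], 'C' => []];
$path = findJoinPath($relations, 'A', 'C');
```

Each adjacent pair in the returned path would become one JOIN clause, using whatever key metadata the mapping holds for that relation.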

interesting thread... for my personal object persistence framework i used a relatively simple regular expression to split the oql (well sort of) into an array and use this for further processing with good results. i was able to query rdbms and xml files by transforming the "token array" to the target language (sql or xpath).

This is a very cool feature.

Originally Posted by sike

i decided not to use a parser to minimize the overhead - i really think we should benchmark this and choose the faster one.

I think we ended up doing the same sort of thing for parsing our OQL. I'm not sure if anything other than a lexer will stand up against a full-blown OQL. Benchmarking is always the way to go though.

Originally Posted by hantu

In my eyes the problem is that when we're traversing the relations we could, in some cases, find two ways to a needed entity. Then it wouldn't be possible to decide how to join.

I thought maybe deeper "path expressions" might be a solution for that, like
Employee.Category.Wage > 10 or something like that.
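A rough sketch of how such a path expression might be resolved, since it names each hop explicitly and so avoids the "two ways to an entity" ambiguity. The mapping array layout, the table and column names, and the function `resolvePath` are all assumptions made up for this example.

```php
<?php
// Hypothetical mapping metadata; the layout and names are illustrative only.
$mapping = [
    'Employee' => [
        'table' => 'employee',
        'fields' => ['Name' => 'name'],
        'relations' => [
            'Category' => ['entity' => 'Category', 'local' => 'category_id', 'foreign' => 'id'],
        ],
    ],
    'Category' => [
        'table' => 'category',
        'fields' => ['Wage' => 'wage'],
        'relations' => [],
    ],
];

// Walks "Entity.Relation...Field" and returns the joins plus the final column.
function resolvePath(array $mapping, string $path): array
{
    $parts = explode('.', $path);
    $field = array_pop($parts);          // last segment is the field
    $entity = array_shift($parts);       // first segment is the root entity
    $table = $mapping[$entity]['table'];
    $joins = [];
    foreach ($parts as $relationName) {
        if (!isset($mapping[$entity]['relations'][$relationName])) {
            throw new InvalidArgumentException("Unknown relation $entity.$relationName");
        }
        $relation = $mapping[$entity]['relations'][$relationName];
        $target = $relation['entity'];
        $targetTable = $mapping[$target]['table'];
        $joins[] = sprintf(
            'JOIN %s ON %s.%s = %s.%s',
            $targetTable, $table, $relation['local'], $targetTable, $relation['foreign']
        );
        $entity = $target;
        $table = $targetTable;
    }
    return ['joins' => $joins, 'column' => $table . '.' . $mapping[$entity]['fields'][$field]];
}

$resolved = resolvePath($mapping, 'Employee.Category.Wage');
```

The comparison `> 10` would then be applied to the returned column in the WHERE clause.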

I agree, the ability to move through the object model in a query is needed.

As for relations my current thinking is this. With a model like below:

Brenden, Thanks for posting code. This has piqued my interest as well.

Are you going straight for OQL? Or might it be better to implement something like Hibernate's $uow->createQuery('User', Expression.eq('User.firstname', '?')); and get the lower level stuff working first before diving into the lexer?
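To make the query-object route a little more concrete, here is a small sketch of expression objects that render a WHERE clause with bound parameters. The class shape, the `andX` and `gt` helpers, and `selectWhere` are assumptions for illustration and do not mirror Hibernate's actual API; property names are used directly as column names to keep it short.

```php
<?php
// Hypothetical criteria-style expression objects (not Hibernate's API).
final class Expression
{
    private $sql;
    private $params;

    private function __construct(string $sql, array $params)
    {
        $this->sql = $sql;
        $this->params = $params;
    }

    public static function eq(string $property, $value): self
    {
        return new self("$property = ?", [$value]);
    }

    public static function gt(string $property, $value): self
    {
        return new self("$property > ?", [$value]);
    }

    // Combines several expressions with AND, keeping parameter order.
    public static function andX(Expression ...$parts): self
    {
        $sql = [];
        $params = [];
        foreach ($parts as $part) {
            $sql[] = '(' . $part->sql . ')';
            $params = array_merge($params, $part->params);
        }
        return new self(implode(' AND ', $sql), $params);
    }

    public function sql(): string { return $this->sql; }
    public function params(): array { return $this->params; }
}

// Returns [sql, params], ready for a prepared statement.
function selectWhere(string $table, Expression $where): array
{
    return ['SELECT * FROM ' . $table . ' WHERE ' . $where->sql(), $where->params()];
}

[$sql, $params] = selectWhere(
    'user',
    Expression::andX(Expression::eq('firstname', 'Ada'), Expression::gt('age', 18))
);
```

Small objects like these are easy to unit test one at a time, which is the main argument for building them before the lexer.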

Also, I think that rather than creating a full Hibernate or Propel, this thread should create the minimal classes necessary to build a full framework. Then see where people take it.

The third parameter in the createQuery function sets the depth to which the object model should be traversed. An array could also be passed.

Since you're basing this on Hibernate why not just emulate their createQuery and createCriteria API? This also has the advantage of setting the lazy relationship flag to true or false in your mapping file or setting it to an eager join. Then you could have something like this:
*NOTE: Using a mixture of PHP and Java in these examples

This way the same query can be used to lazy load the players or perform a join just by specifying a flag in the mapping file. If you need to explicitly override the setting you can do so by setting the mode like this:

With the createQuery API it's a bit trickier since if you specified an eager fetch in the mapping file it will ignore it.

Code:

// lazy load even if we specified eager in the mapping file
$query = $uow->createQuery("from Team team where name=?", "Saints");
// To explicitly do an eager join (assuming the lazy flag is true in the mapping file) we have to do this:
$query = $uow->createQuery("from Team team join fetch team.players");

The HQL syntax you pass to createQuery is not straightforward SQL, so that 2nd statement is turned into "select t.*, p.* from team AS t left outer join player AS p ON (p.team_id = t.team_id)"
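As a sketch of that translation step, the following builds the SQL quoted above from association metadata when a "join fetch" is requested. The metadata layout and the function name `selectWithFetch` are assumptions; parsing the HQL string itself is left out.

```php
<?php
// Hypothetical metadata for the Team entity; layout is illustrative only.
$teamMeta = [
    'table' => 'team',
    'alias' => 't',
    'key' => 'team_id',
    'associations' => [
        'players' => ['table' => 'player', 'alias' => 'p', 'foreignKey' => 'team_id'],
    ],
];

// Builds a select; only associations named in $joinFetch are joined eagerly.
function selectWithFetch(array $meta, array $joinFetch = []): string
{
    $columns = [$meta['alias'] . '.*'];
    $joins = '';
    foreach ($joinFetch as $name) {
        if (!isset($meta['associations'][$name])) {
            throw new InvalidArgumentException("Unknown association: $name");
        }
        $assoc = $meta['associations'][$name];
        $columns[] = $assoc['alias'] . '.*';
        $joins .= sprintf(
            ' left outer join %s AS %s ON (%s.%s = %s.%s)',
            $assoc['table'], $assoc['alias'],
            $assoc['alias'], $assoc['foreignKey'],
            $meta['alias'], $meta['key']
        );
    }
    return sprintf('select %s from %s AS %s%s', implode(', ', $columns), $meta['table'], $meta['alias'], $joins);
}

$lazySql = selectWithFetch($teamMeta);
$eagerSql = selectWithFetch($teamMeta, ['players']);
```

Without a fetch list the players are left out of the select, which corresponds to the lazy case in the first createQuery call.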

Something to consider: I really like the HQL syntax; it lets you express your queries in an object oriented way that is quite easy to decipher.

here's a simple parser for a very simple query language. it's based on marcus' SimpleLexer found in SimpleTest. i had some difficulties understanding the concept behind the entry and exit stuff, but hey.... it works good enough for an hour of toying around with it

what do you exactly mean with your second example ?
how would you load a team with its players in one query?
the only thing i could imagine would be a SELECT team.*, player.* FROM team JOIN...
and i am not sure if i like it

Originally Posted by Brenden Vickery

The third parameter in the createQuery function sets the depth to which the object model should be traversed. An array could also be passed.

Something to consider: I really like the HQL syntax; it lets you express your queries in an object oriented way that is quite easy to decipher.

I'm more of a fan of JDO's interface than Hibernate's. I also like JDOQL more than HQL; however, I'm likely biased due to not having as much Hibernate experience. The interface is very important for something like this, but I'm not set in my ways.

Originally Posted by sike

what do you exactly mean with your second example ?
how would you load a team with its players in one query?
the only thing i could imagine would be a SELECT team.*, player.* FROM team JOIN...
and i am not sure if i like it

That's exactly what I mean.

Originally Posted by sike

why don't we use a lazy loaded collection object for relations ?

I'm saying give the option to do either through the query object. The less you need to mess with an xml file to define how queries should work the better.
When you impose the restriction of only lazy loads you add a requirement on the user's domain objects. A person should be able to use their normal domain model and not extend a generated class from the ORM. They shouldn't have to write the code to do all the FK Mapping either.
The ideal situation is to give them the option (From the Query object, not the xml) to either lazy load or FK Map.

Are you going straight for OQL? Or might it be better to implement something like Hibernate's $uow->createQuery('User', Expression.eq('User.firstname', '?')); and get the lower level stuff working first before diving into the lexer?

Or perhaps not. After all the query object stage could be skipped altogether and we could go straight from OQL to SQL. It would probably be faster in an interpreted language and there may be a lot less translation. The only problem is that this approach would be quite a bit harder to test compared with building little query objects.

I think the PHP community has all of the necessary experience around now to finally kick start this complex project. We have the UnitOfWork stuff (Hibernate.Session), the XSLT from different sources, DataMappers and Lexers/Parsers. It's a very difficult task, although with the fallback of the Java and C# versions as example solutions.

The ideal situation is to give them the option (From the Query object, not the xml) to either lazy load or FK Map.

I don't understand why this is ideal. The ideal solution is being able to specify lazy load from either the xml mapping or in your Query object like you can do with Hibernate as I outlined above. With XDoclet I don't even have to mess with hibernate mapping files; the default behavior is easily adjusted in the @hibernate.set tag in a getter method comment block.

Or perhaps not. After all the query object stage could be skipped altogether and we could go straight from OQL to SQL. It would probably be faster in an interpreted language and there may be a lot less translation. The only problem is that this approach would be quite a bit harder to test compared with building little query objects.

I was thinking that it would be easier to build tests for some query objects and sort out the low level stuff, then build the parser side-by-side with the query object layer. But it may be just as good to jump in with both feet.

It also might be good to hack up a direct SQL interface as well for testing purposes.

A person should be able to use their normal domain model and not extend a generated class from the ORM. They shouldn't have to write the code to do all the FK Mapping either.
The ideal situation is to give them the option (From the Query object, not the xml) to either lazy load or FK Map.

If you plan on creating this ORM like Hibernate then you cannot just use your existing domain classes without modifications.

In Hibernate it's important to properly create your domain classes to reflect all the associations you have described. Whether you do this via hbm2java or write your own you must do it correctly so the object graph can be read in properly. For instance for the Team domain class we would have this in Hibernate:

If you later change the association so that a player can belong to multiple teams creating the need for a many-to-many relationship then you must change the Player domain class to reflect this so that team is now a collection as in:
private Set teams;

Whether you use hbm2java or create your own and use xdoclet tags it's important to create the correct associations so Hibernate can read in an object graph. Whether it reads in the players collection for a team lazily or not, that collection still needs to be defined in your Team domain class.

Therefore whatever normal domain class you had before needs to be adjusted to work with Hibernate.

In Hibernate it's important to properly create your domain classes to reflect all the associations you have described. Whether you do this via hbm2java or write your own you must do it correctly so the object graph can be read in properly.
...

Right. I'm saying that we don't want to get into the situation where we need to generate base files based on the xml:

Doing something like above means we can pass in an ArrayObject of Players or pass in a custom lazy load object and we aren't imposing anything on the domain model except that we can't do this in php4 without imposing the use of an iterator that can't be used in a foreach.

I have doubts as to whether or not lazy loads are necessary in php in the first place.
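For reference, a PHP 5-style sketch of the "pass in an array or a lazy load object" idea. The `LazyCollection` and `Team` classes here are hypothetical and written with current PHP syntax; the loader closure stands in for whatever query the mapper would run.

```php
<?php
// Hypothetical lazy collection: runs its loader on first use, then caches.
final class LazyCollection implements IteratorAggregate, Countable
{
    private $loader;
    private $items = null;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    private function load(): array
    {
        if ($this->items === null) {
            $this->items = ($this->loader)();   // first access triggers the load
        }
        return $this->items;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->load());
    }

    public function count(): int
    {
        return count($this->load());
    }
}

// Plain domain class: it extends nothing and accepts an array, an
// ArrayObject, or a LazyCollection for its players.
final class Team
{
    private $name;
    private $players;

    public function __construct(string $name, $players)
    {
        $this->name = $name;
        $this->players = $players;
    }

    public function getPlayers()
    {
        return $this->players;
    }
}

$loads = 0;
$players = new LazyCollection(function () use (&$loads) {
    $loads++;                                   // stands in for a SELECT on the player table
    return ['Player A', 'Player B'];
});
$team = new Team('Saints', $players);
```

Because the collection works in a foreach, the domain class never needs to know whether its players were loaded eagerly or on demand.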

i had some difficulties understanding the concept behind the entry and exit stuff, but hey.... it works good enough for an hour of toying around with it

Unlike most Lexers it's stack based, rather than state based. You enter a new stack frame with a pattern marked as an entry pattern. You pop the stack with an exit pattern. All of the tokens discovered in that stack frame get sent to a parser method by that name (or mapped alias). The special tokens are different only in that they get sent to a "special" named parser method not reflecting the current named stack frame. If you like, they push and pop the stack in one go. "Special" wasn't a very good name really.

The reason for all of this is so that complicated regexes could be tuned for the current state, making it a lot faster. This suited the way that HTML has a different syntax inside tags, and a different syntax again inside attributes. A query language is pretty much context free, unless you are doing some kind of subselects, so a simpler state based lexer would be easier to use.

A Lexer is a pretty lightweight piece of code. All it does is assemble regexes once at the start of a run. Should be easier than hand coding.
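To illustrate the entry/exit idea in miniature, here is a toy mode-stack lexer. It is a sketch only: the class `TinyLexer` and its method names are invented for this example and are not SimpleTest's API. It assumes patterns contain no capturing groups and no `~` character, and it drops unmatched text that is only whitespace.

```php
<?php
// Hypothetical mode-stack lexer; names are illustrative, not SimpleTest's API.
final class TinyLexer
{
    private $rules = [];      // mode => list of [regex, action, new mode]
    private $compiled = [];   // mode => combined regex, built once per mode

    public function addPattern(string $mode, string $regex): void
    {
        $this->rules[$mode][] = [$regex, 'token', null];
    }

    public function addEntryPattern(string $mode, string $regex, string $newMode): void
    {
        $this->rules[$mode][] = [$regex, 'enter', $newMode];
    }

    public function addExitPattern(string $mode, string $regex): void
    {
        $this->rules[$mode][] = [$regex, 'exit', null];
    }

    private function regexFor(string $mode): string
    {
        if (!isset($this->compiled[$mode])) {
            $parts = [];
            foreach ($this->rules[$mode] ?? [] as $rule) {
                $parts[] = '(' . $rule[0] . ')';
            }
            $this->compiled[$mode] = '~' . implode('|', $parts) . '~A';
        }
        return $this->compiled[$mode];
    }

    // Returns a list of [mode, text] pairs.
    public function tokenize(string $input, string $startMode): array
    {
        $stack = [$startMode];
        $tokens = [];
        $pending = '';                 // unmatched text seen in the current mode
        $offset = 0;
        while ($offset < strlen($input)) {
            $mode = end($stack);
            $index = null;
            if (preg_match($this->regexFor($mode), $input, $m, 0, $offset)) {
                foreach ($m as $i => $text) {
                    if ($i > 0 && $text !== '') { $index = $i - 1; break; }
                }
            }
            if ($index === null) {     // nothing matched here: keep as raw text
                $pending .= $input[$offset++];
                continue;
            }
            $this->flush($tokens, $mode, $pending);
            [, $action, $newMode] = $this->rules[$mode][$index];
            if ($action === 'enter') {
                $stack[] = $newMode;   // push a new frame
            } elseif ($action === 'exit') {
                if (count($stack) > 1) { array_pop($stack); }
            } else {
                $tokens[] = [$mode, $m[0]];
            }
            $offset += strlen($m[0]);
        }
        $this->flush($tokens, end($stack), $pending);
        return $tokens;
    }

    private function flush(array &$tokens, string $mode, string &$pending): void
    {
        if (trim($pending) !== '') { $tokens[] = [$mode, $pending]; }
        $pending = '';
    }
}

$lexer = new TinyLexer();
$lexer->addPattern('query', '[A-Za-z_][\w.]*');
$lexer->addPattern('query', '\d+');
$lexer->addPattern('query', '[=<>]+');
$lexer->addEntryPattern('query', "'", 'string');
$lexer->addExitPattern('string', "'");
$tokens = $lexer->tokenize("from Team team where name = 'New Orleans'", 'query');
```

The quoted string is the only place this example needs a second frame; a query language without such nesting could drop the stack and keep a single mode.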

Doing something like above means we can pass in an ArrayObject of Players or pass in a custom lazy load object and we aren't imposing anything on the domain model except that we can't do this in php4 without imposing the use of an iterator that can't be used in a foreach.

I would go for PHP5. This is a big project and database handling is mission critical. Issues like transactions and locking have to be completely resolved. They are difficult tools to use as well. You have two languages, the relational mapping and also the query language. All of this has to be documented and tutorials written. That's a big job. PHP5 will be mainstream in the year plus it will take to get to the first alpha version.

Database Abstraction
Post 0.2 WACT is changing its DBA layer in order to support a higher level persistence layer, much like the relationship between creole and propel.

Data binding
WACT uses a DataSource interface to implement generic data binding. Persistent objects in WACT will need to support this interface. This is conceptually similar to cocoa bindings (google).

Querying is separate
I see query building as an endless source of complexity. I would like an architecture that separated the process of query building from the process of mapping and object instantiation. One advantage of this is that the query building can be hand coded in SQL for complex cases and the query builder can be worked on last.
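A small sketch of what that separation might look like: the hydration step only sees rows, so it works the same whether they came from hand-written SQL or from a generated query. The `hydrate` function, the `User` class, and the column names are assumptions for illustration and are not part of WACT.

```php
<?php
// Hypothetical domain class used only for this example.
final class User
{
    public $id;
    public $firstname;
}

// Maps result rows to objects. It knows nothing about how the rows were
// queried, so the SQL can be hand coded or produced by a query builder.
function hydrate(string $class, array $columnToProperty, iterable $rows): array
{
    $objects = [];
    foreach ($rows as $row) {
        $object = new $class();
        foreach ($columnToProperty as $column => $property) {
            $object->$property = $row[$column] ?? null;
        }
        $objects[] = $object;
    }
    return $objects;
}

// Rows as they might come back from any query; literal arrays stand in
// for a database result here.
$rows = [
    ['user_id' => 1, 'first_name' => 'Ada'],
    ['user_id' => 2, 'first_name' => 'Grace'],
];
$users = hydrate('User', ['user_id' => 'id', 'first_name' => 'firstname'], $rows);
```

With this split, the query builder can arrive last without blocking work on the mapping layer.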

Unit tested with simpletest
Everything in WACT is unit tested with simpletest.

Example driven
The preferred model of development with WACT is to create an example program representing a real use case and then only program what is necessary to support that use case, refactoring what is necessary. This is an anti-speculation technique. We don't add something until there is a proven need for it. You can relate this to the practice of user stories and acceptance testing in XP. WACT instead uses example programs as use cases, and automated web testing of those examples as acceptance tests.

Modular and layered
This is a fundamental WACT philosophy. Basically, the idea is that for simple examples, WACT only loads the barest amount of necessary support code. Then more complex cases are handled with lazy loading, decorating, etc. Simple things should stay simple.

Join optimization support
An important case

Buildless code generation
Basically, no Phing. Any code generation should take place behind the scenes and not require an explicit build phase.

No round-trip code generation
Generated code should never be modified by hand. I'm also wary of inheriting from generated code.

I guess that's what I can think of now. I hadn't really planned to start working on WACT persistence any time soon, but if there is interest in doing something as part of WACT, I would be willing to set things up and to work on it.