I'd even be content with a good Query/Criteria implementation instead of HQL/OQL, yet OQL support would be a very cool and powerful feature.

The problems with defining query criteria at runtime are that it requires query composition code that only bloats your data object classes, and it prevents the queries from later evolving into a more efficient implementation based on stored procedures.
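To make the objection concrete, here is a minimal sketch of the kind of runtime criteria-building code being criticized; every data object class ends up carrying, or depending on, composition logic like this. All class, method, table, and column names here are hypothetical.

```php
<?php
// Hypothetical runtime criteria builder: the composition logic lives
// in the runtime classes, which is the "bloat" objected to above.
class Criteria
{
    private $conditions = array();
    private $parameters = array();

    public function equals($field, $value)
    {
        $this->conditions[] = $field . ' = ?';
        $this->parameters[] = $value;
        return $this; // allow chained composition
    }

    public function toSql($table)
    {
        $sql = 'SELECT * FROM ' . $table;
        if (count($this->conditions) > 0) {
            $sql .= ' WHERE ' . implode(' AND ', $this->conditions);
        }
        return $sql;
    }

    public function getParameters()
    {
        return $this->parameters;
    }
}

$criteria = new Criteria();
$criteria->equals('author', 'mlemos')->equals('title', 'ORM');
echo $criteria->toSql('article');
// SELECT * FROM article WHERE author = ? AND title = ?
```

Because the SQL is assembled at runtime, none of it can be moved into a stored procedure ahead of time, which is the second drawback raised above.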

With an Object Query Language (OQL), you can use a compiler that generates optimized SQL queries at compile time. This removes the bloat of a query criteria builder from the runtime classes, and the compiler can even generate stored procedures, which execute queries much faster because they are compiled by the database server at installation time.

Metastorage uses this approach. It features its own OQL for defining object query filter criteria, expressed with an XML syntax like everything else in CPML (Component Persistence Markup Language).

Below is an example from the sample CMS project that comes with Metastorage. The <filter> section defines an OQL expression. Once compiled, the getarticlesbytitleandauthor function takes the runtime arguments author and title and executes an SQL query that retrieves all article objects in the article collection of a category object matching the given author and title names.

This is not a trivial criterion to define with simple query-criteria-building code, as it involves a many-to-many relationship, in this case between a category class and an article class.

Fortunately, Metastorage greatly simplifies the compilation of arbitrarily complex SQL queries. Furthermore, it generates very compact code that runs faster. In the case of this function for the category class, you may want to verify for yourself that it generates just a one-line function call. The compiled SQL criteria looks like this:

One final comment regarding the objections some people have raised against code-generation-based solutions: keep in mind that there is code generation and code generation. Not all code generation tools are as immature as you may imagine. You need to examine the generated code for yourself before drawing conclusions, so you do not make the mistake of assuming that all code generation solutions produce inappropriate code.

You may recall that the history of PHP template engines can basically be divided into two eras: before Smarty and after Smarty. After Smarty, many template engine developers realized it was a brilliant idea to compile templates into equivalent PHP code that produces the same output, as it runs much faster and more efficiently.

The same can be said about code generated from compiled OQL. Not to mention that using persistence layers generated from an object model by mature tools drastically reduces your development time. This way you can spend more time hand-coding what is really specific to your applications: the business rules.

Once that is done, the persistence code needs to know how to map object properties onto data store fields (and key(s)).

I like the idea of XML maps, but I would like there to also be a direct (no XML load/parse required) way to define the map in script. The XML map is required if there is a build tool, but is only a convenience otherwise.

The problems with defining query criteria at runtime are that it requires query composition code that only bloats your data object classes, and it prevents the queries from later evolving into a more efficient implementation based on stored procedures.

Agreed; however, there are reasons to have a way to create queries at runtime. For example, a highly dynamic report builder that builds a query with hundreds of possible variations.

Originally Posted by mlemos

One final comment regarding the objections some people have raised against code-generation-based solutions: keep in mind that there is code generation and code generation. Not all code generation tools are as immature as you may imagine. You need to examine the generated code for yourself before drawing conclusions, so you do not make the mistake of assuming that all code generation solutions produce inappropriate code.

I don't think there are objections to code generation in general (unless I missed it). There is some objection to extending a code-generated class or directly using code-generated classes. Doing either of these things imposes too much on a user's design, IMO.

The opportunities for code generation exist but should be transparent to the user. When the XML map is first read, maps should be generated for each persisted class. These generated maps take the place of the classMap and fieldMap classes in my example. INSERT, UPDATE and DELETE statements with "?" placeholders for the variables should also be generated.
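A minimal sketch of what generating such a statement from a field map might look like; the map format (property name to column name) and all table/column names are hypothetical.

```php
<?php
// Generate an INSERT statement with "?" placeholders from a
// hypothetical property-to-column field map.
function generateInsert($table, $fieldMap)
{
    $columns = array_values($fieldMap);
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    return 'INSERT INTO ' . $table .
        ' (' . implode(', ', $columns) . ') VALUES (' . $placeholders . ')';
}

$fieldMap = array('name' => 'employee_name', 'pay' => 'employee_pay');
echo generateInsert('employee', $fieldMap);
// INSERT INTO employee (employee_name, employee_pay) VALUES (?, ?)
```

The same map could drive equivalent generators for UPDATE and DELETE, so the statement strings only need to be regenerated when the XML map changes.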

With an Object Query Language (OQL), you can use a compiler that generates optimized SQL queries at compile time. This removes the bloat of a query criteria builder from the runtime classes, and the compiler can even generate stored procedures, which execute queries much faster because they are compiled by the database server at installation time.

I kind of alluded to this earlier and I agree. It's actually quite hard to write such a solution in one go (query objects are easier to test with). I think it's worth it long term to drop the object solution when the query engine works. Hibernate has the top spot because it made no compromises in making itself the "best" solution, the difficulty of implementation notwithstanding. When you read Hibernate tutorials, almost no one even bothers to explain the query object solution anymore; they go straight to the query language.

It's a difficult task though. I would certainly allow these intermediate objects whilst the system was being written.

Originally Posted by mlemos

One final comment regarding the objections some people have raised against code-generation-based solutions: keep in mind that there is code generation and code generation. Not all code generation tools are as immature as you may imagine. You need to examine the generated code for yourself before drawing conclusions, so you do not make the mistake of assuming that all code generation solutions produce inappropriate code.

I don't think anyone wants to avoid code generation because it's somehow untrustworthy. The trade-off is better runtime performance with code generation against greater deployment headaches. I have always gone for the code generation approach in this arena in the past. The reasons have been PHP's poor reflection capabilities, the need to parse the schema, and the fact that XSLT makes creating these solutions very easy.

I think code generation is the way to go. Given that deploying an optimised ORM tool is non-trivial in any case, the extra code generation step won't affect it. Because this is not a tool for beginners, the extra complications of explaining its performance signature (a big hit the first time around) also fail to rule it out.

Regarding reports from optimised queries, just use SQL. You would have a custom report object that just displayed SQL data types (with very simple passive objects or DAOs). ORM is just not needed in this case. I think it is safe to say we want to do complex things to small amounts of data and we want to change our minds on how we do it often. That is actually 50%+ of all development work I have ever witnessed.

With all this theoretical talk I thought I would hack something up just to learn a little about the subject. Here is some PHP5 code that sets up handlers for get/set and tracks whether a property has been fetched or is dirty. I assume an ORM would plug into the handlers to fetch the data and then do the commit. Obviously a system to map each object's properties to a table/field/key needs to be added. There is example code at the bottom so you can run it.
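The originally attached code is not preserved in this archive; the following is a minimal sketch of the approach described (not the original posted code): __call() traps get<Prop>()/set<Prop>() calls and tracks fetched/dirty state per property.

```php
<?php
// Sketch of the described approach: __call() intercepts getX()/setX()
// and maintains per-property fetched/dirty flags.
class PersistenceBase
{
    private $values = array();
    private $fetched = array();
    private $dirty = array();

    public function __call($method, $args)
    {
        $prefix = substr($method, 0, 3);
        $property = substr($method, 3);
        if ($prefix == 'get') {
            if (empty($this->fetched[$property])) {
                // An ORM would plug in here to lazily fetch the value
                // from the data store before returning it.
                $this->fetched[$property] = true;
            }
            return isset($this->values[$property]) ? $this->values[$property] : null;
        }
        if ($prefix == 'set') {
            $this->values[$property] = $args[0];
            $this->dirty[$property] = true; // mark for the next commit()
            return null;
        }
        trigger_error('Call to undefined method ' . $method, E_USER_ERROR);
    }

    public function isDirty($property)
    {
        return !empty($this->dirty[$property]);
    }
}

// Example code at the bottom so you can run it:
$employee = new PersistenceBase();
$employee->setPay(1000);
var_dump($employee->getPay());       // int(1000)
var_dump($employee->isDirty('Pay')); // bool(true)
```

A commit() step would then walk the dirty flags and write only the changed properties back to the data store.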

Agreed; however, there are reasons to have a way to create queries at runtime. For example, a highly dynamic report builder that builds a query with hundreds of possible variations.

You do not have to create queries at runtime for that. What you need is to generate a query string with conditional sections that may or may not be included in the executed query, depending on boolean values passed to the query function at runtime. That has been on the Metastorage to-do list for a while and will be implemented in an upcoming release.
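A minimal sketch of the conditional-section idea under hypothetical names: the query shape is fixed ahead of time, and boolean flags passed at runtime only decide which pre-written sections are included.

```php
<?php
// The query template and its sections are fixed; only the booleans
// vary at runtime. (Hypothetical table and column names.)
function buildReportQuery($byAuthor, $byTitle)
{
    $sql = 'SELECT * FROM article WHERE 1 = 1';
    if ($byAuthor) {
        $sql .= ' AND author = ?';
    }
    if ($byTitle) {
        $sql .= ' AND title = ?';
    }
    return $sql;
}

echo buildReportQuery(true, false);
// SELECT * FROM article WHERE 1 = 1 AND author = ?
```

Since every possible section is known at generation time, this keeps arbitrary query composition logic out of the runtime classes.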

Originally Posted by Brenden Vickery

I don't think there are objections to code generation in general (unless I missed it). There is some objection to extending a code-generated class or directly using code-generated classes. Doing either of these things imposes too much on a user's design, IMO.

I am not sure what you are saying. At least, Metastorage only generates data object classes with the variables and functions you ask for by specifying them in the component definition. You do not have to subclass them for anything, because they already do what you want. The generated classes also do not inherit from any sort of base class that does not belong to your model definition.

Originally Posted by Brenden Vickery

The opportunities for code generation exist but should be transparent to the user. When the XML map is first read, maps should be generated for each persisted class. These generated maps take the place of the classMap and fieldMap classes in my example.

Why bloat the runtime environment with a map compiler that is not really needed to execute your application's operations? It is not as if you will need to change your classes that often while your application is running. Isn't it much simpler to regenerate everything in your development environment if and when you really need to change any details of your classes?

Originally Posted by Brenden Vickery

INSERT, UPDATE and DELETE statements with "?" placeholders for the variables should also be generated.

This is what Metastorage generates, except that it does not need an OQL compiler at runtime, as the generated PHP code already embeds the compiled SQL.
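The shape of such generated code might look roughly like the following; this is a hand-written illustration assuming a PDO-style connection, not actual Metastorage output, and all class, method, table, and column names are hypothetical.

```php
<?php
// Hypothetical shape of a generated factory class: the SQL was
// compiled from OQL at build time and is embedded as a plain string,
// so no OQL compiler is needed at runtime.
class ArticleFactory
{
    private $connection;

    public function __construct(PDO $connection)
    {
        $this->connection = $connection;
    }

    public function getArticlesByAuthor($author)
    {
        // Embedded, pre-compiled SQL; only the parameter is bound at runtime.
        $sql = 'SELECT id, title, body FROM article WHERE author = ?';
        $statement = $this->connection->prepare($sql);
        $statement->execute(array($author));
        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}
```

The runtime cost is reduced to preparing and executing the statement; nothing about the query is recomputed per request.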

Originally Posted by Brenden Vickery

If the XML map is changed, all of the cached/generated queries are deleted.

You get a huge performance gain on reads when you implement a shared-memory cache.

I don't think you need this complexity to cache a few SQL strings. If you embed them in the generated PHP code, it is much simpler, your application is less complex, and since the runtime code is smaller, it loads faster.

I kind of alluded to this earlier and I agree. It's actually quite hard to write such a solution in one go (query objects are easier to test with). I think it's worth it long term to drop the object solution when the query engine works. Hibernate has the top spot because it made no compromises in making itself the "best" solution, the difficulty of implementation notwithstanding. When you read Hibernate tutorials, almost no one even bothers to explain the query object solution anymore; they go straight to the query language.

I am not sure what you are saying, but I was not suggesting dropping the persistent objects when you need to execute queries. The code for the persistent objects and factory classes that Metastorage generates includes optimized SQL code that was generated at compile time.

You can compile an unlimited number of object queries. Each query may be associated with a function of the persistent objects or the factory classes. So, when you need to execute a query to retrieve objects that satisfy a given condition, just call the associated class function.

These queries may embed parameter values defined only at runtime. Such parameter values can be passed to the class as arguments of the functions that execute the object queries.

Originally Posted by lastcraft

Regarding reports from optimised queries, just use SQL. You would have a custom report object that just displayed SQL data types (with very simple passive objects or DAOs). ORM is just not needed in this case. I think it is safe to say we want to do complex things to small amounts of data and we want to change our minds on how we do it often. That is actually 50%+ of all development work I have ever witnessed.

Right, that is why Metastorage can also generate report classes. You just define your query criteria involving whatever objects you need in the same query, pick whatever columns you need, and specify sorting and grouping criteria if you need them.

Then you define the functions your reports will have to execute your queries, possibly using runtime arguments passed to the function, and Metastorage generates classes with those functions that return arrays of data from the result-set rows, so you can process them in your application however you like.

Here is an example of a query and an associated report class function that was used to generate the listing of the main page of the PHP Classes site forum system:

I am not sure what you are saying. At least, Metastorage only generates data object classes with the variables and functions you ask for by specifying them in the component definition. You do not have to subclass them for anything, because they already do what you want. The generated classes also do not inherit from any sort of base class that does not belong to your model definition.

I wasn't really saying that with Metastorage in mind; however, it applies. Metastorage creates a RowDataGateway. If we want to safely add any domain logic, we need to extend the generated class so that we don't lose our domain logic on recompile.

Originally Posted by mlemos

Why bloat the runtime environment with a map compiler that is not really needed to execute your application's operations? It is not as if you will need to change your classes that often while your application is running. Isn't it much simpler to regenerate everything in your development environment if and when you really need to change any details of your classes?

I think you guys have convinced me to stop being so stubborn about not thinking about optimization until the end. There are many places where code generation will speed things up; however, you need to be very careful about how it is implemented. The interface and ease of use are paramount for a good ORM tool.

Originally Posted by mlemos

I don't think you need this complexity to cache a few SQL strings. If you embed them in the generated PHP code, it is much simpler, your application is less complex, and since the runtime code is smaller, it loads faster.

I wasn't suggesting loading SQL strings into shared memory, if that's what you thought. I was suggesting generating the strings, however.

I am not sure what you are saying. At least, Metastorage only generates data object classes with the variables and functions you ask for by specifying them in the component definition. You do not have to subclass them for anything, because they already do what you want. The generated classes also do not inherit from any sort of base class that does not belong to your model definition.

This solution will be fundamentally different from MetaStorage even though it is tackling a related problem. There are other libraries, MetaStorage included, that generate DataAccessors (DAOs). Even PEAR::DB_DataObject does that. The idea of a Hibernate-like clone is to generate the DataMappers that save domain objects. The domain objects are written in PHP just as they would have been anyway (with a few minor compromises).

Regarding the OQL-to-SQL conversion, there is no trouble doing that on the fly; compared with actually running the DB query, the extra load is trivial. Possibly we could generate a translator for each domain class, but that is a pretty obscure refinement.

With all this theoretical talk I thought I would hack something up just to learn a little about the subject. Here is some PHP5 code that sets up handlers for get/set and tracks whether a property has been fetched or is dirty. I assume an ORM would plug into the handlers to fetch the data and then do the commit. Obviously a system to map each object's properties to a table/field/key needs to be added. There is example code at the bottom so you can run it.

I think that sort of thing is an option to look at. It gives the user less control of their design but has better performance.

In this type of situation, construction becomes difficult because you can't use setters. So any persistent object must: have getters/setters (__call()), have a full constructor for every way an object can be constructed, extend PersistenceBase, and not have any private properties. There are other subtle things that need to be kept in mind with this approach.

I want to articulate what my thoughts are now in regards to lazy loading a code generated query.

In this type of situation, construction becomes difficult because you can't use setters. So any persistent object must: have getters/setters (__call()), have a full constructor for every way an object can be constructed, extend PersistenceBase, and not have any private properties. There are other subtle things that need to be kept in mind with this approach.

There are subtle things like requiring public properties, but I don't understand why you can't use setters. The __call() traps every getter/setter call of the form get<propertyname>() or set<propertyname>($value). You just use that naming convention and don't create your own get/set functions. Getters can fetch data if it has not been fetched (in getHandler()), and setters mark the field as dirty (in setHandler()) for commit().

The PersistenceBase class would "create" the getPay() and setPay() methods for you: getPay() would fetch Pay from the data source if it hadn't been fetched, and setPay() would mark Pay as dirty.

You need to:

1. Hook your Query object into getHandler() to fetch in all the values when it finds that the value has not been fetched.

2. Hook your Query object into commit() to write back all the fields marked dirty.

The commit() iterates through all objects registered with the PersistenceMgr, so you should probably allow a separate Query object for each persistent object. Pass it to newObject() and save it in PersistenceBase.
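The commit() pass described above can be sketched as follows; the class names and the simplified UPDATE strings are hypothetical illustrations, not a real implementation (a real one would delegate to each object's own Query object).

```php
<?php
// A record that tracks which of its fields are dirty.
class DirtyRecord
{
    private $table;
    private $dirty = array();

    public function __construct($table)
    {
        $this->table = $table;
    }

    public function set($field, $value)
    {
        $this->dirty[$field] = $value; // mark field dirty
    }

    public function getDirtyFields() { return $this->dirty; }
    public function getTable() { return $this->table; }
    public function markClean() { $this->dirty = array(); }
}

// The manager iterates all registered objects on commit() and writes
// back only the dirty fields.
class PersistenceMgr
{
    private $objects = array();

    public function register($object)
    {
        $this->objects[] = $object;
    }

    public function commit()
    {
        $statements = array();
        foreach ($this->objects as $object) {
            foreach ($object->getDirtyFields() as $field => $value) {
                // Here we just collect the UPDATE statements instead of
                // executing them through a Query object.
                $statements[] = 'UPDATE ' . $object->getTable() .
                    ' SET ' . $field . ' = ?';
            }
            $object->markClean();
        }
        return $statements;
    }
}

$manager = new PersistenceMgr();
$record = new DirtyRecord('employee');
$record->set('pay', 1000);
$manager->register($record);
print_r($manager->commit()); // one "UPDATE employee SET pay = ?" statement
```

After the pass, every record is clean, so a second commit() writes nothing.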

There are subtle things like requiring public properties, but I don't understand why you can't use setters.

I mean that for all intents and purposes the interface of the domain object must have setters. The setter can't be used at just any time, only when you specifically want to make the object dirty. The setter must be used to make the object dirty.

I mean that for all intents and purposes the interface of the domain object must have setters.

I am not clear what you mean. You certainly don't need to have "public" setters if you don't want to. You could also use __set() to call the setHandler() if you want to allow $this->Pay = $value;. I assume the domain object would implement an interface like lastcraft's and just use setters internally.

The setter can't be used at just any time, only when you specifically want to make the object dirty.

Again, I am not clear what behavior you want. You could implement it so that setPay($value) always makes Pay dirty; you could add a flag to give you the option setPay($value, false); or you could implement your own setter cleanSetPay($value) to circumvent the persistence. You could also register the behavior of each field when you create the object. For me, complexity is not hard to add.

The setter must be used to make the object dirty.

Again, not sure what behavior you want. It would be trivial to implement cleanPay() and dirtyPay() or _makeClean('Pay') and _makeDirty('Pay') or whatever. Again adding bells and whistles does not seem to be the problem here.

I am not clear what you mean. You certainly don't need to have "public" setters if you don't want to. You could also use __set() to call the setHandler() if you want to allow $this->Pay = $value;. I assume the domain object would implement an interface like lastcraft's and just use setters internally.

I mean that when you do a switch on the $prefix in PersistenceBase::__call() and one of the cases is 'set', the interface then exposes set* methods. These need to be public so that the object can be marked dirty.

The other setter/getter/no-arg-constructor way of state management has this drawback as well.

Originally Posted by arborint

Again, I am not clear what behavior you want. You could implement it so that setPay($value) always makes Pay dirty; you could add a flag to give you the option setPay($value, false); or you could implement your own setter cleanSetPay($value) to circumvent the persistence. You could also register the behavior of each field when you create the object. For me, complexity is not hard to add.

I mean that you can't use a set* function to construct a clean object. You must use a constructor or create init methods. Not a huge deal, but it is still added complexity.

Originally Posted by arborint

Again, not sure what behavior you want. It would be trivial to implement cleanPay() and dirtyPay() or _makeClean('Pay') and _makeDirty('Pay') or whatever. Again adding bells and whistles does not seem to be the problem here.

I'm saying that the only way to make an object dirty is by using a setter through __call().

I know all of this seems pretty trivial, but for a tool I think it makes a difference. Having to code your domain model adhering to rules imposed on you by a tool, such as "don't modify a variable directly inside a class; use a set* method where * equals the variable's name", is annoying to me. If these types of things can be avoided, I think they should be.

I wasn't really saying that with Metastorage in mind; however, it applies. Metastorage creates a RowDataGateway. If we want to safely add any domain logic, we need to extend the generated class so that we don't lose our domain logic on recompile.

No, Metastorage implements an aspect-oriented, black-box approach. This means that it only implements the aspects you say your persistent objects need. This is why Metastorage-generated classes are very compact and also do not need any base classes or subclasses.

Although support for adding arbitrary functions and variables to the persistent classes, defined by you in your target language of choice (PHP in this case), is still on the to-do list, I wonder whether there is real demand for arbitrary customizations, or whether what you want is something that can be generalized and implemented in the Metastorage compiler as a base type of functions that Metastorage could generate from a small description provided by the developer.

For instance, some people wanted to perform object searches with custom criteria. Now they can do that with Metastorage OQL, with the advantage that Metastorage generates search functions with code highly optimized both in size and in execution speed.

So, I wonder: what do you think you would need to add to Metastorage-generated classes to customize them for your needs?

This solution will be fundamentally different from MetaStorage even though it is tackling a related problem. There are other libraries, MetaStorage included, that generate DataAccessors (DAOs). Even PEAR::DB_DataObject does that. The idea of a Hibernate-like clone is to generate the DataMappers that save domain objects. The domain objects are written in PHP just as they would have been anyway (with a few minor compromises).

Metastorage is not a library. Sometimes I wonder if people are not confusing Metastorage with Metabase. Metastorage is a compiler. Its approach is totally generative. It generates autonomous code, so you do not have to bundle any part of the compiler (over 40,000 lines of PHP code) to execute any of the Metastorage-generated code.

Metastorage also generates code for important things, like reports, that Hibernate does not help you with.

As I mentioned before, although customizing domain objects with native PHP code written by the developer is still on the to-do list, I wonder if that is really necessary. For whatever customizations you need, couldn't you just make Metastorage generate them for you from a description of what you want, and benefit from the speed-up in the project life cycle, not to mention the reliability of the automatically generated code?

I have not needed Metastorage to generate classes that integrate hand-written code (customizations), or that would already be an available feature in Metastorage by now. I understand that other people may have needs different from mine that require hand-written code. I would just like to hear about those needs so I can determine whether it would be worth automating the generation of more aspects or providing the ability to integrate custom code sooner.

Originally Posted by lastcraft

Regarding the OQL to SQL conversion, there is no trouble doing that on the fly as compared with actualy running the DB query the extra load is trivial. Possibly we could generate a translator for each domain class, but that is a pretty obscure refinement.

I am afraid you are underestimating the complexity of the problem. Even if you can develop such an engine in PHP soon enough to make it useful for enough people to test it and provide helpful feedback, I doubt you can really make the extra load trivial (not just in CPU time but also in memory usage).

Metastorage is a compiler. Its approach is totally generative. It generates autonomous code.

OK, but in this context that is a minor distinction. We are not out to write a new compiler (I hope), but to enhance the PHP language with a relational mapping layer. The proportion of that which turns out to be generative is an implementation detail against that goal.

Originally Posted by mlemos

I am afraid you are underestimating the complexity of the problem. Even if you can develop such an engine in PHP soon enough to make it useful for enough people to test it and provide helpful feedback, I doubt you can really make the extra load trivial (not just in CPU time but also in memory usage).

Interesting. Can you give some examples of the type of difficulty encountered? The HQL grammar is not particularly large, and over half of it is aggregation (SUM, etc.), which is a pretty unimportant feature. Obviously not a trivial task, but getting basic objects to load doesn't look so difficult, at least at first glance.

Some food for thought from Gavin King (I was happy not to have EJBs in PHP when I watched this clip).

I'm just as happy not to have EJBs in Java, which is easily doable with POJOs and a lightweight IoC container.

EJB3 is heavily influenced by Pico and Spring, but after reading up on it I was a bit let down. Dependency injection in EJB3 is limited to JNDI-managed objects and is for Java 1.5+ only, because you must use annotations like:

OK, but in this context that is a minor distinction. We are not out to write a new compiler (I hope), but to enhance the PHP language with a relational mapping layer. The proportion of that which turns out to be generative is an implementation detail against that goal.

Sorry, it seems I was not very clear. What I meant to point out is that Metastorage is not a library. It just generates standalone code that works by itself. Such code does not require a runtime library to assist in the operations it executes to manipulate the persistent objects.

This Metastorage approach contrasts with other solutions that need to bundle bloated base libraries implementing the necessary object persistence features. Solutions that require bloated libraries tend to impose scalability difficulties. I mean the solutions may scale, but at the cost of more hardware.

Metastorage minimizes these scalability costs by computing ahead of time, as much as possible, the information that is static over time, resulting in code that is not only smaller but also consumes less memory, as most of the information used at runtime is pre-computed and embedded in the generated code.

Interesting. Can you give some examples of the type of difficulty encountered? The HQL grammar is not particularly large, and over half of it is aggregation (SUM, etc.), which is a pretty unimportant feature. Obviously not a trivial task, but getting basic objects to load doesn't look so difficult, at least at first glance.

It is not so much a problem of difficulty, but of the complexity of the situations you will have to handle to make it useful. Evaluating and translating simple HQL expressions to SQL is relatively easy.

What makes it more complex is relationships. One-to-one and one-to-many relationships translate to simple joins. Many-to-many relationships are much more complex due to the intermediate table they require.
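The difference can be sketched with two query shapes; the table and column names below are hypothetical, and the SQL strings are only illustrative of what a translator would have to emit.

```php
<?php
// One-to-many: a single join condition between the two tables suffices.
$oneToMany =
    'SELECT article.* FROM article, author' .
    ' WHERE article.author = author.id AND author.name = ?';

// Many-to-many: the translator must also introduce the intermediate
// (link) table, which the OQL/HQL expression never mentions directly.
$manyToMany =
    'SELECT article.* FROM article, categoryarticles, category' .
    ' WHERE categoryarticles.article = article.id' .
    ' AND categoryarticles.category = category.id' .
    ' AND category.name = ?';

echo $manyToMany;
```

The link table never appears in the object model, so the translator has to deduce it from the relationship metadata of both classes, which is where the extra definition loading and parsing described below comes in.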

For every class of persistent objects that is referenced, you need to load and parse its definition into memory so you can verify whether the HQL expression is consistent. The more classes are involved in each expression, the more definitions you need to load and parse into memory.

All this adds to the CPU and memory resources consumed by parsing. With Metastorage, I usually need to raise the PHP memory limit to 32MB. Doing all this processing within the scope of a Web server request is possible, but it is not convenient for most people on a production server, as they may not be allowed to change their limits.

This is not a problem for Metastorage, because you can compile components of persistent object classes of arbitrary complexity in your development environment, where you control the configuration limits.

There are other problems, but these should be enough to give you an idea of the time and effort needed to develop your own solution. Metastorage has been in development for almost 3 years. It is based on MetaL, which has been in development for almost 6 years.

I am not saying that you are setting yourself an impossible mission, but rather an effort whose complexity you should not underestimate.

Some people complain that I keep pushing my own developments, maybe because they prefer to do it all themselves, or at least not rely on existing tools as I am suggesting. The truth is that for complex projects like this, trying to do it yourself, possibly reinventing the wheel instead of adopting something proven to be convenient, may turn out to be a mistake and a big waste of time.

Therefore, rather than jumping on yet another me-too project, I suggest you take a closer look at Metastorage (or other proven solutions) and discuss the aspects that do not seem satisfactory, so I can work on the improvements necessary to make Metastorage a better solution for more people.