
I think Manuel has a point here that I was trying to get to earlier. PHP, being a stateless, non-compiled language, will require some key differences, exactly at the point of persistence. You can either solve this in a very un-PHP way by using something like an application server, thereby essentially making PHP stateful, or you can take a path like MetaStorage, which essentially moves all the brains into a separate generation stage. The end result is not really a persistence layer in the classic Java sense but more a code generator that creates the code necessary to communicate with your storage layer. So it's not really what this thread set out to achieve. The question is just whether the goal of the thread is feasible in PHP. Only time will tell :-)

Sorry, it seems I was not very clear. What I meant to point out is that Metastorage is not a library. It just generates standalone code that works by itself.

That's cool. I understand.

Originally Posted by mlemos

Many-to-many relationships are much more complex due to the intermediate table that they require.

I must admit I have always ducked this issue when writing these types of libraries. That may well be because as a data modeller I have tended to avoid these relationships anyway, preferring named associations.

Originally Posted by mlemos

For every class of persistent objects that is referenced, you need to load and parse its definition into memory so you can verify whether the HQL expression is consistent. The more classes are involved in each expression, the more definitions you need to load and parse into memory.

I agree this part is ideally precalculated with code generation.

Originally Posted by mlemos

I am not saying that you are setting yourself an impossible mission, but rather an effort whose complexity you should not underestimate.

I agree. I'll reiterate that this is a big job, even with Hibernate as reference material. I would estimate two years to get to the point of mapping multi-table joins, inheritance, import/export, schema migration and multiple identity mapping strategies. That's at SourceForge speeds with, say, 4 core developers. Still, that would be the main functionality of Hibernate sorted out. The rest is just sugar coating.

Would you consider this a reasonable estimate?

Originally Posted by mlemos

The truth is that for complex projects like this, trying to do it by yourself, eventually reinventing the wheel instead of adopting something already proven, may turn out to be a mistake and a big waste of time.

There is no doubt that MetaStorage is an impressive project, and along with the phpclasses site, makes you personally one of the most effective PHP developers around. However, a one-size-fits-all approach is equally damaging. There are lots of reasons to have several systems in the community, not least in allowing evolution and friendly competition (something PEAR could learn from). As an application developer you usually want a library/tool with a close fit to the needs of the application and development team, and don't want to import any more complexity than necessary.

Metastorage/MeTaL has a significant learning curve and involves learning a new language (with an XML syntax). It also replaces your domain objects, as far as I can work out. Hibernate has a very different interface, mapping already hand-coded PHP classes. That's a really different tool. There is certainly both room and a need for both tools in the "marketplace".

Reuse doesn't just have to be building one Ubertool and it's not usually a good idea to do so. Otherwise we would all be using Java and PHP would never have existed.

You can either solve this in a very un-PHP way by using something like an application server, thereby essentially making PHP stateful, or you can take a path like MetaStorage, which essentially moves all the brains into a separate generation stage.

I don't want to prejudge the solution, but here is my 2c worth as best guess: code generation for the mappers, multiple identity map strategies (including a memory cache option).
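The identity map idea mentioned here can be sketched in a few lines. This is only an illustration of the pattern, not code from any of the projects discussed; the class and method names are made up.

```php
<?php
// Minimal identity map sketch: guarantees at most one in-memory object
// per database row within a request, keyed on class name plus id.
class IdentityMap
{
    private $objects = array();

    // Return the cached object for this class/id pair, or null if not loaded.
    public function get($class, $id)
    {
        $key = $class . ':' . $id;
        return isset($this->objects[$key]) ? $this->objects[$key] : null;
    }

    // Register a freshly loaded object under its class/id pair.
    public function add($class, $id, $object)
    {
        $this->objects[$class . ':' . $id] = $object;
    }
}
```

A mapper would consult the map before hitting the database, so two finders asking for the same row hand back the same object.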

I think Manuel has a point here that I was trying to get to earlier. PHP, being a stateless, non-compiled language, will require some key differences, exactly at the point of persistence. You can either solve this in a very un-PHP way by using something like an application server, thereby essentially making PHP stateful, or you can take a path like MetaStorage, which essentially moves all the brains into a separate generation stage. The end result is not really a persistence layer in the classic Java sense but more a code generator that creates the code necessary to communicate with your storage layer. So it's not really what this thread set out to achieve. The question is just whether the goal of the thread is feasible in PHP. Only time will tell :-)

The goal of persistence is to store the information of an object in a non-volatile storage container, in such way that the object can be restored even after the application that created or modified it has ended.

Prevalence is a radical approach to implementing object persistence. It works by loading all objects created by your application into main memory when the application is started and keeping them in memory while the application is running.

Implementing persistence does not require a prevalence approach. As a matter of fact, prevalence is unrealistic for most types of Web applications.

The prevalence approach is radical because it assumes that there is always sufficient memory to keep all objects. This means that it is only usable with applications that deal with a small number of objects.

Even when you use your own dedicated server, you cannot assume that you can use all the memory you have because memory is a shared resource. If one process uses all the memory, the rest of the system will practically halt, even when you use virtual memory. When you use all your memory in a shared server, your hosting provider will kick you out.

Using databases or other persistent storage containers is a compromise between object access speed and realistic use of server memory. Most modern database servers and operating systems already make plenty of use of smart caching to reduce the time to access frequently used data. You may also use application-aware object caching techniques to reduce storage access time further.

You do not need to resort to prevalence, especially since it is an unrealistic approach in most cases. In the Java world, prevalence is implemented by Prevayler. Application servers do not implement prevalence. What they do is queue requests to store batches of objects, deferring database access without holding up application clients. Hibernate just implements direct storage access persistence, like the code generated by Metastorage.

Actually, we have just started looking at Prevayler as a serious solution.

We are thinking of using it only for handling sessions though. This is nice non-critical data and makes good use of the 4GB we are putting into the new web servers. The real data is still on a mixture of MySQL and various full-text engines, but all of that happens on the second-tier servers, hidden behind a messaging protocol. And yes, this is all PHP...

I agree. I'll reiterate that this is a big job, even with Hibernate as reference material. I would estimate two years to get to the point of mapping multi-table joins, inheritance, import/export, schema migration and multiple identity mapping strategies. That's at SourceForge speeds with, say, 4 core developers. Still, that would be the main functionality of Hibernate sorted out. The rest is just sugar coating.

Would you consider this a reasonable estimate?

I recommend that if you really want to go ahead, first implement something that works well for you in a real world project, and then announce it when it is ready and documented well enough to be useful to others.

The Open Source users community is tired of projects that sound promising when they are announced but then die for some reason: people get over-excited about a project that attracts plenty of initial attention, but then it goes nowhere, either because the developers did not have enough time or the b*lls to see it through.

SourceForge is also known as the "Open Source graveyard" because it is full of projects that went nowhere for the reasons I mentioned above. I also recommend that you take a look at this paper about sustainable Open Source development, which mentions a few of the reasons why so many Open Source projects fail.

By all means, I am not questioning your capabilities, but rather warning about the feasibility of your project, given that so far it is just an idea. I don't know what you do for a living, but I suppose it is consultancy or something else that takes up most of your day. If my assumption is correct and you do not have much free time (like most people that work), it will be a long time until you reach something usable.

Metastorage took me about 3 months to reach an initial release with basic features and documentation. However, I worked on it almost full time. I could justify doing it because it provided me with a productivity tool that allowed me to implement important features of the PHP Classes site in much less time than if I had to write the code by hand. As you may understand, I work full time on the PHP Classes site because I can generate enough income from advertising. This way I can make the continued development of Metastorage and many other Open Source projects of mine viable.

Originally Posted by lastcraft

There is no doubt that MetaStorage is an impressive project, and along with the phpclasses site, makes you personally one of the most effective PHP developers around. However, a one-size-fits-all approach is equally damaging. There are lots of reasons to have several systems in the community, not least in allowing evolution and friendly competition (something PEAR could learn from). As an application developer you usually want a library/tool with a close fit to the needs of the application and development team, and don't want to import any more complexity than necessary.

I completely agree that each developer needs to be sensible before making "build or buy" decisions. What does not seem very sensible to me is to replicate the effort of a multi-man-year project before enquiring further into how the existing projects can be adapted or enhanced to suit your needs.

Originally Posted by lastcraft

Metastorage/MeTaL has a significant learning curve and involves learning a new language (with an XML syntax).

Not a new language, just a new format. So does Hibernate: it does not work without the mappings definition, which is in XML.

Hibernate has a very different interface, mapping already hand-coded PHP classes. That's a really different tool. There is certainly both room and a need for both tools in the "marketplace".

I understand that you want to start from existing PHP classes and add persistence capabilities. However, PHP is not Java. Unlike Java classes, PHP classes lack a lot of definitions, for instance the data types of variables and function arguments. Since that information is missing from PHP classes, developers will have to add it manually, either in comments in the PHP class file code or in a separate XML definitions file. Basically it is the same effort as defining classes in the CPML format used by Metastorage.

It seems to me that it would be easier to write a simple tool with the PHP tokenizer functions, available since PHP 4.3.0, and generate a CPML definition that would ease the migration to Metastorage. That would be a nice add-on that anybody could develop without depending on Metastorage developments.
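As a rough illustration of that tokenizer approach (this is not part of Metastorage; the function name and output shape are invented), a script can walk the token stream and collect class and member variable names, which is most of the raw material a CPML definition needs. Type information would still have to be added by hand:

```php
<?php
// Hypothetical sketch: use the tokenizer extension (PHP >= 4.3.0) to list
// classes and their "var" member variables from a source string.
function list_class_members($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $classes = array();
    $current = null;
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i])) {
            continue;
        }
        if ($tokens[$i][0] == T_CLASS) {
            // Skip ahead to the class name identifier.
            while ($i < $count && !(is_array($tokens[$i]) && $tokens[$i][0] == T_STRING)) {
                $i++;
            }
            $current = $tokens[$i][1];
            $classes[$current] = array();
        } elseif ($tokens[$i][0] == T_VAR && $current !== null) {
            // A "var $name;" style member declaration follows.
            while ($i < $count && !(is_array($tokens[$i]) && $tokens[$i][0] == T_VARIABLE)) {
                $i++;
            }
            $classes[$current][] = $tokens[$i][1];
        }
    }
    return $classes;
}
```

Emitting the collected names as CPML (or any XML) is then a matter of string formatting.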

There is a section of the Metastorage FAQ that suggests add-on projects that anybody can implement and that can benefit developers using Metastorage. One of the ideas that was already picked up was a tool to translate from XMI to CPML and vice versa, so developers can use UML graphical modeling tools to design their class models. I will probably add the idea of translating PHP classes to a CPML definition as another suggestion for a tool that could be useful to some developers.

Anyway, my point is not to detour you from your goal of implementing a PHP Hibernate port. I am just giving you food for thought so you can reflect on whether it is really worth going through the effort of developing that port, or whether to adopt Metastorage as a real solution available now.

If that decision depends on adding certain features to Metastorage that you find important, I would certainly like to know what those features would be, as they could be something I would like to add to Metastorage to address my goals better.

Actually, we have just started looking at Prevayler as a serious solution.

We are thinking of using it only for handling sessions though. This is nice non-critical data and makes good use of the 4GB we are putting into the new web servers. The real data is still on a mixture of MySQL and various full-text engines, but all of that happens on the second-tier servers, hidden behind a messaging protocol. And yes, this is all PHP...

Just because you have a server with 4GB of RAM, it does not mean all of it is always available. The available RAM usually limits the number of processes your Web server can fork to handle simultaneous connections.

Usually I dedicate half of the available RAM and divide by ~12MB, which is a typical memory footprint for an Apache process serving PHP pages on Linux with Turck MMCache as accelerator. The result of the division is the number I use to set the maximum number of Apache processes before the server starts using virtual memory and performance starts to degrade.
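That rule of thumb is simple arithmetic. A sketch, with illustrative figures (the real per-process footprint depends on the modules loaded and the accelerator used):

```php
<?php
// Rule-of-thumb sizing: give Apache half the RAM, divide by the typical
// per-process footprint, and use the result as the process limit.
function max_apache_processes($total_ram_mb, $process_size_mb)
{
    return (int) floor(($total_ram_mb / 2) / $process_size_mb);
}
```

With 4GB of RAM and an assumed ~12MB footprint, `max_apache_processes(4096, 12)` gives 170 processes.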

For sessions, I usually store them in a database, but I use an arbitrary-data caching class to cache session data in files. At least on Linux, the filesystem is very good at caching because Linux uses free memory to buffer disk accesses. This almost eliminates disk I/O. If the memory used by the filesystem buffers is needed by a process, it is released for that process, and there is no need to resort to virtual memory.

If you want to reserve some memory to make session file access faster, you can mount a RAM disk virtual partition and drop the session files in there.

I don't know what kind of architecture you use, but if you want to reserve a good chunk of memory to provide fast access to sessions, you would probably be better off with the Turck MMCache session handler, which stores the session data in shared memory. That is a solution available in PHP now, and it can basically work like Prevayler.

If my assumption is correct and you do not have much free time (like most people that work), it will be a long time until you reach something usable.

It is correct, but it's actually worse than that. I already have an OS project (SimpleTest) and project-manage two others (Arbiter, SpamProofWiki). And then I had a son, and now it's really fallen apart...

I have no interest in running such a project. I am trying to sound out others, who all seem to be implementing their own solutions, and to rally them together a little. I would love to see a Hibernate port, but as a user, not an author.

Judging by how quickly this thread has died down I suspect that sufficient enthusiasm is not yet there.

Originally Posted by mlemos

I completely agree that each developer needs to be sensible before making "build or buy" decisions. What does not seem very sensible to me is to replicate the effort of a multi-man-year project before enquiring further into how the existing projects can be adapted or enhanced to suit your needs.

Well, I'll leave that up to others to decide. I am in the "already built" category for my current projects. There seems to be a real desire to get beyond DAOs into DataMappers though. It would be (have been?) nice to see this spawn a cohesive project.

Originally Posted by mlemos

So does Hibernate: it does not work without the mappings definition, which is in XML.

That's true, although it is very SQL-like.

Originally Posted by mlemos

Basically it is the same effort as defining classes in the CPML format used by Metastorage.

This is the same amount of effort as in Java. In Java you have to add types to classes whether they are persistent or not. They didn't come for free.

Originally Posted by mlemos

It seems to me that it would be easier to write a simple tool with the PHP tokenizer functions, available since PHP 4.3.0, and generate a CPML definition that would ease the migration to Metastorage.

The tokeniser doesn't really do very much. I doubt we will get anything like Doclet anytime soon either, as the PHPDocumentor team (of two) have lost control of their code to some degree and are requesting volunteers to help clean it up. It would mean writing a PHP parser or hacking the Zend source to extract theirs.

This is the same amount of effort as in Java. In Java you have to add types to classes whether they are persistent or not. They didn't come for free.

I suppose that is why Rails pulls that info from the database, rather than trying to generate the database from the classes or an XML config file. PHP doesn't seem to have the dynamic code generation power that Ruby has, so I can't see an obvious way to port ActiveRecord... it would be nice though.

I've been following this thread for a good while now, and I agree that for the moment there is not a lot of interest in a Hibernate port for PHP. But 18 months ago there wasn't much interest in MVC or other design patterns either, and look now, aye?

Every other day now there is a new discussion about MVC or another design pattern, so I think Hibernate is about 12 to 18 months away from people taking a genuine interest in it.

For the moment, PHP still does not have the large enterprise projects that Java takes for granted, and a lack of tools may well be a weak point for PHP, though only to some degree in my view, as PHP is not really ready yet.

But for everyone discussing this, please continue as we can all learn something from what is being discussed...

It's hard to come up with valuable feedback for such a complex topic, but after reading and thinking for a while there are a couple of things that are very clear for me.

First off, the thing I like most about using a unit of work is that I don't need to manually save every single object I load in a request (or process, for cron jobs).

For CRUD-type applications (like CMSs) that is not a big gain, since your business logic is just changing and saving content. In my case, I'm working on a system that is growing in complexity over time, in a way that I often need to update different objects inside a single action, using Facades in most cases. Take this case as an irresponsibly simple example:

Suppose I need to send an email to a large set of affiliates, filtering on the kind of content they agreed to receive.

This code ignores persistence at the moment, but if I want to persist all changes using a DAO or a Mapper, I have to add that logic to the process, making the code look a little like this:

PHP Code:

// after the notification was sent
$notification->save();
// or
$notificationMapper->save($notification);

// and the same for each affiliate
$affiliate->save();

And the logic can get worse when you start to add features. I don't like how the code gets cluttered with a lot of saving stuff that isn't really business logic.

With a UoW solution the client code stays almost the same; you only require the UoW to start() at the top and commit() at the end (ideally). This is VERY different from the solutions we have at the moment for PHP, so it's not really re-inventing the wheel, but I do believe that we can use what's already done.
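To make the start()/commit() convention concrete, here is a minimal sketch of the commit side. This is not code from any existing PHP library; the class and method names (UnitOfWork, registerDirty, commit) are assumptions for illustration.

```php
<?php
// Minimal unit of work sketch: business code registers changed objects
// as it goes, and one commit() call replaces the scattered save() calls.
class UnitOfWork
{
    private $dirty = array();

    // Remember a changed object; the same instance is only tracked once.
    public function registerDirty($object)
    {
        if (!in_array($object, $this->dirty, true)) {
            $this->dirty[] = $object;
        }
    }

    // Persist everything registered in one go (here just counted, where a
    // real implementation would hand each object to its mapper).
    public function commit()
    {
        $saved = count($this->dirty);
        $this->dirty = array();
        return $saved;
    }
}
```

The business logic in between just mutates objects; persistence concerns collapse into the single commit() at the end of the request.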

In particular, we could be better off if we register objects with the UoW from the domain objects and/or finders, as in:
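(The snippet that originally followed here did not survive the archive; below is a guess at the shape being described, with made-up names: the domain object marks itself dirty through the unit of work whenever its state changes, so client code never calls save() explicitly.)

```php
<?php
// Hypothetical self-registration: the unit of work is handed in at
// construction time, and setters report state changes to it.
class UnitOfWorkRegistry
{
    public $dirty = array();

    public function registerDirty($object)
    {
        $this->dirty[] = $object;
    }
}

class Affiliate
{
    private $email;
    private $uow;

    public function __construct($uow)
    {
        $this->uow = $uow;
    }

    public function setEmail($email)
    {
        $this->email = $email;
        // The object marks itself dirty; no explicit save() in client code.
        $this->uow->registerDirty($this);
    }
}
```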

This could be done very neatly by using AoP to generate the registering code, which leads me to my second point.

I have not used Metastorage yet, but I did go and look at the tutorials and tried to understand what it is all about. Even though I wasn't thorough in reading and testing stuff, I'd suggest starting from there, since the code generation seems to be pretty mature and relationships are very well implemented.

I think I read that it uses AoP, so the question is... would it be too hard to incorporate the UoW registering stuff as an aspect?

I've started a project on SourceForge for an object/relational persistence and query tool. I'll add it to my sig when it's up.

Attached is an iteration plan with the features I'd like to see. The estimated hours (in brackets) add up to 263 right now. This is probably low.

I'm going to keep the interface similar to the prototype I posted earlier. Some major differences will be in the code generation department. I'm going to start off with PHP 5.

There will be support for transparent persisted objects (no-arg constructor, setters/getters) and non-transparent ones (the UoW is kept with the domain objects). This will be added early so we can weigh benefits, performance considerations, etc.

How are you guys using transactions? Are you doing a single transaction for each request, or multiple transactions for each request? Should a single transaction be used for a single UoW? Should this be implicit?

What do you think about using doc comments for metadata mapping instead of XML? Example:
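(The example that followed was lost in the archive; below is a guess at the style being proposed, loosely modeled on the @ tags used by Hibernate's XDoclet-era tooling. The tag vocabulary here is invented.)

```php
<?php
// Hypothetical doc comment mapping: the @ tags mirror what would
// otherwise live in an XML mapping file.

/**
 * @persistent table="affiliates"
 */
class MappedAffiliate
{
    /** @id @column name="affiliate_id" type="integer" */
    var $id;

    /** @column name="email" type="string" */
    var $email;
}
```

A generator would read these comments and emit either the mapping code directly or the equivalent XML file.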

Attached is an iteration plan with the features I'd like to see. The estimated hours (in brackets) add up to 263 right now. This is probably low.

The way to find out if you're underestimating is to break a task down into subtasks and estimate each one. If that comes out higher than your original estimate, then you don't have a handle on a chunk of that size. When you further break down subtasks, your total estimates will rise again unless you get down to tasks that you have done before, or that are comparable with ones you have. Once that happens you are getting accurate. Now take this final score and ratio it against your original estimate for that strand. Then scale the whole project up by that amount, giving you a first-order estimate.

Let's say you estimate a task at 10 hours, but when you break it down it adds up to subtasks of 4, 4, 5 and 3 hours: 16 in total. You take the 5-hour task and it divides into three of 2, 4 and 1 hours. You take the 4-hour task and it really is two 2-hour tasks of the type you have done before. That's a scaling of 16/10 * 7/5, or 224% of your original estimate.
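The compounding above is just arithmetic; written out (nothing here is project-specific):

```php
<?php
// Each (broken-down / original) ratio compounds into one correction
// factor for the plan: 16/10 from the first breakdown pass and 7/5 from
// the second multiply to 2.24, i.e. 224% of the original estimate.
function estimate_scaling(array $ratios)
{
    $factor = 1.0;
    foreach ($ratios as $ratio) {
        $factor *= $ratio;
    }
    return $factor;
}
```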

Everyone is over-optimistic, but they are usually consistently over-optimistic.

Originally Posted by Brenden Vickery

How are you guys using transactions?

Wrap it in the UnitOfWork as this allows the application writer to choose. This is the killer pattern for PHP.

Originally Posted by Brenden Vickery

Are you doing a single transaction for each request, or multiple transactions for each request?

95% single transaction. An example of multiple is having session-related stuff on another database, but you don't usually use a persistence layer for this unless it is a very complicated, long-lived session. Another example is reading from one replicated database but writing to another, though there are other ways of doing this.

It would not be unreasonable to assume one transaction for each request, but why do so? If the application writer chooses to make the transaction global in some way, it will take them just a few lines of code, and they get to choose the mechanism to boot. I would assume multiple.

Originally Posted by Brenden Vickery

Should a single transaction be used for a single UoW? Should this be implicit?

If not, then what else would it be? A long-lived transaction over multiple requests? That's really hard work and probably not a good idea (an application-level Command pattern undo is a more testable approach).

Originally Posted by Brenden Vickery

What do you think about using doc comments for metadata mapping instead of xml?

The PhpDocumentor project won't currently be able to help with this, as they have their own problems to sort out first. I would start with XML. Anyway, you can have much more complicated mappings with XML, and if you change the syntax you can ship XSLT conversions to port old schemas and data, so that users are not overly inconvenienced.

You have to do everything a database could do. There are two issues I haven't seen you mention, but without which it's really just a toy: backup/restore (don't forget to store the schema with the data) and data porting after a schema change. I find these, and transactions, so fundamental that it's best to get them up and running from the word go. It's then easier to keep them working as you add functionality, and these systems will aid in testing and installation.

In fact, I don't think your plan is realistic. I haven't got one either, but I do know I cannot see more than a couple of iterations ahead on a project.

Try this. Write down all the features (and their estimates) in order of usefulness. Don't worry about dependencies or commonality, just write them down in terms of actions that people would like to do. For example, don't have data retrieval without the ability to put data in and index it. It doesn't make sense to the user to have one without the other. Once you have done this, sort them in order of dependency. For example, you cannot have ForeignKeyMapping without first having a DataMapper. Don't look too closely at this, as I tend to find programmers are very good at coming up with imagined dependencies by prejudging the infrastructure. The upcoming "feature" becomes the mission statement of the next iteration.

For example, a first iteration might be to store and retrieve an object (of a specific class) with no data. Remember we are doing this transactionally and with backup/restore, an install script and schema migration when we change, say, the class name to something else. Say MySQL only. We have then cut a vertical slice through the application rather than building one layer at a time; otherwise you inadvertently prejudge your architecture. You also spend too long building infrastructure and produce very little in the way of concrete results to "play interfaces" with.

Just hack the thing together on a first pass. When you add PostgreSQL support, then create a DB connection layer. By spreading the infrastructure over the lifetime of the project, the whole thing becomes more predictable. A couple of iterations in, you should have some feedback on how accurate your estimates are. You can then use these to scale your estimate of how long a first release will take.

It's also easier to decide on features. If it's a one-year project, just drop everything after the first year on your original feature list. The system is called the Scrum backlog, and it is very effective at nailing a project to a schedule.

Hey Marcus,
I was hoping to get to take advantage of your coaching again.

Thanks for the transaction feedback. Your thoughts are exactly in line with what I was thinking. I just wanted to see if there were any dissenting views.

Originally Posted by lastcraft

The PhpDocumentor project won't currently be able to help with this as they have got their own problems to sort out first. I would start with XML. Anyway, you can have much more compicated mappings with XML and if you change the syntax you can ship XSLT conversions to port old schemas and data so that users are not overly inconvenienced.

This actually isn't that difficult in PHP 5. I was thinking that this would be a good way to generate your XML config file. A quick prototype I knocked up earlier is:
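(The prototype itself was not preserved in the archive; the sketch below shows the PHP 5 mechanism presumably involved: Reflection exposes doc comments at runtime, so @ tags can be collected and later re-emitted as an XML mapping. The function name and tag format are invented.)

```php
<?php
// Hypothetical sketch: collect @ tags from property doc comments using
// PHP 5 Reflection, as raw material for generating an XML mapping file.
function doc_comment_tags($className)
{
    $reflector = new ReflectionClass($className);
    $tags = array();
    foreach ($reflector->getProperties() as $property) {
        $comment = $property->getDocComment();
        // Pull every "@word" out of the property's doc comment.
        if ($comment && preg_match_all('/@(\w+)/', $comment, $matches)) {
            $tags[$property->getName()] = $matches[1];
        }
    }
    return $tags;
}
```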

You have to do everything a database could do. Two issues I haven't seen you mention, but without which it's really just a toy . Backup/restore (don't forget to store the schema with the data) and data porting after a schema change. I find this, and transactions, are so fundamental that it's best to get these up and running from the word go. It's then easier to keep them working as you add functionality and these systems will aid in testing and installation.

How do you imagine this working? A tool that scrapes the database into an XML file?
Any examples of an existing project I can get ideas from?

Originally Posted by lastcraft

In fact, I don't think your plan is realistic. I haven't got one either, but I do know I cannot see more than a couple of iterations ahead on a project.

I'm fairly confident in the early iteration estimates; it's after that that things get blurry. I usually write out the later iterations just so I have a feel for the features that are coming, and change the order/estimates where appropriate when things become clearer.

Originally Posted by lastcraft

Try this...

OK, I'm going to trust you on this and give it a shot. I'd like to keep to best practices on this project, so if you have any more suggestions like this, please feel free.

For example, a first iteration might be to store and retrieve an object (of a specific class) with no data. Remember we are doing this transactionally and with backup/restore, an install script and schema migration when we change, say, the class name to something else. Say MySQL only. We have then cut a vertical slice through the application rather than building one layer at a time; otherwise you inadvertently prejudge your architecture. You also spend too long building infrastructure and produce very little in the way of concrete results to "play interfaces" with.

I'm wondering why you suggest needing transactions/backup/restore/install for the first iteration. If the first iteration is to store/retrieve an object, why aren't we only writing the code to store/retrieve the object, and then in iteration 2, for example, adding transactions?

I think I can see the benefit of not prejudging the solution by specifying layers early; I'm just wondering at what point certain features need to come in. We know this is going to use a UoW, but until that's a feature of an iteration, we don't think about it, right?

Hopefully I'm not answering my own question, but if the first iteration is to store/retrieve an object, we know we are going to need minimal O/R mapping. Since O/R mapping is not a "feature" from the client perspective, this is not an iteration, but merely one of the minimal things that needs to be done to pass the acceptance test?

I don't want to get too far off topic... maybe a new thread to discuss this type of stuff, if anyone is interested. Any book recommendations in this area?

Simplistic solution: how about building out phpMyAdmin to generate ORM classes when changes are made to the database? Or building out an open source DB management app for a different database, if we don't want to use MySQL? Using a DB with built-in foreign key support, rather than rewriting it in PHP, would massively cut down on the workload. All the final app needs is a strong PHP interface.

You have to do everything a database could do. There are two issues I haven't seen you mention, but without which it's really just a toy: backup/restore (don't forget to store the schema with the data) and data porting after a schema change. I find these, and transactions, so fundamental that it's best to get them up and running from the word go. It's then easier to keep them working as you add functionality, and these systems will aid in testing and installation.

Originally Posted by Brenden Vickery

How do you imagine this working? A tool scrapes the database into a xml file?
Any examples of an existing project I can get ideas from?

Wanted to get on paper a couple more thoughts on this. The main tools to convert the mappings into different formats seem fairly straightforward.

I was working on a similar DocComment solution to what you have in mind. I was just copying what Hibernate uses, converting the @ tags to XML so it looked like the XML mapping that would otherwise be used.
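For anyone curious, the general shape of that conversion is easy to sketch in PHP. The tag names below are illustrative (not Hibernate's actual vocabulary) and the parsing is deliberately naive:

```php
<?php
// Illustrative sketch: a class annotated with Hibernate/XDoclet-style
// DocComment tags. @persistent and @column are made-up names here.

/**
 * @persistent table="users"
 */
class User
{
    /** @column name="id" type="integer" primary-key="true" */
    public $id;

    /** @column name="email" type="text" */
    public $email;
}

// Convert the @column tags of a class into an XML mapping fragment
// using PHP 5's Reflection API.
function docCommentsToXml($className)
{
    $class = new ReflectionClass($className);
    $xml = "<class name=\"$className\">\n";
    foreach ($class->getProperties() as $property) {
        $comment = $property->getDocComment();
        if (preg_match('/@column\s+(.*?)\s*\*\//s', $comment, $match)) {
            $xml .= "  <property " . trim($match[1]) . " />\n";
        }
    }
    return $xml . "</class>\n";
}

echo docCommentsToXml('User');
```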

If the first iteration is to store/retrieve an object why arent we only writing the code to store/retrieve the object and then in iteration 2 for example adding transactions?

It's just that transactions have a big impact on the interface of the object model. They pretty much dictate a UnitOfWork, either as an instance or some global structure (yuk).
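To make that concrete, here is a minimal sketch of the instance flavour (all names invented). Note how client code has to hold and pass the instance around, which is why bolting transactions on in a later iteration would change the object model's interface:

```php
<?php
// Minimal UnitOfWork sketch. $connection is assumed to provide
// begin()/commit()/rollback(); the persistent objects are assumed to
// know how to insert/update/delete themselves.
class UnitOfWork
{
    private $inserts = array();
    private $updates = array();
    private $deletes = array();

    public function registerNew($object)     { $this->inserts[] = $object; }
    public function registerDirty($object)   { $this->updates[] = $object; }
    public function registerDeleted($object) { $this->deletes[] = $object; }

    // Flush all pending changes inside one database transaction.
    public function commit($connection)
    {
        $connection->begin();
        try {
            foreach ($this->inserts as $object) { $object->insert($connection); }
            foreach ($this->updates as $object) { $object->update($connection); }
            foreach ($this->deletes as $object) { $object->delete($connection); }
            $connection->commit();
        } catch (Exception $e) {
            $connection->rollback();
            throw $e;
        }
    }
}
```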

Originally Posted by Brenden Vickery

Hopefully Im not answering my own question but, if the first iteration is to store/retrieve an object. We know we are going to need minimal O/R mapping. Since O/R mapping is not a "feature" from the client perspective, this is not an iteration, but merely one of the minimal things that needs to be done to pass the acceptance test?

Your interfaces are the object model and the mapping (meta data), so both are subject to testing. Possibly the database too, if you are generating objects from the DB.

I suppose that is why Rails pulls that info from the database, rather than trying to generate the database from the classes or an XML config file. PHP doesn't seem to have the dynamic code generation power that Ruby has, so I can't see an obvious way to port ActiveRecord... it would be nice though.
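For what it's worth, the reading half of the ActiveRecord approach is easy enough to sketch in PHP; it's the runtime method generation that is hard. Assuming a hypothetical $db object whose query() method returns rows as associative arrays:

```php
<?php
// Rough sketch of the Rails approach: read the column list from the
// live database instead of a mapping file (MySQL flavour shown).
function columnsOf($db, $table)
{
    $columns = array();
    foreach ($db->query("SHOW COLUMNS FROM $table") as $row) {
        $columns[$row['Field']] = $row['Type']; // e.g. 'id' => 'int(11)'
    }
    return $columns;
}

// From here, the best PHP can do is generic __get/__set accessors keyed
// on that column list -- nothing like Ruby's per-class method generation.
```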

I am afraid this cannot be done reliably. The problem is that people tend to emulate the types of data that they use in classes with alternative database data types.

For instance, many databases do not support boolean types. It is common to map a boolean type to CHAR(1).

When you try to read a database schema to figure out the original types that were meant to be stored in each field, you may have to deal with ambiguous decisions like: is CHAR(1) a text data type limited to 1 character, or is it a boolean?

Same goes for INTEGER: is it an integer or a foreign key? If the database supports foreign keys and you have used them, fine, the doubt is cleared, but what if the database did not support foreign keys?

Metabase is a database abstraction package that has a module for schema reverse engineering. It tries to read the schema and figure out the original data type. When the data type is ambiguous, it returns a list of possible data type mappings, leaving the decision for a human developer to resolve. This is something that cannot be done reliably at application runtime, as even the most likely data type choice may be wrong.
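The kind of ambiguity report being described might look something like this (illustrative only, not Metabase's actual API):

```php
<?php
// For each SQL type, return the list of plausible portable types and
// leave the final choice to a human developer.
function candidateTypes($sqlType)
{
    switch (strtoupper($sqlType)) {
        case 'CHAR(1)':
            return array('text', 'boolean');      // 1-char string or flag?
        case 'INTEGER':
            return array('integer', 'reference'); // plain value or foreign key?
        case 'VARCHAR(255)':
            return array('text');
        default:
            return array('unknown');
    }
}

// Whenever count(candidateTypes($type)) > 1, the tool must ask rather
// than guess, because the most likely choice may still be wrong.
```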

Suppose I need to send an email to a large set of affiliates, filtering on the kind of content they agreed to receive.

Unless I misunderstood something, it seems to me that you are using the wrong approach. To send notifications, you only need at most the e-mail address, the name of the person and a message template. You do not need to bring piles of objects into main memory to send a mailing to a list. That will needlessly consume too much memory and take ages to finish if your mailing list is large.

A better approach is to create a report class that just queries the object tables and retrieves the values of the fields that you need.

That is the approach I use to present listings, or for other purposes that require some kind of read-only batch processing job that only needs the values of some variables of objects of one or more related classes.

For this kind of batch processing I am using the recently added report class generation support of Metastorage. I just define a query expression in Metastorage OQL, the variables of the classes that I need, and optionally result sorting and row range limiting, and then I associate a report class function to execute that query and return the results as row arrays. When convenient, I may also retrieve the generated SQL code to pass to some mailing list processing code.

Here is an example of a report query and an execution and retrieval function for retrieving the subscribers of a forum of a package of the PHP Classes site.
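Roughly, such a generated report class has this shape (all names invented, not actual Metastorage output):

```php
<?php
// Hypothetical sketch of a generated report class: it fetches only the
// columns the mailing needs, as plain row arrays, instead of
// materializing full persistent objects.
class ForumSubscribersReport
{
    // $connection is assumed to provide a queryAll($sql, $params) method
    // that returns an array of associative row arrays.
    public function retrieveSubscribers($connection, $packageId)
    {
        $sql = 'SELECT users.name, users.email'
             . ' FROM users, subscriptions'
             . ' WHERE subscriptions.user_id = users.id'
             . ' AND subscriptions.package_id = ?';
        return $connection->queryAll($sql, array($packageId));
    }
}
```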

How do you imagine this working? A tool scrapes the database into a xml file?
Any examples of an existing project I can get ideas from?

If I understood correctly what you mean, as I mentioned, Metabase can do that.

Metabase has a database schema management class that lets you define tables, fields, sequences and indexes in a database-independent format based on XML.

Once you define your database schema in the Metabase XML format, you just need to tell the schema manager to create all the tables as you described them.
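A schema definition looks approximately like this (element names written from memory; check the Metabase documentation for the exact format):

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<database>
 <name>example</name>
 <create>1</create>
 <table>
  <name>users</name>
  <declaration>
   <field><name>id</name><type>integer</type><notnull>1</notnull></field>
   <field><name>email</name><type>text</type><length>100</length></field>
   <index>
    <name>users_primary</name>
    <unique>1</unique>
    <field><name>id</name></field>
   </index>
  </declaration>
 </table>
</database>
```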

Later, you can even modify your schema and tell the schema manager to upgrade the installed schema by executing the necessary ALTER, CREATE or DROP statements, changing the schema without disturbing any data that you may have inserted into the database tables since the schema was first installed or last upgraded.

Additionally, to help people migrate their database applications to Metabase, there is also support to reverse engineer an installed database schema and generate a Metabase XML schema definition from it.

Metastorage generates data object classes and also a schema management class that uses Metabase to install or upgrade your database schema, making the development of applications with persistent object classes generated by Metastorage a breeze.

This means that, in practice, you can change your persistent object class definitions anytime you want and have Metastorage regenerate all the code and the database schema in Metabase format in a few seconds.

In a few more seconds, you just call the schema manager class generated by Metastorage, and it calls the Metabase schema manager class to execute the necessary SQL DDL statements to update your schema in a safe way, immune to the common SQL errors that developers usually make and that can scr*w up a whole database.

BTW, for those that would like to use a tool like Metastorage but would prefer it to use a database abstraction package more akin to the PEAR flavour, I will be adding support to Metastorage so it can generate PEAR::MDB code too. Just mail me privately if you are interested in participating in the tests, as I am not a PEAR::MDB user and it would help if real users could test it.

I am afraid this cannot be done reliably. The problem is that people tend to emulate the types of data that they use in classes with alternative database data types.

Perhaps theoretically, but PHP isn't statically typed in the first place. A 1 is as good as a true, so just store a 1 or a 0 in the DB and you don't need to worry about typing.

Originally Posted by mlemos

Same goes for INTEGER: is it an integer or a foreign key? If the database supports foreign keys and you have used them, fine, the doubt is cleared, but what if the database did not support foreign keys?

Considering a foreign key doesn't even need to be an integer, it looks like you'll need a totally different method. Rails uses naming conventions, so a foreign key would look like other_table_id. How does Metabase gather this info? Is it all done manually?
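The Rails convention is simple enough to sketch; it is purely a naming rule, and it breaks as soon as someone names a plain integer column with an _id suffix:

```php
<?php
// Treat any column ending in "_id" as a foreign key to the pluralized
// table name. Naive pluralization; Rails does much better than this.
function foreignKeyTarget($columnName)
{
    if (substr($columnName, -3) == '_id') {
        return substr($columnName, 0, -3) . 's';
    }
    return null; // not a foreign key by this convention
}

// foreignKeyTarget('other_table_id') gives 'other_tables'
// foreignKeyTarget('count') gives null
```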

Originally Posted by mlemos

When the datatype is ambiguous, it returns a list of possible data type mappings leaving the decision for a human developer to clear.

When is data type mapping needed for PHP? PHP is dynamically typed... is it just for form generation? I find that most of my view code is hand written anyway; otherwise you get into a position where you are trying to describe your form in some third language which gets translated to HTML, and then for anything complex you have to rewrite the same data again directly in HTML anyway.

Originally Posted by Brenden Vickery

How do you imagine this working? A tool scrapes the database into a xml file?
Any examples of an existing project I can get ideas from?

Wouldn't a simple SQL dump work in most cases? If you are moving from one type of database to another, then you might need some SQL translation, but why does it need to be a new language? Sure, you'll need some sort of parsed state for storage in memory as you translate, but why would this ever need to be written to the file system?

When is data type mapping needed for PHP? PHP is dynamically typed... is it just for form generation? I find that most of my view code is hand written anyway; otherwise you get into a position where you are trying to describe your form in some third language which gets translated to HTML, and then for anything complex you have to rewrite the same data again directly in HTML anyway.

A couple of places. When you're not using direct SQL you need to know the data type in the database to ensure proper quoting. This may be as simple as knowing it's a string, so you quote it, or knowing it's an int, so you don't.

When dealing with things like dates and timestamps, using objects as data types can become easier. If you have a PHP timestamp, do you have it mapped to a MySQL timestamp or a MySQL date column? I would say a mechanism to use custom objects for data types is important as well.
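A sketch of the type-aware quoting being described (the type names are illustrative, and real code should use the database driver's own escaping rather than addslashes()):

```php
<?php
// Without knowing the mapped database type, you cannot decide whether a
// PHP value needs quoting, escaping, or conversion on its way to SQL.
function quoteValue($value, $type)
{
    switch ($type) {
        case 'integer':
            return (string)(int)$value;                     // no quotes
        case 'text':
            return "'" . addslashes($value) . "'";          // quoted/escaped
        case 'timestamp':
            return "'" . date('Y-m-d H:i:s', $value) . "'"; // PHP timestamp -> SQL
        case 'boolean':
            return $value ? '1' : '0';                      // e.g. CHAR(1) flag
        default:
            return "'" . addslashes($value) . "'";
    }
}
```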

Originally Posted by DougBTX

Wouldn't a simple SQL dump work in most cases? If you are moving from one type of database to another, then you might need some SQL translation, but why does it need to be a new language? Sure, you'll need some sort of parsed state for storage in memory as you translate, but why would this ever need to be written to the file system?

I don't know fully, as I've never dealt with this. I assume there are some differences in the way INSERT statements are done in different databases. It's easier to deal with these differences if you move to XML.

One thing may be data types in the database. Is a timestamp in MySQL the same as a timestamp in Oracle? I'm not sure about all the idiosyncrasies of each database.
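As an illustration of the kind of dialect difference involved, here is the same row inserted into MySQL and Oracle:

```sql
-- MySQL accepts a bare string for a timestamp value:
INSERT INTO log (created) VALUES ('2005-06-01 12:00:00');

-- Oracle wants an explicit conversion with a format mask:
INSERT INTO log (created)
VALUES (TO_DATE('2005-06-01 12:00:00', 'YYYY-MM-DD HH24:MI:SS'));
```

A dump-to-XML intermediate lets a tool emit whichever of these the target database expects.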