Ideas for lightweight ORM implementation

Arborint and I are going to be working on a lightweight ORM in the same vein as our Pagination classes. Basically, we want to write something that will fill the void between a Table Data Gateway/Active Record implementation and heavy duty ORMs like Doctrine or Propel. Ideally it would be a layered solution, possibly even built over Skeleton's existing TDG or AR classes. We are using Fowler's ORM patterns as a starting point, but may go in a couple of different directions depending on where the code takes us.

I was wondering if anyone had any suggestions as to what they would like to see in a lightweight ORM. Any input would be very much appreciated.

I have for a while maintained a simple database query library, which you might label a lightweight ORM. It might offer you some inspiration. It doesn't try to map relations and it doesn't try to track identity, which are the two most problematic issues of ORM. I think the moment you venture down that route, it's hard to stop before you have a full-blown ORM.

Last edited by kyberfabrikken; Jun 2, 2009 at 02:37.
Reason: formatting of link

Cool. I've seen that library before, but I'll check it out. We definitely want to map relations and will probably be tracking identity, so if it does end up a full-blown ORM, so be it. We're more committed to providing proper persistence for a rich domain layer than staying lightweight, though we would prefer to keep it simple if at all possible.

Originally Posted by oddz

1.) mapping relations

Definitely.

Originally Posted by oddz

2.) eager loading

Definitely, and lazy loading/batch lazy loading. However, I think I remember your AR solution doing these at runtime... we will probably be setting up default loading styles in our mappings. If you think it's worthwhile to support modifying the defaults at runtime, we'll definitely look into it.
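To make the difference between the loading styles concrete, here is a minimal sketch of a lazy collection whose default style is picked by a mapping setting. The names LazyCollection, buildComments and the 'lazy'/'eager' strings are assumptions made up for illustration, not part of Skeleton or any library mentioned in this thread.

```php
<?php
// Hypothetical lazy collection: runs its loader only on first access.
class LazyCollection
{
    private $loader;
    private $items = null;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function all()
    {
        if ($this->items === null) {
            $this->items = call_user_func($this->loader);
        }
        return $this->items;
    }
}

$queries = 0;
$loadComments = function () use (&$queries) {
    $queries++; // stands in for a SELECT against a comments table
    return ['first comment', 'second comment'];
};

// Hypothetical mapping default: 'lazy' defers the load, 'eager' runs it up front.
function buildComments($style, callable $loader)
{
    $collection = new LazyCollection($loader);
    if ($style === 'eager') {
        $collection->all();
    }
    return $collection;
}

$lazy = buildComments('lazy', $loadComments);
$before = $queries; // nothing has been loaded yet
$lazy->all();
$lazy->all();       // second call reuses the cached items
$after = $queries;
```

With the style fixed in the mapping, calling code never has to say how the comments are loaded; a runtime override would just be a different value passed for $style.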

Originally Posted by oddz

3.) projection elimination

What is this?

Originally Posted by oddz

4.) support for calculated columns defined at run-time (specifically for groups with joins)

Sure, though I don't know about at runtime. That seems like something that should be set up in the mappings, and not something that should change at runtime.

I'm pretty sure you already know about my project, but you may want to check out phpDataMapper, and especially the goals page. It also aims to be a super lightweight ORM that already has support for table relations and a few other nice things. I would love to have some help on developing it further if you and your partner are interested in helping. It's obviously not an active record (AR) like you mentioned wanting to make, but I think if you really look at both patterns, the data mapper pattern is much better and more loosely coupled from the data itself. Let me know.

It's obviously not an active record (AR) like you mentioned wanting to make, but I think if you really look at both patterns, the data mapper pattern is much better and more loosely coupled from the data itself.

Hmm... I hope I didn't say that! I'm not a fan of AR and have no intention of writing an ORM that implements AR

I have three ORMs that do this:
- one very lightweight: no relationships, used more as a DTO/DAO for small projects, 100-200 lines of code. (model only contains the fields of the table)
- one simple one: has all relationships, used as a DTO/DAO, 200-300 lines of code. (model only contains the fields and keys of the table)
- one more complicated: has all relationships, validation rules, caching, etc, used more as a framework, ~1000 lines. (model also contains some validation rules and messages for each field).

Once late static binding becomes available (PHP 5.3.0+), my classes should become much faster.

I appreciate you guys posting your interfaces, but I think that sort of belies the complexity of the code behind the interface. Just for starters, how does the system get from a database table to an object [and back again]? How much flexibility does the developer have in separating the domain and data layers? These are the first of many questions to be asked, and none really have an easy answer.

Yeah, yeah, I see how this ends... I bet I'll spend months on this thing, pour my heart and soul into it, and then come back looking for feedback and get a "It looks too complicated..."

Seriously though... if I had this library done right now, and told you to go check it out, what would it include? What features are a must for an ORM? I think I've got methods that start with find covered...

I posted all the requirements of an ORM.
From the usage point of view, you won't need more than the stuff I posted in my previous post.

See oddz's post; he has the same functionality I do, but presented in a different way.

How you make it work on the inside is not important, that's the beauty of OOP, all you need to know is the INs and OUTs, the stuff in between can change no problem (and usually changes when it gets optimized).

Ps:
- I posted my class line counts to give you an idea of how much code is in there: a few days of work (with the test cases), not months.

I wouldn't consider my solution "lightweight". Maybe medium weight. Just to give you an idea, currently there are 38 classes and 10 interfaces. Some files are only 20 lines while others are 500+. I've been working on my own for about 7 months now, on and off. It's only within the last couple of months that things really began to come together to a point I'm happy with. I probably couldn't begin to discuss in any meaningful manner all the algorithms involved. As time has progressed my system has changed dramatically. Don't necessarily worry about covering everything. Just begin, and use an interface that makes it painless to modify and add to existing functionality.

The ActiveRecord pattern maps object properties to table fields and tables to models. It is an object-relational mapper. Either way, both the Gateway and ActiveRecord patterns provide an object-oriented interface for communicating with the database. One is not superior to the other, in my opinion. They both have their weaknesses and advantages.

The ActiveRecord pattern maps object properties to table fields and tables to models. It is an object-relational mapper. Either way, both the Gateway and ActiveRecord patterns provide an object-oriented interface for communicating with the database.

An AR implementation can be an ORM, but it doesn't necessarily have to be. The core concept in an ORM is mapping between objects and a relational database, and that word inherently implies some differences between the two. From what I can tell from the code you've posted, your object properties map 1:1 to database fields. This is really common, and is how Ruby's AR implementation works as far as I know, but the reason ORMs are so complex is because they need to map one property to two fields, or one property to another object, or two properties to one field, or two objects to one table, etc. And that isn't even getting into loading strategies and identity maps.
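As a rough illustration of one of those cases (two fields feeding a single property that is itself another object), here is a minimal sketch. Money, Product, ProductMapper and the column names price_amount and price_currency are all hypothetical; they only show the kind of translation a 1:1 mapping cannot express.

```php
<?php
// Hypothetical domain classes: nothing here knows about column names.
class Money
{
    public $amount;
    public $currency;

    public function __construct($amount, $currency)
    {
        $this->amount = $amount;
        $this->currency = $currency;
    }
}

class Product
{
    public $id;
    public $name;
    public $price; // one property, holding a Money object

    public function __construct($id, $name, Money $price)
    {
        $this->id = $id;
        $this->name = $name;
        $this->price = $price;
    }
}

// Hypothetical mapper: translates between a flat row and the object graph.
class ProductMapper
{
    // Two columns (price_amount, price_currency) become one property.
    public function toObject(array $row)
    {
        return new Product(
            (int) $row['id'],
            $row['name'],
            new Money((int) $row['price_amount'], $row['price_currency'])
        );
    }

    // The reverse direction splits the property back into two columns.
    public function toRow(Product $product)
    {
        return [
            'id' => $product->id,
            'name' => $product->name,
            'price_amount' => $product->price->amount,
            'price_currency' => $product->price->currency,
        ];
    }
}

$mapper = new ProductMapper();
$row = ['id' => '7', 'name' => 'Widget', 'price_amount' => '1999', 'price_currency' => 'USD'];
$product = $mapper->toObject($row);
$roundTrip = $mapper->toRow($product);
```

The domain classes stay free of column names; only the mapper would change if the table layout did.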

There are really only subtle interface differences between a system that is a true ORM and one that maps 1:1, hence me saying the simplicity of the interface belies the complexity of the code beneath. Heck, you could even wrap an ORM in an AR interface that delegates all loading and saving to the system, that's just changing the interface. But there's a lot more to ORM than an object-oriented interface for working with the database.

but the reason ORMs are so complex is because they need to map one property to two fields, or one property to another object, or two properties to one field, or two objects to one table, etc. And that isn't even getting into loading strategies and identity maps.

I've found that handling that at run-time makes the system more flexible, generic and reusable. An identity map can be something as simple as a primary key. The one requirement in my system is that every table must have a primary key. Without that primary key there isn't a way to identify unique records, which would make projection elimination and recursive saving impossible.

The other constraint that goes along with this is that every table can only have one primary key. Otherwise, resolving associations between models and saving hierarchies would become a mess. Other than that, though, everything else is pretty much an open field.

The identity of a record is then tracked based on its class name and primary key value. This makes it possible to eliminate all repeating data in a result set consisting of any number of joins. Every table that is a part of the join can be related to a model, and every record within each individual table can be uniquely identified by the primary key. Thus, the primary key of any table in a join sequence is always included.
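A minimal sketch of that idea, assuming a joined result set where the project columns repeat for every bid. The IdentityMap class, the Project and Bid models and the column names are hypothetical stand-ins, not the code of the system described above.

```php
<?php
// Hypothetical identity map: one object per (class name, primary key) pair.
class IdentityMap
{
    private $objects = [];

    public function get($class, $id, callable $factory)
    {
        $key = $class . '#' . $id;
        if (!isset($this->objects[$key])) {
            $this->objects[$key] = $factory();
        }
        return $this->objects[$key];
    }
}

class Project { public $id; public $title; public $bids = []; }
class Bid { public $id; public $amount; }

// Rows as a JOIN of project and bid might return them: project columns repeat.
$rows = [
    ['project_id' => 1, 'title' => 'Site redesign', 'bid_id' => 10, 'amount' => 500],
    ['project_id' => 1, 'title' => 'Site redesign', 'bid_id' => 11, 'amount' => 650],
    ['project_id' => 2, 'title' => 'Logo', 'bid_id' => 12, 'amount' => 90],
];

$map = new IdentityMap();
$projects = [];
foreach ($rows as $row) {
    // The same Project instance comes back for every row with the same key.
    $project = $map->get('Project', $row['project_id'], function () use ($row) {
        $p = new Project();
        $p->id = $row['project_id'];
        $p->title = $row['title'];
        return $p;
    });
    $bid = $map->get('Bid', $row['bid_id'], function () use ($row) {
        $b = new Bid();
        $b->id = $row['bid_id'];
        $b->amount = $row['amount'];
        return $b;
    });
    $project->bids[$bid->id] = $bid;
    $projects[$project->id] = $project;
}
```

Three joined rows collapse into two Project objects, the first of which holds two bids. This only works because each table's primary key is present in every row.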

Furthermore, I believe that the usability and flexibility of the system is more important than adhering to any pattern. Patterns exist as a guide, not a solution. I think people care more about flexibility and ease of use than what patterns were used. I know I certainly do. The implementation can be as complex as it needs to be, but the interface through which people will directly use the system needs to be as straightforward as possible. If a rule needs to be broken to simplify the interface then so be it.

I've found that handling that at run-time makes the system more flexible, generic and reusable.

It's flexible, generic and reusable because it makes you do all the work at runtime. If I have to define mappings at runtime I'd rather just code SQL by hand and not have to work with passing a huge array of configuration.

Originally Posted by oddz

Every table that is a part of the join can be related to a model

You're still assuming a 1:1 relationship between table and domain object.

Originally Posted by oddz

Furthermore, I believe that the usability and flexibility of the system is more important than adhering to any pattern. Patterns exist as a guide, not a solution. I think people care more about flexibility and ease of use than what patterns were used. I know I certainly do. The implementation can be as complex as it needs to be, but the interface through which people will directly use the system needs to be as straightforward as possible. If a rule needs to be broken to simplify the interface then so be it.

Your interface is fine. As I said before, there are only subtle differences between yours and a true ORM. It's the implementation that makes the difference. As far as patterns go: I don't care what you call it, your class is not flexible enough nor easy enough to use to map differences between the domain and the database. To me, that's an essential requirement for anything more complex than a generic Table Data Gateway that works with arrays.

Originally Posted by oddz

I'm a huge proponent of OO thinking, but personally I rather do this:

I'm not sure what the second example is representative of, but here's what I would do:

PHP Code:

$userMapper->findById(89);

Originally Posted by oddz

So the common ground between the two is to use arrays and have the system convert those arrays to the proper object. This eliminates an extra step on behalf of the person using the system.

Oh, I think I understand. You're talking about how to configure an object to get what you want. I think both examples are indicative of your preference for setting configuration at runtime. However, a little work when you write the class can pay off: you said yourself IDs were required in your system. In that case, a method like findById($id) makes things easy: you don't have to configure anything, you just pass the necessary data.

Of course, you can take that too far by having a method for every single property in your domain object. Some sort of generic finder method needs to be included that uses either arrays, as yours does, an Object Query Language (OQL), or straight SQL. Personally, I find large nested arrays difficult to work with, and easy to mess up because there's no real interface to use. I've struggled myself to find a good OQL that I like, but to be honest I think I'd prefer writing fragments of SQL for non-standard finders. You can't beat its expressiveness, at least not in PHP.

The problem with hard coding different find methods is that the model file must be changed. By providing an interface to do anything at run-time, the model never needs to be touched. Furthermore, routing everything through one find method supports the DRY principle. How the model finds what it is looking for is best decided at run-time, in my opinion. Otherwise, you end up with a bunch of similar methods that aren't very flexible.
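The two positions are not mutually exclusive: a named finder can be a thin wrapper over the generic one, so the matching logic still lives in a single place. Here is a minimal sketch of that, assuming an in-memory row list in place of a real table. This UserMapper and its find() criteria format are hypothetical and not either poster's actual API.

```php
<?php
// Hypothetical mapper over an in-memory row list, standing in for a real table.
class UserMapper
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Generic finder: every key in $criteria must equal the row's value.
    public function find(array $criteria)
    {
        $matches = [];
        foreach ($this->rows as $row) {
            foreach ($criteria as $field => $value) {
                if (!array_key_exists($field, $row) || $row[$field] !== $value) {
                    continue 2; // this row fails, move on to the next one
                }
            }
            $matches[] = $row;
        }
        return $matches;
    }

    // Named finder: delegates to find(), so no matching logic is repeated.
    public function findById($id)
    {
        $matches = $this->find(['id' => $id]);
        return $matches ? $matches[0] : null;
    }
}

$userMapper = new UserMapper([
    ['id' => 88, 'name' => 'alice', 'active' => true],
    ['id' => 89, 'name' => 'bob', 'active' => true],
    ['id' => 90, 'name' => 'carol', 'active' => false],
]);

$user = $userMapper->findById(89);
$active = $userMapper->find(['active' => true]);
```

Callers that only need a lookup by ID get a one-argument method, while less common queries still go through the generic finder.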

Originally Posted by allspiritseve

Personally, I find large nested arrays difficult to work with, and easy to mess up because there's no real interface to use.

Actually the finder does have an interface. The array is converted by the system though.

PHP Code:

<?php
/**
 * ActiveRecord will only communicate with find config through this interface
 */
interface IActiveRecordFindConfig {

    /*
     * Determines what specific fields to select from the model. If not supplied, all fields
     * are selected. Regardless of whether or not this is specified, the primary key for the
     * model will be selected.
     */
    const findSelect = 'select';

    /*
     * Any "made up" fields you would like to essentially overload into the model. For example,
     * this may be used to add a calculated field that uses fields from various included models.
     * Ie. array('href'=>'Concat('<a href="',Bid.user_id,'">',Project.title,'</a>')')
     * The system will go through and replace the model names with the appropriate aliases if
     * used in this way.
     */
    const findDynamic = 'dynamic';

    /*
     * The main difference between a filter and a condition is that a filter can be transformed
     * and the key name is the column, with one exception: any ( or ) characters are extracted
     * and reapplied, then what is left is used as the column name. This is done to allow easy
     * grouping of conditions. Ie. array('(id'=>array('? OR id=? OR id=?)',9,8,7)).
     * Filters are also magical in the sense that you need not specify a filter key. You may
     * place filters directly in the argument array and anything that isn't a keyword will be
     * assumed to be a filter. For example, array('limit'=>9,'id'=>10): in this instance id will
     * be extracted as a filter because limit is a keyword for the finder mechanism.
     */
    const findFilter = 'filter';

    /*
     * A condition is essentially the same as a filter, but allows precise control over input.
     * A condition uses the keys within as names of the condition. These need not relate to
     * columns in the model though. They are just names which may be referred to in the
     * filterMap. The value of a condition has three keys (0-2): the first is the left side,
     * the second the operator, the third the right side. If an array is used for either key 0
     * or 2, the first key inside that array is embedded and the rest are bound, so you should
     * use ? placeholders to determine where that bound data goes.
     * Ie. 'condition'=>array('myFilter'=>array('Project.created','>=',array('FROM_UNIXTIME(?)','5')))
     * Conditions are not based on belonging to the model in which the argument resides.
     * Therefore, if you have included a blog_comment, instead of specifying a second argument
     * array you may just use a condition and the model will be aliased as appropriate.
     * Ie. array('include'=>'blog_entry','condition'=>array('id'=>array('BlogEntry.id','=',9))).
     */
    const findCondition = 'condition';

    /*
     * The join type for a related table. This is essentially irrelevant for the first
     * table/main model.
     */
    const findJoinType = 'join';

    /*
     * Similar to findJoinType, but this option is less specific and should be a boolean. If the
     * boolean is true and the join type has not been declared, the join type will default to
     * inner. If the boolean is false, the join type will default to left. However, if the
     * joinType has been specified, this option is essentially ignored because the joinType
     * option is more specific.
     */
    const findRequireJoin = 'require';

    /*
     * Allows precise control over how conditions are placed together via name. This option
     * works alongside the condition option by using the names of the conditions and replacing
     * them with the actual condition values. Ie. 'filterMap'=>'({name} OR {name2})'
     * This would look to the conditions, find the conditions with the specified names, replace
     * each name with the appropriate string and use that as the filter. You may also pass an
     * array for this option. The values that follow the first will be bound to the query.
     * Therefore, you would use ? placeholders in the filterMap to specify where the bound
     * data goes.
     */
    const findConditionMap = 'conditionMap';

    const findInvisible = 'cloak';

    // deselects all columns including primary key. This is useful for subqueries where one
    // may only wish to return one column

    public function getInclude();
    public function getLimit();
    public function getOffset();
    public function getSelect();
    public function getNonSelect();
    public function getDynamic();
    public function getCondition();
    public function getConditionMap();
    public function getFilter();
    public function getGroup();
    public function getSort();
    public function getJoinType();
    public function getRequireJoin();
    public function getHaving();
    public function getMagicalFilter();
    public function getInvisible();
    public function getEmpty();
    public function getAssociation();
    public function getAssociationPropertyName();
    public function getAssociationPropertyType();

    public function hasInclude();
    public function hasLimit();
    public function hasOffset();
    public function hasSelect();
    public function hasNonSelect();
    public function hasDynamic();
    public function hasCondition();
    public function hasConditionMap();
    public function hasFilter();
    public function hasGroup();
    public function hasSort();
    public function hasHaving();
    public function hasJoinType();
    public function hasRequireJoin();
    public function hasMagicalFilter();
    public function hasInvisible();
    public function hasEmpty();
    public function hasAssociation();
    public function hasAssociationPropertyName();
    public function hasAssociationPropertyType();
}

/*
 * The methods below belong to the model config interface. Its declaration line was not part
 * of the posted code, so the name IActiveRecordModelConfig is only a placeholder.
 */
interface IActiveRecordModelConfig {

    public function getClassName();
    public function getTable();
    public function getFields();
    public function getPrimaryKey();
    public function getUniqueKeys();
    public function getForeignKeys();
    public function getTransformations();
    public function getDataTypes();
    public function getRequiredFields();
    public function getDefaultValues();
    public function getCascadeDelete();
    public function getLinks();
    public function getHasOne();
    public function getHasMany();
    public function getBelongsTo();
    public function getBelongsToAndHasMany();

    public function hasClassName();
    public function hasTable();
    public function hasFields();
    public function hasPrimaryKey();
    public function hasUniqueKeys();
    public function hasForeignKeys();
    public function hasTransformations();
    public function hasDataTypes();
    public function hasRequiredFields();
    public function hasDefaultValues();
    public function hasCascadeDelete();
    public function hasLinks();
    public function hasOne();
    public function hasMany();
    public function hasBelongsTo();
    public function hasBelongsToAndHasMany();
}

All communication with the outside world happens through those interfaces for find and model configs. Arrays are just easier to manage in terms of practical usage, in my opinion. However, everything has an interface of some sort.

I'm not sure I would call that subtle. I think there's a very big difference between a User that can retrieve itself from the database, which couples the domain layer to the persistence layer, and a domain layer that is unaware of persistence.

Well... I agree

I guess my point was just that the interface isn't really that indicative of the underlying complexity. For instance, I could make a properly separated Mapper look like AR by passing it in the constructor of a domain object and delegating finders and save() to it. You wouldn't know from looking at the interface compared to a standard AR that one contains persistence logic and the other doesn't. Hence the subtle.
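Here is a minimal sketch of what I mean, assuming an in-memory mapper in place of one that talks to a database. UserPersistence, InMemoryUserMapper and this User class are hypothetical names made up for the example.

```php
<?php
// Hypothetical persistence contract; the domain object only sees this interface.
interface UserPersistence
{
    public function findById($id);
    public function save(User $user);
}

// Hypothetical in-memory mapper, standing in for one that talks to a database.
class InMemoryUserMapper implements UserPersistence
{
    private $rows = [];

    public function findById($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        $user = new User($this);
        $user->id = $id;
        $user->name = $this->rows[$id]['name'];
        return $user;
    }

    public function save(User $user)
    {
        if ($user->id === null) {
            $user->id = count($this->rows) + 1; // naive ID assignment for the sketch
        }
        $this->rows[$user->id] = ['name' => $user->name];
    }
}

// Looks like Active Record from the outside, but holds no persistence logic.
class User
{
    public $id;
    public $name;
    private $mapper;

    public function __construct(UserPersistence $mapper)
    {
        $this->mapper = $mapper;
    }

    public function save()
    {
        $this->mapper->save($this); // pure delegation
    }

    public function find($id)
    {
        return $this->mapper->findById($id);
    }
}

$mapper = new InMemoryUserMapper();
$user = new User($mapper);
$user->name = 'alice';
$user->save();
$loaded = $user->find($user->id);
```

From the calling code, $user->save() is indistinguishable from a standard AR call, yet all storage knowledge lives in the mapper, which could be swapped without touching User.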