For now, if you need to emulate the soon-to-be PropelRS, you can just use doSelectRS() and manually hydrate. This is the current (and it should be future-safe) way to iterate over many, many objects in a memory-efficient way.

In the future, the proposed PropelRS will let you use the Iterator interface to achieve lazy hydration.

Alan
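To make the lazy-hydration idea concrete, here is a rough sketch of what an Iterator-based result set could look like. `PropelRS` is only the name proposed in this thread; the `Book` class and the array standing in for the DB result set are illustrative assumptions, not real Propel code.

```php
<?php
// Rough sketch of a lazy-hydrating "PropelRS" (hypothetical name from
// the thread). Rows are hydrated one at a time as the iterator is
// consumed; nothing is cached internally, so the caller holds the only
// reference to each object. An array stands in for the DB result set.
class Book
{
    public $title;
    public function hydrate(array $row) { $this->title = $row[0]; }
}

class PropelRS implements Iterator
{
    private $rows;   // stand-in for the underlying statement/result set
    private $class;
    private $pos = 0;

    public function __construct(array $rows, $class)
    {
        $this->rows  = $rows;
        $this->class = $class;
    }

    public function current()
    {
        // Hydrate on demand: a DISTINCT instance per row, created only
        // when the row is actually reached.
        $obj = new $this->class();
        $obj->hydrate($this->rows[$this->pos]);
        return $obj;
    }

    public function key()    { return $this->pos; }
    public function next()   { $this->pos++; }
    public function rewind() { $this->pos = 0; }
    public function valid()  { return isset($this->rows[$this->pos]); }
}

// Usage: memory stays bounded because no object outlives its iteration
// unless the caller keeps a reference to it.
$rs = new PropelRS(array(array('Dune'), array('Emma')), 'Book');
foreach ($rs as $book) {
    // each $book is a fresh, fully hydrated Book
}
```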

On Oct 4, 2006, at 3:26 PM, Sven Tietje wrote:

> Alan Pinstein wrote:
>> Yeah, I don't get it either.
>>
>> Right now, if you use XXXPeer::doSelect() you get an array with all objects populated. If you plan on using *all* of the objects in the result set, then it doesn't improve performance to hydrate them on-demand; it just changes WHERE the cycles happen, but this doesn't matter, because the overall performance won't change.
>
> yes, it's just one iteration over the resultset that you do later on - that's true! you are absolutely right. sometimes, you have to iterate twice or three times over one resultset (different places of output, logical iterator). creating them on the first iteration will save one foreach - don't laugh!!! :-)
>
> sometimes, you iterate over a resultset and set values. later on in your application, you iterate again over the resultset and want to do something and to save your objects. caching is needed to stay compatible with older versions:
>
> $rs = XXXXPeer::doSelect();
>
> $rs[0]->setName('test'); // object#1
> $rs[1] // object#2
> ... your application ...
>
> $rs[0]->getName(); // object#1
> $rs[0]->setFirstName('test2'); should always work on the same object instance.
>
> but sometimes, you have to iterate only once - then caching should / can be disabled:
>
> $rs = XXXXPeer::doSelect(Criteria, Propel::NO_CACHING);
> $rs[0] // object #1
>
> later on again would result in:
>
> $rs[0] // object #2
>
> i just want to have one resultset having the functions of the actual array propel gives to me, but also the possibility to do the "real iterate way" to save memory. for the last variant my idea was to use FETCH_INTO, but alan convinced me: "the only overhead is __construct" (when you can really call it overhead :-))
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@propel.tigris.org
> For additional commands, e-mail: dev-help at propel dot tigris dot org

Alan Pinstein wrote:
> Yeah, I don't get it either.
>
> Right now, if you use XXXPeer::doSelect() you get an array with all objects populated. If you plan on using *all* of the objects in the result set, then it doesn't improve performance to hydrate them on-demand; it just changes WHERE the cycles happen, but this doesn't matter, because the overall performance won't change.

yes, it's just one iteration over the resultset that you do later on - that's true! you are absolutely right. sometimes, you have to iterate twice or three times over one resultset (different places of output, logical iterator). creating them on the first iteration will save one foreach - don't laugh!!! :-)

sometimes, you iterate over a resultset and set values. later on in your application, you iterate again over the resultset and want to do something and to save your objects. caching is needed to stay compatible with older versions:

$rs = XXXXPeer::doSelect();

$rs[0]->setName('test'); // object#1
$rs[1] // object#2
... your application ...

$rs[0]->getName(); // object#1
$rs[0]->setFirstName('test2');

should always work on the same object instance.

but sometimes, you have to iterate only once - then caching should / can be disabled:

$rs = XXXXPeer::doSelect(Criteria, Propel::NO_CACHING);
$rs[0] // object #1

later on again would result in:

$rs[0] // object #2

i just want to have one resultset having the functions of the actual array propel gives to me, but also the possibility to do the "real iterate way" to save memory. for the last variant my idea was to use FETCH_INTO, but alan convinced me: "the only overhead is __construct" (when you can really call it overhead :-))

Right now, if you use XXXPeer::doSelect() you get an array with all objects populated. If you plan on using *all* of the objects in the result set, then it doesn't improve performance to hydrate them on-demand; it just changes WHERE the cycles happen, but this doesn't matter, because the overall performance won't change.

The result of doSelect() is iterable, array-accessible, and isn't re-hydrated on repeat access. So I *still* don't understand what you are trying to achieve...

> the second variant is a non-caching variant -> not use much memory -> e.g. overview, only output etc...
>
> of course, i could use an array, but i like oo-style for fetching data. because of this, i proposed to use FETCH_INTO - we can do it another way. $object->getRelatedObject() is queried on the fly, too. afterwards, data is gone.

This is not really true. While getRelatedObject() is on the fly, it's only the FIRST TIME! So if you getRelatedObject() on the first instance, then FETCH_INTO another row on top of this object, your "new" object is still related to the one from the previous row. OOPS! Have fun debugging that...

Alan
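The failure mode Alan describes can be sketched without PDO at all. Everything here is made up for illustration (class names, columns, the manual `hydrate()`); the point is only that reusing one object shell keeps the relation cached from the previous row.

```php
<?php
// Sketch of the stale-relation trap with a reused object shell.
// Row 2 is fetched "into" the same instance as row 1, but the related
// object looked up during row 1 is still attached afterwards.
class Author
{
    public $name;
    public function __construct($name) { $this->name = $name; }
}

class Record
{
    public $title;
    private $author;   // lazily fetched relation, cached on first access

    public function hydrate(array $row) { $this->title = $row['title']; }

    public function getAuthor(array $row)
    {
        // "on the fly" -- but only the FIRST time
        if ($this->author === null) {
            $this->author = new Author($row['author']);
        }
        return $this->author;
    }
}

$rows = array(
    array('title' => 'Dune', 'author' => 'Herbert'),
    array('title' => 'Emma', 'author' => 'Austen'),
);

$rec = new Record();                   // the single reused shell
$rec->hydrate($rows[0]);
$first = $rec->getAuthor($rows[0]);    // Herbert, now cached

$rec->hydrate($rows[1]);               // FETCH_INTO-style reuse
$second = $rec->getAuthor($rows[1]);   // still Herbert -- OOPS
```

With distinct instances per row, the cached relation is harmless; with a reused shell, it silently lies.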

On Oct 4, 2006, at 2:20 PM, David Zülke wrote:

> ???
>
> Isn't the first method nonsense? It's not any different from what we're doing right now.
>
> I think you're focusing too much on the performance of hydration. I'll try the direct PDO hydration idea later. I bet this will be incredibly fast. The only point where caching results makes sense is when we want to avoid querying the database again - that's the slow part.
>
> foreach ($result as $row) {
>     // each $row is hydrated on demand and NOT stored anywhere. under
>     // ideal circumstances, this would mean that memory use never exceeds
>     // the amount of memory needed for the largest row in the result set
> }
>
> This is what iterators would be about!
>
> David
>
> On 04.10.2006 at 20:11, Sven Tietje wrote:
>
>> hi alan,
>>
>> don't want any war with you - i'm a friend of clean oo design, too. perhaps i didn't make myself clear. of course, i prefer a single instance of each object.
>>
>> i'd like to have PropelResultset:
>>
>> class PropelResultset implements IteratorAggregate, ArrayAccess {
>> }
>>
>> foreach ($resultset as $row) {
>>     private $cache = array(
>>         0 => BaseObject,
>>         1 => BaseObject,
>>         .....
>>     )
>>
>>     // the objects are generated on the fly and on demand.
>>     // the objects are cached
>> }
>>
>> fetching a special element afterwards will not generate a new instance again - it will return the object generated during the first iteration.
>> $blah = $resultset[0] => [0] is already generated - you'll get the cached object back.
>>
>> or you iterate the resultset again => it's not necessary to hydrate the objects again - it just iterates over the internal cached array containing the object instances.
>>
>> i think that variant is the normal variant and it's compatible with all our applications.
>>
>> the second variant is a non-caching variant -> not use much memory -> e.g. overview, only output etc...
>> of course, i could use an array, but i like oo-style for fetching data. because of this, i proposed to use FETCH_INTO - we can do it another way. $object->getRelatedObject() is queried on the fly, too. afterwards, data is gone.
>>
>> the first variant should still be the preferred and default variant of handling data.
>>
>> greets sven
>>
>> Alan Pinstein wrote:
>>> Why? Because re-using objects is:
>>>
>>> 1. BAD OO design... one instance == 1 LOGICAL instance. If you re-use it, you are pretty much breaking black-box. "Clients" of your objects don't expect this behavior, because it's bad OO design.
>>>
>>> What if in the course of using FETCH_INTO, you have some relationships:
>>>
>>> foreach (FETCH_INTO loop) {
>>>     $myObj->setRelatedObject($Y);
>>> }
>>>
>>> Well this is going to end up an awful mess, with no referential integrity in your object model.
>>>
>>> 2. There is no meaningful benefit to FETCH_INTO. Seriously, name one potential benefit of doing this. Are you thinking performance? Lots of things in OO would be faster if you break the OO... but it shouldn't be done.
>>>
>>> 3. With your example:
>>>
>>>> Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:
>>>
>>> Ok great! That's a completely reasonable thing to want to do. HOWEVER, I don't see why getting back DISTINCT instances for each iteration prevents this from happening. It doesn't use more memory and it isn't much slower (only real speed difference is calling the constructor).
>>>
>>> However, I can promise you that getting back the SAME instance each time, but it having different data for *some* columns, will definitely cause you hours of painful debugging, followed by "Why am I using FETCH_INTO"?

Isn't the first method nonsense? It's not any different from what we're doing right now.

I think you're focusing too much on the performance of hydration. I'll try the direct PDO hydration idea later. I bet this will be incredibly fast. The only point where caching results makes sense is when we want to avoid querying the database again - that's the slow part.

foreach ($result as $row) {
    // each $row is hydrated on demand and NOT stored anywhere. under
    // ideal circumstances, this would mean that memory use never exceeds
    // the amount of memory needed for the largest row in the result set
}

This is what iterators would be about!

David

On 04.10.2006 at 20:11, Sven Tietje wrote:

> hi alan,
>
> don't want any war with you - i'm a friend of clean oo design, too. perhaps i didn't make myself clear. of course, i prefer a single instance of each object.
>
> i'd like to have PropelResultset:
>
> class PropelResultset implements IteratorAggregate, ArrayAccess {
> }
>
> foreach ($resultset as $row) {
>     private $cache = array(
>         0 => BaseObject,
>         1 => BaseObject,
>         .....
>     )
>
>     // the objects are generated on the fly and on demand.
>     // the objects are cached
> }
>
> fetching a special element afterwards will not generate a new instance again - it will return the object generated during the first iteration.
> $blah = $resultset[0] => [0] is already generated - you'll get the cached object back.
>
> or you iterate the resultset again => it's not necessary to hydrate the objects again - it just iterates over the internal cached array containing the object instances.
>
> i think that variant is the normal variant and it's compatible with all our applications.
>
> the second variant is a non-caching variant -> not use much memory -> e.g. overview, only output etc...
> of course, i could use an array, but i like oo-style for fetching data. because of this, i proposed to use FETCH_INTO - we can do it another way. $object->getRelatedObject() is queried on the fly, too. afterwards, data is gone.
>
> the first variant should still be the preferred and default variant of handling data.
>
> greets sven
>
> Alan Pinstein wrote:
>> Why? Because re-using objects is:
>>
>> 1. BAD OO design... one instance == 1 LOGICAL instance. If you re-use it, you are pretty much breaking black-box. "Clients" of your objects don't expect this behavior, because it's bad OO design.
>>
>> What if in the course of using FETCH_INTO, you have some relationships:
>>
>> foreach (FETCH_INTO loop) {
>>     $myObj->setRelatedObject($Y);
>> }
>>
>> Well this is going to end up an awful mess, with no referential integrity in your object model.
>>
>> 2. There is no meaningful benefit to FETCH_INTO. Seriously, name one potential benefit of doing this. Are you thinking performance? Lots of things in OO would be faster if you break the OO... but it shouldn't be done.
>>
>> 3. With your example:
>>
>>> Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:
>>
>> Ok great! That's a completely reasonable thing to want to do. HOWEVER, I don't see why getting back DISTINCT instances for each iteration prevents this from happening. It doesn't use more memory and it isn't much slower (only real speed difference is calling the constructor).
>>
>> However, I can promise you that getting back the SAME instance each time, but it having different data for *some* columns, will definitely cause you hours of painful debugging, followed by "Why am I using FETCH_INTO"?

class PropelResultset implements IteratorAggregate, ArrayAccess {
}

foreach ($resultset as $row) {
    private $cache = array(
        0 => BaseObject,
        1 => BaseObject,
        .....
    )

    // the objects are generated on the fly and on demand.
    // the objects are cached
}

fetching a special element afterwards will not generate a new instance again - it will return the object generated during the first iteration.

$blah = $resultset[0] => [0] is already generated - you'll get the cached object back.

or you iterate the resultset again => it's not necessary to hydrate the objects again - it just iterates over the internal cached array containing the object instances.

i think that variant is the normal variant and it's compatible with all our applications.

the second variant is a non-caching variant -> not use much memory -> e.g. overview, only output etc...

of course, i could use an array, but i like oo-style for fetching data. because of this, i proposed to use FETCH_INTO - we can do it another way. $object->getRelatedObject() is queried on the fly, too. afterwards, data is gone.

the first variant should still be the preferred and default variant of handling data.

greets sven

Alan Pinstein wrote:
> Why? Because re-using objects is:
>
> 1. BAD OO design... one instance == 1 LOGICAL instance. If you re-use it, you are pretty much breaking black-box. "Clients" of your objects don't expect this behavior, because it's bad OO design.
>
> What if in the course of using FETCH_INTO, you have some relationships:
>
> foreach (FETCH_INTO loop) {
>     $myObj->setRelatedObject($Y);
> }
>
> Well this is going to end up an awful mess, with no referential integrity in your object model.
>
> 2. There is no meaningful benefit to FETCH_INTO. Seriously, name one potential benefit of doing this. Are you thinking performance? Lots of things in OO would be faster if you break the OO... but it shouldn't be done.
>
> 3. With your example:
>
>> Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:
>
> Ok great! That's a completely reasonable thing to want to do. HOWEVER, I don't see why getting back DISTINCT instances for each iteration prevents this from happening. It doesn't use more memory and it isn't much slower (only real speed difference is calling the constructor).
>
> However, I can promise you that getting back the SAME instance each time, but it having different data for *some* columns, will definitely cause you hours of painful debugging, followed by "Why am I using FETCH_INTO"?

I agree with this. The fact that we'll be using iterators will (hopefully) eliminate any memory issues, since the objects are returned on the fly and not stored anywhere first.

David

On 04.10.2006 at 17:25, Alan Pinstein wrote:

>> Why get an instance of each object?
>
> Why? Because re-using objects is:
>
> 1. BAD OO design... one instance == 1 LOGICAL instance. If you re-use it, you are pretty much breaking black-box. "Clients" of your objects don't expect this behavior, because it's bad OO design.
>
> What if in the course of using FETCH_INTO, you have some relationships:
>
> foreach (FETCH_INTO loop) {
>     $myObj->setRelatedObject($Y);
> }
>
> Well this is going to end up an awful mess, with no referential integrity in your object model.
>
> 2. There is no meaningful benefit to FETCH_INTO. Seriously, name one potential benefit of doing this. Are you thinking performance? Lots of things in OO would be faster if you break the OO... but it shouldn't be done.
>
> 3. With your example:
>
>> Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:
>
> Ok great! That's a completely reasonable thing to want to do. HOWEVER, I don't see why getting back DISTINCT instances for each iteration prevents this from happening. It doesn't use more memory and it isn't much slower (only real speed difference is calling the constructor).
>
> However, I can promise you that getting back the SAME instance each time, but it having different data for *some* columns, will definitely cause you hours of painful debugging, followed by "Why am I using FETCH_INTO"?
>
> ---------
>
> Just b/c PHP does something, doesn't mean that it's a good idea, or that every project using it should "wrap" that functionality.
>
> I'd be happy to continue this conversation, IFF you can provide an example of something that FETCH_INTO provides that you can't otherwise do.
>
> Alan

What if in the course of using FETCH_INTO, you have some relationships:

foreach (FETCH_INTO loop) {
    $myObj->setRelatedObject($Y);
}

Well this is going to end up an awful mess, with no referential integrity in your object model.

2. There is no meaningful benefit to FETCH_INTO. Seriously, name one potential benefit of doing this. Are you thinking performance? Lots of things in OO would be faster if you break the OO... but it shouldn't be done.

3. With your example:

> Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:

Ok great! That's a completely reasonable thing to want to do. HOWEVER, I don't see why getting back DISTINCT instances for each iteration prevents this from happening. It doesn't use more memory and it isn't much slower (only real speed difference is calling the constructor).

However, I can promise you that getting back the SAME instance each time, but it having different data for *some* columns, will definitely cause you hours of painful debugging, followed by "Why am I using FETCH_INTO"?

---------

Just b/c PHP does something, doesn't mean that it's a good idea, or that every project using it should "wrap" that functionality.

I'd be happy to continue this conversation, IFF you can provide an example of something that FETCH_INTO provides that you can't otherwise do.

*Peer::doSelectRs() (called doSelectStmt(), I believe, in 1.3) operates below the level at which the objects are cached -- so it bypasses the caching.

Hans

Alan Pinstein wrote:
> Do you mean that if we use "PropelRS" or hydrate manually, this will skip the identity-map process and thus preserve "expected" GC for "unbounded" mode applications?
>
> If so, +1.
>
> :)
>
> On Oct 4, 2006, at 11:09 AM, Hans Lellelid wrote:
>
>> Obviously if we use Alan's approach to large object sets (which is also what I do), this object caching will work fine (since no cache would be created).

> 1) I just realized how much overhead we have when fetching results:
>
> while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
>     $obj = new $cls();
>     $obj->hydrate($row);
>     $results[] = $obj;
> }
>
> We can use PDO::FETCH_CLASS here. PDO will then create a new "Book" instance and set the fields. We can even pass constructor arguments, very schweet. I think I'll test this later. This is a non-breaking change for 1.3 that should make things a bit faster.

Hydrating is definitely slow, esp. b/c of creole's incessant type checking and the array_key_exists issue (which I fixed for PGSQL, but would still be very slow for the other drivers).

Certainly a C implementation of this would be a great improvement. Seems that you can also specify the class to be created. My only concern would be making sure that the constructor is properly called, but I'd assume it would be.

This looks great.
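For reference, the FETCH_CLASS fetch loop being discussed could look roughly like this, with an in-memory SQLite table standing in for a generated peer query (the `Book` class and schema are made up for the sketch). One detail relevant to Alan's constructor concern: by default FETCH_CLASS assigns the column values BEFORE the constructor runs; PDO::FETCH_PROPS_LATE flips that if the constructor must run first.

```php
<?php
// Sketch of the proposed PDO::FETCH_CLASS fetch loop. Each fetched row
// becomes a distinct, populated Book instance -- no hydrate() call.
class Book
{
    public $title;
}

$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE book (title TEXT)');
$pdo->exec("INSERT INTO book (title) VALUES ('Dune'), ('Emma')");

$stmt = $pdo->query('SELECT title FROM book ORDER BY title');
$stmt->setFetchMode(PDO::FETCH_CLASS, 'Book');

$results = array();
while ($book = $stmt->fetch()) {
    $results[] = $book;   // a distinct Book per row
}
```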

> 2) Regarding this iterator thing. I guess the best idea would be to use PDO to populate the results, basically as above. I don't think we need FETCH_INTO, though. Consider the following situation:
>
> foreach (BookPeer::doSelect($c) as $book) {
>     echo $book->getTitle();
> }
>
> With an iterator, each book object would be created and returned on the fly. I realize that there's no way to force the GC to kick in, but after some rows, it will discard the "old" objects still in memory (i.e. those whose titles have been echoed already).

Actually, I think that GC in php is not an out-of-band process... from my testing, as soon as the last reference to an object goes out of scope, it gets GC'd. PHP has no threads, so without an "inline" GC, GC would basically be pointless. It works very differently from Java GC.

All the iterator would do is simply defer hydration until the row is accessed. Also, in this case we wouldn't keep an internal array of objects; otherwise the individual rows wouldn't be GC'd due to the internal reference.
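The refcounting behavior Alan describes can be observed directly with a destructor. This is a toy illustration, not Propel code: the destructor fires the instant the last reference to the previous row's object is dropped, i.e. inline, not in a later out-of-band pass.

```php
<?php
// Toy illustration of inline, refcount-based GC in PHP: overwriting
// the loop variable drops the last reference to the previous Row,
// so Row 1 dies while Row 2 is being assigned, and so on.
$destroyed = array();

class Row
{
    public $id;
    public function __construct($id) { $this->id = $id; }
    public function __destruct()
    {
        global $destroyed;
        $destroyed[] = $this->id;   // record the moment of destruction
    }
}

foreach (array(1, 2, 3) as $id) {
    $row = new Row($id);
}
// At this point rows 1 and 2 are already gone; only row 3 (still
// referenced by $row) is alive.
```

This is why a lazy iterator that keeps no internal array really does bound memory to one row's object at a time.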

The use case should be:

1. If you really want an array of all results, use doSelect() and get the completely populated array (fine for known small data sets).

2. If you are fetching a *huge* data set and want to be thrifty with memory, use a new doSelectPropelRS, and use the iterator interface, which lazy-hydrates. I suppose we could add the ArrayAccess interface, too, if the underlying RS supports seeking. Otherwise you'd have to iterate it only. You would know that PropelRS does not keep an internal list of the results, thus being memory-thrifty (or maybe we make it an option to the PropelRS constructor?).
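The two use cases above could look roughly like this at the call site. Everything here is a stub standing in for generated Propel code: `BookPeer`, `doSelectPropelRS` and the omitted Criteria argument are hypothetical, and a modern generator stands in for the proposed Iterator-based PropelRS.

```php
<?php
// Sketch of the two camps: bounded sets get a fully populated array,
// unbounded sets get a lazy stream that keeps no internal list.
class Book
{
    private $title;
    public function __construct($title = null) { $this->title = $title; }
    public function getTitle() { return $this->title; }
}

class BookPeer
{
    private static $rows = array('Dune', 'Emma');

    // Camp 1 (bounded): return the completely populated array.
    public static function doSelect()
    {
        $out = array();
        foreach (self::$rows as $t) {
            $out[] = new Book($t);
        }
        return $out;
    }

    // Camp 2 (unbounded): hand rows out lazily; no internal list is
    // kept, so at most one Book per row is alive at a time.
    public static function doSelectPropelRS()
    {
        foreach (self::$rows as $t) {
            yield new Book($t);
        }
    }
}

// Bounded: fine for known small data sets.
$books = BookPeer::doSelect();

// Unbounded: constant memory, hydrate-as-you-go.
foreach (BookPeer::doSelectPropelRS() as $book) {
    $book->getTitle();   // work with one row at a time
}
```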

Something along those lines... we'd need to get more use cases from users before deciding for sure.

If you are coding efficiently in general, these are basically the 2 camps that all DB access falls into; unbounded, and bounded. Pretty much all web pages will be bounded (using some kind of pagination system) and thus work fine as-is (although faster hydration due to PDO will be nice). Usually it's scripts that operate in unbounded mode, and usually in this case you are only working with 1 row at a time anyway. Any other need is definitely application-specific, and in this case you can code your own or make sure you have enough memory to do what you need to.

> I don't really see a need for FETCH_INTO there!?

Me neither!

Alan

> David
>
> On 04.10.2006 at 11:56, Sven Tietje wrote:
>
>> David Zulke wrote:
>>> 1) memory consumption is a lot lower, e.g. when iterating over 1000 rows to display them, a row is fetched, the data is displayed, and the GC can then go ahead and destroy the object.
>>
>> Think, that should be an option. Sometimes, i want to iterate over a resultset and write baseobjects with special information into an array or something like that. PDO handles it the same way - the fetch mode gives you special options for fetching as an object:
>>
>> http://de.php.net/manual/de/function.pdostatement-setfetchmode.php
>>
>> Using PDO::FETCH_CLASS will result in an object instance for each row: 200 rows in your resultset will give you 200 object instances.
>>
>> PDO::FETCH_INTO will give you the same object instance for each row: 200 rows, but one object instance.
>>
>> Having the same feature / option for propel would be nice:
>>
>> 1. One instance for all rows
>> $resultset = TablePeer::doSelect(new Criteria(), PROPEL::ONE_OBJECT);
>>
>> foreach ($resultset as $row) {
>>     echo $row; // will give you the same object instance
>> }
>>
>> I would use PROPEL::ONE_OBJECT to publish big resultsets. I don't manipulate the object. With $myrow = clone $row it is possible to get my own instance of the row.
>>
>> 2. An instance for each row
>> $resultset = TablePeer::doSelect(new Criteria(), PROPEL::EACH_OBJECT);
>> foreach ($resultset as $row) {
>>     echo $row; // will result in Object#1, Object#2 ..... Object#200
>> }
>>
>> Internally Propel uses clone() to create a single instance of each row.
>>
>>> 2) it opens the door for a potential CategoryPeer::doSelectJoinProducts() support, where fetching 1:n relations with a join would become possible.
>>
>> We should discuss using
>> PDOStatement::setFetchMode ( int PDO::FETCH_CLASS, string classname ) or
>> PDOStatement::setFetchMode ( int PDO::FETCH_INTO, object object )
>> instead of hydrate();
>>
>> I wrote a little script to test it:
>>
>> $stmt = $pdo->prepare('select name as `customer.name` from customer');
>> $stmt->execute(array());
>>
>> class baseobject /*implements ArrayAccess*/ {
>>     private $data = array(
>>         'customer.name' => null,
>>     );
>>
>>     // for hydrate
>>     public function __set($key, $value) {
>>         $this->data[$key] = $value;
>>     }
>>
>>     public function setName($value) {
>>         $this->data['customer.name'] = $value;
>>     }
>>
>>     public function getName() {
>>         if (isset($this->data['customer.name'])) {
>>             return $this->data['customer.name'];
>>         }
>>     }
>> }
>>
>> $blub = new baseobject;
>> $stmt->setFetchMode(PDO::FETCH_INTO, $blub);
>>
>> $arg = array();
>> while ($b = $stmt->fetch()) {
>>     // $b->setName('blah');
>>     var_dump($b->getName());
>> }
>>
>> Implementing ArrayAccess in a BaseObject could be a possibility, too. We should discuss it.
>>
>> Your opinion? I think using the power of PDO would be nice.

David Zulke wrote:
> My thoughts here:
>
> 1) I just realized how much overhead we have when fetching results:
>
> while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
>     $obj = new $cls();
>     $obj->hydrate($row);
>     $results[] = $obj;
> }
>
> We can use PDO::FETCH_CLASS here. PDO will then create a new "Book" instance and set the fields. We can even pass constructor arguments, very schweet. I think I'll test this later. This is a non-breaking change for 1.3 that should make things a bit faster.

Yes, I'd like to. Exactly what I thought of...

> I don't really see a need for FETCH_INTO there!?

Just an option - i have some pages with a lot of output and overviews. I could use arrays, but losing some methods as described in my Person example would be a mess to me :D

Hans Lellelid wrote:
> David Zülke wrote:
>>> Isn't Sven saying that FETCH_INTO will keep memory/resource use to a minimum? (Unlike current situation where you would have a new object allocated for each row returned.)
>>
>> Right now, this is the case, but with iterators, the GC would kick in sooner or later and free the memory. We'd need two fetch modes then, one for iterators (default) and another for the oldschool array approach. Of course, we could always add FETCH_INTO support should we discover that PHP sucks and doesn't free the previously allocated memory (but come on, what are the odds of PHP sucking, really? harhar)
>
> Ok, fair enough. One thing to bear in mind is that in 1.3 & trunk we are also implementing the identity mapping for objects. I think this code is already in 1.3 -- and if not, it will be soon.
>
> The purpose of that, of course, is that we'd like retrieveByPK()/doSelect() to always return the same object instances if they've already been fetched. I'm assuming, though, that this will effectively prevent any GC of those objects. And I'm not sure what the FETCH_INTO implications would be ... but that might cause some crazy problems :)

It looks like this code has not been applied yet to the 1.3 branch. What the patch will do is modify the populateObjects() method to first check for a previously cached version of the object.

Obviously if we use Alan's approach to large object sets (which is also what I do), this object caching will work fine (since no cache would be created).

This was something that has been requested several times -- and I think it makes sense -- so assuming there aren't any problems with this, I can try to add that code in later today.
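The populateObjects() change Hans describes amounts to an identity map: a per-class instance pool keyed by primary key, so fetching the same row twice hands back the SAME instance. A minimal sketch, with illustrative names (this is not the actual 1.3 patch):

```php
<?php
// Sketch of an identity-map check inside populateObjects(): the pool
// is the single source of truth for already-hydrated primary keys.
class Book
{
    public $id;
    public $title;

    public function hydrate(array $row)
    {
        $this->id    = $row['id'];
        $this->title = $row['title'];
    }
}

class BookPeer
{
    private static $instancePool = array();

    public static function populateObjects(array $rows)
    {
        $results = array();
        foreach ($rows as $row) {
            $key = $row['id'];
            if (!isset(self::$instancePool[$key])) {
                // First sighting of this PK: hydrate and cache.
                $obj = new Book();
                $obj->hydrate($row);
                self::$instancePool[$key] = $obj;
            }
            $results[] = self::$instancePool[$key];
        }
        return $results;
    }
}
```

Note the trade-off raised in this thread: pooled instances are referenced by the pool and so never garbage collected, which is exactly why a raw doSelectRS()/manual-hydration path that bypasses the pool matters for unbounded jobs.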

Hans Lellelid wrote:
> Isn't Sven saying that FETCH_INTO will keep memory/resource use to a minimum? (Unlike current situation where you would have a new object allocated for each row returned.)

Yes, I just would need the option of one single instance for the output of data. Doing it with hydrate or FETCH_INTO -> that doesn't matter.

Example:

class BasePerson {
    public function getName() {
    }

    public function getFirstname() {
    }
}

Now, i wanna publish a List of Person with their fullnames. I don't want to change data or something -> just want to print a table containing the fullname and some additional columns of information. I'll implement a method getFullname in my Object-Class:

David Zülke wrote:
>> Isn't Sven saying that FETCH_INTO will keep memory/resource use to a minimum? (Unlike current situation where you would have a new object allocated for each row returned.)
>
> Right now, this is the case, but with iterators, the GC would kick in sooner or later and free the memory. We'd need two fetch modes then, one for iterators (default) and another for the oldschool array approach. Of course, we could always add FETCH_INTO support should we discover that PHP sucks and doesn't free the previously allocated memory (but come on, what are the odds of PHP sucking, really? harhar)

Ok, fair enough. One thing to bear in mind is that in 1.3 & trunk we are also implementing the identity mapping for objects. I think this code is already in 1.3 -- and if not, it will be soon.

The purpose of that, of course, is that we'd like retrieveByPK()/doSelect() to always return the same object instances if they've already been fetched. I'm assuming, though, that this will effectively prevent any GC of those objects. And I'm not sure what the FETCH_INTO implications would be ... but that might cause some crazy problems :)

> Isn't Sven saying that FETCH_INTO will keep memory/resource use to a minimum? (Unlike current situation where you would have a new object allocated for each row returned.)

Right now, this is the case, but with iterators, the GC would kick in sooner or later and free the memory. We'd need two fetch modes then, one for iterators (default) and another for the oldschool array approach. Of course, we could always add FETCH_INTO support should we discover that PHP sucks and doesn't free the previously allocated memory (but come on, what are the odds of PHP sucking, really? harhar)

Well, I don't know... but I said that the way propel works now, if you "lazy-hydrate", then even though you return N objects, they are garbage collected properly and thus you only have 1 allocated at a time; it's just a DIFFERENT instance.

I had horrible problems with unbounded memory consumption, even with self-hydrating, because of the way the related items were stored. But since you fixed that, I no longer have the problem and thus a new "PropelResultSet" that had an iterator interface to automatically lazy-hydrate would be a perfect and appropriate improvement.

Alan

On Oct 4, 2006, at 10:56 AM, Hans Lellelid wrote:

> Isn't Sven saying that FETCH_INTO will keep memory/resource use to a
> minimum? (Unlike current situation where you would have a new object
> allocated for each row returned.)
>
> Hans
>
> [...]

We can use PDO::FETCH_CLASS here. PDO will then create a new "Book" instance and set the fields. We can even pass constructor arguments, very schweet. I think I'll test this later. This is a non-breaking change for 1.3 that should make things a bit faster.
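As a rough sketch of the idea (the book table and Book class here are made up, and this is plain PDO, not Propel code):

```php
<?php
// PDO::FETCH_CLASS: PDO instantiates the class itself, assigns the
// selected columns to properties, and can pass constructor args.
class Book
{
    public $id;
    public $title;
    public $source;

    public function __construct($source = 'db')
    {
        // Note: with plain FETCH_CLASS the constructor runs *after*
        // the column properties are assigned; combine the mode with
        // PDO::FETCH_PROPS_LATE to reverse that order.
        $this->source = $source;
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE book (id INTEGER, title TEXT)');
$pdo->exec("INSERT INTO book VALUES (1, 'Don Quixote')");

$stmt = $pdo->query('SELECT id, title FROM book');
$stmt->setFetchMode(PDO::FETCH_CLASS, 'Book', array('db'));

while ($book = $stmt->fetch()) {
    echo $book->title, "\n"; // each row is a fresh Book instance
}
```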

2) Regarding this iterator thing. I guess the best idea would be to use PDO to populate the results, basically as above. I don't think we need FETCH_INTO, though. Consider the following situation:

foreach (BookPeer::doSelect($c) as $book) {
    echo $book->getTitle();
}

With an iterator, each book object would be created and returned on the fly. I realize that there's no way to force the GC to kick in, but after some rows, it will discard the "old" objects still in memory (i.e. those whose titles have been echoed already).

I don't really see a need for FETCH_INTO there!?
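The iterator behavior described above could be sketched roughly like this (the class name and hydration details are invented; this is not the eventual Propel API):

```php
<?php
// Hypothetical lazy-hydrating iterator: one object is built per step
// of the foreach loop, so all N rows are never resident at once --
// each previous object becomes collectable as soon as the loop body
// drops its reference.
class LazyRowIterator implements Iterator
{
    private $stmt;   // PDOStatement (forward-only: rewind() works once)
    private $class;  // e.g. 'Book'
    private $row = false;
    private $key = 0;

    public function __construct(PDOStatement $stmt, $class)
    {
        $this->stmt  = $stmt;
        $this->class = $class;
    }

    public function rewind()
    {
        $this->key = 0;
        $this->row = $this->stmt->fetch(PDO::FETCH_ASSOC);
    }

    public function valid()
    {
        return $this->row !== false;
    }

    public function key()
    {
        return $this->key;
    }

    public function next()
    {
        $this->key++;
        $this->row = $this->stmt->fetch(PDO::FETCH_ASSOC);
    }

    public function current()
    {
        // Hydrate exactly one object for the current row.
        $class = $this->class;
        $obj = new $class;
        foreach ($this->row as $col => $val) {
            $obj->$col = $val;
        }
        return $obj;
    }
}
```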

David

On 04.10.2006, at 11:56, Sven Tietje wrote:

> David Zulke wrote:
>> 1) memory consumption is a lot lower, e.g. when iterating over 1000
>> rows to display them, a row is fetched, the data is displayed, and
>> the GC can then go ahead and destroy the object.
>
> Think, that should be an option. Sometimes, i want to iterate over a
> resultset and to write baseobjects with special information into an
> array or something like that. PDO handles it the same way - Fetchmode
> gives you special options for fetching as object:
>
> http://de.php.net/manual/de/function.pdostatement-setfetchmode.php
>
> Using PDO::FETCH_CLASS will result in an object instance for each
> row: 200 rows in your resultset will give you 200 object instances.
>
> PDO::FETCH_INTO will give you the same object instance for each
> row: 200 rows, but one object instance.
>
> [...]

Isn't Sven saying that FETCH_INTO will keep memory/resource use to a minimum? (Unlike current situation where you would have a new object allocated for each row returned.)

Hans

Alan Pinstein wrote:
> But you still didn't answer my question? What is the benefit or
> appropriate use of FETCH_INTO that otherwise wouldn't work? Why is it
> needed at all?
>
> [...]

But you still didn't answer my question? What is the benefit or appropriate use of FETCH_INTO that otherwise wouldn't work? Why is it needed at all?

Alan

On Oct 4, 2006, at 10:48 AM, Sven Tietje wrote:

> Alan Pinstein wrote:
>> Eeeew. FETCH_INTO sounds awful. [...] Why is it needed?
>
> hi alan,
>
> Ok... You are not a friend of FETCH_INTO - just mentioned it for
> reading and as possible option. If you know what it means: use it,
> where you need.
>
> By default propel should create a new instance for each row.
>
>> print "Querying for all properties to geocode...";
>> $c = new Criteria;
>> $c->add(MlsPropertyPeer::GEOCODE_SCORE, 0); // only encode non-scored items
>> $allPropsRS = MlsPropertyPeer::doSelectRS($c);
>> print "Done!\n";
>>
>> $rowCount = 0;
>> $propArr = array();
>> while ($allPropsRS->next()) {
>>     $prop = new MlsProperty;
>>     $prop->hydrate($allPropsRS);
>>     // do stuff
>>     $rowCount++;
>> }
>
> Yes, of course it works, but i`m a friend of comfort :D Having a
> resultset and option to set is nice :D
>
> $rs = MlsPropertyPeer::doSelect($c);
> // now just wanna read with some special calculation functions
> $rs->setMode(Propel::ONE_OBJECT /*FETCH_INTO*/);
>
> foreach ($rs as $row) {
>     //do stuff
> }
>
> Of course, the iterator can work internally the way you mentioned
> above.
>
> Was just an idea to discuss. What do you think of using FETCH_CLASS?
>
> Alan, i am not against your hydrate-method - just wanna discuss the
> implementation of pdo-features.
>
> Greets sven

> Eeeew. FETCH_INTO sounds awful. I cringe at the idea of how many bugs
> this causes with people not understanding references enough and having
> FETCH_INTO cause terrible problems. It's 2006, we don't need to re-use
> object shells! That is so not OO I don't even know how to express it.
>
> FETCH_INTO seems like some odd hack around GC unless I am not seeing
> something.
>
> Why is it needed?

Ok... You are not a friend of FETCH_INTO - just mentioned it for reading and as possible option. If you know what it means: use it, where you need.

Eeeew. FETCH_INTO sounds awful. I cringe at the idea of how many bugs this causes with people not understanding references enough and having FETCH_INTO cause terrible problems. It's 2006, we don't need to re-use object shells! That is so not OO I don't even know how to express it.

FETCH_INTO seems like some odd hack around GC unless I am not seeing something.

This doesn't consume unbounded memory. I'd think that if we create a new "PropelResultSet" that when iterated provides hydrated objects, that it will have the same effect as my code above and be nice and clean and memory-thrifty.

PREVIOUSLY there was some really hairy internal reference-mongering that caused GC to not work properly, but I complained enough :) and Hans (I think) fixed it. So now it works beautifully.

Right? What am I missing?

Alan

On Oct 4, 2006, at 5:56 AM, Sven Tietje wrote:

> David Zulke wrote:
>> 1) memory consumption is a lot lower, e.g. when iterating over 1000
>> rows to display them, a row is fetched, the data is displayed, and
>> the GC can then go ahead and destroy the object.
>
> Think, that should be an option. Sometimes, i want to iterate over a
> resultset and to write baseobjects with special information into an
> array or something like that. PDO handles it the same way - Fetchmode
> gives you special options for fetching as object:
>
> http://de.php.net/manual/de/function.pdostatement-setfetchmode.php
>
> Using PDO::FETCH_CLASS will result in an object instance for each
> row: 200 rows in your resultset will give you 200 object instances.
>
> PDO::FETCH_INTO will give you the same object instance for each
> row: 200 rows, but one object instance.
>
> Having the same feature / option for propel would be nice:
>
> 1. One instance for all rows
> $resultset = TablePeer::doSelect(new Criteria(), PROPEL::ONE_OBJECT);
>
> foreach ($resultset as $row) {
>     echo $row; // will give you the same object instance
> }
>
> I would use PROPEL::ONE_OBJECT to publish big resultsets. I don`t
> manipulate the object. With $myrow = clone $row it is possible to get
> my own instance of the row.
>
> 2. An instance for each row
> $resultset = TablePeer::doSelect(new Criteria(), PROPEL::EACH_OBJECT);
> foreach ($resultset as $row) {
>     echo $row; // will result in Object#1, Object#2 ... Object#200
> }
>
> Internally Propel uses clone() to create a single instance of each row.
>
>> 2) it opens the door for a potential
>> CategoryPeer::doSelectJoinProducts() support, where fetching 1:n
>> relations with a join would become possible.
>
> We should discuss to use
> PDOStatement::setFetchMode ( int PDO::FETCH_CLASS, string classname ) or
> PDOStatement::setFetchMode ( int PDO::FETCH_INTO, object object )
> instead of hydrate();
>
> I wrote a little script to test it:
>
> $stmt = $pdo->prepare('select name as `customer.name` from customer');
> $stmt->execute(array());
>
> class baseobject /*implements ArrayAccess*/ {
>     private $data = array(
>         'customer.name' => null,
>     );
>
>     // for hydrate
>     public function __set($key, $value) {
>         $this->data[$key] = $value;
>     }
>
>     public function setName($value) {
>         $this->data['customer.name'] = $value;
>     }
>
>     public function getName() {
>         if (isset($this->data['customer.name'])) {
>             return $this->data['customer.name'];
>         }
>     }
> }
>
> $blub = new baseobject;
> $stmt->setFetchMode(PDO::FETCH_INTO, $blub);
>
> $arg = array();
> while ($b = $stmt->fetch()) {
>     // $b->setName('blah');
>     var_dump($b->getName());
> }
>
> Implementing ArrayAccess in BaseObject could be a possibility, too.
> We should discuss it.
>
> Your opinion? I think to use the power of pdo would be nice.

David Zulke wrote:
> 1) memory consumption is a lot lower, e.g. when iterating over 1000
> rows to display them, a row is fetched, the data is displayed, and
> the GC can then go ahead and destroy the object.

Think, that should be an option. Sometimes, i want to iterate over a resultset and to write baseobjects with special information into an array or something like that. PDO handles it the same way - Fetchmode gives you special options for fetching as object:

If you're concerned about hydrate performance, you might want to check out the patch I recently committed to creole. It does 2 things:

1. Move the require_once statements to the top of the file so that they are called exactly once, rather than N times.

2. Replace array_key_exists() with isset() to test for column existence; it's much faster. Done only for ResultSetCommon and PGSQL. Would need to be "cloned" to other drivers too, but shouldn't cause any breakage as-is. Also changes an is_int() call for a simple var test, which is marginally faster.
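For what it's worth, the one behavioral difference to watch when swapping array_key_exists() for isset() is NULL handling, since isset() reports a stored NULL as absent (relevant for NULL database columns):

```php
<?php
$row = array('id' => 1, 'title' => 'Don Quixote', 'isbn' => null);

// array_key_exists(): a real function call per column, but NULL-safe.
var_dump(array_key_exists('isbn', $row)); // bool(true)

// isset(): a language construct, much cheaper, but NULL counts as unset.
var_dump(isset($row['isbn']));            // bool(false)

// Common compromise: isset() fast path with array_key_exists() fallback.
function hasColumn(array $row, $col)
{
    return isset($row[$col]) || array_key_exists($col, $row);
}

var_dump(hasColumn($row, 'isbn'));    // bool(true)
var_dump(hasColumn($row, 'missing')); // bool(false)
```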

These two things made a pretty big difference for me.

The more columns your table has, the more impact #2 has, b/c array_key_exists() was being called for EACH COLUMN, and it's really slow.

NOTE: I patched only the PGSQL version for #2 since it's the only one I use. But it's very little code; you could patch another driver in 5 minutes by following the diff from the root of the trunk folder:

svn diff -r 53 .

Alan

On Oct 3, 2006, at 8:13 AM, David Zülke wrote:

> Sven,
>
> I looked at your suggestion in detail and I believe it makes a lot
> of sense.
>
> [...]

I looked at your suggestion in detail and I believe it makes a lot of sense.

I think on-the-fly fetching of result sets should result in roughly the same performance as the traditional array population method, but it will have two very significant advantages:

1) memory consumption is a lot lower, e.g. when iterating over 1000 rows to display them, a row is fetched, the data is displayed, and the GC can then go ahead and destroy the object.

2) it opens the door for a potential CategoryPeer::doSelectJoinProducts() support, where fetching 1:n relations with a join would become possible.
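Point 2) presumably relies on the joined rows for one category arriving adjacently (i.e. the query is ordered by the category key), so the hydrator can emit a parent as soon as its last row has passed. A hand-rolled sketch of just that grouping step, with invented data:

```php
<?php
// Rows as a "category JOIN product" query ordered by category id
// would return them (flattened into an array here for illustration).
$rows = array(
    array('id' => 1, 'name' => 'Fruit',  'product' => 'Apple'),
    array('id' => 1, 'name' => 'Fruit',  'product' => 'Pear'),
    array('id' => 2, 'name' => 'Cheese', 'product' => 'Brie'),
);

$current = null;
foreach ($rows as $row) {
    if ($current === null || $current['id'] !== $row['id']) {
        if ($current !== null) {
            // Previous category is complete: yield it to the caller.
            printf("%s: %s\n", $current['name'],
                   implode(', ', $current['products']));
        }
        $current = array('id' => $row['id'], 'name' => $row['name'],
                         'products' => array());
    }
    $current['products'][] = $row['product'];
}
if ($current !== null) {
    printf("%s: %s\n", $current['name'],
           implode(', ', $current['products']));
}
// prints:
// Fruit: Apple, Pear
// Cheese: Brie
```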

David

On 02.10.2006, at 10:43, Sven Tietje wrote:

> Hi,
>
> i`d like to create a PropelResultset to make it possible to iterate
> over Resultsets and to hydrate objects on the fly. We have talked
> about it.
>
> One of propel`s performance problems are the Result-Arrays.
>
> $resultArray = TablePeer::doSelect(new Criteria());
>
> doSelect queries the database. A Resultset is given. Now, Propel
> fetches each row of the resultset, hydrates the row into an object
> and puts the object into an array. Having fetched all rows, doSelect
> returns the array.
>
> I think BasePeer should return a PropelResultset.
>
> class PropelResultset implements IteratorAggregate, ArrayAccess,
>     Countable {
>     public function setFetchMode();
>
>     .... Interface - Methods
> }
>
> - ArrayAccess and Countable will make PropelResultset
>   backwards-compatible to the normal array.
>
> - setFetchMode will give you two possibilities:
>
> 1) PropelResultset::OBJECT - enabled by default
> $resultset->setFetchMode(PropelResultset::OBJECT);
> foreach ($resultset as $item) {
>     $item->getPrimaryKey();
>     $item->set...;
>     $item->save();
> }
>
> 2) PropelResultset::ARRAY - sometimes you don`t need objects
> $resultset->setFetchMode(PropelResultset::ARRAY);
> foreach ($resultset as $item) {
>     $item[TablePeer::ID];
>     $item[TablePeer::TITLE];
> }
> Key of the $item - Array is of structure:
> tablename.columnname or tablealias.columnname
>
> Perhaps we also should discuss a BaseObject implementing ArrayAccess
> directly.