For OGE, as we require high performance (it is for games), I've worked a fair bit on creating some atomic operations classes and a lock-free queue, as locks/mutexes are very expensive.
Looking through POCO's code, mutexes are used a lot.

I have an AtomicOp template class which takes a type and inherits from AtomicOpBase, which takes a type and a size. Each of the specialisations is for a given size (8, 16, 32 bits) and each uses x86 assembly for the atomic code (operations like compare and swap, add, exchange and add, etc.). The AtomicOp class automatically selects the correct base class depending on the type's size.
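The size-based selection described above could look roughly like the sketch below. It is not the OGE code: it swaps the x86 assembly for std::atomic, it drops the type parameter of the base class for brevity, and the member names (Storage, compareAndSwap, exchangeAndAdd) are assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical base: one specialisation per operand size in bytes.
template <std::size_t Size> struct AtomicOpBase;

template <> struct AtomicOpBase<1> { typedef std::uint8_t  Storage; };
template <> struct AtomicOpBase<2> { typedef std::uint16_t Storage; };
template <> struct AtomicOpBase<4> { typedef std::uint32_t Storage; };

// The base class is picked purely from sizeof(T); T is assumed to be an integer type.
template <typename T>
struct AtomicOp : AtomicOpBase<sizeof(T)> {
    typedef typename AtomicOpBase<sizeof(T)>::Storage Storage;

    // Stores 'desired' only if the target currently holds 'expected'.
    static bool compareAndSwap(std::atomic<Storage>& target, T expected, T desired) {
        Storage e = static_cast<Storage>(expected);
        return target.compare_exchange_strong(e, static_cast<Storage>(desired));
    }

    // Adds 'delta' and returns the value held before the add.
    static T exchangeAndAdd(std::atomic<Storage>& target, T delta) {
        return static_cast<T>(target.fetch_add(static_cast<Storage>(delta)));
    }
};
```

Using a type of an unsupported size fails at compile time, because the primary AtomicOpBase template is declared but never defined.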

I also have an Atomic template class which allows you to store and modify a value inside it atomically. This is especially useful for lock-free, thread-safe reference counting.
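A minimal sketch of such a value wrapper, assuming std::atomic underneath instead of hand-written assembly. The method names (get, set, increment, decrement, compareAndSet) are illustrative guesses, not the actual OGE interface.

```cpp
#include <atomic>

// Hypothetical wrapper: holds a value that can be read and modified atomically.
template <typename T>
class Atomic {
public:
    explicit Atomic(T initial = T()) : value_(initial) {}

    T get() const { return value_.load(std::memory_order_acquire); }
    void set(T v) { value_.store(v, std::memory_order_release); }

    // Both return the value after the operation.
    T increment() { return value_.fetch_add(1, std::memory_order_acq_rel) + 1; }
    T decrement() { return value_.fetch_sub(1, std::memory_order_acq_rel) - 1; }

    // Replaces the value with 'desired' only if it currently equals 'expected'.
    bool compareAndSet(T expected, T desired) {
        return value_.compare_exchange_strong(expected, desired);
    }

private:
    std::atomic<T> value_;
};
```

For reference counting, the owner that sees decrement() return 0 is the one that deletes the object; no mutex is involved.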

Lastly, I have an AtomicQueue, which is based on various sources from the net (Game Programming Gems code and various other sites). I've tested this quite a bit, and had 100 threads pushing values onto it whilst another 100 threads were reading from it. At the end, no values were found to be corrupted or duplicated and no segfaults occurred, so I think it works fine.
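For readers who have not seen a lock-free queue, here is a small stand-in. It is not the AtomicQueue from this post: it is a bounded, array-based multi-producer/multi-consumer design in the style of Dmitry Vyukov's queue, built on std::atomic, and the names BoundedQueue, tryPush and tryPop are made up for the sketch.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

template <typename T>
class BoundedQueue {
public:
    // 'capacity' is assumed to be a power of two and at least 2.
    explicit BoundedQueue(std::size_t capacity)
        : cells_(capacity), mask_(capacity - 1), head_(0), tail_(0) {
        for (std::size_t i = 0; i < capacity; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Returns false when the queue is full.
    bool tryPush(const T& value) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            std::intptr_t diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
            if (diff == 0) {
                // Cell is free for this position: try to claim it.
                if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    cell.value = value;
                    cell.seq.store(pos + 1, std::memory_order_release); // publish to consumers
                    return true;
                }
            } else if (diff < 0) {
                return false; // consumer has not freed this cell yet
            } else {
                pos = tail_.load(std::memory_order_relaxed); // another producer got here first
            }
        }
    }

    // Returns false when the queue is empty.
    bool tryPop(T& out) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            std::intptr_t diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos + 1);
            if (diff == 0) {
                if (head_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    out = cell.value;
                    cell.seq.store(pos + mask_ + 1, std::memory_order_release); // free for next lap
                    return true;
                }
            } else if (diff < 0) {
                return false; // nothing published at this position yet
            } else {
                pos = head_.load(std::memory_order_relaxed);
            }
        }
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq;
        T value;
    };
    std::vector<Cell> cells_;
    const std::size_t mask_;
    std::atomic<std::size_t> head_;
    std::atomic<std::size_t> tail_;
};
```

A stress test like the one described (many producer and consumer threads, then a check for lost or duplicated values) would still be needed before trusting any queue of this kind.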

Anyway, as OGE is now using POCO, we have discussed contributing our code so that POCO can benefit, if you accept it. I haven't looked at all of POCO's code, but a few places that might benefit are the SharedPtr class for reference counting (no mutexes/locks) and possibly the RWLock code. I might be wrong about this last one, but from a brief thought about it, could it be done using one mutex (only for write locking) and an Atomic< Int32 >? For example: 0 means no locks, a positive value is the number of read locks, and -1 is a write lock (and then you lock the mutex).
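The state encoding proposed above (0 = unlocked, positive = reader count, -1 = writer) can be sketched as follows. This is only an illustration under stated assumptions: it uses std::atomic and std::mutex rather than POCO or OGE types, it offers try-style operations only (a blocking version would retry or wait), and the class and method names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

class SketchRWLock {
public:
    SketchRWLock() : state_(0) {}

    // Readers never touch the mutex: they bump the counter unless a writer holds the lock.
    bool tryReadLock() {
        std::int32_t s = state_.load(std::memory_order_relaxed);
        while (s >= 0) {
            if (state_.compare_exchange_weak(s, s + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
        }
        return false; // state is -1: a writer is active
    }

    void readUnlock() { state_.fetch_sub(1, std::memory_order_release); }

    // Writers serialise on the mutex, then need the state to go from 0 to -1.
    bool tryWriteLock() {
        if (!writers_.try_lock()) return false;
        std::int32_t expected = 0;
        if (state_.compare_exchange_strong(expected, -1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed))
            return true;
        writers_.unlock(); // readers still active
        return false;
    }

    void writeUnlock() {
        state_.store(0, std::memory_order_release);
        writers_.unlock();
    }

private:
    std::mutex writers_;
    std::atomic<std::int32_t> state_;
};
```

One consequence of this scheme is that a steady stream of readers can keep the counter above zero indefinitely, so writers may starve unless some extra fairness rule is added.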

The code is missing some parts that you would need. I haven't developed this for GCC/MinGW (sorry if I've got this wrong, I only use VC); I believe this requires a different way of writing the assembly? The code also needs to be formatted into the same style as POCO.
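On the compiler question: one way to avoid writing the assembly twice is to go through each compiler's built-in atomics behind a single function. The sketch below assumes the MSVC intrinsic _InterlockedCompareExchange and the GCC builtin __sync_val_compare_and_swap; the wrapper name compareAndSwap is made up here, and other compilers would need their own branch.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Returns the value that was stored at *target before the call.
// The swap took place if and only if that value equals 'expected'.
inline long compareAndSwap(volatile long* target, long expected, long desired) {
#if defined(_MSC_VER)
    // Note the argument order: exchange value first, comparand second.
    return _InterlockedCompareExchange(target, desired, expected);
#elif defined(__GNUC__)
    return __sync_val_compare_and_swap(target, expected, desired);
#else
#error "no atomic compare-and-swap known for this compiler"
#endif
}
```

This only covers the compiler dimension; which CPU architectures each builtin supports is a separate question.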

Would you accept this contribution? And what do I need to do, such as should I (try to) format the code, etc.? I would try to do as much as I can, although I do lack time.

> Would you accept this contribution? And what do I need to do, such as should I (try to) format the code, etc.? I would try to do as much as I can, although I do lack time.

Chris,

With due respect to your effort, the way I see it, such a contribution could, at best, make it into the contrib directory. The reason is assembly: introducing your code into POCO would make it non-portable.

> > Would you accept this contribution? And what do I need to do, such as should I (try to) format the code, etc.? I would try to do as much as I can, although I do lack time.
>
> Chris,
>
> With due respect to your effort, the way I see it, such a contribution could, at best, make it into the contrib directory. The reason is assembly: introducing your code into POCO would make it non-portable.

Non-portable, but you would have to write implementations for different architectures, just like you've already got for different operating systems.
Am I right in thinking that the contrib directory is just for additions, and that code in it wouldn't/couldn't be used anywhere else, such as in SharedPtr?

If it can't be, then we will have to branch off POCO, to be honest (if that's allowed by the licence?), because we require high performance. Obviously, heavy use of shared pointers or read/write locks is going to have a significant performance hit without lock-free code. We do really want to continue using POCO, as it's a lot easier to use than Boost in many ways.

> Non-portable, but you would have to write implementations for different architectures

Yes, but the problem is the 'you' guy. We do not have enough resources for such a venture.

> just like you've already got for different operating systems.

It's a bit more complex than that. Currently, 'different operating systems' pretty much boil down to Windows and POSIX. But if you get on a lower level, then (in addition to differences in assembler - yes, there are differences even when you write assembly for the same CPU) you have to deal with x86, PowerPC, Sparc, ARM ... etc. I don't know the details of your implementation, but 32/64 bit probably becomes an issue as well. IMO, way too much to throw in without a firm maintenance commitment.

> Am I right in thinking that the contrib directory is just for additions, and that code in it wouldn't/couldn't be used anywhere else, such as in SharedPtr?

Yes, contrib is for stuff that people contribute but that is not officially supported and not part of the mainstream code.

> If it can't be, then we will have to branch off POCO, to be honest (if that's allowed by the licence?), because we require high performance. Obviously, heavy use of shared pointers or read/write locks is going to have a significant performance hit without lock-free code. We do really want to continue using POCO, as it's a lot easier to use than Boost in many ways.

You can fork, but if I were you, I'd run some comparison tests before making the decision. Also, keep in mind that premature optimization is the root of all evil, and it is much, much easier to make correct and slow code fast than it is to make fast and incorrect code correct.

BTW, recently we have been tinkering with the idea of providing Intel TBB support, so if you have any interest in contributing there, let us know.

A few months ago James Mansion sent me some code implementing atomic ops for various platforms, based in part on the HP AtomicOps library, assembly, and platform-specific APIs where available. I'd like to incorporate this into POCO at some point, but haven't had the time so far. If anyone wants to do this, I can make James' code available, but before it can be incorporated into POCO, it must be tested on all platforms supported by POCO. And this is a lot of work.

> Yes, but the problem is the 'you' guy. We do not have enough resources for such a venture.

Yeah, I understand that; we have the same problems.

> You can fork, but if I were you, I'd run some comparison tests before making the decision. Also, keep in mind that premature optimization is the root of all evil, and it is much, much easier to make correct and slow code fast than it is to make fast and incorrect code correct.

Even though we haven't done any performance tests on POCO itself, performance hits from locking have already been seen in OGE, and one that was very noticeable was in OGRE: they use a mutex for their shared pointer class, which caused a very significant drop in FPS. We are using threading a lot more heavily than they are (obviously, as we are a game engine and they are a graphics engine). We also use them in a lot of other places, such as message queues. I don't really think there's any need to measure the performance issues, as they are quite obvious.

> BTW, recently we have been tinkering with the idea of providing Intel TBB support, so if you have any interest in contributing there, let us know.

I took a look at that quite a while ago, but now it seems quite good, as their licence is compatible (I think? I'm not great with licence details). By support, do you mean that you might use it throughout POCO, or have an add-on type project?

> > BTW, recently we have been tinkering with the idea of providing Intel TBB support, so if you have any interest in contributing there, let us know.

> I took a look at that quite a while ago, but now it seems quite good, as their licence is compatible (I think? I'm not great with licence details). By support, do you mean that you might use it throughout POCO, or have an add-on type project?

License is fine. TBB has atomic ops. What we have discussed so far was related to an add-on type of project (a separate library). As far as hardware goes, the last I read was that it supports IA-32/64 and Itanium, with minimal support for 64-bit Power G5. Sun is porting it to Sparc, but I don't know how far that effort went. Again, if you are interested, this would be a nice project/contribution. I think it is much more realistic than replacing all the mutexes in POCO with lock-free operations. Or forking, for that matter.

> License is fine. TBB has atomic ops. What we have discussed so far was related to an add-on type of project (a separate library).

How could it be separate and still have POCO use it, such as for reference counting in SharedPtr and in RWLock?

> As far as hardware goes, the last I read was that it supports IA-32/64 and Itanium, with minimal support for 64-bit Power G5. Sun is porting it to Sparc, but I don't know how far that effort went. Again, if you are interested, this would be a nice project/contribution. I think it is much more realistic than replacing all the mutexes in POCO with lock-free operations. Or forking, for that matter.

I didn't mean all mutexes, that's impossible (or a very bad idea). I just mean that key places would benefit a lot from using atomic operations or (with TBB) even faster mutexes, such as a spin mutex, which busy-waits and is used in code that needs to lock for only a few instructions.
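To make the busy-wait idea concrete, here is a minimal spin mutex built on std::atomic_flag. It is a sketch written for this thread, not TBB's spin mutex, and the class name SpinMutex is an assumption.

```cpp
#include <atomic>
#include <mutex> // only for std::lock_guard in the usage note

class SpinMutex {
public:
    // Loops until the flag was previously clear, i.e. until this caller set it.
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy wait: no system call, no context switch
        }
    }

    // Single attempt; true if the lock was acquired.
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Because it exposes lock(), try_lock() and unlock(), it works with std::lock_guard<SpinMutex>. It only pays off when the protected section is a handful of instructions; a thread that spins on a lock held for a long time just burns CPU.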

> > License is fine. TBB has atomic ops. What we have discussed so far was related to an add-on type of project (a separate library).

> How could it be separate and still have POCO use it, such as for reference counting in SharedPtr and in RWLock?

It would be separate because none of the POCO libraries would directly depend on TBB, at least not until TBB is fully portable across all the supported hardware. Applications that need threading would have a choice: use the POCO threading facilities, go the TBB way, or combine the two.

Note that the TBB name is misleading, because it is not just a threading library. Actually, you can't even get hold of a thread in TBB. TBB operates at a higher level of abstraction, through tasks and parallel algorithms/containers, and provides granular access to some handy utilities such as atomic operations, mutexes and scalable memory allocators.

Programming to benefit from multi-CPU facilities is a paradigm shift from serial to parallel computing, and it requires a different approach to application development. To use it properly, you have to find parallelizing opportunities in your functionality, organize your code accordingly and let the low-level library code distribute it optimally for the underlying hardware. You may have to do some design rethinking but, if you are designing a game engine for the future, TBB would probably be a good way to go. If Alex Stepanov praises it ("Threading Building Blocks... could become a basis for the concurrency dimension of the C++ standard library."), there's something there.

I can't really do it justice here, so to get properly informed you should do some research of your own. A good starting point would be James Reinders' interview (http://www.ddj.com/linux-open-source/201200614) and book (http://shop.intel.com/shop/product.aspx?pid=SISW4001). There's also a Wikipedia entry on TBB (http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks); it even mentions POCO.

Now about the classes you are mentioning:

The only place where RWLock is currently used is TextEncoding. SharedPtr is used in the caching and tuple classes. So, if those two are your stumbling block, then steer clear of the classes using them and implement your own lock-free versions. If you want to use AutoPtr, make your own lock-free reference-counted class instead of using the one provided by POCO, and so on.

> I didn't mean all mutexes, that's impossible (or a very bad idea). I just mean that key places would benefit a lot from using atomic operations or (with TBB) even faster mutexes, such as a spin mutex, which busy-waits and is used in code that needs to lock for only a few instructions.

Implementing your own SharedPtr and RefCounted objects should be quite straightforward. POCO may be lock-free there some day, but I don't think we are ready for that yet.

I took a quick look at TBB. The atomic template seems simple enough to use.

Also, the classes that Alex mentions are probably trivial to change.
You could try to update SharedPtr/RefCountedObject to use atomic.
AutoPtr and SharedPtr are both rather stable now, so it is unlikely that
we will change these classes in the near future, and integration into a new POCO release
should be trivial.
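As a rough idea of what such a change could look like, here is a stand-alone reference-counted base class with a duplicate()/release() interface and an atomic counter. It assumes std::atomic rather than TBB's atomic template, it is not POCO's actual RefCountedObject source, and the names SketchRefCounted and Tracked are invented for the example.

```cpp
#include <atomic>

class SketchRefCounted {
public:
    SketchRefCounted() : _rc(1) {} // the creator holds the first reference

    // Adds a reference. No ordering is needed just to increment.
    void duplicate() const { _rc.fetch_add(1, std::memory_order_relaxed); }

    // Drops a reference; whoever drops the last one deletes the object.
    void release() const {
        if (_rc.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
    }

    int referenceCount() const { return _rc.load(std::memory_order_acquire); }

protected:
    virtual ~SketchRefCounted() {} // destroyed only through release()

private:
    SketchRefCounted(const SketchRefCounted&);            // not copyable
    SketchRefCounted& operator=(const SketchRefCounted&); // not assignable

    mutable std::atomic<int> _rc;
};

// Small helper type that records when it gets destroyed.
struct Tracked : SketchRefCounted {
    explicit Tracked(bool& flag) : destroyed(flag) {}
    bool& destroyed;
protected:
    ~Tracked() { destroyed = true; }
};
```

The counter is the only shared state here, so no mutex is taken on duplicate() or release().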

Changing these two files should give you a nice speed boost (and I am really interested
in how much it will improve) throughout the whole POCO libs.

About mutex use: we try to provide a thread-safe library, so where necessary
we do a lock, but we try to keep lock times minimal.

For Cache, extensibility was more important than design. Cache is thread-safe and relies heavily on events. Events themselves are also thread-safe because of the way they are used: you can add new delegates while a notify is running. The add doesn't block, by the way, but this requires copying the delegate set for each notify. If you don't require these features, it is better to write your own specialized, faster version.
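The copy-per-notify pattern described above can be sketched like this. It is a simplified illustration with std::function and std::mutex, not POCO's event classes, and SketchEvent and its members are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

class SketchEvent {
public:
    typedef std::function<void(int)> Delegate;

    // Holds the lock only long enough to append.
    void add(const Delegate& d) {
        std::lock_guard<std::mutex> lock(mutex_);
        delegates_.push_back(d);
    }

    // Copies the delegate set under the lock, then calls the copy without it,
    // so a delegate may call add() during a notify without deadlocking.
    void notify(int arg) {
        std::vector<Delegate> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = delegates_;
        }
        for (std::size_t i = 0; i < snapshot.size(); ++i) snapshot[i](arg);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return delegates_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<Delegate> delegates_;
};
```

The price is one copy of the delegate list per notify, which is the cost mentioned above; a delegate added during a notify is first called on the next one.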

I'm not sure how much performance you would gain by exchanging the FastMutex class for the TBB mutex class. Benchmarks are welcome here, though.
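A benchmark for this does not need to be elaborate. The sketch below times a mutex-protected counter against an atomic counter, using std::mutex and std::atomic as stand-ins for FastMutex and TBB; all names are made up, and it runs on a single thread, so it measures only the uncontended cost. A real comparison would repeat it with several threads hammering the same counter.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>

struct MutexCounter {
    std::mutex m;
    long value = 0;
    void increment() { std::lock_guard<std::mutex> lock(m); ++value; }
    long get() { std::lock_guard<std::mutex> lock(m); return value; }
};

struct AtomicCounter {
    std::atomic<long> value{0};
    void increment() { value.fetch_add(1, std::memory_order_relaxed); }
    long get() { return value.load(); }
};

// Returns the average time per increment in nanoseconds.
template <typename Counter>
double nanosPerIncrement(Counter& counter, long iterations) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) counter.increment();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling nanosPerIncrement on one counter of each kind with the same iteration count gives two numbers to compare; the absolute values depend heavily on the platform and compiler flags.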

If performance gains are truly large, we can integrate it in 1.4. I am thinking about a compile-time flag to enable/disable TBB.
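A compile-time switch of that kind might be as small as a pair of typedefs. In the sketch below the macro name SKETCH_USE_TBB and the typedef names are invented for illustration; the TBB branch assumes tbb::spin_mutex and its scoped_lock from tbb/spin_mutex.h, and the default branch falls back to std::mutex.

```cpp
#if defined(SKETCH_USE_TBB)
#include <tbb/spin_mutex.h>
typedef tbb::spin_mutex SketchFastMutex;
typedef tbb::spin_mutex::scoped_lock SketchScopedLock;
#else
#include <mutex>
typedef std::mutex SketchFastMutex;
typedef std::lock_guard<std::mutex> SketchScopedLock;
#endif

// Client code is written once against the typedefs.
class GuardedCounter {
public:
    GuardedCounter() : value_(0) {}

    int increment() {
        SketchScopedLock lock(mutex_); // released when 'lock' goes out of scope
        return ++value_;
    }

private:
    SketchFastMutex mutex_;
    int value_;
};
```

Builds without the flag have no TBB dependency at all, which fits the portability concern raised earlier in the thread.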

br

Peter
