Are you just doing this for rendering purposes? For stuff like weighted vertices and collision detection, you might need to calculate your own matrix, but for rendering you should let OpenGL do it. Right now only GeForce would accelerate the transform, but you still want to keep things in the GL pipeline if possible.

If the order isn't extremely important, just do the translation and rotation for the arbitrary point BEFORE your other transformations:

and this will achieve the same effect. If you absolutely, positively have to rotate around the point after the fact, I believe you could read the current matrix into memory with glGetFloatv(), call glLoadIdentity(), perform the translation/rotation, and then multiply by the stored matrix with glMultMatrixf() (although I haven't tried it out).