FBO rendering to GL_ALPHA texture

Right,
Short question: I'd like to render an alpha buffer to a GL_ALPHA texture. Basically, right now I'm blasting quads textured with a GL_ALPHA texture to the screen, but I'd like to cache those results in another GL_ALPHA texture.

Now, I've read that FBO attachments to alpha textures are a no-go (or were in 2007). Does anyone have a better idea?

These are my ideas at the moment:
1) ditch the caching altogether and keep running quads to screen.
2) use glCopyTexSubImage2D to assemble the texture. (Assembly speed is not critical; this'll happen on average every three frames or so.)
3) suck the sour and run GL_RGBA textures, thereby quadrupling VRAM use (4 bytes per texel instead of 1)
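To put numbers on option 3), here is a small sketch of the uncompressed size of one texture level. It assumes 8 bits per channel, no mipmaps, and no driver-side padding; the helper name `texture_bytes` is hypothetical, and real drivers may store textures differently internally.

```c
#include <stddef.h>

/* Hypothetical helper: uncompressed size of one 2D texture level.
   Assumes 8 bits per channel, no mipmaps, no driver-side padding. */
static size_t texture_bytes(size_t width, size_t height, size_t channels)
{
    return width * height * channels;
}

/* GL_ALPHA stores one 8-bit channel per texel, GL_RGBA stores four. */
static const size_t ALPHA_CHANNELS = 1;
static const size_t RGBA_CHANNELS = 4;
```

For a 256x256 cache that is 65,536 bytes as GL_ALPHA versus 262,144 bytes as GL_RGBA, so switching formats multiplies the footprint of every cached texture by four.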

I'm leaning towards 2) here, but I thought I'd ask around and see if you 1337s have a better idea?

I have never stress-tested glCopyTexSubImage2D in this way; how slow is it really? A typical usage would be 100 calls of 50x50 quads to a 256x256 texture, which sounds expensive to me.

Is that 100 256x256 textures being rendered, or just one that has 100 quads drawn into it? A 256x256 texture is tiny. I can't imagine that calling glCopyTexSubImage2D() every few frames is even going to be measurable.

The biggest benefit of glCopyTexSubImage2D() is that it's super simple to use, without needing to set up extra buffers or anything. It should be easy to hack something together and see.

The performance lab results are back, and they're sweet.
First off, of course I meant glTexSubImage2D, since I have the source pixels in memory.

I simply ran a test that does 3 rounds of 100 copies of 50x50 random pixel data into a texture. This simulates the scenario above, but three times a frame instead of every other frame (I aimed for the worst case).
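One detail worth checking when uploading 50x50 single-byte GL_ALPHA / GL_UNSIGNED_BYTE data with glTexSubImage2D: by default OpenGL assumes each client-side source row starts on a 4-byte boundary (the initial GL_UNPACK_ALIGNMENT is 4), and a 50-byte row is not a multiple of 4. The sketch below computes the row stride GL would assume for 1-byte components; the helper name `unpack_row_stride` is hypothetical. With tightly packed rows, the usual fix is to call glPixelStorei(GL_UNPACK_ALIGNMENT, 1) before the upload.

```c
#include <stddef.h>

/* Hypothetical helper: bytes OpenGL advances per source row when unpacking
   data whose components are 1 byte each (e.g. GL_ALPHA + GL_UNSIGNED_BYTE),
   given the GL_UNPACK_ALIGNMENT value (1, 2, 4 or 8) and assuming
   GL_UNPACK_ROW_LENGTH is 0. Each row is padded up to the next multiple
   of the alignment. */
static size_t unpack_row_stride(size_t width, size_t components, size_t alignment)
{
    size_t row_bytes = width * components;
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

With the default alignment, a 50-texel alpha row is read as 52 bytes, so a tightly packed 50-byte-per-row buffer would be sheared; with an alignment of 1 the stride matches the data exactly.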

This takes, on average, 2ms for 100 copies. If ever I wanted to say negligible, this is the occasion.

And it gets better: turning on DMA transfers with glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE); pushes that down below half a millisecond for 100 copies.

I was going to hold off talking about it until I had something to show for it, but it's text related: I'm storing characters as alpha textures and assembling them into string textures. It's working hell of good.

Progress bar: [#########-]
I hit a driver bug (Radar #6932125) that put fugly artifacts into my glTexSubImage2D copies and forced me to write a workaround. Instead of uploading a clear texture and copying into it, I assemble the texture in memory in a block of unsigned bytes and upload it once it is done. This way, each newly added glyph costs nothing more than a bastard sibling of memcpy, and I only have to stall the pipeline once (when the texture is ready).
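The in-memory assembly described above can be sketched roughly like this. It is a minimal illustration, not the poster's actual code: it assumes one byte of alpha per texel and tightly packed rows, and the names `AlphaImage` and `blit_glyph` are hypothetical. Each glyph is copied with one memcpy per row into a CPU-side buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU-side string image: one byte of alpha per texel,
   rows stored tightly packed (row stride == width). */
typedef struct {
    unsigned char *pixels;
    size_t width;
    size_t height;
} AlphaImage;

/* Copy a glyph's alpha bitmap (glyph_w x glyph_h, tightly packed) into
   dst at position (x, y), one memcpy per glyph row. The glyph must fit
   entirely inside dst; this is checked with assert in debug builds. */
static void blit_glyph(AlphaImage *dst, const unsigned char *glyph,
                       size_t glyph_w, size_t glyph_h, size_t x, size_t y)
{
    assert(x + glyph_w <= dst->width && y + glyph_h <= dst->height);
    for (size_t row = 0; row < glyph_h; ++row) {
        memcpy(dst->pixels + (y + row) * dst->width + x,
               glyph + row * glyph_w,
               glyph_w);
    }
}
```

Once every glyph has been placed, the finished buffer can be handed to OpenGL in a single call, for example glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0, GL_ALPHA, GL_UNSIGNED_BYTE, pixels), so the pipeline only waits on one upload per string texture.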

The result is ridiculously fast - fast enough that I can run most of these operations in real-time, per frame without any real performance hits.

Just wanted to share that a driver bug can inspire that mythic algorithm change that gives a factor-of-10 speedup.