I have some notes regarding it from my particular use case - implementing algorithms for Photoshop / After Effects.

I'm not a programmer, so I thought this would be the right tool for me - I could just take care of the algorithm.

I don't have deep knowledge of what is and isn't possible on GPUs; everything written here is in the spirit of the limitations you made clear in the Pixel Bender PDFs. Anyhow, looking at the GPUs you support, it seems you support some pretty old hardware. Since this is an experimental project for now, and given the big changes GPUs have gone through in the last two years, you might want to consider dropping support for anything below DX10 to give PB all the features and flexibility it should have.

The Notes:

1. Hidden Code - I know it is possible (the Oil Painting plug-in). This should be a first priority. If I got it right, PB kernels for Flash are compiled into some kind of binary form. Let's use that: let the Photoshop / AE plug-in read those files.

2. CPU Precomputing - There are some computations which suit the CPU much better than the GPU. Could the "Calculate Dependencies" concept be expanded? Let's say we compute some (small) data on the CPU which is later used by Pixel Bender. A simple case would be face recognition: I could create a gray mask on the CPU (faces would be brighter grays, the rest black...) and then pass this mask as a source image to Pixel Bender. The CPU procedure could be written in AS3 or something similar. I know this is easy to do in Flash, but what about Photoshop / After Effects?
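The CPU/GPU split described here can be sketched in plain CPU code. Python below; `build_face_mask`, the face-box format, and the per-pixel kernel are illustrative stand-ins (there is no such Pixel Bender API), with images as rows of floats in [0, 1]:

```python
# Sketch of the CPU-precompute idea: a CPU pass builds a small grayscale
# mask once, and the GPU kernel then reads it as a second source image.
# The "face detector" is a stub taking precomputed boxes.

def build_face_mask(width, height, face_boxes):
    """CPU side: paint detected face rectangles as bright gray, rest black."""
    mask = [[0.0] * width for _ in range(height)]
    for (x0, y0, x1, y1, confidence) in face_boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = max(mask[y][x], confidence)
    return mask

def kernel_brighten_faces(pixel, mask_value):
    """GPU side (what the PB kernel would do per pixel): blend using the mask."""
    return min(1.0, pixel * (1.0 + mask_value))

# One CPU pass; afterwards the mask is just another source image for the kernel.
mask = build_face_mask(4, 4, [(1, 1, 3, 3, 0.8)])
out = kernel_brighten_faces(0.5, mask[1][1])  # a pixel inside a "face" box
```

The point of the split: the irregular, branchy detection work runs once on the CPU, while the per-pixel blend stays trivially parallel for the GPU.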

3. Static Node / Loop of Nodes - Let's say I convert the image into the LAB colorspace, apply an algorithm, convert it back into RGB, and that's it. Say the user has 3 sliders controlling parameters of the algorithm applied to the LAB image. Right now, each time the user moves a slider the whole chain is recalculated. The solution would be to let us define "static nodes", which are calculated only once; their result is saved for later use and they are never regenerated. In the example above, the node which converts the image into LAB would be calculated once and that's it. This is a simple example, but there are image decompositions which are very demanding and only need to be calculated once. A generalization of this would be looping over nodes - that is, run a node until some condition is met. This would make iterative algorithms possible in PB.
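A minimal sketch of the caching behavior being requested, assuming a toy node graph; the `Node` class and the LAB/RGB stand-ins are hypothetical, not Pixel Bender's actual run-time:

```python
# Each node caches its output and only recomputes when its input value or
# one of its parameters changes - the "static node" behavior.

class Node:
    def __init__(self, fn, params=None):
        self.fn = fn
        self.params = dict(params or {})
        self._cache_key = None
        self._cache_val = None
        self.evaluations = 0  # counts real recomputations

    def run(self, value):
        key = (value, tuple(sorted(self.params.items())))
        if key != self._cache_key:          # input or parameters changed
            self.evaluations += 1
            self._cache_val = self.fn(value, **self.params)
            self._cache_key = key
        return self._cache_val

# Graph: to_lab -> adjust (driven by a user slider) -> to_rgb
to_lab = Node(lambda v: v + 1000)                  # stand-in for RGB -> LAB
adjust = Node(lambda v, gain: v * gain, {"gain": 1.0})
to_rgb = Node(lambda v: v - 1000)                  # stand-in for LAB -> RGB

def render(pixel):
    return to_rgb.run(adjust.run(to_lab.run(pixel)))

render(5)
adjust.params["gain"] = 2.0   # user moves a slider
render(5)
# to_lab ran once; only `adjust` and the nodes downstream re-evaluated.
```

This matches the behavior the reply below describes for the Pixel Bender run-time: a node is invalidated only by its own parameters or by changed upstream nodes.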

4. Changing Graph Connections on the Fly - In a later note I'll ask for a UI improvement: radio buttons. Let's say I built a graph with 2 different local-contrast algorithms and I allow the user to choose one of them (via radio buttons). I want to change the graph connections according to that choice. The way things are now, I have to calculate both and use a boolean variable to choose between them.
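The difference between the current workaround (compute both branches, select with a boolean) and true rewiring can be sketched like this; the two kernel functions are illustrative stand-ins:

```python
# Counting kernel invocations shows the cost of the boolean-select workaround.

calls = {"a": 0, "b": 0}

def local_contrast_a(px):
    calls["a"] += 1
    return px * 1.2

def local_contrast_b(px):
    calls["b"] += 1
    return px * 0.8

def render_without_rewiring(px, use_a):
    a, b = local_contrast_a(px), local_contrast_b(px)  # both always computed
    return a if use_a else b                           # boolean select

def render_with_rewiring(px, use_a):
    branch = local_contrast_a if use_a else local_contrast_b  # pick the node first
    return branch(px)

render_without_rewiring(1.0, True)   # runs both kernels
render_with_rewiring(1.0, True)      # runs only the chosen one
```

With rewiring, the unchosen branch never executes, which is exactly the saving the request is after.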

5. Performance Measurement - We need some way to measure whether the optimizations we make really improve performance. I know there's an option to see the FPS, but I don't know how accurate it is. Could there be another way?

9. Vector Casting - Casting from bool / int / float vectors into bool / int / float vectors in one line. Currently it is sometimes necessary to cast each item in the vector individually.
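A sketch of what a one-line vector cast could look like; `cast_vec` is a hypothetical helper to show the intent, not an existing Pixel Bender built-in:

```python
# One call applies the scalar cast to every component, instead of writing
# out the conversion element by element.

def cast_vec(target, vec):
    """Cast every component of `vec` to `target` (bool, int, or float)."""
    return tuple(target(c) for c in vec)

ints = cast_vec(int, (0.0, 1.7, 2.2))   # "float3 -> int3" in one line
bools = cast_vec(bool, (0, 3, 0))       # "int3 -> bool3"
```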

10. Defining a DOD - Sometimes auxiliary images are needed. Let us create a DOD: define an array of RGBA of whatever size we want. This would make resizing algorithms possible in PB, and it could even serve as an auxiliary matrix for various calculations. Moreover, it's needed for many other uses (building matrices for calculations such as least-squares methods, etc.).

11. Sample Nearest Mode B - In the current method ("Sample Nearest Mode A"), if the kernel accesses a pixel outside the DOD it gets (0, 0, 0, 0) as a result. In Mode B I would like it to return the value of the nearest pixel instead (by Euclidean distance), basically "padding" the image with its border. This would give much better convolution results.

12. Advanced Math Tools - Many advanced algorithms require advanced math tools; could you create the flexibility needed to run them? Today there are many ways to run highly parallelized matrix-algebra and optimization algorithms on the GPU:

Do you mean protecting your Pixel Bender filters? We know that this is a priority for developers and are working on some ideas that would allow you to do this. The difficult part is getting something that works across multiple Adobe applications in a simple way. A binary representation of Pixel Bender files wouldn't be sufficient because it could easily be decompiled. We're still thinking this through.

This is a good suggestion. We have been thinking about ways of combining more general purpose languages with Pixel Bender that would still be high-performance.

The static node case is already handled within the Pixel Bender run-time. We cache values at a node and will not update that node unless one of its parameters changes or one of the nodes connected to its inputs changes. Looping is difficult to handle with region reasoning logic, but we are definitely aware that it would be useful to support.

Good suggestion. This has come up before as a request. Again, it is a region reasoning issue, but it is certainly do-able.

Testing GPU performance is particularly problematic as it is an external processor. Our GPU FPS counts are currently approximations; our CPU performance measurements, however, should be completely accurate. I've thought in the past about specific performance-testing options similar to how my team measures filter performance, but it seemed like too obscure a feature for most users. Glad to hear that you would be interested.

There is the ability in Pixel Bender now to tag a float2 parameter as corresponding to the input size of the original image. The toolkit supports it, but AE and PS do not currently; we're working with them to support it. Something to remember, though, is that because of region reasoning the original image size may not correspond to the image you are receiving at your inputs. Min/max/mean values are good ideas, but the same distinction between the original image and the image at the filter's inputs applies. We need to think about this a bit more.

Yup. Agreed.

We currently have the enum parameter semantic hinting metadata. In the toolkit I show that as a drop-down, but it could be shown as radio buttons.

Good suggestion.

Interesting suggestion.

Padding or clamping? GPUs support clamping (sampling off the image returns the color of the nearest edge pixel). That would be more feasible than actually computing a Euclidean distance, which could be a big performance hit.
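The clamping semantics described here can be sketched in a few lines, assuming Pixel Bender's convention that pixel centers sit at half-integer coordinates (so the floor of a coordinate gives the pixel index); `sample_nearest_clamped` is an illustrative helper, not an existing built-in:

```python
# "Mode B" sampling: out-of-range coordinates are clamped to the nearest
# border pixel instead of returning (0, 0, 0, 0).

import math

def sample_nearest_clamped(img, x, y):
    """Like sampleNearest, but clamp to the image border when outside it."""
    h, w = len(img), len(img[0])
    ix = min(max(int(math.floor(x)), 0), w - 1)   # column index, clamped
    iy = min(max(int(math.floor(y)), 0), h - 1)   # row index, clamped
    return img[iy][ix]

img = [[10, 20],
       [30, 40]]
inside = sample_nearest_clamped(img, 0.5, 0.5)    # top-left pixel
padded = sample_nearest_clamped(img, -3.5, 0.5)   # off the left edge -> same pixel
```

Note that for points beyond a straight edge this coordinate clamp returns the same pixel a true nearest-by-Euclidean-distance search would, which is why GPUs implement it this way: it is two min/max operations per axis rather than a distance computation.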

1. For now, even a binary format is better than nothing. If it is doable in the near future (a month or so), it would be great until you get the "optimal solution".

3. A caching or "calculate only once" flag would be great. Looping would be awesome. I don't have any knowledge of GPUs, but since I can do it manually (create a few nodes which refer to the same kernel), couldn't it be done naively? I would write the number of loops, and the program which translates my code into GPU code would just create nodes of the same kernel "on the fly". I guess I'm missing something.
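The naive unrolling proposed here can be sketched as a build-time expansion; `unroll` and the one-line kernel are illustrative stand-ins:

```python
# A declared loop count is expanded into N chained copies of the same kernel,
# mirroring what the author already does by hand in the graph editor.

def unroll(kernel, n):
    """Return a function equivalent to n chained instances of `kernel`."""
    def chained(value):
        for _ in range(n):        # each pass plays the role of one generated node
            value = kernel(value)
        return value
    return chained

halve = lambda v: v * 0.5         # stand-in for a single kernel pass
halve_x3 = unroll(halve, 3)       # the translator would emit 3 nodes
result = halve_x3(8.0)            # 8 -> 4 -> 2 -> 1
```

This only works when the loop count is known up front, which is consistent with the reply above: a "loop until converged" node would make the graph's regions impossible to reason about statically, while a fixed count keeps the unrolled graph static.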

5. If people want to do "heavy filtering", performance measurement is a must. A tool for that would be great; to make things easier for you, it doesn't have to be in the IDE.

6. The image size is important for any "generalized convolution" kernel, just to know when you sample outside of it. There is a workaround: a symmetric continuation of the image is easily done. As long as the number is consistent with the XY plane you created in PB, that's great (meaning if I have a width of 2048, I know that (2047.5, y) is the rightmost pixel). Any other info on the image would be great too (min / max for every channel, mean, etc.).

10. There are many algorithms for calculations on the GPU (least squares, conjugate gradients, etc.). I saw some Cg code which creates a texture or something like that (I guess that's just an image) of arbitrary size; you then use it as if it were a matrix and calculate whatever you want. Since the DOD is the native way to store data in PB (and the only thing which can be changed in the evaluatePixel phase), it would be great to be able to create an arbitrary one for auxiliary computations - unless there are other ways you may hint at.
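The texture-as-matrix idea can be sketched with a tiny least-squares solve via the normal equations. Pure Python stand-ins below; on a GPU the matrix `A` would live in an arbitrary-size texture and each entry of a product would be computed by one pixel of a kernel pass:

```python
# Least squares by the normal equations: solve (A^T A) x = A^T b.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def solve2(M, v):
    """Solve a 2x2 system M x = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - v[1] * M[0][1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

# Fit y = m*x + c to the points (0, 1), (1, 3), (2, 5).
A = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]   # "texture" holding the matrix
b = [1.0, 3.0, 5.0]
At = transpose(A)
AtA = matmul(At, A)                        # A^T A (2x2)
Atb = [sum(At[i][k] * b[k] for k in range(3)) for i in range(2)]  # A^T b
m, c = solve2(AtA, Atb)
```

The matrix products are the embarrassingly parallel part that maps naturally onto per-pixel kernels; the small final solve is exactly the kind of step note 2 would hand back to the CPU.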

They create arbitrary-sized textures (each is a DOD or something, isn't it?).

They loop over them.

They mention DirectX 9.0c compliance. You support older hardware, I think. Might that be the reason?

11. I'm not familiar with clamping. I meant padding in the Matlab sense. Basically, let's say we have an image of 2048 x 2048 and I use a convolution mask of 9 x 9. Say the current pixel is (0.5, 0.5) and I try to access (-3.5, 0.5). With the regular "sampleNearest" I get (0, 0, 0, 0). What I would like to get back is the value of pixel (0.5, 0.5). Think of it as padding the image with the values of the pixels at the borders of the image. Look at this: