
This isn't exactly game programming, but this has been the greatest site for assistance, so I figured I would ask here.

I'm trying to create a raytracing framework. I am writing a simulator for a large kinematics engine I am designing, and the kinematics system is mostly finished. Now for the raytracer: I need to design a depth-first search engine to use on a distributed system. I'm not worried about how long it takes to render one frame. I DO want it to be as fast as possible, but ease of coding is step one and I'll refactor from there.

The idea is pretty simple. A 'client' user requests a work unit. The work unit is a collection of rays in a general cone (anti-aliasing rays from a primary ray, or from a reflection/refraction/shadow check), and the client should return either a color value to be integrated, or the rays to be used to generate a new work unit.

The reason I went with depth first is that these work units need to be as small as possible, even if they generate a large number of rays to transmit back (think 5 minutes or so to get a new unit and 5 minutes to return the result), and this system seemed to be the best fit. It may also have a benefit on the intersection side, with things like cached results (similar to the difference between scan-line tracing and square tracing).

Any suggestions? No ideas will be rejected offhand, honest!


Guest Anonymous Poster

a) You need a memory management system that is cross-platform compatible, as you may well have a heterogeneous network of machines working together.

I suggest you 'tag' memory objects and type them, and that you restrict yourself to working with 32-bit quantities (aka 'long' and 'float' types) because of byte ordering and packing efficiency.
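To make the byte-ordering point concrete, here is a minimal sketch in Python (used for brevity; the thread itself suggests C++). The record layout (a 32-bit tag, a 32-bit count, then that many 32-bit floats) and the function names are assumptions for illustration, not something the post specifies.

```python
import struct

# Hypothetical wire layout: 32-bit tag, 32-bit count, then `count` 32-bit floats.
# The "!" prefix forces network (big-endian) byte order and disables padding,
# so every machine reads the same bytes regardless of its native endianness.

def pack_record(tag, values):
    """Serialize a tagged list of floats into a fixed-endian byte string."""
    return struct.pack(f"!II{len(values)}f", tag, len(values), *values)

def unpack_record(data):
    """Inverse of pack_record: returns (tag, list_of_floats)."""
    tag, count = struct.unpack_from("!II", data, 0)
    values = struct.unpack_from(f"!{count}f", data, 8)
    return tag, list(values)

payload = pack_record(0x1234ABCD, [1.0, 0.5, -2.0])
print(len(payload))  # 20 bytes: two 4-byte integers plus three 4-byte floats
```

Sticking to 4-byte fields keeps every record aligned and makes the total size easy to compute, which seems to be the packing benefit the post is after.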

Accessing a memory object is a combined "Lock / Modify / Unlock / Network Flush" operation. Drop the "Network Flush" part when you are only reading. "Lock" should pull the object from whatever computer it is on if it's not in local memory. Written in C++, you can have accessor functions for this.

You should design your tag numbering system so that it puts the owning computer's number into the tag. For example, reserve the first 8 bits of the 32-bit tags for the number of the computer on the network. Then you can split the remaining bits into buckets. For example, tag 0x1234ABCD could translate to location "(void *) &memory[0x12][0x34][0xABCD].pData". You can make your data types match the second bucket so that, for example, all cameras are allocated within the same memory area; it's more heap-efficient. In the example above, cameras of computer 0x12 on the net would be located in the 0x34 memory area, and the exact camera you are looking for is camera # 0xABCD. This makes type checking and memory management much easier on the system and on the network.
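A minimal sketch of that tag layout, in Python for brevity. The 8/8/16 bit split follows the 0x1234ABCD example; the names `make_tag` and `split_tag` are made up for illustration.

```python
# Assumed layout, taken from the 0x1234ABCD example:
#   bits 31..24  owning computer
#   bits 23..16  type bucket (e.g. "all cameras")
#   bits 15..0   index within that bucket

def make_tag(computer, type_bucket, index):
    """Pack the three fields into one 32-bit tag."""
    if not (0 <= computer <= 0xFF and 0 <= type_bucket <= 0xFF and 0 <= index <= 0xFFFF):
        raise ValueError("field out of range for the 8/8/16 layout")
    return (computer << 24) | (type_bucket << 16) | index

def split_tag(tag):
    """Unpack a 32-bit tag into (computer, type_bucket, index)."""
    return (tag >> 24) & 0xFF, (tag >> 16) & 0xFF, tag & 0xFFFF

computer, type_bucket, index = split_tag(0x1234ABCD)
print(hex(computer), hex(type_bucket), hex(index))  # 0x12 0x34 0xabcd
```

Because the owner is in the top byte, a "Lock" call can tell from the tag alone whether the object is local or has to be fetched from another machine.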

Scene data should be organized as a DAG whereby an element can reference other sub-elements by tag. You could have special "group" data tag elements that contain a variable list of tag children and an "object" tag element. This way you can have hierarchical data trees.

b) You need a task queue manager. A "task" would be a description of a rectangular area of the resulting image that a thread is responsible for computing.

You can use the tag-based system described above to describe task elements and their relationships to other tag-based tasks. When the system starts, all system and network threads are waiting for the task queue manager to release tasks. The main thread loads all the rendering tasks into the tag-based memory and works out the relationships between the tasks. Then it sends the signal to start processing, at which point all threads in the system start pulling tasks one at a time.

Tasks can be split into sub-tasks and put back in the queue; for example, you can split a large Bezier primitive into ever smaller Bezier patches until a set criterion is reached, whereby the Bezier patch is replaced by two triangles forming a rectangle. The task manager should be smart enough not to release a task for processing if the tasks it depends on have not been resolved. The task queue thus has to do a topological sort each time a task is added or removed, and this operation must be atomic (aka thread-locked).
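A rough sketch of such a queue, in Python for brevity. Note that it swaps in a simpler technique than the post describes: instead of re-running a topological sort on every change, it keeps a count of unresolved dependencies per task and releases a task when the count reaches zero. The class and method names are hypothetical.

```python
import threading
from collections import deque

class TaskQueue:
    """Releases a task only after every task it depends on has completed."""

    def __init__(self):
        self._lock = threading.Lock()  # makes each operation atomic
        self._pending = {}             # task id -> number of unresolved dependencies
        self._dependents = {}          # task id -> ids of tasks waiting on it
        self._ready = deque()          # tasks that may be handed to a worker
        self._done = set()

    def add(self, task_id, depends_on=()):
        with self._lock:
            unresolved = [d for d in depends_on if d not in self._done]
            for dep in unresolved:
                self._dependents.setdefault(dep, []).append(task_id)
            if unresolved:
                self._pending[task_id] = len(unresolved)
            else:
                self._ready.append(task_id)

    def pull(self):
        """Return the next runnable task id, or None if nothing is ready."""
        with self._lock:
            return self._ready.popleft() if self._ready else None

    def complete(self, task_id):
        """Mark a task finished and release any task that was waiting only on it."""
        with self._lock:
            self._done.add(task_id)
            for waiter in self._dependents.pop(task_id, []):
                self._pending[waiter] -= 1
                if self._pending[waiter] == 0:
                    del self._pending[waiter]
                    self._ready.append(waiter)
```

A worker that splits a Bezier patch would `add` the sub-patches, then `add` the parent again with `depends_on` set to those sub-patch ids, so the parent is not released until the pieces are resolved.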

c) You need a way to exchange data with the other computers on the net that share the distributed database, and a way to represent a remote computer. You can add a "network thread" for each computer you are connected to. This thread's job is to exchange data back and forth with its assigned computer and to act as a worker thread on the local system for local resources. For example, if a remote computer "Locks" a tagged data block, this should translate into a TCP exchange with its local thread representative, and this local thread then has to secure that memory on the local computer by performing a local "Lock / Read / Unlock" operation on behalf of the remote computer.

d) You need a way to reference outside resources that would be available across the net. For example, it doesn't make sense to transmit picture data back and forth if there is a central location for all images used for rendering.

I suggest you use a sed-like naming script system whereby each computer on the net preloads some pre-defined tokens specific to its platform and relative location on the net. Some of the tokens could be based on a small file that is available locally or on an environment variable. For example, let's say the token "Project" is associated with the string "Raytracer/version 3" and that each computer has defined its "IMG_LOCATION" environment variable to point to the network path where the image library exists. Then the following filename:


Guest Anonymous Poster

e) You could investigate using a scanline rasterizer for primary rays. For each pixel on the screen you can quickly determine the list of surfaces intercepted and shade them in reverse depth order. An obvious optimization would be to search for the first opaque intersected object fragment in the list and start your shading from there instead of from the very end of the fragment list. Shading the fragments becomes a set of "tasks" whereby the top task is the final color computation and all the other fragments underneath are dependent tasks to be resolved first. This helps the "lazy evaluation" paradigm you were looking into.

f) You can assign a priority to the rendering tasks so that the buckets in the center of the screen are rendered first. It's easier to spot a mistake if what you are raytracing is in the middle of the screen and the raytracer DOESN'T start from the upper left corner.
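One simple way to get that ordering, sketched in Python for brevity, is to sort the buckets by the squared distance from their center to the image center. The function name and the `(x, y, width, height)` tuple layout are assumptions for illustration.

```python
def buckets_center_first(width, height, bucket):
    """Return (x, y, w, h) buckets covering the image, nearest the center first."""
    cx, cy = width / 2.0, height / 2.0
    tiles = []
    for y in range(0, height, bucket):
        for x in range(0, width, bucket):
            w = min(bucket, width - x)   # edge buckets may be narrower
            h = min(bucket, height - y)  # or shorter
            dist2 = (x + w / 2.0 - cx) ** 2 + (y + h / 2.0 - cy) ** 2
            tiles.append((dist2, (x, y, w, h)))
    tiles.sort(key=lambda t: t[0])       # stable: ties keep scan order
    return [rect for _, rect in tiles]

order = buckets_center_first(96, 96, 32)
print(order[0])  # (32, 32, 32, 32), the middle bucket of the 3x3 grid
```

The distance could be used directly as the task priority if the queue supports one, rather than pre-sorting the list.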

g) you should implement a "kill" function in the task queue manager so that the queue can stop distributing tasks and ask running threads to abandon their current work and revert to the "wait" state.

h) An interesting optimization for the task queue manager would be to assign redundant tasks to network threads that would otherwise be in a wait state. This happens when the last tasks are distributed as the rendering process is about to end; the last thing you want is for the slowest computer to hold up the entire process. This optimization allows the best available computer to "win" and kill all the other identical tasks being run.


I would love to use that. The problem is that I'm talking about scenes with billions of objects, with billions of polygons in each object, not even counting the number of rays cast. I'm not even using a standard color model: I'm using a frequency chart for each ray (so you get effects like polarization), plus subsurface scattering (think of the milky quality of milk, where the scattering happens not at the intersection but just below it).

The reason I am doing depth-first tracing instead of breadth-first is that I would like the clients to download ONLY the information they need to check rays in their small area (they don't need the whole scene descriptor). So not only do they do the calculations, they also figure out what should and shouldn't be in the casting by asking for that 'section' of the octree space.

So let's see: distributed, high anti-aliasing levels (adaptive, of course), adaptive focal length, subsurface scattering, motion blur, and a 'true' light model (frequency calculations). Like I said, I'm not worried about how long it takes per 'frame'; I'm more worried about how realistic it is. The kinematic modeler is relatively simple in comparison.

I am trying to create a rather robust client/server work setup, as it may have to work for another project I have planned for later. So think of it like SETI (cool program): you decide you're going to let your computer do some work, you load the program up, it calls the server and asks for a work unit, you download your unit and start processing it. You finish it and upload the results: either a 'final' frequency spectrum for that ray, or a 'bundle' of reflective, refractive, subsurface, and shadow rays. The rays are tagged with the space they travel through, and all the ones in the same area are bundled into work units with tags indicating what space goes with that unit.

After the unit is uploaded, the server checks the size of the resulting work unit together with its objects. If it is over a certain size, it is broken up; if it is too small, it is combined with another unit that's tagged similarly. Repeat until all work units are clear.
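The server-side regrouping could be sketched roughly like this (Python for brevity). It is a simplification: size is measured as a ray count only, whereas the post also counts the objects a unit needs, and rays are merged only when their space tags are identical rather than merely similar. The function name and the `(space_tag, ray)` pair layout are assumptions.

```python
from collections import defaultdict

def rebundle(rays, max_rays):
    """Group returned rays by space tag, then cut each group into units.

    rays: iterable of (space_tag, ray) pairs uploaded by clients.
    Returns a list of (space_tag, [rays]) work units of at most max_rays each.
    """
    by_space = defaultdict(list)
    for space_tag, ray in rays:
        by_space[space_tag].append(ray)   # merging: same tag, same pool

    units = []
    for space_tag in sorted(by_space):
        group = by_space[space_tag]
        for start in range(0, len(group), max_rays):
            units.append((space_tag, group[start:start + max_rays]))  # splitting
    return units

returned = [("cell_a", "r1"), ("cell_b", "r2"), ("cell_a", "r3"), ("cell_a", "r4")]
for unit in rebundle(returned, max_rays=2):
    print(unit)
```

Pooling by tag first handles the "too small, combine it" case, and the chunking handles the "too big, break it up" case in the same pass.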

Once all units are finished, the frame is finished. Then update the physics by a specific time step (it will be a relatively small one). The big problem will be making sure I don't generate any time-aliased effects. Those are not good! Hehe.

That's the framework I am thinking of going with. Any speed-up suggestions on this?


Guest Anonymous Poster

Depending on how set you are on writing your own distributed computing system, I may be wasting your time, but I thought I should speak my mind. You could always use Pov-Ray combined with an OpenMosix Linux cluster to do your rendering. You could create a cluster very simply with something like ClusterKnoppix (if you don't have PXE-enabled NICs, there are simple ways around it). I believe Pov-Ray has many of the advanced features that you would like, and it could be very powerful when combined with OpenMosix. To tie your kinematics simulation in with Pov-Ray's rendering engine, you could write a program that rewrites the code based on which frame needs to be rendered, then renders with Pov-Ray. The clustering should be quite simple with OpenMosix.


Guest Anonymous Poster

Quote:

The reason I am doing depth-first tracing instead of breadth-first is that I would like the clients to download ONLY the information they need to check rays in their small area

It doesn't make any difference whether you do depth-first or breadth-first unless you assume a very low degree of scattering. What you describe as your typical scene isn't going to work well with algorithms that assume ray coherency (e.g. Arvo & Kirk's Ray Classification scheme). In a system with any sort of scattering, any pre-processing step needs access to the whole scene data; at best you could compute per-object bounding boxes, tessellate on demand, and reclaim the tessellation memory on an LRU basis.

Quote:

I am trying to create a rather robust client/server work setup, as it may have to work for another project

You can also look up MPI (Message Passing Interface) as a more standardized alternative to the tag-based system I described above.

Quote:

motion blur

Just curious as to how you plan to deal with this one. Adaptive tessellation doesn't work well with motion blur in general, as you cannot guarantee frame-to-frame topology coherency. The way commercial systems do this is by computing per-vertex motion vectors and shooting stochastically time-sampled rays onto the moving triangles. But that generates artifacts if the frame rate is below an object's deformation Nyquist limit (i.e. the triangle motion paths cannot be approximated well enough by a line between two frame samples). A way around this is to have access not only to all the geometrical data for the scene at that frame, but also to all the animations. The animation can then take a non-linear path and be more precise, but that would be a lot more costly. As you don't really care about computation costs, that's probably the way to go.
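For illustration, the per-vertex motion vector approach amounts to something like the following (Python for brevity). Each ray gets a random time within the frame, and the triangle is moved linearly along its motion vectors to that time before the intersection test. The function names are hypothetical, and the stratified (jittered) choice of sample times is an assumption; the post only says the times are stochastic.

```python
import random

def triangle_at_time(vertices, motion_vectors, t):
    """Linearly displace each vertex along its motion vector; t is in [0, 1)."""
    return [
        tuple(p + t * m for p, m in zip(vertex, motion))
        for vertex, motion in zip(vertices, motion_vectors)
    ]

def sample_shutter_times(n, rng):
    """One random time per equal slice of the frame interval (jittered sampling)."""
    return [(i + rng.random()) / n for i in range(n)]

tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
motion = [(2.0, 0.0, 0.0)] * 3            # whole triangle slides along +x
for t in sample_shutter_times(4, random.Random(0)):
    moved = triangle_at_time(tri, motion, t)
    # a real tracer would intersect the ray sampled at time t with `moved` here
```

The linear `p + t * m` step is exactly where the Nyquist problem comes from: any curvature in the true motion between two frames is lost.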


Depth first: gather all the work units that have similar object transfer requirements, and any computers with high bandwidth can use these 'large' units.

Breadth first: no way to effectively group work units; they scatter over the breadth of the image.

That's why I am saying depth first.

As to using povray: their license agreement says no using it in a framework unless I basically retain an interface to the core of povray (so if you use it, it has to be 'povray', only with more features, not a ray engine for your own use).

As to having access to the animation: the animation needs to run concurrently with the rendering, so I can't 'pregenerate' the animation information and then use it. The way I have this mapped out so far:

Database: stores the work units, the scene as it is for the current frame, the kinematics information for the current frame, and at least one backup of a frame and kinematics snapshot. Which database system is not too horribly important at this moment.

Client/server software: I am thinking of making this in C# (those of you who are about to freak out on this, I LIKE C#). Why? Ease of communication and interaction.

'Core' of the render engine and kinematics modeler: will be in C++, with liberal asm where needed. I will get to that once I get it up and running.

<holds head and moans> Sheesh, this thing is going to be huge >.< Oh well, it will be a cool project when I get it up and running, at least.