Legend: P (= Probable) > Pl (= Plausible) > E (= Equivocal). For each child, I want to trace it back to Q (Query), but I need the shortest path that leads it to Q, along with the probabilities on that path. For example, for the input data shown above, the output should be:


__OUTPUT_1__
    M7:  M7<-Q = P
    M28: M28<-M6<-Q = Pl.E
    M6:  M6<-Q = Pl

But as we can see from the second row of the input data, M7 has another, longer path back to Q: M7<-M28<-M6<-Q = Pl.E.E. The code should therefore have an option either to ignore the longer paths and show only the shortest one, OR to show all of them, i.e.

This reminds me of another problem I solved a few months ago. It was a game with stones on a 4x4 board. Each stone had a white side and a black side, so there were 2^16 = 65536 possible initial positions. There were 46 authorized moves, each turning over 3 or 4 stones (a full row, a full column, a diagonal or an L pattern). The aim was to have all the stones on their white side. Any game can be solved in 5 moves, so each initial position can lead to 46^5 games (about 206 million); given the number of initial positions, that amounts to about 1.35 x 10^13 (roughly 13.5 thousand billion) possible games. A quick test showed that checking all the possible paths would take more than 225 days on my computer. But at the same time, there are only 65k positions: every move starts from one of them and leads to another one of them. So my idea was to start from the winning position (step 0), find the 46 positions that can lead to it (step 1) and record them in a hash as winning in one move, then examine all the positions leading to one of those 46 positions (step 2), removing from the problem any position already seen (already in the hash), and so on. Overall, I had to check only 2.34 million moves, which took about 1.5 seconds and 11 lines of code.
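A minimal sketch of that backward breadth-first search, written in Python for illustration only (the thread itself is Perl-oriented). The post does not list the game's 46 moves, so the hypothetical `MOVES` list below uses only the 4 rows, 4 columns and 2 main diagonals; with this reduced move set not every one of the 65536 positions is solvable, and the search simply finds the ones that are.

```python
from collections import deque


def cell(row, col):
    """Bit for one square; a board is a 16-bit int, bit set = black side up."""
    return 1 << (4 * row + col)


def build_moves():
    """Hypothetical move set: rows, columns and the two main diagonals.

    The real game had 46 moves (including L patterns) that are not listed.
    """
    moves = []
    for r in range(4):
        moves.append(sum(cell(r, c) for c in range(4)))
    for c in range(4):
        moves.append(sum(cell(r, c) for r in range(4)))
    moves.append(sum(cell(i, i) for i in range(4)))
    moves.append(sum(cell(i, 3 - i) for i in range(4)))
    return moves


MOVES = build_moves()


def solve_backwards(moves, goal=0):
    """Breadth-first search outward from the winning position (all white).

    Turning stones over is an XOR with a mask, so each move is its own
    inverse: the positions that reach `pos` in one move are pos ^ mask.
    """
    dist = {goal: 0}  # the hash: position -> number of moves needed to win
    frontier = deque([goal])
    while frontier:
        pos = frontier.popleft()
        for mask in moves:
            prev = pos ^ mask
            if prev not in dist:  # already seen means already reached by a path at least as short
                dist[prev] = dist[pos] + 1
                frontier.append(prev)
    return dist


dist = solve_backwards(MOVES)
print(len(dist), "solvable positions, at most", max(dist.values()), "moves")
```

Because each position enters the hash exactly once, the work is bounded by (number of positions) x (number of moves) rather than by the number of possible games.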

Your problem may look very different, but I am pretty much convinced that the very same approach can lead to an easy solution. If you think about the game mentioned above, my problem was really to find the shortest path between the winning position and any initial position, and that is very similar to your problem. My approach made it possible to eliminate, very early, any path that could be shown to be sub-optimal, and I am quite sure the same approach would work in your case.

So I would load your data into an appropriate data structure, start from Q, look for all its predecessors, mark those already found by storing them in a hash, then look for the predecessors of those predecessors unless they have already been visited, and so on, until all lines have been marked as solved.
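A sketch of that idea applied to this question, again in Python for illustration: a breadth-first search from Q in which a hash (`best`) records the first, and therefore shortest, path found to each node, so any later, longer path is skipped. The actual input data is not reproduced in this thread, so `EDGES` is a hypothetical (parent, child, label) list reconstructed from the examples quoted here, and `shortest_paths` and `format_path` are illustrative names.

```python
from collections import deque

# Hypothetical input reconstructed from the thread's examples:
# (parent, child, label), where "M7<-Q" means Q is the parent of M7.
EDGES = [
    ("Q", "M7", "P"),
    ("Q", "M6", "Pl"),
    ("M6", "M28", "E"),
    ("M28", "M7", "E"),
]


def shortest_paths(edges, root="Q"):
    """Breadth-first search from root; keeps only the first path to each node."""
    children = {}
    for parent, child, label in edges:
        children.setdefault(parent, []).append((child, label))
    # best[node] = (path from node back to root, labels from root outward)
    best = {root: ([root], [])}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        path, labels = best[node]
        for child, label in children.get(node, []):
            if child in best:  # already reached by a path at least as short
                continue
            best[child] = ([child] + path, labels + [label])
            queue.append(child)
    del best[root]
    return best


def format_path(node, path, labels):
    """Render one result in the thread's notation, e.g. 'M6: M6<-Q = Pl'."""
    return f"{node}: {'<-'.join(path)} = {'.'.join(labels)}"


for node, (path, labels) in shortest_paths(EDGES).items():
    print(format_path(node, path, labels))
```

With these edges it prints `M7: M7<-Q = P`, `M6: M6<-Q = Pl` and `M28: M28<-M6<-Q = Pl.E`, which matches the requested output (in a different order). "Shortest" here means fewest hops; when two paths have the same length, the one whose edges appear first in the input wins.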

It is quite late here now, but I am willing to give you the code tomorrow (hopefully, I will not forget) if you confirm that this is really what you are looking for.

It is again quite late, and since I am leaving on an early flight tomorrow morning, I can't do anything this evening. Actually, when I offered to provide some code, it was the weekend and I could find some free time; it is quite a bit more complicated during the week. I might be able to try something during my flight, but I am not at all sure that I'll have the energy.

Just one question on your example:


M7 : M7<-Q, P I_1 ... M7 : M7<-M28<-M6<-Q, Pl.E.E IV_34

I would have thought that, since you have a direct path from Q to M7 (first line), you would not want to keep the longer path on the last quoted line. This is apparently incorrect. Does that mean that you want to keep all possible paths?

In that case, my idea of pruning "useless" paths early might not be what you need.
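If all paths are wanted, pruning on first visit no longer applies, and a depth-first enumeration of the simple (cycle-free) paths is needed instead. A Python sketch under the same assumptions: `EDGES` is a hypothetical edge list reconstructed from the examples quoted in this thread, and the `shortest_only` flag is a hypothetical stand-in for the option the original question asks for.

```python
# Hypothetical input reconstructed from the thread's examples:
# (parent, child, label), where "M7<-Q" means Q is the parent of M7.
EDGES = [
    ("Q", "M7", "P"),
    ("Q", "M6", "Pl"),
    ("M6", "M28", "E"),
    ("M28", "M7", "E"),
]


def keep_shortest(paths):
    """Keep only the paths of minimal length (ties are all kept)."""
    n = min(len(p) for p, _ in paths)
    return [(p, l) for p, l in paths if len(p) == n]


def all_paths(edges, root="Q", shortest_only=False):
    """Depth-first enumeration of every cycle-free path from root to each node."""
    children = {}
    for parent, child, label in edges:
        children.setdefault(parent, []).append((child, label))
    found = {}  # node -> list of (path back to root, labels from root outward)

    def walk(node, path, labels):
        for child, label in children.get(node, []):
            if child in path:  # would close a cycle
                continue
            new_path, new_labels = [child] + path, labels + [label]
            found.setdefault(child, []).append((new_path, new_labels))
            walk(child, new_path, new_labels)

    walk(root, [root], [])
    if shortest_only:
        found = {node: keep_shortest(paths) for node, paths in found.items()}
    return found


for node, paths in all_paths(EDGES).items():
    for path, labels in paths:
        print(f"{node}: {'<-'.join(path)} = {'.'.join(labels)}")
```

With `shortest_only=False`, M7 gets both `M7<-Q = P` and `M7<-M28<-M6<-Q = Pl.E.E`. The number of paths can grow exponentially with the size of the graph, so when only the shortest paths are needed the breadth-first approach with a visited hash is much cheaper.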

One last point. I think that you asked the same question on Perl Monks. If you got a suitable answer there, it would be nice to let people here know, to avoid duplicated work. I am happy to try to help others, but it still requires some real work (not just a quick answer), so I would rather not do it if it turns out to be unnecessary.