In principle, dictionary-based compression techniques work by building a
grammar from a body of text, such as a set of programs. One recent
compression technique that might suit your needs is Sequitur, by Craig
Nevill-Manning, which can be used for "inferring hierarchies from
sequences". See the website at http://dna.stanford.edu/sequitur/ for more
information, including details of the algorithm and various research papers
discussing it. There are also some C++ and Java code examples, so it
shouldn't be too difficult to use in your own code.
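
To give a feel for how a grammar can fall out of a token sequence, here is
a minimal Python sketch. Please note that it is not Sequitur itself: it is
a simpler offline scheme (in the spirit of Re-Pair) that keeps replacing
the most frequent adjacent pair of symbols with a new rule until no pair
occurs twice. The names NonTerminal, count_pairs, infer_grammar and expand
are my own and purely illustrative, and the input is assumed to be already
split into tokens.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NonTerminal:
    """A rule name; a separate type so it can never clash with a token."""
    index: int


def count_pairs(seq):
    """Count adjacent pairs, skipping overlapping repeats such as 'a a a'."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_pos.get(pair, -2) == i - 1:
            continue  # overlaps the occurrence counted one step earlier
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def infer_grammar(tokens):
    """Return (start_sequence, rules) where each rule maps to a symbol pair."""
    seq = list(tokens)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so nothing more to factor out
        name = NonTerminal(len(rules) + 1)
        rules[name] = pair
        # Rewrite the sequence left to right, replacing each occurrence.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the list of tokens it stands for."""
    if not isinstance(symbol, NonTerminal):
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    tokens = "if ( x ) y ; if ( x ) z ;".split()
    start, rules = infer_grammar(tokens)
    print(start)
    for name, body in rules.items():
        print(name, "->", body)
```

Bear in mind that a grammar built this way reproduces exactly the input it
was built from; it captures repeated structure but does not generalise to
programs it has not seen.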

Cheers

Derek

Rahul Jain wrote in message 99-02-025...
>I'm working on a project for which I need some information about some
>reverse engineering method that would help me extract the grammar from a
>set of programs (written in any language). A sufficient grammar will be
>the one which is able to parse all the programs.
> Now, the question is - Does there exist some formal theory for getting
>the grammar from a program. Any heuristic approaches would also solve the
>purpose.