Hi, I have a Perl script that extracts N-N distances from a PDB file and writes them to a text file. Now I want to plot a histogram of their distribution, but I do not know how. Does anyone have any ideas to share? Here is my code:

Searching for "perl histogram" gives some helpful results. In particular, this result can easily be modified for your purposes. It can produce output like the following from your data (if you round your input to the nearest integer):
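For readers without the linked result, here is a minimal sketch of the idea, not the linked code itself. The sample values and the name bin_counts are made up for illustration; the real distances would come from your txt file.

```perl
use strict;
use warnings;

# Hypothetical sample distances; in practice, read these from the txt file.
my @distances = (3.2, 4.7, 5.1, 5.4, 5.6, 7.9, 8.2);

# Count how many values fall into each integer bin (rounded to nearest).
# The int($_ + 0.5) trick is only valid for non-negative values,
# which is always the case for distances.
sub bin_counts {
    my @values = @_;
    my %count;
    $count{ int($_ + 0.5) }++ for @values;
    return %count;
}

my %count = bin_counts(@distances);

# Print one row per bin, from the smallest to the largest occupied bin,
# including empty bins in between so gaps stay visible.
my ($min, $max) = (sort { $a <=> $b } keys %count)[0, -1];
for my $bin ($min .. $max) {
    my $n = $count{$bin} // 0;
    printf "%3d | %s\n", $bin, '*' x $n;
}
```

Each row shows the bin value followed by one star per distance in that bin.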

Chris probably has the best solution for the actual histogram. I can make several comments on the code that you posted.

Your code could not have produced the output that you posted. I have no way of knowing which (if either) is correct. The nested loops would have produced one zero for every valid data point (the distance from the point to itself). You would also have had two of every other distance (the distance from point I1 to I2 is the same as the distance from I2 to I1). By the way, your code ignores the last data point.
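One common way to avoid both the zeros and the duplicates is to start the inner loop one past the outer index. This is only a sketch of that loop shape; unique_pairs is a name invented here, not something from your script.

```perl
use strict;
use warnings;

# Return every unordered pair of indices [i, j] with i < j for $n points.
# Starting j at i + 1 skips the self-pairs (distance zero) and visits
# each pair only once. The last point (index n - 1) is still reached as j.
sub unique_pairs {
    my ($n) = @_;
    my @pairs;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            push @pairs, [$i, $j];
        }
    }
    return @pairs;
}

my @pairs = unique_pairs(4);
print scalar(@pairs), " pairs\n";   # 4 points give 4*3/2 = 6 pairs
```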

Declaring most of your variables with file scope negates most of the advantage of using strict.

There is no reason to slurp the entire file. It should be processed line-by-line.

You should use lexical file handles and the three argument form of open. You should always verify the success of open. Close should be done as soon as possible.
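A sketch of what that could look like, reading line by line through a lexical handle. It assumes the usual fixed-column PDB layout (atom name in columns 13-16, x, y, z in columns 31-38, 39-46, 47-54) and assumes "N-N" means atoms named N; read_atoms and the file name are hypothetical, so adjust them to your data.

```perl
use strict;
use warnings;

# Read [x, y, z] for every ATOM record whose atom name matches $want.
# Assumption: fixed-column PDB layout, see the column numbers above.
sub read_atoms {
    my ($path, $want) = @_;

    # Three-argument open, lexical handle, success checked.
    open my $fh, '<', $path or die "Cannot open '$path': $!";

    my @points;
    while (my $line = <$fh>) {              # one line at a time, no slurping
        next unless $line =~ /^ATOM/;
        next unless length($line) >= 54;    # too short to hold coordinates
        my $name = substr $line, 12, 4;
        $name =~ s/\s+//g;
        next unless $name eq $want;
        push @points, [ map { 0 + substr($line, $_, 8) } 30, 38, 46 ];
    }
    close $fh;                              # close as soon as we are done
    return @points;
}

# Hypothetical usage:
# my @nitrogens = read_atoms('protein.pdb', 'N');
```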

The vector arithmetic should be in a subroutine, perhaps even a module.
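For example, the distance calculation could live in one small subroutine. This is a sketch that assumes each point is stored as an [x, y, z] array reference; the name distance is just illustrative.

```perl
use strict;
use warnings;

# Euclidean distance between two points given as [x, y, z] array refs.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_]) ** 2 for 0 .. 2;
    return sqrt $sum;
}

print distance([0, 0, 0], [3, 4, 0]), "\n";   # 3-4-5 triangle, prints 5
```

With this in place the main loop never has to touch individual x, y, z components.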

Most of us do not know what a pdb file is. How about giving us a specification, a non-trivial sample, or a link?

The use of CPAN modules can greatly improve your existing code. The following example uses the subset function to avoid the messy details necessary to get all the pairs of points and nothing else. Storing the points as vector objects completely eliminates the need for the individual components in the code. The dist method clearly computes the distance between the two vectors.

There are 3288 lines beginning with ATOM (in the sample pdb file from Univ of Pittsburgh). Calculating the distances between all pairs means 3288 choose 2, for a total of (3288*3287/2) = 5,403,828 distances, and you want to create a histogram from these 5 million? That is not as simple a task as the sample distances you provided (52 distances).

Well, after running a modified program I was able to create a histogram from the University sample. Each 'x' on the histogram is a quantity of 1500 points. The program and the histogram are attached.
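For anyone who cannot see the attachment, the scaling idea can be sketched roughly as follows. This is not the attached program: the sub names, the bin width and the tiny sample are assumptions for illustration. Counting each distance as it is computed means the 5 million values never have to be stored.

```perl
use strict;
use warnings;

# Add one distance to the histogram; bins are $width wide, indexed from 0.
sub add_to_histogram {
    my ($hist, $d, $width) = @_;
    $hist->[ int($d / $width) ]++;
}

# Render the histogram as text lines, one 'x' per $per_mark points
# (rounded up, so a non-empty bin always shows at least one 'x').
sub render_histogram {
    my ($hist, $width, $per_mark) = @_;
    my @lines;
    for my $bin (0 .. $#$hist) {
        my $n     = $hist->[$bin] // 0;
        my $marks = int(($n + $per_mark - 1) / $per_mark);
        push @lines, sprintf "%6.1f | %s (%d)", $bin * $width, 'x' x $marks, $n;
    }
    return @lines;
}

# Tiny made-up sample; with the real data you would call add_to_histogram
# inside the pair loop and use something like $per_mark = 1500.
my @hist;
add_to_histogram(\@hist, $_, 1) for 0.5, 1.2, 1.7, 1.9;
print "$_\n" for render_histogram(\@hist, 1, 2);
```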