What I'm working with:
I have a collection of prices with few or no repeated values (depending on the look-back period), i.e. nearly every price value is unique. Some prices are clustered, while others are spread far apart.

Because each price occurs only once, every price has an equal probability weight. This type of data produces a flat (pdf). I'm looking for a curvilinear (pdf) so that I can find levels of interest.

Question:
How do I construct a curvilinear (pdf) from data in which every value has the same frequency (a count of 1) and therefore the same probability?

Potential Solutions:

1) Some of the values are clustered, and it looks like they could be grouped to generate an aggregate frequency/count.
I like this idea, but what technique would I use?

2) I could use volume or ticks to weight the notional price value.
For my work, however, I'm not interested in the influence that volume- or tick-weighted distributions would have.

Recommendations of papers or other resources are greatly appreciated.

@vanguard2k

"First, I assume that your price data are all from the same asset but spread over a certain time range."
Correct: all prices are from one symbol, the S&P 500 futures (intraday prices).
"As a first step you could make a histogram of your data."
It's because of the 'lack' of shape in my histogram (it's flat, like a rug plot) that I'm looking for a technique to tease out a curvilinear (pdf).
Because similar price values rarely recur in my data set, every price has the same probability weight as every other price: $P(x_i) = 1/n$, where $n$ is the sample size.

"You could look into the topic of density estimation here."
I've spent the day reviewing your links, and kernel density estimation (KDE) looks promising, but I don't fully understand how to construct a KDE.

I've started a list of steps for plotting a KDE.
What steps are needed to implement kernel density estimation with real-world price data?
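A kernel density estimate replaces each price $x_i$ by a small smooth bump (the kernel $K$) of width $h$ and averages the bumps: $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$. Below is a minimal pure-Python sketch of those steps, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth (common defaults, not the only choices). The price list is made up for illustration, and treating the local maxima of the estimate as "levels of interest" is an assumption about how you would use it.

```python
import math
import statistics

def silverman_bandwidth(xs):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:          # degenerate IQR: fall back to the standard deviation
        spread = sd
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(xs, h):
    """Return f(x) = 1/(n*h) * sum_i K((x - x_i) / h) with a standard normal K."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)
    return f

def local_maxima(grid, values):
    """Grid points where the estimated density exceeds both neighbours."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Hypothetical intraday prices (made up): every value is unique.
prices = [1405.25, 1405.5, 1405.75, 1406.0, 1406.25,
          1412.0, 1412.25, 1412.5, 1412.75, 1420.0]

h = silverman_bandwidth(prices)
f = gaussian_kde(prices, h)

# Evaluate the density on a fine grid extending about 5 bandwidths past the data.
step = 0.05
lo, hi = min(prices) - 5 * h, max(prices) + 5 * h
grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
density = [f(x) for x in grid]

print(f"bandwidth h = {h:.3f}")
for level in local_maxima(grid, density):
    print(f"density peak near {level:.2f}")
```

The bandwidth matters much more than the kernel: a very small $h$ brings back the flat, rug-like picture, while a large $h$ merges nearby clusters into one peak. In R, the built-in `density()` function computes the same kind of estimate.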

Procedure?:

1. Determine which type of clustering method to apply to a financial time series (five categories: partitioning, hierarchical, density-based, grid-based, and model-based).

2. Apply the chosen clustering technique to partition the observations into groups.
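For step 2, one hierarchical option is single-linkage clustering cut at a distance threshold. In one dimension this reduces to sorting the prices and starting a new cluster wherever two neighbouring prices are more than `max_gap` apart. A minimal sketch, in which `gap_clusters`, `max_gap`, and the sample prices are all hypothetical:

```python
def gap_clusters(prices, max_gap):
    """Single-linkage clustering of 1-D values, cut at distance max_gap.

    In one dimension, single linkage with a distance threshold is the same as
    sorting the values and splitting wherever consecutive values are more than
    max_gap apart.
    """
    if not prices:
        return []
    ordered = sorted(prices)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])      # gap too large: start a new cluster
        else:
            clusters[-1].append(cur)    # close enough: extend the current cluster
    return clusters

# Hypothetical prices (made up for illustration).
prices = [1412.5, 1405.25, 1406.0, 1420.0, 1405.75, 1412.25, 1405.5, 1412.75]
for c in gap_clusters(prices, max_gap=1.0):
    print(f"{len(c)} prices from {c[0]} to {c[-1]}, centre {sum(c) / len(c):.2f}")
```

The cluster sizes play the role of the "aggregate frequency/count" mentioned in the question; the result depends heavily on the chosen `max_gap`.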

"If you don't have time series data but only price data and you want to cluster it (you are speaking of 'price level clusters'), you should look into the topic of unsupervised learning."
What is the difference between 'time series data' and 'price data'?


First, I assume that your price data are all from the same asset but spread over a certain time range.

If you are looking for the distribution of the price of this asset on the real axis, you have plenty of methods (several fields in mathematics and statistics deal with this topic).

As a first step you could make a histogram of your data. It will show the clusters you were talking about and give you a good impression of the distribution of the data.

Answer to question: There are many ways to estimate a density from your discrete dataset; look into the topic of density estimation. The free software R (www.r-project.org) has many packages that help you do this.

Generally, with time-dependent data (financial time series) you will soon notice other effects (see time series analysis). For example, the density changes over time (due to seasonality, among other things). On top of that, many (financial) time series appear to depend on their own past (see, for example, autocorrelation). Estimating a single density from the data is therefore often not advisable, because that density changes over time. Instead, one tries to model how the data depend on time, which is why one often speaks of the "conditional density at time $t$".
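One simple way to let an estimated density depend on time is to down-weight older observations. The sketch below is a recency-weighted Gaussian KDE, assuming geometric weights controlled by a hypothetical `decay` parameter and a fixed bandwidth `h`; it is a crude heuristic, not a full model of the conditional density at time $t$. The price path is made up.

```python
import math

def weighted_gaussian_kde(xs, h, decay):
    """Gaussian KDE where observation i (oldest first) gets weight decay**(n-1-i).

    With 0 < decay < 1, recent prices dominate the estimate; decay=1 gives the
    ordinary, equally weighted KDE.
    """
    n = len(xs)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    c = 1.0 / (h * math.sqrt(2 * math.pi))
    def f(x):
        return c * sum(w * math.exp(-0.5 * ((x - xi) / h) ** 2)
                       for w, xi in zip(weights, xs)) / total
    return f

# Hypothetical price path, oldest first: trading near 1405 earlier, near 1412 later.
path = [1405.0, 1405.5, 1404.75, 1405.25, 1412.0, 1412.5, 1411.75, 1412.25]
flat = weighted_gaussian_kde(path, h=1.0, decay=1.0)
recent = weighted_gaussian_kde(path, h=1.0, decay=0.5)
for level in (1405.0, 1412.0):
    print(f"{level}: unweighted {flat(level):.4f}, recency-weighted {recent(level):.4f}")
```

The unweighted estimate treats both price areas as equally important; the recency-weighted one shifts almost all of its mass to where trading happened most recently.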

As you see, there is a lot you can do here and this is just a small sample of the possible methods.

If you don't have time series data but only price data, and you want to cluster it (you are speaking of "price level clusters"), you should look into the topic of unsupervised learning. But please be aware that your results may change over time!
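As a concrete unsupervised-learning example, k-means (Lloyd's algorithm) partitions one-dimensional prices into `k` groups whose means can be read as price-level clusters. A minimal sketch, assuming `k` is chosen in advance and the initial centres are spread across the sorted data; the function names and prices are hypothetical:

```python
def assign(xs, centres):
    """Put each value into the group of its nearest centre."""
    groups = [[] for _ in centres]
    for x in xs:
        nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
        groups[nearest].append(x)
    return groups

def kmeans_1d(xs, k, iterations=100):
    """Lloyd's k-means on 1-D data; returns (centres, groups)."""
    ordered = sorted(xs)
    # Deterministic start: pick k values spread evenly through the sorted data.
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = assign(xs, centres)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if new == centres:
            break
        centres = new
    return centres, assign(xs, centres)

# Hypothetical prices (made up for illustration).
prices = [1412.5, 1405.25, 1406.0, 1420.0, 1405.75, 1412.25, 1405.5, 1412.75]
centres, groups = kmeans_1d(prices, k=3)
for c, g in zip(centres, groups):
    print(f"level {c:.3f}: {len(g)} prices")
```

Unlike a gap threshold, k-means needs the number of clusters up front; rerunning it on different time windows is one way to see how much the levels move over time.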

In general, all of the topics mentioned are widely used and interrelated.
I hope this answers your question, at least to some extent (and that I understood it correctly).

EDIT: Just some remarks on the comments you posted in your question. I hope I found all of them:

As far as histograms are concerned: the "art" of a good histogram depends partly on how you choose your intervals. If you use an interval length of, say, between 2 and 5 points of your futures contract, you will get a different picture, and you should be able to spot something that looks more like a density. For example, divide your price data into 5-point intervals and count how many of your prices fall into each interval. Then you can say that $5\%$ of the data lay between $1405$ and $1410$. I have to stress again that it would be very bold to conclude that there is a $5\%$ probability that future S&P futures prices will lie in this interval!
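The 5-point binning described above can be sketched as follows. `binned_shares`, its `origin` parameter (the left edge that the intervals are aligned to), and the prices are hypothetical; changing `width` or `origin` shows how much the picture depends on the choice of intervals.

```python
import math
from collections import Counter

def binned_shares(prices, width, origin=0.0):
    """Share of observations in each half-open interval [left, left + width)."""
    counts = Counter(origin + width * math.floor((p - origin) / width) for p in prices)
    n = len(prices)
    return {left: counts[left] / n for left in sorted(counts)}

# Hypothetical prices (made up for illustration).
prices = [1405.25, 1405.5, 1405.75, 1406.0, 1406.25,
          1412.0, 1412.25, 1412.5, 1412.75, 1420.0]
width = 5.0
for left, share in binned_shares(prices, width).items():
    print(f"[{left:g}, {left + width:g}): {share:.0%} of the data")
```

Shifting `origin` by half a bin (e.g. `origin=2.5`) splits the cluster around 1412 into two bins, which is the sensitivity to interval choice mentioned above.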

I am not sure how you should link the topics of clustering and density estimation here. For both topics you could definitely look into The Elements of Statistical Learning. It is a free book and is widely used for teaching and learning these topics (among others).

Answer to new question: The density estimate in your picture (or Fig. 6.13 of the book I mentioned) assigns a positive density to every value, including values that are not in the dataset. However, assigning positive density to every value on the real line is not a property of kernel density estimation in general but of the kernel used (here a Gaussian, which is positive everywhere).
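To see that the support of the kernel decides whether every value gets positive density, one can compare a Gaussian kernel with the compactly supported Epanechnikov kernel $K(u) = \tfrac{3}{4}(1-u^2)$ for $|u| \le 1$ (zero otherwise). A minimal sketch with hypothetical prices and an arbitrarily fixed bandwidth $h = 2$:

```python
import math

def gaussian(u):
    """Standard normal kernel: positive for every u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def epanechnikov(u):
    """Epanechnikov kernel: zero outside |u| <= 1 (compact support)."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kde(xs, h, kernel):
    """Kernel density estimate f(x) = 1/(n*h) * sum_i kernel((x - x_i) / h)."""
    n = len(xs)
    return lambda x: sum(kernel((x - xi) / h) for xi in xs) / (n * h)

# Hypothetical prices (made up for illustration).
prices = [1405.25, 1405.5, 1405.75, 1412.25, 1412.5]
for name, kernel in (("Gaussian", gaussian), ("Epanechnikov", epanechnikov)):
    f = kde(prices, h=2.0, kernel=kernel)
    print(f"{name}: f(1406.0) = {f(1406.0):.4f}, f(1430.0) = {f(1430.0):.3g}")
```

Both estimates give positive density at 1406.0, a price that is not in the list; only the Gaussian one stays (very slightly) positive at 1430.0, far from every observation.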

Difference between time series data and price data: In mathematics, a random sample consists of independent random variables with identical distributions. There is overwhelming evidence that the distribution of financial returns varies over time and that returns are not independent. Financial time series should therefore not be viewed as a random sample, because they are neither independent nor identically distributed. That is what I wanted to say here.
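A quick diagnostic for the independence assumption is the sample autocorrelation. The sketch below applies it to a simulated Gaussian random walk (an assumption standing in for real data, not S&P futures prices): the price levels are almost perfectly autocorrelated even though the increments are independent by construction, which is one reason a set of intraday price levels should not be treated as a random sample. On real returns, applying the same function to absolute returns is a common way to look for volatility clustering.

```python
import random
import statistics

def autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    mean = statistics.fmean(xs)
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(len(xs) - lag))
    return num / denom

# Simulated data (an assumption, not market data): Gaussian random walk.
rng = random.Random(42)
returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prices = [1400.0]
for r in returns:
    prices.append(prices[-1] + r)

print(f"lag-1 autocorrelation of price levels: {autocorrelation(prices):.3f}")
print(f"lag-1 autocorrelation of returns:      {autocorrelation(returns):.3f}")
```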

I have a comment that is longer than the allowed length. How do I post a constructive reply? FYI: your reply is about 3x longer than what I'm allowed to enter.
– montyhall, Nov 3 '12 at 6:42

@montyhall This is a Q&A site, not a discussion forum. In general, you shouldn't need to reply to answers.
– Alexey Kalmykov, Nov 3 '12 at 9:34

@AlexeyKalmykov I don't have enough credits to use the chat room to ask for help directly, and the FAQ does not address how a new member should respond. Someone voted that my question has an answer, but there is only a good suggestion (one that I'm working on, thanks!). How do I change it back to unanswered? I'm looking for a method to create a (pdf) using density estimation with real-world price data. Since the question has been answered, should I resubmit my problem with a different title, include the knowledge gained from vanguard2k, and take a slightly different angle with new questions?
– montyhall, Nov 3 '12 at 15:03