Re: st: Detecting Outliers

Well, Ronnie, you presumed correctly. These are panel data, and I should
have made this absolutely clear. A graphical approach is indeed in order.
On 5/2/06, Ronnie Babigumira <rb.glists@gmail.com> wrote:

Raphael, I totally missed the time dimension. Nick has given it more thought and has offered a better answer. Please
ignore my "solution".
Ronnie
n j cox wrote:
> The short answer is Yes, many of them.
> A longer answer is more difficult to do well
> given so little information.
>
> We have just had a thread on an overlapping
> question. Look for "outliners" [sic] in
> the archives.
>
> You don't quite say so, but these sound like
> panel data. For concreteness, I guess 500
> patients and 10 observations on each, one
> for each year. My guesses have some
> influence on my suggestions.
>
> What is an outlier in this context? Presumably
> a patient who differs from many others; or
> an observation that differs from the rest
> of the patient's history. Both could make
> sense, e.g. in the case of anorexic/bulimic
> patients, or patients who had a really bad
> year, say a fight with cancer or being
> caught up in "Lost".
>
> First off, if a patient's height varies more than
> trivially over 10 years, either there is something
> going on, say growth for young people or some aging
> effect, or there is an error in the data.
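A minimal sketch of that within-patient height check, written in Python rather than Stata and not taken from the thread. It assumes the data arrive as hypothetical (patient, year, height, weight) tuples with height in centimetres, and flags any observation more than an assumed tolerance (2 cm here) away from that patient's own median height. The median is used so that a single bad value does not pull the reference point towards itself.

```python
from collections import defaultdict
from statistics import median


def flag_height_deviations(records, tol=2.0):
    """Return (patient, year, height, patient_median) for observations whose
    height differs from that patient's own median height by more than tol.

    records: iterable of (patient_id, year, height, weight) tuples (assumed layout).
    tol: assumed tolerance in the same units as height (cm here).
    """
    heights = defaultdict(list)
    for pid, _year, height, _weight in records:
        heights[pid].append(height)

    # Each patient's median height is the reference for "trivial" variation.
    medians = {pid: median(hs) for pid, hs in heights.items()}

    flagged = []
    for pid, year, height, _weight in records:
        if abs(height - medians[pid]) > tol:
            flagged.append((pid, year, height, medians[pid]))
    return flagged


# Hypothetical example data, not from the original post.
records = [
    (1, 2001, 170.0, 70.0),
    (1, 2002, 170.5, 71.0),
    (1, 2003, 107.0, 69.5),  # looks like 170 keyed as 107
    (2, 2001, 160.0, 55.0),
    (2, 2002, 160.2, 56.0),
    (2, 2003, 159.8, 80.0),
]
print(flag_height_deviations(records))  # prints [(1, 2003, 107.0, 170.0)]
```

A flagged observation is only a candidate: young patients who are still growing, or older patients losing height, will trip the same check, so each flag still calls for a look at that patient's series.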
>
> Weight fluctuations would seem rather different
> and everyone knows reasons for various kinds
> of weight change even in adulthood. It would
> seem a bit more difficult to pick up
> on errors (meaning mistakes).
>
> There are lots of things you can do. You
> could set up a loop to plot the time series
> for each patient. For 500 patients that would
> be a little tedious, but it is a direct
> approach.
>
> You could try reductions, e.g.
>
> last height - first height
> last weight - first weight
> mean height over period
> mean weight over period
> some measure of variability of each
>
> and look for outliers on pairwise plots
> of each. A scatterplot matrix often
> shows errors even in data that have
> supposedly been cleaned. Often
> the cleaning is univariate, but a
> weird data value can show up like
> a run in fabric.
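The reductions listed above can be sketched as follows, again in Python and under the same assumed (patient, year, height, weight) layout; `patient_summaries` is a hypothetical helper, not an existing ado-file. It turns each patient's series into one row of summary numbers, which is the shape the pairwise plots need.

```python
from collections import defaultdict
from statistics import mean, stdev


def patient_summaries(records):
    """Reduce each patient's series to a few numbers for pairwise plotting.

    records: iterable of (patient_id, year, height, weight) tuples (assumed layout).
    Returns {patient_id: dict of change, mean and SD for height and weight}.
    """
    by_patient = defaultdict(list)
    for pid, year, height, weight in records:
        by_patient[pid].append((year, height, weight))

    summaries = {}
    for pid, rows in by_patient.items():
        rows.sort()  # order by year so "first" and "last" are chronological
        heights = [h for _, h, _ in rows]
        weights = [w for _, _, w in rows]
        summaries[pid] = {
            "d_height": heights[-1] - heights[0],  # last height - first height
            "d_weight": weights[-1] - weights[0],  # last weight - first weight
            "mean_height": mean(heights),
            "mean_weight": mean(weights),
            # SD needs at least two observations; report 0.0 otherwise
            "sd_height": stdev(heights) if len(heights) > 1 else 0.0,
            "sd_weight": stdev(weights) if len(weights) > 1 else 0.0,
        }
    return summaries


# Hypothetical example data, not from the original post.
records = [
    (1, 2001, 170.0, 70.0),
    (1, 2002, 170.5, 71.0),
    (1, 2003, 107.0, 69.5),
    (2, 2001, 160.0, 55.0),
    (2, 2002, 160.2, 56.0),
    (2, 2003, 159.8, 80.0),
]
for pid, s in sorted(patient_summaries(records).items()):
    print(pid, {k: round(v, 1) for k, v in s.items()})
```

Each patient then becomes one point in a scatterplot matrix of these six variables, where a patient like number 1 (a large height change paired with a large height SD) stands apart. Within Stata itself, similar per-patient variables could be built with `collapse` (its `first`, `last`, `mean` and `sd` statistics, `by()` patient, after sorting by patient and year) and plotted with `graph matrix`.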
>
> My prejudice is that no testing or
> measuring approach beats graphics
> for finding outliers.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> Raphael Fraser
>
> I have 10 years of data (5000 observations) on patients' heights and
> weights. Is there any ado-file that could assist in locating possible
> outliers?
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>