Re: Re: st: question on distribution of values

By "unique" here, understand "distinct".
This may be one of those mid-Atlantic linguistic problems
separating English from American usage.

After all, the Unix utility uniq, invented in New Jersey, removes
duplicate lines, and leaves just one copy of each of the distinct lines
in a file. It does not identify lines that occur just once in the file.

Tim's suggestion is illegal in Stata, as only one -egen- function is
allowed on the RHS of an -egen- command.

It would not be correct if it were legal, as -egen, count()- does not
count distinct values. There is a function in -egenmore- from SSC that
does, but official Stata suffices here.
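
To see why -egen, count()- is not the answer, here is a small sketch on
made-up data (the values and the variable name n_obs are my own
illustration, not the poster's data). -count()- counts nonmissing
observations, so a zip that appears twice within an order is counted
twice:

```stata
* Hypothetical data: one order, five rows, but only four distinct zips
clear
input order qt zip
1 5 10001
1 5 10002
1 5 10003
1 5 10004
1 5 10001
end

* count() counts nonmissing values of zip within order, repeats included
egen n_obs = count(zip), by(order)

* five observations are counted, although only four zips are distinct
assert n_obs == 5
list
```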

First, tag each distinct co-occurrence of -order- and -zip-:

egen tag = tag(order zip)

Now sum the tags within -order-:

egen distinct = sum(tag), by(order)

OR, in Stata 9 or later, where -total()- is the documented name for
the same function,

egen distinct = total(tag), by(order)

Now you are home and dry:

gen average_pkg_per_zip = qt / distinct
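
Put together, the steps can be tried on a toy dataset. This is only a
sketch: order 1 follows the example in the original question (5 packages,
four zip codes), while order 2, the zip values and the repeated row are
invented, and I am assuming one observation per order-zip record with
-qt- repeated on each.

```stata
* Hypothetical data; only "order 1: 5 packages, 4 zips" comes from the question
clear
input order qt zip
1 5 10001
1 5 10002
1 5 10003
1 5 10004
1 5 10001
2 6 20001
2 6 20002
end

* tag() is 1 for exactly one observation of each distinct (order, zip) pair
egen tag = tag(order zip)

* the sum of the tags within order is the number of distinct zips
egen distinct = total(tag), by(order)

gen average_pkg_per_zip = qt / distinct

* order 1: 5 / 4 = 1.25; order 2: 6 / 2 = 3
assert distinct == 4 & average_pkg_per_zip == 1.25 if order == 1
assert distinct == 2 & average_pkg_per_zip == 3 if order == 2

list, sepby(order)
```

The repeated zip in order 1 is there on purpose: it gets tag 0, so it
does not inflate the count of distinct zips.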

It took me several years to realise that the -nvals()-
function in -egenmore- was pretty much redundant
given the -tag()- function of -egen- that I introduced
earlier (although I did not really invent it).

1. The quantity of packages (qt) listed does not correspond directly to
the zip code. For example, Order #1 requested 5 packages, to be
distributed among each of four zip codes, or 1.25 packages per unique
zip, not 5 packages per zip code.

2. I have yet to find the correct syntax that would allow me to create a
variable that would show the distribution of Qt among the zip codes.
I've played with egen, but can't get it to work.