Following up on Daniel's comment, I'd like it if the documentation did more to at least warn you about things that may be bad ideas. For example, with MI, various sources say passive imputation is a bad idea. mi impute allows pweights, but again, some say you shouldn't use them. Users have to know what they're doing, but a little more guidance on whether something is a good or bad idea (perhaps even just one or two sentences with references to learn more) could help.

Comment

This may be more of a request for a FAQ than for changes in Stata 14. Stata has mi, svy, and xt. I get confused over how and how much I can mix and match these things. Rather than scanning through different manuals, it would be nice to have a single FAQ that showed, say, how I can use (or can't use) mi with xt, what mi commands can and cannot be combined with svy, using survey weights with xt data, etc. Just saying that it can't be done may actually be especially helpful, since otherwise you may go scouring the manuals and the web to figure out how it could be done.

Comment

A big thing, a substantial rewrite worthy of a major new version: column-based storage, or even more fundamentally different data structures. Curiously, this seems to me to be the bottleneck of Stata with N > 10^7.

A small thing: kill "save, replace." It is a disaster if it is invoked with an unset local, say, "save `neverset,maybetypo', replace"

An official (and as-fast-as-it-gets) implementation of -rdrobust- and -binscatter-. (Honorable mentions: -psacalc-, -estrat-, or maybe some machine learning tools, at least lasso, and a powerful cross-validation wrapper.)

Comment

Something wilder: support for git or even GitHub. Though this does not need to be more than a promotion of the site and service. But as many social scientists do their first or only programming in Stata, Stata should think about helping them develop best practices hardcore coders would adopt anyway.

Being able to mark up do-files with links to GitHub issues or pull requests, or Asana/Google tasks would be another amazing thing, though very unlikely.

Ideally, I would like to see something along the lines of R's graphics devices, where one could easily define picture size and resolution.

Nice code completion. In particular, I would like an RStudio-style list of options after typing the comma for a command. For instance, hitting Tab after typing "graph box," would open a list of available options.

A generic option to get variable labels instead of variable names whenever desired. This shouldn't be so much of a hassle. If people want opulent tables with long labels squeezed into table cells, they should be able to have them.

Interactive plotting, although I think something along those lines has appeared in the programmers' community (or maybe I'm wrong).

If I remember well, old versions of Stata used to have something like tutorials that one could run inside Stata. I think this idea got a second life with the advent of RMarkdown and the fashion for "reproducible research". It's a nice thing and definitely worth considering.

Comment

I endorse Laszlo's wish to eliminate -save, replace-. In my experience, local macro references are the most at-risk for typographical errors of any part of Stata syntax, because to reach the left-quote key, you have to take your fingers off of the home keys. They may return to the wrong place, and then you are likely to mistype the local macro name. If that typo does not constitute a defined macro, as it likely won't, you then clobber a data set inadvertently.
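A minimal sketch of that failure mode, for illustration only. The macro names are made up, the data are the auto dataset that ships with Stata, and everything is saved to temporary files so that nothing real is overwritten.

```stata
* Illustration only: macro names are hypothetical; tempfiles keep it harmless.
sysuse auto, clear
tempfile master outfile
save `master'               // the data in memory are now tied to `master'

keep if foreign             // work on a subset that should go to `outfile'

* Typo: `outfle' is undefined and expands to nothing, so this line runs as
* -save, replace- and overwrites `master' with the subset.
save `outfle', replace

* The full dataset is gone from `master': only the subset comes back.
use `master', clear
count
```

Nothing in the log flags the problem, because -save, replace- with no filename is a perfectly legal command.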

Or perhaps the solution is to find some other notation for dereferencing macros that doesn't require going to the far reaches of the keyboard. If global macros can be dereferenced with $, why can't local macros be dereferenced with @, or something like that?

Comment

Something just dawned on me. I have, in previous versions' wish lists, requested that dereferencing an undefined macro be made illegal (or at least that there be a setting to do that), rather than returning an empty string. A compelling objection to that is that it would break enormous amounts of existing code that rely on an undefined macro's evaluating to empty.

It might be possible to keep the existing `' system, and also introduce @-dereferencing, as suggested in my earlier post, with @-dereferencing of an undefined macro being illegal. That way those of us who would prefer safer macros could use the @ method, and those who prefer having undefined macros usable as empty strings could stick with `'. And the change wouldn't break a single program. Also, this method of dereferencing would reduce or eliminate all those unreadable (to the human eye) sequences of `"``...''"' that currently populate our programs.

Comment

It might be possible to keep the existing `' system, and also introduce @-dereferencing, as suggested in my earlier post, with @-dereferencing of an undefined macro being illegal. That way [...] the change wouldn't break a single program.

I know this is a wishlist rather than a discussion, but might I point out that this particular example would break lots of code, including official Stata's reshape and the entire sem suite, in which @ is a legal character with special meaning. I imagine it would be very hard, if not impossible, to find a single character for dereferencing that would not break at least some old code. After all, this character must not be used anywhere in an (a)do-file if it is to issue an error message whenever whatever follows is an undefined local macro. It might be possible to use characters other than single quotes to surround the local macro name. Another alternative would be a macval2(lmacname) that would issue an error message if lmacname is undefined.

Comment

Good point, Dan. I forgot about the use of the @ character in -reshape- and in -sem-, even though I use both of those regularly! I guess that wouldn't work. macval2(lmacname) would be reasonable, or maybe something a little shorter such as mval(). Or it might be possible to use something like @@ for local macro dereferencing, just as = and == are different. The gist of it, to me, is to provide an alternative way to dereference local macros that is easy to type and that rejects undefined macros.
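Until something like that exists, the rejection has to be spelled out by hand. A rough sketch using only built-in commands follows; the macro name outfile and its value are hypothetical. One caveat: Stata does not distinguish an undefined local from one set to the empty string, so a check like this also rejects macros that are deliberately empty.

```stata
* Sketch only: `outfile' is a hypothetical macro name.
local outfile "results.dta"

* One-line guard: -assert- stops the do-file with r(9) if the macro is empty.
assert `"`outfile'"' != ""

* Misspelled reference: `outfle' expands to nothing, so the guard trips.
* -capture- is used here only so the sketch keeps running.
capture assert `"`outfle'"' != ""
display _rc
```

This is a lot more typing than a dedicated dereferencing syntax would be, which is rather the point of the wish.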

Comment

If I could restart the world from scratch and have everything exactly the way I wanted it, we would not have `' . Or at least, we definitely would not have things like `"`'"' or whatever it is. Those things drive me crazy and if they get at all complicated I invariably have to try 5 times to get it right. Unfortunately I am not sure what the world would have instead.

Comment

I would like to see much better support for Full Information Maximum Likelihood (fiml). Some Stata routines, e.g. SEM, provide some support for fiml (which Stata calls mlmv). But, there are several limitations to Stata fiml support as it now stands.

* As Clyde Schechter points out in this thread, http://www.statalist.org/forums/foru...y-imputed-data, "Stata has -method(mlmv)- which is full information but relies on multivariate normality. MPlus has a full information estimator which is also robust to non-normality."

* As far as I can tell, fiml only works with linear models, e.g. you can't use it for logit.

* fiml could be useful in many more commands, e.g. it would be nice if regress had a fiml option (although maybe that would complicate postestimation commands)?

* Also, I understand that some other programs (e.g. MPLUS) let you specify auxiliary variables that help improve the handling of missing data.

I've read in several places that fiml is as good as, if not better than, multiple imputation for handling missing data. It is certainly easier to add fiml as an option than it is to impute a bunch of data sets. See, for example,

Comment

What I personally would love to have is a replace option for generate. It's a small thing in the grand scheme of things, but the lack of one keeps annoying me - especially when doing ad-hoc trials with .do-files and the like. Also, looping could be simpler in some cases.

Somewhat similar is the possibility to save empty data sets - yes, I know there's an add-on to do it and also easy ways to work around it, but being able to do so from scratch would make some things much more elegant.
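For what it's worth, -save- does document a programmer's option, emptyok, that appears to cover the completely empty case (zero observations and zero variables). A minimal sketch, using a temporary file so that nothing is left behind:

```stata
* Sketch: save a dataset with no variables and no observations.
clear
tempfile empty
save `empty', emptyok

* Check that the file was actually written.
confirm file `empty'
```

Whether this is elegant enough for the use case above is another matter, since the option is easy to overlook.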

Last but not least I would love some possibility to temporarily or permanently change the accuracy with which relational operators are evaluated. I recently did learn how to use the -float- function and all, but it seems somewhat tedious, and I keep forgetting and then wasting time finding my mistake... I would imagine there are countless cases where the current behaviour is both unwanted and unexpected by the user.

Comment

What I personally would love to have is a replace option for generate. It's a small thing in the grand scheme of things, but the lack of one keeps annoying me - especially when doing ad-hoc trials with .do-files and the like. Also, looping could be simpler in some cases.

In the meantime check out regen and from SSC and/or cmpute from SJ.
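Without installing anything, a common workaround with built-in commands is to drop the variable under -capture- before generating it. A small sketch, with price_k as a made-up variable name and the auto dataset as example data:

```stata
sysuse auto, clear

* Safe to rerun: -capture- swallows the error if price_k does not exist yet.
capture drop price_k
generate price_k = price / 1000

* Running the pair again simply recreates the variable.
capture drop price_k
generate price_k = price / 1000
```

The drawback is that -capture- hides every error from -drop-, not just the one for a missing variable.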

I would like to add an ascii() function (like the Mata equivalent). More general would be the possibility to write one's own Stata functions - although I think this has been ruled out for reasons I do not remember.

Best
Daniel

Comment

Last but not least I would love some possibility to temporarily or permanently change the accuracy with which relational operators are evaluated. I recently did learn how to use the -float- function and all, but it seems somewhat tedious, and I keep forgetting and then wasting time finding my mistake... I would imagine there are countless cases where the current behaviour is both unwanted and unexpected by the user.

You are presumably referring to ==, >, <, >=, <=, !=.

The problem you identify isn't (to me) at all clear. Being unexpected, unfortunately, means more often that the user doesn't understand Stata yet (and that happens all the time to very experienced users too).

If you want to take control and allow some fuzziness in comparisons, I would start with c(epsfloat) and c(epsdouble) as accessible constants. It's not clear precisely what you expect StataCorp to implement that isn't already under user control.
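A short sketch of what is already possible, using a one-observation dataset. The choice of c(epsfloat) as the tolerance in the last line is only an assumption for illustration; the right tolerance depends on the problem.

```stata
clear
set obs 1
generate float x = 0.1

* The stored float and the double-precision literal 0.1 differ slightly.
assert x != 0.1

* Exact comparison after rounding the literal to float precision.
assert x == float(0.1)

* Fuzzy comparison: relative difference below float machine precision.
assert reldif(x, 0.1) < c(epsfloat)
```

All three assertions pass, which shows both the surprise (the first one) and two ways around it.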