I think we would still have pars and include arguments to the fitting functions like we have done in RStan for a long time. But the implementation of that has always been sort of fake. It actually does store them (and computes the means) and then drops them before returning. There should be some common mechanism in the Stan library to accumulate the marginal means and variances of everything and for the interface to know to skip storing parameters that are not wanted.

For means and variances, it should be easy to call the corresponding accumulator right after write is called. We need to decide how to handle these accumulated means and variances when the sampler is done. I guess CmdStan would write them to a separate file, but PyStan and RStan should store them in the objects being returned.
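A minimal sketch of such an accumulator, using Welford's online algorithm (the class name and update hook are hypothetical, not existing Stan API):

```python
class RunningMoments:
    """Accumulate marginal means and variances without storing the draws,
    using Welford's online algorithm."""

    def __init__(self, num_params):
        self.n = 0
        self.mean = [0.0] * num_params
        self.m2 = [0.0] * num_params

    def update(self, draw):
        # Call this once per iteration, right after write() gets the flat draw.
        self.n += 1
        for i, x in enumerate(draw):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        # Sample variance; undefined for fewer than two draws.
        return [m2 / (self.n - 1) for m2 in self.m2]
```

CmdStan could dump `acc.mean` and `acc.variance()` to its separate file when sampling finishes, while RStan and PyStan would attach them to the returned fit object.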

It seems like skipping the storing of some things has to be up to the interfaces. More generally, it should be easy-ish for the interfaces to take in the long vector of parameters and store parts of it in different holders and / or omit parts of it.

I'd mildly prefer that stan::services handle the exclusion of parameters.

As for permuted=TRUE, I'm not sure we reached a resolution. Maybe we
need some user testing?

For now, internally, I'm going to have the fit object store the chains
as a num_chains x num_samples x num_flat_params array (column-major in
memory), since that's the precedent set by stan::mcmc::chains. As Ben
has mentioned, it's easy to give users a view into this array with a
different shape.
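For instance, with numpy (purely illustrative, and using row-major order for simplicity), collapsing the chain axis is just a reshape that views the same memory:

```python
import numpy as np

num_chains, num_samples, num_flat_params = 4, 1000, 3
draws = np.zeros((num_chains, num_samples, num_flat_params))

# Collapse chains into a single draws axis: (chains * samples, params).
flat = draws.reshape(num_chains * num_samples, num_flat_params)

# For a C-contiguous array this is a view, not a copy:
assert flat.base is draws
```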

Umm, don't strip them, but perhaps don't keep them in the same structure with the same methods. The sampler diagnostics aren't draws, and draws aren't sampler diagnostics. We typically want quantiles, mean, sample variance, R-hat, estimated effective sample size, and E-BFMI over the draws. We don't want all of that over sampler parameters.
This discussion overlaps with the discussion on breaking apart the writers.
If this were done in an object-oriented style, then the return type of something like fit$eta would be a draws class with methods like r-hat, quantile, etc., whereas something like fit$tree_depth__ wouldn't have them. I'm not suggesting these things be given separate classes in R or Python... I'm just trying to illustrate that they are different things.
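To illustrate with a rough Python sketch (the class names and this basic split-R-hat implementation are mine, not an existing API):

```python
import numpy as np

class Draws:
    """Posterior draws: summaries like quantiles and split R-hat make sense."""

    def __init__(self, values):  # shape: (chains, iterations)
        self.values = np.asarray(values, dtype=float)

    def quantile(self, q):
        return np.quantile(self.values, q)

    def split_rhat(self):
        # Split each chain in half, then compare within- and between-half variances.
        half = self.values.shape[1] // 2
        splits = np.vstack([self.values[:, :half], self.values[:, half:2 * half]])
        w = splits.var(axis=1, ddof=1).mean()       # within-chain variance
        b = half * splits.mean(axis=1).var(ddof=1)  # between-chain variance
        var_plus = (half - 1) / half * w + b / half
        return np.sqrt(var_plus / w)

class SamplerDiagnostics:
    """Per-iteration sampler information: stored, but no posterior summaries."""

    def __init__(self, values):
        self.values = np.asarray(values)
```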

This is completely incorrect. We have iterations which are sequential states of a Markov chain. The states themselves are elements of the sample space (and then dependent values including constrained parameters, transformed parameters, and generated quantities). Then we have the sequential states of the sampler itself which can be considered as an extended sample space.

We very much want to compute expectations of the sampler states just as we would with the parameter states. That's where we get average accept probabilities, and even E-BFMI. Even Rhat is valid as the sampler behavior should be the same for all chains when the samplers are performing well.

And this is before considering plotting applications where we need to be able to correlate the sampler parameters with the sample space parameters.
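E-BFMI, for example, is itself an expectation over the sampler's energy__ trace: the mean squared change in energy between successive iterations, divided by the sample variance of the energy. A sketch, assuming one chain's energy trace (the function name is mine):

```python
import numpy as np

def e_bfmi(energy):
    """E-BFMI for one chain's energy__ trace. Low values (below roughly 0.3)
    suggest momentum resampling is exploring the energy distribution poorly.
    Note: np.var here is the population variance; conventions differ on ddof."""
    energy = np.asarray(energy, dtype=float)
    return np.square(np.diff(energy)).mean() / energy.var()
```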

#' @slot constrained A logical scalar indicating whether an unknown is in
#' its constrained state
#'
Every quantity that is saved can be a diagnostic or come from one of the blocks in the Stan language. So we would store energy, for example, much like any other real scalar, except with type = diagnostic. Then you could do fit$energy, fit$samples("energy"), fit$diagnostics(), ...
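A toy sketch of that tagging scheme (the class, method names, and tag strings are hypothetical):

```python
class Fit:
    """Every saved quantity is stored the same way, tagged with where it came
    from: 'parameter', 'transformed', 'generated', or 'diagnostic'."""

    def __init__(self):
        self._quantities = {}  # name -> (type_tag, list of draws)

    def add(self, name, type_tag, draws):
        self._quantities[name] = (type_tag, draws)

    def samples(self, name):
        # Uniform access regardless of type tag.
        return self._quantities[name][1]

    def diagnostics(self):
        # Filter down to just the diagnostic quantities.
        return {name: draws for name, (tag, draws) in self._quantities.items()
                if tag == "diagnostic"}
```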

I would be okay with this, but the default has to include everything. We need to facilitate users acknowledging and playing around with the diagnostics so that they better understand both when they can trust their fits and how to go about debugging bad fits.

This is completely incorrect. We have iterations which are sequential states of a Markov chain. The states themselves are elements of the sample space (and then dependent values including constrained parameters, transformed parameters, and generated quantities). Then we have the sequential states of the sampler itself which can be considered as an extended sample space.

We don't just have elements of the sample space or the extended sample space. There are things like divergent__ and stepsize__, and treedepth__ (which is a function of n_leapfrog__). But let's just focus on the technical aspects of the design.

Here's what we have per iteration:

iteration number

unconstrained parameters

sampling parameters that are part of the extended sample space

additional information about the iteration from the sampling algorithm (e.g. divergent__)

constrained parameters

transformed parameters

generated quantities

There's no reason all of these have to be jammed into the same structure. I know it's tempting, but let's move away from building god objects. We're currently jamming the sampling parameters, additional information about the iteration, and the constrained parameters + transformed parameters + generated quantities into one std::vector<double> when some of these things actually live in separate structures within the code base and should just be written out separately. For the things that aren't, maybe we can also have those be modular. We're currently concatenating std::vectors and we're also promoting things like tree_depth__ to double.

By making these modular, we can also add more information in the future. We will also have a better way to deal with new algorithms. If anyone has an argument against this sort of modularity, I'd be interested in hearing it.
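As a sketch of the modular alternative, in Python rather than the actual C++ (all names here are made up): one writer per group, with each group keeping its native types.

```python
from typing import Callable, Dict, Sequence

def write_iteration(
    constrained_values: Sequence[float],
    sampler_state: Dict[str, object],  # e.g. {"treedepth__": 3, "divergent__": False}
    write_draws: Callable[[Sequence[float]], None],
    write_sampler: Callable[[Dict[str, object]], None],
) -> None:
    """Hand each group of per-iteration output to its own writer instead of
    concatenating everything into one vector of doubles. Note treedepth__
    stays an int here rather than being promoted to double."""
    write_sampler(sampler_state)
    write_draws(constrained_values)
```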

Well, I disagree COMPLETELY. This is not some game of pulling a fun pattern out of a textbook, talking about how fun modularity is, and hating on god objects.

A complete state of the sampler CANNOT be reconstructed without all of this information, which is why the underlying C++ spits it all out TOGETHER. This naive goal of splitting everything up is why debugging poorly fitting models in RStan and PyStan is a complete and utter mess right now. A reasonably robust analysis requires a ridiculous amount of bespoke code to get at the right information, and only the most dedicated users will end up doing their analyses correctly. This alone should be enough to motivate why splitting up the information is a terrible design decision.

The number one priority should be providing users with the information they need, and so long as HMC is the default algorithm in Stan then all of this information is going to be necessary.

Do we have one index (for the draw) or two indexes (for the draw and the chain)?

Do we keep the underlying shapes of the Stan object in the output? That is, if I have declared a matrix type in Stan, do I get that 2D structure back in the results?

Do we throw the diagnostics together with the parameters in a combined structure?

For (1): I don't ever need the chain ID. But all the internals, like ShinyStan, are going to need it, and I'd rather have it be explicit. Can we have a version that collapses and a version that doesn't? One thing I don't like about the current extract() is that the output shape changes based on whether permuted=TRUE or permuted=FALSE.

For (2): I do like that the output of extract has the structure.

For (3): I found it easier back when it was all combined. The whole "god object" thing is about combining heterogeneous structures into a global blackboard-like object, but that's not what we're talking about here. We're talking about mapping names to values on an iteration-by-iteration basis. These names have many flavors:

parameters (continuous)

transformed parameters (continuous)

generated quantities (discrete and continuous)

diagnostics (boolean, discrete, and continuous)

Right now, if there's a categorical discrete output in generated quantities, it makes no sense at all to compute its mean (well, maybe if it's ordinal).

If we do wind up separating these things somehow in the output, then it'd be nice to have exactly the same interface to pull the diagnostics out as the parameters, transformed parameters, and generated quantities.

So, if the output is stored as the Stan dimensions x iterations x chains, that is the same as storing it as the Stan dimensions x draws. Users will tend to prefer iterations x chains collapsed into draws, and convergence-diagnostic functions can easily chop draws into iterations x chains without copying large objects.
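Concretely, with numpy in row-major order for illustration (R's column-major arrays behave analogously with the axes flipped):

```python
import numpy as np

# A parameter declared as matrix[2, 3] in Stan, with 1000 iterations and 4
# chains, stored as Stan dimensions x iterations x chains:
theta = np.zeros((2, 3, 1000, 4))

# Collapsing iterations x chains into a single draws axis is a reshape,
# which for this contiguous array is a view, not a copy:
as_draws = theta.reshape(2, 3, 4000)
assert as_draws.base is theta

# Diagnostics code can chop draws back into iterations x chains just as cheaply.
per_chain = as_draws.reshape(2, 3, 1000, 4)
assert per_chain.shape == theta.shape
```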

Re 2): I agree that it should be Stan dimensions x something, and I think only Allen has voiced disagreement, on the grounds that stan::mcmc::chains.hpp has expected things to be chains x iterations x unknowns. But stan::mcmc::chains.hpp is not user-facing and could be changed anyway. I think if an R user wanted something with the unknowns flattened, they would usually be doing as.matrix or as.data.frame, which almost have to be draws x unknowns. Arguments could be made various ways as to what as.array should do. Leading with the Stan dimensions makes it easy to append more chains at the end.

Re 3): To me, this is a CmdStan problem that has metastasized into a flamewar. For RStan and PyStan, only one object is going to be returned when someone does foo <- fit$sample(), but foo is going to have a field for each piece of information that comes back from the C++. So, for RStan and PyStan, it is just a question of how to access it and what gets shown by default in the summary (my opinion is that there should be warnings like there are now when there are divergences, transitions that hit max_treedepth__, or chains with low BFMI, but the user doesn't need to see the mean, mcmc_se, sd, quantiles, etc. for things like n_leapfrog__). For CmdStan, it is an actual issue whether to write things to one file or several. But even for StataStan / JuliaStan / MatlabStan / MathematicaStan, which wrap CmdStan, they can be programmed to read from one file or many pretty easily.

For @Bob_Carpenter's (1), the underlying library should track two indexes once it's running multiple chains, which may be possible with C++11. Until then, I think it makes sense for the chain index to be tracked in the interfaces.

For @bgoodri's (1), I missed what the user preference was, even though you wrote it out as "Users will tend to prefer iterations x chains collapsed into draws." In your example, which one will R users prefer?

For (2), it sounds like we're in agreement. We absolutely have the freedom to reimplement stan::mcmc::chains.hpp -- I think I hacked that together in a weekend early on and I didn't really expect it to last this long.

For (3), first off, sorry that it's become a flamewar. There's no reason for it.

I agree with @bgoodri. In the underlying library, it should be easy to separate things out the way Bob suggested, with the addition of the extra sampler parameters. Right now, we're concatenating std::vector<double> objects in order to write these things out. I'm just suggesting we don't, and write them separately instead. See mcmc_writer.hpp.

@Bob_Carpenter, in CmdStan, there are actually two files -- the output file and diagnostic.csv. The second one defaults to not being written, but when it is, it contains the sampler parameters and the unconstrained values.

And, as @bgoodri said, even though the output may be written separately, it'll be kept together in RStan and PyStan. In CmdStan, it'll probably still be kept in one output file even if there are separate writers for it.

No, I did not miss anything. You completely ignored what I am saying which is evident, for example, by your using the phrase "extended sampling space" as some mysterious new space that doesn't include sampler parameters such as divergences.

Modularity implies a decomposition into independent units, and this is fundamentally not what we have in MCMC. To be formal, we have two base sources of information at each iteration: the current state of the Markov chain and the corresponding state of the sampler itself.

These can be presented modularly to the user, or implemented modularly in the underlying code, only when the dependencies can be ignored, and this is very much not the case. Firstly, as I noted above but everyone seems to be ignoring, expectations of sampler parameters are useful in diagnosing problems. Secondly, correlations between unconstrained and constrained parameters and generated quantities are critical in diagnosing problems with poor fits.

Importantly this is not a theoretical argument. Putting together case studies and exercises for workshops is a mess because the current RStan and PyStan APIs treat the sampler information and parameters as independent, splitting the information up and making it a massive pain to reconstitute and institute a robust workflow. I can't imagine that anyone using those interfaces is running HMC robustly at the moment.

To comment on something @bgoodri mentioned in a later post, canned warnings are insufficient. Not only are the warnings hard to control (they behave differently in the R app, RStudio, knitr, etc) they cannot be recovered after the fact and they censor information. More utility functions provided to the user can compensate for the former, but any such implementation will have to tackle reconstituting the information which again argues that the information is too tightly coupled to be "modular". The latter requires that the coupled information be readily available to the user which, again, demonstrates the coupling of the information.

And as everything is currently implemented there are no problems with adding additional information, either in terms of new sampler diagnostics or new parameters. The code readily handles the different sampler diagnostics in dynamic HMC and static HMC without any custom code. Everything is fused together and passed along with the correct coupling.

Putting together case studies and exercises for workshops is a mess because the current RStan and PyStan APIs treat the sampler information and parameters as independent, splitting the information up and making it a massive pain to reconstitute and institute a robust workflow. I can't imagine that anyone using those interfaces is running HMC robustly at the moment.

I completely agree. I think for RStan/PyStan we should/will have

fit$sample() # or whatever the final name is

return the draws for everything (i.e., not separating out sampler diagnostics into a separate object with a separate access method).

Having to call get_sampler_params and work with the object it returns certainly discourages all but the most advanced users from interacting with the sampler information at all. Returning everything in the same object will also make some of the shinystan and bayesplot code simpler, as they currently treat the parameter draws and sampler diagnostic draws separately in the code as a result of RStan.