<div><div>On Jun 20, 2012, at 11:06 AM, Brian Granger &lt;<a href="mailto:ellisonbg@gmail.com" target="_blank">ellisonbg@gmail.com</a>&gt; wrote:<br>
<br>
&gt; On Tue, Jun 19, 2012 at 7:49 PM, MinRK &lt;<a href="mailto:benjaminrk@gmail.com" target="_blank">benjaminrk@gmail.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; On Tue, Jun 19, 2012 at 7:25 PM, Brian Granger &lt;<a href="mailto:ellisonbg@gmail.com" target="_blank">ellisonbg@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; On Tue, Jun 19, 2012 at 5:01 PM, MinRK &lt;<a href="mailto:benjaminrk@gmail.com" target="_blank">benjaminrk@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger &lt;<a href="mailto:ellisonbg@gmail.com" target="_blank">ellisonbg@gmail.com</a>&gt;<br>
&gt;&gt;&gt;&gt; wrote:<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; When the metadata PR came up, I was originally going to vote -1 on it<br>
&gt;&gt;&gt;&gt;&gt; because of this issue. I sat on it for a while and in the end decided<br>
&gt;&gt;&gt;&gt;&gt; that it was OK because I think the need for metadata is already upon<br>
&gt;&gt;&gt;&gt;&gt; us even though we don&#39;t have an actual usage case in our own code base<br>
&gt;&gt;&gt;&gt;&gt; (for example, we don&#39;t have a metadata UI in the notebook web app).<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; There is a fine line to walk here. On one hand, I completely agree<br>
&gt;&gt;&gt;&gt;&gt; with you that we should try to future-proof the notebook format to<br>
&gt;&gt;&gt;&gt;&gt; minimize disruptive format changes. On the other hand, adding things<br>
&gt;&gt;&gt;&gt;&gt; too soon leads to even more potential disruption for the following<br>
&gt;&gt;&gt;&gt;&gt; reason. As I developed the notebook format and notebook UI last<br>
&gt;&gt;&gt;&gt;&gt; summer, there were multiple situations where I added something to the<br>
&gt;&gt;&gt;&gt;&gt; notebook format before I actually used it in the UI. In many of these<br>
&gt;&gt;&gt;&gt;&gt; cases, when I did get around to developing the UI for it, I realized<br>
&gt;&gt;&gt;&gt;&gt; that my original thoughts on that element were incomplete. It wasn&#39;t<br>
&gt;&gt;&gt;&gt;&gt; until I wrote the UI that used the data that I realized exactly what<br>
&gt;&gt;&gt;&gt;&gt; the format of that data needed to be. As a result, I had to go back<br>
&gt;&gt;&gt;&gt;&gt; and modify the notebook format. After a few iterations of this, I<br>
&gt;&gt;&gt;&gt;&gt; realized that this approach was broken and started to enforce the<br>
&gt;&gt;&gt;&gt;&gt; following simple rule on myself: don&#39;t add it to the notebook format<br>
&gt;&gt;&gt;&gt;&gt; until I am ready to write the UI code that uses it. That rule served<br>
&gt;&gt;&gt;&gt;&gt; me very well last summer.<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; This is why for example the notebook and cells do not currently have<br>
&gt;&gt;&gt;&gt;&gt; any timestamp information (even though I think we will eventually want<br>
&gt;&gt;&gt;&gt;&gt; it). The one notebook feature (which I regret adding to the format)<br>
&gt;&gt;&gt;&gt;&gt; that doesn&#39;t have a UI is the multiple worksheets. We absolutely want<br>
&gt;&gt;&gt;&gt;&gt; that as a feature, I just wish I had waited to add it to the notebook<br>
&gt;&gt;&gt;&gt;&gt; format. When we do implement the multiple worksheet UI, it is likely<br>
&gt;&gt;&gt;&gt;&gt; we will want to go back and make changes to the notebook format to<br>
&gt;&gt;&gt;&gt;&gt; better reflect the UI (for example, we will probably want to persist<br>
&gt;&gt;&gt;&gt;&gt; which worksheet is active/open).<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; I couldn&#39;t agree less. There is simply no reason that adding support<br>
&gt;&gt;&gt;&gt; for<br>
&gt;&gt;&gt;&gt; multiple worksheets in future versions of IPython should render<br>
&gt;&gt;&gt;&gt; single-sheet<br>
&gt;&gt;&gt;&gt; notebooks unreadable in 0.13, just like adding new metadata should not<br>
&gt;&gt;&gt;&gt; make<br>
&gt;&gt;&gt;&gt; the notebook artificially unreadable.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; I am not sure I am following you on this. Are you suggesting that<br>
&gt;&gt;&gt; 0.14 notebooks (let&#39;s say we bump to a v4 nbformat with expanded<br>
&gt;&gt;&gt; worksheet support) should be readable in 0.13?<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; I think I am saying the opposite - with the current state of 0.13, adding<br>
&gt;&gt; multi-worksheet support to the *javascript* should not result in<br>
&gt;&gt; incrementing the notebook version.<br>
&gt;<br>
&gt; With the current state of the notebook format, I think we can probably<br>
&gt; pull this off. So far, the only changes to the notebook format I can<br>
&gt; imagine will be minor version incrementing ones.<br>
&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; For the cell and worksheet metadata, I knew we would eventually need<br>
&gt;&gt;&gt;&gt;&gt; it and I didn&#39;t want to hold up the beta release any longer. But<br>
&gt;&gt;&gt;&gt;&gt; there are still unanswered questions related to it:<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; * What types of things go in the metadata?<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; * Is this an area for us to write data to, or for advanced users to<br>
&gt;&gt;&gt;&gt;&gt; write data to?<br>
&gt;&gt;&gt;&gt;&gt; * Is it entirely unstructured, or will we require a discussion for<br>
&gt;&gt;&gt;&gt;&gt; each new key/value entry into it?<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; It is not at all clear that the current metadata design will hold up<br>
&gt;&gt;&gt;&gt;&gt; to our answers to these questions. But in the end, I sort of wanted<br>
&gt;&gt;&gt;&gt;&gt; to add the metadata as it is now, so we could begin to see how we and<br>
&gt;&gt;&gt;&gt;&gt; others start to use it. But just because we added the metadata to the<br>
&gt;&gt;&gt;&gt;&gt; notebook format definitely doesn&#39;t mean that future-proofs this part<br>
&gt;&gt;&gt;&gt;&gt; of the notebook format.<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; Hope this clarifies things a bit.<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; Sure, while it is extremely clear that we need cell metadata, we cannot<br>
&gt;&gt;&gt;&gt; be<br>
&gt;&gt;&gt;&gt; 100% certain that<br>
&gt;&gt;&gt;&gt; a simple dict will solve 100% of the cases we encounter. But adding it<br>
&gt;&gt;&gt;&gt; now<br>
&gt;&gt;&gt;&gt; means that we have at least a *chance*<br>
&gt;&gt;&gt;&gt; of making a release that is not backwards-incompatible.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Yes, I agree with this.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; Back to the question of output-level metadata. When a bit of code<br>
&gt;&gt;&gt;&gt;&gt; remains unused for almost a year, I start to question whether we<br>
&gt;&gt;&gt;&gt;&gt; really need it. I am not convinced we don&#39;t need it; I am not sure. In<br>
&gt;&gt;&gt;&gt;&gt; light of this, I don&#39;t think that adding it to the notebook format<br>
&gt;&gt;&gt;&gt;&gt; makes sense. When one of us finds a good purpose for this metadata,<br>
&gt;&gt;&gt;&gt;&gt; let&#39;s add it to the nbformat then.<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; I believe the only current use is in the parallel display republishing,<br>
&gt;&gt;&gt;&gt; where the engine ID is added to the display data<br>
&gt;&gt;&gt;&gt; so that frontends could theoretically draw display data differently<br>
&gt;&gt;&gt;&gt; based on<br>
&gt;&gt;&gt;&gt; which engine it came from.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Yes, we have discussed this. The only other situation where I<br>
&gt;&gt;&gt; remember thinking about this is if we wanted to use metadata to help a<br>
&gt;&gt;&gt; frontend interpret JSON display data. There are numerous reasons code<br>
&gt;&gt;&gt; might display JSON data, and that code would have to help the frontend<br>
&gt;&gt;&gt; to know what to do with that data.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Do you think the engine ID idea makes sense to implement or should<br>
&gt;&gt;&gt; that information just be passed in the formatted display data itself?<br>
&gt;&gt;&gt; We could also handle this by creating a custom JS widget that knows how to<br>
&gt;&gt;&gt; intelligently display data from multiple engines.<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Right now I do both since the metadata is totally ignored, but I think it&#39;s<br>
&gt;&gt; better to have less markup in the output itself. It is precisely the same<br>
&gt;&gt; reason we don&#39;t embed the rendered prompt in the output of execute replies -<br>
&gt;&gt; frontends have their own way of rendering them (in the prompt column, etc.).<br>
&gt;&gt; The metadata could be used to do that for parallel results, rather than the<br>
&gt;&gt; current behavior of having fake prompts in the general output area.<br>
&gt;<br>
&gt; OK if you think we want to go this route for displaying the engine<br>
&gt; IDs, then we should i) keep the display data metadata in the message<br>
&gt; itself and ii) move towards persisting that information in the<br>
&gt; nbformat.<br>
&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; The other philosophical line of reasoning that I am being guided by<br>
&gt;&gt;&gt;&gt;&gt; here is simplicity. It would be very easy to over design the notebook<br>
&gt;&gt;&gt;&gt;&gt; format and add all sorts of features that we might need. I think this<br>
&gt;&gt;&gt;&gt;&gt; is the wrong direction to go. We want a notebook format that is as<br>
&gt;&gt;&gt;&gt;&gt; compact and minimal as possible, where each and every bit of data is<br>
&gt;&gt;&gt;&gt;&gt; there for a well-defined and justified reason.<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; I think it&#39;s simple: We have had ideas over and over and over again for<br>
&gt;&gt;&gt;&gt; features requiring metadata attached to cells (hashes, links,<br>
&gt;&gt;&gt;&gt; timestamps,<br>
&gt;&gt;&gt;&gt; etc.), so this is clearly a feature we have a need for right now.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Yes - maybe I wasn&#39;t completely clear. I do think that having cell<br>
&gt;&gt;&gt; and worksheet metadata right now does make sense.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; It would<br>
&gt;&gt;&gt;&gt; be totally silly for adding timestamps to require updating the nbformat<br>
&gt;&gt;&gt;&gt; in a<br>
&gt;&gt;&gt;&gt; backward-incompatible way.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; And I am definitely not suggesting that it would or should.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; And the biggest advantage of using json is that<br>
&gt;&gt;&gt;&gt; adding keys has no effect on backwards *readability*. It&#39;s only adding<br>
&gt;&gt;&gt;&gt; values/types that can cause problems, and should force new versions<br>
&gt;&gt;&gt;&gt; (e.g.<br>
&gt;&gt;&gt;&gt; changing worksheet to worksheets, or adding new cell types).<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Yes, JSON indeed turned out to be much nicer than XML for this type of<br>
&gt;&gt;&gt; thing exactly because of this.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; But I am wondering what your thoughts are about newer notebook versions<br>
&gt;&gt;&gt; being readable by older IPython versions. I have always thought that<br>
&gt;&gt;&gt; we would promise that older nbformats would *always* be readable by<br>
&gt;&gt;&gt; newer IPython versions, but that we would make no promises about newer<br>
&gt;&gt;&gt; nbformats being readable by older IPython versions. I just want to<br>
&gt;&gt;&gt; clarify what other people are thinking in this respect.<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Incrementing the nbformat means making notebooks unreadable in old versions,<br>
&gt;&gt; yes.<br>
&gt;&gt; This is very painful if we are doing it every six months. I am only trying<br>
&gt;&gt; to make<br>
&gt;&gt; reasonable efforts that the current nbformat is prepared for changes we<br>
&gt;&gt; *know* we intend to make soon,<br>
&gt;&gt; so that incrementing the nbformat is reserved for changes we don&#39;t already<br>
&gt;&gt; have planned, and aren&#39;t<br>
&gt;&gt; already prepared for.<br>
&gt;&gt; Obviously, if we have a change that we cannot fit into the current format,<br>
&gt;&gt; then we increment.<br>
&gt;<br>
&gt; I honestly can&#39;t think of any upcoming changes to the notebook format<br>
&gt; that we have thought about which would require a major version<br>
&gt; increment like you are talking about. I think there are lots of minor<br>
&gt; ones that we can do using minor version increments. I like the minor<br>
&gt; versioning scheme we have now as it clarifies our policies on this.<br>
&gt; So I think overall, the notebook format is pretty future safe for the<br>
&gt; time being. I hope we can stick with the 3.x nbformats for a few<br>
&gt; IPython releases.<br>
<br>
</div></div>I&#39;m curious what the effective difference between a minor version and<br>
a major version would be to me, the user. Would you try to make minor<br>
versions backward compatible if possible, either by not putting in new<br>
keys if they don&#39;t need to be there or by somehow trying to future<br>
proof the notebook to new unexpected notebook format changes?<br></blockquote><div><br></div><div>Major version: totally unreadable, don&#39;t even try</div><div>Minor revision: newer features are obviously unavailable, but the format is fundamentally readable</div>

<div><br></div><div>The minor version stuff is not meant to make it impossible, or even any harder, to update the nbformat. Only to give us a mechanism for expressing &quot;this notebook is newer, and may use features you don&#39;t have, but at least you can still read it&quot;, which we did not have before - there was no distinction between &quot;created by exactly this version&quot; and &quot;totally unreadable&quot;.</div>

</div></div></blockquote><div><br></div></div></div>So you are going to attempt to keep minor versions backwards compatible? Or maybe I&#39;m misunderstanding what you mean by &quot;readable&quot;.</div></blockquote><div>

<br></div><div>Backward-compatible only in that the general file format remains readable. Obviously, if you make use of features that depend on the changes in the minor-revision, that part of your notebook will not work. But if the fundamental format of the notebook does not change, users of 0.13 can open the new notebooks, and will get a warning that it was created by newer IPython.</div>
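<div><br></div><div>As a rough illustration of the major/minor distinction described above, here is a minimal sketch. It is not the actual IPython reader: the function name read_notebook, the SUPPORTED_MAJOR and SUPPORTED_MINOR constants, and the reliance on top-level "nbformat" and "nbformat_minor" keys are all assumptions made for this example.</div>

```python
import json
import warnings

# Hypothetical values: the newest format this (imaginary) reader understands.
SUPPORTED_MAJOR = 3
SUPPORTED_MINOR = 0


def read_notebook(text):
    """Parse notebook JSON, applying the major/minor policy sketched above.

    Assumes the file stores its version in top-level "nbformat" and
    "nbformat_minor" keys (an assumption of this example).
    """
    nb = json.loads(text)
    major = nb.get("nbformat", 1)
    minor = nb.get("nbformat_minor", 0)

    # Newer major version: totally unreadable, don't even try.
    if major > SUPPORTED_MAJOR:
        raise ValueError(
            "notebook format v%d is newer than supported v%d"
            % (major, SUPPORTED_MAJOR)
        )

    # Newer minor revision: fundamentally readable, so warn and carry on.
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        warnings.warn(
            "notebook was created by a newer version (v%d.%d); "
            "some features may be unavailable" % (major, minor)
        )

    # Unknown keys are simply carried along in the dict and ignored.
    return nb
```

<div>With this policy, a notebook marked 3.1 opens with a warning in a reader that only knows 3.0, while a notebook marked 4.0 is rejected outright.</div>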

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Because as far as I, the user, am concerned, if a newer notebook<br>
format version doesn&#39;t work at all in older versions of IPython (such<br>
as is the case with notebook format v3 and IPython 0.12), then it<br>
hardly matters how &quot;major&quot; or &quot;minor&quot; the changes were. Or maybe you<br>
are thinking more for the benefit of people like Sage who are building<br>
on top of the notebook API?<br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
By the way, I completely agree with Brian that future proofing is<br>
usually a waste of time. But also be careful about overly &quot;past<br>
proofing&quot;. I would much rather see new features added to the notebook,<br>
even every release, than to have them held back simply for the<br>
purposes of keeping things backwards compatible. Also, if jumping the<br>
gun on future proofing is a waste of time, so is spending a lot of<br>
effort on making sure that new notebook versions work correctly in<br>
older, unsupported releases.<br></blockquote><div><br></div><div>I totally agree that we should not spend significant effort on future (or past) proofing, and we haven&#39;t. Nor is there any reason this would cause resistance to new features that do require updating the nbformat. If a hoop must be leapt through to keep the nbformat, then the nbformat should be updated. We have a hoop threshold of zero. This only aims to prevent *known, planned, imminent features* from necessarily forcing that unpleasantness (they still may, since they haven&#39;t actually been implemented).</div>