Labels

Milestone

Assignee

4 participants

I'm not expecting this to land before 0.11, but I thought I'd get it on the radar.

Every time I've wondered how prompts were produced, I've ended up rather confused. So this is an attempt at cleaning it up, with a new PromptManager class responsible for handling everything to do with the prompts. The critical part is its render method, which assembles the necessary information, then uses the string formatting introduced in Python 2.6 to fill in the prompt template.

I've expanded the definition of 'prompts' to include the auto_rewrite prompt ("------> " by default). So there are now four prompts: input, continuation, output, and rewrite.

This definition of prompts does not include input/output separators. For now, I've left those as attributes of the main InteractiveShell object.

Can you hook up the qtconsole prompts to your new PromptManager as part of this?

I think making that change will help us make the right decisions on the PromptManager, since it will require support for things like HTML prompts without ANSI colors, divorcing the prompt from the InteractiveShell object, and executing things like os.getcwdu() either locally or in the kernel.

Does it make sense to simply generate an HTML prompt in the kernel, and send it over? Or, more generally, have the frontend make a prompt_request with a template (which could be in any format), and the kernel replies with the filled in version?

OK, that makes more sense, I didn't know about those fields. So the frontend needs to parse the format string, work out which variables it needs, and retrieve them from the backend. I think we still need to think a bit more about how this works, though, because a lot of the details that could go in the prompt aren't accessible from the user namespace without importing extra modules (like os or time).

Yes, the idea was to have frontends request whatever information/expressions they want along with each new prompt (possibly including calls to things like os.getcwd() or time.time(), etc). Then they can build whichever display they want with this. In principle, the same kernel could be monitored by two clients that offer different levels of customizability for the prompts.

So the only thing the kernel supplies always is the execution counter, plus whatever variables or expressions may be requested in this form.

Thomas, this overall looks pretty good, and even if it doesn't get us all the way in terms of restructuring the prompt system as we'd like to, it's probably worth merging now. The prompt rework will likely take a few passes until we have a full separation of duties between kernels (who just handle the namespace) and frontends (who present data from that namespace to the users in various ways, including possibly in a prompt).

But there's one thing we should think about: prompts were the one thing for which we kept using Itpl, because it allows the interpolation of arbitrary python expressions, while the new-style string formatting only allows simple variable name access, nothing else. And I know there are projects out there embedding ipython and making weird prompts that update dynamically with function calls, something like: "${some_function()[somearg]}> ".

So it seems your change removes that ability, right?

Now, I'm not totally against that, as long as we plan to complete the transition into a system that allows users to achieve their goal again, even if by different means (i.e. by asking the kernel for these things and using them to build the prompt).

Note that neither my EvalFormatter in utils.text nor the one I typed up in 5 minutes above are finished implementations (at least something breaks in both), but it shouldn't be hard to get something that works.

Well, just that we can do essentially the same thing with a fraction of the code. Also, Itpl parses the string character-by-character in Python code, so we should get better performance from string formatting. I had the idea of dropping some of our external dependencies, and there are only a couple of places that we actually use Itpl. But I'm not going to push the issue.

I don't think Itpl handles unicode at all. So either we need to reimplement Itpl with real string formatting (I doubt such a formatter would be more than ~20 lines), or we need to fix Itpl for unicode support.

I could do with hitting the sack now. @minrk, what breaks in the current implementation of EvalFormatter? And is it a condition that we should care about? The only one I'm currently aware of is that you can't readily put a dict literal ({}) inside an expression in a string (it might work if you double both braces, although I've not checked).

A few quick tests of EvalFormatter2 show that it probably supports everything we need for this PR:

In [2]: from string import Formatter
class EvalFormatter2(Formatter):
def get_field(self, name, args, kwargs):
v = eval(name, kwargs)
return v, name
fmt = EvalFormatter2()
import os
ns = dict(os=os)
print fmt.format(' we are in {os.getcwdu()}...', **ns)
we are in /home/fperez/ipython/notebooks...
In [3]: print fmt.format(' we are in {os.getcwdu()+"hi"}...', **ns)
we are in /home/fperez/ipython/notebookshi...
In [5]: print fmt.format(' we are in {os.getcwdu()}, a is {a} and a0 is {a[0]}...',
**ns)
we are in /home/fperez/ipython/notebooks, a is [1, 2] and a0 is 1...

So @takluyver, it seems to me that if you update the code to use this (perhaps providing EvalFormatter2 as utils.text.FulEvalFormatter), we'd be good to go.

Lots of stuff (function calls) don't work in the EvalFormatter in master. EvalFormatter2 seems to work for just about everything, but slicing definitely doesn't work: fmt.format("we are in ...{os.getcwdu()[-10:]}", os=os). It raises a SyntaxError, because the token is split at the colon for some reason, so os.getcwdu()[-10 is what gets eval'd.

Like I said above, I didn't spend more than 5-10 minutes on either of these, and I had never looked at any Formatter code before. So perhaps a more careful eye (and a test suite for various known Itpl use cases) would be prudent.

Format Specifiers
Each field can also specify an optional set of 'format
specifiers' which can be used to adjust the format of that field.
Format specifiers follow the field name, with a colon (':')
character separating the two:
"My name is {0:8}".format('Fred')

so the parser splits on ':' and returns the stuff after the colon as the format specifier.

Which means that this syntax simply is incompatible with arbitrary Python expressions.

That doesn't mean we can't use it, just that we'd be setting some restrictions on what kinds of things can go in there.

So Thomas, if you can do a little audit of our uses of Itpl and see if a more restrictive version based on EvalFormatter2 (hopefully with a nicer name) isn't a problem, then I have no issues with ditching itpl.

And I hope we'll find the time to complete the prompt machinery for frontends before 0.12.

@fperez, I see, so possibly slicing is out, but it's still possible that even the colon-splitting is implemented in a method that we can easily override. If we can take over right at where the Formatter decides what to do with something inside '{...}', then we should be able to just eval that block, and be done.

I don't see any reason for the new one to not just replace EvalFormatter. I just called the example I tossed up EvalFormatter2 because it was my second 10-minute attempt at writing a formatter that evals code. There should be no need for more than one of these.

Turns out it's actually pretty easy. I looked at the string.Formatter source, and the string docs, and all I have to do is override the _vformat method instead of get_field/get_value like I was doing. Now I can execute arbitrary code, and with a switch interpret ':' for slicing (disabling the format_spec part of fancy-formatting), or leave it as-is, preventing slicing (obviously you can't have both).

I'll do a PR soon, but I'm writing some tests first. For now a preview:

classEvalFormatter(Formatter):
"""A String Formatter that allows evaluation of simple expressions. Any time a format key is not found in the kwargs, it will be tried as an expression in the kwargs namespace. This is to be used in templating cases, such as the parallel batch script templates, where simple arithmetic on arguments is useful. Examples -------- In [1]: f = EvalFormatter() In [2]: f.format('{n/4}', n=8) Out[2]: '2' In [3]: f.format('{range(3)}') Out[3]: '[0, 1, 2]' In [4]: f.format('{3*2}') Out[4]: '6'"""# should we allow slicing by disabling the format_spec feature?
allow_slicing =True# copied from Formatter._vformat with minor changes to allow eval# and replace the format_spec code with def_vformat(self, format_string, args, kwargs, used_args, recursion_depth):
if recursion_depth <0:
raiseValueError('Max string recursion exceeded')
result = []
for literal_text, field_name, format_spec, conversion in \
self.parse(format_string):
# output the literal textif literal_text:
result.append(literal_text)
# if there's a field, output itif field_name isnotNone:
# this is some markup, find the object and do# the formattingifself.allow_slicing:
# override format spec, to allow slicing:
field_name =':'.join([field_name, format_spec])
format_spec =''# eval the contents of the field for the object# to be formatted
obj =eval(field_name, kwargs)
# do any conversion on the resulting object
obj =self.convert_field(obj, conversion)
# expand the format spec, if needed
format_spec =self._vformat(format_spec, args, kwargs,
used_args, recursion_depth-1)
# format the object and append to the result
result.append(self.format_field(obj, format_spec))
return''.join(result)

And the difference between this and the base Formatter is tiny. If we disallow slicing, then the EvalFormatter2 above is cleaner and identical to this when allow_slicing=False.

OK, I've merged #716, so that @takluyver can update this one cleanly, to use the new EvalFormatter that has full slicing support for these situations.

We do need to make one design decision: for these kinds of expressions, do we value more the ability to do slicing or the ability to specify formatting (i.e. using the ':N' syntax for format specification)? I'm leaning towards the latter, both to keep this syntax in sync with normal Python formatting syntax and because fine-tuning formatting seems to be more likely to be useful in producing things like prompts than slicing. And if slicing is needed, people can always hide it behind a little utility function they can write, so it's not like slicing is impossible.

But I'd like to get some consensus before we pull the trigger on the final design for this part.

I'd lean towards having format string enabled as well. If people do want slicing, they don't even need a utility function: x[slice(1,10)] is equivalent to x[1:10]. To give a particular use case, we could expose the timestamp as a datetime object, which can be formatted using the format string.

It is a hard choice. The reason I discovered that slicing didn't work in the EvalFormatter2 above, is the first example I tried was a truncated version of cwd:

f.format("...{os.getcwd()[-10:]}", os=os)

however

you can actually get slicing by using slice objects, and never need a colon:

f.format("...{os.getcwd()[slice(-10,None,None)]}", os=os)

So you can get slice behavior without colon notation, if slightly inconvenient to type, but you can't get formatting-notation back in any reasonably way if you allow slicing. Further, if we don't want to support the slicing, the three-line EvalFormatter2 implementation above is much simpler, and probably preferable (and functionally identical when allow_slicing=False).

OK, it seems we're in agreement that give how slice(a,b,c) is an option (if slightly less convenient), that's the way to go. In that case, we should perhaps go with the really simple EvalFormatter2 (renamed) and be done with it.

It's still useful to have the full one @minrk wrote, as it gives us a replacement for itpl with the modern python syntax and unicode support, but for this we should probably stick to the easy, comprehensible 3-liner.

The only other place we're using Itpl is for expanding variables in %magic and !system calls. My gut feeling is that slicing is more useful than format strings there, but on the other hand we may want to keep closer to the standard Python format as people get more used to that.

@minrk: Should EvalFormatter2 replace the existing EvalFormatter that we shipped with 0.11? I think the main difference is that the existing EvalFormatter will treat {0} as referring to a positional argument, but EvalFormatter2 will treat it as a literal 0.

I think the EvalFormatter has no need to support positional arguments. I would just replace the 0.11 EvalFormatter with EvalFormatter2 under the EvalFormatter name, and keep the big one that supports splitting around as a separate class (FullEvalFormatter, RichEvalFormatter, or something), just in case there is a desire to split args somewhere down the line.

I think of the EvalFormatter as evaluating code in a namespace. Positional args don't really make sense in that context.

The Itpl syntax is really much nicer for expanding variables (especially in the shell profile when it is restored), but it's not very Python, and as we have discovered doesn't play nice with unicode.

@takluyver, did you see my message on-list about the idea of a frontend_ns to handle this issue? The more I think about it, the more it seems it's the way to go, as it gives us a clean solution to the prompt namespace question as well as a place for other things frontends may legitimately want to do in isolation from the user's namespace.

Now, we could go ahead and finish merging this before tackling that to ensure this doesn't bitrot away... What's your take?

Hi Fernando, I did see your message - sorry, I've been a bit busy for the last week or so.

I think a frontend_ns makes sense, but a few questions about it:
1. Do we add frontend_variables & frontend_expressions fields to the execute request message? Or add another message type for accessing those?
2. What goes in the frontend_ns initially? I guess os and sys are obvious choices, along with the cwd-truncating functions currently defined for prompts. And something for timestamps. Anything else?
3. How can the frontend manipulate what's in the frontend_ns, e.g. for importing extra stuff? Can it send arbitrary Python code to be executed? Or a specific way to request an import? Or a way to "push" variables, perhaps from the user_ns (or even from the frontend, if it's in Python)?

As for merging this now - do you think this is the general structure of what we want to go forward with? It has the advantage that it makes the prompts code much more tractable for other people to work on. But if large parts are going to be redone, it might lead people down the wrong track. First, though, I need to sort out the various EvalFormatters. I'll get onto that now.

I'm thinking right now that the cleanest approach would be to use the standard execute request message, but simply add an optional field to it indicating the target namespace. Normal messages would go (as they do now) to the user_ns, but if frontend_ns is specified as the target, then they would get executed there.

I'd say we don't put anything in there other than perhaps our prompt-manipulation machinery (since that's an obvious use case). With 1 above, frontends can simply execute import statements.

Via 1, it's taken care of.

Currently our spec has the notion of user_variables and user_expressions. Those would be interpreted as variables and expressions in the execution namespace (perhaps we should change these names accordingly).

So basically, all we'd need is to add a new optional field, namespace, to the execute_request message. And the behavior would be such that if namespace != user_ns, then silent=True automatically, so that the user counter isn't auto-incremented behind the user's back.

So each namespace would have a standard string name, e.g. "user_local"? That seems a reasonable way to do it. Although I should note that technically you evaluate or execute code in two namespaces - local and global - just to complicate matters a little.

On the other hand, @ellisonbg has a point (from his email) that this should probably be kept as simple as possible, because we don't want the burden of complex code for a fairly minor feature. Is there a simpler way to handle prompts for the frontends? The idea I had previously was that the frontend would send a prompt template, and the kernel would return a formatted prompt (although the frontend still needs some flexibility so it can update prompt numbers).

On Wed, Sep 14, 2011 at 1:42 PM, Thomas <reply@reply.github.com> wrote:
So each namespace would have a standard string name, e.g. `"user_local"`? That seems a reasonable way to do it. Although I should note that technically you evaluate or execute code in *two* namespaces - local and global - just to complicate matters a little.

If you only pass one, it's used for both. No need to make a
distinction in most cases.

On the other hand, @bgranger has a point (from his email) that this should probably be kept as simple as possible, because we don't want the burden of complex code for a fairly minor feature. Is there a simpler way to handle prompts for the frontends? The idea I had previously was that the frontend would send a prompt template, and the kernel would return a formatted prompt (although the frontend still needs some flexibility so it can update prompt numbers).

As I replied in the email, I actually think this is a fairly simple
and generic solution that will actually be useful in the long run.
More feedback welcome: since we don't all agree on this one, it means
it's worth looking more carefully at it before we make any decision.

On 15 September 2011 00:22, Fernando Perez < reply@reply.github.com>wrote:
If you only pass one, it's used for both. No need to make a
distinction in most cases.

Indeed, but just to note that executing in the user namespace is a special
case in another way, because the local and global namespaces can be separate
(when IPython is embedded).

As I replied in the email, I actually think this is a fairly simple
and generic solution that will actually be useful in the long run.
More feedback welcome: since we don't all agree on this one, it means
it's worth looking more carefully at it before we make any decision.

You were talking about using the user_variables & user_expressions fields
(perhaps with a different name) to access variables from the frontend
namespace. Does that mean that, for the common case of just accessing
variables, the frontend will have to send a blank execute_request to pull
them out? Would it make more sense to split variable access into a separate
message type?

On Wed, Sep 14, 2011 at 4:49 PM, Thomas <reply@reply.github.com> wrote:
You were talking about using the user_variables & user_expressions fields
(perhaps with a different name) to access variables from the frontend
namespace. Does that mean that, for the common case of just accessing
variables, the frontend will have to send a blank execute_request to pull
them out? Would it make more sense to split variable access into a separate
message type?

If we keep the message structure the way it is, and simply add an
optional namespace field, then we can interpret the
variables/expression field as a way to request variables or
expressions from the given namespace.
Now, as @minrk noted, the advantage of retrieving user variables in
the same pass as the code was executed is that there's no need to make
an additional call, one that could potentially get queued behind
another request.
So there are advantages for requesting vars/expressions in the same call.
Let's recap for a minute the 'design landscape' of this problem, it
may help us think a little bit:
1. we need a way for frontends to retrieve data
(variables/expressions) they may want to use for information purposes,
such as constructing a prompt. It's not clear yet that this should be
data from the user_ns, a separate namespace, or both.
- currently this is provided by the user_variables and
user_expressions fields of the exeute request message.
2. there may be a case for having a separate namespace where frontends
can execute code in isolation from the user.
- this is where the idea of a separate frontend_ns comes in, by
extending the execute message with an optional field.
We can construct a cleaner prompt management system by using only the
current system, since the prompt templates can be evaluated as
user_expressions. The only issue here is that of collisions with user
variables, or things like %reset breaking the prompts, etc. In the
past we've actually had bugs reports of this nature, where calling
%reset would all of a sudden break a system where a custom prompt had
been implemented, because the prompt is currently evaluated in the
user's namespace.
OK, I need to switch gears now. Hopefully this outline will help us
clarify our ideas about this to find a good solution to the overall
problem...

On 15 September 2011 01:14, Fernando Perez < reply@reply.github.com>wrote:
Now, as @minrk noted, the advantage of retrieving user variables in
the same pass as the code was executed is that there's no need to make
an additional call, one that could potentially get queued behind
another request.

Except that with the proposed structure, we no longer get that. User code is
executed in the user namespaces, but then we want to retrieve variables for
the prompt from the frontend namespace, so we would have to do one
execute_request against each.

So there are advantages for requesting vars/expressions in the same call.
Let's recap for a minute the 'design landscape' of this problem, it
may help us think a little bit:
1. we need a way for frontends to retrieve data
(variables/expressions) they may want to use for information purposes,
such as constructing a prompt. It's not clear yet that this should be
data from the user_ns, a separate namespace, or both.
- currently this is provided by the user_variables and
user_expressions fields of the exeute request message.

I think we should look at what other purposes we're imagining this being
used for, besides prompts. E.g. I can imagine a GUI with a status bar
showing, for instance, the cwd (information it can't get purely within the
frontend). But then another frontend might change directory, and it would
ideally update without waiting for the user of frontend A to run the next
cell. So maybe some of this information (e.g. a successful cd) needs to be
published by the kernel, in addition to having a means to request it.

I think there are some good points here. If multiple frontends want to watch the value of a variable, then this data needs to be published on the iopub channel. There would need to be a way of registering variables that get published in this way after each execution. Also, there would need to be a way for those variables to become modified. For that the simplest thing is to use the user_ns, but you might want to compute some expression each time a variable is published (like os.getcwd()).

What you are describing does indeed sound quite useful, and would suggest something along the lines of a 'state change' message. frontends could register names or expressions to check, and a check_state() call is made at the end of each execute request, publishing its results. Of course, this could add arbitrarily expensive computations to every execute request.

I am also thinking about how we can design things to enable "Manipulate" style rich widgets to exist in the browser and receive updates as a computation progresses. This type of design might help in that context as well.

@takluyver, I was going to start testing this puppy but a conflict seems to have appeared between yesterday and today, sorry... The conflict is tiny and I can fix it here while testing, but you may want to rebase this guy.

Also note that the test suite doesn't pass anymore. Make sure to clear your .pyc files first to see the problem. It's the same reason as the conflict, simply that lib.pylabtools got moved to core.pylabtools.

Other than the above small problems, here's a proposal for moving forward with this one. The code looks solid, and it's indeed already an improvement and cleanup of the prompt machinery, so I don't see a reason for holding it back further, risking more (and possibly non-trivial) merge conflicts popping up.

The only thing I see missing is a good description of the changes in the docs, since the format of the special expressions for prompts has now changed as we move away from itpl.

As far as the larger discussion on the whole frontend_ns ideas, we should probably not rush into it. @ellisonbg's caution is often wisely guided, and I find @minrk's argument about non-atomicity of a separate request and the possibility of it being 'backed up' in a busy kernel very compelling.

The whole question of how to let multiple clients interact with a kernel regarding 'sideband' information (i.e. stuff beyond the raw execution of pure code blocks for the user) is a very tricky design problem, and we're just feeling our way through it. So it's probably best to go in small steps and see what works. We can keep thinking about how to serve those needs slowly, and for now the user_var,exp system will probably be sufficient in most cases. We might add some functionality to register variables protected of deletion in %reset so that people with special prompts can safely use %reset.

How does that sound? We could move forward with this (pending rebase and docs) and then we'll see how things work out in practice with the prompt system, and whether we can continue to work just with user_vars/exp or not. Given we have no actual use of even those components yet, it may be premature to extend the system even further.

Is the test suite passing for you? I'm having some very bizarre problems with the test suite right now on my box at work, but they are odd enough that I think it's my problem and nothing with the codebase. What do you get?