In this small post we'll quickly investigate the difference between slices (which are part of the Python standard library) and NumPy arrays, and how these can be used for indexing. First let's create a matrix of integers in which the element at index (i, j) has value 10i + j, for convenience.

In [1]:

import numpy as np
from copy import copy

Let's create a single row, that is to say a matrix of height 1 whose width is the number of elements.
We'll use -1 in reshape to mean "whatever is necessary". For 2D matrices and tensors it's not super useful, but for higher-dimensional objects it can be quite convenient.

In [2]:

X = np.arange(0, 10).reshape(1, -1)
X

Out[2]:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Now a column, same trick.

In [3]:

Y = 10 * np.arange(0, 8).reshape(-1, 1)
Y

Out[3]:

array([[ 0],
[10],
[20],
[30],
[40],
[50],
[60],
[70]])

By summing the two, and by the rules of "broadcasting", we get a nice rectangular matrix.
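The sum the text refers to can be sketched as follows (reconstructing X and Y from the cells above; the variable name M is used later in this post):

```python
import numpy as np

X = np.arange(0, 10).reshape(1, -1)      # row, shape (1, 10)
Y = 10 * np.arange(0, 8).reshape(-1, 1)  # column, shape (8, 1)

# Broadcasting stretches both operands to a common shape (8, 10),
# so element (i, j) of M is 10*i + j.
M = X + Y
```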

A quick intro about slicing. You have likely used it before if you've encountered the object[12:34] or object[42:96:3] notation. The X:Y:Z part is a slice. This way of writing a slice is allowed only between square brackets for indexing.

X, Y and Z are optional and default to whatever is convenient, so ::3 (every third), :7 and :7: (until 7), : and :: (everything) are valid slices.

A slice is an efficient object that (usually) represents "from X to Y, every Z"; it is not limited to numbers.

You can construct a slice using the slice builtin, which is (sometimes) convenient, and use it in place of x:y:z:

In [7]:

sl = slice('cow', 'phone', 'traffic jam')

In [8]:

arr[sl]

From `cow` to `phone` every `traffic jam`.

In multidimensional arrays, slices of width 0 or 1 can be used to avoid dropping dimensions, unlike indexing with scalars.

In [9]:

M[:, 3]  # third column, now a vector.

Out[9]:

array([ 3, 13, 23, 33, 43, 53, 63, 73])

In [10]:

M[:, 3:4]  # now an (N, 1) matrix.

Out[10]:

array([[ 3],
[13],
[23],
[33],
[43],
[53],
[63],
[73]])

This is convenient when indices represent various quantities, for example an atmospheric ensemble where dimension 1 is latitude, 2: longitude, 3: height, 4: temperature, 5: pressure, and you want to focus on height == 0 without having to shift the temperature index from 4 to 3, pressure from 5 to 4...

Zero-width slices are mostly used to simplify algorithms and avoid having to check for edge cases.
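For instance (a small illustration, not from the original post), an empty slice never raises, which lets boundary cases fall through naturally:

```python
import numpy as np

a = np.arange(5)

# a[5] would raise an IndexError, but a zero-width slice is just empty:
empty = a[5:5]
assert empty.shape == (0,)

# e.g. splitting at an arbitrary k needs no special case for k == len(a)
k = 5
head, tail = a[:k], a[k:]
assert len(head) + len(tail) == len(a)
```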

Arrays are more or less what you've seen in other languages: finite sequences of discrete values.

In [19]:

ar = np.arange(4, 7)
ar

Out[19]:

array([4, 5, 6])

When you index with arrays, the elements of the arrays are taken together pairwise.

In [20]:

M[ar,ar]

Out[20]:

array([44, 55, 66])

We now get a partial diagonal of our matrix. It does not have to be a diagonal:

In [21]:

M[ar,ar+1]

Out[21]:

array([45, 56, 67])

The result of this operation is a one-dimensional array (which is, when possible, a view on the initial matrix's memory).
In the same way as we flipped the sign of the largest block in the previous section, we'll try indexing with the same value:

Good Code is Deleted Code

The only code without bugs is no code. And the less code you have, the less mental load as well. This is why it is often a pleasure to delete a lot of code.

In IPython we recently bumped the version number to 7.0 and dropped support for Python 3.3. This was the occasion to clean up and remove a lot of code that ensured compatibility with multiple minor Python versions, and while it may seem easy, it required a lot of thinking ahead of time to make the process simple.

Finding what can (and should) be deleted

The hardest part is not deleting the code itself, but finding what can be deleted. In many compiled languages, the compiler may help you, but with Python it can be much tougher, and some of Python's usual practices make it harder.

Here are a few tips on how to prepare your code (when you write it) for
deletion.

EAFP vs LBYL

Python tends to lean more toward "Easier to Ask Forgiveness than Permission" than "Look Before You Leap". It is thus common to see code like:

In this particular case though, why do we use the try/except? Unless there is a comment attached, it is hard to guess that from imp import reload has been deprecated since Python 3.4, and the comment can easily get out of sync with the actual code.
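Rewritten in an explicit LBYL style (a sketch), the condition itself documents when the branch can be deleted:

```python
import sys

if sys.version_info < (3, 4):
    # delete this whole branch once Python < 3.4 support is dropped
    from imp import reload
else:
    from importlib import reload
```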

It is now obvious which code should be removed and when. You can see that as the "explicit is better than implicit" rule.

Deprecated code

Removing legacy deprecated code is also always a challenge, as you may worry that other libraries might still be relying on it. To help with that, let's see how we can improve a typical deprecation. Here is a typical deprecated method from IPython:
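The snippets this passage discusses are not reproduced in this excerpt; below is a hedged reconstruction of a typical deprecation and of its first improvement, stating since when it has been deprecated (the function name is taken from the final example further down):

```python
from warnings import warn

# a typical deprecation: no version, no stacklevel, no replacement hint
def unicode_std_stream(stream='stdout'):
    """DEPRECATED, moved to nbconvert.utils.io"""
    warn("IPython.utils.io.unicode_std_stream is deprecated", DeprecationWarning)

# improved: say since when it has been deprecated
def unicode_std_stream(stream='stdout'):
    """DEPRECATED, moved to nbconvert.utils.io"""
    warn("IPython.utils.io.unicode_std_stream is deprecated since IPython 4.0",
         DeprecationWarning)
```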

With this new snippet I'm confident it's been 3 versions, and I am more willing to delete it. This also helps downstream libraries know whether they need conditional code or not. Still, I'm unsure whether downstream maintainers have updated their code. Let's add a stacklevel (to help them find where the deprecated function is used), and add more information about how they can replace calls to this function:

from warnings import warn

def unicode_std_stream(stream='stdout'):
    """DEPRECATED, moved to nbconvert.utils.io"""
    warn("IPython.utils.io.unicode_std_stream has moved to nbconvert.utils.io "
         "since IPython 4.0", DeprecationWarning, stacklevel=2)
    ...

Well, with this information I'm even more confident downstream maintainers have updated their code. They have an actionable item: replace one import with another. They are more likely to do that than to dig for an hour through history to figure out what to do.

TLDR

Be explicit in your conditional imports that depend on the version of the underlying Python or library.

Take time to write good deprecation warnings, with:

a stacklevel (=2 most of the time),

since when it was deprecated,

what should replace the deprecated call for consumers.

The time you put into these will greatly help your downstream consumers, and will later make it much easier for you to get rid of lots of code.

Signing Commit on Tags on GitHub

I've recently set-up keybase and integrated my public key
with git to be able to sign commits.

I decided not to automatically sign, as auto-signing would allow any attacker who takes control of my machine to create signed commits. The Merkle tree of git still ensures repos are not tampered with, as long as you issue $ git fsck --full on a repo, or run $ git config --global transfer.fsckobjects true once and forget it.

Using $ git log --show-signature you can now check that commits (and tags) are correctly signed. Be careful though: a correct signature does not mean trusted, and if you have a PGP key set, GitHub will helpfully sign the commits you make on their platform with their key.
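Collected in one place, the commands discussed in this post are:

```shell
# check a repo's integrity once
git fsck --full

# ...or enable integrity checks on every transfer, once and for all
git config --global transfer.fsckobjects true

# verify commit (and tag) signatures
git log --show-signature
```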

learn more

As usual the git documentation has more to say about this. And signing is not really useful without checking the integrity of Git history, so please set $ git config --global transfer.fsckobjects true as well!

I almost just had to use the Mozilla WebExt shim for Chrome, downgrade a few pieces of artwork from SVG to PNG (like, really?), and upload everything by hand (again, really?).

The Chrome Store has way more fields and is quite complicated –
compared to the Mozilla Add-ons website at least. It is sometimes confusing
whether fields are optional or not, or whether they are per addon or per
developer.

It does, though, allow you to upload more art that will be shown in a store that looks nicer.

Still, I had to go through a really ugly website and pay to publish a free extension. So Mozilla, you win this one.

Please rate the extension, or it may not appear in search results for others, AFAICT:

Back to Firefox.

I've been using Chrome for a couple of years now, but I've heard a lot of good stuff
about Rust and all the good things it has
done for Firefox.
OK, that's a bit of marketing, but it got me to retry Firefox (Nightly, please),
and except for my password manager, which took some weeks to update to the new
Firefox API, I rapidly stopped using Chrome.

MyBinder.org

I'm also spending more and more time working with the JupyterHub team on
Binder, and I see more and more developers adding Binder
badges to their repositories. In the middle of last week I thought:

You know what's not optimal? It's painful to browse repositories that don't
have the Binder badge on MyBinder.org; also, sometimes you have to hunt for the
badge, which is at the bottom of the readme.

You know what would be great to fix that? A button in the toolbar doing the work for me.

Writing the extension

As I know Mozilla (which has a not-so-great new
design BTW, but that's a
personal opinion) cares about standards and making things simple for their users,
I thought I would have a look at the new
WebExtension API.

And 7 days later, after a couple of 30-minute breaks, I present to you a
staggering 27-line extension (including 7 lines of business logic) that does just that:

The hardest part was finding the API and learning how to package and set the
icons correctly. There are still plenty of missing
features
and really low-hanging
fruit,
even if you have never written an extension before (hey, it's my first, and I
averaged 1 useful line/day writing it...).

General Feeling

Remember that I'm new to that and started a week ago.

The Mozilla docs are good but vary highly in quality; they feel like (and are) a
wiki. More opinionated tutorials might have been less confusing. A lot of
statements are correct, but not quite, and leaving the choice to users is just
confusing. For example: you can use SVG or PNG icons, which I did, but then
some areas don't like SVG (addons.mozilla.org), and WebExtensions should work
on Chrome, but Chrome requires PNG. Telling me that I could use SVG was not
useful.

The review of addons is blazingly fast (7 minutes from first submission to human
approval). Apple could learn from that, if what I've heard here and there is
correct.

The submission process has way too many manual steps. I'm OK with that for a first
submission, but for updates, really? I want to be able to fill in all the
information ahead of time (or generate it) and then have a CLI to submit
things. I hate filling in forms online.

The first submission, even if marked beta, will not be considered beta. So
basically I published a 0.1.0beta1, then a 0.1.0beta2 which did not trigger an
automatic update because beta1 was not considered beta. Super confusing. I
could "force" seeing the beta3 page, but with a warning that beta3 was an older
version than beta1? What?

There is still this feeling that the last 1% of polishing the process has not
been done (that's usually where Apple is known to shine). For example, your store
icon will be resized to 64x64 (px) and displayed in a 64x64 (px) square, but I have
a retina screen! So even though I submitted a 128x128 icon, it now looks blurry!
WTF!

You can contribute

As I said earlier, there is a lot of low-hanging fruit! I went through the
process of figuring things out so that you can contribute easily:

This is a small essay to show how one can make better use of the display protocol. Everything you will see in this blog post has been available for a couple of years, but no one has really built on top of it.

It is widely known that the IPython rich display mechanism allows library authors to define rich representations for
their objects. You may have seen it in SymPy, which makes extensive use of the LaTeX representation, and in Pandas, whose DataFrames have a nice HTML view.

What I'm going to show below is that one is not limited to these – you can alter the representation of any existing object without modifying its source – and that this can be used to alter the view of containers, with the example of lists, to make things easier to read.

This section is just a reminder of how one can define representations for objects whose source code is under your
control. When defining a class, the code author needs to define a number of methods which should return the (data, metadata) pair for a given object mimetype. If no metadata is necessary, it can be omitted. For some common representations, short method names are available. These methods can be recognized as they all follow the pattern _repr_*_(self): an underscore, followed by repr, followed by an underscore. The star * needs to be replaced by a lowercase identifier, often referring to a short human-readable description of the format (e.g. png, html, pretty, ...), and finishes with a single underscore. Note that unlike Python's __repr__ (pronounced "dunder rep-er"), which starts and ends with two underscores, the "rich reprs" or "reprs-stars" start and end with a single underscore.

Here is the class definition of a simple object that implements three of the rich representation methods:

"text/html" via the _repr_html_ method

"text/latex" via the _repr_latex_ method

"text/markdown" via the _repr_markdown_ method
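The class definition itself is missing from this excerpt; a hypothetical object implementing these three methods (plus the classical __repr__) might look like:

```python
class MultiMime:

    def __repr__(self):
        return "multimime"                  # "text/plain"

    def _repr_html_(self):
        return "<b>MultiMime</b>"           # "text/html"

    def _repr_latex_(self):
        return r"$\textbf{multi}_{mime}$"   # "text/latex"

    def _repr_markdown_(self):
        return "**MultiMime**"              # "text/markdown"
```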

None of these methods returns a tuple, thus IPython will infer that there is no metadata associated.

The "text/plain" mimetype representation is provided by the classical Python's __repr__(self).

All the mimetype representations will be sent to the frontend (in many cases the notebook web interface), and the richest one will be picked and displayed to the user. All representations are stored in the notebook document (on disk), and the one to display can be chosen when the document is later reopened – even with no kernel attached – or converted to another format.

As stated in the introduction, you do not need to have control over an object's source code to change its representation, though having control is often a more convenient process. As an example we will build a container for image thumbnails and see how we can use the code written for this custom container to apply it to generic Python containers like lists.

As a visual example we'll use Orly Parody book covers, in particular a small-resolution version of some of them, to limit the amount of data we'll be working with.

In the above I've used an IPython-specific syntax (!ls) to conveniently extract all the files with a png extension (*.png) in the current working directory, and assign them to the names variable.

That's cute but, for images, not really useful. We know we can display images in the Jupyter notebook when using the IPython kernel; for that we can use the Image class located in the IPython.display submodule. We can construct such an object simply by passing the filename. Image already provides a rich representation:

In [5]:

from IPython.display import Image

In [6]:

im = Image(names[0])
im

Out[6]:

The raw data from the image file is available via the .data attribute:

We encode the data from bytes to base64 (newline separated) and strip the newlines. We format that into an HTML template – with some inline style – and set the source (src) to be this base64-encoded string. We can check that this displays correctly by wrapping the whole thing in an HTML object that provides a convenient _repr_html_.
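The tag_from_data helper used below is not shown in this excerpt; a sketch matching the description above (the inline style and the size parameter are assumptions) could be:

```python
import base64

def tag_from_data(data, size='200px'):
    """Build an <img> tag embedding raw image bytes as base64.

    A reconstruction from the surrounding description; the original
    helper's exact signature is not shown in this excerpt.
    """
    # encode bytes -> base64 (newline separated), then strip the newlines
    b64 = base64.encodebytes(data).decode('ascii').replace('\n', '')
    return (
        "<img style='display: inline; width: {}' "
        "src='data:image/png;base64,{}'/>".format(size, b64)
    )
```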

In [10]:

HTML(tag_from_data(im.data))

Out[10]:

Now we can create our own subclass, which takes a list of images, constructs an HTML representation for each of them, and joins them together. We define a _repr_html_ that wraps it all in a paragraph tag, and adds a comma between each image:
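A sketch of such a subclass (self-contained here, with a minimal inline stand-in for the tag_from_data helper described above):

```python
import base64

def _img_tag(data):
    # minimal stand-in for the tag_from_data helper described above
    b64 = base64.encodebytes(data).decode('ascii').replace('\n', '')
    return "<img src='data:image/png;base64,{}'/>".format(b64)

class VignetteList(list):
    """A list of Image-like objects (anything with a .data attribute
    holding raw image bytes), rendered side by side in HTML."""

    def _repr_html_(self):
        return '<p>' + ','.join(_img_tag(im.data) for im in self) + '</p>'
```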