
When x is smaller than 1 return 1 / x, otherwise return x. Here are a few example values:

x       abs_ratio(x)
0.5     2
2       2
0.2     5
5       5

[Figure: graph of abs_ratio(x)]

Another spelling for the same operator would take 2 positive numbers and give their absolute ratio:
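The formula itself is not reproduced in this text. Here is a minimal Python sketch of both spellings, assuming the two-argument form simply divides the larger number by the smaller. The name abs_ratio follows the post; the default y=1.0 and the error on non-positive input are assumptions made for this sketch.

```python
def abs_ratio(x, y=1.0):
    """Absolute ratio of two positive numbers; with one argument, compares x to 1."""
    if x <= 0 or y <= 0:
        # Assumption: the post leaves zero and negatives undefined, so we refuse them.
        raise ValueError("abs_ratio is only defined for positive numbers")
    return max(x, y) / min(x, y)


print(abs_ratio(0.5))   # one-argument spelling
print(abs_ratio(2, 6))  # two-argument spelling, same as abs_ratio(6, 2)
```

Dividing the larger value by the smaller one means the result is never below 1, whichever order the arguments come in.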

[Figure: graph of the two-argument absolute ratio]

Use case examples

Music and audio – an octave above a frequency F is 2F. More generally, a harmonic of a frequency F is N*F where N is a natural number. To decide if one frequency is a harmonic of another we just need to get their absolute ratio and see if it’s a whole number. E.g. if abs_ratio(F1, F2) == 2 they’re an octave apart. If abs_ratio(F1, F2) is whole, one is a harmonic of the other.

Computer vision – to match shapes that have similar dimensions, e.g. widths within about 10% of each other. We don’t care which is the bigger or smaller, we just want to know if 1/1.1 < W1 / W2 < 1.1 (roughly 0.91 < W1 / W2 < 1.1), which may be easier to pronounce as abs_ratio(W1, W2) < 1.1

Real life – when we see 2 comparable objects we’re more likely to say one is “three times the other” vs “one third the other”. Either way in our brains both statements mean the same concept. We think in absolute ratios.

General case – When you want to know if X is K times bigger than Y or vice versa and you don’t care which is the bigger one.
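The use cases above can be sketched in a few lines of Python. The helper names is_harmonic and similar_size, the tolerance, and the sample frequencies and widths are all hypothetical, chosen only for illustration.

```python
def abs_ratio(x, y=1.0):
    """Larger value divided by the smaller one (positive inputs only)."""
    return max(x, y) / min(x, y)


def is_harmonic(f1, f2, tolerance=1e-9):
    """True when one frequency is a whole-number multiple of the other."""
    ratio = abs_ratio(f1, f2)
    return abs(ratio - round(ratio)) <= tolerance


def similar_size(w1, w2, max_ratio=1.1):
    """True when the two widths differ by less than a factor of max_ratio."""
    return abs_ratio(w1, w2) < max_ratio


print(is_harmonic(440, 880))   # an octave apart
print(is_harmonic(440, 660))   # ratio 1.5, not a whole number
print(similar_size(100, 105))  # within 10% either way
print(similar_size(120, 100))  # too different, regardless of argument order
```

Neither helper needs to know which argument is the bigger one, which is the point of the operator.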

Interesting Properties

abs_ratio(Y / X) == abs_ratio(X / Y)

log(abs_ratio(X)) = abs(log(X))

log(abs_ratio(Y / X)) = abs(log(Y / X)) = abs(log(Y) - log(X))

You can see from the above that absolute ratio is somewhat of an absolute value for log-space.
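A quick numeric check of these identities, as a sketch rather than a proof. The function check_properties and the sample values are made up for this example, and floating point comparisons use math.isclose.

```python
import math


def abs_ratio(x, y=1.0):
    """Larger value divided by the smaller one (positive inputs only)."""
    return max(x, y) / min(x, y)


def check_properties(x, y):
    """Return True if the three identities hold numerically for positive x and y."""
    symmetric = math.isclose(abs_ratio(y / x), abs_ratio(x / y))
    log_of_one = math.isclose(math.log(abs_ratio(x)), abs(math.log(x)))
    log_of_two = math.isclose(
        math.log(abs_ratio(y / x)), abs(math.log(y) - math.log(x))
    )
    return symmetric and log_of_one and log_of_two


for x, y in [(0.5, 2.0), (3.0, 12.0), (7.0, 0.2)]:
    print(x, y, check_properties(x, y))
```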

What’s next for absolute ratio

I’d love to hear more use cases and relevant contexts.

What would be the written symbol or notation?

How can we get this operator famous enough to be of use to mainstream minds?

About negative numbers and zero – right now that’s undefined as I don’t see a use case for that domain.

Appendix – The hardships

This was overly hard to do – first of all NSFW links gave me the “are you over 18?” prompt which for some reason I wasn’t able to solve by cookies. I eventually turned to the mobile version of the site (append “.compact”) to avoid the prompts completely. Also, matplotlib and networkx aren’t that fun for drawing graphs, it seems. To visualize and output the graph I eventually used Gephi, which was somewhat easy although it has its clunkiness baggage.

Us python fanboys like to think of python as similar to English and thus more readable. Let’s examine a simple piece of code:

for item in big_list:
    if item.cost > 5:
        continue
    item.purchase()

For our discussion there are only 3 kinds of people:

People who have never seen a line of code in their life.

People who have programmed in other languages but have never seen python.

Python programmers.

We’ll dabble between the first 2 groups and how they parse the above. Let’s try to forget what we know about python or programming and read that in English:

“for item in big_list” – either we’re talking about doing something for a specific item in a big_list or we’re talking about every single item. Ambiguous but the first option doesn’t really make sense so that’s fine.

“if item.cost > 5” – non-programmers are going to talk about the period being in a strange place, but programmers will know exactly what’s up.

“continue” – That’s fine, keep going. English speakers are going to get the completely wrong idea: in plain English “continue” means “carry on with what you were doing”, which is what pythonistas call “pass” (or “nop” in assembly), not “skip to the next item”. As programmers we’ve grown used to this convention, but we really should have called it “skip” or something.

“item.purchase()” – non-programmers are going to ask about the period and the parentheses but the rest grok that easily.

So I’m pretty sure this isn’t English. But it’s fairly readable for a programmer. I believe programmers of any of the top 8 languages on the TIOBE index can understand simple python. I definitely can’t say the same for Lisp and Haskell. Not that there’s anything wrong with Lisp/Haskell, these languages have specialized syntax for their honorable reasons.

Continue is a silly word, what about iterator labels?

Let’s say I want to break out of an outer loop from a nested loop, eg:

for item in big_list:
    for review in item.reviews:
        if review < 3.0:
            # next item or next review?
            continue
        if review > 9.0:
            # stop reading reviews or stop looking for items?
            break

Java supports specific breaks and continues by adding labels to the for loops but I think we can do better. How about this:

items_gen = (i for i in big_list)
for item in items_gen:
    for review in item.reviews:
        if review < 3.0:
            items_gen.continue()
        if review > 9.0:
            items_gen.break()

But how can that even be possible you may ask? Well, nowadays it isn’t but maybe one day if python-ideas like this idea we can have nice things. Here’s how I thought it could work: a for-loop on a generator can theoretically look like this:

So every generator could have a method which throws its relevant exception and we could write specific breaks and continues. Or if you prefer a different spelling could be “break from mygen” or “continue from mygen” as continue and break aren’t allowed as method names normally.
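The idea can be emulated in today’s python, with caveats. The sketch below is not the proposed language feature: LabeledLoop, break_(), continue_() and run() are hypothetical names invented here, and the loop body has to be passed in as a function because a method cannot steer a real for statement.

```python
class LabeledLoop:
    """Hypothetical wrapper that gives one loop its own break/continue signals."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        # Per-instance exception types, so nested loops never catch each other's signals.
        self.Break = type("Break", (Exception,), {})
        self.Continue = type("Continue", (Exception,), {})

    def break_(self):
        raise self.Break()

    def continue_(self):
        raise self.Continue()

    def run(self, body):
        """Roughly what `for item in loop: body(item)` would expand to."""
        while True:
            try:
                item = next(self._iterator)
            except StopIteration:
                break
            try:
                body(item)
            except self.Continue:
                continue
            except self.Break:
                break


# Made-up data: (item name, review scores).
items = [("a", [5.0, 2.0, 7.0]), ("b", [6.0, 9.5, 1.0]), ("c", [4.0])]
outer = LabeledLoop(items)
seen = []


def visit(entry):
    name, reviews = entry
    for review in reviews:
        if review < 3.0:
            outer.continue_()  # move on to the next item
        if review > 9.0:
            outer.break_()     # stop looking at items altogether
        seen.append((name, review))


outer.run(visit)
print(seen)
```

The signals travel as exceptions, so they pass straight through the inner for loop and are caught only by the loop that owns them.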

I think this could be nice. Then again, many of the times I found myself using nested loops I actually preferred to break the monster into 2 functions with one loop each. That way I could use the return value to do whatever I need in the outer loop (break/continue/etc). So perhaps it’s a good thing the language doesn’t help me build monstrosities and forces me to flatten my code. I wonder.
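Here is one way the two-function version might look. The names verdict and pick_items, the string return values, and the sample data are assumptions made for this sketch.

```python
def verdict(reviews):
    """Inner loop: boil an item's reviews down to one decision for the outer loop."""
    for review in reviews:
        if review < 3.0:
            return "skip"  # the outer loop should move on to the next item
        if review > 9.0:
            return "stop"  # the outer loop should stop looking at items
    return "keep"


def pick_items(items):
    """Outer loop: items is a list of (name, reviews) pairs."""
    kept = []
    for name, reviews in items:
        decision = verdict(reviews)
        if decision == "stop":
            break
        if decision == "skip":
            continue
        kept.append(name)
    return kept


print(pick_items([("a", [5.0, 2.0]), ("b", [6.0]), ("c", [9.5]), ("d", [5.0])]))
```

Each function has a single loop, so every break and continue is unambiguous.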

Drawing inspiration from this blog post on title virality I wanted to investigate what makes these top 10,000 titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:

On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).

Now I have a GAE app that sends 70-80 emails per day where the free limit is 100. I’d gladly switch over to the paid side of GAE just to be sure that if it ever passes the 100 mark I don’t have any failed email requests, but the price of that is $9 per month. GAE is extremely expensive for apps that just barely brush the end of their free quotas. In order to actually use up the $9 per month minimum I’d have to send out 3000 emails per day (at $0.0001 per email, over a 30-day month).
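The break-even arithmetic, spelled out as a small sketch. The prices come from the paragraph above; the 30-day month is an assumption, and exact fractions are used to avoid floating point noise.

```python
from fractions import Fraction

MONTHLY_MINIMUM = Fraction(9)          # dollars per month, from the post
PRICE_PER_EMAIL = Fraction(1, 10_000)  # $0.0001 per email, from the post
DAYS_PER_MONTH = 30                    # assumption: a 30-day month

# How many emails it takes to use up the monthly minimum.
emails_per_month = MONTHLY_MINIMUM / PRICE_PER_EMAIL
emails_per_day = emails_per_month / DAYS_PER_MONTH

print(emails_per_month)  # emails needed per month
print(emails_per_day)    # emails needed per day
```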

I don’t know if the free quota on email recipients is really low or if sending out an email is extremely cheap. Either way, GAE expects me to scale from 100 to 3000 while paying the price of 3000. Who knows if I’ll ever even reach that mark?

If google keeps with this plan, I’m probably never going to start another GAE app that has a chance to grow. Every time I have a chance of hitting the quota limits I have 2 choices:

Pay google and be screwed over for an indefinite amount of time until I reach the next landmark.

Migrate to a cheaper shared hosting option until I reach the next landmark.

Thanks, but no thanks. That’s the GAE glass ceiling.

Appendix

Other than this problem I do like GAE. It’s a shame I have to leave it.

I’ve made about 11 small python GAE apps. Only 2 of which ever reached the aforementioned glass ceiling.

This issue shouldn’t bother you if your app is already big enough to cost more than $9.

A proposed solution: Google takes $9 of credit at a time from your Google Wallet and eats quotas out of that. When the $9 runs out, it bills another $9. Sounds reasonable and “don’t be evil” to me. Another thing that could be nice would be to allow multiple paid apps to feed from the same budget.

Nowadays I work for a medical device company where in a medical test the big indicators of success are specificity and sensitivity. Every medical test strives to reach 100% in both criteria. Imagine my surprise today when I found out that other fields use different metrics for the exact same problem. To analyze this I present to you the confusion matrix:

Confusion Matrix

E.g. we have a pregnancy test that classifies people as pregnant (positive) or not pregnant (negative).

True positive – a person we told is pregnant that really was.

True negative – a person we told is not pregnant, and really wasn’t.

False negative – a person we told is not pregnant, though they really were. Ooops.

False positive – a person we told is pregnant, though they weren’t. Oh snap.

Standardized equations
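The equations are not reproduced in this text. The standard definitions, written in terms of the four confusion matrix cells (tp, fp, tn, fn), are:

```latex
\begin{align*}
\text{sensitivity (recall)} &= \frac{tp}{tp + fn} \\
\text{specificity} &= \frac{tn}{tn + fp} \\
\text{precision} &= \frac{tp}{tp + fp}
\end{align*}
```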

Equations explained

Sensitivity/recall – how good a test is at detecting the positives. A test can cheat and maximize this by always returning “positive”.

Specificity – how good a test is at avoiding false alarms. A test can cheat and maximize this by always returning “negative”.

Precision – how many of the positively classified were relevant. A test can cheat and maximize this by only returning positive on one result it’s most confident in.

The cheating is resolved by looking at both relevant metrics instead of just one. E.g. the cheating 100% sensitivity that always says “positive” has 0% specificity.
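A small sketch of the three metrics and of the “always positive” cheat. The function names and the population of 10 pregnant and 90 non-pregnant people are made up for illustration.

```python
def sensitivity(tp, fn):
    """Share of actual positives the test catches (also called recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives the test correctly clears."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Share of positive calls that were actually positive."""
    return tp / (tp + fp)


# A "test" that always answers positive, run on 10 pregnant and 90 non-pregnant people:
# every pregnant person is a true positive, everyone else is a false positive.
tp, fn, fp, tn = 10, 0, 90, 0

print(sensitivity(tp, fn))  # perfect on paper
print(specificity(tn, fp))  # the second metric exposes the cheat
print(precision(tp, fp))
```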

More ways to cheat

A Specificity buff – let’s continue with our pregnancy test where our experiments resulted in the following confusion matrix:

                  Test positive    Test negative
Pregnant          8 (tp)           2 (fn)
Not pregnant      10 (fp)          80 (tn)

Our specificity is only 88.9% and we need 97% for our FDA approval. We can tell our patients to run the test twice and only double positives count (e.g. two red lines) so we suddenly have 98.8% specificity. Magic. This would only be kosher if the test results are proven to be independent. Most tests probably aren’t (e.g. blood parasite tests that are triggered by antibodies may repeatedly give false positives for the same patient).
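The arithmetic behind the double-test trick, as a sketch that assumes the two runs are fully independent, which is the condition the paragraph above warns rarely holds. The function names are hypothetical; the counts (80 true negatives, 10 false positives) are the ones from the experiment.

```python
def specificity(tn, fp):
    """Share of actual negatives the test correctly clears."""
    return tn / (tn + fp)


def double_test_specificity(tn, fp):
    """Specificity when a positive only counts if two independent runs both say so."""
    false_positive_rate = fp / (tn + fp)
    # A non-pregnant person is wrongly flagged only if both runs are false positives.
    return 1 - false_positive_rate ** 2


single = specificity(tn=80, fp=10)
double = double_test_specificity(tn=80, fp=10)

print(round(single * 100, 1))  # percent, single run
print(round(double * 100, 1))  # percent, double positives only
```

Under the same independence assumption the rule also costs sensitivity, since a pregnant person now has to test positive twice.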

A less ethical (though IANAL) approach would be to add 300 men to our pregnancy test experiment. Of course, part of our test is to ask “are you male?” and mark these patients as “not pregnant”. Thus we get a lot of easy true negatives and this is the resulting confusion matrix:

                  Test positive    Test negative
Pregnant          8 (tp)           2 (fn)
Not pregnant      10 (fp)          380 (tn)

Voila! 97.4% specificity with a single test. Have fun trying to get that FDA approval though, I doubt they’ll overlook the 300 red herrings.
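The same calculation for the padded experiment, as a brief sketch. The counts come from the two confusion matrices; the variable names are made up here.

```python
def specificity(tn, fp):
    """Share of actual negatives the test correctly clears."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """Share of actual positives the test catches."""
    return tp / (tp + fn)


before = specificity(tn=80, fp=10)
after = specificity(tn=80 + 300, fp=10)  # 300 men added as easy true negatives

print(round(before * 100, 1))  # percent, original experiment
print(round(after * 100, 1))   # percent, padded experiment
print(sensitivity(tp=8, fn=2)) # untouched by the padding
```

Only the true negative count changed, so the test itself is no better at anything than it was before.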

What does it mean, who won?

Finally the punchline:

A search engine only cares about the results it shows you. Are they relevant (tp) or are they spam (fp)? Did it miss any relevant results (fn)? The ocean of ignored (tn) results shouldn’t affect how good or bad a search algorithm is. That’s why true negatives can be ignored.

A doctor can tell a patient if they’re pregnant or not or if they have cancer. Each decision may have grave consequences and thus true negatives are crucial. That’s why all the cells in the confusion matrix must be taken into account.
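To illustrate the difference, here is a sketch with invented numbers: the same search results scored against a small index and a huge one. Only the count of ignored documents (tn) differs between the two calls, and summarize is a hypothetical helper.

```python
def summarize(tp, fp, fn, tn):
    """Compute the metrics discussed above from the four confusion matrix cells."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical search: 8 relevant hits shown, 2 spam hits shown, 10 relevant pages missed.
small_index = summarize(tp=8, fp=2, fn=10, tn=80)
huge_index = summarize(tp=8, fp=2, fn=10, tn=8_000_000)

for name in ("precision", "recall", "specificity"):
    print(name, small_index[name], huge_index[name])
```

Precision and recall are identical in both runs because neither formula contains tn, while specificity drifts toward 100% as the ocean of ignored results grows.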

Earlier in December I was approached by Chris McDonough with a reddit pm asking if I could or would implement some kind of behavior regarding a “Python 2 only” classifier on the wall of shame. After some aggressive googling I found the original discussion in catalog-sig. The idea was to add a classifier that signified “the authors have no current intention to port this code to Python 3”. By declaring such an intent, Chris explained, a python package should be erased from the wall of shame. Not that I completely understood this intuition but still I tried to somehow apply myself to the effort of improving the WOS. So here’s what’s new:

Packages with the “Programming Language :: Python :: 2 :: Only” trove classifier will have a lock next to their package with a mouse over explaining their intent.

Packages that have an equivalent py3k package are now not erased from the wall but rather show a link to the equivalent package. This rightfully boosts the compatibles count by 4. Note that packages that would doubly boost the count are still erased (eg Jinja is erased because Jinja2 is in the top 200).

Packages that are python 3 compatible but lack the trove classifier won’t stay red if brought to my attention. I’ve always stated the WOS can only be as good as pypi, not better. Hoping that in time PyPI would become more accurate, this move saddens me a bit. To keep a bit of the spirit the artificially green packages have a red triangle signifying the maintainer’s lack of trove classifiers (again with a relevant mouse over).

The WOS is now written for python 2.7 and migrated to the HRD, woohoo!
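The marker rules in the list above could be sketched roughly like this. This is not the actual WOS code: wall_markers, its arguments and the returned dictionary are hypothetical, and treating any classifier that starts with “Programming Language :: Python :: 3” as proof of compatibility is an assumption.

```python
PY2_ONLY = "Programming Language :: Python :: 2 :: Only"
PY3_PREFIX = "Programming Language :: Python :: 3"


def wall_markers(classifiers, reported_py3_compatible=False):
    """Decide which markers a package gets, given its PyPI trove classifiers."""
    has_py3_classifier = any(c.startswith(PY3_PREFIX) for c in classifiers)
    return {
        # Compatible either by classifier or because someone reported it by hand.
        "compatible": has_py3_classifier or reported_py3_compatible,
        # Lock: the authors declared they do not intend to port.
        "lock": PY2_ONLY in classifiers,
        # Red triangle: compatible in practice, but the classifier is missing.
        "triangle": reported_py3_compatible and not has_py3_classifier,
    }


print(wall_markers(["Programming Language :: Python :: 3"]))
print(wall_markers([PY2_ONLY]))
print(wall_markers(["Programming Language :: Python"], reported_py3_compatible=True))
```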

Please do contact me if there are any more inaccuracies or mistakes. I’m reachable at ubershmekel at gmail and by comments on this blog.

P.S. We’re at 57/200, so maybe by this time next year we can have that Python 3 Wall of Superpowers party! Amen to that…