Category Archives: Programming

You may occasionally find yourself in a Django template needing to distinguish a boolean False from an integer 0 (bool vs. int).

There are multiple ways to do this, but if you don’t want to create a new template tag, and you don’t want to add logic to the view (because you’re a rebel), you can accomplish it using the “add” filter.

Take the variable “my_var”

If my_var is a ‘bool’ False:

my_var|add:"f" == "f"

If my_var is an ‘int’ 0:

my_var|add:"f" == ""

And all together now:

{% if my_var|add:"f" == "f" %}
I have a boolean False
{% else %}
I have a 0
{% endif %}
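Why is the workaround needed at all? Because in plain Python, False and 0 compare as equal (bool is a subclass of int), and that is exactly what a bare {% if %} comparison sees. A quick demonstration:

```python
# bool is a subclass of int, so equality cannot tell them apart
assert False == 0
assert isinstance(False, int)

# In plain Python you would check the type instead
my_var = False
assert type(my_var) is bool
assert type(0) is not bool
```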

Keep in mind that this logic may vary depending on what you’re comparing. It’s useful as-is ONLY if you know you’re trying to distinguish between False and zero (0). If my_var can hold values like “” or [] or other data types that equate to False in certain scenarios, you’ll have to experiment, but I personally had difficulty finding answers to this that didn’t involve building logic outside of the template file itself.

This was a monster for me to track down, so I’m hoping it helps many others.

The issue I ran into was that when trying to read an Excel file using Python’s xlrd package that was generated by PHPExcel, the following error was generated:

File read error [row 1]: Workbook corruption: seen[2] == 4

In search of a fix (preferably one that didn’t assume PHPExcel was just buggy, which it may be, but that’s another topic), I perused dozens of online threads. The actual developer of xlrd, John Machin, had commented on many of them for people with nearly identical issues, but all the solutions were specific workarounds that didn’t seem to apply to me.

Somewhere, I caught the tiniest glimpse of the phrase “Compound File Binary.” At first it didn’t lead to much, but it ultimately led me here:

Note that “Workbook” stream might be specific to PHPExcel, but there is a list_dir() method on the ole object that will show you the streams available. There’s also decent documentation in the zip download from the above link.
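A sketch of the resulting workaround, written against the olefile package (formerly OleFileIO_PL); the method names here are that library’s, which may differ slightly from what the post originally linked, and the function name is mine:

```python
def read_phpexcel_workbook(path):
    # Pull the raw BIFF "Workbook" stream out of the OLE (Compound File
    # Binary) container and feed it straight to xlrd, bypassing xlrd's
    # own container parsing. Requires the olefile and xlrd packages.
    import olefile
    import xlrd

    ole = olefile.OleFileIO(path)
    print(ole.listdir())  # shows every stream; the name can vary
    data = ole.openstream("Workbook").read()
    return xlrd.open_workbook(file_contents=data)
```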

For a project relating to Amazon, I found a need for an XSL template that could convert their XSD files (which are both numerous and meaty) into complete XML prototypes. This would be useful for two reasons.

First, it would generate a complete sample call for basically any API request for which they provide an XSD. Second, it would give me XML that I could programmatically convert to a self-generating form if necessary.

After digging around the web, to my surprise I was unable to find an XSL already made that accomplishes this, so I set about making my own.

The resulting XSL is included below.

Please note that while this appears (as of this writing) to be 100% functional for what I need on Amazon, it does not cover all possible XSD configurations, so it may need to be modified for more exotic XSDs. Also, it’s possible (and quite likely) that the XML generated will not validate against the given XSD. This is because I had to find ways to pass value-restriction metadata in a way that made sense in XML.

The two obvious cases are nodes with enumerations, in which the valid values are passed as <Value> subnodes, and nodes with extendable attributes, which had to be constructed in the same way as other nodes. In both cases, a flag is set as an attribute on the parent node.
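As a purely illustrative example (the element name, values, and flag attribute below are hypothetical, not taken from the real Amazon XSDs), an enumerated node renders along these lines:

```xml
<!-- hypothetical output: the flag attribute marks that the <Value>
     children are an enumeration list, not literal content -->
<ConditionType enumeration="true">
  <Value>New</Value>
  <Value>Used</Value>
  <Value>Refurbished</Value>
</ConditionType>
```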

If you make enhancements or fixes or if you find problems, feel free to let me know, and I will revise this.

The XSL is free for anyone to use, modify, etc. without restraint, but I’d certainly be happy to hear about it if it helps you out.

<!-- Node is referencing a base element - use that for the prototype either from this file or from any included files -->
<xsl:when test="@ref">
<xsl:variable name="elementRef" select="current()/@ref" />
<xsl:for-each select="//xs:element[@name=$elementRef]"><xsl:call-template name="element"/></xsl:for-each>
<xsl:for-each select="//xs:include">
<xsl:for-each select="document(@schemaLocation)//xs:element[@name=$elementRef]"><xsl:call-template name="element"/></xsl:for-each>
</xsl:for-each>
</xsl:when>

<!-- Carry all attributes through other than name (which is used for the element nodeName) and type (which we have to handle cases for) -->
<xsl:copy-of select="@*[local-name() != 'name' and local-name() != 'type']" />

<!-- Bring in any annotations for the element as a "note" attribute -->
<xsl:if test="./xs:annotation/xs:documentation">
<xsl:attribute name="note"><xsl:value-of select="./xs:annotation/xs:documentation" /></xsl:attribute>
</xsl:if>

One of the worst parts of a web application can be the variability of MySQL queries that get sent into your database. You can add indices, tweak hardware configurations, etc., but wouldn’t it be nice to simply kill any database query that takes longer than whatever you deem is “too long”?

Well, no, not any query. I would never want to kill a write — just a read-only query; in particular: search queries.

As it turns out, the hurdles for this are immense, and because my solution uses PHP’s pcntl_fork() function, it has to make assumptions and is not perfect, even though it works.

That being said, it would seem easy enough for this to be built into PHP or for some similar mechanism to be built into MySQL: Execute this query but only if it takes less than n seconds. If not, kill it. This is not the case, however, so we’re left to our own cleverness.

There are hundreds of reasons why you would never want to do this, but I only need one reason to want to do it to try to implement it.

So here are my solution’s steps in techno-layman’s terms, followed by the necessary code:

1. Open a shared memory space so that we can pass the query results back to the parent from the child
2. Store process state information in a database
3. Fork, and execute the query in the child process
4. Keep time in the parent process
5. Kill the child process if it takes longer than n seconds
6. Return the results
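The PHP code isn’t reproduced in this excerpt, but the parent/child pattern above can be sketched in Python, with a Queue standing in for the shared-memory segment (all names here are illustrative, and the “query” is a stand-in function):

```python
import multiprocessing
import time

def run_with_timeout(fn, timeout):
    # Parent/child split, mirroring the pcntl_fork() approach: the child
    # runs the (read-only) query while the parent keeps time.
    ctx = multiprocessing.get_context("fork")
    q = ctx.Queue()

    def worker():
        q.put(fn())  # child: execute the query, hand the result back

    p = ctx.Process(target=worker)
    p.start()
    p.join(timeout)      # parent: wait at most `timeout` seconds
    if p.is_alive():
        p.terminate()    # too slow: kill the child, abandon the query
        p.join()
        return None
    return q.get()

# A fast "query" completes; a slow one gets killed.
print(run_with_timeout(lambda: "rows", timeout=3))         # rows
print(run_with_timeout(lambda: time.sleep(5), timeout=1))  # None
```

Note that terminating the child only abandons the client side; killing the query on the MySQL server itself (e.g. via KILL QUERY) is a separate step.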

Just to forewarn: I have tested this as a proof of concept, but I am uncertain about the particulars of PHP’s shared memory and am not confident how reliable the shared memory implementation will be in a production environment. I plan to try it out, but right now I’m just getting the info out there.

Just a little snippet of sample code. Sometimes you just need a random string generator, and on top of that, a random form to test a page.

Maybe I’m just keeping this for my own future reference, but maybe someone else out there could use it too. 🙂

The function generates a string containing numbers and letters only (it’s easily customizable to contain other chars). The form just creates inputs with random names from the string generator and random values from the string generator.
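The original snippet isn’t reproduced in this excerpt; a minimal Python equivalent of the generator described (function name is mine):

```python
import random
import string

def random_string(length=16):
    # Numbers and letters only; add to `chars` for other characters
    chars = string.ascii_letters + string.digits
    return "".join(random.choice(chars) for _ in range(length))

print(random_string(8))  # e.g. 'aZ3kQ9xP'
```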

This is a very simple one, but one that can take a lot of google-search-query-nuance tweaking to find.

It’s easy to find the documentation to do PayPal Express Checkout, but to find that one little field where you send an alternate user as the recipient of the payments, well that’s downright impossible. It is not in the docs (at least not as of this writing 5-18-2011).

I have been an avid user of nano/pico since about 1999, and yes, many naysayers think it is crap for programming, but it works for me, and I like it.

That being said, one of the major issues I’ve had is that the Xorg select/paste always copies tab characters as the corresponding number of spaces. So, when I select text in one file and paste it into another, I have to replace all the spaces with tabs.

Typically I paste into gedit first, do the replace there, then c/p into the file. This preserves the tabs.

(If you’re still wondering why I use nano, I just like having my editor accessible as long as I have server access. I never got used to vi, and nano is more than effective for me.)

I have always thought nano should be able to search for and replace tabs and spaces, but I could never get it to work. Even without the gedit technique, I would typically just replace all double spaces with nothing, then manually insert the tabs. My workarounds, again, are generally sufficient. I have not NEEDED search and replace of tabs within nano, but today I decided I wanted to really find out if I could.

And it required some digging …

But, I finally found it: verbatim input!

Nano has a feature to disable character interpretation, and for one character, accept input literally.

To turn it on (again it’s for just the first character typed), hit alt-SHIFT-V (alt-V without shift may trigger x-windows menus), then just hit the tab key (it may or may not show a note that you’re in verbatim input mode).

You only need to do this in the search / replace prompts. Obviously, you can type tabs directly into the file.

So an example — let’s say you want to convert any instance of 8 spaces to a tab character.
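If you’d rather script the same conversion outside the editor, it’s a one-liner in Python (the function name is mine):

```python
def spaces_to_tabs(text, width=8):
    # Replace each run of `width` spaces with a single tab character
    return text.replace(" " * width, "\t")

print(spaces_to_tabs("name:" + " " * 8 + "value"))  # name:<TAB>value
```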

One of the shortcuts I always keep in my gnome taskbar is a link to a gnome-terminal with 3 tabs, each in the website base directory of my web development server.

It took me quite a long time to figure out how to get it to change to that directory and then stay logged in, even though it is rather simple. It also took me a long time again recently, because I reinstalled without backing that up. So, this post is mainly so I don’t have to do that again, but maybe it will help others too.

The trick is to make SSH behave like a terminal (with the oft-overlooked -t flag), then to execute a login bash shell using normal SSH command execution.
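For reference, the launcher command line looks like this (user, host, and path are placeholders):

```shell
# -t forces a pseudo-terminal; `exec bash -l` replaces the remote
# command with an interactive login shell so the session stays open
ssh -t user@devserver 'cd /var/www/mysite && exec bash -l'

# Wrapped in a gnome-terminal tab for the taskbar shortcut:
gnome-terminal --tab -e "ssh -t user@devserver 'cd /var/www/mysite && exec bash -l'"
```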

I have been heavily researching password storage lately, because I believe there are still lapses in internet security that keep it far behind what hackers are able to achieve, and, as a programmer, I want to keep such things as lock-tight as possible.

Shamefully, the power of rainbow tables (a term I only learned in the last few days) just clicked with me. For some background: any legitimate website or authentication system does not store your password in plain text. It uses a one-way encryption and stores that encryption. Then, to check if you used the right password, it encrypts the one you send it and compares it against the stored encryption. This minimizes opportunities for a copy of your password to be easily accessible, decryptable, or readable, in any sense, to someone who wants to use it for malfeasance.

The trick with all one-way encryptions used for such “hashing” purposes is that their resulting strings are fixed length. For example, sha1, which is probably the most widely used method on the web, creates ~~alphanumeric (0-9, a-z not case sensitive)~~ hexadecimal strings of length 40 (at least using the php sha1() method with which I am familiar; I’m not a sha1 expert). Because of this, there are possible collisions. In other words, although your password might encrypt to one string, there are effectively infinite other strings that would encrypt to that same string.

Before getting too uptight about this, just do the basic numbers. The sha1 encryption creates 40 character strings with ~~36~~ 16 possible characters in each place, meaning there are ~~36^40~~ 16^40 possible resulting encryptions. That number?

Is there even a word for that? Yes … ~~178.69 novemdecillion~~ 1.46 quindecillion. (that is a big number)
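Checking the arithmetic in Python (hex digests are 40 characters with 16 possibilities each):

```python
import hashlib

# sha1 hex digests are always 40 characters long
assert len(hashlib.sha1(b"anything").hexdigest()) == 40

total = 16 ** 40       # == 2 ** 160 possible digests
print(f"{total:.2e}")  # 1.46e+48 -- about 1.46 quindecillion
```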

So a hacker would build the corresponding rainbow table by finding exactly 1 string that converts to each of those ~~178,689 …~~ 1,461,501 … hashes. Then, somehow, they get your encrypted password, and all they have to do is look it up in this table, and they have your password (or at least one whose hash collides with yours).

It is very simple, but, fortunately, at this point, that rainbow table would not fit on all the hard drives in the world. In fact, ~~it would take about 50 billion earths to find an equivalent number of atoms~~ the earth is only composed of about 133 quindecillion atoms. (That’s one hash per 89 atoms — a human hair is about 10000 atoms thick).

But let’s abandon the seeming impossibility of the existence of this table temporarily. A common practice in secure storage of passwords these days is to “salt” the password before hashing and storing it. That is, you take the password, prepend (or append) a random string, then encrypt the resulting string. (More complex salts can exist, but the idea is that you modify the password in a predictable, repeatable way). What this does is force a hacker to build a corresponding rainbow table for EVERY password he/she wants to hack, because the unsalted rainbow table won’t work (I’ll leave this to you to figure out why if you don’t know at this point.)
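A minimal sketch of the salt-and-hash scheme just described (sha1 is kept only to match the article; names are illustrative, and modern systems would use a dedicated password-hashing function instead):

```python
import hashlib
import os

def hash_password(password, salt=None):
    # The salt is random per password and stored next to the hash; it is
    # not a secret, it just forces a fresh rainbow table per password.
    if salt is None:
        salt = os.urandom(16).hex()
    digest = hashlib.sha1((salt + password).encode()).hexdigest()
    return salt, digest

salt, stored = hash_password("correct horse")
# To verify a login attempt, re-hash with the stored salt:
assert hash_password("correct horse", salt)[1] == stored
assert hash_password("wrong horse", salt)[1] != stored
```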

The problem is, if you’re still using sha1 to encrypt the salted password, it’s no different than if you didn’t salt the password. If this complete sha1 rainbow table DID exist, the salt would serve absolutely no purpose, it would just shift the collision. Some infinite subset of strings would still map to this encrypted string.

As a side note, the rainbow tables that exist are simply compilations of common types of passwords, and, in many instances, they work, because people just don’t use strong enough passwords. This reduces the size of the tables to a usable form, so, in this case, the salting is imperative. I am discussing today the case where a complete sha1 rainbow table exists. I have said that it is pretty much impossible, but, from a purely theoretical standpoint, eventually, even the number ~~178.69 novemdecillion~~ 1.46 quindecillion will become small as technology improves with time.

By the way, “strong,” for all intents and purposes of one-way hashing, just means long. Knowing a password is short drastically decreases the number of hashes that need to be generated for the table. Good advice? Use passwords 20 characters or longer — i.e. a sentence “This is my password, baby. Nothing personal, but don’t steal it.”

My question is: is it really effective to use the same encryption mechanism after you salt the password? Salting is little more than an obfuscation in the presence of a complete rainbow table. In the face of the assumed existence of this rainbow table, and ignoring that most passwords aren’t truly “random,” there is exactly no difference between “salt and hash” and simply “hash.” This just leads me to the conclusion that the inherent flaw in one-way hashes is that they create strings of finite length, but that is exactly why they are useful.

Perhaps the only purpose of this post was to legitimately use the word ~~novemdecillion~~ quindecillion more than once. I dunno, but I would certainly appreciate expert commentary on the topic.

Edit: After initially writing this, I realized that sha1 created hex strings, not alphanumeric. This changed the numbers, but fortunately not the article. I decided to leave the old numbers in strikethrough because, well, novemdecillion is a cool freaking word.

If you’re a web developer and happen to develop software for use in e-commerce, chances are that somewhere along the line you’ll need or want to integrate with the big 4 shippers’ (UPS, USPS, FedEx, DHL) APIs. You’ll find right off the bat that they all offer rather robust APIs, so your options are sufficient.

Then you’ll get to programming and realize that the documentation is pretty crappy, but specifically I want to address the idiosyncrasies of the USPS “test” environment. Effectively, what USPS means when they say “test” is not a test of robustness of your application, but simply whether or not your application can build a sample request (an EXACT sample request), and send it to their server. Yeah — it’s like asking a math teacher to write the numbers 1 to 30 on a sheet of paper (in order) before he/she can get hired.

The problem is, the USPS docs don’t tell you this, nor do they show you the sample request. So, for others who are about to embark on a few hour journey finding these details on Google (or worse, emailing USPS directly …eeek) I’m going to sum up a few facts here.

The most laborious issue for me so far is the one I already mentioned above. For a rate request, the docs show you a RateV3Request, but in testing you can only use a RateV2 request (which does not support package dimensions). Also, you must use the zip codes 10022 and 20008 for origination and destination, as well as 10 lbs. 5 oz. for the weight, and “LARGE” for the size. Everything else (LAUGH) you have leeway with.
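Pieced together from the canned values above, the test request looks roughly like this (USERID is a placeholder, and the exact element list and order should be double-checked against the current RateV2 docs):

```xml
<!-- sketch only: verify element names against the RateV2 documentation -->
<RateV2Request USERID="YOUR_USER_ID">
  <Package ID="0">
    <Service>PRIORITY</Service>
    <ZipOrigination>10022</ZipOrigination>
    <ZipDestination>20008</ZipDestination>
    <Pounds>10</Pounds>
    <Ounces>5</Ounces>
    <Size>LARGE</Size>
    <Machinable>TRUE</Machinable>
  </Package>
</RateV2Request>
```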

If you don’t use these exact values, you’ll get responses like “Please enter a valid zip code for the sender” (which of course makes you think you wrote the XML incorrectly) or “The package size must be ‘Regular’, ‘Large’, or ‘Oversize.’” (even though you have “regular” quite clearly in the request).

The advice is to get to production as soon as possible. Why USPS would design things this way is beyond me, but them’s the cards; you gotta play ’em.

I will add more here as I find them obstaclicious enough (yeah I just made up that word).

Amendment 1: I should add that the issues about the documentation not mentioning the “canned” requests is only applicable to the PDF documentation. It is stated quite clearly in the HTML versions. Go figure …