
While auditing some new changes to an Open Source project I often use, I wrote a test-suite to surface edge cases and discovered something I had never noticed before: ASCII control characters made it safely across the entire stack as-is. They didn’t raise any alarms or trigger escaping from Form controls, User Generated Content Sanitizers, or HTML Safe Markup rendering.

Form control systems generally only trigger an error with ASCII controls if the input validator is built with a Regular Expression that allows specific characters/ranges. Most just check for valid ASCII characters. ASCII control characters do a lot of useful things, like providing “newlines” and “tabs”. Most are irrelevant, but one is particularly problematic — the backspace character. “Backspaces” are problematic because Web Browsers, Text Editors and Developer IDEs typically consider most ASCII control characters to be non-printing or “invisible” and don’t render them onscreen, but they are still part of the “document”. If you copy a document or document part to your clipboard, the “invisible” characters copy too. If you then paste them into an interpreter window, they are interpreted/executed in real-time — allowing a malicious user to carefully construct an evil payload.

An Example

Consider this trivial example with Python. It looks like we’re loading the yosemite library, right?

import yosemite

Wrong. Pasted into a Python interpreter, this will appear:

import os

How? There are ASCII backspace control characters after certain letters. When pasted into an interpreter, they immediately “activate”. Using the escape sequences, the actual string above looks like:

import y\bose\bm\bi\bt\be

If you want to try this at home, you can generate a payload with this line:

open('payload.txt', 'w').write("import y\bose\bm\bi\bt\be ")

Given a few lines of text and some creativity, it is not hard to construct “hidden” code that deletes entire drives or does other nefarious activities.
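On the filtering side, stripping these characters is a one-liner. Here is a minimal sanitizer sketch (the function name is mine, not from any particular library): it drops the non-printing ASCII controls while keeping tab, newline, and carriage return.

```python
import re

# ASCII controls are 0x00-0x1F plus 0x7F (DEL); keep tab (0x09),
# newline (0x0A), and carriage return (0x0D), which legitimately
# appear in user generated content.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def sanitize(text: str) -> str:
    return CONTROL_CHARS.sub("", text)

# The payload from the example above is harmless once stripped --
# the backspaces are gone, so there is nothing left to "activate":
print(sanitize("import y\bose\bm\bi\bt\be"))  # -> import yosemite
```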

It’s an Edge Case

This really only affects situations meeting both of the following requirements:

People retrieve code published via User Generated Content (or potentially main site articles)

The code is copy/pasted directly into an interpreter or terminal (I’ve tested this against Python, Ruby, and Node.JS)

The target audience for this type of vulnerability is developers and “technology organizations”; the potential vectors for bad payloads are anywhere developers copy code from.

What is and isn’t affected

I tested dozens of “internet scale” websites; only two filtered out the backspace control character: StackOverflow and Twitter. WordPress.com filters out the backspace, but the open source WordPress software does not.

I focused my search on the types of sites that people would copy/paste code from: coding blogs, “engineering team” notes, discussion software, bugtrackers, etc. Everything but the two sites mentioned above failed.

Bad payloads were also accepted in:

Facebook

Multiple Pastebin and ‘fiddle’ services

Most Python/Ruby/PHP libraries allowed bad payloads through Form validation or UGC sanitization.

Did I file reports with anyone?

I previously filed security reports with a handful of projects and websites, and offered patches to escape the backspace issue with some of them.

I decided to post this publicly because:

Open Source projects overwhelmingly believed the issue should be addressed by another component.

Commercial Websites believed this is a social engineering hack and dismissed it.

A long, long, long time ago I started using interpreter optimizations to help organize code. If a block of code is within a constant (or expression of constants) that evaluates to False, then the block (or line) is optimized away.

if (0) { ... }  # this is optimized away

Perl was very generous, and would allow for constants, true/false, and operations between the two to work.

Thankfully, Python has some optimizations available via the PYTHONOPTIMIZE environment variable (which can also be adjusted via the -O and -OO flags). These control the __debug__ constant, which is True (by default) or False (when optimizations are enabled), and will omit documentation strings if the level is 2 (or -OO) or higher. Using these flags in a production environment allows for very verbose documentation and extra debugging logic in a development environment.

Unfortunately, I implemented this incorrectly in some places. To fine-tune some development debugging routines I had some lines like this:

if __debug__ and DEBUG_CACHE:
    pass

Python’s interpreter doesn’t optimize that away when DEBUG_CACHE is False — not even when __debug__ itself is False under optimizations (I tested under 2.7 and 3.5). I should have realized this (or at least tested for it). I didn’t notice until I profiled my app and saw a bunch of additional statistics logging that should never have been compiled in.

The correct way to write the above is:

if __debug__:
    if DEBUG_CACHE:
        pass

Here is a quick test
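A self-contained version of such a test can be sketched with `dis` and `subprocess` (the function body is illustrative): disassemble a function guarded by `if __debug__:` and check whether the compiled bytecode still references DEBUG_CACHE.

```python
import subprocess
import sys
import textwrap

# The child program disassembles a debug-guarded function and reports
# whether the compiled bytecode still references DEBUG_CACHE.
prog = textwrap.dedent("""
    import dis, io

    DEBUG_CACHE = False

    def stats():
        if __debug__:
            if DEBUG_CACHE:
                print("collecting cache statistics")

    buf = io.StringIO()
    dis.dis(stats, file=buf)
    print("DEBUG_CACHE" in buf.getvalue())
""")

normal = subprocess.run([sys.executable, "-c", prog],
                        capture_output=True, text=True).stdout.strip()
optimized = subprocess.run([sys.executable, "-O", "-c", prog],
                           capture_output=True, text=True).stdout.strip()

print(normal)     # True  -- the DEBUG_CACHE test is compiled in
print(optimized)  # False -- the whole `if __debug__:` block was stripped
```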

Try running it with optimizations turned on:

export PYTHONOPTIMIZE=2
python test.py

and with them off:

unset PYTHONOPTIMIZE
python test.py

As far as the interpreter’s optimizer is concerned, `__debug__ and DEBUG_CACHE` is a runtime expression rather than a constant, so the block survives — even though a bare `if __debug__:` would have been optimized away.

The (forthcoming) Aptise platform features a web-indexer that tabulates a lot of statistical data each day for domains across the global internet.

While we don’t currently need this data, we may need it in the future. Keeping it in the “live” database is not really a good idea – it’s never used and just becomes a growing stack of general crap that automatically gets backed-up and archived every day as part of standard operations. It’s best to keep production lean and move this unnecessary data out-of-sight and out-of-mind.

An example of this type of data is a daily calculation on the relevance of each given topic to each domain that is tracked. We’re primarily concerned with conveying the current relevance, but in the future we will address historical trends.

Let’s look at the basic storage requirements of this as a PostgreSQL table:

This structure conceptually stores the same data, but instead of repeating the date in every row, we record it only once within the table’s name. This simple shift will lead to a nearly 40% reduction in size.
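The ~40% figure falls out of the row widths. As a rough sketch with hypothetical row shapes (the real schema differs, but the ratio is the point):

```python
# Hypothetical row shapes -- a date column repeated on every row versus
# the same row with the date moved into the table/file name.
with_date = len("2016-01-01,12345,678,0.123456\n")   # date repeated per row
without_date = len("12345,678,0.123456\n")           # date lives in the name

print(with_date, without_date)                 # 30 19
print(round(1 - without_date / with_date, 2))  # ~0.37 -> roughly 40% smaller
```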

In our use-case, we don’t want to keep this in PostgreSQL because the extra data complicates automated backups and storage. Even if we wanted this data live, having it spread across hundreds of tables is overkill. So for now, we’re going to export the data from a single date into a new file.

I skipped a lot of steps above because I do this in Python — for reasons I’m about to explain.

As a raw csv file, my date-specific table is still pretty large at 7556804 bytes — so let’s consider ways to compress it:

Using standard zip compression, we can drop that down to 2985257 bytes. That’s ok, but not very good. If we use xz compression, it drops to a slightly better 2362719.

We’ve already shrunk the data by 40% by eliminating the date column, so these numbers are a pretty decent overall improvement — but considering the type of data we’re storing, the compression is just not very good. We’ve got to do better — and we can.

We can do much better and actually it’s pretty easy (!). All we need to do is understand compression algorithms a bit.

Generally speaking, compression algorithms look for repeating patterns. When we pulled the data out of PostgreSQL, the rows came out in effectively random order.

We can help the compression algorithm do its job by giving it better input. One way is through sorting:

As a raw csv, this sorted date-specific table is still the same exact 7556804 bytes.

Look what happens when we try to compress it:

Using standard zip compression, we can drop that down to 1867502 bytes. That’s pretty good – we’re at 24.7% the size of the raw file AND about 63% the size of the non-sorted zip. That is a huge difference! If we use xz compression, we drop down to 1280996 bytes. That’s even better, at 17%.

17% compression is honestly great, and remember — this is compressing the data that is already 40% smaller because we shifted the date column out. If we consider what the filesize with the date column would be, we’re actually at 10% compression. Wonderful.
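Here is a small self-contained sketch of the effect, using synthetic (domain, topic, score) rows and the stdlib’s zlib/lzma — illustrative stand-ins for the real table, not the production data:

```python
import csv
import io
import lzma
import random
import zlib

random.seed(0)
# Synthetic stand-in for the relevance table: heavy repetition in the
# key columns, random scores -- the shapes are illustrative only.
rows = [(f"domain-{random.randrange(200)}.example",
         f"topic-{random.randrange(50)}",
         f"{random.random():.6f}")
        for _ in range(20000)]

def to_csv(rs):
    buf = io.StringIO()
    csv.writer(buf).writerows(rs)
    return buf.getvalue().encode()

unsorted_csv = to_csv(rows)
sorted_csv = to_csv(sorted(rows))

print(len(unsorted_csv) == len(sorted_csv))  # sorting never changes the raw size
print(len(zlib.compress(sorted_csv)) < len(zlib.compress(unsorted_csv)))
print(len(lzma.compress(sorted_csv)) < len(lzma.compress(unsorted_csv)))
```

Sorting groups identical domains and topics into long runs, which is exactly the kind of repeating pattern the compressor can exploit.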

I’m pretty happy with these numbers, but we can still do better — and without much more work.

As I said above, compression software looks for patterns. Although the sorting helped, we’re still a bit hindered because our data is in a “row” storage format. Consider this example:

As a raw csv, my transposed file is still the exact same size at 7556804 bytes.

However, if I zip the file – it drops down to 1425585 bytes.

And if I use xz compression… I’m now down to 804960 bytes.

This is a HUGE savings without much work.
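A sketch of the transposition step, again on synthetic rows — in the column store, each line holds one column’s values instead of one record:

```python
import lzma
import random

random.seed(0)
# Illustrative (domain, topic, score) rows, pre-sorted as in the step above.
rows = sorted((f"domain-{random.randrange(200)}.example",
               f"topic-{random.randrange(50)}",
               f"{random.random():.6f}")
              for _ in range(20000))

# Row store: one record per line, as a normal CSV export.
row_csv = "\n".join(",".join(r) for r in rows).encode()

# Column store: transpose, so each line holds one column's values --
# all domains together, then all topics, then all scores.
col_csv = "\n".join(",".join(col) for col in zip(*rows)).encode()

print(len(row_csv) == len(col_csv))  # transposing doesn't change the raw size
print(len(lzma.compress(col_csv)) < len(lzma.compress(row_csv)))  # columns win
```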

The raw data in PostgreSQL was probably about 12594673 bytes (estimated from the savings; the original file was deleted).
Stripping out the date information and storing it in the filename generated a 7556804-byte csv file – a 40% savings.
Without thinking about compression, just lazily “zipping” the file created a 2985257-byte file.
But when we thought about compression – sorting the input, transposing the data into a column store, and applying xz compression – we ended up with an 804960-byte file: 10.7% of the csv size and 6.4% of the estimated size in PostgreSQL.

This considerably smaller amount of data can now be archived onto something like Amazon’s Glacier and worried about at a later date.

This may seem like a trivial amount of data to worry about, but keep in mind that this is a DAILY snapshot, and one of several tables. At 12MB a day in PostgreSQL, one year of data takes over 4GB of space on a system that is treated for high-priority data backups. This strategy turns that year of snapshots into under 300MB of information that can be warehoused on 3rd party systems. This saves us a lot of time and even more money! In our situation, this strategy is applied to multiple tables. Most importantly, the benefits cascade across our entire system as we free up space and resources.

These numbers could be improved upon further by finding an optimal sort order, or even using custom compression algorithms (such as storing the deltas between columns, then compressing). This was written to illustrate a quick way to easily optimize archived data storage.

The results:

| Format   | Compression | Sorted? | Size     | % csv+date |
|----------|-------------|---------|----------|------------|
| csv+date |             |         | 12594673 | 100        |
| row csv  |             |         | 7556804  | 60         |
| row csv  | zip         |         | 2985257  | 23.7       |
| row csv  | xz          |         | 2362719  | 18.8       |
| row csv  |             | Yes     | 7556804  | 60         |
| row csv  | zip         | Yes     | 1867502  | 14.8       |
| row csv  | xz          | Yes     | 1280996  | 10.2       |
| col csv  |             | Yes     | 7556804  | 60         |
| col csv  | zip         | Yes     | 1425585  | 11.3       |
| col csv  | xz          | Yes     | 804960   | 6.4        |

Note: I purposefully wrote out the ASC in the SQL sorting above, because the sort order (and column order) does actually factor into the compression ratio. On my dataset, this particular column and sort order worked the best — but that could change based on the underlying data.

I remembered going through this before, but couldn’t find my notes. I ended up fixing it before finding my notes (which point to this StackOverflow question: http://stackoverflow.com/questions/2088569/how-do-i-force-python-to-be-32-bit-on-snow-leopard-and-other-32-bit-64-bit-quest), but I want to share this with others.

psycopg2 was showing compilation errors in relation to some PostgreSQL libraries.

Last week I found a security flaw on Dreamhost caused by the user experience of their control panel. I couldn’t find a security email, so I posted a message on Twitter. Their customer support team reached out and assured me that an emailed report would be addressed. Six days later I’ve heard nothing from them, so I feel forced to make a public disclosure.

I was hoping that they would do the responsible thing, and immediately fix this issue.

## The issue:

If you create a Subversion repository, there is a checkbox option to add on a “Trac” interface – which is a really great feature, as it can be a pain to set up on their servers yourself (something I’ve usually done in the past).

The exact details of how the “one-click” Trac install works aren’t noted though, and the integration doesn’t “work as you would probably expect” from the user experience path.

If you had previous experience with Trac, and you were to create a “Private” SVN repository on Dreamhost – one that limits access to a set of username/passwords – you would probably assume that access to the Trac instance is handled by the same credentials as the SVN instance, as Trac is tightly integrated into Subversion.

If you had no experience with Trac, you would probably be oblivious to the fact that Trac has its own permissions system, and assume your repository is secured by the option above.

The “one click” Trac install from Dreamhost is entirely unsecured – the immediate result of checking the box to enable Trac on a “private” repository is that you are inherently publishing that repo publicly through the Trac browser.

For example, if you were to install a private subversion and one-click Trac install onto a domain like this:

my.domain.com/svn
my.domain.com/trac

The /svn source would be private; however, it would be publicly available under /trac/browser due to the default one-click install settings.

Here’s a marked-up screenshot of the page that shows the conflicting options ( also on http://screencast.com/t/A2VQT5gOVkK )

I totally understand how the team at Dreamhost that implemented the Trac installer would think their approach was a good idea, because in a way it is. A lot of people who are familiar with Trac want to fine-tune the privileges using Trac’s own very-robust permissions system, deciding who can see the source / file tickets / etc. The problem is that there is absolutely no mention of an alternate permissions system contained within Trac – or that someone may need to fine-tune the Trac permissions. People unfamiliar with Trac have NO IDEA that their code is being made public, and those familiar with Trac would not necessarily realize that a fully unsecured setup is being created. I’ve been using Trac for over 8 years , and the thought of the default integrations being setup like this is downright silly – it’s the last thing I would expect a host to do.

I think it would be totally fine if there were just a “Warning!” sign next to “enable Trac” — with a link to Trac’s wiki for customization, or instructions (maybe even a checkbox option) on how a user can have Trac use the same authorization file as subversion.

But, and this is a huge BUT, people need to be warned that clicking the ‘enable Trac’ button will publicly publish their code until Trac is configured. People who are running Trac via an auto-install need to be alerted of this immediately.

This can be a huge security issue depending on what people store in Subversion. Code put in Subversion repositories tends to contain Third Party Account Credentials ( Amazon AWS Secrets/Keys, Facebook Connect Secrets, Paypal/CreditCard Providers, etc ), SSH Keys for automated code deployment, full database connection information, administrator/account default passwords — not to mention the exact algorithms used for user account passwords.

## The fix

If you have a one-click install of Trac tied to Subversion on Dreamhost and you did not manually set up permissions, you need to do the following IMMEDIATELY:

### Secure your Trac installation

If you want to use Trac’s own privileges, you should create this .htaccess file in the meantime to disable all access to the /trac directory:

deny from all

Alternately, you can map access to your Trac install to the Subversion password file with a .htaccess like this:
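For example, a standard Apache Basic-auth block along these lines (the AuthUserFile path here is hypothetical — point it at the htpasswd file generated for your private repository):

```apache
AuthType Basic
AuthName "Subversion repository"
# Hypothetical path -- use the password file created for your private SVN repo.
AuthUserFile /home/youruser/svn/my.domain.com.passwd
Require valid-user
```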

* All Third Party Credentials should be immediately trashed and regenerated.
* All SSH Keys should be regenerated
* All Database Accounts should be reset.
* If you don’t have a secure password system in place, you need to upgrade

## What are the odds of me being affected?

Someone would need to figure out where your trac/svn repos are to exploit this. Unless you’ve got some great obscurity going on, that’s pretty easy to guess. Many people still like to deploy using files served out of Subversion (it was popular with developers 5 years ago, before build/deploy tools became the standard); if that’s the case and Apache/Nginx aren’t configured to reject .svn directories — your repo information is public.

When it comes to security, play it safe. If your repo was accidentally public for a minute, you should wipe all your credentials.

In March of 2011 I represented Newsweek & The Daily Beast at the Harvard Business School / Committee of Concerned Journalists “Digital Leaders Summit”. Just about every major media property sent an executive there, and I was privileged enough to represent the newly formed NewsBeast (Newsweek+TheDailyBeast had recently merged, but have since split).

Over the course of two days, we covered a lot of concerns across the industry – analyzing who was doing things right and how/why others were making mistakes.

On the first day of the summit we looked at how Amazon was posturing itself for digital book sales – where it hoped profits would be, where losses were expected, and strategies for finding the optimal price structure for digital goods.

Inevitably, the conversation sidetracked to the Apple ecosystem, which had just announced Subscriptions and its eBooks plan — consequently becoming Amazon’s new competitor.

One of the other 30 or so people in attendance was Jeffrey Zucker from NBC, who went into his then-famous “digital pennies vs. analog dollars” diatribe. He made a compelling, intelligent, and honest argument that captivated the minds and attention of the entire room. Well, most of the room.

I vehemently disagreed with all his points and quickly spoke up to grab the attention of the floor… “apologizing” for breaking with the conventional view of this subject, and asking people to look at the situation from another point of view. Yes, it was true, as Zucker stated, that Apple standardized prices for digital downloads and set the pricing on their terms – not the producer’s. Yes, it was true that Apple allowed for records to be purchased “in part” and not as a whole – shifting purchase patterns. And yes to a lot of other things.

And yes – Jeffrey Zucker didn’t say anything that was “wrong” – everything he said was right. But it was analyzed from the wrong perspective. Simply put, Zucker and most of the other delegates were only looking at a portion of the scenario and the various mechanics at play. The prevailing wisdom in the room was way off the mark… by miles.

Apple didn’t gain dominance in online music because of their pricing system or undercutting retailers – which everyone believed. Plain and simple, Apple took control of the market because they made it fundamentally easier and faster for someone to legally buy music than to steal it. When they first launched (and still in 2012) it takes under a minute for someone to find and buy an Album or Single in the iTunes store. Let me stress that – discovery, purchase and delivery takes under a minute. Apple’s servers were relatively fast at the start as well – an entire album could be downloaded within an hour.

In contrast, legally purchasing an album at a store would take at least two hours – and at the time iTunes first launched, encoding an album to work on an MP3 player would take another hour. Downloading a record at that time would take even longer: services like Napster (already dead by the iTunes launch) could take a day; torrent systems could take a day; and while file upload sites were generally faster, they suffered from an issue that torrents and other options did as well – mislabeled and misdirected files.

Possibly the only smart thing the Media Industry has ever done to curb piracy is what I call the “I Am Spartacus” method — wherein “crap” files are mislabeled to look like Top 40 hits. For example: in expectation of a new Jay-Z record, internet filesharing sites are flooded with uploads that bear the name of the record… but contain white noise, another record, or an endless barrage of insults (ok, maybe not the last one… but they should).

I pretty much shut the room up at that point, and began a diatribe of my own – which I’ll repeat and continue here…

At the conference, Jeffrey Zucker and some other media executives tended to look at the digital economy like this: if there are 10 million Apple downloads of the new Beyonce record or the 2nd Season of “Friends”, those represent 10 million diverted sales of a $17.99 CD – or 10MM diverted sales of a $39.99 DVD. If Apple were to sell the CD for $9.99 with a 70% cut, they’re only seeing $7 in revenue for every $17.99 — 10 million times. Similarly, if 10MM people are watching Friends for $13.99 (or whatever cost) on AppleTV instead of buying $29.99 box sets, that’s about $20 lost per viewer — 10 million times.

To this point, I called bullshit.

Digital goods such as music and movies have incredibly diminished costs for incremental units, and for most of these products they are a secondary market — records tend to recoup their various costs within the first few months, and movies/tv-shows tend to have been wildly profitable on-TV / in-theaters. The music recording costs $17.99 and the DVD $29.99, not because of fixed costs and a value chain… but because $2 of plastic, or 2¢ of bandwidth, is believed by someone to be able to command that price.

Going back to our real-life example, 10MM downloads of “Friends” for 13.99 doesn’t equate to 10MM people who would have purchased the DVD for $39.99. While a percentage of the 10MM may have been willing to purchase the DVDs for the higher price, another — larger — percentage would not have. By lowering the price from 39.99 to 13.99, the potential market had likely changed from 1MM consumers to 10MM. Our situation is not an “apples-to-apples” comparison — while we’re generating one third the revenue, we’re moving ten times as many units and at a significantly lower cost (no warehousing, mfg, transit, buybacks, etc).

While hard copies are priced to cover the actual costs associated with manufacturing and distributing the media, digital media is flexibly priced to balance convenience with maximized revenue.

Typical retail patterns release a product at a given introductory price (e.g. $10) for a promotional period, raise it to a sustained premium for an extended period of time (e.g. $17), then lower it via deep discounted promotions for holiday sales or clearance attempts (e.g. $5). Apple ignored the constant re-pricing and went for a standardized plan at simple price-points.

Apple doesn’t charge 99¢ for a song, or $1.99 for a video, because of some nefarious plan to undervalue media — they came up with those prices because those numbers can generate significant revenue while being an inconsequential purchase. At 99¢ a song or $9.99 an album, consumers simply don’t think. We’re talking about a dollar for a song, or a ten-dollar bill for a record.

Let me rephrase that, we’re talking about a fucking dollar for a song. A dollar is a magical number, because while it’s money, it’s only a dollar. People lose dollar bills all the time, and rationalize the most ridiculous of purchases away… because it’s only a dollar. It’s four quarters. You could find that in the street or in your couch. A dollar is not a barrier or a thought. You’ll note that a dollar is not far off from the price of a candy bar, which retailers incidentally realized long ago that “Hey – let’s put candy bars next to the cash registers and keep the prices relatively low, so people make impulse buys and just add it onto their carts”.

Do you know what happens when you charge a dollar for something? People just buy it. At $13.99–$17.99 for a CD, people look at that as a significant purchase — one that competes with food, vacations, their children’s college savings. When you charge a dollar a song – or ten dollars a record – people don’t make those comparisons… they just buy.

And buy, and buy, and buy. Before you know it, people end up buying more goods — spending more money overall on media than they would have under the old model. Call me crazy, but I’d rather sell 2 items with little incremental cost at $9.99 each than 1 item at $13.99 — or even 1 item at $17.99.

Unfortunately, the current stable of media executives – for the most part – just don’t get this. They think a bunch of lawyers, lobbyists and paying off politicians for sweetheart legislation are the best solution. Maybe that worked 50 years ago, but in this day and age of transparency and immediacy, it just doesn’t.

Today: you need to swallow your pride, realize that people are going to steal, that the ‘underground’ will always be ahead of you, and instead of wasting time + money + energy on short-term bandaids which try to remove piracy (and need to be replaced every 18 months) — you should invest your time and resources into making it easier and cheaper to legally consume content. Piracy of goods will always exist; it is an economic and human truth. You can fight it head-on, but why? There will always be more pirates to fight; they’re motivated to free content, and they’re doubly motivated to outsmart a system. Fighting piracy is like a Chinese finger trap.

Instead of spending millions of dollars chasing 100% market share that will never happen (and I can’t stress that enough, it will never happen), you could spend thousands of dollars addressing the least-likely pirates and earn 90% of the market share — in turn generating billions more in revenue each year.

Until decision makers swallow their pride and admit they simply don’t understand the economics behind a digital world, media companies are going to constantly and mindlessly waste money. Almost every ( if not EVERY ) attempt at Digital Rights Management by major media companies has been a catastrophe – with most just being a waste of money, while some have resulted in long term compliance costs. I can’t say this strongly enough: nearly the entire industry of Digital Rights Management is a complete failure and not worth addressing.

Today, the media industry is at another crossroads. Intellectual property rights holders are getting incredibly greedy, and trying to manipulate markets which they clearly don’t understand. In the past 12 hours I’ve learned how streaming rights to Whitney Houston movies were pulled from major digital services after her death to increase DVD sales [I would have negotiated with digital companies for an incremental ‘fad’ premium, expecting the hysteria to die down before physical goods could be made], and read a dead-on comic by The Oatmeal on how it has – once again – become easier to steal content than to legally purchase it [ http://theoatmeal.com/comics/game_of_thrones ].

As I write this (Feb 2012), it is faster to steal a high-quality MP3 (or FLAC) of a record than it is to either: a) rip the physical CD to a digital version or b) download the item from iTunes (finding/buying is still under a minute). Regional release dates for music, movies and TV are unsynchronized (on purpose!), which ends up in the perverse scenario where people in different regions become incentivized to traffic content to one another — i.e. a paying subscriber of a premium network in Europe will illegally download an episode when it first airs on the affiliate in the United States, one month before the European date.

Digital economics aren’t rocket science, they’re drop-dead simple:

If you make things fast and easy to legally purchase, people will purchase it.

If you make things cheap enough, people will buy them – without question, concern, or weighing the purchase into their financial plans.

If you make it hard or expensive for people to legally purchase something, they will turn to “the underground” and illegal sources.

Piracy will always exist, innovators will always work to defy Digital Rights Management, and as much money as you throw at creating anti-piracy measures… there will always be a large population of brilliant people working to undermine them.

My advice is simple: pick your battles wisely. If you want to win in digital media, focus on the user experience and maximizing your revenue generating audience. If your content is good, people will either buy it or steal it – if your content is bad, they’re going somewhere else.

I’m glad to no longer be in corporate publishing. I’m glad to be back in a digital-only world, working with startups , advertising agencies, and media companies that are focused on building the future… not trying to save an ancient business model.

2016 Update

Re-reading this, I can’t help but draw the parallels to the explosion of Advertising and Ad Blocking technologies in recent years. Publishers have gotten so greedy trying to extract every last cent of Advertising revenue and including dozens of vendor/partner javascript tags, that they have driven even casual users to use Ad Blocking technologies.

I needed to upgrade from Python 2.6 to 2.7 and ran into a few issues along the way. Learn from my encounters below.

# Upgrading Python
Installing the new version of Python is really quick. Python.org publishes a .dmg installer that does just about everything for you. Let me repeat “Just about everything”.

You’ll need to do two things once you install the dmg; however, the installer only mentions the first item below:

1. Run “/Applications/Python 2.x/Update Shell Command”. This will update your .bash_profile to look in “/Library/Frameworks/Python.framework/Versions/2.x/bin” first.

2. After you run the app above, in a NEW terminal window do the following:

* Check to see you’re running the new python with `python --version` or `which python`
* Once you’re running the new python, re-install anything that installed an executable in bin. THIS INCLUDES SETUPTOOLS, PIP, and VIRTUALENV

It’s that second thing that caught me. I make use of virtualenv a lot, and while I was building new virtualenvs for some projects I realized that my installs were building against `virtualenv` and `setuptools` from the stock Apple install in “/Library/Python/2.6/site-packages” , and not the new Python.org install in “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages”.

It’s worth noting that if you install setuptools once you’ve upgraded to the Python.org distribution, it just installs into the “/Library/Frameworks/Python.framework” directory — leaving the stock Apple version untouched (basically, you can roll back at any time).

# Installing Mysql-Python (or the ruby gem) against MAMP

I try to stay away from Mysql as much as I can [ I <3 PostgreSQL ], but occasionally I need to run it: when I took over at TheDailyBeast.com, they were midway through a relaunch on Rails, and I have a few consulting clients who are on Django.
I tried to run cinderella a while back ( http://www.atmos.org/cinderella/ ) but ran into too many issues. Instead of going with MacPorts or Homebrew, I've opted to just use MAMP ( http://www.mamp.info/en/index.html )
There's a bit of a problem though - the people responsible for the MAMP distribution decided to clear out all the mysql header files, which you need in order to build the Python and Ruby modules.
You have 2 basic options:
1. Download the "MAMP_components" zip (155MB), and extract the mysql source files. I often used to do this, but the Python module needed a compiled lib and I was lazy so...
2. Download the tar.gz version of Mysql compiled for OSX from http://dev.mysql.com/downloads/mysql/
Whichever option you choose, the next steps are generally the same:
## Copy The Files
### Where to copy the files ?
mysql_config is your friend — at least the MAMP one is.
Make sure you can call the right mysql_config, and it'll tell you where the files you copy should be stashed. Since we're building against MAMP, we need to make sure we're referencing MAMP's mysql_config:
iPod:~jvanasco$ which myqsl_config
/Applications/MAMP/Library/bin/mysql_config

Note: if you’re installing for a virtualenv, this needs to be done after it’s been activated.

Set ARCHFLAGS on the command line:

export ARCHFLAGS="-arch $(uname -m)"

### Python Module

I found that the only way to install the module is by downloading the source (off SourceForge!).

I edited site.cfg to have this line:

mysql_config = /Applications/MAMP/Library/bin/mysql_config

Basically, you just need to tell the installer to use the MAMP version of mysql_config so it can figure everything out itself.

The next steps are simply:

python setup.py build
python setup.py install

If you get any errors, pay close attention to the first few lines.

If you see something like the following within the first 10-30 lines, it means the various files we placed in the step above are not where the installer wants them to be:

_mysql.c:36:23: error: my_config.h: No such file or directory
_mysql.c:38:19: error: mysql.h: No such file or directory
_mysql.c:39:26: error: mysqld_error.h: No such file or directory
_mysql.c:40:20: error: errmsg.h: No such file or directory

Note the include directory the compiler was searching - in my case, “/Applications/MAMP/Library/include/mysql”. When I quickly followed some instructions online that had all the files in /include — and not in that subdirectory — this error popped up. Once I changed the directory structure to match what my mysql_config wanted, the package installed perfectly.

Basically what’s happening is that the build drops a shared object (_mysql.so) into your environment, and at runtime that shared object looks for the MySQL client library in the location where the MySQL.org distribution would have installed it — which differs from where we placed it for MAMP.

There’s a quick fix — add this to your bash profile, or run it before starting mysql/your app:
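A likely candidate for that kind of fix - hedging here, since the exact line depends on your install - is pointing the dynamic linker at MAMP's MySQL library directory. The path below is an assumption based on a stock MAMP layout; verify it against the -L flag printed by your mysql_config --libs:

```shell
# Hypothetical fix: tell the dynamic linker where MAMP keeps libmysqlclient.
# The directory is an assumption from a stock MAMP install -- confirm it
# against the -L path in the output of mysql_config --libs.
export DYLD_LIBRARY_PATH="/Applications/MAMP/Library/lib/mysql:$DYLD_LIBRARY_PATH"
```
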

There are too many posts on this subject to thank individually. A lot of people posted variations of this method – to which I’m very grateful – but no one addressed troubleshooting the Python process, which is why I posted this.

I also can’t stress enough that if the MAMP distribution contained the header and built library files, none of this would be necessary.

I’m quickly prototyping something that needs to interact with Facebook’s API, and got absolutely lost in their documentation – which is plentiful, but poorly curated.

I lost a full day trying to figure out why my code wasn’t doing what I wanted it to do, and trying to understand how the SDK works so I could figure out what I was actually telling it to do. I eventually hit the “ah ha!” moment where I realized that by following the Facebook “getting started” guides, I was telling my code to do embarrassingly stupid things. This all comes down to the execution order, which isn’t really documented at all. Everything below should have been very obvious — and would have been obvious, had I not gone through the “getting started” guides, which really just throw you off track.

Here’s a collection of quick notes that I’ve made.

## Documentation Organization

Facebook has made *a lot* of API changes over the past few years, and all the old information is still up on their site… and out on the web. While they’re (thankfully) still supporting deprecated features, their documentation doesn’t always say which method is preferred – and the countless third-party tutorials and StackOverflow threads don’t either. The “Getting Started” documentation and the on-site + GitHub code samples don’t tie into the API documentation well either. If you go through the tutorials and demos, you’ll see multiple ways to handle a login/registration button… yet none seem to resemble what is going on in the API. There’s simply no uniformity, consistency, or ‘official recommendations’.

I made the mistake of going through their demos and trying to “learn” their API. That did more damage than good. Just jump into the [Javascript SDK API Reference Documentation](https://developers.facebook.com/docs/reference/javascript/) itself. After 20 minutes reading the API docs themselves, I realized what was happening under the hood… and got everything I needed to do working perfectly within minutes.

## Execution Order

The JavaScript SDK operates in the following manner:

1. Define window.fbAsyncInit – the function the SDK will call once Facebook’s JavaScript code is fully loaded. This requires, at the very least, calling the FB.init() routine. FB.init() registers your app against the API and allows you to actually do things.
2. Load the SDK. This is the few lines of code that start with "(function(d){ var js, id = 'facebook-jssdk'; …".
3. Once loaded, the SDK will call window.fbAsyncInit.
4. window.fbAsyncInit will call FB.init(), enabling the API for you.

The important things to learn from this are:

1. If you write any code that touches the FB namespace _before_ the SDK is fully loaded (Step 3), you’ll get an error.
2. If you write any code that touches the FB namespace _before_ FB.init() is called (Step 4), you’ll get an error.
3. You should assume that the entire FB namespace is off-limits until window.fbAsyncInit is executed.
4. You should probably not touch anything in the FB namespace until you call FB.init().

This means that just about everything you want to do either needs to be:

1. defined or run after FB.init()
2. defined or run through some sort of callback mechanism that fires after FB.init()

That’s not hard to do, once you actually know that’s what you have to do.
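As a minimal illustration of the second option (all names here are hypothetical), calls made too early can be queued until an init flag flips, then flushed:

```javascript
// Minimal sketch of the queue-based deferral pattern (hypothetical names):
// calls made before init are stashed; init flushes the stash in order.
var deferred = { ready: false, queue: [] };

function runWhenReady(fn) {
  if (deferred.ready) { fn(); } else { deferred.queue.push(fn); }
}

function markReady() {
  deferred.ready = true;
  while (deferred.queue.length) { (deferred.queue.shift())(); }
}

var order = [];
runWhenReady(function () { order.push('queued'); }); // stashed for later
markReady();                                         // flushes: runs 'queued'
runWhenReady(function () { order.push('direct'); }); // runs immediately
// order is now ['queued', 'direct']
```
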

## Coding Style / Tips

The standard way to integrate the Facebook API is to drop a few lines of script into your pages. The problem is that the how & why of this isn’t documented well, and isn’t linked to properly on their site. Unless you’re trying to do exactly what the tutorials are for – or want to code specific Facebook API calls on every page – you’ll probably get lost trying to get things to run in the order that you want.

Below I’ll mark up the Facebook SDK code and offer some ideas on how to get coding faster than I did… I wasted a lot of time going through the Facebook docs, reading StackOverflow, and reverse-engineering a bunch of sites that had good UX integrations with Facebook to figure this out.

// before loading the Facebook SDK, load some utility functions that you will write

One of the most annoying things I encountered is that Facebook has that little, forgettable line in their examples that reads:

// Additional initialization code here

You might have missed that line, or not understood its meaning. It’s very easy to do, as it’s quite forgettable.

That line could really be written better as:

// Additional initialization code here
// NEARLY EVERYTHING YOU WRITE AGAINST THE FACEBOOK API NEEDS TO BE INITIALIZED / DEFINED / RUN HERE.
// YOU EITHER NEED TO INCLUDE YOUR CODE IN HERE, OR SET IT TO RUN AFTER THIS BLOCK HAS EXECUTED ( VIA CALLBACKS, STACKS, ETC ).
// (sorry for yelling, but you get the point)

So, let’s explore some ways to make this happen…
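To make the wiring concrete, here’s a stripped-down sketch of the kind of loader block being described – modeled on the standard async SDK snippet of the era, with FB.init() followed by a call into a hypothetical fb_Utils.initialize(). The stubs stand in for the browser, the SDK, and fb_Utils so the execution order can be followed (and run) outside a real page; the appId value is a placeholder:

```javascript
// Stubs: stand-ins for the real browser/SDK/fb_Utils environment.
var window = {};
var FB = { init: function (opts) { FB._opts = opts; } };
var fb_Utils = {
  _initialized: false,
  initialize: function () { fb_Utils._initialized = true; }
};

// Step 1: define fbAsyncInit BEFORE the SDK loads.
window.fbAsyncInit = function () {
  FB.init({ appId: 'YOUR_APP_ID', status: true, cookie: true, xfbml: true });
  // "Additional initialization code here" -- everything that touches FB
  // belongs here (or is deferred until after this point):
  fb_Utils.initialize();
};

// Steps 2-4: in a real page, the async loader injects the SDK script, and
// the SDK calls window.fbAsyncInit when it finishes loading. We simulate
// that callback directly:
window.fbAsyncInit();
```
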

In the code above I called fb_Utils.initialize(), which would have been defined in /js/fb_Utils.js (or any other file) as something like this:

// grab a console for quick logging
var console = window['console'];
// i originally ran into a bunch of issues where a function would have been called before the Facebook API inits.
// the two ideas i had were to either:
// 1) pass calls through a function that ensures we've already initialized, retrying on an interval via a callback
// 2) queue calls into an array, and run them after initialization
// seems like both those ideas are popular, with dozens of variations on each used across popular sites on the web
// i'll showcase some of them below
var fb_Utils= {
_initialized : false
,
isInitialized: function() {
return this._initialized;
}
,
// wrap all our facebook init stuff within a function that runs post async, but is cached across the site
initialize : function(){
// if you wanted to , you could migrate into this section the following codeblock from your site template:
// -- FB.init({
// -- appId : 'app_id'
// -- ...
// -- });
// i looked at a handful of sites, and people are split between calling the facebook init here, or on their templates
// personally i'm calling it from my templates for now, but only because i have the entire section driven by variables
// mark that we've run through the initialization routine
this._initialized= true;
// if we have anything to run after initialization, do it.
while ( this._runOnInit.length ) { (this._runOnInit.pop())(); }
}
,
// i checked StackOverflow to see if anyone had tried a SetTimeout based callback before, and yes they did.
// link - http://facebook.stackoverflow.com/questions/3548493/how-to-detect-when-facebooks-fb-init-is-complete
// this works like a charm
// just wrap your facebook API commands in a fb_Utils.ensureInit(function_here) , and they'll run once we've initialized
ensureInit : function(callback) {
if(!fb_Utils._initialized) {
setTimeout(function() {fb_Utils.ensureInit(callback);}, 50);
} else {
if(callback) { callback(); }
}
}
,
// our other option is to create an array of functions to run on init
_runOnInit: []
,
// we can then wrap items in fb_Utils.runOnInit(function_here) , and they'll run once we've initialized
runOnInit: function(f) {
if(this._initialized) {
f();
} else {
this._runOnInit.push(f);
}
},
// a few of the Facebook demos use a function like this to illustrate the api
// here, we'll just wrap the FB.getLoginStatus call , along with our standard routines, into fb_Utils.handleLoginStatus()
// the benefit/point of this, is that you have this routine nicely compartmentalized, and can call it quickly across your site
handleLoginStatus : function(){
FB.getLoginStatus(
function(response){
console.log('FB.getLoginStatus');
console.log(response);
if (response.authResponse) {
console.log('-authenticated');
} else {
console.log('-not authenticated');
}
}
);
}
,
// this is a silly debug tool , which we'll use below in an example
event_listener_tests : function(){
FB.Event.subscribe('auth.login', function(response){
console.log('auth.login');
console.log(response);
});
FB.Event.subscribe('auth.logout', function(response){
console.log('auth.logout');
console.log(response);
});
FB.Event.subscribe('auth.authResponseChange', function(response){
console.log('auth.authResponseChange');
console.log(response);
});
FB.Event.subscribe('auth.statusChange', function(response){
console.log('auth.statusChange');
console.log(response);
});
}
};

So, with some fb_Utils code like the above, you might do the following to have all your code nicely asynchronous:

1. Within the body of your html templates, you can call functions using ensureInit()

fb_Utils.ensureInit(fb_Utils.handleLoginStatus)
fb_Utils.ensureInit(function(){alert("I'm ensured, but not insured, to run sometime after initialization occurred.");})

2. When you activate the SDK – probably in the document ‘head’ – you can decree which commands to run after initialization:

window.fbAsyncInit = function() {
// just for fun, imagine that FB.init() is located within the fb_Utils.initialize() function
FB.init({});
fb_Utils.runOnInit(fb_Utils.handleLoginStatus)
fb_Utils.runOnInit(function(){alert("When the feeling is right, i'm gonna run all night. I'm going to run to you.");})
fb_Utils.initialize();
};

## Concluding Thoughts

I’m not sure if I prefer the timeout based “ensureInit” or the stack based “runOnInit” concept more. Honestly, I don’t care. There’s probably a better method out there, but these both work well enough.

In terms of what kind of code should go into fb_Utils and what should go in your site templates – that’s entirely a function of your site’s traffic patterns, and of whether a given routine should be minimized for the initial page load or tossed onto every visitor.

A Tumblr posting just popped up on my radar about Apple trying to patent an app that is identical to one by the company Where-To [Original Posting Here].

The author shows an image comparing a line drawing in Apple’s patent to a screenshot of an application called “Where-To”. The images are indeed strikingly similar.

The author then opens:

> It’s pretty easy to argue that software patents are bad for the software industry.

Well yes, it is pretty easy to argue that. It’s also pretty easy to argue that Software Patents are really good for the software industry. See, you can cherry-pick edge cases for both arguments and prove either point. You can make an easy argument out of anything, because it’s easier to do that and argue on black&white philosophical beliefs than it is to think about complex systems.

That’s a huge problem with bloggers though – they don’t like to think. They just like to react.

The author continues:

>Regardless of where you stand on that issue, however, it must at least give you pause when Apple, who not only exercises final approval over what may be sold on the world’s largest mobile software distribution platform, but also has exclusive pre-publication access (by way of that approval process) to every app sold or attempted to be sold there, quietly starts patenting app ideas.

> But even if you’re fine with that, how about this: one of the diagrams in Apple’s patent application for a travel app is a direct copy, down to the text and the positions of the icons, of an existing third-party app that’s been available on the App Store for years.

Believe it or not, this happens ALL THE TIME. It’s not uncommon to see major technology companies include images from their biggest competitors in their patent diagrams. Patent diagrams are meant to illustrate concepts, and if someone illustrates something very clearly, you copy it. So you might see a Yahoo patent application that shows advertising areas that read “Ads by Google” ( check out the “interestingness” application Flickr filed a few years ago ), or you might have an Apple patent application that shows one very-well-done user interface by another company being used as an example to convey an idea. This isn’t “stealing” ( though I wonder how someone can argue both against and for intellectual property in the same breath ) – it’s just conveying a concept. Showing a concept or an interface in a patent doesn’t mean you’re patenting it; it just means you’re using it to explain a larger idea.

The blogger failed to mention a few really key facts:

1. This was 1 image out of 10 images.
2. Other screenshots included a Sudoku game, an instant message, a remote control for an airline seat’s console, a barcoded boarding pass, and a bunch of other random things.
3. The Patent Application is titled “Systems And Methods For Accessing Travel Services Using A Portable Electronic Device” — it teaches about integrating travel services through a mobile device: stuff like automating check-in, boarding, inflight services, and ground options for when you land. The Where-To app shows interesting things based on geo-location.

You don’t need to read the legalese claims to understand the two apps are entirely unrelated — you could just read the title, the abstract, or the layman’s description. If someone did that, they might learn this was shown as an interface to navigate airport services:

> In some embodiments, a user can view available airport services through the integrated application. As used herein, the term “airport services” can refer to any airport amenities and services such as shops, restaurants, ATM’s, lounges, shoe-shiners, information desks, and any other suitable airport services. Accordingly, through the integrated application, airport services can be searched for, browsed, viewed, and otherwise listed or presented to the user. For example, an interface such as interface 602 can be provided on a user’s electronic device. Through interface 602, a user can search for and view information on the various airport services available in the airport.

Apple’s patent has *nothing* to do with the design or functionality of the Where-To app. They’re not trying to patent someone else’s invention, nor are they trying to patent a variation of the invention or any portion of the app. They just made a wireframe of a user interface that they liked (actually, it was probably their lawyer or draftsman) to illustrate an example screen.

One of two things happened:

1. The blogger didn’t bother reading the patent, and just rushed to make conclusions of his own.
2. The blogger read the patent, but didn’t care — because there was something in there that could be controversial.

Whichever it was doesn’t matter — both illustrate my underlying point that 99% of people who are talking about software patents should STFU, because they’re unable or unwilling to address complex concepts. Whenever patent issues come up, the outspoken masses have knee-jerk reactions based on ideology (on all sides of the issue), and fail to actually read or investigate the issue.

There was even a comment where someone noted:

> Filing date is December 2009….which means Apple’s priority date is December 2008. From what I can see, this app went on sale in mid 2009….going to be hard to argue it is prior art.

They didn’t bother reading the application either. On the *very first line*, we see:

> [0001]This application claims the benefit of U.S. Provisional Patent Application No. 61/147,644, filed on Jan. 27, 2009, which is hereby incorporated by reference herein in its entirety.

How the commenter decided that *December 2008* was the priority date bewilders me - the actual priority date is written in that very first line! They also brought up the concept of ‘priority’, which is interesting, because it suggests they understand a bit about how the USPTO works. “Priority” lets an applicant use an earlier date as their official filing date under certain conditions — either a provisional application is turned into a non-provisional application, or a non-provisional application is split into multiple applications. In both of these cases no new material can be submitted to the USPTO after the ‘priority date’ — it’s just a convenient way to let inventors file information about their invention quickly, and have a little more time to get the legal format into full compliance. A provisional application does have 1 year to be turned into a non-provisional application — but there’s no backwards clock to claim priority based on your filing date.

I’ve been growing extremely unsatisfied with Apple over the past few years, and I’d love to see them get ‘checked’ by the masses over an issue. Unfortunately, there is simply no issue here.

*Update: The brilliant folks at TechCrunch have just stoked the fire on this matter too, citing the original posting and then improperly jumping to their own conclusions. They must be really desperate for traffic today. Full Article Here*