Recent Content:

PoC‖GTFO Issue 0x18

Pastor Manul Laphroaig's Montessori Soldering School and Stack Smashing Academy for Youngsters Gifted and Not

File Polyglottery

or, This Proof of Concept is Also a Picture of Cats

Over the past few years I’ve been a frequent contributor to and editor of the journal PoC or GTFO. My contributions have included helping to create several of the recent file polyglots for the electronic releases. I just presented a talk on this topic at BSides Philly 2017.

A polyglot is a file that can be interpreted as multiple different filetypes depending on how it is parsed. While polyglots serve the noble purpose of being a nifty parlor trick, they also have much more nefarious uses, e.g., hiding malicious printer firmware inside a document that subverts a printer when printed, or a document that displays completely different content depending on which viewer opens it. This talk does a deep dive into the technical details of how to create such special files, using examples from some of the recent issues of the International Journal of PoC||GTFO. Learn how we made a PDF that is also a valid NES ROM that, when emulated, displays the MD5 sum of the PDF. Learn how we created a PDF that is also a valid PostScript document that, when printed to a PostScript printer, produces a completely different document. Oh, and the PostScript also prints your /etc/passwd file, for good measure. Learn how to create a PDF that is also a valid Git repository containing its own LaTeX source code and a copy of itself. And many more!
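For a taste of the genre, here is a minimal sketch of the oldest trick in the polyglot book: a PDF that is also a valid ZIP archive. This is emphatically not the journal’s actual build process, and document.pdf is a stand-in for any PDF you have lying around. It works because ZIP readers locate the central directory from the end of a file, while PDF readers look for the header near the start.

    import shutil
    import zipfile

    # Copy the PDF, then append a fresh ZIP archive to it. Python's zipfile
    # mode "a" on a non-ZIP file appends a new archive after the existing
    # contents, which is exactly what we want here.
    shutil.copy("document.pdf", "polyglot.pdf")
    with zipfile.ZipFile("polyglot.pdf", "a") as z:
        z.writestr("cats.txt", "This proof of concept is also a picture of cats.\n")

    # The result still parses as a ZIP...
    print(zipfile.ZipFile("polyglot.pdf").namelist())  # ['cats.txt']
    # ...and most PDF viewers still render it, since they tolerate trailing
    # data after the final %%EOF.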

Anybody here interested in zoning? Open Data Philly now has the 2012 pre-reform zoning base map up, making it possible to compare the pre-reform period to the current zoning map. I’m very interested in finding out what’s happened in the 5 years since we started zoning remapping. I know that the Planning Commission is working on an analysis, but it is unlikely to be very comprehensive. An outside analysis, perhaps in partnership with a university, could be really impactful.

I had recently been toying with another mapping-related side-project
(which is not yet complete and therefore I’ll have to leave you in
suspense), so I offered Jon my assistance.

In the remainder of this article, I first describe the regulatory
reforms that have occurred over the past five years in the Philadelphia
zoning code, and why they render before-and-after zoning change
analysis challenging. I then describe my technical approach to
overcoming the challenges, which will likely only appeal to computer
scientists; you can feel free to skip that section unless you plan on
reusing, extending, or repurposing my code. Finally, I conclude with
the preliminary results of my analysis.

New Philly Zoning

Over the past five years, the borders of many zoning districts were
redrawn; some of this may have even occurred instantaneously when
transitioning from the old zoning code to the new. For example, the
600 block on the North side of Washington Avenue used to be zoned
R-10A, but is now subdivided: The interior parcels have been split
into a separate RSA-5 (single family row home) district and those
fronting Washington Avenue have been rezoned into CMX-2 (mixed-use).
The corner parcel at 7th Street is now RM-1 (multi-family
residential). Likewise, the 500 block used to be zoned almost
entirely R-10, but is now a mixture, with RM-1 fronting 6th Street.

Example of Zoning Subdivision

Furthermore, the GPS coordinates of the zoning districts for the
pre-2012 and post-2012 datasets
on Open Data Philly don’t always line up. So, even
if a zoning district’s borders didn’t change, the GPS coordinates of
its new boundary might not be identical to its pre-2012 boundary.
Therefore, there is not a straightforward way of mapping old zoning
districts to new ones, even using automation.

Even if the zoning boundaries did line up perfectly, there are over
thirty thousand zoning districts in Philly: Mapping old districts to
new ones using a naïve approach will require making over one billion
“Does this old district line up with this other new one?” comparisons.
Even if a human could do one comparison per second working eight hours
per day, he or she would require close to a hundred years to complete
the mapping. By then, there’s a good chance there will be another
change to the zoning code 🤣! Even if a computer could do
a hundred comparisons per second, it would take about four months
non-stop to complete.
So, how did I complete the mapping in less than a day? I answer that
in the following section. But if you’re impatient and just want to
skip to the analysis, it’s safe to do so.

Technical Approach

Here’s a rough outline of the naïve algorithm I mentioned above:

Algorithm 1 The dumb approach that will take a gazillion years.

Require: $Z = \{D_1, D_2, \ldots\}$ is a set of the old zoning districts, each being a list $D = \{p_{1}, p_{2}, \ldots\}$ of its boundary points. $Z^\prime$ is the list of new zoning districts.

Ensure: $M = \{ I_1, I_2, \ldots \}$ is a set of merged zoning districts, where each $I \in M$ is a tuple $\langle i, \{ j_1, j_2, \ldots \}, \{ D_1, D_2, \ldots \} \rangle$ with the first element containing the index $i$ of the new district $Z^\prime_i$, the second element containing the set of old districts $\{Z_{j_1}, Z_{j_2}, \ldots\}$ it subsumes, and the third element containing its new boundary.

1: procedure Merge-Zoning($Z$, $Z^\prime$)
2:   for all $D_i^\prime \in Z^\prime$ do
3:     for all $D_j \in Z$ do
4:       if $D_i^\prime \cap D_j \neq \emptyset$ then /* the zoning districts intersect */
5:         $J \gets \{j$ for all preexisting districts $D_j \in M$ that intersect with $D_i^\prime\}$

Using a combination of Shapely
and pyproj makes things like intersecting polygons
(i.e., zoning district boundaries) and calculating the
area inside polygons easy.
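For instance, here is a hedged sketch of the kind of operation this enables; the coordinates are made up (the real ones come from the Open Data Philly shapefiles), and EPSG:2272 is the Pennsylvania South state plane, so the area comes out in square feet:

    from pyproj import Transformer
    from shapely.geometry import Polygon
    from shapely.ops import transform

    # Two toy district boundaries as (lon, lat) rings:
    old_district = Polygon([(-75.155, 39.938), (-75.150, 39.938),
                            (-75.150, 39.941), (-75.155, 39.941)])
    new_district = Polygon([(-75.153, 39.939), (-75.148, 39.939),
                            (-75.148, 39.942), (-75.153, 39.942)])

    overlap = old_district.intersection(new_district)  # Shapely does the hard part
    if not overlap.is_empty:
        # Reproject lon/lat (EPSG:4326) to the PA South state plane (EPSG:2272)
        # so that .area is in square feet rather than degrees².
        to_feet = Transformer.from_crs("EPSG:4326", "EPSG:2272",
                                       always_xy=True).transform
        print(transform(to_feet, overlap).area)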

I’m not sure which algorithm Shapely uses for polygon intersection,
but its computational complexity is likely $\Omega((n\times
m)\log(n\times m))$, where $n$ and $m$ are the number of edges in the
polygons being intersected. The zoning districts have on average 39
edges, so that means a single intersection is going to take at least
5000 operations. Therefore, it’s not realistic to hit the
aforementioned speed of a hundred automated comparisons per second.
(In fact, my final implementation ended up averaging about two seconds
per comparison.)

The first optimization we can make stems from the observation that
zoning districts are non-overlapping. Therefore, any portion of an
old zoning district that overlaps with a new zoning district can be
deleted once they are merged:

Algorithm 2 A slightly less dumb approach.

1: procedure Merge-Zoning($Z$, $Z^\prime$)
2:   for all $D_i^\prime \in Z^\prime$ do
3:     for all $D_j \in Z$ do
4:       if $D_i^\prime \cap D_j \neq \emptyset$ then /* the zoning districts intersect */
5:         $J \gets \{j$ for all preexisting districts $D_j \in M$ that intersect with $D_i^\prime\}$
6:         $D_j \gets D_j \setminus D_i^\prime$ /* the merged portion of the old district can be deleted */
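Here is a hedged Python sketch of that idea using toy Shapely rectangles; the real logic lives in intersect_maps.py and differs in detail:

    from shapely.geometry import box

    # Stand-in districts: two "old" rectangles being carved up by three "new"
    # ones. Because districts don't overlap, each merged piece is subtracted
    # from its old district, so later intersection tests run on ever-smaller
    # geometry (and fully consumed old districts drop out entirely).
    old = [box(0, 0, 2, 1), box(2, 0, 4, 1)]
    new = [box(0, 0, 1, 1), box(1, 0, 3, 1), box(3, 0, 4, 1)]

    merged = []  # tuples of (new index, old indices subsumed, new boundary)
    for i, d_new in enumerate(new):
        subsumed = []
        for j, d_old in enumerate(old):
            if d_old is None or not d_new.intersects(d_old):
                continue
            subsumed.append(j)
            remainder = d_old.difference(d_new)  # delete the merged portion
            old[j] = None if remainder.is_empty else remainder
        merged.append((i, subsumed, d_new))
    print(merged)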

The final optimization that gets the running time down to tens of
hours instead of days is to get rid of the cumbersome $J$ calculation
using some clever caching. If you’re interested, check out intersect_maps.py in my code.

I could probably reduce the computation time by another order of
magnitude by more intelligently pruning old zoning
districts, e.g., by storing them in a data structure like
an R-Tree. I chose not to do that because it would
have required more development time, and the existing computation time
of “less than a day” was sufficient. I may add that feature in the
future.
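For the curious, here is a rough sketch of what that pruning could look like with Shapely 2.x’s STRtree; to be clear, this is not in my code:

    from shapely.geometry import box
    from shapely.strtree import STRtree

    # Index the old districts in an R-tree so each new district is only
    # compared against the few whose bounding boxes overlap it, instead of
    # against all ~30,000. In Shapely 2.x, query() returns indices into the
    # list the tree was built from.
    old_districts = [box(i, 0, i + 1, 1) for i in range(30000)]  # stand-ins
    tree = STRtree(old_districts)

    new_district = box(41.5, 0.5, 43.5, 1.5)
    candidate_idx = tree.query(new_district)  # cheap bounding-box prefilter
    hits = [old_districts[i] for i in candidate_idx
            if old_districts[i].intersects(new_district)]  # exact test
    print(len(hits))  # 3 candidates survive out of 30,000 possibilities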

Preliminary Analysis Results

At this point we have merged the old zoning districts with the new,
producing a KML file which can be loaded
into Google Earth to interactively explore the zoning
changes. You
can download it here.

Structural Density

The first thing I’d like to discuss is what I call structural
density. (I’m a Computer Scientist, not a City Planner, so please
excuse me if I’m not using the correct terminology.) I wanted to know:
What is the maximum number of square feet of building the zoning code
would allow to be built on a parcel if whatever’s currently there were
replaced? I calculated the structural density metric for both pre-2012
and post-2012 zoning.

Structural Density Changes

The results turned out to be less interesting than I had hoped; you
can download
the KML file here. The overall square footage capacity of the
city has increased with the new zoning code by over 38 million square
feet, which might seem like a lot, but that only constitutes about one
half of one percent change. Most of the changes seemed to emanate
from the reduction in the number of zoning district types, so old
zoning districts without a good modern analogue either had to be
“rounded down” or “rounded up.”

Residential Density

The next metric I investigated was what I call residential
density: an upper bound on the number of residents a parcel can
support. This is of course poorly defined, because the zoning code
itself doesn’t explicitly limit residency. I describe my methodology
and parameterization for estimating it in the Methodology and Caveats
section below.

Residential Density Changes

You
can download
the KML file for this map here. The residential capacity of the
city increased by about 386 thousand (~7%) from a maximum of
5,229,657 in 2012 to a maximum of 5,615,158 today. Assuming the
population grows at a steady but slow rate of 0.4% per year, it
will take at most $$\frac{\log\frac{\mbox{residential
capacity}}{\mbox{current population}}}{\log(1 + \mbox{growth rate})} =
\frac{\log\frac{5615158}{1567872}}{\log 1.004}\approx 320\
\mbox{years}$$ before the city will reach maximum residency under the
current zoning regulations. You heard it here first, folks! Time to
get the ball rolling with a new Zoning Code Commission so we’re
prepared for 2337!
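If you want to double-check that arithmetic:

    import math

    # Years until maximum residency at 0.4% annual growth, per the formula above:
    print(math.log(5615158 / 1567872) / math.log(1.004))  # ≈ 320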

Seriously, though, there are some other interesting patterns in the
map. There are a few clusters of down-zoning and reduced residential
capacity around the city, particularly in Strawberry
Mansion/Brewerytown, Queen Village/Southwark, deep South Philly,
Kensington, Harrowgate, and Frankford. The majority of these seem to
be due to old R-10 parcels (which permitted multi-family use) being
re-zoned RSA-5, which only permits single-family use. This primarily
happened in preexisting single family townhouse communities that
aren’t likely to be redeveloped any time soon. But re-zoning from
R-10 to RSA-5 constitutes a 60 to 70% reduction in the
residential capacity of a typical parcel.

With that said, across the board and even within the down-zoned
clusters, there are pockets of up-zoning, almost exclusively around
large retail corridors like American Street. This is thanks to
up-zoning to newly created mixed-use districts like CMX and IRMX.

Callowhill, NoLibs, Fishtown, and the river wards were the standout
beneficiaries of mixed-use up-zoning, with an abundance of what were
previously G-2 (medium density industrial) districts that have
now become a mixture of CMX, IRMX, and RMX.


Methodology and Caveats

The Internet tells me that the average number of residents per
household in Philadelphia is 2.35. (Someone should double-check that
with the latest census.) I also assume that the minimum space per
resident is 450 sqft.; lowering that requirement will increase
the maximum residency estimates for multi-family homes. For districts
that only permit single family homes, I say that their maximum
residency is 2.35. Likewise, for duplexes: 4.7. For zoning
districts that have a maximum height/floors and a minimum percentage
of the lot that must be open (e.g., R-10 and RM-1), I
calculate the maximum residency as: $$\frac{(\mbox{Lot Area}) \times (1 -
\mbox{Minimum Open Area Percentage}) \times (\mbox{Maximum Floors})}{450\ \mbox{sqft.}}.$$ For districts with a Floor Area
Ratio (FAR, e.g., R-11 and RM-2), I calculate
the maximum residency as: $$\frac{(\mbox{Lot Area}) \times
(\mbox{FAR})}{450\ \mbox{sqft.}}.$$ Finally, I round the maximum
residency down to the greatest multiple of 2.35.
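In code, that estimate looks roughly like the following sketch. The parameter names are mine, the real per-district values come from the zoning code’s tables, and single-family and duplex districts are simply the constants 2.35 and 4.7:

    import math

    AVG_HOUSEHOLD = 2.35      # average residents per Philadelphia household
    SQFT_PER_RESIDENT = 450   # assumed minimum space per resident

    def max_residency(lot_area, max_floors=None, min_open_pct=None, far=None):
        """Estimate a parcel's maximum residency per the formulas above."""
        if far is not None:                  # FAR districts, e.g., R-11, RM-2
            buildable = lot_area * far
        else:                                # height/open-area districts, e.g., RM-1
            buildable = lot_area * (1 - min_open_pct) * max_floors
        # Round down to the greatest multiple of 2.35:
        return math.floor(buildable / SQFT_PER_RESIDENT / AVG_HOUSEHOLD) \
            * AVG_HOUSEHOLD

    # A 1,440 sqft. lot allowing 3 floors with 30% minimum open area:
    print(max_residency(1440, max_floors=3, min_open_pct=0.30))  # 4.7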

The old zoning code had a nice series of tables that let me easily
determine the parameters for most zoning district classes such that I
could estimate maximum residency. However, not all of the
zoning classes were covered! I didn’t have time yet to pore over the
pages and pages of fifty-year-old legalese to grok the parameters for
the missing classes, so instead I used a relative mapping from old
zoning classes to new zoning classes that is provided in the new
zoning code. Here are the missing zoning classes and their mappings:

C1 → CMX-1        L1 → I-1
C2 → CMX-2        L2 → I-1
OC → CMX-2        L3 → I-1
C3 → CMX-3        L4 → ICMX
C4 → CMX-4        L5 → ICMX
C5 → CMX-5        G1 → I-2
LR → I-3          G2 → I-2

So, for example, if a parcel had an old zoning classification of “C4,”
I would assign it the same maximum residency as a modern “CMX-4”.
Please let me know if you can help me navigate and interpret the old
zoning code to improve these estimates.

For zoning districts with bonuses, I always assume that the maximum
bonus is achieved when making residency and square footage estimates.

Be mindful that we have to take some of these changes with a grain of
salt, because they affect large preexisting planned
communities/neighborhoods that weren’t before and aren’t now likely to
be redeveloped. For example, the theoretical maximum residency of the
Society Hill Towers dropped by over 50%, but they ain’t going
anywhere anytime soon.

By that, Cuban means software will soon begin writing itself, which will ultimately eliminate those lucrative software development jobs. About writing software, Cuban said: “It’s just math, right?” Humans will no longer be needed.

Hold my beer, I’ma explain some math.

Remember The Imitation Game? The Academy Award-winning
movie from a few years back, starring Buttercup Cummerbund and British
Natalie Portman? It was loosely based on Alan
Turing’s groundbreaking work cracking the Nazis’ cryptographic
codes, helping the Allies win the war. That very same dude is much
more famous among mathematicians and computer scientists for his work
on the Halting Problem.

The Halting Problem dates back to the seventeenth century, from
none other than Gottfried
Leibniz—the same dude who independently discovered Calculus,
even though Isaac Newton usually gets all the credit. Leibniz dreamt
of building a machine that could automatically check whether
mathematical formulas were correct or not. If you’ve gathered by this
point that mathematical formulas are effectively the same as computer
programs, then congratulations! Take a sip of my beer as a reward.

If you’re familiar with the Halting Problem, you can probably
already see where I’m going with this argument. I’ve been
asked why I chose to use the Halting Problem as a basis, as
opposed to going the route of simply—and
correctly—arguing that creating software is an immensely
complex engineering problem whose configuration space is
exponential. If the reader is technical like you, then they
undoubtedly already know this. If, however, the reader is not
familiar with any of these concepts, then I have found that it
is very difficult to overcome the public’s incorrect
perception that computers will continue getting faster and
faster and, therefore, will eventually be able to do anything.
As if hardware speed alone is what provides “intelligence.”
It’s very difficult to convince people otherwise, since it
basically requires teaching
P vs. NP. Therefore, it is a much simpler
rhetorical task to demonstrate that there are clear,
fundamental limits to what computers can even compute, meaning
that computers provably can’t make a perfect mapping from
imprecise specifications into a sufficiently correctly
functioning program. There needs to be some “intelligence” in
the equation: either a programmer or strong AI.

Fast forward to the early 20th century, and a sort of
revolution was going on in the world of mathematics: New systems were
being developed to formally encode logic. Think of it like
the system of algebra you learned in grade school, except that instead
of numbers you’re working with logical statements. David Hilbert was at the forefront of
this research, and in 1928 posed the question: Is it possible to
devise an algorithm (viz., a computer program)
that can automatically determine whether a given logical
statement is universally valid? This became known with the
extraordinarily Deutschtastic name „Entscheidungsproblem“. A
few years later, Austrian expat Kurt Gödel—who is also famous
for discovering a “bug” in the US constitution at an
immigration court appearance with Albert Einstein—proved his
incompleteness theorems, showing that any formal system expressive
enough to be interesting must contain true statements that it can
never prove. In his seminal paper titled „Über formal unentscheidbare
Sätze der Principia Mathematica und verwandter Systeme I“ (“On
Formally Undecidable Propositions of Principia Mathematica and
Related Systems I”), he shattered Hilbert’s dream both in theory and
in Deutschtasticness.

So, back to Turing. A few years later, but before the events that
inspired The Imitation Game, Turing created a mathematical
definition of a computer which we now call a Turing Machine,
and on which practically all modern computers and computer science are
based. Turing realized that the logical statements and algorithms
Hilbert and Gödel were working with were in fact no different than the
programs that could be computed by his Turing Machine. He then
restated the entscheidungsproblem in this context: Is it possible to
write a program for a Turing Machine that can take another
program and that other program’s input and automatically
determine whether that other program will terminate? This became
known as the Halting Problem and, just like the
entscheidungsproblem, Turing was able to prove that the answer is
“No”: It is impossible to create a program that can automatically
determine whether another program will terminate.
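The proof is short enough to sketch right here in Python-flavored pseudocode. Suppose, for the sake of contradiction, that a perfect halts() oracle existed:

    # Hypothetical oracle: halts(program, input) returns True exactly when
    # program(input) would eventually terminate. Turing's diagonal argument
    # shows that no such total, always-correct function can exist:
    def paradox(program):
        if halts(program, program):  # the oracle says we'd halt...
            while True:              # ...so loop forever instead,
                pass
        else:
            return                   # ...otherwise halt immediately.

    # Now consider paradox(paradox). If halts(paradox, paradox) returns True,
    # paradox loops forever; if it returns False, paradox halts. Either way
    # the oracle is wrong, a contradiction. Hence halts() cannot exist.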

Please indulge me as I briefly delve into metaphysics and, dare I
say, even digital physics. Can our universe be described
using logic and/or computation? In other words, if there were some
infinitely powerful Turing machine, could one write a program to run
on it such that it could accurately simulate the entire universe?
If so, then we humans are bound by the same theorems that prove
mathematical logic is incomplete and that the Halting Problem is
undecidable. In other words, it’d be impossible for even
humans to prove that a given program is correct.

The Halting Problem is part of
what makes programming computers hard: Any program with even a modicum
of real-world complexity cannot have provably correct behavior. Those
logical flaws and unintended behaviors are precisely what hackers
exploit to make programs do things that the programmer never intended.
Human programmers are only able to do a passable job of quality
control because we have the benefit of heuristics and instincts that
are guided by our general intelligence.

Let’s say we develop a computer program that can take a human’s
high-level description of a task and automatically generate a computer
program that can complete that task. There are two possibilities:

1. the human is specifying all of the complex logic, and the computer is simply translating that specification into a program by rote; or

2. the human is not specifying any complex logic, and the computer needs to determine the logic that needs to be in the program to accomplish the task.

The first possibility is effectively describing a human computer
programmer and compiler. The second possibility is what Cuban
seems to be describing. The trouble is that, by virtue of the Halting
Problem, no such program could ever determine whether its output is
correct! Furthermore, automatically determining the logic that needs
to be in a program to accomplish a given task is provably
computationally hard, and cannot be
solved in general in a reasonable amount of time, regardless of
how fast computers get.

Could I write a program that accepts inputs like, “Create a program
that displays pictures of cats,” and then automatically generates a
program to do so? Sure. Could I create a program that automatically
creates a program that is able to interface with the highly formally
specified interface of another automatically generated program?
Probably. But the vast majority of software projects are for solving
complex human problems, and typically involve integration with
numerous legacy programs and software that were written by crazies,
idiots, and crazy idiots. Professional programming sucks. Don’t
believe me? If you take away nothing else from this essay, please read this article. I’m serious, read it. Any professional-grade program that can
automatically generate other programs will have to grok how to
interface with the undocumented source code of other humans. That’s
impossible unless the computer is intellectually indistinguishable
from a human.

In order for a computer to automatically generate a program that
solves a human’s problem, there either needs to be a human deciding
the logic (i.e., a programmer), or the computer’s
intelligence needs to be indistinguishable from that of a human.
And if we have computers that are intellectually indistinguishable
from humans, we’ll have more issues to deal with than simply losing
software jobs.

PoC‖GTFO Issue 0x13

PASTOR LAPHROAIG'S MERCY SHIP HOLDS STONES FROM THE IVORY TOWER, BUT ONLY AS BALLAST!

PoC‖GTFO Issue 0x12

COLLECTING BOTTLES OF BROKEN THINGS, PASTOR MANUL LAPHROAIG WITH THEORY AND PRAXIS COULD BE THE MAN WHO SNEAKS A LOOK BEHIND THE CURTAIN!