jc blog - tales of a modern-day nomadic hunter-gatherer

This is the weblog of Intrepid Wanderer. You never know what you
might find here;
graphic descriptions of bodily functions, computer programming secrets,
proselytizing for the antichrist, miscellaneous ranting and kvetching,
valuable information on living off the land...
if you don't share my rather weird interests you may want to try
slashdot instead.

If you want to comment on anything you see here, try the new Facebook comments,
reachable by clicking the "[comment]" link at the end of each post.
If for some reason that isn't working, go ahead and email me,
jc.unternet.net. You know what to do with the first dot. Make the 'subject'
line something reasonably intelligent-looking or it goes plunk! into the
spambasket unread.

This RSS feed may or may not work. Haven't fiddled with it in forever.

it always happens. once I get into Programming Mode, I start getting all kinds
of wild ideas how to optimize stuff. one of these is related to pipelining
CSV output between programs, and I'm thinking about ways to reduce the
overhead of packing and unpacking the data, keeping its type, i.e.,
string, float, integer, etc., specified explicitly in the encoding.

not that I've done any research to see if CSV packing and unpacking is even
significant... bleah.

one idea is using VLQ encoding, with a tag byte before each quantity indicating its type.
table headers would also have their own type byte, as would table rows, so I
could probably leave off the type bytes within the row altogether,
by including them in the table headers just after the name.
for example, let's say the table header type byte is 0xf0, table row 0xf1.
string is 0, int is 1, and a float with scale of 2, a common financial
spreadsheet format, is 2. here's some sample CSV:

id,dept,amount
1,shoes,1.37

and its equivalent (thanks to the WP article giving me the encoding for 137):

\xf0id\x00\x01dept\x00\x00amount\x00\x02\x00 # final \x00 marks end of header
\xf1\x01shoes\x00\x81\x09

rows need no end marker because the number and types of their fields are known
from the header. a final \xff byte can mark the end of the table.
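
a minimal sketch of that encoding in Python, assuming the tag values given above (0xf0 header, 0xf1 row, 0xff end of table; 0 string, 1 int, 2 scale-2 number). the function names are made up for illustration, strings are assumed to be NUL-terminated UTF-8, numbers are assumed non-negative, and no newline is written between header and rows:

```python
import csv
import io
from decimal import Decimal

HEADER, ROW, END = b'\xf0', b'\xf1', b'\xff'
STRING, INT, FIXED2 = 0, 1, 2  # type codes from the post


def vlq(number):
    """Encode a non-negative int as a big-endian variable-length quantity."""
    if number < 0:
        raise ValueError('this sketch only handles non-negative numbers')
    groups = [number & 0x7f]          # last byte has the high bit clear
    number >>= 7
    while number:
        groups.append((number & 0x7f) | 0x80)  # earlier bytes set the high bit
        number >>= 7
    return bytes(reversed(groups))


def encode_value(text, type_code):
    """Encode one CSV field according to its column type."""
    if type_code == STRING:
        return text.encode('utf-8') + b'\x00'
    if type_code == INT:
        return vlq(int(text))
    if type_code == FIXED2:
        # Decimal avoids float rounding: '1.37' becomes exactly 137
        return vlq(int(Decimal(text).scaleb(2)))
    raise ValueError('unknown type code %r' % type_code)


def encode_table(csv_text, type_codes):
    """Pack CSV text (first line is the header) into the tagged binary form."""
    rows = csv.reader(io.StringIO(csv_text))
    names = next(rows)
    packed = bytearray(HEADER)
    for name, code in zip(names, type_codes):
        packed += name.encode('utf-8') + b'\x00' + bytes([code])
    packed += b'\x00'                 # empty name marks end of header
    for row in rows:
        packed += ROW
        for text, code in zip(row, type_codes):
            packed += encode_value(text, code)
    packed += END
    return bytes(packed)
```

calling encode_table('id,dept,amount\n1,shoes,1.37\n', [INT, STRING, FIXED2]) gives the header and row bytes shown above, followed by the closing \xff.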
[comment]

I don't know if I was fired or not, but I don't see much hope for getting paid,
so just for shits and grins I decided to abandon Spark, and even Pandas,
for a couple days and see if I could duplicate the client's scripts with
some simple Python using the csv module. turns out it wasn't that difficult, and
though it's not fast yet, its output is equivalent to the Pandas scripts it
was based on, except it doesn't add the gigabytes of duplicated rows the
original scripts do (due to there being duplicated rows in the input data).

and it's not a memory hog.

I believe I can make it fast, too. I haven't really started optimizing yet.
[comment]

this past couple of weeks (or so... one day runs into another and I lose
all track of time) I've been fighting to hold onto a job that could be big
money if I don't get lost in the woods. the client's client is using Pandas
to sling data around, but his scripts are all single-threaded and only
utilize a single processor. so he wants it ported to Spark. but Spark is buggy
-- version 2.0.2, which is what Amazon offers by default on their EMR clusters,
has a bug in dataframe.join() which breaks any left outer joins (at least)
given certain parameters which I haven't tracked down yet, but certainly
my code (ported directly from his) triggered it. and 2.0.2 is only about 5
months old. so we're talking really basic stuff not working only 5 months ago --
hardly a mature platform. 2.1.0 fixes that, but it still has a bug
that makes calculated columns (dataframe.withColumn(...)) disappear on a
following join. so you have to write the data out to disk and read it back
in before you do the join! that forces the calculations to actually be
performed. I'm sure there's an easier way, but that's the way the client's
client did it and I'm just learning it as I go.

but stepping back, getting in the cockpit and flying over the problem at
ten miles up, I'm looking at all this shit and shaking my head. this isn't
programming. I've been saying for a while now that all these frameworks are
an impediment to getting work done. people go from storing data in flat files
to SQL databases, then on to NoSQL and S3. over a period of several decades,
we've gone full circle! instead of using the filesystem to store and reference
data, we're now using key-value stores, which accomplish the same thing!
even the pseudo-file structure of keys on S3 matches a POSIX file pattern.

some user on StackOverflow (I think it was abarnert but can't find the thread at
the moment) pointed out that simple Unix pipelining makes the best use of all
processors and RAM. string together a bunch of scripts that each do one thing
and do it well, feeding the output to the next script. if this client gives me
the leeway, I can do that with this problem. for
example, all a "left outer join" does is copy all columns from one table
to another (appending them to the end) if a certain condition applies (such as,
in this case, having the same value for the same key column name). I can
script that using just a few lines of Python with the CSV module. for that
matter, I could pipeline all the input CSV files through a filter first that
changes the separator character to something that is close to 100%
certain not to appear in business spreadsheets (such as this?), and just split on that in all downstream pipes, eliminating that module.
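
a rough sketch of such a left outer join using only the csv module. the function name and its arguments are hypothetical, the right-hand table is assumed to fit in memory, and non-key column names are assumed not to collide between the two tables:

```python
import csv


def left_outer_join(left_file, right_file, out_file, key):
    """Append the right table's columns to every left row with a matching key.

    Left rows with no match are kept, with empty strings for the right columns.
    """
    left = csv.DictReader(left_file)
    right = csv.DictReader(right_file)
    extra = [name for name in right.fieldnames if name != key]

    # index the right table by key; a key may occur more than once
    matches = {}
    for row in right:
        matches.setdefault(row[key], []).append([row[name] for name in extra])

    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerow(left.fieldnames + extra)
    no_match = [[''] * len(extra)]
    for row in left:
        base = [row[name] for name in left.fieldnames]
        for tail in matches.get(row[key], no_match):
            writer.writerow(base + tail)
```

in a pipeline this could be called with sys.stdin as left_file and sys.stdout as out_file. note that duplicated keys in the right table still produce duplicated output rows, as in any SQL-style join; deduplicating the input would be a separate filter earlier in the pipe.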

one advantage of this pipeline approach is that you can, by various methods,
set the process name to something that indicates exactly what the script does,
like convert_cp1252_to_utf8 or left_outer_join. so when you
start up top, instead of a hundred lines of java or python, you see exactly what's going on!

my inner tinfoil-hat-conspiracy-nut thinks that these frameworks are simply a
big scam. I'm guessing they're being pushed by colleges to CS majors, and by
big corporations, whose decision-makers are being wined and dined by the
companies who stand to make big money training and supporting these bloated
pieces of bug-ridden software. they probably hook them at conferences and
conventions and such... I don't know, I've been out of the loop for almost 2 decades already.

but what I do know is, most of my coding now isn't coding at all, it's trying
to figure out what the fuck the framework is doing to my data, and trying to
find a workaround for it. I want to get back to programming.
[comment]

well, that was easy enough. Googled and found it at https://repo1.maven.org/maven2/asm/asm/3.2/, downloaded it and moved it to ~/.m2/repository/asm/asm/3.2/. now it's
attempting to load files over s3n:// instead of summarily complaining
that the filesystem was unsupported.
[comment]

oh yeah, I beat that cold. 3 days of massive probiotic use, with just an occasional scratchy throat, no coughing or sneezing to speak of. then another day with the yogurt and brine just to make sure. and now a day or two on the normal
regimen. I love this shit.
[comment]

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: asm#asm;3.2!asm.jar]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

and Googling doesn't give any results intelligible to a non-Hadooper. it's failing a download. so now what? what a load of crap.
[comment]

just had an epiphany. sometimes I need really verbose debugging messages when
testing, but don't want to see all that crap when running the program with full
data. so I took an idea I had years ago for JavaScript and adapted it for
Python:

that lambda *args: None construct simply creates a no-op that accepts
any number of args. now I can sprinkle
DOCTESTDEBUG('array is: %s', listing) statements in my routines
and know they'll only be executed during doctests. neat, huh?
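
one way this could look, as a sketch and not necessarily the original code: DOCTESTDEBUG defaults to a no-op and is swapped for logging.debug just before the doctests run. the total() routine and the run_doctests() helper are made up for illustration:

```python
import doctest
import logging

# no-op by default: accepts any number of positional args and does nothing
DOCTESTDEBUG = lambda *args: None


def total(listing):
    """Add up a list of numbers.

    >>> total([1, 2, 3])
    6
    """
    DOCTESTDEBUG('array is: %s', listing)
    return sum(listing)


def run_doctests():
    """Turn the debug messages on, then run this module's doctests."""
    global DOCTESTDEBUG
    logging.basicConfig(level=logging.DEBUG)
    DOCTESTDEBUG = logging.debug  # same call signature as the no-op
    return doctest.testmod()
```

calling run_doctests() from an if __name__ == '__main__': guard keeps the messages out of normal runs. logging writes to stderr by default, so the extra output doesn't interfere with the stdout that doctest compares.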

what I used to use for JavaScript stopped working a few years back. it switched
between alert(something) and console.log(something). but
now if console.log isn't working I just put in alerts until I find
the broken statement. rarely necessary any more, which is good, because I've
forgotten the exact syntax I used and don't know if I ever figured out what
in the ECMAScript changes broke it.
[comment]

Python doctests seem to ignore changing global variables, even if you
set them using:

>>> globals()['INFER_SCHEMA'] = True

though INFER_SCHEMA was recognized as True in the doctest, the called
routine still saw it as False. so I instead added a 3rd arg to the routine
to indicate whether to treat the columns as string or double values, and
the doctest works. sooner or later, even the program may work. the deadline
has been moved up to Tuesday. I'm hopeful.
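
a likely explanation, with a sketch to go with it: doctest runs its examples against a shallow copy of the module's globals, so globals() inside a doctest refers to that copy, while the routine under test still reads the real module globals. the sketch below imitates that with exec() and a copied dict; column_type and doctest_globs are hypothetical names:

```python
INFER_SCHEMA = False


def column_type(infer_schema=None):
    """Return 'double' or 'string'; an explicit argument overrides the global."""
    if infer_schema is None:
        infer_schema = INFER_SCHEMA
    return 'double' if infer_schema else 'string'


# imitate doctest: run the "examples" in a copy of the module's globals
doctest_globs = dict(globals())
exec(
    "globals()['INFER_SCHEMA'] = True\n"      # only changes the copy
    "seen_in_test = INFER_SCHEMA\n"            # True, read from the copy
    "seen_by_routine = column_type()\n"        # routine reads the real global
    "seen_with_arg = column_type(True)\n",     # the extra-argument workaround
    doctest_globs,
)
```

afterwards doctest_globs['seen_in_test'] is True, but doctest_globs['seen_by_routine'] is still 'string', matching what the post describes. passing the flag as an argument sidesteps the problem entirely.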
[comment]

some virus is attempting to take me down, but I've so far been thwarting it
with probiotics: yogurt and brine pickled garlic and Brussels sprouts. but I've
got to lay off the coffee, wine, and beer, something that's very difficult for
me. been minimizing it, about the best I'm willing to do. but if the sickness
catches up with me, I'm going to wish I'd done better.
[comment]

Big Data, the domain of Hadoop, Pandas, and now Apache Spark, is something I've
been studiously avoiding until lately, but my current #1 client offloaded
some of that work onto me, so I'm grudgingly getting caught up. apparently his
client, a Data Scientist, is too busy sciencing data to write code. but it pays
well, so I'm riding that learning curve to prosperity. or at least to covering
next month's bills.
[comment]

for a couple of days I was remembering long conversations with a big, dark-skinned guy but couldn't remember a name, the subject of the conversations, or a place where these happened. I finally remembered earlier today that the face was
that of a Petaluma post office clerk, the big Samoan-looking guy. but I never
had any deep conversations with him. I must have dreamt it.

seeing a lot of Waltheria indica, the fake "marijuana substitute" sold online
for big bucks with claims that it is "rare". it's all over the place down
here. more than once I saw it and thought I'd found Damiana (Turnera diffusa)
because the leaves and flowers are so similar.

the arroyo out by Home Depot is lush with growth right now. a lot of sandia, watermelon vines, growing, plenty of amaranth, "wild" tomatoes, and
a lot more, some of which I knew and many I didn't.
[comment]

maybe you noticed the other day that I couldn't get the silicone flap back into
the snorkel I took apart. you can't push it through, you have to pull
it, because pushing makes it compress, and then the tang won't go through the hole.
but because the piece is curved, and only about 3/4" inner diameter, I couldn't
get any pliers through from either the mouthpiece or the intake.

but yesterday it hit me: just bend the flap, push the plier tip in through the
exit, and pull the tang through by pushing on the plier handles. so simple. it took about 2 seconds to get it once I tried in this manner.

that's the problem with most software I've seen. it's not thought through
enough. developers go through all these hoops to create a special set of curved
needlenose pliers, when if they come at the problem from a different angle,
they could solve it with existing tools for a fraction of the cost.
[comment]

I jogged up to the molinito today and the roadblock was gone. I don't know the story. the truckers could have given up; there may have been a negotiation; or
the government could have sent police and/or military force to break it up. if
that was the case, then it's far from over.

getting the banks to hand out 10 peso coins is like pulling teeth, but today
I pulled 500 pesos worth. maybe going early is key.

I can see why they don't like handing them out. they probably don't get issued
very many because they're no doubt expensive to make. and I'm sure they don't
want people hoarding them, which in fact is my intent, but it never works out
that way. I always end up spending them when my income isn't coming in fast
enough.
[comment]

truckers protesting, as I understand it, an increase in the price of gasoline
have blocked ports and land routes in Baja California, and perhaps all of Mexico, I haven't investigated much yet. in any case, things could get ugly. I just
went and stocked up on a couple months' worth of whole wheat flour, and will go out tomorrow and get some other staples. panic buying hasn't yet hit La Paz
in any noticeable amount.
[comment]

this Mexican-made hand truck shows some smart engineering. instead of grinding
down 3/4 inch round steel stock to make an axle, as I did, they used 5/8" square
stock. much less milling to do to reach the 17mm diameter of the most common
bearings you find down here. then they turn the ends down to 1/2" and cut threads to hold the wheel on with a 1/2" nut and lockwasher.

I got better measurements to drill the holes in the next pulley: 69mm through
the axle, and about 48.6mm point to point along the square. but the sun angle
was already too low by the time I got around to it, so I'll have to drill it
tomorrow.
[comment]

watched First Blood again, and confirmed that Rambo's arm is still
not in a sling after his jump off the cliff. so
I guess I'll remain in this sector of the multiverse. I wish I could find
someone who remembers it the way I do. it's just weird.

I can't figure out how this whole thing works. if I died in one timeline, how is
it that the "I" that remains remembers how things were in that one? wouldn't this John Comeau only remember what happened in this branch of spacetime?
[comment]

today I jogged the 4-miles-plus out to Home Depot, stopping along the way at
that newish hardware store before Chedraui, and again at the Mega shopping
center. at the hardware store I found welded 3- and 4-way corner braces
for making structures out of 3/4" EMT, and really cheap, about 2 to a little
over 3 dollars each. keeping those in mind for future projects. at Mega
I dropped my spent batteries in the bin at Telcel, and then got the $20 peso
(dollar) meal deal at Mega. today it was fried fish and red cabbage. the amount
was equivalent to about 3 fish tacos, of course without the tortillas, so I
got a decent meal for my dollar. at Homer's I got some stainless and zinc plated
hardware and some other stuff for my vehicle and snorkel ideas.

my latest stab at an entry for the KGC is bolting pulleys to hand truck wheels.
I found a bunch of cast pot metal pulleys, made in the good old USA, at Ace
hardware for about $100 pesos each, so about 5 bucks. added some 1/4" couplings,
and some longer bolts, and after a few botched attempts at drilling holes in
the pulleys, have one attached. I'll see if I can do a better job on the other
one tomorrow. since I found some longer spacers at the Depot, maybe I won't
have to cut off the bulge with the set screw (I know there's a name for that
part, but I'm blanking on it now) like I did with the first one.

I can use the pulleys both for propelling the thing by hand, kinda like a
wheelchair, and if I can make a rig for a motor, can run a belt to run it
by solar power. I have an idea for mounting my solar panel up top, but
haven't tested it yet. I bought a bunch of 1/4" threaded rod to make custom
U-bolts, because buying the premade ones is so expensive, and they're rarely
the right size for what I want.

belt drive is superior in some ways because it's more tolerant of misalignment
than sprockets and chains. I can use low-tech slippage instead of high-tech
freewheels. and maybe the pulley can double as the disc for a disc brake.

also, when I reached the big arroyo before Walmart, I jogged it as far as the
first crossing road. at the Home Depot end I found a couple of those wild
tomato plants, the ones shaped like Roma tomatoes, that I've seen for years.
debating whether to wait until one ripens and harvest it for seed, or just go
with my trowel, dig one up, and attempt to transplant it.
[comment]

my last 2 batches of boiled city water have tasted sweetish, nutlike. I don't
know if it's something tainted from the city supply or leached from the plastic
gallon container into which I poured it both times, still hot, after boiling.
but when I switch back to RO water in a couple days, my liver will probably
appreciate it.
[comment]

enabled jffs2 on the old Linksys and tried to install tcpdump. 650KB, what?
not enough space. so I extracted the ipk file, which is just a tar.gz, and
gzipped the binary, and saved it in /jffs/usr/lib, along with the libpcap
libraries that did install.

well, I've already started drinking as of about an hour ago, so it's safe to
say I didn't and won't accomplish any of my goals for 2016. I didn't pay down
my debts any significant amount; I didn't get a "kinetic sculpture", or really
any human-powered vehicle built; and kybyz is still unusable for
anybody but me.

however, I did manage to live on my $9/day budget, with $3 each going
towards food, regular non-food (COSF dues, server fees, gym membership, etc.), and miscellaneous (supplies, tools, personal non-food) expenses. and I wasn't a
burden on taxpayers, only on my long-suffering lady.

not sure I'll make any goals for the upcoming year. I never seem to accomplish
them anyway. maybe I'll just keep my list of projects, and hack away at them
as I find the ambition.
[comment]