Technobabelfish

Sunday, May 3, 2015

As I keep playing with Salt, I’m forming a love/hate relationship with it. Love: It automates and codifies installing and configuring things on machines. Hate: The documentation and unexpected/odd behavior.

Many of the things that I hate are outlined in a blog post found here. I want to add to that.

First, documentation. It’s more of a stream of consciousness than documentation. For example, I tried adding a git repo as a ‘formula’. I found this webpage in their documentation. It’s supposedly a walkthrough, but there is no complete example, and for many of the pieces it says “do this” with no explanation of how to “do this”. “Do this” turns out to be some very specific configuration you must do, which leads to the documentation for that configuration, which has its own configuration requirements. When you start discovering this, it feels like you’re going down a rabbit hole: “Here’s a walkthrough to do A. You must configure B.” “Here’s some documentation on B; in some circumstances you need to configure C first.” And so on. If you’re going to present something as a walkthrough, please put *all* of the requirements into that walkthrough.

The other problem I have is that the app sometimes does… nothing. No errors, no warnings. Nothing. It turns out that it’s a timeout: the action just took too long. But the app doesn’t tell you that. It should tell me that it got tired of waiting, and how to find out when the job finishes. What if I’m running 10 of these commands, and half of them time out? I have no way of knowing when they are finished, or whether they succeeded.

Salt is a cool tool, but dangerous. It’s quite easy to get some things going and working. As you start to use it, though, it shows its lack of polish. The shine is wearing off for me.

Sunday, April 12, 2015

I’ve recently discovered Vagrant (https://www.vagrantup.com/). It has changed my developer life. Having consistent environments for development and testing is ideal, and Vagrant gives us that. Before, it was much more difficult to get everyone up and running, but now it’s as easy as sharing a single Vagrantfile.

The next thing I recently discovered is Salt (http://saltstack.com/). I don’t know much about Salt, but I want to. I have a side project I’m working on that will greatly benefit from it.

So naturally, I wanted to use Vagrant to experiment with Salt. Vagrant even has built-in support for Salt. But one thing I noticed is that there is no simple example of running Vagrant and Salt with a separate “master and minion”. The examples I found were either running “masterless”, or they had the master and minion on the same VM.

So after a bit of research and experimentation, I figured out how to set up a separate master and minion in Vagrant. It also takes care of the keys.

There are two files involved, the Vagrantfile and a salt_minion.conf. Here they are:

Saturday, August 2, 2014

I ran into a problem recently with a CloudFormation stack, where I could not access my EC2 instances started within the VPC that was created.

After much hair-pulling and swearing, it turned out that I had a duplicate entry in my JSON. When I was attaching my subnets to the route table, I had both subnets listed under the same JSON name:

It took me a long time to find this. It wasn’t until I looked at my route tables that I found the route table had a single entry, and not two. Then I looked at my template, and noticed the duplicate name. I’m using boto to start my stacks, and I received no error or warning. I’m not sure if the main UI would have notified me either.
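Python’s own json module shows the same silent behavior: when a key appears twice in an object, the last value wins and no error is raised. A small pre-flight check using json.loads’s object_pairs_hook can turn that into a hard error before the template goes anywhere near boto. This is a sketch; the function names and the template fragment below are hypothetical.

```python
import json

def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently keeping the last duplicate."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen

def load_template(text):
    """Parse a template, failing loudly on duplicate keys at any nesting level."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)

# Hypothetical fragment: two route table associations accidentally share one name.
TEMPLATE = """
{
  "Resources": {
    "SubnetRouteAssoc": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {"SubnetId": {"Ref": "SubnetA"}, "RouteTableId": {"Ref": "RouteTable"}}
    },
    "SubnetRouteAssoc": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {"SubnetId": {"Ref": "SubnetB"}, "RouteTableId": {"Ref": "RouteTable"}}
    }
  }
}
"""

if __name__ == "__main__":
    # Plain json.loads keeps only the last entry, so one association silently vanishes.
    print(len(json.loads(TEMPLATE)["Resources"]))  # 1
    try:
        load_template(TEMPLATE)
    except ValueError as err:
        print(err)  # duplicate JSON key: 'SubnetRouteAssoc'
```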

Sunday, July 27, 2014

I’ve been (very) slowly working my way through various Project Euler problems as a side project. For some reason, calculating primes became a fascination for me. I eventually wrote a fast, one-line Python script that uses a sieve. Not long after that, I decided to revive my C64 hobby.

I’ve had a Commodore since I was 10 or 11. I started writing assembly when I was 14 or so, and soon discovered the world of demos in the late ’80s. I’ve always wanted to do demos, and even coded a few simple ones as a teenager.

So, coming back to the C64 after 25 years and refreshing my limited 6502 coding skills, I set to work. I set a challenge for myself: could I get this C64 to calculate the prime numbers up to 1,000,000? At first glance, this seems impossible. Then, on a lark, I decided to add another challenge: could I do it in less than one minute?

Firstly, the C64 only has 65,536 bytes of memory in total. There are 78,498 primes less than 1,000,000. How would I store them? Even the amount of processing seems to be too much: multi-byte math on an 8-bit processor, looping over the data hundreds of times. I wasn’t too concerned about the time, though. I knew that if I could get everything to fit in memory, then given enough time, it could calculate it.

The first challenge was storing the data. Going to the disk drive for swap is notoriously slow. First things first: can I eliminate some of the problem set? Why yes. I can cut the problem set in half by only looking at the odd numbers. So now we’re down to 500,000 numbers. Next on the elimination list is anything divisible by 5. We already eliminated anything that ends in zero, so now we eliminate anything ending in 5. That’s another 100,000 numbers, so we are down to 400,000. Still way beyond the limits of the C64.

My breakthrough came when I figured out that I do not have to store the actual numbers. I just need something to represent each number. In this case, I need to represent whether a number is prime or not. A boolean variable. A bit. Which means I can store 8 numbers in a single byte. There are 400,000 numbers I need to store, and at 8 numbers per byte, I need 50,000 bytes. Yes folks, I can fit the data into the C64’s memory!
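The arithmetic is easy to double-check in Python. This is only a counting sketch (the function and constant names are mine); it also shows why storing the primes themselves was hopeless: values up to 999,999 need 20 bits, so at least 3 bytes per prime.

```python
def storage_budget(limit=1_000_000):
    """Count the candidates the sieve must track and the bytes needed at one bit each."""
    odd = range(1, limit, 2)                          # 500,000 odd numbers
    candidates = sum(1 for n in odd if n % 5 != 0)    # drop those ending in 5
    bitmap_bytes = (candidates + 7) // 8              # 8 candidates per byte
    return candidates, bitmap_bytes

PRIMES_BELOW_1M = 78_498
BYTES_PER_PRIME = ((1_000_000 - 1).bit_length() + 7) // 8   # 20 bits -> 3 bytes

if __name__ == "__main__":
    print(storage_budget())                     # (400000, 50000)
    print(PRIMES_BELOW_1M * BYTES_PER_PRIME)    # 235494, far more than 65,536
```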

To access a particular bit, I used two numbers. One is an offset, which selects the byte, and the other is an index within that byte’s block of numbers. Each byte holds 8 bits, but it really represents a block of 20 consecutive numbers with the unneeded ones (the even numbers and those ending in 5) removed. So the index is always under 20.
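In Python terms, the offset and index are just the quotient and remainder of dividing by 20, and only 8 of the 20 possible remainders are ever stored. A small sketch; which remainder maps to which bit is an assumption here (the lowest bit for remainder 1, and so on upward), and the helper name is hypothetical.

```python
# Remainders (mod 20) that can still be prime: odd and not ending in 5.
RESIDUES = (1, 3, 7, 9, 11, 13, 17, 19)

def locate(n):
    """Return (offset, index, bit) for n: which byte, the position in its block of 20,
    and which of the byte's 8 bits stands for n (None if n is never stored).
    The bit assignment follows RESIDUES order, which is an assumption."""
    offset, index = divmod(n, 20)
    bit = RESIDUES.index(index) if index in RESIDUES else None
    return offset, index, bit

if __name__ == "__main__":
    print(locate(999_999))   # (49999, 19, 7): the very last bit of byte 49,999
    print(locate(25))        # (1, 5, None): ends in 5, so it isn't stored
```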

To save time (the 6502 doesn’t have multiply or divide instructions, so multiplying means looping over additions), I decided to pre-calculate some values. What I noticed is that there’s a pattern to the multiplication. Starting with the prime we’re working with, we add 2 times the prime (2 X prime) to get the next number, which will not be prime. Adding (2 X prime) instead of just the prime skips the multiples that would be even. The resulting number is really (3 X prime). If we added (2 X prime) again, we would get (5 X prime). We don’t need that one, since we already eliminated everything divisible by 5 from our storage. So instead we add (4 X prime), which gives us (7 X prime). Then we add (2 X prime) to get (9 X prime), and (2 X prime) once more to get (11 X prime), which brings us back to the start of the pattern. If you continue, you’ll notice that you’re adding in this repeating order: 2X, 4X, 2X, 2X. Here are a couple of examples:

prime = 3

3 + (2 x prime) = 9

9 + (4 x prime) = 21

21 + (2 x prime) = 27

27 + (2 x prime) = 33 (loop around to beginning)

33 + (2 x prime) = 39

39 + (4 x prime) = 51

etc...

prime = 7

7 + (2 x prime) = 21

21 + (4 x prime) = 49

49 + (2 x prime) = 63

63 + (2 x prime) = 77 (loop around to beginning)

77 + (2 x prime) = 91

91 + (4 x prime) = 119

etc...
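The pattern can be checked mechanically. The Python sketch below (function names are hypothetical) walks the 2, 4, 2, 2 cycle for a prime and compares the result with a brute-force list of the multiples the sieve actually needs: odd, not divisible by 5, starting at 3 × prime. It assumes the prime is neither 2 nor 5, as in the sieve.

```python
from itertools import cycle

GAPS = (2, 4, 2, 2)   # add 2p, 4p, 2p, 2p, then repeat

def sieve_hits(prime, limit):
    """Multiples of `prime` (from 3*prime up) visited by the add cycle, below `limit`."""
    hits = []
    value = prime
    for gap in cycle(GAPS):
        value += gap * prime
        if value >= limit:
            return hits
        hits.append(value)

def brute_force(prime, limit):
    """Multiples of `prime` from 3*prime up that are odd and not divisible by 5."""
    return [n for n in range(3 * prime, limit, prime) if n % 2 and n % 5]

if __name__ == "__main__":
    print(sieve_hits(7, 120))   # [21, 49, 63, 77, 91, 119]
    print(all(sieve_hits(p, 10_000) == brute_force(p, 10_000) for p in (3, 7, 11, 13, 97)))  # True
```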

And so on. So at the start of each prime number, I pre-calculated (2 x prime) and (4 x prime) and stored them in a table that looks like this:

(2 x prime), (4 x prime), (2 x prime), (2 x prime)

Due to the way I’m storing the numbers, each value actually takes 3 bytes: 2 bytes for the offset and a single byte for the index, and the table entries are kept the same way. So when I do my addition, I add the table’s index to the current value’s index and check whether it carries, i.e. whether the sum reaches 20 or more. If it does, the index wraps back under 20 and the offset goes up by one. Then I add the table’s offset. That gives me the next value I need to turn off.

I start with a prime of 3, pre-calculate the adding table described above, and then start adding. On each add, I turn off a bit. I keep adding until I equal or exceed a value of 1,000,000. In this case, that is when the offset reaches 50,000.
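The 3-byte bookkeeping can be modelled in Python by keeping each value, and each table entry, as an (offset, index) pair: the quotient and remainder of dividing by 20. Adding two pairs is then base-20 addition with an explicit carry. This is a sketch of the idea rather than a translation of the assembly; the names are hypothetical.

```python
from itertools import cycle

BLOCK = 20               # numbers represented by one byte
LIMIT_OFFSET = 50_000    # 1,000,000 // 20: stop once the offset gets here

def split(n):
    """Represent n as (offset, index) with 0 <= index < 20."""
    return divmod(n, BLOCK)

def add_pairs(a, b):
    """Add two (offset, index) pairs, carrying from the index into the offset."""
    offset = a[0] + b[0]
    index = a[1] + b[1]
    if index >= BLOCK:   # index went past 19: that's the carry
        index -= BLOCK
        offset += 1
    return offset, index

def positions_to_clear(prime, limit_offset=LIMIT_OFFSET):
    """Yield (offset, index) for each multiple of `prime` the 2,4,2,2 cycle visits."""
    two, four = split(2 * prime), split(4 * prime)
    table = (two, four, two, two)      # pre-calculated once per prime
    pos = split(prime)
    for step in cycle(table):
        pos = add_pairs(pos, step)
        if pos[0] >= limit_offset:     # value reached 1,000,000 (or the chosen limit)
            return
        yield pos
```

For example, converting back with 20 * offset + index, positions_to_clear(3, limit_offset=3) gives 9, 21, 27, 33, 39, 51, 57, matching the hand-worked sequence for prime = 3.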

Once I do that, I search the data for the next prime. In this case, it’s 7 (5 isn’t stored at all). After 7, the search sees that 9 is not prime and continues to 11. And so on, until we hit the square root of 1,000,000, which is 1,000.

So that’s the general algorithm that I used.

Some implementation details. First, I didn’t write the division routine myself. I used one that I found online and modified it slightly: I didn’t need the full 32 bits it was calculating, and I also didn’t need the error check. The routine did exactly what I wanted, though: it allowed me to divide by 20.
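Since the 6502 has no divide instruction, routines like this one are typically built on binary long division: shift the dividend in one bit at a time and subtract the divisor whenever it fits. The Python below models that general shift-and-subtract technique for a 24-bit (three-byte) dividend, which is plenty for values below 1,000,000. It is not the routine I used; the 24-bit width and the function name are assumptions.

```python
def divmod_shift_subtract(dividend, divisor=20, bits=24):
    """Binary long division, one bit per iteration, as an 8-bit CPU would do it."""
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given number of bits")
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    quotient = 0
    remainder = 0
    for bit in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the remainder (ASL/ROL on the 6502).
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:   # compare/subtract: does the divisor fit?
            remainder -= divisor
            quotient |= 1          # record a 1 bit in the quotient
    return quotient, remainder

if __name__ == "__main__":
    print(divmod_shift_subtract(999_999))   # (49999, 19): offset and table index
```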

Why 20? Because that’s how many numbers each byte represents. This means that the quotient is my offset. The remainder tells me which bit I need to turn off, but how do I go from 20 possible remainders to 8 bits? With a table:

The remainder from the division is the index into this table, which tells me exactly which bit I need to work with. I load the data byte at the offset, AND it with the value from this table, and then store it back in the data at the offset.
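Putting the pieces together, here is a Python model of the whole sieve: one bit per candidate in a 50,000-byte bytearray, a remainder-to-AND-mask table, the 2, 4, 2, 2 add cycle, and a scan for the next surviving bit until the prime passes 1,000. The bit order inside each byte and the helper names are assumptions for illustration; it mirrors the description above, not the 6502 source.

```python
from itertools import cycle

BLOCK = 20
RESIDUES = (1, 3, 7, 9, 11, 13, 17, 19)   # remainders (mod 20) kept in each byte

# Remainder -> AND mask that turns off that number's bit. Remainders that are
# never stored (even, or ending in 5) get 0xFF, which leaves the byte unchanged.
CLEAR_MASK = [0xFF] * BLOCK
for bit, r in enumerate(RESIDUES):
    CLEAR_MASK[r] = 0xFF ^ (1 << bit)

def sieve(limit=1_000_000):
    """Primes below `limit` (a multiple of 20), using a 1-bit-per-candidate bitmap."""
    data = bytearray([0xFF]) * (limit // BLOCK)   # 50,000 bytes for 1,000,000
    data[0] &= CLEAR_MASK[1]                       # 1 is not prime

    def clear(n):
        offset, index = divmod(n, BLOCK)          # quotient = byte, remainder = table index
        data[offset] &= CLEAR_MASK[index]

    def survives(n):
        offset, index = divmod(n, BLOCK)
        return (data[offset] & (0xFF ^ CLEAR_MASK[index])) != 0

    prime = 3
    while prime * prime < limit:                   # stop at the square root (1,000)
        if survives(prime):                        # skips 5 and already-cleared composites
            n = prime
            for gap in cycle((2, 4, 2, 2)):        # add 2p, 4p, 2p, 2p, repeat
                n += gap * prime
                if n >= limit:
                    break
                clear(n)
        prime += 2

    # 2 and 5 were never stored, so add them back.
    return sorted([2, 5] + [n for n in range(3, limit, 2) if survives(n)])
```

With the default limit, sieve() returns 78,498 primes, the last one being 999,983.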

To get the speed, I played the ‘follow the carry bit’ game. I looked through my code for places where I could remove a CLC (CLear Carry) or SEC (SEt Carry) before an ADC (ADd with Carry) or SBC (SuBtract with Carry). Sometimes I reordered code so I didn’t need one of those. Every one I removed saved 2 cycles per loop iteration.
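To make the carry game concrete: ADC always adds the carry flag in, so a multi-byte addition normally starts with CLC and then lets the carry ripple from the low byte into the high byte. The Python model below (hypothetical helper names, binary mode only, no decimal mode or overflow flag) adds two 16-bit values one byte at a time. A CLC can only be dropped when the preceding code is already known to leave the carry clear; otherwise a stale carry makes the result one too high.

```python
def adc(a, b, carry):
    """Model of 6502 ADC in binary mode: returns (8-bit result, carry out)."""
    total = a + b + carry
    return total & 0xFF, 1 if total > 0xFF else 0

def add16(x, y, carry=0):
    """Add two 16-bit values like CLC / ADC low bytes / ADC high bytes.
    `carry` is the flag's state on entry: 0 models a CLC (or a known-clear carry)."""
    lo, carry = adc(x & 0xFF, y & 0xFF, carry)   # low bytes first
    hi, carry = adc(x >> 8, y >> 8, carry)       # low-byte carry ripples up
    return (hi << 8) | lo, carry

if __name__ == "__main__":
    print(add16(0x00FF, 0x0001))            # (256, 0): the carry moved into the high byte
    print(add16(0x0001, 0x0001, carry=1))   # (3, 0): a stale carry gives an off-by-one
```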

At one point, I had some self-modifying code. I eventually got rid of it, because it wasn’t gaining me enough and I needed the space.

Speaking of space, this whole implementation takes 1K. That’s 1,024 bytes, and it includes a summing routine that adds all the prime numbers together after the sieve is complete. I needed to keep it under 1K because of the memory layout. If it were any larger, I would have had to play games moving pieces of the app around in memory and turning off the KERNAL and I/O pages. It was bad enough that I turned off BASIC.

All in all, the sieve runs in 55 seconds. Not too shabby for a 30-year-old computer running at 1 MHz. I’m sure some of the demo scene people could make it faster, probably a lot faster. This happens to be my first 6502 assembly program in nearly 25 years.

And now the source. For my development environment, I use my MacBook Pro with OS X, Kick Assembler, and VICE. I’ll also show my makefile, so others can build this easily. After I got it working, I transferred it to my real, stock C64 and ran it there as a verification. See my other blog post on how I transferred it. Also, here’s a video of it running on my machine: https://www.youtube.com/watch?v=bl8DUJAPyFU

So you decided to go retro and buy yourself a Commodore 64 and a disk drive. Welcome to the world of retro computing. But now what?

You start by searching the net for how to get software onto the C64. You quickly find out there’s an entire realm of ways to hook up the 1541 to your computer through a variety of cables. Some cables are newer and faster than others, and overall it’s a confusing mess of documentation.

The other route is to buy an extra piece of hardware that lets you use memory cards, USB, or something else to replace the disk drive. This will work, but it costs money, and, well, it removes the disk drive from the picture! You want to go full retro!

I decided to go a different route. I built a ‘modern’ serial port for the Commodore, following the instructions at http://biosrhythm.com/?p=1136#comment-35325. It’s mentioned there that you don’t need to hook up the VCC line, but in my case I did. Without the VCC line, I was not able to transmit.

Once I had that, I spent the better part of a day writing a BASIC application to handle a simple file transfer. I probably made it overly complicated. It does have simplistic error checking (not very good, but better than nothing), and it’s pretty slow. But it does fit onto a single C64 screen!

A few notes. First, the baud rate is set at 600. Not 300, not 1200. 600. This was the fastest rate at which I could reliably transfer files. If you want to try 1200, change the 7 in the first line to an 8. For 300 baud, change it to a 6.
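For a feel of what these rates mean, here is the best-case arithmetic. It assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte) and ignores all of the BASIC program’s overhead, so real transfers will be slower. The function name and the 10,000-byte file size are hypothetical.

```python
def transfer_seconds(size_bytes, baud, bits_per_byte=10):
    """Best-case serial transfer time; 8N1 framing assumed (10 bits per byte)."""
    return size_bytes * bits_per_byte / baud

if __name__ == "__main__":
    for baud in (300, 600, 1200):
        print(baud, round(transfer_seconds(10_000, baud)))   # 300 333, 600 167, 1200 83
```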

To change the filename, edit line 1000.

To send the file from my desktop (a MacBook Pro), I used a Python application that looks like this:

To change the serial port, edit the Serial() line. You’ll also need to change the open() call to serve the file you want.

Start the C64 side first, then start the desktop side. The file will be written to disk on the C64 side. If it completes successfully, both sides will exit cleanly. When I was attempting things at 1200 baud, it seemed the C64 side would miss a character and ‘lock up’. At that point, I would have to stop both sides and do the following on the C64 side:

close1

open1,8,15,"s:zterm":close1

That will close the file that was being written to, and then delete the incomplete file. You then need to start over.

I’d recommend transferring a decent terminal program first. I tried CCGMS, but its Xmodem transfer seems to be buggy. Novaterm has too many files to transfer before you can get it working. So I settled on HandyTerm from http://www.zimmers.net/anonftp/pub/cbm/c64/comm/. HandyTerm’s Xmodem seems to work, at least at 300 baud for me (there’s no option for 600). 1200 failed. I suspect my serial cable is flaky.

Wednesday, June 4, 2014

I’m working through some Project Euler problems with some co-workers. For some reason, I decided to see what it would take to create a single-line Python expression that calculates the prime numbers up to X, and does it in a reasonable amount of time. Yeah, it’s a pointless exercise; you would never use it in real life. There are faster methods of finding primes. But I like a challenge.

You pass in ‘max’ and let ’er rip. I’ll explain how it works by breaking it down into smaller pieces. First, let’s split it into its high-level components:

[2] + sorted(set(SETA) - set(SETB))

For performance reasons, I skip doing anything with the first prime, 2. This immediately cuts the data I’m working with in half by eliminating everything that’s even. Then, because sets are unordered, I have to sort the result; otherwise the primes can come out of order. If you don’t care about the order, you can drop the sorted() call.

SETA is really just the set of all the odd numbers from 3 up to (but not including) max. Using 20 as an example max, this part gives you {3, 5, 7, 9, 11, 13, 15, 17, 19}. SETB is all the non-prime odd numbers, in this case {9, 15}. When you subtract SETB from SETA, you end up with {3, 5, 7, 11, 13, 17, 19}. Append that to the list [2], and you end up with [2, 3, 5, 7, 11, 13, 17, 19].

That’s the high-level view. Now let’s move on to the craziness that calculates the non-primes. This is known as a prime number sieve, which you can read about on Wikipedia and elsewhere. Let’s start with the outermost loop:

for step in xrange(3, int(max**0.5) + 1, 2)

Right in the middle, you’ll find the above section. This iterates over the odd numbers from 3 up to the square root of max. Why stop at the square root? Because it cuts down on the amount of work, and because we’re looking for composite numbers: any two numbers larger than the square root of max, multiplied together, will be larger than max, so every composite below max has a factor no bigger than the square root. The plus one avoids an off-by-one error, because xrange stops before its end value; without it, the integer part of the square root would never be used (for max = 26, step 5 would be skipped and 25 would come out as prime). So if our max is 20, the square root is roughly 4.47, which gets truncated to 4. Adding one gives 5, and since xrange stops before 5, we get a list of just [3].

for x in xrange(step * 3, max, step * 2)

Next is the inner loop. Here, we’re generating values from the step of the outer loop, starting with step * 3. Why? Well, step by itself may be prime, so we don’t want it in this list. And if it’s not prime, it would already have been added by an earlier iteration of the loop. The next multiple, step * 2, is even, and we’re ignoring even numbers here, so we go one further to step * 3. The loop then increments by step * 2 to skip the even multiples. Taking the max = 20 example, the only step value is 3, and this loop gives us [9, 15], so SETB ends up as {9, 15}. With a larger max, some values are generated more than once (for max = 26, step 3 gives [9, 15, 21] and step 5 gives [15, 25]), and putting them into a set removes the duplicates.

if step % 3 or step == 3

What is this all about? This is an optimization I added. It looks like it creates more work, but it actually skips about a third of the outer-loop passes: any odd step divisible by 3 (9, 15, 21, …) only produces multiples of 3, which the step = 3 pass has already removed. So we only enter the inner loop when the step is not divisible by 3, or the step is exactly 3. I ordered the condition that way because the first part is true 2/3 of the time, so Python’s short-circuiting or only has to evaluate the second part the other 1/3 of the time. I tried adding step % 5 or step == 5, but that slowed things down. Diminishing returns.
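Put back together, the fragments above give a one-liner along these lines. This is a Python 3 sketch assembled from those pieces rather than the original line: range replaces Python 2’s xrange, the parameter is renamed max_n so it doesn’t shadow the built-in max, and it assumes max_n is greater than 2. It is still a single expression, just wrapped for readability.

```python
def primes_below(max_n):
    # [2] + sorted(SETA - SETB): odd candidates minus odd composites.
    return [2] + sorted(set(range(3, max_n, 2)) - set(
        x
        for step in range(3, int(max_n ** 0.5) + 1, 2)   # odd steps up to sqrt(max_n)
        if step % 3 or step == 3                          # skip odd multiples of 3 except 3
        for x in range(step * 3, max_n, step * 2)         # odd multiples from 3*step up
    ))

if __name__ == "__main__":
    print(primes_below(20))   # [2, 3, 5, 7, 11, 13, 17, 19]
```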

So there you have it: a reasonably fast prime number finder in a single line of Python. I’d be interested in faster or better implementations. I do have a two-line version that’s about 10% faster, but that’s not one line.

Tuesday, April 8, 2014

I’ve been working with ZeroMQ at work recently. It’s a fairly good library for dealing with network sockets, though it does take some time to think in “ZeroMQ-ese”. The learning curve is not really helped by their “guide”. The first two chapters introduce the API. Then, in the third chapter, they start off by saying that using the API directly is too complicated and that you should use some other, higher-level API instead.

This makes it very difficult for people who are unfamiliar with ZeroMQ to follow what is going on. The API isn’t *that* difficult, and if they just provided some examples of how to use it, things would go a lot smoother. For example, I needed to send a multipart message, and I wanted to use the base API. There is *no* example of this in the guide. This is especially bad because the most common pattern that people use (DEALER/ROUTER) requires multipart messages. I can only imagine the difficulty for people using alternative language bindings that only expose the lower-level API.
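Here is roughly what a multipart DEALER/ROUTER exchange looks like frame by frame, using the pyzmq binding. This is a sketch that assumes pyzmq is installed; the inproc endpoint name and the message contents are made up. Every frame except the last is sent with the SNDMORE flag, and the receiver keeps reading while RCVMORE is set (the C API has the same ZMQ_SNDMORE and ZMQ_RCVMORE options). The ROUTER sees an extra identity frame in front of whatever the DEALER sent, and must send that frame back first so the reply can be routed.

```python
import zmq  # pyzmq binding (assumed installed: pip install pyzmq)

def roundtrip(endpoint="inproc://multipart-demo"):
    """Send a 3-frame message DEALER -> ROUTER and a reply back, frame by frame."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    dealer = ctx.socket(zmq.DEALER)
    try:
        router.bind(endpoint)      # bind first; older libzmq versions require it for inproc
        dealer.connect(endpoint)

        # Every frame except the last carries SNDMORE; the last one ends the message.
        dealer.send(b"", zmq.SNDMORE)        # empty delimiter frame
        dealer.send(b"hello", zmq.SNDMORE)
        dealer.send(b"world")

        # Read one frame at a time while RCVMORE says more frames follow.
        frames = [router.recv()]
        while router.getsockopt(zmq.RCVMORE):
            frames.append(router.recv())
        identity, payload = frames[0], frames[1:]   # ROUTER prepends the sender's identity

        # To reply, the ROUTER sends the identity frame first so it can route the message.
        router.send(identity, zmq.SNDMORE)
        router.send(b"", zmq.SNDMORE)
        router.send(b"got it")
        reply = dealer.recv_multipart()             # convenience wrapper over the same loop
        return payload, reply
    finally:
        dealer.close(linger=0)
        router.close(linger=0)
        ctx.term()

if __name__ == "__main__":
    payload, reply = roundtrip()
    print(payload)  # [b'', b'hello', b'world']
    print(reply)    # [b'', b'got it']
```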

ZeroMQ gives you a lot for little cost. The auto-reconnect alone is worth the price of admission. It’s a nice API, even if the guide’s authors don’t think so.