KnightStar

Saturday, February 18, 2017

I ran across the following article in a math geek book I found at Barnes and Noble:

So basically, you sum up the values of every element that is in an even position, and then you take every element in an odd position and multiply it by two. If the doubled value is a single digit, add it to the sum; if it is two digits, add each digit to the sum separately. Then, if the total is evenly divisible by 10, it is a possibly valid card number.

CardArray is a table with its first sixteen elements loaded with the digits of a credit card number, and after running it through the above source it gave me a number evenly divisible by 10. Except one time when I got one of the digits wrong; then it was not evenly divisible by 10. So I have to say the algorithm works.
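The check described above is the standard Luhn algorithm. The Pascal source from the article isn't reproduced here, so here is a minimal sketch of the same check in Python; the sample number in the comment is a well-known demonstration value, not a real card:

```python
def luhn_valid(digits):
    """Luhn check on a card number given as a list of ints (left to right).

    Every second digit, counted from the right, is doubled; a doubled
    value of 10 or more contributes the sum of its two digits (which is
    the same as subtracting 9). The number passes if the total is
    evenly divisible by 10.
    """
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # the digits that get doubled
            d *= 2
            if d > 9:        # two-digit result: add the digits separately
                d -= 9
        total += d
    return total % 10 == 0

# e.g. 4539 1488 0343 6467 passes; change any one digit and it fails
```

Change any single digit of a passing number and the total stops being divisible by 10, which matches the experiment with CardArray above.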

Wednesday, September 23, 2015

The first graph showed that the improvement had stopped at about 80 processes. I'm going with that, since on one of the runs with 80 processes, MariaDB started kicking out "No connections available" errors. It only happened on one run. The second chart shows that throughput is still improving, though I will admit that I was having trouble getting accurate numbers with that many processes running.

75 processes seems OK; beyond that, I don't think I was making much additional progress.

Saturday, September 19, 2015

MySQL/MariaDB is supposed to be one of the fastest SQL servers on the market. But is that accurate? They do very fast selects (extracting data), but inserts and updates are considerably slower.

As a standard benchmark, I ran a program that inserted about 50,000 records into a table. This was done on an i5 desktop running MariaDB under Linux Mint. I broke the load down into approximately 5,000-record segments. The input file of random data was created in Free Pascal and written to a Pascal flat file.

The chart is pretty much flat over this group of data. My experience suggests, though, that if I put more data through, the inserts would slow down as the file got bigger.
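A sketch of that segmented benchmark in Python, using an in-memory sqlite3 database as a stand-in for MariaDB (the original loader was a Free Pascal program; the table and column names here are made up):

```python
import sqlite3
import time

def benchmark_inserts(total=50_000, segment=5_000):
    """Insert `total` rows in `segment`-sized batches, timing each batch.

    sqlite3 stands in for MariaDB so the sketch is self-contained;
    against a real server you would swap in a MySQL connector and
    watch how the per-segment times change as the table grows.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bench (id INTEGER, payload TEXT)")
    timings = []
    for start in range(0, total, segment):
        rows = [(i, "row-%d" % i) for i in range(start, start + segment)]
        t0 = time.perf_counter()
        con.executemany("INSERT INTO bench VALUES (?, ?)", rows)
        con.commit()
        timings.append(time.perf_counter() - t0)
    con.close()
    return timings
```

Plotting the returned per-segment times is what produces a chart like the one above.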

[Notes on Pascal: The CLI Pascal program could not read from the Pascal flat file format and then write to the database. I suspect there is a memory leak in the standard Pascal file calls that was damaging the MySQL data structures. I ended up just reading the data and writing it to standard output, then piping that into MariaDB's command prompt. Neither process used very much CPU:

0.7 2 19670 myuser /usr/bin/mysql
0.0 3 19692 myuser ./LoadData

Why would I even write data in the Pascal format? It's an obsolete file format that has been around since at least 1980, but it is very fast. On my i5 laptop, it can write data out at up to 11 million records per minute. So, if I were in a situation where I had to capture a lot of data, I would consider it.]

In truth, writing to MariaDB is not very fast. If you had a lot of data to import, it would be like watching molasses in January. I have worked with MySQL, and its performance isn't any better.

Both of these database servers are supposed to be multi-threaded. In theory it should be possible to increase throughput by running multiple inserts at the same time. But is that the case?
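The multi-process idea can be sketched with Python's multiprocessing module; the database work is stubbed out here (the actual benchmark ran separate Free Pascal loader processes against MariaDB), so only the fan-out pattern is shown:

```python
from multiprocessing import get_context

def insert_batch(batch_id):
    """Stand-in for one insert worker.

    In the real benchmark each worker would open its own database
    connection and INSERT its share of the rows; connections can't be
    shared across processes, so every worker makes its own.
    """
    rows = [(batch_id * 1000 + i, "row-%d" % i) for i in range(1000)]
    # real version: con.executemany("INSERT INTO bench VALUES (?, ?)", rows)
    return len(rows)

def parallel_insert(n_processes=5):
    """Fan the batches out across n_processes worker processes."""
    # the "fork" start method keeps this runnable as a plain script
    with get_context("fork").Pool(n_processes) as pool:
        return sum(pool.map(insert_batch, range(n_processes)))
```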

This chart shows the average number of records per second being added when the table had 50,000 records in it. The X axis is the number of processes and the Y axis is the records added per second.

This graph shows how long it takes in seconds to reach 20,000 records. Again the X axis is number of processes and the Y axis is the number of seconds required. These two graphs are basically mirror images of each other.

The conclusion: running multiple processes to increase throughput does work. I have increased throughput by more than a factor of 10, and I have not hit the point of diminishing returns. Even on a home computer, MariaDB could handle 25 processes hitting the same table. What's more, it's not taking that big of a hit on the CPU.

The above shows the load these were adding to my system while running five processes. The first column shows the CPU load for that specific process, and the second column shows which CPU the process is running on. The total for all of the processes isn't much more than 0.6% spread across four CPUs. Even running 25 processes, I was only pulling a 2.3% load spread across four processors; that is only about 0.6% per CPU.

Sunday, April 05, 2015

Just for fun, I compared MySQL access times from several different languages: Python (which is really popular these days), Free Pascal (an open-source Object Pascal compiler), and C (gcc).

Here are the results:

And from the point of view of transactions per second:

I was really kind of surprised by how much better C did than Free Pascal. I did a comparison several months ago that showed C and Free Pascal performing much closer to each other, so I assume the Free Pascal library needs some work. Also, I was surprised that switching MariaDB from the my-small to the my-medium configuration brought no improvement. Below is a chart from the sort comparison:

Friday, January 09, 2015

Some Linux users like to run servers and/or services. You could be running Samba, MySQL, web servers, and maybe even a chat server. You could run these all on your desktop, but that wastes electricity and ties up your desktop when you may want to use it for other purposes. A primary desktop would need to be up all the time, and many desktops have a high-end video card which uses as much electricity as the rest of the computer combined. Why not run them all on one or more headless servers? This could be one big processor running multiple virtual machines, or single less powerful computers (maybe even "Raspberry Pi"s) each supporting a specific service. A headless i3 could come in between 60-70 watts, while a gaming rig can use 400 watts or more.

But how do you monitor them? You can download and install Nagios or something like that, but it could be more fun to write your own. You can design the GUI app to fit your needs, not what someone says you need. Maybe you want the app to sit in a little corner and not use up much space, or maybe you want it large on one computer all by itself.

Second, what language (compiler or interpreter) are you going to use?

There are many choices, and in reality you can make almost any of them work, so it is going to come down to personal preference. Okay, I like Lazarus, which is a GUI front end to the Free Pascal compiler. It is a true compiler and generates relatively fast code. Lazarus will run circles around Python performance-wise, but in truth C++ will outperform Lazarus. Pascal is easier to write than C++, and you can have a working app in much less time. In performance tests I have seen, Free Pascal generated programs that run on average 2.5 times longer than an equal program written in C++, but Python will run about 6 times longer than an equivalent Free Pascal program. Your results may vary: in the charts below, Python took 50 times longer to run than Free Pascal, and the difference between Free Pascal and C was marginal. On the learning curve, Python is the easiest to learn (but not by much), followed by Pascal, and the hardest to learn is C++.

You might think these charts are irrelevant to computing today, but sorting is a fairly good way to compare the performance of compilers and interpreters, and the source code is freely available on the internet. And lastly, I have an application with a procedure that updated a MySQL table with a sort value that changed on a daily basis, and then sorted the output. It had a performance issue due to the fact that MySQL updates data slowly but can select it very quickly. I rewrote the application to load the table into RAM, update it there, and then run an internal sort. This procedure went from running in minutes to milliseconds. There may come a point where it becomes necessary to run a sort on data inside your program.

The Algorithm.

I knew that I wanted to start ssh processes into each server and then process the returned data. When you put the command right after the address to connect to, the session will terminate after the command is run. I have written previous programs that started background processes that just returned their output, which could then be processed by the main application. This is relatively easy to do; you need to add Process to the uses clause and declare a TProcess and a TStringList variable:

uses
  Classes, SysUtils, Process;

Var
  MyProcess: TProcess;
  MyStrList: TStringList;

And then the following needs to be added at the point where you want to start the process:

MyStrList := TStringList.Create;
MyProcess := TProcess.Create(nil);
MyProcess.CommandLine := 'ValidLinuxCommand';
MyProcess.Options := MyProcess.Options + [poWaitOnExit, poUsePipes];
MyProcess.Execute;
MyStrList.LoadFromStream(MyProcess.Output);
{ Do something with the String List here }
MyStrList.Free;
MyProcess.Free;

That isn't really all that hard, and it will work as long as the output is smaller than about 2 KB. If it is going to be larger, we need to add more complexity. This is explained in detail at:

http://wiki.lazarus.freepascal.org/Executing_External_Programs
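For comparison, the same run-a-command-and-capture-its-output pattern looks like this in Python, where the subprocess module drains the pipe itself, so the 2 KB caveat doesn't apply:

```python
import subprocess

def run_command(cmd):
    """Run an external command and return its output as a list of lines,
    roughly what TProcess plus TStringList.LoadFromStream does above."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```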

Initially I was thinking it would need to be a multi-threaded app, and I started researching that option; I was afraid one stuck process could freeze the whole application. The more I looked at that option, however, the more complex it sounded. Each thread could not update graphical components within the main app (that can crash the window manager), so the threads would have to update basic objects (integers, strings, floats and such) and the main thread would then update the component objects. On top of that there is the added complexity of needing to use critical sections (which prevent more than one thread from accessing a shared object at the same time).

I then thought, well, maybe I can start each subprocess without having to wait on it finishing, and then just create a loop that tests each subprocess's state. If a subprocess has finished, process its output and start a replacement subprocess. No messing around with all of the multi-thread stuff.
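That poll-and-restart design can be sketched in Python (echo stands in here for the real ssh commands):

```python
import subprocess
import time

def poll_loop(commands, rounds=2):
    """Start one background process per command, then poll: when a
    process finishes, collect its output and start a replacement,
    until each slot has run `rounds` times. The real monitor would
    loop forever, with ssh commands instead of the placeholders."""
    procs = {i: subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for i, cmd in enumerate(commands)}
    remaining = {i: rounds for i in procs}
    outputs = []
    while procs:
        time.sleep(0.05)                 # the real app waits 5 seconds
        for i in list(procs):
            p = procs[i]
            if p.poll() is not None:     # this subprocess has finished
                outputs.append(p.stdout.read().strip())
                remaining[i] -= 1
                if remaining[i] > 0:     # start a replacement
                    procs[i] = subprocess.Popen(
                        commands[i], stdout=subprocess.PIPE, text=True)
                else:
                    del procs[i]
    return outputs
```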

So first of all I declared a simple record:

TProcessRec = Record
  Process: TProcess;
  Addr,
  User,
  Password: String;
  Pct: Double;
  StrList: TStringList;
End;

And then declared an array of that record:

ProcessTable: Array[1..KKArraySize] of TProcessRec;

Now we want to set up the main loop, but first we want to prime the pump and start all the ssh processes needed:

For i := 1 to KKArraySize Do
Begin
  If Length(ProcessTable[i].Addr) > 1 Then
  Begin
    ProcessTable[i].Process := TProcess.Create(nil);
    ProcessTable[i].StrList := TStringList.Create;
    ProcessTable[i].Process.CommandLine := CmdLine(i);
    ProcessTable[i].Process.Options := ProcessTable[i].Process.Options + [poUsePipes];
    ProcessTable[i].Process.Execute;
  end;
end;

I am just looping through my array and checking to see if there is an address defined; if so, I start an ssh. Note that I removed poWaitOnExit from the process options because we don't want to stop and wait for each process to finish; instead, we will test the subprocess in the next step. Also note that I am calling a function, CmdLine, to build the Linux command I want to run. Below is that function:

This is pretty simple; it just uses the array to fill in the user, password and address. As you can also see, I am using sshpass to pass the password to ssh, and I am also using the sar utility. sshpass can be found in most repositories, and sar is part of the sysstat monitoring utilities, which also can be found in most repositories. I found sar gave CPU utilization numbers very close to what top gives, which is why I ended up using it. sar, or whatever utility you end up using, will have to be installed on all of the servers you are monitoring.

Well, that is a mouthful! This is all in a Repeat ... loop that runs (almost) forever; I do have a way to break it. The first While loop is just a means to get the app to wait 5 seconds before checking on the status of all of our background processes. The first two If statements seem silly and look like they should be combined, but there is a reason: since I only initialized a Process in the array elements where the Addr field had actual data in it, running the Process.Running test on an uninitialized Process variable would have caused a memory error and crashed the app.

ProcessTable[i].StrList.LoadFromStream(ProcessTable[i].Process.Output);

just loads the output from our ssh into our TStringList, which was declared in our record array. A TStringList is essentially just an array of strings, and the output of the sar utility has the word Average in it. So the following block of code just sums up the output from sar. I summed the user, nice, and system values it lists; you may decide on a different strategy.

For j := 0 to ProcessTable[i].StrList.Count-1 Do
Begin
  If copy(ProcessTable[i].StrList[j],1,7) = 'Average' then
  Begin
    Sum := SumProcessorStats(ProcessTable[i].StrList[j]);
    PctStr := Format('%5.2f', [Sum]);
    ProcessTable[i].Pct := Sum;
  End;
End;

SumProcessorStats is a function that adds the relevant fields of the sar output together:

Function SumProcessorStats(s: String): Double;
Var
  UsrStr, NiceStr, SysStr: String;
  Sum, Dbl: Double;
begin
  UsrStr := copy(s,24,6);
  NiceStr := copy(s,34,6);
  SysStr := copy(s,44,6);
  Sum := 0.0;
  Dbl := StrToFloat(UsrStr);
  Sum := Sum + Dbl;
  Dbl := StrToFloat(NiceStr);
  Sum := Sum + Dbl;
  Dbl := StrToFloat(SysStr);
  Sum := Sum + Dbl;
  SumProcessorStats := Sum;
end;
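The same parsing in Python, splitting on whitespace instead of fixed column positions (the field order assumes the default `sar -u` layout, and the sample line in the test is illustrative):

```python
def sum_processor_stats(line):
    """Sum the %user, %nice and %system fields from a sar 'Average:' line.

    Splitting on whitespace is a little sturdier than the
    copy()-by-column approach above. Field order assumed:
    Average:  CPU  %user  %nice  %system  %iowait  %steal  %idle
    """
    fields = line.split()
    user, nice, system = (float(f) for f in fields[2:5])
    return user + nice + system
```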

Finally, since the previous subprocess had finished, this last piece starts a new subprocess:

ProcessTable[i].StrList.Clear;
ProcessTable[i].Process.CommandLine := CmdLine(i);
ProcessTable[i].Process.Options := ProcessTable[i].Process.Options + [poUsePipes];
ProcessTable[i].Process.Execute;

Security is an issue here. The addresses, users and passwords could be hard-coded into your app, but if somebody broke into your system, they could get all of your passwords by looking at your source code. An improvement would be to keep all that info in a MySQL table and prompt the operator for the password to access that table. Even better would be to encrypt the data and prompt the operator for a decryption key.

At some point, we may want to start sending e-mails when certain conditions occur, such as one of the servers being maxed out for several minutes, or a server becoming unresponsive. Sending e-mail isn't that hard to add. There is a library that can be downloaded for free called Synapse (http://wiki.freepascal.org/Synapse). You would need to download the library and add the following units to the uses clause:

synautil, synacode, blcksock, pop3send, smtpsend

I copied the whole Synapse library into the directory containing my app, and Lazarus would compile whatever it needed. The content of the e-mail needs to be added to a TStringList, and then, to send the e-mail, use something similar to the following:

If Not SendTo('ToEmailAddress', 'FromEmailAddress', 'Subject',
              'EmailServer', MsgStringList) Then
  ShowMessage('E-Mail Didn''t Work');

ToEmailAddress could be your Gmail address on your phone, FromEmailAddress would be your regular e-mail address, and EmailServer would be your e-mail server (probably something like mail.myservername.com). There is more information on the wiki site listed above.
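The Python standard library offers a rough analogue of Synapse's SendTo; every address and the server name below are placeholders:

```python
import smtplib
from email.message import EmailMessage

def build_alert(to_addr, from_addr, subject, body_lines):
    """Build the alert message; roughly the inputs Synapse's SendTo
    takes. All addresses used with it here are placeholders."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = subject
    msg.set_content("\n".join(body_lines))
    return msg

def send_alert(msg, server="mail.example.com"):
    """Hand the message to an SMTP server (not exercised here, since it
    needs a reachable mail server)."""
    with smtplib.SMTP(server) as smtp:
        smtp.send_message(msg)
```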

Performance

On an Intel i5, the main thread uses about 6% of the CPU on one core. Very acceptable.

Thursday, November 06, 2014

Ok, I am a Republican. A liberal one, but still a Republican. First of all, I didn't trust the media, and I didn't trust the media's polling. Some of the polling firms I was throwing out, like SurveyUSA and sometimes Rasmussen. I can show that certain polling firms lean further to the left than the mean average. Ok, Rasmussen's poll data on presidential approval rating was way off from companies like Gallup, but their state Senate polls weren't as bad.

So I flew by my own assumptions:

1. The media organisations were for the Democrats, and their polling was trying to manipulate the electorate instead of informing them. They were party activists instead of reporters.

2. They were not estimating turnout correctly. I took the conservative assumption that they were off by 5%, so I adjusted the Democratic numbers down by 2.5% and upped the Republican numbers equally.

3. The media was using too long a range of polls. A poll taken a month ago was no longer valid, in my opinion.

4. I also thought the electorate was very angry, and that the election would have a very close correlation to 1978!

So I wrote a little app that used my assumptions and did some calculations. It's crude, but it gave me some very good estimates.

This run was done Tuesday morning, just before the election:

So my program estimated a pickup of 10 Senate seats for the Republicans, which is not bad. I think I did better than the pundits, but I kept my mouth shut, living in a multi-party family. Also, my assumptions were incorrect in 2012.

As far as the economy goes, it will not improve until the ACA is removed. My proof is not perfect; it is statistical. If you take all of the recessions in American history and take the average and standard deviation of their lengths, you get something like:

Average: 21.5 months
StdDev: 15.75

So, virtually all recessions should be over within 21.5 + (3 × 15.75) = 68.75 months.
That is just under 6 years. If you don't understand why this is valid: with a normal distribution curve, about 68% of all data falls within plus or minus one standard deviation of the mean (average), 95% falls within plus or minus two standard deviations, and about 99.7% of all data falls within three standard deviations of the mean.
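The arithmetic, spelled out:

```python
def three_sigma_upper(mean, stddev):
    """Upper end of the mean-plus-three-standard-deviations band; for a
    normal distribution roughly 99.7% of outcomes fall below it. The
    21.5 and 15.75 figures are the post's own estimates."""
    return mean + 3 * stddev

# 21.5 + 3 * 15.75 = 68.75 months, just under six years
```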

Professional economists do use this, and their published results are actually a little more conservative, with a slightly smaller mean and a slightly smaller standard deviation, which makes my argument that much stronger.

So why are we still in a recession? I know that technically we are not in one, but we aren't in a recovery either. Unemployment is still too high, and the only ones doing well are the top couple of percent. This is not normal. The answer is that the cost of the ACA is adding about $15,000 per year to the employer's cost for a full-time employee. When small business owners run the numbers through their spreadsheets, they can't make them work; therefore, they don't risk their money and there is no business expansion.

I would also argue that if the Democratic Party wants to be viable again, they need to drop the ACA. If they have to override a presidential veto, do it and do it quickly. If they let the clock run out and the Republicans end up doing it after the 2016 elections, the Republicans will get all of the credit for an improving economy.