tag:blogger.com,1999:blog-238888112017-07-29T08:46:13.094+01:00Prozak on CodeProzak writes code, in Java, in PERL, in LISP. Here are some ramblings about code.Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.comBlogger9125tag:blogger.com,1999:blog-23888811.post-12009403737555217712015-03-23T06:00:00.000+00:002015-03-23T06:03:45.283+00:00Writing writables is fun when you do it wrong<head><title>Your own Writables</title><!-- 2015-03-23 Mon 06:55 --><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><meta name="generator" content="Org-mode" /><meta name="author" content="Juan Amiguet" /><style type="text/css"> <!--/*--><![CDATA[/*><!--*/ .title { text-align: center; } .todo { font-family: monospace; color: red; } .done { color: green; } .tag { background-color: #eee; font-family: monospace; padding: 2px; font-size: 80%; font-weight: normal; } .timestamp { color: #bebebe; } .timestamp-kwd { color: #5f9ea0; } .right { margin-left: auto; margin-right: 0px; text-align: right; } .left { margin-left: 0px; margin-right: auto; text-align: left; } .center { margin-left: auto; margin-right: auto; text-align: center; } .underline { text-decoration: underline; } #postamble p, #preamble p { font-size: 90%; margin: .2em; } p.verse { margin-left: 3%; } pre { border: 1px solid #ccc; box-shadow: 3px 3px 3px #eee; padding: 8pt; font-family: monospace; overflow: auto; margin: 1.2em; } pre.src { position: relative; overflow: visible; padding-top: 1.2em; } pre.src:before { display: none; position: absolute; background-color: white; top: -10px; right: 10px; padding: 3px; border: 1px solid black; } pre.src:hover:before { display: inline;} pre.src-sh:before { content: 'sh'; } pre.src-bash:before { content: 'sh'; } pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } pre.src-R:before { content: 'R'; } pre.src-perl:before { content: 'Perl'; } pre.src-java:before { content: 'Java'; } pre.src-sql:before { content: 'SQL'; } table { 
border-collapse:collapse; } td, th { vertical-align:top; } th.right { text-align: center; } th.left { text-align: center; } th.center { text-align: center; } td.right { text-align: right; } td.left { text-align: left; } td.center { text-align: center; } dt { font-weight: bold; } .footpara:nth-child(2) { display: inline; } .footpara { display: block; } .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } .inlinetask { padding: 10px; border: 2px solid gray; margin: 10px; background: #ffffcc; } #org-div-home-and-up { text-align: right; font-size: 70%; white-space: nowrap; } textarea { overflow-x: auto; } .linenr { font-size: smaller } .code-highlighted { background-color: #ffff00; } .org-info-js_info-navigation { border-style: none; } #org-info-js_console-label { font-size: 10px; font-weight: bold; white-space: nowrap; } .org-info-js_search-highlight { background-color: #ffff00; color: #000000; font-weight: bold; } /*]]>*/--></style><script type="text/javascript">/* @licstart The following is the entire license notice for the JavaScript code in this tag. Copyright (C) 2012-2013 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License (GNU GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The code is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. As additional permission under GNU GPL version 3 section 7, you may distribute non-source (e.g., minimized or compacted) forms of that code without the copy of the GNU GPL normally required by section 4, provided you include this license notice and a URL through which recipients can access the Corresponding Source. 
@licend The above is the entire license notice for the JavaScript code in this tag. */ <!--/*--><![CDATA[/*><!--*/ function CodeHighlightOn(elem, id) { var target = document.getElementById(id); if(null != target) { elem.cacheClassElem = elem.className; elem.cacheClassTarget = target.className; target.className = "code-highlighted"; elem.className = "code-highlighted"; } } function CodeHighlightOff(elem, id) { var target = document.getElementById(id); if(elem.cacheClassElem) elem.className = elem.cacheClassElem; if(elem.cacheClassTarget) target.className = elem.cacheClassTarget; } /*]]>*///--></script></head><body><div id="content"> <p>Why writing your own Writables is the best thing you can do </p> <p>Hadoop is a very nice tool that, over the last few years, has become the workhorse of the maturing data science industry. It does a lot of things for you, mostly shuffling your data to the computer where it is needed for computation. To do this it has a transport mechanism that carries messages across nodes. Each message consists of two parts: a routing address, which states where it should end up, and a payload, which is what you are transporting. This is as old as the world, and it also applies to Hadoop. </p> <p>In this post I will focus on the payload, as Writables are effectively the entry point for making your application more reliable and maintainable. </p> <p>The basic <i>Writable</i> interface consists of two very simple methods: <i>write</i>, in which it is the developer's responsibility to write every field that needs to travel to a <i>DataOutput</i> object, and <i>readFields</i>, in which all those fields are read back from a <i>DataInput</i> into the object's state. The nice thing about the <i>DataInput</i> and <i>DataOutput</i> interfaces is that they provide a read and a write method for each primitive type and for Strings, freeing you from most of the hassle of casting and converting raw bytes into something usable. 
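</p>
<p>To make the two methods concrete, here is a minimal sketch that is not taken from the repository. So that it compiles without Hadoop on the classpath, it declares a local <i>SimpleWritable</i> interface (a hypothetical name) with the same two method signatures as <i>org.apache.hadoop.io.Writable</i>; in a real job you would implement the Hadoop interface instead. The <i>WordCount</i> payload and its <i>roundTrip</i> helper are likewise made up for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in declaring the same two methods as org.apache.hadoop.io.Writable,
// so that the sketch compiles without Hadoop on the classpath.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical payload: a word and the number of times it was seen.
class WordCount implements SimpleWritable {
    String word = "";
    long count;

    WordCount() {
        // Hadoop creates Writables reflectively, so keep a no-argument constructor.
    }

    WordCount(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);      // every field that must travel, in a fixed order...
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();     // ...read back in exactly the same order and types
        count = in.readLong();
    }

    // Serialises to bytes and back, roughly what happens when the object crosses the network.
    static WordCount roundTrip(WordCount original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.flush();
        WordCount copy = new WordCount();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

<p>For example, <i>WordCount.roundTrip(new WordCount("hadoop", 42))</i> returns a new object whose <i>word</i> is "hadoop" and whose <i>count</i> is 42. The one rule to respect is symmetry: <i>readFields</i> must read exactly the fields that <i>write</i> wrote, in the same order and with matching types. 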
</p> <p>There are, however, a few idiosyncrasies. At first sight, the fact that you get an end-of-file exception (<i>EOFException</i>) once the input has been consumed sounds like a good idea: you just read your fields in a <i>while(true)</i> loop, stop when the exception arrives, and you are a happy bunny. That is, until part of your state is a collection of Writables. Then you are in a pickle. Why? Very simple: the contained type catches the end-of-file exception, reacts to it as expected and does not forward it upstream since, in all fairness, it has consumed the exception. So now the behaviour of a Writable has to change depending on where it sits: is it encapsulated, or does it encapsulate? If it is encapsulated, it has to forward the exception up the call chain; if it encapsulates, it just consumes it. This is a textbook example of why it is a bad idea to rely on exceptions for control flow. </p> <p>When storing collections in a <i>DataOutputStream</i>, the simplest and most straightforward approach is to prepend the size of the collection to its elements. That way the reader knows exactly how many elements to load and never has to loop forever while reading your data back. </p> <p>As usual, the code is over at <a href="https://github.com/jamiguet/prozak-on-code">GitHub</a>; this time I decided to use Maven as the build tool. Once the repository is cloned, a simple <i>mvn test</i> executes all the unit tests associated with the code. I hope you learn from this code, which shows clearly how not to do things, and that you consider doing them properly. 
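</p>
<p>The size-prefix approach for collections can be sketched as follows. This is an illustration rather than the code in the repository: <i>NamedValue</i> and <i>NamedValueBag</i> are hypothetical classes, and they only mirror the <i>write</i>/<i>readFields</i> signatures of Hadoop's <i>Writable</i> instead of implementing it, so that the sketch compiles on its own.</p>

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical element type. The methods have the same signatures as
// org.apache.hadoop.io.Writable; the interface itself is left out so the
// sketch compiles without Hadoop.
class NamedValue {
    String name = "";
    long value;

    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(value);
    }

    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        value = in.readLong();
    }
}

// Hypothetical container whose state is a collection of NamedValue objects.
class NamedValueBag {
    NamedValue[] entries = new NamedValue[0];

    void write(DataOutput out) throws IOException {
        out.writeInt(entries.length);        // the size goes first...
        for (NamedValue e : entries) {
            e.write(out);                    // ...followed by exactly that many elements
        }
    }

    void readFields(DataInput in) throws IOException {
        int size = in.readInt();             // the reader knows up front how much to read,
        entries = new NamedValue[size];      // so no EOFException is needed to end the loop
        for (int i = 0; i != size; i++) {
            entries[i] = new NamedValue();
            entries[i].readFields(in);
        }
    }
}
```

<p>Because <i>readFields</i> stops after exactly the announced number of elements, a <i>NamedValueBag</i> can be nested inside another Writable, or followed by more fields in the same stream, without anybody having to catch an <i>EOFException</i>. 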
</p></div><div id="postamble" class="status"><p class="author">Author: Juan Amiguet</p><p class="date">Created: 2015-03-23 Mon 06:55</p><p class="creator"><a href="http://www.gnu.org/software/emacs/">Emacs</a> 24.3.1 (<a href="http://orgmode.org">Org</a> mode 8.2.4)</p><p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p></div></body><div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-73122302123271451102015-03-07T09:10:00.000+00:002015-03-07T09:10:47.641+00:00Coalescing Tools<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><title>Coalescing Tools</title><!-- 2015-03-07 Sat 10:09 --><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><meta name="generator" content="Org-mode" /><meta name="author" content="Juan Amiguet" /><style type="text/css"> <!--/*--><![CDATA[/*><!--*/ .title { text-align: center; } .todo { font-family: monospace; color: red; } .done { color: green; } .tag { background-color: #eee; font-family: monospace; padding: 2px; font-size: 80%; font-weight: normal; } .timestamp { color: #bebebe; } .timestamp-kwd { color: #5f9ea0; } .right { margin-left: auto; margin-right: 0px; text-align: right; } .left { margin-left: 0px; margin-right: auto; text-align: left; } .center { margin-left: auto; margin-right: auto; text-align: center; } .underline { text-decoration: underline; } #postamble p, #preamble p { font-size: 90%; margin: .2em; } p.verse { margin-left: 3%; } pre { border: 1px solid #ccc; box-shadow: 3px 3px 3px #eee; padding: 8pt; font-family: 
monospace; overflow: auto; margin: 1.2em; } pre.src { position: relative; overflow: visible; padding-top: 1.2em; } pre.src:before { display: none; position: absolute; background-color: white; top: -10px; right: 10px; padding: 3px; border: 1px solid black; } pre.src:hover:before { display: inline;} pre.src-sh:before { content: 'sh'; } pre.src-bash:before { content: 'sh'; } pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } pre.src-R:before { content: 'R'; } pre.src-perl:before { content: 'Perl'; } pre.src-java:before { content: 'Java'; } pre.src-sql:before { content: 'SQL'; } table { border-collapse:collapse; } td, th { vertical-align:top; } th.right { text-align: center; } th.left { text-align: center; } th.center { text-align: center; } td.right { text-align: right; } td.left { text-align: left; } td.center { text-align: center; } dt { font-weight: bold; } .footpara:nth-child(2) { display: inline; } .footpara { display: block; } .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } .inlinetask { padding: 10px; border: 2px solid gray; margin: 10px; background: #ffffcc; } #org-div-home-and-up { text-align: right; font-size: 70%; white-space: nowrap; } textarea { overflow-x: auto; } .linenr { font-size: smaller } .code-highlighted { background-color: #ffff00; } .org-info-js_info-navigation { border-style: none; } #org-info-js_console-label { font-size: 10px; font-weight: bold; white-space: nowrap; } .org-info-js_search-highlight { background-color: #ffff00; color: #000000; font-weight: bold; } /*]]>*/--></style><script type="text/javascript">/* @licstart The following is the entire license notice for the JavaScript code in this tag. Copyright (C) 2012-2013 Free Software Foundation, Inc. 
The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License (GNU GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The code is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. As additional permission under GNU GPL version 3 section 7, you may distribute non-source (e.g., minimized or compacted) forms of that code without the copy of the GNU GPL normally required by section 4, provided you include this license notice and a URL through which recipients can access the Corresponding Source. @licend The above is the entire license notice for the JavaScript code in this tag. */ <!--/*--><![CDATA[/*><!--*/ function CodeHighlightOn(elem, id) { var target = document.getElementById(id); if(null != target) { elem.cacheClassElem = elem.className; elem.cacheClassTarget = target.className; target.className = "code-highlighted"; elem.className = "code-highlighted"; } } function CodeHighlightOff(elem, id) { var target = document.getElementById(id); if(elem.cacheClassElem) elem.className = elem.cacheClassElem; if(elem.cacheClassTarget) target.className = elem.cacheClassTarget; } /*]]>*///--></script></head><body><div id="content"> <p>On a past Friday, whose date I refuse to remember, I hit a wall in my Hadoop experiments. A reducer was dying out of memory and, no matter how many of its peers came to the rescue, the error was consistent. </p> <ul class="org-ul"><li><i>Nooo, I refuse to work on such large data. What do you think I am, a donkey?!?</i> - my reducers were saying. </li></ul> <p>Obvious things first: I went for the two courses of action most people take when they do not understand a software problem. </p> <ul class="org-ul"><li>Here, have more reducers to help you out. 
<i>-D mapreduce.job.reduces=600</i> </li></ul> <p>Little did I realise that, if your reducer is stateless, that helps little: all the values for a given key still end up in the same reducer. </p> <ul class="org-ul"><li>Here, have more memory. </li></ul> <p>As a manager would say: if more developers won't cut it, I can still throw more money at the problem so that it goes away. I have always had little empathy for that kind of manager, but really it is not their fault if they do not understand the problem and just want it to go away. Their decisions are, in most cases, still rational and full of good will; they just come from a completely different angle. </p> <p>By the time I started throwing memory at the problem with the almighty <i>-D mapreduce.reduce.memory.mb=14336</i> and <i>-D mapreduce.reduce.java.opts=-Xmx14g</i>, I knew that even if that worked, I was only buying time. This was the smallest of my production datasets, so a quick fix would not cut it for the larger ones. </p> <p>So I took the weekend away from the computer. I did the usual shopping and other chores, but by Friday evening the solution was already brewing at the back of my mind. </p> <div id="outline-container-sec-1" class="outline-2"><h2 id="sec-1"><span class="section-number-2">1</span> Problem description</h2><div class="outline-text-2" id="text-1"><p>The reducer was supposed to merge the collections held in the state of a bunch of objects into a single object of the same type containing all the data. One of those collections was in the form of a string, which had to be concatenated. Furthermore, since this is Hadoop, the containing object had to be a Writable capable of flying through the whole infrastructure for subsequent processing. That meant the data had to be encoded to occupy as little space as possible, and the representation had to be unique: the same string had to have, unequivocally, the same representation. 
</p></div></div><div id="outline-container-sec-2" class="outline-2"><h2 id="sec-2"><span class="section-number-2">2</span> Ideal theoretical solution</h2><div class="outline-text-2" id="text-2"><p> I started looking at hashes, but I had another constraint in my problem. The string representations have to be comparable and the comparator needs to order the strings in a given way that I want. If I had a non inversable hash function, which most are, at least it had to be monotonic, good luck with that. A collision free hash would have done the trick for the first criteria, but then I needed the second. Now if there are any mathematicians out there with time and would like to design such a function I will gladly speak with you. Such a function or class of functions would make my life much easier. </p> <p>In the absence of such a nice thing, which may well exist but I do not know about. I had to apply another strategy. </p></div></div><div id="outline-container-sec-3" class="outline-2"><h2 id="sec-3"><span class="section-number-2">3</span> Brute force and Ignorance</h2><div class="outline-text-2" id="text-3"><p>The choice for data representation was clear it had to a lose-less compression. Nice, <i>java.util.zip</i> is full of goodies to do just that. </p> <p>Since memory was not big enough, I went for disk. The idea was to coalesce the state of the objects into files. Concatenating in the file the state and then reading the compressed versions of the files into memory. I am lucky in that the state of the object, specially the string compresses well since it has a lot of repetition. </p> <p>Now, there may be a need to coalesce coalesced objects. That is my merging of state can be done in stages. To solve that problem I needed to be careful because most compression mechanisms need a dictionary that is they compress what they have seen not something that may potentially arrive later. 
I could have used something like a Bloom filter, but if my strings are not preserved verbatim I would need to rethink some of the theory behind what I do with them, and that may take yet another PhD. </p></div></div><div id="outline-container-sec-4" class="outline-2"><h2 id="sec-4"><span class="section-number-2">4</span> Rules of engagement</h2><div class="outline-text-2" id="text-4"><p>State is stored in objects uncompressed at first. When I need to merge, I dump that state to a file; UUIDs make for nice file names. When I am done merging, I read the contents of the file through a <i>DeflaterInputStream</i>, which hands me the compressed representation of the file as bytes. To make things transport-ready I Base64-encode them and store the result in my object state, making sure I remember it is compressed. The next time I have to merge the state of compressed objects, I read the stored bytes back through an <i>InflaterInputStream</i>, which yields the uncompressed data, and write that to a file. I can then append all the data I want to that file, which is then slurped back into memory and compressed in the process. </p> <p>Since I work with several types, note that what I described above is what happens to strings. For the longs and ints that I also use I do the same, only I write them to a <i>DataOutputStream</i> so that I do not have to handle their string representation with <i>toString()</i>, <i>valueOf()</i> and field delimiters explicitly. </p> <p>Of course you then have to use the data, preferably without blowing up your memory. For the string, a way of accessing it through a buffered reader will appear in the code base shortly; it will come into play in later parts of the processing. For the ints and longs, a <i>DataInputStream</i> makes them available to you, so you can consume them one by one, and as long as you do not hold on to too many of them in memory you are good. </p> <p>To read the compressed versions, you use the <i>read</i> function. 
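</p>
<p>The merge cycle described above can be sketched in a few lines of <i>java.util.zip</i>. This is a minimal illustration under a few assumptions, not the repository code: the class <i>CoalescedText</i> and its methods are hypothetical, only text state is handled, and the whole state is inflated into a temporary file (named with a UUID prefix) on every merge. It needs Java 9 or later for <i>InputStream.readAllBytes</i>.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Base64;
import java.util.UUID;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical holder for text state that travels compressed and Base64-encoded.
final class CoalescedText {
    private String encoded = "";  // Base64 of the deflated text; "" means no state yet

    // Transport-ready representation to keep in the object state.
    String encoded() {
        return encoded;
    }

    // Merges more plain text into the compressed state, going through a scratch file.
    void append(String more) throws IOException {
        Path scratch = Files.createTempFile(UUID.randomUUID().toString(), ".txt");
        try {
            if (!encoded.isEmpty()) {
                // A finished zlib stream cannot just be extended, so inflate the current state to disk first.
                try (InputStream plain = inflate(encoded)) {
                    Files.copy(plain, scratch, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            // Append the new data uncompressed.
            Files.write(scratch, more.getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
            // Slurp the file back through a DeflaterInputStream, which yields compressed bytes.
            try (InputStream compressed = new DeflaterInputStream(Files.newInputStream(scratch))) {
                encoded = Base64.getEncoder().encodeToString(compressed.readAllBytes());
            }
        } finally {
            Files.deleteIfExists(scratch);
        }
    }

    // Stream of the uncompressed bytes; callers can wrap it in a DataInputStream or a Reader.
    static InputStream inflate(String base64) {
        return new InflaterInputStream(new ByteArrayInputStream(Base64.getDecoder().decode(base64)));
    }

    // Convenience for small states and tests only: decompresses everything into memory.
    String text() throws IOException {
        if (encoded.isEmpty()) {
            return "";
        }
        try (InputStream plain = inflate(encoded)) {
            return new String(plain.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

<p>Calling <i>append</i> repeatedly coalesces state in stages, and <i>encoded()</i> is the string to carry around. With the same Java runtime, the same text goes through the same deflate settings and therefore yields the same encoding, which is what the uniqueness requirement needs. <i>text()</i> pulls everything into memory and is only meant for small data; for large state you would consume the stream returned by <i>inflate</i> incrementally, for instance through a <i>DataInputStream</i> for ints and longs.</p>
<p>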
To read the usable, uncompressed versions, you use the <i>readData</i> function, which hands you a <i>DataInputStream</i>. </p></div></div><div id="outline-container-sec-5" class="outline-2"><h2 id="sec-5"><span class="section-number-2">5</span> Here is one I made earlier</h2><div class="outline-text-2" id="text-5"><p>There is no need for you to program all that: you can view, review, inspect, poke, use, abuse and extend an implementation of it over at <a href="https://github.com/jamiguet/prozak-on-code/">GitHub</a>. It ships with a test class, for my own sanity and so that usage is documented, and it is all tied together with an old-school Makefile, just because. </p> <p>Enjoy, and let me know in the comments if you run into any problems or have a better solution. </p></div></div></div><div id="postamble" class="status"><p class="author">Author: Juan Amiguet</p><p class="date">Created: 2015-03-07 Sat 10:09</p><p class="creator"><a href="http://www.gnu.org/software/emacs/">Emacs</a> 24.3.1 (<a href="http://orgmode.org">Org</a> mode 8.2.4)</p><p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p></div></body></html>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-76230662721247902562012-08-04T15:56:00.000+01:002012-08-04T15:56:05.316+01:00Who said you can not multiply in a linear program?<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"/><meta name="generator" content="Org-mode"/><meta 
name="generated" content="2012-08-04 16:52:50 CEST"/><meta name="author" content="Juan Amiguet-Vercher"/><meta name="description" content=""/><meta name="keywords" content=""/><style type="text/css"> <!--/*--><![CDATA[/*><!--*/ html { font-family: Times, serif; font-size: 12pt; } .title { text-align: center; } .todo { color: red; } .done { color: green; } .tag { background-color: #add8e6; font-weight:normal } .target { } .timestamp { color: #bebebe; } .timestamp-kwd { color: #5f9ea0; } .right {margin-left:auto; margin-right:0px; text-align:right;} .left {margin-left:0px; margin-right:auto; text-align:left;} .center {margin-left:auto; margin-right:auto; text-align:center;} p.verse { margin-left: 3% } pre { border: 1pt solid #AEBDCC; background-color: #F3F5F7; padding: 5pt; font-family: courier, monospace; font-size: 90%; overflow:auto; } table { border-collapse: collapse; } td, th { vertical-align: top; } th.right { text-align:center; } th.left { text-align:center; } th.center { text-align:center; } td.right { text-align:right; } td.left { text-align:left; } td.center { text-align:center; } dt { font-weight: bold; } div.figure { padding: 0.5em; } div.figure p { text-align: center; } div.inlinetask { padding:10px; border:2px solid gray; margin:10px; background: #ffffcc; } textarea { overflow-x: auto; } .linenr { font-size:smaller } .code-highlighted {background-color:#ffff00;} .org-info-js_info-navigation { border-style:none; } #org-info-js_console-label { font-size:10px; font-weight:bold; white-space:nowrap; } .org-info-js_search-highlight {background-color:#ffff00; color:#000000; font-weight:bold; } /*]]>*/--></style><script type="text/javascript"><!--/*--><![CDATA[/*><!--*/ function CodeHighlightOn(elem, id) { var target = document.getElementById(id); if(null != target) { elem.cacheClassElem = elem.className; elem.cacheClassTarget = target.className; target.className = "code-highlighted"; elem.className = "code-highlighted"; } } function CodeHighlightOff(elem, id) 
{ var target = document.getElementById(id); if(elem.cacheClassElem) elem.className = elem.cacheClassElem; if(elem.cacheClassTarget) target.className = elem.cacheClassTarget; } /*]]>*///--></script><script type="text/javascript" src="http://orgmode.org/mathjax/MathJax.js"><!--/*--><![CDATA[/*><!--*/ MathJax.Hub.Config({ // Only one of the two following lines, depending on user settings // First allows browser-native MathML display, second forces HTML/CSS // config: ["MMLorHTML.js"], jax: ["input/TeX"], jax: ["input/TeX", "output/HTML-CSS"], extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js", "TeX/noUndefined.js"], tex2jax: { inlineMath: [ ["\\(","\\)"] ], displayMath: [ ['$$','$$'], ["\\[","\\]"], ["\\begin{displaymath}","\\end{displaymath}"] ], skipTags: ["script","noscript","style","textarea","pre","code"], ignoreClass: "tex2jax_ignore", processEscapes: false, processEnvironments: true, preview: "TeX" }, showProcessingMessages: true, displayAlign: "center", displayIndent: "2em", "HTML-CSS": { scale: 100, availableFonts: ["STIX","TeX"], preferredFont: "TeX", webFont: "TeX", imageFont: "TeX", showMathMenu: true, }, MMLorHTML: { prefer: { MSIE: "MML", Firefox: "MML", Opera: "HTML", other: "HTML" } } }); /*]]>*///--></script></head><body> <div id="preamble"> </div> <div id="content"> <p>Lately I have been using linear programming at work. Linear programs differ a lot from procedural programs: they are a way of stating an optimisation problem in a very effective way. </p><p>A linear program consists of variables, a target function and constraints. Both the target function and the constraints are expressed in terms of the variables. The target function is the expression you aim to maximise or minimise, subject to the constraints you specify. </p><p>It is not called linear by chance: both the target function and the constraints have to be linear. A constraint, for instance, has the form \(a + b x \le c\), \(a + b x = c\) or \(a + b x \ge c\), where \(a\), \(b\) and \(c\) are constants. 
Here we only have one variable, \(x\), but you can have as many as your solver can handle. A variable can be of several types: a real number, an integer, or a boolean (an integer restricted to the values 0 and 1). Restricting variables to booleans makes it possible to describe optimisation problems that involve mappings. </p><p>Let's say we want to map the elements of two sets, \(A\) and \(B\), and that both sets are large enough that we need a computer, but small enough that all the pairs of elements can be enumerated in practice. We also assume that we can calculate the cost of a given mapping, i.e. that the cost \(c_{11}\) of \(a_1\) being mapped to \(b_1\) is computable in practice. We can then define the target function as the sum of the costs of the active mappings, \(\mathit{target} = \sum_{i,j} m_{ij} c_{ij}\), where the variable \(m_{ij}\) is 1 if \(a_i\) is mapped to \(b_j\) and 0 otherwise. We can then specify as constraints any other rules we may want. At some point those rules may concern the mappings themselves: we can easily imagine that the costs change when two particular pairs are both mapped. </p><p>For the sake of the argument, let's say that we want to decrease the cost by a constant \(z\) if \(a_1\) is mapped to \(b_1\) and \(a_2\) is mapped to \(b_2\). That means modifying the target function to \(\mathit{target} = \sum_{i,j} m_{ij} c_{ij} - z\, m_{11} m_{22}\). This would all be fine if we could multiply the two variables \(m_{11}\) and \(m_{22}\): both mappings need to exist, i.e. both variables must be 1, for \(z\) to be subtracted from the target function. The thing with linear programs is that you cannot multiply two variables; the problem would then cease to be linear, and we could not solve the mapping with the same techniques. </p><p>So here we seem truly stuck. We cannot perform an AND; we cannot multiply the two boolean variables, which is another way of saying the same thing. 
But no, there is a way of doing it. </p><p>What we need is a set of constraints that together emulate the truth table of an AND. That is, given three variables \(a\), \(b\) and \(c\), we want \(c\) to be 1 only when both \(a\) and \(b\) are 1, and 0 otherwise. Bear in mind that the variables can only take one of two values, 0 or 1, and that, within the constraints, the solver will give each variable whichever value is best for the target function. </p><p>We present the method here for the case where we are maximising. </p><p>Since all the constraints have to hold at once, the trick is to first find a constraint that gives us the result we want when both variables are 1. \(a + b \le 1 + c\) is such a constraint: with \(a = b = 1\) it reads \(2 \le 1 + c\), so \(c\) must be 1. Pretty, but there is a catch: whenever \(a\), \(b\) or both are 0, the constraint also holds for \(c = 1\), which is not what we want. OK, not a cool thing, but we can define other constraints that prevent those cases from occurring. </p><p>This can be done with the two following constraints: \(c \le a\) and \(c \le b\). They mean that whenever \(a\) or \(b\) is 0, \(c\) has to be 0. Combined with the first constraint, they leave exactly one admissible value of \(c\) for every combination of \(a\) and \(b\), so the solver has no freedom left and the direction of optimisation does not matter. (NOTE: if the target function already rewards \(c = 1\), as the \(-z\) term does when minimising a cost, the two upper bounds alone would be enough, because the solver pushes \(c\) as high as they allow; the first constraint is what rules out \(c = 0\) when both \(a\) and \(b\) are 1.) 
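</p>
<p>The claim can be checked mechanically. The sketch below is neither an LP solver nor code from the original post: it simply enumerates the eight 0/1 assignments of \(a\), \(b\) and \(c\), tests each one against \(a + b \le 1 + c\), \(c \le a\) and \(c \le b\), and confirms that the feasible ones are exactly those with \(c = a \cdot b\). The class and method names are made up for illustration.</p>

```java
// Enumeration check (not an LP solver) of the constraints emulating c = a AND b
// for variables restricted to 0 and 1.
final class AndConstraints {

    // True when (a, b, c) satisfies: a + b at most 1 + c, c at most a, c at most b.
    static boolean feasible(int a, int b, int c) {
        if (a + b > 1 + c) {
            return false;                // violates a + b at most 1 + c
        }
        if (c > a) {
            return false;                // violates c at most a
        }
        if (c > b) {
            return false;                // violates c at most b
        }
        return true;
    }

    // Checks that, for every 0/1 choice of a and b, the only feasible c is a * b.
    static boolean emulatesAnd() {
        for (int a = 0; a != 2; a++) {
            for (int b = 0; b != 2; b++) {
                for (int c = 0; c != 2; c++) {
                    boolean isAnd = c == a * b;
                    if (feasible(a, b, c) != isAnd) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

<p><i>AndConstraints.emulatesAnd()</i> returns <i>true</i>. Removing the first test from <i>feasible</i> makes it return <i>false</i>, because \((a, b, c) = (1, 1, 0)\) then becomes feasible.</p>
<p>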
If we then combine all the constraints, we have: </p><table border="none" cellspacing="0" cellpadding="6" rules="none" frame="none"><caption>Constraints mimicking an AND operation</caption><colgroup><col class="left" /></colgroup><tbody><tr><td class="left">\(a + b \le 1 + c\)</td></tr><tr><td class="left">\(c \le a\)</td></tr><tr><td class="left">\(c \le b\)</td></tr></tbody></table> <p>These hold for the following values of \(a\), \(b\) and \(c\): </p><table border="none" cellspacing="0" cellpadding="6" rules="none" frame="none"><caption></caption><colgroup><col class="left" /></colgroup><tbody><tr><td class="left">\(a = 0, b = 0, c = 0\)</td></tr><tr><td class="left">\(a = 1, b = 0, c = 0\)</td></tr><tr><td class="left">\(a = 0, b = 1, c = 0\)</td></tr><tr><td class="left">\(a = 1, b = 1, c = 1\)</td></tr></tbody></table> <p>But not for: </p><table border="none" cellspacing="0" cellpadding="6" rules="none" frame="none"><caption></caption><colgroup><col class="left" /></colgroup><tbody><tr><td class="left">\(a = 0, b = 0, c = 1\)</td></tr><tr><td class="left">\(a = 1, b = 0, c = 1\)</td></tr><tr><td class="left">\(a = 0, b = 1, c = 1\)</td></tr><tr><td class="left">\(a = 1, b = 1, c = 0\)</td></tr></tbody></table> <p>This is exactly the truth table of an AND. So we have found a way to emulate an AND, that is, the multiplication of two boolean variables, in a linear program. </p><p>Impossible is just another word for "look around". </p></div> </body></html>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-63817409908139404242011-12-19T22:26:00.000+00:002011-12-19T22:26:18.619+00:00Clocks with dynamic hands... 
and how Fortran makes sense<div>I have recently encountered the need to iterate through arrays with a dynamic number of dimensions. The usual way of having nested for loops does not work any more since the number of for loops would need to be adapted at run time, Generating code is cool, but to do it at runtime based on a dynamic data structure in a language which is not LISP can be infeasible.<br />The solution we propose comes in two parts, first the vectorisation of the matrix in order to facilitate access to the elements i.e. make it more consistent. When thinking about this, Fortran all of a sudden made sense.<br /><br />The second part involves one of the possible implementations of the access to the data structure.<br />This new representation makes for a very simple way of iterating through arrays of dynamic dimensions.<br />Lets first provide a bit more context to illustrate all this.<br />In Octave you can create data structures with any number of dimensions.<br />When you want to iterate through them you either know the number of dimensions ahead of time or you can use a handy operator ":" that converts to a vector any matrix. This operation is called vectorisation and as such can be found on wikipedia.<br /><br />This approach is fine but if you have to do something special per row, column or any slice of the array playing with indices can get scary. A way of getting the best of both words is to calculate the index of each array position as a function of an index of each one of the dimensions. When figuring out the mapping, which is really not difficult. It then becomes apparent why Fortran has its column first strategy.<br />The mapping is indeed very simple, and even simpler if we apply the Fortran convention<br />With the help of an array $D$ which describes the dimensions of the matrix.<br />$D$ can be used besides for validating if the element is inside bounds. 
It can also be used to compute quickly the index of an element in the vectorised form of the matrix.<br />For the bound check: for a given index vector $V$, each element $v_i$ (the index along dimension $i$) must be strictly bounded by the size of that dimension, $d_i$; in other words $0 \leq v_i &lt; d_i$.<br />The position of an element in the vectorised form of the array is then given by $p = \sum_{i=0}^{|D|-1} v_i \prod_{j=0}^{i-1} d_j$. The first index has stride 1 (the product is empty), every following index is multiplied by the sizes of all the dimensions before it, and the size of the last dimension never appears. For three dimensions this expands to $p = v_0 + v_1 d_0 + v_2 d_0 d_1$.<br />The cool thing about this formula is that it also holds for the row-first (C) way of storing arrays: all you need to do is reverse the order of the dimensions, which for a 2-D matrix means swapping $v_0$ with $v_1$ and $d_0$ with $d_1$.<br /><br />The other method, although more involved programmatically, lends itself better to more complex manipulations. I like to call it the dynamic hand clock, because it works just like a clock, except that the number of hands and the value at which each hand wraps around are dynamic.<br />The index of the current element is stored in an array $v$, where $v_i$ holds the current position along dimension $i$. Alongside $v$ we also need a vector $m$ holding the number of positions of each dimension (one more than its maximum valid value, so $m$ is simply $D$). The next index is then calculated by adding 1 to hand $v_i$, starting with $i = 0$, modulo $m_i$. 
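A minimal sketch of both methods, written in Java on the assumption that the strongly typed side is something like Java (the post does not say which language). The class and method names `DynamicClock`, `position` and `tick` are hypothetical. `position` applies the column-major formula together with the bound check, and `tick` adds 1 to hand $i$ modulo $m_i$ (here $m = D$), carrying into hand $i+1$ on a roll-over and returning how many hands rolled over:

```java
import java.util.Arrays;

// Hypothetical helper class illustrating the two methods; not from the original post.
public class DynamicClock {

    // Column-major (Fortran-order) position of index vector v in an array whose
    // dimension sizes are d: p = sum_i v[i] * (d[0] * ... * d[i-1]).
    public static int position(int[] v, int[] d) {
        int p = 0;
        int stride = 1; // empty product: the first index has stride 1
        for (int i = 0; i < d.length; i++) {
            // bound check: 0 <= v[i] < d[i]
            if (v[i] < 0 || v[i] >= d[i]) {
                throw new IndexOutOfBoundsException("dimension " + i + ": " + v[i]);
            }
            p += v[i] * stride;
            stride *= d[i]; // stride of dimension i + 1
        }
        return p;
    }

    // Advances the clock v by one tick: hand i counts modulo m[i] and carries
    // into hand i + 1 when it rolls over. Returns how many hands rolled over;
    // a return value of v.length means every hand wrapped back to zero.
    public static int tick(int[] v, int[] m) {
        int rolled = 0;
        for (int i = 0; i < v.length; i++) {
            v[i] = (v[i] + 1) % m[i];
            if (v[i] != 0) {
                break; // no roll-over, nothing left to carry
            }
            rolled++; // roll-over: carry into dimension i + 1
        }
        return rolled;
    }

    public static void main(String[] args) {
        int[] d = {2, 3, 2};          // hypothetical 2 x 3 x 2 array
        int[] v = new int[d.length];  // all hands start at zero
        int rolled;
        do {
            System.out.println(Arrays.toString(v) + " -> position " + position(v, d));
            rolled = tick(v, d);
            if (rolled > 0 && rolled < d.length) {
                // this is where an iterator or delegate could react to the roll-over
                System.out.println("  roll-over into dimension " + rolled);
            }
        } while (rolled < d.length); // every hand wrapped: back at the start
    }
}
```

Because the first index moves fastest, successive ticks visit positions 0, 1, 2, ... 11 in order, which is why the clock and the column-major layout fit together so well.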
If a roll-over occurs, the increment is carried over to position $i+1$, exactly like the hands of a clock.<br /><br />It then becomes very easy to react to such roll-overs at each dimension, which makes this method very convenient for an iterator or a delegate pattern.</div><div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-89322690370684430962011-06-22T00:42:00.001+01:002011-07-08T14:38:44.384+01:00First test with AsciiMathML<br /><br />Since I am now going through a phase of reconnecting with my past,<br />I have decided to give this old blog some new posts.<br />I am back to writing code, but mostly in Octave, which poses some problems when integrated with strongly typed languages. Notably: how to iterate over a data structure that is completely dynamic, pretty much like the arrays in Octave. 
And to do so without resorting to linked lists or other containers, which may hurt performance and would not be as much fun.<br /><br />As part of preparing to post that solution, I wanted to experiment with AsciiMathML, since I will need it fairly soon.<br /><br />Two things are preventing me from posting the solution: I still want to figure out one small part of the problem, and it is 1:35 and I have an early flight tomorrow.<br />But enough rambling. In order to describe my solution I will need equations of the sort:<br /><br />$idx= j+i|J|+k|J||I|$<br /><br />You can imagine what this is all about.<div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-1144578815691630542006-04-09T11:22:00.000+01:002006-04-09T11:37:39.910+01:00Fibbo KThe Fibonacci series is one of the oldest and most visited problems in any computer science course; a very good explanation, including sample algorithms, can be found in the <a href="http://http://en.wikipedia.org/wiki/Fibbonaci_Series">almighty Wikipedia</a><br /><br />This program presents a slight extension: you can calculate series in which each element is the sum of the k prior elements. All you have to do is change the size of the array that stores the k prior numbers.<br /><br />There is just a little trick to work out in which element of the array the nth element will be stored.<br /><br /><span style="font-weight:bold;"><br />;working array of only two elements<br />;If the array is created with size k the program will calculate the series of order k<br />;That is f(i)=f(i-1)+...+f(i-k); when k=2 it's the Fibonacci series<br />(setq arr (make-array 2))<br /><br />;initialising the array<br />; builds the first array for k as (0 ... 
k-1)<br />(dotimes (i (array-dimension arr 0))<br /> (setf (aref arr i ) i )<br />)<br /><br />;function summing all the elements in the vector<br />(defun sum (vect)<br /> (let ((temp 0))<br />  (dotimes (i (length vect) temp)<br />   (setq temp (+ temp (aref vect i))))))<br /><br />;function calculating the Fibonacci-k series without recursion<br />(defun fib(pos)<br /> (let ((buff-size (length arr)))<br />  (dotimes (i pos)<br />   ;overwrite the oldest of the k stored values with the sum of all k<br />   (setf (aref arr (mod i buff-size)) (sum arr)))<br />  ;the value computed last sits at position (pos-1) mod k<br />  (aref arr (mod (- pos 1) buff-size))))<br /><br />(print '(I will calculate for you the nth fibonacci number n= ) )<br />(setq num (read))<br />(print '(using k=))<br />(print (array-dimension arr 0))<br /><br />(setq res (fib num))<br /><br />(print res)<br />;(print arr)<br /></span><br /><br />This is one of those algorithms that gain a lot from being de-recursified: the memory footprint stays low, since only the last k values are kept, and so does the computation time, as the only overhead is a modulus operation. Everyone is happy.<div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-1144362935757906002006-04-06T23:35:00.000+01:002006-04-06T23:40:48.573+01:00Renversant?I decided to name this code snippet after the French term for reversing, because this algorithm does just that. It walks along the perimeter of the array, extracts each anti-diagonal (running in the (-1,1) direction) and reverses that list. 
Then it simply assigns it back to the same array.<br /><br /><br />A little idea taken from an exercise in &quot;Programming Pearls&quot;.<br /><br /><br />The program reads:<br /><br /><br /><strong><br />;example of in-place array transposition based on reversals<br />;The method is the following: we reverse each of the anti-diagonals of the array,<br />;so we first have to extract them and then re-assign each of their elements<br /><br />(setq arr (make-array '(11 11)))<br /><br />;initialise the array<br />(dotimes (i (array-dimension arr 0))<br /> (dotimes (j (array-dimension arr 1))<br />  (setf (aref arr i j) (+ (* i (array-dimension arr 1)) j))<br /> )<br />)<br /><br />;utility method for computing the sum of the elements in a list<br />(defun sum(in)<br /> (let ((temp 0))<br />  (loop for item in in do<br />   (setq temp (+ temp item)))<br />  temp))<br /><br />;utility for working out the starting co-ordinates of the next anti-diagonal:<br />;walk down the first column, then along the last row<br />(defun next-diag(pos dim)<br /> (let ((x (car pos))<br />       (y (cadr pos)))<br />  (if (= x dim)<br />   (setq y (+ y 1))<br />   (setq x (+ x 1)))<br />  (list x y)))<br /><br />;method extracting the contents of an anti-diagonal of the array <br />;starting at the specified co-ordinates<br />(defun extract-diagonal(pos data)<br /> (let* ((x (car pos))<br />        (y (cadr pos))<br />        (temp (list (aref data x y))))<br />  (loop while (and (&gt; x 0) (&lt; y (- (array-dimension data 1) 1))) do<br />   (setq x (- x 1))<br />   (setq y (+ 1 y))<br />   (setq temp (append temp (list (aref data x y)))))<br />  temp))<br /><br />;method writing the elements of a list back along the anti-diagonal starting at pos<br />(defun insert-diagonal(pos data diagonal)<br /> (let ((x (car pos))<br />       (y (cadr pos)))<br />  (dolist (k diagonal)<br />   (setf (aref data x y) k)<br />   (setq x (- x 1))<br />   (setq y (+ 1 y)))))<br /><br 
/>(defun transpose(data)<br />(setq xdim (- (array-dimension data 0) 1))<br />(cond<br /> ((evenp xdim)(setq modif 2))<br /> ((oddp xdim)(setq modif 3))<br />)<br />(setq numdiag (- (sum (array-dimensions data)) modif ))<br />(setq diagpos '(0 0))<br />(dotimes (i numdiag)<br /> (setq diagpos (next-diag diagpos xdim))<br /> ;step one extract the diagonal to a list<br /> (setq diagonal (extract-diagonal diagpos data))<br /> ;step two reverse<br /> (setq diagonal(reverse diagonal))<br /> (print diagonal)<br /> ;step three profit ;-)<br /> (insert-diagonal diagpos data diagonal)<br />)<br />)<br /><br />(print 'Before )<br />(print arr)<br />(print 'transpose)<br />(transpose arr)<br />(print 'After )<br />(print arr)<br /></strong><br /><br />Program output is also very pretty:<br /><br /><strong><br />prozak@Mistral:~/Exploration/lisp/pearls%clisp transpose.lisp<br /><br />BEFORE <br />#2A((0 1 2 3 4 5 6 7 8 9 10)<br /> (11 12 13 14 15 16 17 18 19 20 21)<br /> (22 23 24 25 26 27 28 29 30 31 32)<br /> (33 34 35 36 37 38 39 40 41 42 43)<br /> (44 45 46 47 48 49 50 51 52 53 54)<br /> (55 56 57 58 59 60 61 62 63 64 65)<br /> (66 67 68 69 70 71 72 73 74 75 76)<br /> (77 78 79 80 81 82 83 84 85 86 87)<br /> (88 89 90 91 92 93 94 95 96 97 98)<br /> (99 100 101 102 103 104 105 106 107 108 109)<br /> (110 111 112 113 114 115 116 117 118 119 120)) <br />TRANSPOSE <br />(1 11) <br />(2 12 22) <br />(3 13 23 33) <br />(4 14 24 34 44) <br />(5 15 25 35 45 55) <br />(6 16 26 36 46 56 66) <br />(7 17 27 37 47 57 67 77) <br />(8 18 28 38 48 58 68 78 88) <br />(9 19 29 39 49 59 69 79 89 99) <br />(10 20 30 40 50 60 70 80 90 100 110) <br />(21 31 41 51 61 71 81 91 101 111) <br />(32 42 52 62 72 82 92 102 112) <br />(43 53 63 73 83 93 103 113) <br />(54 64 74 84 94 104 114) <br />(65 75 85 95 105 115) <br />(76 86 96 106 116) <br />(87 97 107 117) <br />(98 108 118) <br />(109 119) <br />(120) <br />AFTER <br />#2A((0 11 22 33 44 55 66 77 88 99 110)<br /> (1 12 23 34 45 56 67 78 89 
100 111)<br /> (2 13 24 35 46 57 68 79 90 101 112)<br /> (3 14 25 36 47 58 69 80 91 102 113)<br /> (4 15 26 37 48 59 70 81 92 103 114)<br /> (5 16 27 38 49 60 71 82 93 104 115)<br /> (6 17 28 39 50 61 72 83 94 105 116)<br /> (7 18 29 40 51 62 73 84 95 106 117)<br /> (8 19 30 41 52 63 74 85 96 107 118)<br /> (9 20 31 42 53 64 75 86 97 108 119)<br /> (10 21 32 43 54 65 76 87 98 109 120))<br /></strong><br />This sort of algorithm can be handy when memory is tight and everything has to be done in place: no second array is needed, which keeps the memory footprint low.<div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-1143938929016836332006-04-02T01:48:00.000+01:002006-04-02T02:24:35.506+01:00Today a friend of mine handed me a very good book, <a target="_blank" title="Programing pearls in amazon" href="http://tinyurl.com/gwpud"><em>Programming Pearls</em> </a>by Jon Bentley. In it I saw a method for rotating the elements of a vector that I just had to implement. Since I now mainly write Java for work, I decided to implement it in Lisp.<br /><br />The method for rotating the vector is very simple and elegant; it boils down to three steps.<br /><br />To rotate an array of size N to the left by k positions:<br /><br /><ol><li>We start by reversing the first k elements,</li><li>Then we reverse the last N-k elements,</li><li>Finally we reverse the whole array.<br /> </li></ol><br /><br />It's simple and completely foolproof, one of those methods that just work out of the box; the code is so simple that it is hard to get wrong. 
If it seems unclear, play around with a few examples until it clicks.<br /><br />Here are the same three steps translated to Lisp.<br /><br /><strong>;rotate the collection left by num elements<br />(defun rot (collection num)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (myreverse collection 0 (- num 1))<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (myreverse collection num (- (length collection) 1))<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (myreverse collection 0 (- (length collection) 1))<br />)</strong><br /></p><br /><br />I decided not to use any libraries, to keep the example easy to understand, so here is the full code of a demo:<br /><br /><strong>;Example of an in-place rotation of N elements by i positions in O(N)<br />;this implementation is as presented in Programming Pearls by Jon Bentley<br /><br />(setq arr (make-array 23))<br /><br />;reverse the elements of data between start and end (both inclusive)<br />;by swapping symmetric pairs, working inwards from both ends<br />(defun myreverse(data start end)<br /> (dotimes (i (floor (+ (- end start) 1) 2))<br />  (swap data (- end i) (+ start i))<br /> )<br />)<br /><br />;standard swapping of two elements in a vector<br />(defun swap(vect i j)<br /> (let ((temp (aref vect i)))<br />  (setf (aref vect i) (aref vect j))<br />  (setf (aref vect j) temp)<br /> )<br />)<br /><br /><br />;rotate the collection left by num elements<br />(defun rot (collection num)<br /> (myreverse collection 0 (- num 1))<br /> (myreverse collection num (- (length collection) 1))<br /> (myreverse collection 0 (- (length collection) 1))<br />)<br /><br />;initialising the collection with simple integers <br />(dotimes (i (length arr))<br /> (setf (aref arr i) i)<br />)<br /><br />;invoking the rotation by five elements<br />(print &quot;Before: &quot; )<br />(print arr)<br /><br />(print &quot;(rot arr 5)&quot; )<br />(rot arr 5)<br />(print &quot;After: &quot; )<br />(print arr)</strong><br /><br />Running it in 
your favourite Lisp interpreter will give you:<br /><br 
/><strong><br /><em>prozak@Mistral:~/Exploration/lisp/pearls%clisp rot.lisp<br /><br />&quot;Before: &quot;<br />#(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22) <br />&quot;(rot arr 5)&quot; <br />&quot;After: &quot; <br />#(5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 0 1 2 3 4) <br /></em><br /></strong><br /><br />There is some inherent beauty in the simple, basic elements of programming; today I had the chance to rediscover the basics. I wonder what it will be like if I finally get round to reading Knuth's books. ;-)<div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com0tag:blogger.com,1999:blog-23888811.post-1142117582409854822006-03-11T22:52:00.000+00:002006-03-11T22:53:02.413+00:00What is this?I have spent a lot of my life writing, writing about what happens in my life or around me. Now I spend most of my time writing code, so I will write about it here. Weapons of choice: Java, PERL, LISP, SciLab, and anything else I happen to get my hands on. Watch this space, if you may... <div class="blogger-post-footer"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights20.png"/></a></div>Juan Amiguethttps://plus.google.com/114007402765863356979noreply@blogger.com1