tag:blogger.com,1999:blog-57766490984817089702017-06-03T02:06:48.779-06:00bt-blogHi there, welcome to Brian Tanner's research homepage. I'm a provisional Ph.D. candidate at the University of Alberta, in Computing Science (Artificial Intelligence).Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-5776649098481708970.post-45270076610039892792007-10-11T03:09:00.003-06:002008-07-20T12:45:33.595-06:00I moved AGAIN!<span style="font-size:180%;">UPDATE JULY 2008 AGAIN I MOVED<br />Come visit at <a href="http://research.tannerpages.com">http://research.tannerpages.com</a></span><br /><br />I've moved my site again.<br /><br />I know, I know... too often.<br /><br />The new site is using the Joomla content management system with a RocketTheme theme. I really like it.<br /><br />See you at REDACTED.Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0tag:blogger.com,1999:blog-5776649098481708970.post-37299297601751008812007-05-10T09:33:00.001-06:002007-05-10T09:38:43.150-06:00Why my page changes so often<div class="widget-content"> <p>I hate web pages. I mean, I like web pages, but I <b>despise</b> keeping my website up to date and looking good. </p><p>I haven't yet found a convenient solution to the web-site update management problem (wump). Every 6-12 months or so, I decide whatever I've been doing is exactly the wrong thing and I should try something new. The iteration of my website before this was a Google Groups site <a href="http://groups.google.com/group/brian-tanner/web/">here</a> and before that it was a site I threw together with Apple's iWeb software. Before that it was phpNuke, postNuke, Mambo, WordPress, and numerous hand-crafted websites. <br /></p><p>I'm really hard to please.</p><p>The odyssey continues with Blogger. 
</p> </div>Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0tag:blogger.com,1999:blog-5776649098481708970.post-4383673784209457702007-05-07T09:33:00.000-06:002007-05-07T09:41:12.509-06:00How to install SCons on Intel Mac OS 10.4.9SCons (<a href="http://www.scons.org/">http://www.scons.org/</a>) is an awesome, cross-platform, Python-based build tool. I am using it for my <a href="http://code.google.com/p/bt-glue/">bt-glue</a> project.<br /><br />Anyways: installation is quite straightforward. But you need to know <span style="font-weight: bold;">1 key thing</span>! Get the latest build, and NOT the stable build. The stable build is several years old and missing features you want. I promise.<br /><br />1) Download the latest tarball, unpack it<br />2) From inside the unpacked directory, execute: <span style="font-family: trebuchet ms;">sudo python setup.py install</span><br />3) Edit your .profile and update your path to include:<br />/System/Library/Frameworks/Python.framework/Versions/Current/bin<br /><br />If you don't know what that means, that's OK.<br />i) Create a file in your home directory called ".profile".<br />ii) Add a line like this:<br /><span style="font-family: trebuchet ms;">export PATH=$PATH:/System/Library/Frameworks/Python.framework/Versions/Current/bin</span><br /><br />iii) Save the file<br /><br />Now, when you open a new terminal session, you should be able to type: <span style="font-family: trebuchet ms;">scons --version</span> and have good things happen.<br /><br />Happy SCons'ing.Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com2tag:blogger.com,1999:blog-5776649098481708970.post-27326991034817313402007-05-07T09:19:00.000-06:002007-05-24T22:14:45.131-06:00How to install it++ on Mac OS X 10.4.9Note: Thanks to Alex Alvarado for pointing out a small error with the instructions. 
Also thanks to Brad Joyce for pointing out numerous other gaps in the directions.<br /><br />There is some cool software called <a href="http://itpp.sourceforge.net/">it++</a> which lets you do Matlab/Octave-like things in C++. So far I've had some trouble getting it installed.<br /><br />You see, it uses BLAS, CBLAS, LAPACK, and fftw (and maybe some other stuff). Generally you'd want to have all these installed yourself before using it++, but on Mac OS X (since 10.2 I think), everything except fftw comes in the vecLib framework (which comes <span style="font-weight: bold;">WITH</span> OS X).<br /><br />So, how do you get this thing running?<br /><br />1) You need a Fortran compiler, and apparently you won't have one by default. In theory, you can install it with Fink. That didn't work for me. I went to <a href="http://hpc.sourceforge.net/">http://hpc.sourceforge.net/</a> and used his binaries &lt;shrug&gt;. You're looking for <span style="font-weight: bold;">g77 3.4</span>. There are instructions there on exactly how to unpack the archive so that everything goes to the right place. Works great. Make sure that you add it to your path in ~/.profile and that you restart Terminal (so that the changes take effect). g77 will go in /usr/local/bin, so you should add a line to your ~/.profile like:<br /><span style="font-family:trebuchet ms;">export PATH=/usr/local/bin:$PATH</span><br /><shrug><br />2) You need <a href="http://www.fftw.org/">fftw</a>. I just downloaded, did a ./configure, make, sudo make install and it went off without a hitch.<br /><br />3) Now you want to install it++. First, go download and unpack it. 
Then, you should be able to:<br /><span style="font-family:trebuchet ms;">./configure</span><br /><br />You'll see this (hopefully):<br />------------------------------------------------------------------------------<br />itpp-3.10.10 library configuration:<br />------------------------------------------------------------------------------<br /><br />Directories:<br />- prefix ......... : /usr/local<br />- exec_prefix .... : ${prefix}<br />- includedir ..... : ${prefix}/include<br />- libdir ......... : ${exec_prefix}/lib<br />- docdir ......... : ${datarootdir}/doc/itpp-3.10.10<br /><br />Switches:<br />- debug .......... : no<br />- exceptions ..... : no<br />- html-doc ....... : no<br />- shared ......... : yes<br />- static ......... : no<br /><br />Documentation tools:<br />- doxygen ........ : no<br />- latex .......... : yes<br />- dvips .......... : yes<br />- ghostscript .... : yes<br /><br />Testing tools:<br />- diff ........... : yes<br />- sed ............ : yes<br /><br />External libs:<br />- BLAS ........... : yes<br />* MKL .......... : no<br />* ACML ......... : no<br />* ATLAS ........ : no<br />- CBLAS .......... : yes<br />- LAPACK ......... : yes<br />- FFT ............ : yes<br />* MKL .......... : no<br />* ACML ......... : no<br />* FFTW ......... : yes<br /><br />Compiler/linker flags/libs/defs:<br />- CXX ............ : g++<br />- F77 ............ : g77<br />- CXXFLAGS ....... : -DASSERT_LEVEL=1 -O3 -fno-exceptions -pipe<br />- CXXFLAGS_DEBUG . :<br />- CPPFLAGS ....... :<br />- LDFLAGS ........ :<br />- LIBS ........... : -lfftw3 -llapack -lblas<br /><br /><span style="font-weight: bold;">Note</span>: If you didn't install Fortran (g77), <span style="font-weight: bold;">it WON'T FIND LAPACK OR BLAS</span>. That took me a long time to sort out. 
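Since a missing g77 is the silent failure mode here, it can save a lot of head-scratching to check your prerequisites before running ./configure. A minimal sketch (the list of tools to check is my assumption, based on the steps above):

```shell
# Sketch: report whether each build prerequisite is on PATH before ./configure.
# it++'s configure quietly reports "no" for BLAS/LAPACK when g77 is absent.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

check_tool g77    # Fortran compiler from step 1
check_tool g++    # C++ compiler that it++ builds with
check_tool make   # needed for every remaining step
```

If any line says MISSING, fix that first (and open a new Terminal session so PATH changes take effect) before re-running configure.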
You've been warned.<br /><br />Now, if everything went according to plan, you can do:<br /><span style="font-family:trebuchet ms;">make</span><br /><span style="font-family:trebuchet ms;">make check</span><br /><br />This will run some checks to make sure things are working OK.<br /><br /><span style="font-family:trebuchet ms;">sudo make install</span><br /><br />That's it, now you're done. Great work. Watch for a follow-up post on how to use it++!</shrug>Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com6tag:blogger.com,1999:blog-5776649098481708970.post-52307400025771261162007-05-05T15:03:00.000-06:002007-05-05T15:50:28.862-06:00Don't Order Pizza Online From Pizza Hut (at least in Canada)Somebody should get fired.<br /><br />So, Mom Harriott is visiting, and we thought we'd order pizza. Usually we order online from <a href="http://www.pizza73.com/">Pizza 73</a> which is a painless (and actually enjoyable) experience. Their interface is all "Web 2.0" with dynamic updating of pages without reloading, and it's well done.<br /><br />But Ariel doesn't like Pizza 73 as much as others, so we're going to explore. She's got a Panago flier, but I prefer Pizza Hut, so I'm checking their website.<br /><br />I should mention, all of this frustration happened yesterday, and I eventually gave up. I'm reliving the experience today so that I can tell you about it.<br /><br />I go to <a href="http://www.pizzahut.ca/">www.pizzahut.ca</a> and <span style="font-weight: bold;">score</span>, they have online ordering (I don't like the phone). So I click the link, and am presented with a <a href="http://www.pizzahut.ca/order/online">map of Canada</a>: click Alberta. Easy so far.<br /><br />Ok, now I'm at "Already registered" or "Not Registered?". First off, I don't want to register, I just want to order a pizza. This is the first bad sign. But whatever, I can dig it. 
I'll go the "Not Registered" route, type in my postal code, and see if the service is available in my area (PS: of course it will be).<br /><br />Ok, it takes me to a page for entering all of my info.<br /><a href="https://quikorder.pizzahut.com/phorders/registration.php?zipCode=a1b2c3&amp;streetAddress=">https://quikorder.pizzahut.com/phorders/registration.php?zipCode=a1b2c3&amp;streetAddress=</a><br /><br />I don't want to, but I do. I get to the end, hit submit, and I get "You must enter a valid zipcode". Oh, so they don't do careful validation. I typed my postal code like a0b1c2, I should try: a0b 1c2. No good. a0b-1c2. Nope. Ok, so it thinks I'm in America. How to change, no idea. How did I make them think that? No idea.<br /><br />Maybe I made a mistake. So I start over. Same process, but when I get to the registration form, it's this one:<br /><a href="https://quikorder.pizzahut.com/phorders/registration.php?zipCode=a0b1c2&amp;streetAddress=">https://quikorder.pizzahut.com/phorders/registration.php?zipCode=a0b1c2&amp;streetAddress=</a><br /><br />I know what you're thinking. That's the same link! Yup. But this time the page has a red background instead of white. And now that field actually says postal code! Great! This is gonna work!<br /><br />So I fill it all in and go "confirm", and I'm on the next screen.<br /><br />Choose a username. <span style="font-weight: bold;">Why</span> do they do this? I thought username was a solved problem: use my e-mail address as username. It's unique, and it's mine, and I've already typed it in on the previous screen. Nope, can't use that, username can't have the @ symbol.<br /><br />Try btanner.<br /><br />Now the password. Oh cute, they have a little password validator, makes sure you choose a good password. This is <span style="font-style: italic;">very</span> important for ordering pizza. Let's pretend my school password is thrcew## (It's not). 
This password should be good enough for Pizza Hut; it's good enough for Unix and everything else I use. <span style="font-weight: bold;">Nope</span>! Pizza Hut says this is TOO SIMPLE. Ok, instead of ## at the end, I'll use thrcew33. That's apparently a stronger password, and it is accepted. Also, for fun, I tried "password11". That is also a strong enough password.<br /><br />Now I need to choose a security question. There is a short list of options, fine. Birthplace city. Winnipeg. But, for some reason, the security answer field is a password field, so it's ********. Interesting choice; even my bank lets me type my security answer in plain text. Whatever.<br /><br />Click Finish.<br /><br />Account name in use. Please try again. And it erased my password field, but not my security question fields. Fine.<br /><br />Try: brian_tanner. Can't have "_" in it. Brian.tanner is out too then I guess. Hmm. Oh, "suggest a username" button. It suggests TannerB. How clever. Fine.<br /><br />Oh. That one is in use, all fields erased again.<br /><br />Try: <span style="font-weight: bold;">IHatePizzaHut</span>.<br /><br />Oh. That one is in use, all fields erased again.<br /><br />This is getting frustrating, let's make the username and password be thrcew33.<br /><br />Nope, username and password cannot be the same.<br /><br /><span style="font-weight: bold;">ARE YOU KIDDING ME. I JUST WANT TO ORDER A PIZZA.</span><br /><br />USERNAME: thrcew<br /><br />Aha! I've cracked the Pizza Code! I'm in! Remember, by the way, I haven't even seen a menu or the prices/specials yet.<br /><br />I'm not actually going to order anything now, because we ended up just calling Panago.<br /><br />But, I hope you get a feeling for how horrible and frustrating this was. Let's sum up:<br />1) I chose Canada and got to the US registration page<br /><br />2) Doesn't allow e-mail as username. 
Many usernames already taken, including IHatePizzaHut.<br /><br />3) Username suggestion is a waste of time; it suggested something that was taken<br /><br />4) Password strength meter is more aggressive than anywhere else (in the wrong ways) and weaker (in the right ways). By the way, the Google password strength tester thinks that <span style="font-weight: bold;">thrcew</span> and <span style="font-weight: bold;">thrcew##</span> are strong passwords (too weak for Pizza Hut). Google thinks that <span style="font-weight: bold;">password11</span> is only fair (strong enough for Pizza Hut).<br /><br />5) Security answer is a password field (this is just weird, not a big deal)<br /><br />At the end of the day, I just wanted to order a pizza. Now, I will never order a pizza from them again. My time is valuable, and someone at their company wasted a bunch of it.<br /><br />I hopped over to the Pizza 73 website to remind myself what their process was all about.<br />1) Pick your food<br />2) Fill in details.<br /><br />For Pickup, they need: <span style="font-weight: bold;">Name, Phone, City</span>.<br />For Delivery, they need <span style="font-weight: bold;">address</span> (and if you live in an apartment, the suite number). You must also choose a payment method, which means if you want to pay by credit card you have to type it in. However: paying cash is an option.<br /><br />Optional are e-mail address and comments.<br /><br />Now, <span style="font-weight: bold;">that</span> is easy. I know who I'm ordering from next time.<br /><br />So, if you care, be like me: don't support a company that puts such a low priority on usability.<br /><br /><br />PS: If you do want to order from Pizza Hut, you can use this account and order as me, John Smith:<br /><br />Username: thrcew<br />Password: thrcew33<br />Security Answer: Winnipeg (or winnipeg).<br /><br />Post a comment here if you do. 
I'm going to send a link to this post to Pizza Hut, see if they care to comment.Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com12tag:blogger.com,1999:blog-5776649098481708970.post-30550782070280846852007-05-05T13:55:00.000-06:002007-12-02T13:59:51.081-07:00Samsung ML-1610 Printer on Mac OS X (Intel Mac)<b>NOTE: Some of the steps in these instructions don't work anymore because the files don't exist. I've updated the instructions to work on Leopard, which you can check out here:</b> <a href="http://brian.tannerpages.com/content/view/55/69/">Samsung ML-1610 Printer on Mac OS X 10.5 Leopard (Intel Mac)</a>.<br /><br />Hi. I have tried so many times to figure out how to get my Samsung ML-1610 printer working on my Intel Mac.<br /><br />There are many quick explanations on the web of how to do this, but they don't seem to have all of the details... and as a non-Unix guru, I've always had trouble making them work.<br /><br />--<br /><br />Ack. I just spent hours trying to actually get this working. I don't know why I try to be so clever. The instructions are posted below. The thing is, all of the directions use OLD files, made for PowerPC. I'm using a new Intel Mac, so I thought instead of following the direct links from the instructions (to old files), I should find the up-to-date versions. NO LUCK with that.<br /><br />Moral of the story: don't try to be clever. Here are the directions.<br />1) Go here: <a href="http://www.linux-foundation.org/en/OpenPrinting/MacOSX/hpijs">http://www.linux-foundation.org/en/OpenPrinting/MacOSX/hpijs</a><br /><br />2) Download and install <a href="http://prdownloads.sourceforge.net/gimp-print/espgs-7.07.1.ppc.dmg?download"><span style="font-weight: bold;">espgs-7.07.1.ppc.dmg (5.4 MB)</span></a><br /><br />"But Brian" (you say), "that's PowerPC! I have an Intel Mac." 
That's true.<br /><br />You might also find it interesting to know that ESPGS (ESP Ghostscript) was replaced by Gutenprint some time ago. It doesn't matter, don't go for Gutenprint. I tried that route, and had no end to my problems. I tried to build it from source and had errors, then installed it from a disk image only for it to 'work' but instead of printing my documents it would just fire up my printer and then not do anything.<br /><br />Just install the old one from the page.<br /><br /><span style="font-weight: bold;">This worked on OS X 10.4.9 on a brand new Intel iMac on May 5, 2007.</span><br /><br />If you are more clever than me and got it working with newer software, please tell me.<br /><br />3) Download and install <a href="http://www.openprinting.org/download/printdriver/macosx/hpijs-foomatic-2.0.2.ppc.dmg"><span style="font-weight: bold;"> hpijs-foomatic-2.0.2.ppc.dmg (1.6 MB)</span></a><br /><br />Same story, I know, I know, it looks old. It works. Suck it up. :P<br /><br />4) Go to System Preferences --> Print &amp; Fax<br /><br />5) Click the + button to add a printer<br /><br />6) Samsung should be in the list now! Thank heavens. But, ML-1610 isn't there! Oh no! Just use ML-1210 Foomatic + GDI. Seems to work fine.<br /><br />There, all done.<br /><br />Again, if you have a more up-to-date method than this, please share it with me. I'm just glad to finally have this working.Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com18tag:blogger.com,1999:blog-5776649098481708970.post-17856615309426378682007-04-07T17:41:00.000-06:002007-04-07T17:47:07.023-06:00BT-GlueBT-Glue is a project that I'm currently putting a lot of my time into. Basically, the idea is to create a library of agents and environments implemented in C++ that can easily be plugged together. 
There is also a visualization component: currently implemented in Objective-C with Cocoa on OS X, you can easily write a visualizer for an environment and watch the agent's value function, policy, whatever.<br /><br />It's coming along quite well, and I think it's sort of interesting. I'm also working on the "Second World", which is a large, very complex environment that will be used for reinforcement learning research. The idea is that it is not supposed to be exactly like the real world, but it is supposed to be rich and complex enough to require beyond-state-of-the-art RL techniques in order to create successful agents. The agent's experience consists of rapid (10-100 Hz?) primitive sensor signals that it must respond to with primitive actions. The project is a little unique because the environment is a real-time environment: 1 second of world time roughly corresponds to 1 second of real time. <br /><br />BT-Glue is cool because I want the agents from Second World to be portable to Mountain Car, for example: generic agents, applicable to a variety of problems. That's where BT-Glue comes in.<br /><br />Anyways, I should say more here, and I will later, but for now I just want to link to BT-Glue.<br /><br /><a href="http://code.google.com/p/bt-glue/">http://code.google.com/p/bt-glue/</a>Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0tag:blogger.com,1999:blog-5776649098481708970.post-76020611460226369382007-04-02T20:14:00.000-06:002007-04-02T20:17:29.910-06:00Chronological Papers List<p>This is a list of the papers I've published, in chronological order. If you follow the links, they will take you to Google Base items with abstract, BibTeX, and a link to the PDF version. The cute thing about using Google Base items is that they are standalone "web items" that exist out on the Internet. If this page ever disappears, or I move it, the actual papers won't move... 
they will (in theory) be there as long as Google keeps the Base service running. If you try out this approach, I'd love to hear how it works out for you.</p><p> </p><h3>2007 </h3><ul><li>Brian Tanner, Vadim Bulitko, Anna Koop, and Cosmin Paduraru. <a href="http://www.google.com/base/a/1530190/D6428254921053256546"><u>Grounding Abstractions in Predictive State Representations.</u></a> Twentieth International Joint Conference on Artificial Intelligence. January 2007. 6 pages.</li></ul><h3>2006 </h3><ul><li>Brian Tanner and Richard Sutton. Predictive Action Descriptions from Experience. 16th International Conference on Inductive Logic Programming. Short paper. August 2006. 3 pages.</li></ul><h3>2005 </h3><ul><li>Brian Tanner. <a href="http://www.google.com/base/a/1530190/D17827107769903128079">Temporal Difference Networks</a>. Master’s Thesis. September 2005 (published). 58 pages.</li><li>Brian Tanner and Richard Sutton. <a href="http://www.google.com/base/a/1530190/D8801934931963004433">Temporal Difference Networks with Eligibility Traces</a>. International Conference on Machine Learning 2005 (published). 8 pages.</li><li>Brian Tanner and Richard Sutton. <a href="http://www.google.com/base/a/1530190/D2325704698667369319">Temporal Difference Networks with History</a>. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence. July 2005. 6 pages. </li><li>Eddie Rafols, Mark Ring, Richard Sutton, Brian Tanner. <a href="http://www.google.com/base/a/1530190/D9432777834925390819">Using Predictive Representations for Generalization in Reinforcement Learning</a>. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence. July 2005. 6 pages. </li><li>Robert C. Holte, Jeffery Grajkowski, Brian Tanner. <a href="http://www.google.com/base/a/1530190/D11669331728687183755">Hierarchical Search Revisited</a>. Symposium on Abstraction, Reformulation and Approximation (SARA). July 2005 (published). 
13 pages.</li></ul><h3>2004</h3><ul><li>Richard Sutton and Brian Tanner. <a href="http://www.google.com/base/a/1530190/D6337897526194723419">Temporal Difference Networks</a>. In Proceedings of NIPS. December 2004 (published). 6 pages.</li><li>John Anderson, Brian Tanner, and Jacky Baltes. <a href="http://www.google.com/base/a/1530190/D7741364319926143499">Dynamic coalition formation in robotic soccer</a>. In Proceedings of the AAAI-04 Workshop on Forming and Maintaining Coalitions and Teams in Adaptive Multiagent Systems. 2004. </li><li>John Anderson, Brian Tanner, and Jacky Baltes. <a href="http://www.google.com/base/a/1530190/D9334424310882499968">Reinforcement Learning from Teammates of Varying Skill in Robotic Soccer</a>. In Proceedings of the 2004 FIRA Federation of International Robotic-soccer Association World Congress. 2004.</li></ul><h3>2002<br /></h3><ul><li>John Anderson, Brian Tanner, and Ryan Wegner. <a href="http://www.google.com/base/a/1530190/D16860526236519712159">Peer reinforcement in homogeneous and heterogeneous multi-agent learning</a>. In Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing 2002.</li><li>John Anderson, Ryan Wegner, and Brian Tanner. Exploiting opportunities through dynamic coalitions in robotic soccer. In Proceedings of the AAAI International Workshop on Coalition Formation in Dynamic Multiagent Environments. 2002.</li></ul>Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0tag:blogger.com,1999:blog-5776649098481708970.post-36399176712658347632006-04-03T11:17:00.000-06:002007-04-03T11:18:08.795-06:00On the speed of researchI was at home for the holidays, and I found myself talking to old friends about Ray Kurzweil and his predictions about the future of technology and society. 
I'll admit, his book "The Age of Spiritual Machines: When Computers Exceed Human Intelligence" was important to me; it was one of the motivators that got me interested in strong artificial intelligence.<br /><br />So I found myself talking about how technological evolution is an exponential process, and how this has impacted the speed of research. I'm going to rehash the example I gave them here, because it's good to have things written down.<br /><br />Let's say the year is 1990, and I have an interesting artificial intelligence idea. Is my idea novel? How can I find out? I'll mention it to my colleagues, to determine if anyone has heard of similar work. I'll then take my leads and head to the library. Now I'll search through a book, microfiche, or maybe a computer index of conference and journal article titles (and maybe abstracts). This could take a very long time. Finally, I'll have a compiled list of works that may be relevant.<br /><br />Some of these sources will be available in the well-stocked university library. All I need to do is spend the afternoon running around, finding appropriate volumes, and marking down which volumes I need that are currently checked out. I'll have to fill out a form requesting the unavailable volumes when they return. That could take a week or so. For the sources not stocked locally, I'll fill out a request form, and those issues will be sent from wherever in North America they are to me, via inter-library loan. Very cool. That will take several weeks as well. These are all short-term loan items, so of course I'll have to spend a few hours photocopying everything that I might want a copy of.<br /><br />So, after looking at these articles, I will probably learn that they are not exactly what I wanted. But! They will probably cite related work that IS exactly what I wanted. 
So, I'll go back to the library with my new list of sources, and get my hands on what I can.<br /><br />The funny thing about this story is that, on one hand, it is fantastic. Using the library and inter-library loan procedure, almost any bit of published information is available to me. Quite amazing. The downside, of course, is that it can take weeks to get access to some of the information, and it is not easy to search.<br /><br />Of course, in today's information age, there are no such restrictions. Not only do we have access to most of these works; they are now accessible from our desks, they are searchable, and it takes seconds instead of weeks to get the information. Literally, I can have an idea in the morning and have a feeling for the related work by the afternoon, while back in 1990 it would have taken weeks. That's the speed of research.<br /><br />Also, a brief word on processing power. Computers are obviously much faster now than in 1990, but what has the impact of this been? In the past, it would have been necessary to commit significant computing resources to run a new algorithm (or a tweaked or bug-fixed algorithm) on some reasonable dataset, especially if multiple trials needed to be performed to establish statistical significance. These experiments could take days or weeks. To run that same experiment now would take seconds or minutes. This acceleration of result availability means that we can try more ideas in a week than could previously have been explored in months.<br /><br />In the past, after an interesting idea was published, it might take a year or two before other research groups could follow up on the idea and extend it. That latency has been cut drastically by the combination of the issues I've mentioned, along with others like prepublication manuscripts and online technical reports. 
This will continue, and we will see new ideas and improvements to existing ideas at an increasingly rapid pace.<br /><br />Ok, so it's not deep, or life-altering information. It's just a thought and/or musing. And it's exciting.Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0tag:blogger.com,1999:blog-5776649098481708970.post-77775600799802666602006-04-03T11:15:00.000-06:002007-04-03T11:17:29.217-06:00Where do rewards come from?<p>This is actually several blog entries from my old website stitched together. I hate to lose these things when I migrate software, so I’m trying to keep them alive.</p><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">These are some pretty random thoughts, btw. My opinions have likely changed since writing this :)</div><div style="margin-bottom: 0px; margin-top: 0px; font-weight: bold; line-height: 20px;">Entry 1:</div><div style="line-height: 20px;">So, I'm reading this book "How the Mind Works" by Steven Pinker. It's great. It speaks to the way evolution has, over time, produced very specialized mental functions in the brain. The idea is that we sometimes take for granted that complex physical structures have evolved, but we think of the mind as some general-purpose thinking machine. Pinker's view is that the mind has evolved in a similar way to the rest of the body. So this got me thinking about the reward function in reinforcement learning...</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">So, in reinforcement learning, we generally have some states and a reward function, and we want to find a policy that maximizes the discounted sum of future rewards generated by this function. We have decent solutions to finding such a policy in fairly complex domains.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">But... it takes a long time. 
And really, in real life we don't have a long time. Take animals, for example. I know, I know - trying to relate a new idea to something I hardly understand from nature is a farce, but just hear out this illustrative example. Animals know bad tastes from good tastes. They have natural aversion to things that taste bitter and natural attraction to things that taste sweet. This isn't something that they learn, it is something that they are born with. Why? Why not learn it? Because if animals had to learn everything from scratch, they would die. Extinction. Animals run from loud noises. Same deal. Evolution programmed some things in to help animals survive.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">Ok, but how does this apply to learning? Well, animals learn to associate things. They can learn to associate people with loud noises for example - so stay away from people. Or, maybe they will associate people with food (don't feed the wildlife) - so people become a secondary reinforcer, and approaching people becomes a good thing.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">Maybe (and just maybe) if we want our agents to learn quickly and generalize well, we need to tailor their reward function more than we have been. I mean, look at a big maze. You can give a reward of -1 for every state-action pair except escape, and then marvel that the agent learns the fastest way out. The problem is that they learn this optimal policy in the limit, which can take a long time. When people first learn of reinforcement learning, they almost always say "Can't we give positive rewards for going near the exit and negative rewards for being far from it?". The common answer with the classical viewpoint is "no". The reason: you are then doing all of the work, analyzing the domain, crafting a reward function that helps the agent. 
Better, or so we're told, is to just tell the agent the end result of what you want it to do: exit the maze. Make everything else bad, and eventually the agent will work things out.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">All those points are valid. But what if we have additional constraints? Say the agent is a failure if it does not exit the maze within a fixed limit of time. Even if the agent is given the task over and over, it may take a huge amount of time before it finds a way out. But... if we had a reward function that rewards subgoal behaviour, perhaps this agent could learn its way out quickly, and on the first try. Wouldn't that be neat? I think so.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">So, what does it take to tailor a reward function? Work. You have to try a bunch of them, and do some sort of local search to get better ones. The good news: you can do it in parallel, which saves some time.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">I think the big win here is actually going to be with function approximation. What we often find is that we have a function approximator which doesn't provide optimal discrimination along the lines necessary to maximize some reward function. Like, we want the robot to get out of its pen, but the pen is round and our function approximator uses squares. So, maybe the agent needs to do some bumping into walls and zigzagging because some squares are "good sometimes" and "bad other times". This is a bit wishy-washy, but stay with me. Maybe with an evolving reward function, we can make the task easier to learn. Maybe we can provide rewards in such a way that the overall task (escaping the pen) is made easiest given the function approximation. Maybe the reward function can evolve to exploit regularities in the function approximator.
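Here's the kind of "tailored" reward I'm imagining, sketched on the same toy corridor maze (again, the bonus is something I'm hand-crafting purely for illustration): add a progress bonus on top of the usual -1 per step. A one-step-greedy agent chasing the shaped reward gets out on its very first try, with no learning at all:

```python
N = 10  # toy corridor: start at state 0, exit at state N-1

def sparse_reward(s2):
    # the classic setup: -1 everywhere except escaping
    return 0.0 if s2 == N - 1 else -1.0

def shaped_reward(s, s2):
    # hand-crafted subgoal signal: bonus for reducing distance to the exit
    progress = (N - 1 - s) - (N - 1 - s2)
    return sparse_reward(s2) + progress

def first_try_steps():
    """A one-step-greedy agent chasing shaped_reward, no learning at all."""
    s, steps = 0, 0
    while s != N - 1 and steps < 100:
        a = max((-1, 1), key=lambda a: shaped_reward(s, min(max(s + a, 0), N - 1)))
        s = min(max(s + a, 0), N - 1)
        steps += 1
    return steps

print(first_try_steps())  # escapes immediately, in the optimal 9 steps
```

(This particular bonus happens to be the potential-function flavour of shaping - the potential being negative distance to the exit - which comes up again in Entry 3.)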
Heck, maybe we can evolve the reward function and the feature set in parallel and find interesting features that give us discrimination and generalization exactly where we need them. Maybe we could even evolve a starting policy at the same time and build in (shudder) instinctive behaviour.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">Anyways, these are some ideas. I don't know if they've been done. I'm about to read Geoffrey Hinton's paper on "How learning can guide evolution". I think maybe it's backwards and we should use evolution to guide learning... but maybe not.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="line-height: 20px;">If this is new, it's going to be some sort of search in reward space; maybe I can bundle it up into a neat paper.</div><div style="line-height: 20px; text-decoration: none;"> </div><div style="font-weight: bold; line-height: 20px;">Entry 2:</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">This is a bit of an extension to the story above... I did some more thinking and read Geoffrey Hinton's paper.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">So, we're talking about crafting a reward function. But this makes the bottom fall out of our barrel. If agents are supposed to maximize their reward, and we are learning a reward function to help the agent succeed, the obvious degenerate case is for the agent to get high reward for doing nothing (or doing anything).</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">How does nature deal with this problem?
Nature doesn't even consider the problem, because the goal and the rewards are distinct. It doesn't matter how happy I am in my life, or how much reward I accumulate; if I do not reproduce, then my genes have failed in their goal, which is to propagate themselves. One distinct, simple goal: survive.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">We can see this in many different aspects of human nature (I think - I'm no psychologist). Why is getting better than having? Why is the thrill in the chase? Why do rich people gamble? Why take the smaller payout now instead of the larger one spread over time? People like to get.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">Where am I going with this?</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">I'm going to postulate that people like getting because there is a reward for getting. I'll come back to this if I can make it clearer.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">In the maze example, we can see what we need to do. The reward evolution decides how the agent is rewarded, but (like in nature - sheesh, I'm doing it again) the agent needs to be evaluated by an external process. Did it get out of the maze? Did it get out of the maze fast?
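Here's a tiny sketch of that separation (everything below is an illustrative toy, same corridor maze as before): two candidate reward functions each drive a one-step-greedy agent, but the agent is judged by an external pass/fail fitness. The degenerate "reward for anything" candidate racks up the most reward and still fails:

```python
N = 10  # corridor maze again: start at 0, exit at N-1

def run(reward_fn, budget=50):
    """Greedy agent chasing reward_fn; returns (reward collected, escaped?, steps)."""
    s, total = 0, 0.0
    for t in range(budget):
        if s == N - 1:
            return total, True, t
        # move to whichever neighbour looks best under this reward function
        s2 = max((max(s - 1, 0), min(s + 1, N - 1)), key=reward_fn)
        total += reward_fn(s2)
        s = s2
    return total, False, budget

def lazy_reward(s):
    return 1.0            # the degenerate case: high reward for doing anything

def progress_reward(s):
    return float(s)       # reward grows toward the exit

def fitness(reward_fn):
    """External evaluation: did the agent get out, and fast?  Collected reward is ignored."""
    _, escaped, steps = run(reward_fn)
    return escaped and steps <= 2 * (N - 1)
```

The lazy agent happily sits at the start collecting 50.0 reward to the progress agent's 45.0, yet only the progress agent passes the fitness test. That's the sense in which the external fitness, not the reward the agent enjoyed, has to guide the reward evolution.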
It doesn't really matter how much reward (fun) the agent had running around the maze; it matters whether it got out. That is the fitness function that guides the reward evolution and eventually evaluates the agent. Will this work with more complex tasks?</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">Maybe it'll work better. (Maybe not.) We really do need to provide input here, which I would prefer not to, but for now we will, and keep it simple. Say we are making a robot that walks. If it falls, it fails. If it moves forward some distance, it passes. There is the fitness function. Pass/fail. Maybe.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">What about something like playing blackjack? This is harder. Rationally, it seems that people are bad at gambling. People get addicted. People chase their losses with more good money. If leaving with less than you started with is failing, and leaving with much more than you started with is winning, maybe our gambling reward function does just the right thing? The expected value of gambling is losing, so perhaps a few big bets are better than many smaller bets. If you are down a bunch of money, the only way to not be down a bunch of money is to win. Maybe chasing lost money is actually a good thing, relative to the goal of not being down a bunch of money.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">Anyways, so this *is* the hard part, I won't deny it.
By going a level up from the reward function, we have to come up with a simpler fitness function, something that is almost braindead. If not, then my whole argument can be applied recursively to some higher goal. Maybe that's not a terrible idea, but it's not the one I want to explore. What do we do? Maybe standard RL goals are ok. Playing a game - winning the game is good, losing is bad. If we are using a large population of agents, then the stochasticity of games and different opponents works itself out. If we are playing with a single agent, this doesn't work so well. But would evolution work with a small population? Nope.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">I'm trying to think of something with a really complicated reward function. An example of inverse reinforcement learning at ICML '04 used a car-driving task. You wanted to stay on the road, not hit people, not get hit by people, go fast, etc., etc. Their argument (if I recall) for inverse RL was that people can perform this task well, but have a hard time constructing a reward function for an agent to do as well. If the penalty for going off the road is too weak, then the agent will drive off-road to avoid the stochastic nature of traffic. If this penalty is too strong, then the agent will crash into other cars instead of veering onto the shoulder. What could we do here? I'm not quite sure. Maybe I'll come back and edit this.
Otherwise, send me an e-mail if you have some idea.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; font-weight: bold; line-height: 17px;">Entry 3:</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">So, previously - I was all about reward function shaping. Then I read some work by Andrew Ng, and he showed that reward shaping can be a little dangerous, and that perhaps we should instead use the potential-function scheme he describes. Many of the benefits, with less risk. Then I looked at Eric Wiewiora's research note showing that this potential-function scheme is the same as just setting the initial value function to the potential function.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">Maybe this is a win? Setting the initial value function is the same as using a potential function, which is safer and has all the benefits of changing the reward function.</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px;">So - I thought about evolving the value function. What I decided was that evolving the value function has little benefit over starting the agent over multiple times with random value functions and then taking some of what was learned in each one and combining it.
This, then, is the same as learning off-policy with a few good exploratory policies?</div><div style="margin-bottom: 0px; margin-top: 0px; font-size: 15px; line-height: 17px; text-decoration: none;"> </div><div style="margin-bottom: 0px; margin-top: 0px; padding-bottom: 0pt; font-size: 15px; line-height: 17px;">So, is this whole direction a waste? Perhaps. I want to speak further with Vadim about these ideas and see what he thinks.</div>Brian Tannerhttp://www.blogger.com/profile/14181938377727002212noreply@blogger.com0