My first script on this site. I hope it's useful to someone. Possibly interesting parts: adding the mouseover text to the bottom of the image and the title to the top in the local copy.

Read before running:

1. The directory where you'll store the comics has to exist; the script doesn't check. I could add a check, but it's trivial to make the directory yourself.
2. You need to have ImageMagick installed, and convert must be runnable from your PATH.

How to download every comic in the archives:

Code:

for i in {1..692}; do echo "Downloading $i"; xkcd $i; done

It will fail for comic #404, which leads to the site's 404 page - but it doesn't break the script. It just creates an entirely blank comic.
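If you'd rather skip that blank file, a small guard works. This is a hedged sketch, not the original script: the `xkcd` call here is a stub standing in for the author's downloader, so the loop runs anywhere.

```shell
#!/bin/bash
# Sketch: download the whole archive, skipping #404 (the site's 404 page,
# which would otherwise produce a blank comic).
# 'xkcd' is a stub standing in for the real download script.
xkcd() { downloaded="$downloaded $1"; }

downloaded=""
for i in {1..692}; do
    [ "$i" -eq 404 ] && continue   # there is no comic 404; skip it
    xkcd "$i"
done
```

In the real script the stub would be replaced by the actual `xkcd $i` call.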

I did a little bit of trivial adapting in this forum's input box without testing, because I actually use it in a cron job to automatically download the new comic on Monday, Wednesday and Friday. I'm fairly certain this will run, but if someone tries it out and it doesn't, let me know.

What a cool script! I like the 'for' loop, "for i in {0..5000}; do". This is the first time I've seen that. It seems much better than using (( i=0; i <= 5000; i++ )). Too bad it can't be used with a variable (e.g. "for i in {0..$x}")! ...Or can it?

The package 'imagemagick' looks cool. I had to install it and it works great. I like how you used it to add a label and a title.

I love the idea of using 'bash' to get information from the internet. I like how you used the webpage to isolate the picture ("wget -q http://www.xkcd.com/$i"... "wget -q $IMGNAME"). I saw something like this in "Wicked Cool Shell Scripts", but your script is better (IMO).

It's cool that you run this as a 'cron' job. (I still haven't tried playing with 'cron').

Thanks a lot. I really like it and look forward to trying something like this for myself.

I'm glad you like it, and that you think the comics are good. I kind of like having them on my hard disk.

It hadn't occurred to me to try the expansion with a variable. I checked just now, and I can't find a way to make it expand properly with "for i in {0..$x}". That sort of thing is really more something you'd use a while loop for.
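The reason it fails is that brace expansion happens before variable expansion, so {0..$x} never sees the value of $x. Two alternatives that do work, as a quick sketch:

```shell
#!/bin/bash
x=5

# Brace expansion runs before variable expansion, so {0..$x} stays literal.
# Alternative 1: a C-style for loop (built in, no external command)
cstyle=""
for (( i = 0; i <= x; i++ )); do
    cstyle="$cstyle$i "
done

# Alternative 2: seq (spawns an external command, but handles variables fine)
seqstyle=""
for i in $(seq 0 "$x"); do
    seqstyle="$seqstyle$i "
done

echo "$cstyle"
echo "$seqstyle"
```

Both loops produce 0 through 5; the C-style loop is usually preferred in bash since it avoids the external `seq` process.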

cron jobs are very useful for things like this. They do have one problem you should be aware of: if your PC is not running when their trigger time comes, they never run at all. You can remedy this by using anacron, or avoid it by putting the job in cron.hourly with a timestamp check before it does anything, to ensure it only runs once a day.

Usually your distro will have cron running, and will have cron.hourly, cron.daily, cron.weekly and cron.monthly directories made under /etc. You shouldn't have to do anything with the daemon at all - just put your script in the directory and it will run. It's very handy if you're forgetful like I am.
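The once-a-day timestamp check mentioned above can be sketched like this. The stamp-file path and function name are my own assumptions, not from the original script; the hourly cron job calls the guard, and only the first call each day does any real work.

```shell
#!/bin/bash
# Sketch of a cron.hourly guard: the job fires every hour, but only does
# real work once per day. Stamp-file path is an assumption; use any
# writable location.
STAMP="${TMPDIR:-/tmp}/xkcd-last-run"
rm -f "$STAMP"                      # start clean for this demonstration

run_once_daily() {
    local today
    today=$(date +%Y-%m-%d)
    if [ -f "$STAMP" ] && [ "$(cat "$STAMP")" = "$today" ]; then
        return 1                    # already ran today; skip
    fi
    echo "$today" > "$STAMP"        # record today's run
    return 0                        # caller should do the real work now
}

if run_once_daily; then
    echo "first call: doing the daily download"
fi
if run_once_daily; then
    echo "second call: should not happen today"
fi
```

Dropped into /etc/cron.hourly, a script built around this guard behaves like a daily job that "catches up" the first hour the machine is on.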

Thanks again for this! This was just the sort of exercise that I wanted.

Since I have a long way to go to catch up (so far I have read the first 40 or so comics), I decided to take a different approach than you. Using your code as the basis, I rewrote the script to get and process comics within a specified range (xkcd [start] [end]). This way I can process only a few comics at a time rather than download the whole lot. I also broke the process down into three functions: 'xkcd-web' (to download the webpages), 'xkcd-tag' (to get the image and process the details), and 'xkcd-com' (to put together the finished comic). This saves files at each stage of the process (to be removed later) and lets me process the whole batch at once from one stage to the next. I like how it is turning out.

I wonder which way is more efficient--processing each comic from start to finish or moving the whole batch from one stage to the next. What do you say?

~~~~~

It's funny that you mentioned using 'while' loops instead of variable expansion in a 'for' loop. I have not actually used a (single) 'while' loop yet. Guess I should try!

Thanks for the advice for 'cron' and 'anacron'. I will have to keep this in mind. When I get caught up with the 'xkcd' comics, I will try to implement a scheduled task like you have done.

You can definitely speed it up if you process all the comics the way you do. You could download a bunch of them at the same time in the background that way.

Code:

wget -qb $COMIC1 -O comic1 &
wget -qb $COMIC2 -O comic2 &
wget -qb $COMIC3 -O comic3
...

(I don't know how far you can go with this. I tested it with three and it works fine, but you can probably go higher.)
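One note on that snippet: 'wget -b' already backgrounds itself, so combining it with '&' is redundant; plain 'wget -q ... &' plus 'wait' is the usual shape. A hedged sketch of the same idea (with a stub standing in for the real wget call, so it runs without a network):

```shell
#!/bin/bash
# Sketch: fetch several comics in parallel, then wait for all of them.
# 'fetch' is a stub standing in for: wget -q "$url" -O "$file"
fetch() {
    sleep 0.1                   # simulate network delay
    echo "fetched $1" > "$2"
}

tmpdir=$(mktemp -d)
for i in 1 2 3; do
    fetch "comic$i-url" "$tmpdir/comic$i" &   # each fetch runs in the background
done
wait                            # block until every background fetch finishes

ls "$tmpdir"
```

The 'wait' at the end is the important part: without it, a script could move on to the assembly stage before the downloads are done.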

I'll post my weird cron script out here a bit later... I'm kind of ashamed of how much of a hack it is.

This made it much easier to read and write the details of the html image tag. Previously I had been writing the URL, Title, Alt, etc. to a string separated by colons, and using 'cut -d: -f' to retrieve the fields. I didn't see another neat way around the fact that there is a colon in the URL. Writing the descriptors out line by line works very well.
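The line-per-field layout can be read back with consecutive 'read's, which sidesteps any delimiter clash with the colon in the URL. This is a sketch, with the field order assumed from the sample details file shown below (number, image URL, original filename, name, extension, title, alt):

```shell
#!/bin/bash
# Sketch: read a details file whose fields are stored one per line.
# Field order is assumed from the sample 695.txt; title text shortened here.
details=$(mktemp)
cat > "$details" <<'EOF'
695
http://imgs.xkcd.com/comics/spirit.png
spirit.png
spirit
png
Title text goes here
Spirit
EOF

# Consecutive reads from the same redirection consume one line each;
# the { ...; } group (not a subshell) keeps the variables in scope.
{
    read -r NUMBER
    read -r IMG_URL
    read -r FILE_ORIG
    read -r FILENAME
    read -r EXT
    read -r TITLE
    read -r ALT
} < "$details"

echo "$NUMBER $FILENAME.$EXT"
```

No 'cut -d:' needed, and a field containing a colon (or any other character) passes through untouched.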

IFS=$old_IFS
}
#
#======================================================================
# Body: This is the main body of the program. It first sets the proper
# directories. It then runs the functions: 'get_current_number',
# 'get_range', 'read_tag', 'get_image', and 'assemble_comic'.
#======================================================================
#
 # Print title
echo; echo "[xkcd: Get xkcd Comics]"

[fn: read_tag]
Reading xkcd webpage(s)... Comics: 695 to 695...
>>>
1) [695] Reading WEBPAGE: http://xkcd.com/695/
IMG_TAG: <img src="http://imgs.xkcd.com/comics/spirit.png" title="On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown." alt="Spirit" />
IMG_URL: http://imgs.xkcd.com/comics/spirit.png
FILE_ORIG: spirit.png
FILENAME: spirit
EXT: png
TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
ALT: Spirit
Creating DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt

[fn: assemble_comic]
Assembling comic(s) (image+title+alt)... Comics: 695 to 695...
>>>
1) [695] Reading DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt
Using IMG_FILE: /home/geoffrey/Comics/xkcd/parts/695.png
Using TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
Using ALT: Spirit
Assembling to COMIC_FILE: /home/geoffrey/Comics/xkcd/695--spirit.png
[convert] /home/geoffrey/Comics/xkcd/parts/695.png --> /home/geoffrey/Comics/xkcd/695--spirit.png

Details File: 695.txt

Code:

695
http://imgs.xkcd.com/comics/spirit.png
spirit.png
spirit
png
On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
Spirit

Assembled Comic (695):

I wonder which way is more efficient--processing each comic from start to finish or moving the whole batch from one stage to the next. What do you say?

apethedog wrote:

You can definately speed it up if you process all the comics the way you do. You could download a bunch of them at the same time that way in the background.

Hello again,

Okay, since we have this question before us, we might as well test it out. It makes intuitive sense that a "lateral" processing of batches stage-by-stage would be faster than a "linear" processing file-by-file, but I would like to know for sure. (To indulge the "little professor" within me.)

I rewrote my version of the xkcd download script (the "lateral" stage-by-stage method) to process the comics in a "linear" (file-by-file) fashion. I moved the loops from the individual function "stages" to the main body of the program.

 # Restore original IFS
IFS=$old_IFS
#IFS=' /t/n'
}
#
#----------------------------------------------------------------------
# 5) assemble_comic: Called from main body, this function runs the
# 'convert' program to assemble the image with a title and caption and
# saves it in the main directory ($XKCD) with a number, original
# filename and (possibly altered) extension (.png).
#----------------------------------------------------------------------
#
function assemble_comic
{
 # Signal function start
echo; echo "[fn: assemble_comic]"
echo "Assembling comic(s) (image+title+alt)... Comic: $COMIC_NUMBER... "; echo ">>>"
 # Set the Internal Field Separator
 # (IFS) to new line (\n) only
old_IFS=$IFS
IFS=$'\n'

IFS=$old_IFS
}
#
#======================================================================
# Body: This is the main body of the program. It first sets the proper
# directories. It then runs the functions: 'get_current_number',
# 'get_range', 'read_tag', 'get_image', and 'assemble_comic'.
#======================================================================
#
 # Print title
echo; echo "[xkcd: Get xkcd Comics]"

#!/bin/bash
#=======================================================================#
# NAME: speed_test: 'xkcd--sl' vs 'xkcd--bl'
# DESCRIPTION: runs two programs, 'xkcd--sl' and 'xkcd--bl', one after
#              the other, timing the duration of run-time over five
#              runs, and outputs total times for each batch for each
#              program. This program loops each of the test programs
#              for five batches of five and calculates total times for
#              all five batches.
# BY: G O Free =:)
# FOR: bashscripts.org
# DATE: February 1, 2010
#=======================================================================#
#
 # Display name of program
echo; echo "speed_test: 'xkcd--sl' vs 'xkcd--bl'"; echo

 # Set the format for 'time' output
 # (real time only, in seconds,
 # no letters/labels)
TIMEFORMAT='%3R'

 # Test that a start point was given
START_POINT=${1:?"No start point given: Usage: speed_test [start point]"}

 # Run time test for five batches
for (( RUN = 1; RUN <= 5; RUN++ )); do
    echo "RUN: $RUN"

     # Set range for batch
    # let START=$((RUN-1))*5+$START_POINT
    # END=$START+4
    let START=$((RUN-1))+$START_POINT
    END=$START

     # Display test range
    echo "START: $START"
    echo "END: $END"
    echo

### Results

     # Decide which program had the
     # shorter total run time and
     # display the total time
     # difference, if any
    if [ $(echo "$SL_TOTAL<$BL_TOTAL" | bc) = 1 ]; then
        DIFF=$(echo "$BL_TOTAL-$SL_TOTAL" | bc)
        echo "'xkcd--sl' took $DIFF seconds less time than 'xkcd--bl'"
    elif [ $(echo "$BL_TOTAL<$SL_TOTAL" | bc) = 1 ]; then
        DIFF=$(echo "$SL_TOTAL-$BL_TOTAL" | bc)
        echo "'xkcd--bl' took $DIFF seconds less time than 'xkcd--sl'"
    else
        echo "'xkcd--sl' and 'xkcd--bl' took the same time"
    fi
    echo
done
echo

After 25 comics downloaded (five batches of five), the "big loop" method of processing the comics turned out to be slightly (3.129 seconds) faster than the "small loops" method.

The times are very close. So close that the difference might be due to nothing more than the fact that the "big loop" method has one loop where the "small loops" method has three: two fewer loops to process.

That makes sense...

If there was any time saved in doing all the downloading at once, before assembling the comics, it seems to have been negated by having extra loops to process.
