I've been looking for a refurbished Mac for development recently and found myself leaving the refurbished store open in a background window and constantly checking the page. What a waste of time. Solution: write a web scraper to do the work for me. Below is a super simple scraper I wrote that looks at Apple's refurbished page and scrapes it for a search term (e.g. Mac mini, iMac, etc.). If new computers are listed that match your criteria, it will shoot off an email to you (using a Gmail account) with each listing's description, price, and product URL.

"""Author: Daniel McGraw (@danielmcgraw, danielmcgraw.com, danielmcgraw.tumblr.com)Description: A scraper used to find refurbished Apple computers."""importreimporttimeimporturllibimportsmtplibfromemail.mime.multipartimportMIMEMultipartfromemail.mime.textimportMIMETextfromBeautifulSoupimportBeautifulSoupclassRefurbScraper(object):def__init__(self,fromAddr,fromPass,toAddr,productName,interval):self.fromAddr=fromAddrself.fromPass=fromPassself.toAddr=toAddrself.productName=productNameself.interval=intervalself.activeURLList=[]defgetSource(self,url):page=urllib.urlopen(url)source=page.read()returnsourcedefparseSource(self,source):soup=BeautifulSoup(source)products=soup.findAll(re.compile('^table'))productList=[]forproductinproducts:secondA=product.findAll(re.compile('^a'))[1]productURL='http://store.apple.com'+secondA.attrs[0][1]productName=secondA.contents[0].lstrip().rstrip()span=product.findAll(re.compile('^span'))[0]productPrice=span.contents[0]productList.append((productName,productPrice,productURL))returnproductListdeffilterList(self,productList):newProducts=[]printself.activeURLListtemp=self.activeURLListself.activeURLList=[]str=re.compile('^Refurbished %s'%self.productName)products=filter(lambdaproduct:re.match(str,product[0]),productList)forproductinproducts:ifproduct[2]notintemp:newProducts.append(product)self.activeURLList.append(product[2])printself.activeURLListprintreturnnewProductsdefsendEmail(self,products):ifproducts:msg=MIMEMultipart('alternative')msg['Subject']="Refurbished %s's"%self.productNamemsg['From']=self.fromAddrmsg['To']=self.toAddrbody="Refurbished %s's:\n\n"%self.productNameforproductinproducts:body+="\t%s\n\t%s\n\t%s\n\n"%(product[0],product[1],product[2])body+="This has been an automated email from Daniel McGraw's Apple Referb Scraper.\n"body+="Follow him on twitter(@danielmcgraw), tumblr(danielmcgraw.tumblr.com), or his blog(danielmcgraw.com)."msg.attach(MIMEText(body,'plain'))printmsg# Use gmail to send 
email.smtp=smtplib.SMTP('smtp.gmail.com',587)smtp.ehlo()smtp.starttls()smtp.ehlo()smtp.login(self.fromAddr[:-9],self.fromPass)smtp.sendmail('<Refurbished Mac Scraper>%s'%self.fromAddr,self.toAddr,msg.as_string())smtp.close()defloop(self):whileTrue:source=self.getSource('http://store.apple.com/us/browse/home/specialdeals/mac')productList=self.parseSource(source)products=self.filterList(productList)self.sendEmail(products)time.sleep(float(self.interval))
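If you want to poke at the "is this listing new?" logic without hitting Apple's site or sending mail, here's a minimal sketch of the same idea as a standalone function. The name find_new_products, the sample listings, and their URLs are all made up for illustration; the tuple layout (name, price, URL) follows what parseSource builds.

```python
import re


def find_new_products(product_list, product_name, seen_urls):
    """Return (new_products, active_urls) for one polling pass.

    product_list holds (name, price, url) tuples.
    seen_urls is the list of matching URLs from the previous pass.
    """
    pattern = re.compile('^Refurbished %s' % product_name)
    new_products = []
    active_urls = []
    for product in product_list:
        if not pattern.match(product[0]):
            continue  # not the product we are watching
        if product[2] not in seen_urls:
            new_products.append(product)  # first time this URL shows up
        active_urls.append(product[2])  # remember every current match
    return new_products, active_urls


# Made-up listings, for illustration only.
first_pass = [
    ('Refurbished Mac mini 2.4GHz', '$599.00', 'http://store.apple.com/us/product/A'),
    ('Refurbished iMac 21.5-inch', '$999.00', 'http://store.apple.com/us/product/B'),
]
second_pass = first_pass + [
    ('Refurbished Mac mini 2.66GHz', '$799.00', 'http://store.apple.com/us/product/C'),
]

new, seen = find_new_products(first_pass, 'Mac mini', [])
print(len(new))   # 1
new, seen = find_new_products(second_pass, 'Mac mini', seen)
print(new[0][2])  # http://store.apple.com/us/product/C
```

Every matching URL is remembered on each pass, not just the new ones. That way a listing is reported once, and if it disappears and later comes back it gets reported again.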

Usage is simple. Copy the script above into a file named AppleRefurbScraper.py (or anything else you want for that matter, but that's the name I'll be using). Then start up Python from a prompt in the same directory.

>>> import AppleRefurbScraper
>>> scraper = AppleRefurbScraper.RefurbScraper('Gmail from address', 'Gmail from password', 'To address', 'search term', 'loop delay time in seconds')
>>> scraper.loop()

The search terms are case sensitive. The applicable search terms are:

MacBook
MacBook Air
MacBook Pro
Mac mini
iMac
Mac Pro
Xserve

Also note that the search term is dropped into a regex that is matched against the start of each listing's name, so feel free to use a regex string as the search term.
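To get a feel for how search terms behave, here's a small sketch that applies the same pattern the scraper uses ('^Refurbished %s' with re.match) to a few listing names. The helper name matches and the listing names are made up for illustration; real listing text may differ.

```python
import re


def matches(search_term, listing_name):
    """True if the listing name would pass the scraper's filter."""
    return re.match('^Refurbished %s' % search_term, listing_name) is not None


# Listing names are made up for illustration.
print(matches('Mac mini', 'Refurbished Mac mini 2.4GHz'))        # True
print(matches('mac mini', 'Refurbished Mac mini 2.4GHz'))        # False: case sensitive
print(matches('MacBook', 'Refurbished MacBook Pro 15-inch'))     # True: prefix match
print(matches('MacBook (?!Pro|Air)', 'Refurbished MacBook Pro 15-inch'))  # False
print(matches('(iMac|Mac mini)', 'Refurbished iMac 21.5-inch'))  # True
```

Two things to keep in mind: a plain "MacBook" also picks up MacBook Air and MacBook Pro listings, since only the start of the name is checked, and whatever you pass as the search term shows up verbatim in the email subject, regex syntax and all.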

If you like this project please subscribe to my feed, and follow me on twitter or tumblr and say hi.