I'm having a really hard time getting the BeautifulSoup module to work with Python 2.7.3 in Eclipse/PyDev.

I'm new to Python, so I'm not sure if this could be an installation error on my part. Maybe I don't have Eclipse and PyDev set up correctly? I can upload a screenshot of my Eclipse setup if that would help. Thanks!

Any advice would really be appreciated. I've been spinning my wheels on this for a day now!

Either you're using an old version, or the new code you got has a problem with it. If you're sure you have what you're supposed to, you should contact the BeautifulSoup people, let them know exactly what version you're using and give them the full traceback.
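To find out exactly which version you have, something like the sketch below may help. It is only a sketch: `describe_module` is a made-up helper name, and it assumes the package is importable as either `bs4` (Beautiful Soup 4) or `BeautifulSoup` (Beautiful Soup 3). Run it from inside Eclipse/PyDev and again from a plain command prompt; if the interpreter path or the results differ, PyDev is probably configured with a different Python than the one you installed the module into.

```python
import importlib
import sys


def describe_module(name):
    """Report whether `name` imports under this interpreter, plus its version and location."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError if missing; other errors if the module itself is broken
        return {"name": name, "found": False, "error": repr(exc)}
    return {
        "name": name,
        "found": True,
        # Not every module defines __version__, so fall back to a placeholder.
        "version": getattr(module, "__version__", "unknown"),
        "path": getattr(module, "__file__", "built-in"),
    }


if __name__ == "__main__":
    # Which interpreter is actually running this script?
    print(sys.executable)
    print(sys.version)
    # Try both package names (assumption: one of these is what was installed).
    for candidate in ("bs4", "BeautifulSoup"):
        print(describe_module(candidate))
```

The version and path it prints, together with the full traceback, are the details worth including in a bug report.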

micseydel wrote:Either you're using an old version, or the new code you got has a problem with it. If you're sure you have what you're supposed to, you should contact the BeautifulSoup people, let them know exactly what version you're using and give them the full traceback.

I'll try to install it again. This sounds like a good option too!

Have you done any screen scraping with BeautifulSoup? I just stumbled on Scrapy. I'm not sure if one is better than the other?

igeek wrote:I just stumbled on Scrapy. I'm not sure if one is better than the other?

BeautifulSoup is for parsing html/xml. That's it. Scrapy is a more complete framework. E.g. it gives you a web crawler, allows you to automatically download images, etc. It's a lot more heavy-duty than BeautifulSoup and comes with more batteries. Because of that, it has a steeper learning curve (though not by that much) and may be more difficult to install (especially on Windows, where you'll need to get OpenSSL separately). The two actually serve quite different purposes (and IIRC, you could actually get Scrapy to use BeautifulSoup as its HTML parser).

Which you use depends on what you want to do. If you want to scrape some text from a well-known set of web pages, use BeautifulSoup. If you want to scrape entire websites (potentially with media content), use Scrapy.

@OP: Just this morning, someone on HN posted a link to a short primer on web scraping in Python that you might find interesting. Also, check out the HN comments for a good discussion of the relative merits of the various options available in Python (BeautifulSoup, lxml, mechanize, scrapy, pyquery, etc.).

setrofim wrote:Which you use depends on what you want to do. If you want to scrape some text from a well-known set of web pages, use BeautifulSoup. If you want to scrape entire websites (potentially with media content), use Scrapy.

I need a program to scrape a huge list of parts (inventory ID, price, quantity, description, etc.) from several different e-commerce web sites on different servers.

I need it to update databases on a main web site. This needs to happen every day, and I need to know how much data was imported. The main web site currently runs on LAMP. I don't think I would need to download any images, since I could just link to them via a URL. I need to be able to parse HTML, and the code needs to be fairly easy to maintain, since we will need to tweak it once in a while.
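To make the requirements above concrete, here is a rough sketch of the parse, update, and count steps. Everything in it is an assumption for illustration: the table layout in `SAMPLE_HTML`, the `parts` table and its columns, and the names `PartTableParser` and `import_parts` are all made up. To keep it runnable with no extra installs, it uses the standard library's `html.parser` in place of BeautifulSoup, an in-memory SQLite database in place of the MySQL side of LAMP, and Python 3 rather than 2.7. Fetching the pages and running it daily are left out.

```python
import sqlite3
from html.parser import HTMLParser


class PartTableParser(HTMLParser):
    """Collect the <td> text of every <tr> in a page into self.rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text pieces of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty and are dropped
                self.rows.append(self._row)
            self._row = None


def import_parts(conn, html):
    """Insert or update the parts found in `html`; return how many rows were imported."""
    parser = PartTableParser()
    parser.feed(html)
    parser.close()
    imported = 0
    for row in parser.rows:
        if len(row) != 4:
            continue  # skip rows that do not match the assumed four-column layout
        part_id, price, quantity, description = row
        conn.execute(
            "INSERT OR REPLACE INTO parts (part_id, price, quantity, description) "
            "VALUES (?, ?, ?, ?)",
            (part_id, float(price), int(quantity), description),
        )
        imported += 1
    conn.commit()
    return imported


# Hypothetical page content standing in for one supplier's parts listing.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Price</th><th>Qty</th><th>Description</th></tr>
  <tr><td>A-100</td><td>9.99</td><td>12</td><td>Widget</td></tr>
  <tr><td>B-200</td><td>24.50</td><td>3</td><td>Gadget</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (part_id TEXT PRIMARY KEY, price REAL, "
    "quantity INTEGER, description TEXT)"
)
count = import_parts(conn, SAMPLE_HTML)
print("imported", count, "parts")
```

Keeping one small parser per supplier site and a single shared import function is one way to keep the per-site tweaks isolated; the returned count is what you would log each day.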
