cloud web scraping tool

Category Archives: Projects

I recently saw an event at a university in Romania (Universitatea Politehnica Bucuresti) that aims to help students choose a subject for their degree thesis. At this event, companies are invited to present themes to the students. Below is a short list of themes related to our industry:

1. Automatic website classification

Possible categories: e-commerce, company website, news/blog, other.

2. Detecting website structure (and representing it as a tree)

E.g. the first level of an online store contains the main categories, the second level the subcategories, and the n-th level the product pages. The entire website can be represented as a tree.

3. Logo detection on the internet

When detecting logos on a web page, several issues can occur, for example many logos in the same image, or scaled logos.
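
To make theme 2 a little more concrete, here is a minimal sketch of one possible way to represent a store as a tree, assuming the structure can be read from the URL paths. That assumption does not hold for every site, and the URLs, the function names `build_site_tree` and `print_tree`, and the nested-dict layout are all made up for illustration; this is not our research method.

```python
from urllib.parse import urlparse


def build_site_tree(urls):
    """Build a nested dict in which each key is one URL path segment."""
    tree = {}
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        node = tree
        for segment in segments:
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(segment, {})
    return tree


def print_tree(node, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for name in sorted(node):
        print("  " * depth + name)
        print_tree(node[name], depth + 1)


# Hypothetical URLs from an imaginary online store.
urls = [
    "https://shop.example/electronics/phones/phone-a",
    "https://shop.example/electronics/phones/phone-b",
    "https://shop.example/electronics/laptops/laptop-a",
    "https://shop.example/books/novels/novel-a",
]

site_tree = build_site_tree(urls)
print_tree(site_tree)
```

Here the first level holds the main categories, the second the subcategories, and the leaves are product pages. A real website would need something smarter than URL paths, for example following the navigation links.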

Please let us know if you want to develop one of the above themes, and we will help you with the results of our research.

I have always thought that companies have needs that differ from those of end users (see classification by target, B2C or B2B), and I think this hypothesis also holds on the internet. Lately I have been busy developing TheWebMiner Filter, and in the following lines I want to talk about internet search.

What is internet searching?

What I understand by search (and maybe many of you do too) is sorting. Google, Bing and other search engines try hard to find the most relevant page for our query, and the results are impressive. A colleague of mine told me that if you describe a movie's plot in a Google search, Google will find the movie's Wikipedia page. But this is the end user's point of view.

From the very beginning, one of the core services TheWebMiner provided was aggregated data and insight into the mobile app landscape. We managed to offer our clients custom aggregated data for all major mobile app marketplaces (iOS AppStore, Google Play, Amazon AppStore etc.) as well as primary analysis on the extracted data.

It’s been a while since our last post and lots of things have happened in The Web Miner secret workshop, so be ready: we are about to launch a tool that no serious marketing department should miss.

The problem we are trying to solve is fast access to the information needed to make pricing decisions across an entire online store. As the saying in entrepreneurship goes, if you have heard of it, it’s already too late to start a business with it; the same applies to those who set the pricing policy of any store. They need to stay in constant contact with the market and be aware that even a slight change in their prices can send sales up or down at an astonishing rate because, in the end, if we set aside how much users trust certain sites and extras such as shipping, all online shops are the same to the final consumer.

How was this solved in the past? Simple: it was not! Mainly because people were not aware of the advantages that a quick response to changes can give a shop. Also, before 2010 the competition between online businesses was not that high, so this didn’t become a problem until rather recently.

A closing thought: the market will never stop evolving and ecommerce shops haven’t reached their peak yet, so new rules keep being added to this industry and it is better to be aware of them. After all, better safe than sorry!

If you have ever tried to use a tool to compare prices between two or more products from different online shops, you have definitely run into the limitations imposed by such platforms. Usually these apps work by periodically crawling a number of sites and updating a product/price table. This is not very useful for users who want to choose products from shops in different geographical areas or from smaller sites that the app hasn’t crawled.

Now, maybe it’s a bit early to talk about the capabilities of this next product, but it is shaping up nicely and can be a real help for end consumers around the world and even for market analysts. The best part is that it’s completely free, with no ads either. PriceAlert aims to be a new solution for tracking prices across various sites. So far that is nothing amazing, but the technology behind it allows users to compare prices on any ecommerce platform in the world: unlike similar platforms, which are limited to a number of predefined shops, it uses an algorithm that automatically extracts data such as the product’s specifications and, of course, its price and availability.

For now only a beta version is available, but new features are scheduled to arrive very soon. Useful features such as an email alert when a certain price has changed, or statistics on price changes over a period of time, will be available, not to mention the ability to export the gathered data into useful formats such as Excel or CSV.
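
As a rough illustration of the idea behind a price alert, here is a minimal sketch that compares two price snapshots and reports what changed. It is not how PriceAlert is implemented; the function name `detect_price_changes`, the snapshot layout (a dict from product name to price) and the sample data are assumptions made up for this example.

```python
from decimal import Decimal


def detect_price_changes(previous, current):
    """Return {product: (old_price, new_price)} for prices that changed.

    Products that appear in only one snapshot are ignored.
    """
    changes = {}
    for product, new_price in current.items():
        old_price = previous.get(product)
        if old_price is not None and old_price != new_price:
            changes[product] = (old_price, new_price)
    return changes


# Hypothetical snapshots taken on two different days.
previous = {
    "phone-a": Decimal("199.99"),
    "laptop-a": Decimal("899.00"),
}
current = {
    "phone-a": Decimal("189.99"),   # price dropped
    "laptop-a": Decimal("899.00"),  # unchanged
    "novel-a": Decimal("9.50"),     # new product, no previous price
}

changes = detect_price_changes(previous, current)
for product, (old, new) in changes.items():
    print(f"{product}: {old} -> {new}")
```

Prices are stored as `Decimal` rather than `float` so that equal prices compare as equal. An email alert would then simply be triggered for every entry in `changes`.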

So we think there’s no reason for you not to check out this interesting new toy and maybe leave a review if you feel like it. You can find it at //thewebminer.com/pricealert/ and we hope that together we can bring one more interesting tool to the people who need it.

At TheWebMiner we often need to process large text files, and when I say large I mean files of a few hundred megabytes or more. Of all the text processing tools we have tested so far, we concluded that the best is Vim, or gVim (the graphical version of this famous editor, which also runs on Windows).

Regular Expressions

Another useful tool in file processing is regular expressions, or more simply RegEx. These expressions help us find, or find and replace, pieces of text of a certain format, all done automatically. Combining the two tools raises a new problem.

How do we use Regular Expressions in Vim?

Vim has its own RegEx format, so we cannot use standard regular expressions in Vim directly, but we have created and made available a converter for this purpose. You can find the converter on our site (www.thewebminer.com/regex-to-vim), and we hope you will find it helpful.
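
To show what kind of difference is involved, here is a minimal sketch of such a conversion. It is not the code behind our converter: the function name `regex_to_vim` is made up for this example and it covers only a small subset of the syntax. It relies on the fact that in Vim's default "magic" mode the characters `+ ? | ( ) {` must be preceded by a backslash to act as operators, `~` must be escaped to be literal, and a non-capturing group is written `\%( \)`. Lazy quantifiers, lookarounds, `\b` and unusual character classes are not handled.

```python
# Operators that work unescaped in common regex syntax but need a
# backslash in Vim's default "magic" mode.
NEEDS_BACKSLASH_IN_VIM = set("+?|(){")


def regex_to_vim(pattern):
    """Convert a small subset of common regex syntax to Vim 'magic' syntax."""
    out = []
    i = 0
    in_class = False  # True while inside a [...] character class
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_class and nxt in "+?|(){}":
                # An escaped literal such as \+ becomes a bare + in Vim.
                out.append(nxt)
            else:
                # Other escapes (\d, \w, \s, \\, ...) are copied unchanged.
                out.append(ch + nxt)
            i += 2
            continue
        if in_class:
            # Characters inside a class are copied unchanged.
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif pattern.startswith("(?:", i):
            # Non-capturing group.
            out.append("\\%(")
            i += 3
            continue
        elif ch in NEEDS_BACKSLASH_IN_VIM or ch == "~":
            out.append("\\" + ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(regex_to_vim(r"(foo|bar)+"))
print(regex_to_vim(r"\d{2,4}"))
```

An alternative is to start the Vim pattern with `\v` ("very magic"), which makes most operators work without a backslash, although a few more characters such as `<`, `>` and `=` then become special.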

Good day everyone, or should I rather say bonjour, because this week we launched the French version of TheWebMiner.com.

It is certain that the need for data increases every day in every possible direction, and we want to keep up with this trend. Although English is the language of the internet, we also want to reach users from smaller markets who might need our services, and because French is the official language in 29 countries it seemed an obvious choice.

So, from now on, along with the English version and the Romanian one (Romania being our company’s home country), a third version is available in the language menu in the upper right corner of our site.

We hope you will enjoy the experience and give us good feedback on our expansion.

We often need to process big text files (larger than 100 MB), and we discovered that the best text editor for this is Vim, or gVim (its graphical version, which also runs on Windows). A powerful way to process text automatically is to use regular expressions (also called RegEx).

Using RegEx in Vim

Vim doesn’t support standard RegEx syntax, but we built a tool that converts standard RegEx to Vim RegEx. This tool is available here: RegEx to Vim.

We tested 4 RSS to Kindle delivery services, and all of them are unintuitive, paid services. This is why we want to build a better service for this need. Another problem is that most RSS feeds publish only an excerpt of the article, not the entire article. For this reason we created a tool that you can use to extract the main article of a page.

We hope to soon announce a complete service that delivers valuable content to Kindle or another eBook reader.