I installed MediaWiki on an EC2 machine and loaded the Wikipedia pages-articles dump and other relevant data into it. I want to update the data regularly (daily), but I couldn't find any resource for that. Is there a source from which I can get daily Wikipedia updates?

Note: I can get the list of modified files using bots, but I also need the modified content.

3 Answers

1. Just download the normal dump whenever it becomes available. This means you won't get daily updates at all. For the English Wikipedia, a new dump is generated about once a month.

2. Use the adds/changes dumps (already mentioned by ojdo). There are two problems with using this:

   - It doesn't include all changes. Specifically, information about moves, deletes and some undeletes is not included.

   - It's an experimental feature and I wouldn't be surprised if it were discontinued in the near future (because a better option is coming, see below).

3. Use the API (possibly combined with the IRC recent changes feed) to get the text of all new revisions to a wiki. I think this might work for very small wikis, but it's certainly not feasible for huge active wikis like the English Wikipedia.

4. Use binary incremental dumps (which is a project I built over this summer). This will do exactly what you want: it will allow you to download only changes since the last dump and it should allow creating dumps much more often (the hope is for daily dumps). The only problem is that this is not live yet, so you will have to wait before using this (I have no idea how long, but I would expect it to go live this year).
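
For the API option, a rough sketch of what polling could look like is shown below. It lists recent edits with `list=recentchanges` and then fetches the wikitext of those revisions with `prop=revisions`. The endpoint URL, the User-Agent string, the start timestamp and all helper function names are assumptions made for illustration; continuation, rate limiting and error handling are left out, so treat it as a starting point for a small wiki rather than a working sync tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the wiki to mirror; replace with the api.php of your source wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


def recent_changes_params(since, limit=50):
    """Parameters listing edits and new pages made at or after `since` (ISO 8601)."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rctype": "edit|new",
        "rcdir": "newer",      # enumerate from `since` forward in time
        "rcstart": since,
        "rclimit": limit,
        "format": "json",
        "formatversion": 2,
    }


def revision_params(rev_ids):
    """Parameters fetching the wikitext of the given revision IDs."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }


def extract_rev_ids(response):
    """Pull revision IDs out of a recentchanges response."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [rc["revid"] for rc in changes]


def extract_texts(response):
    """Map page title to wikitext from a revisions response.

    If a page has several revisions in the response, the last one listed wins.
    """
    texts = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            texts[page["title"]] = rev["slots"]["main"]["content"]
    return texts


def call_api(params):
    """Send one GET request to the API and decode the JSON reply."""
    # Hypothetical User-Agent; use one that identifies your own tool.
    headers = {"User-Agent": "wiki-sync-sketch/0.1 (example)"}
    req = Request(API_URL + "?" + urlencode(params), headers=headers)
    with urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    changes = call_api(recent_changes_params("2013-09-30T00:00:00Z"))
    rev_ids = extract_rev_ids(changes)
    if rev_ids:
        for title, text in extract_texts(call_api(revision_params(rev_ids))).items():
            print(title, len(text))
```

Each request here covers only one batch of changes, so a real poller would have to follow the API's continuation values and run far more often on a busy wiki, which is why this approach does not scale to the English Wikipedia.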

It's strange that Wikimedia hasn't yet come up with an easy way to "track" changes. OpenStreetMap (diffs) seems to be in much better shape in comparison.
– ojdo Sep 30 '13 at 11:17

Right now only the second solution helps in my project (not completely, but something is better than nothing). DBpedia is getting live updates from Wikipedia. I don't know how much effort it would take to make that live feed public to everyone.
– vinod Oct 1 '13 at 5:15

@vinod I don't think DBpedia does that. For example, when I look at dbpedia-owl:wikiPageRevisionID of Albert Einstein, it points to a revision from April.
– svick Oct 1 '13 at 8:05

@svick There are two versions of DBpedia: dbpedia and dbpedia-live. I was referring to dbpedia-live. Try the query SELECT * WHERE { <http://en.wikipedia.org/wiki/Albert_Einstein> ?p ?o } at "live.dbpedia.org/sparql". (DBpedia live data was last updated on Aug 29th, 2013. It is down for maintenance and should be up and running again very soon.)
– vinod Oct 1 '13 at 8:47

@vinod It seems you're right. It looks like that uses an update feed service, which is not public.
– svick Oct 1 '13 at 8:53

@Andra I tried that already, but the error rate is higher than we expected, and they stopped providing updates as of Aug 29th (they told me that they would start providing them again ASAP).
– vinod Oct 14 '13 at 9:52