European Fishing

I am playing around with Eurostat data and ggplot2 a bit more. As I progress, the plotting seems to get easier and the data pre-processing a bit simpler, while the surprises in the data remain.

Eurostat data

The data used are fish_fleet (number of ships) and fish_pr (production = catch + aquaculture). After selecting years, 1992 and later, I decided to pull the data not as xls but as csv with the number format ‘1 234.56’. The consequence is that the data now come in tall and skinny form, which may actually be better. However, the number format and the ‘:’ used for missing values still require a bit of processing.
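As an illustration of that bit of processing, here is a minimal sketch. It assumes the values arrive as character strings; the helper name parse_eurostat and the example values are made up for this sketch and are not part of the original code.

```r
# Hypothetical helper (an assumption, not the post's own code): convert
# Eurostat-style strings such as "1 234.56" to numeric, with ":" as missing.
parse_eurostat <- function(x) {
  x <- trimws(as.character(x))
  x[x == ":"] <- NA                 # ':' marks a missing value
  x <- gsub("[[:space:]]", "", x)   # drop the space used as thousands separator
  as.numeric(x)
}

# Made-up example values
raw <- c("1 234.56", ":", "17", "12 345 678")
parsed <- parse_eurostat(raw)
print(parsed)
```

Applied to a column such as fleet$Number, this should give a numeric variable with NA where Eurostat had no value, so that complete.cases() can drop those rows later.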

Plot about the fleet

The only preparation needed was to select Tonnage as the property and to keep only countries. The aggregates EFTA, EEA and EU have a number such as 15, 25 or 27 in their names.

f2 <- fleet[grep('Tonnage',as.character(fleet$VESSIZE)) ,]
# drop the aggregates; !grepl() is also safe when nothing matches
f2 <- f2[!grepl('15|25|27',f2$GEO),]
f2 <- f2[complete.cases(f2),]
f2$VESSIZE <- factor(f2$VESSIZE)
f2$GEO <- factor(f2$GEO)
# order levels of VESSIZE by value for a nice display
lev <- gsub('(-|\\+).*','',levels(f2$VESSIZE))
nlev <- as.numeric(gsub('^[[:alpha:]]* ','',lev))
f2$VESSIZE <- factor(as.character(f2$VESSIZE),
    levels= levels(f2$VESSIZE)[order(nlev,lev)])
levels(f2$VESSIZE) <- gsub('Tonnage ','',levels(f2$VESSIZE))
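To see what the level ordering does, the same steps can be run on a handful of toy labels. The labels below are assumptions made up for this sketch; the real Eurostat labels may look different.

```r
# Made-up size-class labels (assumed format, not copied from Eurostat)
vessize <- factor(c("Tonnage 100-499", "Tonnage 2000+",
                    "Tonnage Total all Classes",
                    "Tonnage 0-24", "Tonnage 25-99"))

# keep everything before the first '-' or '+'
lev <- gsub("(-|\\+).*", "", levels(vessize))
# strip the leading word; the 'Total' label is not numeric and becomes NA
nlev <- suppressWarnings(as.numeric(gsub("^[[:alpha:]]* ", "", lev)))

# order() puts NA last, so the total ends up after the numeric classes
vessize <- factor(as.character(vessize),
                  levels = levels(vessize)[order(nlev, lev)])
levels(vessize) <- gsub("Tonnage ", "", levels(vessize))
print(levels(vessize))
```

Without this step the levels would be sorted as text, which puts 100-499 before 25-99 in the legend.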

First aim is a dotplot of the last year (2010), with countries ordered by size of fleet.

library(ggplot2)
f3 <- f2[f2$TIME==2010 ,]
f4 <- f3[f3$VESSIZE=='Total all Classes',]
f3$GEO <- factor(as.character(f3$GEO),
    levels=as.character(f4$GEO[order(f4$Number)]))
ggplot(f3,
        aes(y=GEO,x=Number,colour=VESSIZE)) +
    geom_point() +
    labs(colour='Tonnage')

It seems Greece had the largest fleet. Any notion I had of the Netherlands as a fishing country has been erased.

For a time-related plot I chose to put the number of vessels on a logarithmic scale. As the number of countries is a bit large, only the countries with the biggest fleets have been selected.

mfleet <- aggregate(f4$Number,list(GEO=f4$GEO),max)
bigfleet <- mfleet$GEO[mfleet$x>quantile(mfleet$x,1-9/nrow(mfleet))]
ggplot(f2[f2$GEO %in% bigfleet & f2$VESSIZE!='Total all Classes' ,],
        aes(x=TIME,y=Number,colour=VESSIZE)) +
    geom_line() +
    facet_wrap( ~ GEO, drop=TRUE) +
    scale_y_log10() +
    labs(colour='Tonnage')
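The quantile trick used to pick the biggest fleets may not be obvious at first sight, so here is a small check on toy data. The twelve countries and their fleet sizes are made up; only the selection expression mirrors the code above.

```r
# Made-up fleet sizes for 12 hypothetical countries
mfleet <- data.frame(GEO = LETTERS[1:12],
                     x = c(50, 1200, 300, 18000, 75, 9000,
                           640, 2300, 410, 15, 7000, 100),
                     stringsAsFactors = FALSE)

n_keep <- 9
# cutoff at the (1 - 9/12) quantile; everything above it is kept
cutoff <- quantile(mfleet$x, 1 - n_keep / nrow(mfleet))
bigfleet <- mfleet$GEO[mfleet$x > cutoff]
print(bigfleet)
```

In this toy data the three smallest fleets fall below the cutoff and nine countries remain. With ties in the data the count could differ, in which case ordering the fleets and taking the first nine would be a more direct alternative.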

The interesting thing about this plot is that the number of vessels is decreasing. The one exception is the biggest category, more than 2000 tonnage. There are only a few tens of those, but each must count for loads of smaller vessels.

Catch

Fish caught is probably the same story. In this case, SPECIES and GEO have far too many levels for a decent display, so only the biggest catches are shown. On top of that, three SPECIES categories are almost the same. These are ‘Total’, ‘Aquatic animals’ and ‘Finfish and invertebrates’.

Finfish probably needs an explanation. To quote Wikipedia: “Many types of aquatic animals commonly referred to as ‘fish’ are not fish in the sense given above; examples include shellfish, cuttlefish, starfish, crayfish and jellyfish. In earlier times, even biologists did not make a distinction – sixteenth century natural historians classified also seals, whales, amphibians, crocodiles, even hippopotamuses, as well as a host of aquatic invertebrates, as fish.[15] However, according to the definition above, all mammals, including cetaceans like whales and dolphins, are not fish. In some contexts, especially in aquaculture, the true fish are referred to as finfish (or fin fish) to distinguish them from these other animals.”

c2010 <- catch[catch$TIME==2010,]
c2010 <- c2010[complete.cases(c2010),]
# keep the countries with a catch above the median
mcatch <- aggregate(c2010$Tonnes,list(GEO=c2010$GEO),max)
# compare against the catch quantile (was mfleet$x, the fleet sizes)
bigcatch <- mcatch$GEO[mcatch$x>quantile(mcatch$x,.5)]
c2010 <- c2010[c2010$GEO %in% bigcatch,]
# keep the species with a catch above the median
mcatch <- aggregate(c2010$Tonnes,list(SPECIES=c2010$SPECIES),max)
bigcatch <- mcatch$SPECIES[mcatch$x>quantile(mcatch$x,.5)]
# drop the two categories that nearly duplicate 'Total'
bigcatch <- bigcatch[!(bigcatch %in%
    c('Aquatic animals','Finfish and invertebrates'))]
c2010 <- c2010[c2010$SPECIES %in% bigcatch,]
c2010$SPECIES <- factor(c2010$SPECIES)
ggplot(c2010,
        aes(y=GEO,x=Tonnes,colour=SPECIES)) +
    geom_point() +
    labs(colour='Tonnes live weight')

The surprise here is Denmark. It is getting loads of fish. The same is true for the UK and Spain.

Combination of fleet and catch

Since we have both data sets, they can be combined. The merging ids are GEO and TIME, which means the data have to be transposed from tall to wide beforehand. The newly created variables have ‘Number’ and ‘Tonnes’ in their names, which I do not need.

I like very much how ggplot2 defaulted TIME as colour variable. It shows very nicely how catches and fleets are getting smaller. The latter is obviously not true for the biggest ships, as seen above. It also shows that Denmark and Iceland have remarkably efficient fleets: small, but catching loads of fish. In contrast, Greece has a big fleet but a small catch. That does not seem economical, but tonnes do not equal euros. Regarding the UK and Spain: yes, the Spanish are just a bit bigger than the UK, so that pain may exist.
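The transposing and merging can be sketched as follows. This is one possible way to do it with base R's reshape() and merge(), not necessarily the route actually taken here. The data frames fleet_tall and catch_tall and all their values are made up for the illustration.

```r
# Toy stand-ins for the tall fleet and catch data (all values are made up)
fleet_tall <- data.frame(
  GEO = rep(c("Denmark", "Greece"), each = 4),
  TIME = rep(rep(c(2009, 2010), each = 2), times = 2),
  VESSIZE = rep(c("Total all Classes", "2000+"), times = 4),
  Number = c(2900, 10, 2800, 11, 17500, 2, 17200, 2),
  stringsAsFactors = FALSE)
catch_tall <- data.frame(
  GEO = rep(c("Denmark", "Greece"), each = 2),
  TIME = rep(c(2009, 2010), times = 2),
  SPECIES = "Total",
  Tonnes = c(810000, 860000, 205000, 190000),
  stringsAsFactors = FALSE)

# One row per GEO/TIME, one column per VESSIZE or SPECIES value;
# reshape() names the new columns 'Number.<VESSIZE>' and 'Tonnes.<SPECIES>'
fleet_wide <- reshape(fleet_tall, idvar = c("GEO", "TIME"),
                      timevar = "VESSIZE", direction = "wide")
catch_wide <- reshape(catch_tall, idvar = c("GEO", "TIME"),
                      timevar = "SPECIES", direction = "wide")

# Drop the 'Number.' and 'Tonnes.' prefixes from the new column names
names(fleet_wide) <- sub("^Number\\.", "", names(fleet_wide))
names(catch_wide) <- sub("^Tonnes\\.", "", names(catch_wide))

# Merge on the shared ids
both <- merge(fleet_wide, catch_wide, by = c("GEO", "TIME"))
print(both)
```

The result has one row per country and year, with fleet size and catch side by side, which is the shape needed to plot one against the other.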

Catch per species

Finally, I wanted to look per species. However, this would be a bit too long for this blog, so I only show one. It runs in a function, which just takes a bit of string from the SPECIES variable. To keep the plot simple, only the six countries with the largest median catch are taken. facet_wrap does two things here. It puts a title on the plot even if there is only one species, and it makes separate panels if more than one value of SPECIES matches the string.

byspecies <- function(species) {
    ca <- catch[grep(species,catch$SPECIES,ignore.case=TRUE),c(-4,-5,-6)]
    ca <- ca[complete.cases(ca),]
    ca <- ca[!(ca$GEO %in% c('EFTA','EU (15)','EU (27)')),]
    ag <- aggregate(ca$Tonnes,list(GEO=ca$GEO),median)
    ag <- ag[order(-ag$x),]
    ca <- ca[ca$GEO %in% ag$GEO[1:6],]
    ca$GEO <- factor(ca$GEO)
    ggplot(ca , aes(y=Tonnes,x=TIME,colour=GEO)) +
        geom_line() +
        facet_wrap( ~ SPECIES, drop=TRUE)
}

byspecies('octop')

To leave a comment for the author, please follow the link and comment on his blog: Wiekvoet.