If we look at the US page, the information is in tables, so it should be easy to extract. For instance,

Even if the war lasted only one day, we will say that the US were at war in 1811. The statement we want to check can be “there were 21 full years, from Jan 1st till Dec 31st, during which the US were not at war even once”. So, from the row above, we can claim that the US were at war in 1811. Most of the time, we have

That is, there is a beginning (here 1775) and an end (1783). So here, the US are said to be at war in 1775, 1776, 1777, 1778, 1779, 1780, 1781, 1782 and 1783. To extract the information, we use a regular expression to look for 4-digit numbers in the first column.

Well, sometimes it can be a bit tricky, since we have 3 dates, 1941, 1945 and (in the legend) 1944. But if we consider the minimal and the maximal dates, we have our range of dates.
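The extraction and the min/max rule described above can be sketched in base R. This is only a minimal illustration: the strings in `cells` are made up from the years mentioned in the text (they are not the exact content of the table), and the names `cells`, `found` and `years` are hypothetical.

```r
# Illustrative cell contents, built from the years discussed in the text
# (assumption: these are NOT the exact strings of the table)
cells <- c("1811", "1775-1783", "1941-1945 (legend: 1944)")

# gregexpr() locates every run of 4 digits, regmatches() returns the matches
found <- regmatches(cells, gregexpr("[0-9]{4}", cells))

# For each cell, keep every year from the earliest to the latest one found
years <- lapply(found, function(y) {
  y <- as.numeric(y)
  if (length(y) == 0) return(numeric(0))  # no 4-digit number in the cell
  seq(min(y), max(y))
})

years[[2]]  # the nine years from 1775 to 1783
years[[3]]  # 1944 lies between 1941 and 1945, so the range is unchanged
```

Taking the minimum and the maximum means that a third date inside the range, such as 1944 here, has no effect on the result.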

Now that we know how to extract the information, let’s do it. The code will be
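library(stringr)
# Extract the years found in each element of x and expand them into a range
ext_date=function(x){
  # a year is a run of 4 digits
  dates12="[0-9]{4}"
  L=str_extract_all(as.character(x),dates12)
  return_L=list()
  # braces are needed so that both tests are run for every element of L
  for(j in seq_along(L)){
    years=as.numeric(L[[j]])
    # a single year is kept as it is
    if(length(years)==1) return_L[[j]]=years
    # with two years or more, keep every year from the earliest to the latest
    if(length(years)>=2) return_L[[j]]=seq(min(years),max(years))
  }
  return(return_L)}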
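
As a usage sketch, here is how the output of `ext_date()` could be turned into a set of years at war, and then into years without war. The block repeats a compact variant of the function so that it runs on its own (it needs the stringr package, as above). The vector `col1` is illustrative, built from the years quoted in the text, and the window 1775 to 1815 is an arbitrary toy choice, so the result says nothing about the actual count.

```r
library(stringr)

# Compact variant of ext_date(), repeated so that the example is self-contained
ext_date <- function(x) {
  lapply(str_extract_all(as.character(x), "[0-9]{4}"), function(y) {
    y <- as.numeric(y)
    if (length(y) == 0) return(NULL)  # no year found in this cell
    seq(min(y), max(y))
  })
}

# Illustrative first-column cells (assumption: not the real table content)
col1 <- c("1775-1783", "1811", "1941-1945 (1944)")

# All the years with at least one war among those rows
war_years <- sort(unique(unlist(ext_date(col1))))

# Years of a toy window in which none of those rows report a war
peace_years <- setdiff(1775:1815, war_years)
```

Using `unique()` matters once several rows overlap, since a year with two wars should be counted only once.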
