H&R Block "MyBlock" App + USA Government Website Analytics = PROFIT

I like data mining. For better or worse, data is the gold of the digital age. So
when the US government decided to make the analytics data for its public-facing
websites available for download, I jumped at the opportunity. Thanks to this
lovely data source, I can get insights into how popular various browsers and
operating systems are, how frequently devices connect to US government websites
from foreign IP addresses, and more.

Sadly, the website only offers metrics for the past 30 days. Luckily, it's
pretty easy to set up a Raspberry Pi or other small device to periodically fetch
the freshest numbers and build a larger dataset. This is what I've been doing
since August of 2016. If you're interested, send me an email and I'll be happy
to share. After all, according to the government's website: "this website and
its data are free for you to use without restriction."
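Building the larger dataset mostly comes down to merging overlapping 30-day snapshots without double counting. Here is a minimal sketch of that merge step, assuming each snapshot is CSV text with rows of the form date,user-agent,count (the layout the script further down expects). The function name merge_snapshot and the sample counts are made up for illustration, and the download step is deliberately left out:

```python
import csv
import io


def merge_snapshot(dataset, snapshot_csv):
    """Merge one downloaded snapshot into the running dataset.

    dataset maps (date, user_agent) -> count. A row seen again in a later
    snapshot overwrites the earlier value, so overlapping 30-day windows
    are not counted twice.
    """
    for row in csv.reader(io.StringIO(snapshot_csv)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        date, user_agent, count = row
        try:
            dataset[(date, user_agent)] = int(count)
        except ValueError:
            continue  # skip a header row or a non-numeric count
    return dataset


# Two overlapping snapshots with made-up counts.
UA = "HRB-MOBILE-IOS-PHONE-MYBLOCK-6.0.0-Mozilla"
first = "2018-01-07,%s,10\n" % UA
second = "2018-01-07,%s,10\n2018-01-08,%s,12\n" % (UA, UA)

dataset = {}
merge_snapshot(dataset, first)
merge_snapshot(dataset, second)
print(len(dataset))  # number of distinct (date, user-agent) rows kept
```

Keying on the date and user-agent pair means the job can run as often as you like; re-fetching a day that is already stored simply rewrites the same entry.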

Continuing my story, I was skimming over the most recent metrics when I noticed
a funny browser user-agent:

HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla

With a quick search, I figured out that MyBlock is a mobile app offered by H&R
Block. More interesting, though, is the juicy information H&R Block decided to
embed in these user-agent strings. As we can see, they contain the name of the
app, the version number, the OS (iOS or Android), the device form factor (phone
or tablet), and in the case of iOS, they even mention whether TouchID or FaceID
was used. As a security researcher, I'm particularly
interested in this last tidbit because people use H&R Block to file taxes and
these user-agents started appearing January 7, 2018 (i.e., tax season). So how
many people use the various authentication methods offered by Apple to protect
their tax filing app? Let's find out!

The following is a small Python script I wrote to filter the data. The parsing
and filtering leave much to be desired, but I didn't want to spend too much time
on such a simple task:

#!/usr/bin/env python3

import sys
import json


def parse_subtokens(tokens):
    """
    Parses subtokens and returns a dictionary. If invalid, None is returned.

    We expect Android user agents to be in the form of:

        HRB MOBILE ANDROID [PHONE|TABLET] MYBLOCK [VERSION] <BROWSER>

    and iOS user agents to be in the form of:

        HRB MOBILE IOS [PHONE|TABLET] MYBLOCK <TOUCHID|FACEID> [VERSION] [BROWSER]
    """
    res = {}

    # Every valid user agent has at least six fields; bail out early so the
    # index lookups below cannot raise IndexError.
    if len(tokens) < 6:
        return None

    if tokens[0] != 'HRB':
        return None

    if tokens[1] != 'MOBILE':
        return None

    if tokens[2] != 'ANDROID' and tokens[2] != 'IOS':
        return None
    res['OS'] = tokens[2]

    if tokens[3] != 'PHONE' and tokens[3] != 'TABLET':
        return None
    res['DEVICE'] = tokens[3]

    if tokens[4] != 'MYBLOCK':
        return None
    res['APP'] = tokens[4]

    if tokens[2] == 'ANDROID':
        if len(tokens[5:]) == 1:
            res['BROWSER'] = 'N/A'
            res['VERSION'] = tokens[-1]
            res['AUTH'] = 'N/A'
        elif len(tokens[5:]) == 2:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = 'N/A'
        else:
            return None

    if tokens[2] == 'IOS':
        if len(tokens[5:]) == 2:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = 'N/A'
        elif len(tokens[5:]) == 3:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = tokens[-3]
        else:
            return None

    # Cleanups:
    # 1) Some versions of the Android app prefix 'v' onto version
    if res['VERSION'].startswith('v'):
        res['VERSION'] = res['VERSION'][1:]

    assert len(res) == 6
    return res


def is_hrb(year, keyword, line):
    """
    Validate that a line should be parsed and added to the buckets.
    Specifically, entry should contain the right year, be a HRB user-agent,
    and contain the filter keyword if one was provided.
    """
    # Lines look like: YYYY-MM-DD,<user-agent>,<count>
    if line[:4] != year:
        return False
    if line[11:14] != 'HRB':
        return False
    if keyword is not None and keyword not in line:
        return False
    return True


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print('Usage:', sys.argv[0], '<tax-year>', '[filter]', '<filepath>')
        sys.exit(0)

    # The filter keyword is optional; the file path is always the last argument.
    if len(sys.argv) == 3:
        keyword = None
    else:
        keyword = sys.argv[2]

    with open(sys.argv[-1], 'r') as ifile:
        data = [line.strip() for line in ifile
                if is_hrb(sys.argv[1], keyword, line)]

    buckets = {
        'OS': {'IOS': 0, 'ANDROID': 0},
        'DEVICE': {'PHONE': 0, 'TABLET': 0},
        'APP': {'MYBLOCK': 0},
        'AUTH': {'TOUCHID': 0, 'FACEID': 0, 'N/A': 0},
        'VERSION': {},
        'BROWSER': {},
    }

    for line in data:
        tokens = line.split(',')
        if len(tokens) != 3:
            print('WARNING: Cannot tokenize:', line)
            continue

        subtokens = parse_subtokens(tokens[1].split('-'))
        if subtokens is None:
            print('WARNING: Cannot subtokenize:', tokens[1].split('-'))
            continue

        try:
            count = int(tokens[-1])
        except ValueError:
            print('WARNING: Could not parse count from:', line)
            continue

        buckets['OS'][subtokens['OS']] += count
        buckets['DEVICE'][subtokens['DEVICE']] += count
        buckets['APP'][subtokens['APP']] += count
        buckets['AUTH'][subtokens['AUTH']] += count

        # Versions and browsers are not known in advance, so add keys lazily.
        if subtokens['VERSION'] in buckets['VERSION']:
            buckets['VERSION'][subtokens['VERSION']] += count
        else:
            buckets['VERSION'][subtokens['VERSION']] = count

        if subtokens['BROWSER'] in buckets['BROWSER']:
            buckets['BROWSER'][subtokens['BROWSER']] += count
        else:
            buckets['BROWSER'][subtokens['BROWSER']] = count

    print(json.dumps(buckets, indent=4))
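For anyone who wants something tighter than token counting, the same layout can probably be captured with one regular expression. This is only a sketch of an alternative, not the script I actually ran. The names UA_PATTERN and parse_user_agent are made up here, and the pattern is slightly different in what it accepts: it insists on a dotted numeric version, and it does not tie the optional auth and browser fields to a particular OS.

```python
import re

# Assumed layout, pieced together from the user agents seen in the data:
#   HRB-MOBILE-<OS>-<DEVICE>-MYBLOCK[-<AUTH>]-[v]<VERSION>[-<BROWSER>]
UA_PATTERN = re.compile(
    r"^HRB-MOBILE-(?P<OS>IOS|ANDROID)-(?P<DEVICE>PHONE|TABLET)-(?P<APP>MYBLOCK)"
    r"(?:-(?P<AUTH>TOUCHID|FACEID))?"
    r"-v?(?P<VERSION>\d+(?:\.\d+)*)"
    r"(?:-(?P<BROWSER>[^-]+))?$"
)


def parse_user_agent(user_agent):
    """Return a dict of fields, or None if the string does not match."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        return None
    # Optional groups that did not take part in the match come back as None.
    return {key: (value if value is not None else "N/A")
            for key, value in match.groupdict().items()}


ios = parse_user_agent("HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla")
android = parse_user_agent("HRB-MOBILE-ANDROID-PHONE-MYBLOCK-v6.1.0")
print(ios["AUTH"], ios["VERSION"], ios["BROWSER"])
print(android["AUTH"], android["VERSION"], android["BROWSER"])
```

The named groups reuse the bucket keys from the script, so the returned dictionary could be dropped into the same counting loop.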

Results

So here's what I uncovered, listed in no particular order:

From January 7 through February 8, 232,248 requests were made by MyBlock
apps.

230,226 requests were made from phones while 2,022 were from tablets; over
99% of the requests were from phones.

0 requests were made by Android tablets.

Over 99% of requests were made by devices running iOS.

Two versions of the app appear in the dataset: 6.0.0 and 6.1.0.

Version 6.1.0 makes up over 99% of the requests.

The first requests made by version 6.1.0 occurred on January 13, six days
after the first 6.0.0 request.

100% of requests from Android devices were version 6.1.0.

The requests made from Android devices contain no information about
authentication method or browser.
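The percentages above are just bucket counts divided by the total. As a quick check of the phone and tablet figures (the helper name share is made up for this example):

```python
def share(part, total):
    """Return part as a percentage of total, rounded to two decimals."""
    return round(100.0 * part / total, 2)


phones = 230226
tablets = 2022
total = phones + tablets

print(total)                  # 232248, matching the overall request count
print(share(phones, total))   # 99.13
print(share(tablets, total))  # 0.87
```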

Discussion

For the requests from iOS devices that didn't mention an authentication method
in their user-agent, I assume the user typed in a password or PIN, though I
haven't confirmed this. I also haven't looked into why all the iOS requests have
"Mozilla" at the end of their user-agent. It's probably related to the browser
framework used by the MyBlock app.

Judging by the fact that no requests from version 6.0.0 of the app used FaceID,
it's possible that this feature wasn't implemented until 6.1.0, though this is
just speculation.

Most interestingly, users appear to be comfortable with using Apple's TouchID
to protect their MyBlock app. Even more interesting is that people are comfortable
with using FaceID, considering that this feature is relatively new. It appears
that in mobile computing, biometric authentication is a widely accepted trend.

It's also worth mentioning that while MyBlock doesn't appear to have been
available during the 2017 tax season, another H&R Block app does appear in the
data:

HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla

This app seems to have two versions, 6.4 and 6.3, but the total number of
requests is very low: only a few thousand. Another interesting finding is
13 requests made on April 26, 2017 with this user-agent:

HRB-MOBILE-IOS-PHONE-TAXES-nil-Mozilla

Perhaps this was a test version of the app?
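Oddities like this are easy to miss by eye. One way to surface them automatically might be to flag any HRB user agent with no field that looks like a version number. This is a rough sketch; VERSION_PATTERN and lacks_version are names I made up for it, and "looks like a version" is assumed to mean dotted digits with an optional leading 'v':

```python
import re

# Assumption: a version is dotted digits, optionally prefixed with 'v'.
VERSION_PATTERN = re.compile(r"^v?\d+(\.\d+)*$")


def lacks_version(user_agent):
    """Return True if no hyphen-separated field looks like a version."""
    return not any(VERSION_PATTERN.match(token)
                   for token in user_agent.split("-"))


print(lacks_version("HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla"))  # False
print(lacks_version("HRB-MOBILE-IOS-PHONE-TAXES-nil-Mozilla"))  # True
```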

Future Work

We still have two months to go in this year's tax season, so I'll be interested
to check the numbers once the season closes. I'm also interested to see how
many people continue to use this app outside of the tax season and how these
results will change in 2019.