The growth of Internet banking services has led to a corresponding increase in online banking fraud. Fraud schemes are becoming increasingly sophisticated, seriously threatening the security of, and trust in, the online banking business. In this thesis we propose BankSealer, an effective semi-supervised and unsupervised fraud and anomaly detection framework for online banking, with the goal
of automatically detecting frauds and anomalies in a real online banking dataset. To realise this project, we collaborated with an IT security company and a major Italian banking group. BankSealer builds a profile for each customer on the basis of his or her past transactions, and can detect previously unseen frauds in large banking datasets. It uses methods with a clear statistical meaning in order to give the analyst a justifiable score and an easy-to-understand model of each customer’s spending habits.
In particular, we developed three complementary types of analysis of user behaviour, each based on tools that measure the anomaly of new transactions. The first is a local profile analysis, which computes the Histogram-Based Outlier Score (HBOS) and measures the
anomaly of a transaction with respect to the user’s historical behaviour profile. The second is a global profile analysis, which uses an iterative version of DBSCAN to measure the anomaly of the user’s behaviour with respect to all other users. The last is a temporal profile analysis, which uses a
threshold monitoring system to measure the anomaly of each user’s current spending pattern. In addition to these models, BankSealer addresses the rarely considered problem where the lack of past data prevents the building
of a well-trained profile, and is also able to follow changes in the habits of the
users by updating their models. Finally, we implemented a web
application to demonstrate the potential of our method and to present its results. Our tests on the given anonymised dataset, with synthetically injected frauds, show that BankSealer is able to detect even very complex frauds with a generally high degree of accuracy and low computational complexity, despite the limitations of the context.
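
To make the local profile analysis more concrete, the following is a minimal sketch of a histogram-based outlier score over one user's transaction history. It is an illustration under stated assumptions, not the BankSealer implementation: the function names `build_profile` and `hbos`, the feature names `country` and `amount_bucket`, the pre-binned amounts and the `floor` value used for never-seen feature values are all hypothetical. The score follows the usual HBOS definition, in which each feature histogram is normalised so that its tallest bin has height 1 and the score is the sum of log(1/height) over the features.

```python
import math
from collections import Counter


def build_profile(transactions, features):
    """Build one relative-frequency histogram per feature from past transactions."""
    profile = {}
    for f in features:
        counts = Counter(t[f] for t in transactions)
        total = sum(counts.values())
        profile[f] = {value: n / total for value, n in counts.items()}
    return profile


def hbos(transaction, profile, floor=1e-3):
    """Sum of log(1 / normalised bin height) over all profiled features.

    `floor` is an assumed minimum height, so that a value never seen in the
    user's history gives a large but finite contribution.
    """
    score = 0.0
    for f, hist in profile.items():
        freq = hist.get(transaction[f], 0.0)
        # Normalise by the tallest bin: the user's most common value scores 0.
        height = max(freq / max(hist.values()), floor)
        score += math.log(1.0 / height)
    return score


# Hypothetical history of one user (categorical features, amounts pre-binned).
history = [
    {"country": "IT", "amount_bucket": "0-100"},
    {"country": "IT", "amount_bucket": "0-100"},
    {"country": "IT", "amount_bucket": "100-500"},
    {"country": "FR", "amount_bucket": "0-100"},
]
profile = build_profile(history, ["country", "amount_bucket"])

usual = hbos({"country": "IT", "amount_bucket": "0-100"}, profile)
odd = hbos({"country": "RU", "amount_bucket": "100-500"}, profile)
print(usual, odd)  # the unusual transaction receives the higher score
```

Because the score is a sum of per-feature terms, each term can be shown to the analyst separately, which is one way to obtain the kind of justifiable score described above.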