3rd Wave: Predictive Analytics
Predictive analytics focuses our attention on important or suspect transactions. It comes in many different flavors:
- Each somewhat more sophisticated
- Each making audit work more accurate and our lives easier
"The use of data analysis can significantly reduce audit risk by honing the risk assessment and stratifying the population." (GTAG 16, 2011)
[Figure: the waves of audit analytics. Personal Computers and Electronic Spreadsheets: the end of hand calculation. ERPs: data in one place. Database analysis: the Age of Rules. Predictive Analytics: sophisticated statistical insights; true predictive and continuous audit.]

Statistical Insights: Benford's Law
The most famous name in forensic accounting does not belong to an accountant: Frank Benford ( ), an American physicist. In 1938, at the age of 55, he published a paper titled "The Law of Anomalous Numbers."
Benford's Law is a statement about the occurrence of digits in lists of data: in many naturally occurring datasets, small leading digits are far more common than large ones. It is useful in detecting fraudulent invoices or other numbered documents.
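A minimal Python sketch of the law itself: under Benford's Law the expected share of leading digit d is log10(1 + 1/d) for d = 1..9, so a leading 1 is expected about 30.1% of the time and a leading 9 only about 4.6%. The helper names and the sample amounts below are hypothetical; the code simply sets those expectations beside the leading digits actually observed in a list of amounts.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit d under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(amount):
    """Leading non-zero digit of a non-zero number, ignoring its sign."""
    # Scientific notation always starts with the leading significant digit.
    return int(f"{abs(amount):.15e}"[0])

def observed_first_digits(amounts):
    """Observed share of each leading digit among the non-zero amounts."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Hypothetical invoice amounts.
expected = benford_expected()
observed = observed_first_digits([1250.00, 184.20, 0.75, 9100, 23.10, 1.99])
for d in range(1, 10):
    print(d, f"expected {expected[d]:.3f}", f"observed {observed[d]:.3f}")
```

Because the nine probabilities telescope to log10(10), they sum to exactly 1, which makes them directly comparable with observed proportions.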

Which to Investigate?
For distributions that appear to be anomalous:
1. Calculate the Kolmogorov-Smirnov distance between the vendor's first-digit distribution and the ideal Benford distribution.
2. Investigate the vendors with the largest scores.
Benford's Law of first-digit distribution follows a logarithmic pattern and applies to a surprisingly large number of datasets, including country populations, Twitter users by follower count, and many more. See testingbenfordslaw.com for more examples.
The Kolmogorov-Smirnov distance is the greatest absolute difference between two cumulative distribution functions (CDFs); here, the vendor's observed first-digit CDF and the Benford CDF.
Source (graph): Pivotal, Inc., Machine Learning for Forensic Accounting.
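The two steps can be sketched in Python, assuming each vendor's invoice amounts are available as a list keyed by vendor name (a hypothetical input shape; the vendors and amounts are made up). Because a first digit takes only the values 1 to 9, the two CDFs are compared at those nine points and the largest absolute gap is the vendor's score.

```python
import math
from collections import Counter

# Benford's expected first-digit probabilities, P(d) = log10(1 + 1/d), d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """Leading non-zero digit of a non-zero number, ignoring sign."""
    return int(f"{abs(x):.15e}"[0])

def ks_distance(amounts):
    """Largest absolute gap between the observed first-digit CDF and Benford's CDF."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    observed_cdf = expected_cdf = max_gap = 0.0
    for d in range(1, 10):
        observed_cdf += counts.get(d, 0) / len(digits)
        expected_cdf += BENFORD[d - 1]
        max_gap = max(max_gap, abs(observed_cdf - expected_cdf))
    return max_gap

def rank_vendors(amounts_by_vendor):
    """Step 1: score every vendor. Step 2: sort so the largest distances come first."""
    scores = {vendor: ks_distance(a) for vendor, a in amounts_by_vendor.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical invoice amounts per vendor.
amounts_by_vendor = {
    "Vendor A": [910.00, 925.50, 990.00, 975.25],
    "Vendor B": [100, 200, 300, 150, 120, 110, 450, 800, 130, 170],
}
for vendor, score in rank_vendors(amounts_by_vendor):
    print(f"{vendor}: {score:.3f}")
```

Scores for vendors with only a handful of invoices are noisy, so a minimum invoice count before ranking is a sensible safeguard; where to set it is a judgment call that this sketch leaves out.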

Fuzzy Logic Duplicate Invoice Detection
Problem: deterministic rules expect key information to be exactly the same:
- Vendor name
- Address
- Phone
- Invoice amount
- Date
- Bank account
- TIN (taxpayer identification number)
If the criteria are kept tight: too many false negatives (missed duplicates).
If the criteria are made loose: too many false positives, leaving too many items to investigate.
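To make the tight-criteria problem concrete, here is a hypothetical deterministic rule in Python that groups invoices only when vendor name, amount, and date are identical. The invoices are invented for illustration.

```python
from collections import defaultdict

def exact_duplicates(invoices, keys=("vendor", "amount", "date")):
    """Deterministic rule: flag invoices whose key fields are exactly identical."""
    groups = defaultdict(list)
    for inv in invoices:
        groups[tuple(inv[k] for k in keys)].append(inv["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical invoices: INV-1010 is really the same bill, keyed in differently.
invoices = [
    {"id": "INV-1001", "vendor": "Acme Corp", "amount": 1250.00, "date": "2014-03-11"},
    {"id": "INV-1010", "vendor": "ACME Corp.", "amount": 1250.00, "date": "2014-03-11"},
    {"id": "INV-1001A", "vendor": "Acme Corp", "amount": 1250.00, "date": "2014-03-11"},
]
print(exact_duplicates(invoices))  # INV-1010 is never flagged
```

The rule catches INV-1001 and INV-1001A but misses INV-1010, which differs only in capitalization and a trailing period: a false negative of exactly the kind described above.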

Fuzzy Matching in Numerical Strings
Numerical values (stored as strings) are considered close when:
- Invoice IDs: the edit distance is small
- Dates: they are the same, are within 7 days of each other, or have day and month transposed (3/11/14 vs 11/3/14)
- Payments: the amounts are identical, or the edit distance is small
- TINs, bank accounts, other numerics: the edit distance is small
Edit distance counts substitutions, additions, deletions, and transpositions, and is calculated with the Damerau-Levenshtein distance.
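A sketch of the two closeness tests. The edit distance below is the optimal string alignment (OSA) variant, the commonly implemented restricted form of Damerau-Levenshtein: it charges 1 for each substitution, insertion, deletion, or adjacent transposition, but unlike the unrestricted form it never edits the same substring twice (OSA gives 3 for "CA" vs "ABC", where unrestricted Damerau-Levenshtein gives 2). The 7-day window follows the slide; the function names are hypothetical.

```python
from datetime import date

def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    substitutions, insertions, deletions and adjacent transpositions cost 1 each."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

def dates_close(d1, d2, window_days=7):
    """Dates are close if equal, within window_days, or day and month are swapped."""
    if abs((d1 - d2).days) <= window_days:
        return True
    # Swapping day and month only yields a valid date when the day is 12 or less.
    return d1.day <= 12 and date(d1.year, d1.day, d1.month) == d2

print(osa_distance("INV-10234", "INV-10243"))             # 1 (one adjacent transposition)
print(dates_close(date(2014, 3, 11), date(2014, 11, 3)))  # True (day and month swapped)
```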

Fuzzy Matching
Use as many features of the invoice as desired; you are not limited to 3 dimensions.
1. Determine the best distance metric for each dimension: some are text-based, others are numerical strings.
2. Calculate the distance between invoices.
3. Adjust the measurement values to yield the best true-positive result.
4. Investigate any pair of invoices whose distance is within your threshold.
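The four steps might look like this in Python, under stated assumptions: text fields use the similarity ratio from the standard library's difflib.SequenceMatcher, numerical strings use the OSA form of Damerau-Levenshtein scaled to 0..1, and the feature list, weights, and 0.15 threshold are hypothetical values of the kind you would tune in step 3 (the tuning itself is not shown).

```python
from difflib import SequenceMatcher
from itertools import combinations

def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) edit distance."""
    # First row and column hold i + j (pure insertions or deletions).
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def text_distance(a, b):
    """Text fields: 0.0 = identical (ignoring case), 1.0 = nothing in common."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def numeric_string_distance(a, b):
    """Numerical strings: edit distance scaled to 0..1 by the longer string."""
    return osa_distance(a, b) / (max(len(a), len(b)) or 1)

# Step 1 (hypothetical choices): a metric and a weight for each invoice feature.
FEATURES = {
    "vendor": (text_distance, 0.3),
    "invoice_id": (numeric_string_distance, 0.3),
    "amount": (numeric_string_distance, 0.2),
    "bank_account": (numeric_string_distance, 0.2),
}

def invoice_distance(x, y):
    """Step 2: weighted sum of per-feature distances (weights sum to 1)."""
    return sum(weight * metric(x[f], y[f]) for f, (metric, weight) in FEATURES.items())

def suspect_pairs(invoices, threshold=0.15):
    """Step 4: every pair of invoices whose distance is within the threshold."""
    return [(x["invoice_id"], y["invoice_id"], round(invoice_distance(x, y), 3))
            for x, y in combinations(invoices, 2)
            if invoice_distance(x, y) <= threshold]

# Hypothetical invoices; amounts are kept as strings so they can be edit-compared.
invoices = [
    {"vendor": "Acme Corp", "invoice_id": "100234", "amount": "1250.00", "bank_account": "123456789"},
    {"vendor": "ACME Corp.", "invoice_id": "100243", "amount": "1250.00", "bank_account": "123456789"},
    {"vendor": "Globex Ltd", "invoice_id": "558812", "amount": "980.50", "bank_account": "987654321"},
]
print(suspect_pairs(invoices))
```

With this sample data only the first two invoices fall within the threshold: their vendor names differ by case and a period, and their invoice IDs differ by one transposition.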

Type 1: Prediction by Scoring
Machine learning (ML) continuously monitors transactions and scores each one from 1 to 100, so you examine only the high-scoring items.
[Figure: Current You examines lots of possible FWA (fraud, waste, and abuse) invoices from Your Financial System every month. Future You does this once, and the Machine Learning System learns what FWA is.]
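A minimal sketch of prediction by scoring, assuming past invoices have been reviewed and labeled (1 = confirmed FWA, 0 = clean) and reduced to numeric features. Logistic regression trained by gradient descent stands in here for whatever model an actual ML system would use; the three binary features and the toy training rows are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(X, y, lr=0.5, epochs=2000):
    """Done once: fit logistic-regression weights by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(model, x):
    """Map the predicted probability of FWA onto a 1-100 score."""
    w, b = model
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 + round(99 * p)

# Hypothetical features per invoice: [round amount, new vendor, bank account shared
# with another vendor]; labels come from past reviews (1 = confirmed FWA).
X = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = train(X, y)

# Each month: score new invoices and review only the high scorers.
new_invoices = {"INV-2001": [1, 1, 1], "INV-2002": [0, 0, 0], "INV-2003": [1, 0, 1]}
for inv_id, features in new_invoices.items():
    print(inv_id, score(model, features))
```

The split mirrors the slide: training runs once on reviewed history, while scoring is cheap enough to run on every new invoice.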

What are Big Data Analytics?
1st: The haystack gets a lot bigger
- Traditional structured data
- Unstructured data: documents, web content, social media
2nd: Thanks to Hadoop and massively parallel processing (MPP)
- Query and retrieval times are short
- Even massive storage costs very little
3rd: Many predictive modeling techniques can also be applied to structured and unstructured data
- Models become more accurate
4th: New techniques for unstructured data based on natural language processing (NLP)
- Sentiment analysis
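In its simplest form, sentiment analysis is a lexicon lookup. The sketch below assumes a tiny hypothetical word list: it sums +1/-1 word scores and flips a word's sign when the word directly follows a negator such as "not".

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "great": 1, "happy": 1, "good": 1,
           "hate": -1, "bad": -1, "angry": -1, "losing": -1, "worst": -1}
NEGATORS = {"not", "never", "no", "don't", "can't"}

def sentiment(text):
    """Sum word scores; a negator directly before a word flips that word's sign."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total

for post in ["I love this class",
             "I do not love this class",
             "Did you hear we're losing accreditation"]:
    print(sentiment(post), post)
```

A score below zero marks a post as negative. Lexicon scoring misses sarcasm and context, so it is best treated as a first-pass filter rather than a verdict.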

Focus on Social Media Risks*
*Risk also arises from other types of unstructured and semi-structured data:
- Internal documents
- Images stored centrally or on users' machines

Social Media Risks
- "They gave me financial aid, then I cancelled all my classes and kept the money."
- "Sit-in at the Chancellor's Office at 3:00"
- "Joe sold me the answers to tomorrow's test"
- "Can't believe how much I made on eBay today"
- "I'll fix them. I put a virus on the lab computer."
- "Professor X is such a perv"
- "The instructor said I could make money after school fixing cars in the auto shop"
- "I just downloaded a bunch of student financial data from the finance system"
- "I found out they're cutting my budget. I'm going to the union before this gets out"
- "Did you hear we're losing accreditation? Don't sign up next term"
Source: 2014 Internal Audit Capabilities and Needs Survey Report, Protiviti

You Don't Need to Be a Data Scientist, Just a Smart Tool User
[Figure: The Age of Smart CAATs (computer-assisted audit techniques). The timeline runs from Personal Computers and Electronic Spreadsheets (the end of hand calculation), through ERPs (data in one place) and Database analysis (the Age of Rules), to Predictive Analytics (statistical insights; true predictive and continuous audit), then adds social media, text, and image analysis, improved accuracy, and cost-effective continuous audit.]
