Machine Learning in Auditing

Current and Future Applications

In Brief

Machine learning provides the potential for significant improvements in audit speed and quality, but also entails certain risks. The authors provide a general overview of machine learning, including some important terminology, and explore current and potential future uses in the audit profession. They also examine the challenges that machine learning technology presents and the possible impact that machine learning will have on CPA firms and their staff.

***

Machine learning is a key subset of artificial intelligence (AI), which originated with the idea that machines could be taught to learn in ways similar to how humans learn. While humans are just beginning to comprehend the dynamic capabilities of machine learning, the concept has been around for decades. The proliferation of data, primarily due to the rise of the Internet and advances in computer processing speed and data storage, has now made machine learning a significant component of modern life. Common examples of machine learning can be found in e-mail spam filters and credit monitoring software, as well as the news feed and targeted advertising functions of technology companies such as Facebook and Google.

Machine learning has the potential to disrupt nearly every industry during the next several years, and the auditing profession is no exception (Julia Kokina and Thomas H. Davenport, “The Emergence of Artificial Intelligence: How Automation Is Changing Auditing,” Journal of Emerging Technologies in Accounting, Spring 2017, http://bit.ly/2Heshyk). Jon Raphael, chief innovation officer at Deloitte, expects machine learning to significantly change the way audits are performed, as it enables auditors to largely “avoid the tradeoff between speed and quality” (“Rethinking the Audit,” Journal of Accountancy, Apr. 1, 2017, http://bit.ly/2Vxx7RB). Rather than relying primarily on representative sampling techniques, machine learning algorithms can provide firms with opportunities to review an entire population for anomalies. When audit teams can work on the entire data population, they can perform their tests in a more directed and intentional manner. In addition, machine learning algorithms can “learn” from auditors’ conclusions on specific items and apply the same logic to other items with similar characteristics.

Machine learning technology for auditing is still primarily in the research and development phase. Several of the larger CPA firms have machine learning systems under development, and smaller firms should begin to benefit as the viability of the technology improves, auditing standards adapt, and educational programs evolve. This article explains how machine learning works, describes its current and potential impact on the auditing profession, and presents some challenges for auditors that must be addressed for machine learning tools to reach their full capabilities.

What Is Machine Learning?

Machine learning is a subset of artificial intelligence that automates analytical model building. Machine learning uses these models to perform data analysis in order to understand patterns and make predictions. The machines are programmed to use an iterative approach to learn from the analyzed data, making the learning automated and continuous; as the machine is exposed to increasing amounts of data, robust patterns are recognized, and the feedback is used to alter actions. Machine learning and traditional statistical analysis are similar in many regards, but different in execution. While statistical analysis is based on probability theory and probability distributions, machine learning is designed to find the combination of mathematical equations that best predicts an outcome. Thus, machine learning is well suited for a broad range of problems that involve classification, linear regression, and cluster analysis.

Machine learning approaches can mostly be divided into two categories: supervised and unsupervised learning (Nikki Castle, “Supervised vs. Unsupervised Machine Learning,” Datascience.com Blog, July 13, 2017, http://bit.ly/2VqkT8s). Supervised learning algorithms use labeled examples, which means that there are inputs with known outputs. Supervised learning is used in situations where historical data can be used to predict future outcomes, such as determining which customers are most likely to default on their debt. Unsupervised learning is used where there are no labels on the output variables; the system is not “told” what the assumed answer is, but instead figures out the data patterns on its own. Unsupervised learning contains different techniques that can be used on transactional data (e.g., cluster analysis) and may be beneficial if used as part of the risk assessment process to discover previously unforeseen risks. There is also semisupervised learning, which contains a combination of labeled and unlabeled output data.
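To make the distinction concrete, the following is a minimal sketch in Python, not a production technique. It assumes hypothetical customer features (debt-to-income ratio and missed payments) and hypothetical transaction amounts, none of which come from the sources cited here. The supervised half learns from labeled outcomes with a simple nearest-centroid classifier; the unsupervised half groups unlabeled amounts into two clusters on its own.

```python
from statistics import mean

# Supervised learning: every historical example carries a known outcome (label).
# Hypothetical features: (debt_to_income_ratio, missed_payments_last_year)
labeled_customers = [
    ((0.10, 0), "repaid"),
    ((0.20, 1), "repaid"),
    ((0.15, 0), "repaid"),
    ((0.60, 4), "defaulted"),
    ((0.70, 5), "defaulted"),
    ((0.65, 3), "defaulted"),
]

def fit_centroids(examples):
    """Average the feature vectors of each label (nearest-centroid classifier)."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new observation."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

centroids = fit_centroids(labeled_customers)
new_customer_prediction = predict(centroids, (0.55, 4))
print("Predicted outcome:", new_customer_prediction)

# Unsupervised learning: no labels; the algorithm groups similar items itself.
def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters (k-means with k = 2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(cluster_low), mean(cluster_high)
    return cluster_low, cluster_high

# Hypothetical transaction amounts with no auditor conclusions attached.
transaction_amounts = [120, 95, 130, 110, 9800, 105, 10200, 125]
typical, unusual = two_means(transaction_amounts)
print("Cluster worth a closer look:", unusual)
```

In the supervised case the model is only as good as the labels it was given; in the unsupervised case nothing tells the algorithm that the second cluster is "risky," so an auditor still has to interpret what the grouping means.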

Artificial neural networks are an important part of the future of AI and machine learning (see Exhibit). Data systems can be set up to form simple or multilayered neural networks. Similar to neurons in the human brain, artificial neural networks are connected via nodes. Deep learning combines the computing power of machines with the connection patterns in neural networks to understand complex relationships such as medical diagnosis and location recognition.

Exhibit: Relationship between AI, Machine Learning, and Deep Learning

One problem with advanced machine learning can be “overfitting,” where the computer picks up idiosyncrasies in the data that are not representative of patterns in the real world. This can go undetected when, for example, the model is tested on the same data that was used to build it. Overfitting can result in the machine “forgetting” that statistically significant correlations between variables do not necessarily imply a causal relationship. Conversely, “underfitting” occurs when the model is not complex enough to pick up patterns in the data. Due to this potential for flaws in data output, human understanding and judgment are still critical components in machine learning. Users need to understand the nature of the input data, what the machine does with the data, and the final output.
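The risk of testing a model on the data used to build it can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the invoice amounts, the auditor conclusions, and the two toy models (`Memorizer` and `ThresholdRule`) are hypothetical. The memorizing model reproduces every training item, including a one-off quirk, while the simpler rule ignores the quirk.

```python
from collections import Counter

# Hypothetical labeled items: (invoice_amount, auditor_conclusion)
training = [
    (100, "ok"), (150, "ok"), (250, "ok"), (400, "ok"), (500, "ok"),
    (300, "exception"),  # a one-off quirk, not a real pattern
    (900, "exception"), (1200, "exception"), (1500, "exception"),
]
# Items held back and never shown to either model while it is being built.
holdout = [
    (120, "ok"), (200, "ok"), (350, "ok"),
    (1000, "exception"), (1100, "exception"), (1300, "exception"),
]

class Memorizer:
    """Overfit model: recalls each training item exactly, quirks included."""
    def fit(self, rows):
        self.lookup = dict(rows)
        # Unseen amounts fall back to the most common training label.
        self.default = Counter(label for _, label in rows).most_common(1)[0][0]
        return self

    def predict(self, amount):
        return self.lookup.get(amount, self.default)

class ThresholdRule:
    """Simpler model: a single cutoff chosen to fit the training data best."""
    def fit(self, rows):
        def hits(cutoff):
            return sum(("exception" if a >= cutoff else "ok") == label
                       for a, label in rows)
        self.cutoff = max(sorted(a for a, _ in rows), key=hits)
        return self

    def predict(self, amount):
        return "exception" if amount >= self.cutoff else "ok"

def accuracy(model, rows):
    """Share of rows where the model agrees with the auditor's conclusion."""
    return sum(model.predict(a) == label for a, label in rows) / len(rows)

memorizer = Memorizer().fit(training)
rule = ThresholdRule().fit(training)
for name, model in (("memorizer", memorizer), ("threshold rule", rule)):
    print(f"{name}: training {accuracy(model, training):.2f}, "
          f"holdout {accuracy(model, holdout):.2f}")
# memorizer: training 1.00, holdout 0.50
# threshold rule: training 0.89, holdout 1.00
```

Judged only on the training data, the memorizer looks perfect; the held-out items show that it learned the data rather than the pattern.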

The predictive reliability of machine learning is dependent on the quality of the historical data that has been input. New and unforeseen events may create invalid results if left unidentified or inappropriately weighted. As a result, human biases can play an important role in the use of machine learning. Such biases can affect which data sets are chosen for training the AI, the methods chosen for the process, and the interpretation of the output. Finally, although machine learning has great potential, its models are still currently limited by many factors, including data storage and retrieval, processing power, algorithmic modeling assumptions, and human understanding and judgment.

Current and Potential Future Uses

Although there are limitations to the current capabilities of machine learning, it excels at performing repetitive tasks. Because an audit requires a vast amount of data and has a significant number of task-related components, machine learning has the potential to increase both the speed and quality of audits. The machine-based performance of redundant tasks should allow auditors more time for review and analysis, which would give them a greater ability to focus on the areas of greatest risk, as well as a better understanding of the larger picture (Bill Brennan, Mike Baccala, and Mike Flynn, “Artificial Intelligence Comes to Financial Statement Audits,” CFO.com, Feb. 2, 2017, http://bit.ly/2Jx3CYO).

Current uses.

Audit firms are already testing and exploring the power of machine learning in audits. One example is Deloitte’s use of Argus, a machine learning tool that can read documents such as leases, derivatives contracts, and sales contracts. Argus is programmed with algorithms that allow it to identify key contract terms, as well as trends and outliers (Ben Kepes, “Big Four Accounting Firms Delve into Artificial Intelligence,” Computerworld, Mar. 16, 2016, http://bit.ly/30jYmxo). Auditors can then focus on interpreting the key features of documents (Raphael 2017). It is not difficult, for example, to imagine a machine reading a lease contract, identifying the key terms, and determining whether the lease is capital or operating. If designed appropriately, machine learning tools could also identify patterns and outliers, such as nonstandard leases with significant judgments (e.g., those with unusual asset retirement obligations). This would allow auditors to focus specifically on the contracts with the highest inherent risk, thus improving both the speed and quality of the audit.

Another example of machine learning technology currently used by PricewaterhouseCoopers is Halo (Kokina and Davenport 2017). Halo analyzes journal entries and can identify potentially problematic areas, such as entries with keywords of a questionable nature, entries from unauthorized sources, or an unusually high number of journal entry postings just under authorized limits. Halo allows auditors to test every journal entry a company made during a given year; by subjecting all journal entries to testing and focusing only on the outliers with the highest risk, both the speed and quality of testing procedures are increased.
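The kind of full-population journal entry screening described above can be sketched in a few lines of Python. This is not Halo's logic, which is not public in the sources cited here; it is a hedged, rule-based illustration in which the `JournalEntry` fields, the approval limit, the user list, and the keyword list are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class JournalEntry:
    entry_id: str
    amount: float
    posted_by: str
    description: str

# All thresholds and lists below are illustrative assumptions.
APPROVAL_LIMIT = 10_000.00
NEAR_LIMIT_FLOOR = 9_500.00   # "just under" the limit, for this sketch
AUTHORIZED_USERS = {"akim", "bpatel", "cnguyen"}
QUESTIONABLE_KEYWORDS = ("plug", "per ceo", "adjust to budget")

def is_near_limit(entry):
    return NEAR_LIMIT_FLOOR <= entry.amount < APPROVAL_LIMIT

def screen(entry):
    """Return the list of reasons an entry deserves auditor attention."""
    reasons = []
    text = entry.description.lower()
    if any(keyword in text for keyword in QUESTIONABLE_KEYWORDS):
        reasons.append("questionable keyword")
    if entry.posted_by not in AUTHORIZED_USERS:
        reasons.append("unauthorized source")
    if is_near_limit(entry):
        reasons.append("just under approval limit")
    return reasons

entries = [
    JournalEntry("JE-001", 1_200.00, "akim", "Monthly rent accrual"),
    JournalEntry("JE-002", 9_950.00, "bpatel", "Consulting fee"),
    JournalEntry("JE-003", 4_300.00, "zroot", "Reclass"),
    JournalEntry("JE-004", 9_990.00, "bpatel", "Plug to balance suspense"),
    JournalEntry("JE-005", 15_000.00, "cnguyen", "Equipment purchase"),
]

# Every entry is tested; only the exceptions are surfaced for review.
flagged = {}
for entry in entries:
    reasons = screen(entry)
    if reasons:
        flagged[entry.entry_id] = reasons

# A cluster of near-limit postings by one user is itself a pattern.
near_limit_by_user = Counter(e.posted_by for e in entries if is_near_limit(e))

for entry_id, reasons in flagged.items():
    print(entry_id, "->", ", ".join(reasons))
print("Near-limit postings by user:", dict(near_limit_by_user))
```

Fixed rules like these are only a starting point; a learning system would adjust the rules, or their weights, as auditors confirm or dismiss the flagged entries.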

Potential future uses.

CPA firms and academics are already studying additional ways that machine learning can be used in financial statement audits, particularly in the risk assessment process. Ting Sun and Miklos Vasarhelyi propose that machine learning technologies such as speech recognition could be used for the executive fraud interviews required by auditing standards (“Deep Learning and the Future of Auditing: How an Evolving Technology Could Transform Analysis and Improve Judgment,” CPA Journal, June 2017, http://bit.ly/2VYCI2r). The software could recognize when interviewees give questionable answers, such as “sort of” or “maybe,” that suggest deception. Speech recognition technology could also identify significant delays in responses, which might also indicate concealment.
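As a rough illustration of the idea, the Python sketch below assumes that a speech recognition system has already produced a transcript and a response delay for each answer; that upstream step is not shown. The phrases "sort of" and "maybe" come from the discussion above, while the other phrases, the delay threshold, and the function name are hypothetical.

```python
# "sort of" and "maybe" are the examples given in the text; the remaining
# phrases and the delay threshold are assumptions for illustration only.
HEDGING_PHRASES = ("sort of", "maybe", "i guess", "not sure")
DELAY_THRESHOLD_SECONDS = 4.0

def review_response(answer_text, seconds_before_answer):
    """Flag an interview answer that contains hedging or follows a long pause."""
    text = answer_text.lower()
    hedges = [phrase for phrase in HEDGING_PHRASES if phrase in text]
    delayed = seconds_before_answer > DELAY_THRESHOLD_SECONDS
    return {
        "hedges": hedges,
        "delayed": delayed,
        "follow_up": bool(hedges) or delayed,
    }

# Hypothetical transcript: (answer text, seconds of silence before answering)
transcript = [
    ("Yes, I reviewed every reconciliation myself.", 1.0),
    ("It was, sort of, approved by the board.", 1.2),
    ("No, nobody asked me to change the numbers.", 6.5),
]

for answer, delay in transcript:
    result = review_response(answer, delay)
    if result["follow_up"]:
        print("Follow up:", answer, result)
```

A flag of this kind indicates only that an answer merits further inquiry, not that the interviewee was deceptive.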

Facial recognition technologies could someday be used for fraud interviews as well. The University of Arizona is working with the Department of Homeland Security to develop software that uses facial recognition to identify facial patterns that suggest excess nervousness or deceit during entrant interviews (Steve Sutton, Matthew Holt, and Vicky Arnold, “The Reports of My Death Are Greatly Exaggerated: Artificial Intelligence Research in Accounting,” International Journal of Accounting Information Systems, September 2016, http://bit.ly/2JCgnBu). Although many accounting firms train their employees on how to conduct fraud interviews, it can be difficult for a human to detect certain behavioral patterns consistently and in real time. The assistance of speech and facial recognition technology in fraud interviews could certainly complement auditors and notify them when higher-risk responses require further investigation.

In the future, machine learning technology could allow CPA firms to detect patterns that might otherwise go unnoticed. For example, historical financial data for a restaurant chain might be combined with satellite imagery of parking lots, guest count information obtained from point-of-sale systems, and restaurant employee schedules to demonstrate a strong correlation between high revenues and the number of cars in parking lots during peak hours, high customer guest counts, and high employee wages. By recognizing these patterns, the system could identify locations with revenues inconsistent with vehicle counts, guest counts, or wages. This would allow the auditors to focus on restaurants with inconsistencies rather than selecting restaurants on a random basis.
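A simplified version of this restaurant analysis can be sketched in Python. The sketch uses a single predictor (peak-hour car counts) rather than all three, fits an ordinary least-squares line, and flags locations whose revenue falls far from what the line predicts. The store names, figures, and the "three times the typical miss" cutoff are hypothetical assumptions, not data from any study.

```python
from statistics import mean, median

# Hypothetical data per location:
# (average peak-hour car count, monthly revenue in thousands of dollars)
locations = {
    "store_A": (40, 82),
    "store_B": (55, 108),
    "store_C": (30, 61),
    "store_D": (70, 141),
    "store_E": (45, 150),
    "store_F": (60, 119),
    "store_G": (35, 72),
    "store_H": (50, 101),
}

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

xs = [cars for cars, _ in locations.values()]
ys = [revenue for _, revenue in locations.values()]
intercept, slope = fit_line(xs, ys)

# Residual = reported revenue minus the revenue the car counts would suggest.
residuals = {
    name: revenue - (intercept + slope * cars)
    for name, (cars, revenue) in locations.items()
}

# Assumed cutoff: flag a location when its residual is more than three times
# the median absolute residual across all locations.
typical_miss = median(abs(r) for r in residuals.values())
flagged = [name for name, r in residuals.items() if abs(r) > 3 * typical_miss]

print("Locations to examine first:", flagged)
```

With these made-up figures, only the location whose revenue is well above what its traffic suggests is flagged. A flagged location is a starting point for inquiry; catering sales or a drive-through could explain the difference.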

Some restaurant companies are already utilizing machine learning technology to make better predictions of customer behavior. For example, McDonald’s is using smart kiosk technology to recommend products based on season, weather, and new or repeat customer preference (Trevor Mogg, “McDonald’s to Use AI to Tempt You into Extra Purchases at the Drive-thru,” DigitalTrends.com, Mar. 26, 2019, http://bit.ly/2w43BDF). Auditors may be able to leverage this data by using machine learning tools to gain a better understanding of the activity behind the numbers. The machine learning algorithm may find inconsistencies with traditional metrics such as turns per hour, average revenue per turn, and outside deliveries; auditors would need to investigate these inconsistencies. In addition, this customer behavior could provide audit teams with valuable insight as they try to make independent predictions for analytical procedures using supervised learning techniques. Unsupervised learning techniques may also reveal previously hidden risks.

Academics are also beginning to explore nontraditional data relationships. Research by Kyunghee Yoon (Three Essays on Unorthodox Audit Evidence, doctoral dissertation, Rutgers University, 2016, http://bit.ly/2VmN4VJ) studied the impact of weather on sales. Specifically, unfavorable weather conditions were expected to hinder customer store visits and thus decrease sales. The results showed that the weather-based model was less accurate than one using data from stores with similar characteristics, but that adding weather variables to the peer store data improved predictability. Extrapolating from this, the use of nontraditional data relationships could become an important component in audits, particularly as a part of risk assessment procedures.

Challenges for Auditors

Audit firms and regulators must overcome several barriers in order for machine learning technologies to reach their full capabilities. Obtaining relevant and useful data (particularly nonfinancial data) from clients and external sources may be difficult. Due to statutory and regulatory limitations, auditors do not typically have access to vast amounts of information from data stores like Google or Facebook. Auditors are also bound by certain ethical and client confidentiality requirements, which may limit their ability to access the quality and quantity of data needed to build their training datasets (Hussein Issa, Ting Sun, and Miklos Vasarhelyi, “Research Ideas for Artificial Intelligence in Auditing: The Formalization of Audit and Workforce Supplementation,” Journal of Emerging Technologies in Accounting, Fall 2016, http://bit.ly/2VVIF0j).

When relevant and useful data is available for use, auditors must understand and test the internal controls over data integrity and validate the completeness and accuracy of the input data in order to rely on the output. Data security and information integrity will be critically important in determining the reliability of the input data used in machine learning. Auditors will need to work with cybersecurity experts to determine that the client data is secure; otherwise, unauthorized access to financial and nonfinancial data may allow for inappropriate data manipulation that could skew the results.

If the use of nontraditional data increases, auditing documentation standards will also need to change. Rather than simply documenting why certain procedures were performed and explaining why samples were representative of total populations, auditors will need to document their evaluation and application of the data analysis. The ability to use a broader range of inputs could give auditors a much greater ability to understand the impact of environmental forces on a company’s business. Although there is the potential for deeper and broader understanding, auditors will need to remain skeptical about machine learning results; the patterns identified may not be accurate or even logical. As a result, future auditors will likely have to work with data scientists to understand the algorithms, similar to how current auditors work with information technology, actuarial, and valuation experts. Auditors will also need to understand the reasonableness of the output in order to appropriately document why causation is implied so that audits do not become subject to spurious correlations (i.e., auditors need to be able to understand and effectively document why causal relationships are or are not actually present).

Because of the inherent limitations of machine pattern-finding, auditors will continue to need an understanding of the individual business and its industry, as well as the external business environment and societal forces. For example, user accounts might be the best predictor of revenues for companies such as Facebook and therefore should be given the appropriate weighting in the internal algorithm. Without judgment as to what to specifically look for, the authenticity of accounts and the presence of “bots” may not be detectable by machines and could lead auditors to reach incorrect conclusions. Auditors will need to understand and validate the completeness and accuracy of the input data in order to reach an appropriate conclusion on the output. Furthermore, there will always be potential blind spots when evaluating empirical evidence; therefore, an auditor’s intuition will likely continue to be an important source of knowledge.

Auditors will also need to understand and consider the role of human biases introduced into the machine. These include availability bias, confirmation bias, overconfidence bias, and anchoring bias (Rebecca Fay and Norma R. Montague, “I’m Not Biased, Am I?” Journal of Accountancy, Feb. 1, 2015, http://bit.ly/2JBjM3f). Auditors may be subject to availability bias in using the most easily accessible information to identify risks or form conclusions; they should also be aware of the possibility of confirmation bias, which would manifest in overweighting or using only input data that supports preexisting beliefs. Although overconfidence bias tends to arise in individuals who overestimate their own abilities, machine learning could create a new type of overconfidence bias in auditors who become too reliant on machine outcomes and fail to investigate the appropriateness of the input data and weighting of the machine learning results. Anchoring bias may become more of a risk as audit clients develop and enhance their own machine learning tools and auditors begin to “anchor” their input data with the client’s data, rather than considering alternative options and contradictory evidence.

Given the potential ambiguity of the input data used and results that are subject to interpretation, the auditing standards for machine learning tools will likely need to change. For example, the standards for analytical procedures require the auditor to make certain assumptions in performing analytical procedures; however, a significant benefit of machine learning centers on its potential ability to find unique or unusual relationships.

What Comes Next

Machine learning is likely to have a large impact on the audit profession in the near future. KPMG uses IBM’s Watson on its audit engagements, and PricewaterhouseCoopers and Ernst & Young have machine learning tools in place as well. As machine learning tools become more readily available, more opportunities will be available for smaller CPA firms. Robotic process automation (RPA) technology is already having an impact on the audit process by automating routine tasks (Miklos Vasarhelyi and Andrea Rozario, “How Robotic Process Automation Is Transforming Accounting and Auditing,” CPA Journal, June 2018, http://bit.ly/2F7t5aE), and intelligent process automation tools that expand on RPA by learning from prior decisions to automatically adjust themselves are currently under development (Naveen Joshi, “Robotic Process Automation Just Got ‘Intelligent’ Thanks to Machine Learning,” Forbes, Jan. 29, 2019, http://bit.ly/2JLadPh). Many academics and practitioners alike speculate that the new technology will reduce the demand for auditors. But a deeper look into how the technology works leads the authors to believe that it is still too early to conclude how dramatic the impact will be. The skills necessary to be an auditor will certainly change with the automation of repetitive and redundant tasks.

Researchers are beginning to identify the current and potential uses of machine learning in the audit. They are limited, however, by the scarcity of readily available empirical evidence, given the newness of the technology. In the very near term, there should be a plethora of research opportunities, particularly if CPA firms are willing to collaborate with academics.

In Disrupt IT: A New Model for IT in the Digital Age, Ian Cox notes the importance of engagement and collaboration when dealing with disruptive technologies (Axin Ltd., 2014). Audit firms will need to ensure that their audit teams are engaged and collaborative. New technologies may even require that audit firms and their engagement teams become more agile (Richard Newmark, Gabe Dickey, and William Wilcox, “Agility in Audit: Could Scrum Improve the Audit Process?” Current Issues in Auditing, Spring 2018, http://bit.ly/2HlcnUt). It is not unrealistic to think that the risk assessment process (i.e., identification of audit areas of focus) could also be enhanced later in the audit process, as machine learning allows auditors to become more adaptable; this improved adaptability could complement the upfront planning part of the process. As such, CPA firm leaders may need to reexamine how audit teams are staffed and structured.

Future auditors will need to become more versatile and have a solid understanding of information systems, data science, and general business, in addition to an increasingly complex set of accounting and auditing rules and regulations. Whereas in the past audits have had a largely transactional focus, future audits will become increasingly interconnected. Audit firms need to be aware of changing auditor skillsets in order to help manage the disruption risks associated with machine learning technologies.

While machine learning technology affords auditors a greater ability to consider internal systematic relationships and external environmental forces, auditors must also exhibit a solid understanding of the input, processing, and output of data from a broader range of sources. In addition, while machine learning technology can provide significantly improved opportunities for auditors to explore their intuition, auditors must change their mode of thinking in order for these insights to be effective. Although it is impossible to foretell exactly how machine learning will ultimately change the audit process, now is the time to begin contemplating its current impact and future implications.

Gabe Dickey, CPA, is an assistant professor of accounting at the University of Northern Iowa, Cedar Falls, Iowa.

Sandra Blanke, PhD, is an associate professor of management–cybersecurity at the University of Dallas, Irving, Tex.

Lloyd Seaton, PhD, CPA, CMA, is an associate professor of accounting at the University of Northern Colorado, Greeley, Colo.

About The CPA Journal

The CPA Journal is a publication of the New York State Society of CPAs, and is internationally recognized as an outstanding, technical-refereed publication for accounting practitioners, educators, and other financial professionals all over the globe. Edited by CPAs for CPAs, it aims to provide accounting and other financial professionals with the information and analysis they need to succeed in today’s business environment.