In the Philippines, if you’re travelling along a public road that is under construction, you’ll often encounter billboards that announce project details such as its cost and target completion date. This is a good start. However, if you really want to know whether citizens’ taxes are delivering value for money, you need to be able to “follow the money” from budgeting, to procurement, to execution, and eventually to audit. Is this possible?

Tracking the Budget – From Planning to Evaluation

Yes. New information technologies make this possible. Globally, many countries are adopting mandatory e-Budget and e-Procurement systems, which require every step of budgeting and contracting to be disclosed. Alongside this surge in digital data collection, the Open Government Data movement is working to ensure that much of this data sees the light of day.

The Philippines urgently needs this. In 2016, the Philippine government will increase its spending on public infrastructure to P766.5 billion, 35 percent more than it spent in 2015 and four times what it spent in 2010. However, stories about overpriced, irregular, incomplete or substandard road projects worth billions of dollars have not been uncommon in the Philippine media. While the Department of Public Works and Highways (DPWH) was once notorious for such problems, its internal project tracking systems have been significantly strengthened over the past few years. A 2012 report by the Philippine Center for Investigative Journalism highlighted that the DPWH had successfully lowered the cost of public works by increasing the number of competing bidders, cancelling deals that did not comply with procurement policy, and reshuffling DPWH personnel.

Yet a major challenge affecting transparency and accountability in all sectors and spheres of government is that it is currently not possible to track individual budget items. The absence of unique project IDs across government data systems means that reconciling different sources of project information requires tedious manual work. It is also challenging to test the comprehensiveness and integrity of different systems. For instance, compliance with procurement disclosure regulations is uneven. These problems need to be addressed urgently.

Reality of Tracking the Budget in the Philippines: Tracking information at project level straddles multiple systems with no unique project ID.

In 2014, the Philippine government introduced a unique project ID: the 54-digit Unified Account Code Structure (UACS). The aim was to give a clearer identity to each peso in the budget. But without automated and integrated systems, the full code cannot be put into practice. Why? Because it is practically impossible to manually assign 54-digit codes to millions of separate transactions and then maintain ledgers that can be used for analysis.

What to do? Full implementation of a government-wide automated information system will take time. In the meantime, you need practical ways to link project data recorded in different systems, so that you can follow the money in the interest of transparency and accountability.

Following the Money

Given the lack of a unique ID, how do you piece together the life of an individual project? Ideally, at each step in the cycle, agencies would copy exactly the information that uniquely identifies a project (e.g. project name, amount, location) as it moves from one agency to the next. In other words, they should ‘plagiarize’. But how can you detect such copying?

To do this, we take a page from teachers’ playbook. For them, plagiarism-detection software is an essential weapon against copy-paste composition: an algorithm can detect whether a student has copied text from somewhere else. The same logic can be applied to link project information across different datasets.
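A minimal sketch of the plagiarism-detection idea, assuming one common textbook technique: break each text into overlapping three-character “shingles” after normalizing case and punctuation, then measure overlap with Jaccard similarity. This is an illustration of the general approach, not necessarily the method OnTrackPH uses, and the project descriptions are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-character substrings of a normalized text."""
    # Lowercase and collapse punctuation/whitespace so formatting noise is ignored.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    if len(normalized) < k:
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of shingles two texts have in common (0 = disjoint, 1 = identical sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical entries for the same project in two systems, plus an unrelated one.
budget = "Concreting of Farm-to-Market Road, Brgy. San Isidro"
procurement = "CONCRETING OF FARM TO MARKET ROAD - BRGY SAN ISIDRO"
unrelated = "Repair of Municipal Hall, Brgy. Poblacion"

print(jaccard(budget, procurement))  # 1.0: identical once case and punctuation are normalized
print(jaccard(budget, unrelated) < 0.5)  # True
```

Character shingles tolerate small spelling and formatting differences better than exact “Control-F” matching, because a single typo only disturbs the few shingles around it.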

OnTrackPH

Seeking a fast, cost-effective, and scalable way to track the budget, we collaborated with Manila-based data science consultancy Thinking Machines Data Science to further develop and pilot such an algorithm. OnTrackPH measures the statistical similarity between texts, adapting techniques originally used widely to detect plagiarism. It applies these mathematical methods to a new problem: measuring the likelihood that different government documents refer to the same project, with a confidence level assigned to each match. OnTrackPH is a proof-of-concept tool that can semi-automatically match road projects across disparate databases containing millions of project records spanning multiple years. It uses data science techniques such as machine learning and natural language processing.
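The matching step can be pictured as scoring every candidate record against a project and ranking the results. The sketch below assumes each record carries three hypothetical fields (`name`, `amount`, `location`) and blends them with illustrative weights, using the standard-library `difflib.SequenceMatcher` for the text comparison. The features, weights and model OnTrackPH actually uses are not shown here, and the resulting score is a heuristic between 0 and 1, not a calibrated probability.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    source: str    # system the record came from, e.g. "GAA" or "PhilGEPS" (hypothetical labels)
    name: str      # project title as written in that system
    amount: float  # peso amount recorded in that system
    location: str  # province or municipality, as written


def name_similarity(a: str, b: str) -> float:
    # Ratio of matching characters after lowercasing (0.0 to 1.0).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def amount_similarity(a: float, b: float) -> float:
    # 1.0 when amounts are equal, falling toward 0 as the relative gap grows.
    if a <= 0 or b <= 0:
        return 0.0
    return min(a, b) / max(a, b)


def match_score(x: Record, y: Record, weights=(0.6, 0.3, 0.1)) -> float:
    # Weighted blend of three signals; the weights are illustrative assumptions.
    w_name, w_amount, w_loc = weights
    same_place = 1.0 if x.location.strip().lower() == y.location.strip().lower() else 0.0
    return (w_name * name_similarity(x.name, y.name)
            + w_amount * amount_similarity(x.amount, y.amount)
            + w_loc * same_place)


def rank_candidates(query: Record, pool: list[Record], top_n: int = 3) -> list[tuple[float, Record]]:
    # Score every candidate and return the best few, highest score first.
    scored = [(match_score(query, c), c) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical budget line and two procurement records.
budget_item = Record("GAA", "Concreting of Barangay Road, San Isidro", 5_000_000, "Nueva Ecija")
pool = [
    Record("PhilGEPS", "Concreting of Brgy. Road - San Isidro", 4_850_000, "Nueva Ecija"),
    Record("PhilGEPS", "Rehabilitation of School Building", 5_000_000, "Nueva Ecija"),
]
for score, record in rank_candidates(budget_item, pool):
    print(f"{score:.2f}  {record.name}")
```

Ranking, rather than returning a single yes/no answer, keeps the process semi-automatic: the top candidates and their scores can be handed to a person for confirmation.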

In particular, OnTrackPH addresses inconsistencies in text formatting, data encoding practices, and naming or spelling conventions that make reliable “Control-F” searches nearly impossible for analysts. Databases may contain dozens of ways of referring to the same entity: the Department of Public Works and Highways may appear as “DPWH” or “Dept. of Public Works & Highways.” OnTrackPH uses machine-learning algorithms that learn over time to group, or “cluster”, these variants, a technique some technology companies use to identify groups of customers that share the same behavioural patterns. Read our OnTrackPH technical note to learn more about how the tool works and the data science behind it.
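OnTrackPH learns these groupings with machine learning. The sketch below swaps in a much simpler, rule-based stand-in to show what “grouping variants” means in practice: expand a few abbreviations from a hypothetical lookup table, derive an acronym, and merge any names that share a normalized form or acronym, using a union-find structure.

```python
import re

# Hypothetical abbreviation table; a real one would be derived from the data itself.
ABBREVIATIONS = {"dept": "department", "&": "and", "natl": "national"}
STOPWORDS = {"of", "and", "the"}


def normalize(name: str) -> str:
    # Lowercase, split into word tokens (keeping "&"), and expand known abbreviations.
    tokens = re.findall(r"[a-z0-9]+|&", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def acronym(normalized: str) -> str:
    words = normalized.split()
    if len(words) == 1:
        return words[0]  # already a short form such as "dpwh"
    return "".join(w[0] for w in words if w not in STOPWORDS)


def cluster_names(names: list[str]) -> list[set[str]]:
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    first_seen: dict[str, int] = {}
    for i, name in enumerate(names):
        norm = normalize(name)
        # Two names join the same cluster if they share a normalized form or an acronym.
        for key in {norm, acronym(norm)}:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups: dict[int, set[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), set()).add(name)
    return list(groups.values())


variants = [
    "DPWH",
    "Dept. of Public Works & Highways",
    "Department of Public Works and Highways",
    "DBM",
    "Department of Budget and Management",
]
for group in cluster_names(variants):
    print(sorted(group))
```

Fixed rules like these break down quickly: unrelated agencies can share an acronym, and informal short forms such as “DepEd” follow no simple pattern. That brittleness is the reason for learning the groupings from data instead.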

[Animation] OnTrackPH: Using Data Science to Connect the Budget

The Pilot Test

For our pilot, OnTrackPH was tasked with finding 2,238 DPWH-implemented road projects among millions of records in the Philippines’ budgeting, procurement and implementation systems (see the figure below). These road projects formed part of P65 billion in investments under the Tourism Road Infrastructure Project (TRIP) and the Regular Farm-to-Market Program from 2012 to 2015.

OnTrackPH: Using Data Science to Connect the Budget, Procurement and Implementation

Using these methods, OnTrackPH matched data accurately in a fraction of the person-hours needed to do so manually. For comparison, a team of five research analysts took two months to manually match the government records for the 2,238 DPWH-implemented road projects. During the pilot test, OnTrackPH found correct matches for 85 percent of those projects in less than 30 minutes, and identified 1,037 potential new matches that the analysts had not been able to identify manually.
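Figures like these come from comparing the tool’s output with a manually verified reference set. The sketch below shows one plausible way to compute them, assuming both sets are stored as dictionaries keyed by a row in the pilot project list (a hypothetical key, not a government ID), with each value being the ID of the matched record in another system, or `None` when the analysts found no match. The share is computed over all projects in the list, and “new candidates” is defined here, as an assumption, as projects the analysts could not match but the tool proposed a match for.

```python
from typing import Optional


def evaluate_matches(tool: dict[str, str], analysts: dict[str, Optional[str]]) -> dict:
    """Compare tool-proposed matches with analyst-verified ones."""
    # A tool match counts as correct only if it equals the analysts' verified match.
    correct = sum(
        1 for key, verified in analysts.items()
        if verified is not None and tool.get(key) == verified
    )
    # Projects the analysts could not match, but for which the tool proposed a record.
    new_candidates = [key for key, verified in analysts.items() if verified is None and key in tool]
    return {
        "projects": len(analysts),
        "correct": correct,
        "share_correct": correct / len(analysts) if analysts else 0.0,
        "new_candidates": len(new_candidates),
    }


# Hypothetical four-project example with made-up record IDs.
analysts = {"p1": "PG-001", "p2": "PG-002", "p3": None, "p4": "PG-004"}
tool = {"p1": "PG-001", "p2": "PG-009", "p3": "PG-003", "p4": "PG-004"}
print(evaluate_matches(tool, analysts))
# {'projects': 4, 'correct': 2, 'share_correct': 0.5, 'new_candidates': 1}
```

Proposed matches with no analyst counterpart are only candidates: they still need to be checked by a person before they count as correct.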

OnTrackPH also helps monitor compliance with government regulations. For instance, Republic Act No. 9184 requires all public works contracts to be posted on PhilGEPS, the Philippine Government Electronic Procurement System. OnTrackPH can identify contracts that cannot be found in PhilGEPS, so that agencies can be asked to provide clarifications. It also highlights budget variations at various stages of a project’s life cycle: budgeted, procured, contracted, disbursed, completed and audited.
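Once records have been linked, both checks reduce to simple operations. The sketch below assumes that contracts have already been matched to PhilGEPS records by the similarity step, so the compliance check becomes a set difference on hypothetical contract IDs. It also assumes each project’s amounts are available per stage, so variations can be expressed as percent changes from the budgeted amount.

```python
from typing import Iterable

# Life-cycle stages in the order the text lists them.
STAGES = ["budgeted", "procured", "contracted", "disbursed", "completed", "audited"]


def missing_from_philgeps(contract_ids: Iterable[str], philgeps_ids: Iterable[str]) -> list[str]:
    # Contracts with no linked PhilGEPS posting: candidates for a clarification request.
    return sorted(set(contract_ids) - set(philgeps_ids))


def stage_variations(amounts: dict[str, float]) -> dict[str, float]:
    """Percent change of each later stage relative to the budgeted amount."""
    base = amounts.get("budgeted")
    if not base:
        return {}
    return {
        stage: round(100 * (amounts[stage] - base) / base, 1)
        for stage in STAGES[1:]
        if stage in amounts
    }


# Hypothetical contract IDs and peso amounts for one road project.
print(missing_from_philgeps(["C-1", "C-2", "C-3"], ["C-1", "C-3", "C-7"]))  # ['C-2']
print(stage_variations({
    "budgeted": 10_000_000,
    "procured": 9_800_000,
    "contracted": 9_500_000,
    "disbursed": 9_900_000,
}))  # {'procured': -2.0, 'contracted': -5.0, 'disbursed': -1.0}
```

A missing posting or a large swing between stages is not proof of wrongdoing. It is a flag that tells analysts and auditors where to ask questions first.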

However, these 2,238 automatically matched and manually verified projects represent only a fraction of the 7 million records in PhilGEPS and the hundreds of thousands of records in the DPWH ePLC database. While OnTrackPH removes the work of searching for the most likely matches among the millions of remaining records, planners, auditors and analysts will still need to verify that these software-identified matches are correct.
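Because every match still needs human confirmation, one practical pattern is to sort candidate matches into review queues by score so that verification effort goes where it matters most. The sketch below assumes scores between 0 and 1 from the matching step. The thresholds and queue names are illustrative, not part of OnTrackPH.

```python
from typing import NamedTuple


class CandidateMatch(NamedTuple):
    record_id: str     # row in the list being traced (hypothetical key)
    candidate_id: str  # best-scoring record found in another system
    score: float       # similarity score from the matching step, 0.0 to 1.0


def triage(matches: list[CandidateMatch], high: float = 0.9, low: float = 0.6) -> dict[str, list[CandidateMatch]]:
    """Split candidate matches into review queues by score (illustrative thresholds).

    Even high-scoring matches go to a spot-check queue: the software proposes
    matches, and people confirm them.
    """
    queues: dict[str, list[CandidateMatch]] = {
        "spot_check": [], "full_review": [], "no_plausible_match": [],
    }
    for m in matches:
        if m.score >= high:
            queues["spot_check"].append(m)
        elif m.score >= low:
            queues["full_review"].append(m)
        else:
            queues["no_plausible_match"].append(m)
    # Put the most promising candidates at the front of each queue.
    for queue in queues.values():
        queue.sort(key=lambda m: m.score, reverse=True)
    return queues


# Hypothetical scored matches.
matches = [
    CandidateMatch("r1", "PG-101", 0.95),
    CandidateMatch("r2", "PG-202", 0.72),
    CandidateMatch("r3", "PG-303", 0.41),
    CandidateMatch("r4", "PG-404", 0.88),
]
for name, queue in triage(matches).items():
    print(name, [m.record_id for m in queue])
```

Where to set the thresholds is a trade-off between reviewer time and the risk of accepting a wrong match. In practice they would be tuned against a manually verified sample like the pilot set.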

Scaling of OnTrackPH for Government and CSOs

In the coming months, data scientists at Thinking Machines will explore developing OnTrackPH into a tool that government, civil society organizations, journalists, researchers, and the general public can use to monitor not only road projects, but other areas of public spending, including education, health and reconstruction.

All actors can use OnTrackPH to track the budget and project implementation, and to monitor compliance across key government databases. For instance, the Department of Budget and Management, the Department of Finance and the Commission on Audit can use OnTrackPH to promote compliance through performance-based budgeting and performance audits, and to move towards assigning unique project IDs so that projects can be tracked across government systems. Implementing agencies such as DPWH that have established project-tracking systems can support a progressive dialogue on data disclosure, compliance and practical ways to harmonize systems. Where agencies maintain manual records for reporting, the Government of the Philippines (GOP) can request the data in structured form and develop automation strategies so that OnTrackPH can use it to track and verify compliance.

Fully integrating data systems across government agencies will remain the long-term goal, but citizens demand to be able to follow the money today, not in some distant future. This work shows that there are indeed ways of achieving transparency and accountability immediately, without waiting for government-wide, fully automated information systems.

In conclusion, just as roads empower communities by linking them to one another, innovations like OnTrackPH unlock the power of interrelated data to advance transparency and accountability in governance. The payoffs from this work run in several directions. First, the tool provides a more systematic way of assessing how comparable the information across different electronic systems (e.g. budget, procurement and implementation) actually is. Second, creative ways of linking data allow us to interrogate discrepancies (e.g. between procured and contracted costs for a road project), potentially homing in on agencies or projects with implementation risks. Finally, by focusing on flagship or demonstration programs, it can help the government progressively improve project tracking and compliance monitoring, bringing more transparency and accountability so that citizens can follow their money.

Thinking Machines Data Science is a data consultancy specializing in data science, data strategy, data storytelling, and data engineering. We believe in the unreasonable effectiveness of data. We care deeply about the communities we live in. We work with both public sector and private sector clients, helping them understand and fully leverage their data using machine learning. You can get a sense of what we do by reading our portfolio of projects.

Footnotes

Government Databases: National Budget (GAA 2013-2016) and Procurement (PhilGEPS, January 1, 2010 to April 13, 2015), released via Philippines Open Data (data.gov.ph). DPWH ePLC data released by DPWH on request, covering the period up to July 2015.

Program:
a. Regular FMRs appropriated in the GAA 2014: 1,756 sub-projects (PhP12,000,000,000), as reported by DBM. This number is based on geo-tagged FMR data submitted to DBM by the implementing agencies.

b. Regular FMRs appropriated in the GAA 2015: 1,388 sub-projects (PhP6,250,000,000), as listed in the GAA. OnTrackPH analyzed 583 of these FMRs (PhP2,309,049,00), for which geo-tagged data submitted to DBM by the implementing agencies is available.

c. Of the 1,076 TRIP sub-projects in the program database, OnTrackPH analyzed 865 sub-projects listed in the GAA for 2012-2015 (PhP46B) for matching in PhilGEPS and ePLC.