The Wisdom of Ganesh

Sunday, March 18, 2018

I realised today that I have become wise. I'm writing this out of a sense of amusement rather than vanity.

I had a discussion with a client about some changes she wanted made on an application. I realised even as she was speaking what changes I would need to make to the underlying database in order to display the results she wanted on the screen. After that conversation, I sat down to write out the SQL script that would make the required changes to the database.

But I didn't run it.

And at that moment, I realised the difference between knowledge and wisdom.

Knowledge refers to the capacity to understand how to implement something.

Wisdom is the meta-knowledge, born of years of experience, that it's rarely that straightforward.

It's the experience of receiving another email or phone call from the client within a few hours saying, "Hey Ganesh, I just discussed this with the team, and we realised that we need the system to do something more. So when you implement your change, can you please ensure that it also does X?"

It's the experience of looking at a script I wrote earlier and thinking, "Hey, why am I doing it like this? I can do it much more simply in this other way!"

And then I have to change the script I initially wrote, and undo all the changes I had made earlier.

That's really what's changed in me. I knew SQL even 25 years ago, and I could possibly have come up with the same script even back then.

That's knowledge, and I haven't significantly added to that knowledge in so many years.

The difference is that 25 years ago, I would have implemented that change immediately.

Today, I have a voice inside me that says, "Just wait. Save this script in a file and look at it again tomorrow."

Thursday, January 04, 2018

A sting operation by the newspaper The Tribune has reported that with a very nominal payment of 500 rupees, their team was able to get access to an Aadhar portal that was intended for use only by authorised officials responsible for helping citizens retrieve lost or forgotten data.

Aadhar is a relatively new unique identifier for all residents of India.

When the news broke, the organisation in charge of India's unique ID database, the Unique Identification Authority of India (UIDAI), played down suggestions that a data breach had taken place. Their main contention was that the biometric data (fingerprints and iris scans) of residents was stored in a secure and encrypted manner, and that it was not exposed through the mechanism used by the Tribune.

Let us analyse what happened, and why it is in fact a big deal.

There are four primary security threats that organisations have to guard against:

Disclosure (unauthorised access to information)

Deception (the system being presented with false data and made to accept it)

Disruption (interruption in normal operations and loss of service)

Usurpation (unauthorised persons gaining control of the system)

What the UIDAI is saying is that there has been no Disclosure of authentication tokens (biometric data) that could lead to a future Deception or Usurpation attack. But importantly, they have not denied that a breach has occurred which could have given an unauthorised user access to certain kinds of information. In fact, some of their officials have confirmed this:

Sanjay Jindal, Additional Director-General, UIDAI Regional Centre, Chandigarh, accepting that this was a lapse, told The Tribune: "Except the Director-General and I, no third person in Punjab should have a login access to our official portal. Anyone else having access is illegal, and is a major national security breach."

In other words, an attack could have successfully taken place resulting in the Disclosure of certain kinds of information.

What kind of information? According to the Tribune article, they were able to retrieve the following data on provision of an Aadhar number:

Name

Address

Post code (PIN)

Photo

Phone number

Email address

In fact, this is probably the minimal set of data that is accessible through the portal.

It took just Rs 500, paid through Paytm, and 10 minutes in which an “agent” of the group running the racket created a "gateway" for this correspondent and gave a login ID and password. Lo and behold, you could enter any Aadhaar number in the portal, and instantly get all particulars that an individual may have submitted to the UIDAI (Unique Identification Authority of India), including name, address, postal code (PIN), photo, phone number and email.

Reading between the lines, it doesn't appear that an API (a non-visual means by which a software program could retrieve data) was provided. It looks like login credentials were provided for an administrator's portal; in other words, access to a secure web page.

What is the worst that could happen if such access were granted to a non-authorised user? How much data could they steal? In other words, how severe is the Disclosure exposure?

To a layperson, this may not seem like a huge exposure. How many people's data can a person steal by sitting at a portal and entering Aadhar numbers on a screen? A few hundred, perhaps a couple of thousand records. Unfortunately, a potential data thief is much more efficient.

I'm not a security expert, but even I would potentially be able to steal the entire Aadhar database (i.e., not biometric data but the set of data listed above) in a matter of hours or days, without much personal effort on my part. Hackers may have much more efficient means at their disposal, but I would probably use a web application testing tool like Selenium, which I use for user testing of software as part of my day job. Selenium drives a real browser, and has a number of programmable features built around it. It can be "trained" by a user to follow a certain sequence of steps, and it can then repeat that sequence of steps ad nauseam, using different data values based on its controlling script. Best of all, Selenium looks just like a regular browser to a web server that isn't specifically checking for automation, so the server would have no idea that an automated tool is logging in and moving around the site, and not a human user.

Assuming I have been granted login credentials into the Aadhar administrator's portal (on payment of the going rate of 500 rupees), I would first put Selenium into "learning mode", where it would record my actions to be replayed later. I would use it just as I would use a regular browser, except that this browser is recording my every action, including the data values I am entering. Based on its observation of my actions, it would then know which URL to navigate to, what username and password to enter on the login form, which menu item to select to get to the search screen, etc. Then it would learn about the repeatable part of the task, which is the entry of an Aadhar number in a certain field, and a click on a Search or Retrieve button. When the system responded with the data of the resident (name, address, etc.), I would stop the learning mode. (In practice, I would train Selenium to deal with invalid Aadhar numbers also, since it should be able to recognise when a query was unsuccessful.)

I would then look into the script that Selenium generated to describe the sequence of actions that it had learnt. I would modify this script to make Selenium repeat its search in a loop.

Aadhar numbers are 12-digit numeric strings, which means they can theoretically be any number in the range "000000000000" to "999999999999", a trillion (10^12) numbers in all. I would modify the script to loop through all trillion numbers, perhaps in lots of a million at a time, and I would get it to extract the data from within specific HTML tags on the result screen. If these tags had helpful "id" attributes, it would make my job much easier; otherwise I would have to rely on the relative position of each field's tag within the returned page (known as a DOM search). This is how I would get Selenium to "read" the returned name, address, postcode, phone number, email address, etc.

The last thing I would do within the loop is to record the Aadhar number and all the retrieved fields into a local database on my machine, provided the query returned a value. I expect that only about a billion of the trillion numbers I use in my loop will return valid data, since there are just about a billion Indian residents.

And this is how I would collect the basic personal and contact details of every single resident of India. Unless UIDAI has a throttling or choking mechanism to prevent such a rapid-fire query of data, it's possible that I will be able to get away with this over a couple of days at most.
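A rough back-of-the-envelope check helps put numbers on this. The sketch below is arithmetic only: it computes the sustained query rate needed to sweep the full 12-digit number space within a given number of days, and the hit rate if about a billion numbers are in use. The function names are my own, and the inputs are the round figures used above, not measurements of the portal.

```python
# Back-of-the-envelope arithmetic only; no network access is involved.
# All names here are illustrative, and the inputs are the round figures
# from the text (10**12 possible numbers, roughly 10**9 in use).

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(total_queries: int, days: float) -> float:
    """Sustained queries per second needed to finish within `days`."""
    return total_queries / (days * SECONDS_PER_DAY)


def days_needed(total_queries: int, queries_per_second: float) -> float:
    """Days needed to issue `total_queries` at a fixed sustained rate."""
    return total_queries / queries_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    number_space = 10 ** 12   # every 12-digit string
    in_use = 10 ** 9          # assumed count of issued numbers

    print(f"Hit rate: 1 in {number_space // in_use}")
    print(f"Rate for a 2-day sweep: {required_rate(number_space, 2):,.0f} queries/s")
    print(f"Days at 1,000 queries/s: {days_needed(number_space, 1000):,.0f}")
```

A two-day sweep of the whole space works out to several million queries per second, so in practice the timeline depends on how many parallel sessions the portal tolerates and on how much of the number space can be skipped.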

More sophisticated hackers would hide behind multiple IP addresses, and use multiple sets of user credentials, over a longer time period, so as to fool any auditing system on the UIDAI side from realising that a bulk theft of records was underway.
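On the defensive side, a throttling mechanism of the kind mentioned earlier could take many forms. Below is a minimal sketch of one common technique, a token bucket kept per login credential. It is not a description of anything UIDAI runs; the class names, the capacity and the refill rate are all assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Token bucket: allows short bursts but caps the sustained rate."""

    def __init__(self, capacity: float, refill_per_second: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._now = now              # injectable clock, handy for testing
        self._tokens = capacity      # start with a full bucket
        self._last = now()

    def allow(self) -> bool:
        """Return True if one more lookup is permitted right now."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Top up the bucket for the time elapsed, never beyond capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class LookupThrottle:
    """Keeps one bucket per credential (a hypothetical login ID string)."""

    def __init__(self, capacity=20, refill_per_second=0.1, now=time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._now = now
        self._buckets = {}

    def allow(self, credential: str) -> bool:
        bucket = self._buckets.get(credential)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._now)
            self._buckets[credential] = bucket
        return bucket.allow()
```

With these assumed settings, one credential gets a burst of 20 lookups and then about one every ten seconds. A per-credential limit alone does not stop an attacker who spreads the load over many credentials, so it would need to be paired with monitoring of the aggregate lookup volume.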

Since the Aadhar numbers are permanent identifiers for residents, the data in this database is likely to remain useful for decades to whoever steals it. It can form the foundation of a database to which other data can be added.

And that brings us to the even bigger threat that this breach enables.

The Indian government has also committed another cardinal sin from a privacy angle. It has mandated the linking of Aadhar numbers to a number of other important data, for example, bank account numbers and mobile SIM cards. In fact, although the courts have ruled that such linking is to be voluntary, the government, banks and telcos are not endorsing that liberal message at all. Residents are being virtually threatened into providing their Aadhar numbers to their telecom providers and their banks. Aadhar is also being used across various arms and services of the government, such as the Tax department and the Public Distribution System.

Now, the databases of such organisations are generally a lot less secure than the biometric database maintained by UIDAI. It is unlikely that banks and telcos are being diligent enough to encrypt their data. Besides, with just a few large banks and telcos operating in the country, it is relatively easy for a malicious organisation, such as the secret service agency of a foreign power, to perform the necessary "social engineering" required to access this data. No conventional "hacking" is even necessary for such simple data theft. The same goes for government departments holding Aadhar-linked data.

Now, if one had a list of bank account numbers with Aadhar numbers against them, it would be a trivial matter to map these to the Aadhar database stolen from UIDAI. One would then have the personal and contact details of every resident of India, plus their bank account numbers!
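To see why a shared permanent identifier makes this mapping so easy, consider the toy sketch below. The records are purely synthetic placeholders, and the field names and the function are my own; the point is only that once two datasets carry the same ID, joining them takes a few lines.

```python
# Purely synthetic placeholder records; none of this is real data.
identity_records = {
    "000000000001": {"name": "Resident A", "phone": "555-0101"},
    "000000000002": {"name": "Resident B", "phone": "555-0102"},
}
bank_records = [
    {"uid": "000000000001", "account": "ACCT-1111"},
    {"uid": "000000000003", "account": "ACCT-3333"},  # no matching identity
]


def link_by_uid(identities: dict, other_rows: list, key: str = "uid") -> list:
    """Attach each row of another dataset to the identity with the same ID."""
    linked = []
    for row in other_rows:
        identity = identities.get(row[key])
        if identity is not None:
            # Merge the two records; the shared ID is the only glue needed.
            extra = {k: v for k, v in row.items() if k != key}
            linked.append({key: row[key], **identity, **extra})
    return linked


if __name__ == "__main__":
    for record in link_by_uid(identity_records, bank_records):
        print(record)
```

This is exactly why privacy-conscious designs avoid reusing one identifier across unrelated databases.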

Repeat this for mobile SIM cards, ration cards, PAN cards (tax), etc.

Now one has put together a very lucrative set of data that a number of hostile powers would be very interested in. The cost of acquiring this data, as I have shown, is well within their budgets.

The Tribune's report also claims:

Spotting an opportunity to make a quick buck, more than one lakh VLEs (Village-Level Enterprises) are now suspected to have gained this illegal access to UIDAI data to provide “Aadhaar services” to common people for a charge, including the printing of Aadhaar cards. However, in wrong hands, this access could provide an opportunity for gross misuse of the data.

Indeed, if the vulnerability has been around for the last few months, as suspected, it would not be unreasonable to assume that sensitive information on every Indian resident is now sitting in a Big Data lab in more than one foreign country, subject to sophisticated analysis and insight mining. In fact, if organisations like the NSA have not already acquired this data, my opinion of their competence will plummet.

It's nothing short of a national security disaster.

Still, from the muted press around it, it appears that no one has a realistic handle on how grave the breach is. Critics of the government have seized on The Tribune's initial exposé to claim that Aadhar has been a massive security failure, but without understanding the nuances of what we have seen in this analysis. Supporters of the government have seized on the UIDAI's reassurances to downplay the significance of the breach, again without a sophisticated analysis of the potential exposure. Both sides are being irresponsible.

My conclusion is that a serious breach of data security has been proven by The Tribune. This vulnerability has probably been around for a few months, enough time for a competent organisation or set of individuals to steal the master data (about a billion records) and create a foundation to which more useful data can be added as it is acquired. The Aadhar number is, after all, a permanent identifier that is likely to be associated with every significant product or service that residents may own or use.

A security assessment of the implications is imperative. At the very least, linking of Aadhar numbers to services must be halted.

And it would not be unreasonable to demand that someone somewhere should resign.

Update 24/03/2018: It appears that nothing has been done to fix the Aadhaar breach. If anything, even more breaches have been discovered, as described in this ZDNet article.

Friday, October 13, 2017

India's Electronic Voting Machines (EVMs) have been in the news a lot lately, and not always for the right reasons. There have been complaints by some candidates that when voters pressed the button in their favour, another party's symbol lit up. Faced with accusations that the EVMs may have been hacked (especially with the string of electoral successes of the ruling Bharatiya Janata Party), India's Election Commission has begun to conduct elections with a Voter-Verifiable Paper Audit Trail (VVPAT), which prints out the details of the party/candidate that a voter selected, so they can verify that their vote was registered correctly.

An EVM unit with a printer to provide immediate paper verification to voters

As a systems architect, I'm afraid I have to say that may not be good enough. Let me explain why, and then suggest a more foolproof alternative.

First of all, my familiarity with IT security tells me never to believe that a hardware device has in-built safeguards. We have all heard about how backdoors can be built into hardware, with whispers about the Russian mafia or Chinese government having control of fabrication plants that produce integrated circuits, so we know it's at least theoretically possible for criminal elements to inject malicious logic right into the hardware of an electronic device.

At the same time, I believe it's being Luddite to advocate a return to entirely paper-based ballots. It's true that many Western countries stick to paper ballots for the sheer auditability of a poll (which electronic voting makes opaque), but India has had bad experiences with paper-based polls in the past, with uniquely Indian subversions of the system such as "booth-capturing", as well as more conventional forms of fraud like "ballot-stuffing".

No, there's no going back to purely paper-based ballots, but there are serious vulnerabilities with the electronic voting system, even with VVPAT.

Let me illustrate.

The basic EVM logic is as follows. The voter presses a button corresponding to their preferred party or candidate, and the machine confirms their selection by lighting up the corresponding election symbol (because many voters cannot read). The choice is also recorded in the unit's memory. After the polls close, all the voting units are collected and connected to a central unit that tallies the votes in each. Once all units have uploaded their votes to the central unit, the results of that election can be announced, with the tallies of all parties and candidates available.

Now, based on voter complaints about the wrong symbol lighting up, here's what many people suspect happened. Somehow (never mind how) a hack was introduced into some of the units that recorded a selection of party A as a selection of party B.

This is actually a pretty amateurish hack, as I'll explain shortly. It's readily detectable by an alert voter. What the Election Commission is attempting to do with the Voter-Verifiable Paper Audit Trail (VVPAT) is to make the voter's selection more explicit, in the hope that more of them will be forced to verify that their choice was correctly recorded. It does not make the system more secure in the sense of being able to trap more subtle hacks.

Here's the schematic of the basic logic when things go well.

When faced with a simple hack like the suspected one above, the system will respond as below.

However, any hacker with a little more smarts will realise that their subversion will have to be less readily detectable. In other words, the hack would have to be placed in a slightly different place.

With this kind of hack, any mischief would be virtually undetectable. Both the lighted symbol and the paper printout would confirm to the voter that their choice was faithfully recorded, yet their vote would have been subtly hijacked in favour of another party.

The logic of the hack could be designed to be extremely subtle indeed. Instead of switching every single vote from party A to party B, it could be designed to apply a random function so that, on average, only 1 in N votes was switched across. In many marginal constituencies, even a small skimming of votes would be enough to tip the balance, so desired results could be achieved without any suspiciously large vote swings. There could even be a threshold below which the logic would not start to kick in, say a few thousand votes. That way, if the Election Commission conducted a few test runs to ensure that a unit was working correctly, it would not arouse suspicions.
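To make the subtle hack concrete, here is a toy simulation. It is a sketch of the logic described above, not of any real EVM firmware: the class names, the party labels, the 1-in-N switching rate and the threshold are all made up for illustration.

```python
import random
from collections import Counter


class HonestUnit:
    """Toy voting unit: what is displayed and printed is also what is recorded."""

    def __init__(self):
        self.tally = Counter()

    def vote(self, choice: str) -> str:
        self.tally[choice] += 1
        return choice  # what the lamp and the printout show the voter


class SubtlyHackedUnit(HonestUnit):
    """Shows the voter their real choice but sometimes records another party."""

    def __init__(self, source: str, target: str, one_in_n: int,
                 threshold: int, seed: int = 0):
        super().__init__()
        self.source, self.target = source, target
        self.one_in_n = one_in_n
        self.threshold = threshold      # behave honestly for the first votes
        self._rng = random.Random(seed)
        self._votes_seen = 0

    def vote(self, choice: str) -> str:
        self._votes_seen += 1
        recorded = choice
        past_threshold = self._votes_seen > self.threshold
        if (past_threshold and choice == self.source
                and self._rng.randrange(self.one_in_n) == 0):
            recorded = self.target      # the hijack happens only in memory
        self.tally[recorded] += 1
        return choice                   # display and printout stay truthful


def run_poll(unit, votes) -> int:
    """Cast every vote; return how many voters saw a wrong confirmation."""
    return sum(1 for v in votes if unit.vote(v) != v)


if __name__ == "__main__":
    votes = ["A"] * 5200 + ["B"] * 4800
    random.Random(1).shuffle(votes)
    unit = SubtlyHackedUnit("A", "B", one_in_n=10, threshold=1000)
    complaints = run_poll(unit, votes)
    print("Votes cast:    ", Counter(votes))
    print("Votes recorded:", unit.tally)
    print("Voters who saw a wrong confirmation:", complaints)
```

No voter in this simulation ever sees a wrong confirmation, and a test run shorter than the threshold sees a perfectly honest machine, yet the recorded tally drifts away from the votes actually cast.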

Now all this seems depressing. Is there any way to combat this?

Yes, there is, but it's not purely in hardware and software. If it were, this post would have been titled "Designing A Tamper-Proof Electronic Voting Machine". The system that we design needs to incorporate electronic and manual elements.

What we need are not one but two printouts for every vote. One copy is for the voter's own records. The other is for the Election Commission. The voter must verify that both match their selection, then place the EC copy into a ballot box before they leave the booth, just like in a paper-based poll. However, this paper ballot will only be used for verification, not for the actual vote tally on counting day, otherwise we may as well go back to a purely manual vote count.

A number of statistical techniques may be used to sample and test the performance of voting machine units in various constituencies.

Under the most pessimistic scenario, the ballot boxes of every single booth will be tallied offline, and the counting may continue for weeks after the official results. Elections will only be rescinded if the manual tally grossly contradicts the electronic one (there will always be minor discrepancies due to voter or official error).
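What counts as a gross contradiction would have to be defined in advance. As a minimal sketch, assuming tallies are held as party-to-count mappings, one could flag a booth when any party's electronic and paper counts differ by more than a small fraction of the votes cast. The function name is mine, and the 0.5% tolerance is an arbitrary placeholder, not a recommendation.

```python
def gross_discrepancy(electronic: dict, paper: dict, tolerance: float = 0.005) -> bool:
    """True if any party's counts differ by more than `tolerance` of the paper total.

    `tolerance` is an assumed allowance for voter or official error.
    """
    total = sum(paper.values())
    if total == 0:
        # No paper ballots at all: any electronic vote is a discrepancy.
        return any(electronic.values())
    parties = set(electronic) | set(paper)
    return any(
        abs(electronic.get(p, 0) - paper.get(p, 0)) > tolerance * total
        for p in parties
    )


if __name__ == "__main__":
    paper = {"A": 5200, "B": 4800}
    print(gross_discrepancy({"A": 5197, "B": 4803}, paper))  # small slip
    print(gross_discrepancy({"A": 4700, "B": 5300}, paper))  # large swing
```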

Under less pessimistic scenarios, a random sample of booths may be chosen for such manual verification. If gross discrepancies are detected in any booth, then all of the ballot boxes in that constituency will have to be manually tallied. If more than a certain number of constituencies show suspicious results, then the tally may be expanded to cover an entire state, and so on.
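How large should the random sample be? A minimal sketch, assuming booths are sampled uniformly at random without replacement and that any tampered booth in the sample would be caught: the chance of picking at least one tampered booth then follows from simple counting. The function names and the example figures are mine.

```python
from math import comb


def detection_probability(total_booths: int, tampered_booths: int,
                          sampled_booths: int) -> float:
    """Chance that a simple random sample contains at least one tampered booth."""
    if not 0 <= tampered_booths <= total_booths:
        raise ValueError("tampered_booths must be between 0 and total_booths")
    if not 0 <= sampled_booths <= total_booths:
        raise ValueError("sampled_booths must be between 0 and total_booths")
    clean = total_booths - tampered_booths
    # P(miss every tampered booth) = C(clean, k) / C(total, k).
    # comb() returns 0 when k > clean, which gives a probability of 1.
    miss_all = comb(clean, sampled_booths) / comb(total_booths, sampled_booths)
    return 1.0 - miss_all


def smallest_sample(total_booths: int, tampered_booths: int,
                    target: float = 0.95) -> int:
    """Smallest sample size whose detection probability reaches `target`."""
    if tampered_booths < 1:
        raise ValueError("need at least one tampered booth to detect")
    for k in range(total_booths + 1):
        if detection_probability(total_booths, tampered_booths, k) >= target:
            return k
    return total_booths


if __name__ == "__main__":
    # Hypothetical constituency: 1,000 booths, 20 of them tampered with.
    print(smallest_sample(1000, 20, target=0.95))
```

The fewer booths an attacker tampers with, the larger the sample needed to catch them, but tampering with very few booths also limits how many votes can be skimmed.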

There can be further refinements, such as ensuring that the random sample of booths to be verified is drawn publicly, after the voting is completed, so as to afford no opportunity for malicious elements to know in advance which booths are "safe" from being audited.
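One way to make such a draw publicly checkable is to derive it from a seed that is generated in the open after polling closes (dice rolled in front of observers, say), so that anyone can rerun the selection and confirm it. A minimal sketch, in which the seed string, the booth identifiers and the function name are all hypothetical:

```python
import hashlib
import random


def draw_audit_sample(booth_ids, sample_size: int, public_seed: str) -> list:
    """Reproducibly pick booths to audit from a publicly announced seed."""
    # Sort first so the result does not depend on the order booths were listed in.
    ordered = sorted(booth_ids)
    # Hash the seed so that any announced string maps to a fixed integer.
    digest = hashlib.sha256(public_seed.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return sorted(rng.sample(ordered, sample_size))


if __name__ == "__main__":
    booths = [f"booth-{i:03d}" for i in range(100)]
    print(draw_audit_sample(booths, 5, "dice: 4 2 6 1 3 5"))
```

Python's built-in generator is used here only to show the idea of a reproducible draw. It is not designed for adversarial settings, and its sampling output is not guaranteed to be identical across Python versions, so a real procedure would need a carefully specified and vetted method.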

In general, the design of the overall process is meant to detect subversions after the fact, so the technically accurate term is tamper-evident rather than tamper-proof. However, advertising the fact that such audits will be taking place may deter malicious elements from attempting these hacks in the first place. Hence, in a larger sense, the system consisting of the combined electronic and manual process, plus a widespread foreknowledge of an inevitable audit, may result in a tamper-proof system after all.

Democracy works because citizens have faith that their will is reflected in the results of elections. If citizens lose faith in the electoral process, it could cause a breakdown in society, with violent revolution in the worst case. That's why it's important to act quickly to restore faith in the process, even if this makes the process costlier.

As the quote commonly attributed to Thomas Jefferson goes, "Eternal vigilance is the price of freedom."

I'll admit it feels somewhat anticlimactic for me to see these books finally published, because I finished writing them in December 2013 after about two years of intermittent work. They have been available as white papers on Slideshare since Christmas 2013. The last seven months have gone by in reviews, revisions and the various other necessary steps in the publication process. And they have made their appearance on InfoQ's site with scarcely a splash. "Is that all?", I feel like asking myself. But I guess I shouldn't feel blasé. These two books are a major personal achievement for me and represent a significant milestone for the industry, and I say this entirely without vanity.

You see, the IT industry has been misled for over 15 years by a distorted and heavyweight philosophy that has gone by the name "Service-Oriented Architecture" (SOA). It has cost organisations billions of dollars of unnecessary spend, and has fallen far short of the benefits that it promised. I too fell victim to the hype around SOA in its early days, and like many other converted faithful, tried hard to practise my new religion. Finally, like many others who turned apostate, I grew disillusioned with the lies, and what disillusioned me the most was the heavy-handedness of the "Church of SOA", a ponderous cathedral of orthodox practice that promised salvation, yet delivered nothing but daily guilt.

But unlike others who turned atheist and denounced SOA itself, I realised that I had to found a new church. Because I realised that there was a divine truth to SOA after all. It was just not to be found in the anointed bible of the SOA church, for that was a cynical document designed to suit the greed of the cardinals of the church rather than the needs of the millions of churchgoers. The actual truth was much, much simpler. It was not easy, because "simple" and "easy" are not the same thing. (If you find this hard to understand, think about the simple principle "Don't tell lies", and tell me whether it is easy to follow.)

I stumbled upon this simple truth through a series of learnings. I thought I had hit upon it when I wrote my white paper "Practical SOA for the Solution Architect" under the aegis of WSO2. But later, I realised there was more. The WSO2 white paper identified three core components at the technology layer. It also recognised that there was something above the technology layer that had to be considered during design. What was that something? Apart from a recognition of the importance of data, the paper did not manage to pierce the veil.

The remaining pieces of the puzzle fell into place as I began to consider the notion of dependencies as a common principle across the technology and data layers. The more I thought about dependencies, the more things started to make sense at layers even above data, and the more logical design at all these layers followed from requirements and constraints.

In parallel, there was another train of thought to which I once again owe a debt of gratitude to WSO2. While I was employed with the company, I was asked to write another white paper on SOA governance. A lot of the material I got from company sources hewed to the established industry line on SOA governance, but as with SOA design, the accepted industry notion of SOA governance made me deeply uncomfortable. Fortunately, I'm not the kind to suppress my misgivings to please my paymasters, and so at some point, I had to tell them that my own views on SOA governance were very different. To WSO2's credit, they encouraged me to write up my thoughts without the pressure to conform to any expected models. And although the end result was something so alien to establishment thought that they could not endorse it as a company, they made no criticism.

So at the end of 2011, I found myself with two related but half-baked notions of SOA design and SOA governance, and as 2012 wore on, my thoughts began to crystallise. The notion of dependencies, I saw, played a central role in every formulation. The concept of dependencies also suggested how analysis, design, governance and management had to be approached. It had a clear, compelling logic.

I followed my instincts and resisted all temptation to cut corners. Gradually, the model of "Dependency-Oriented Thinking" began to take shape. I conducted a workshop where I presented the model to some practising architects, and received heartening validation and encouragement. The gradual evolution of the model mainly came about through my own ruminations upon past experiences, but I also received significant help from a few friends. Sushil Gajwani and Ravish Juneja are two personal friends who gave me examples from their own (non-IT) experience. These examples confirmed to me that dependencies underpin every interaction in the world. Another friend and colleague, Awadhesh Kumar, provided an input that elegantly closed a gaping hole in my model of the application layer. He pointed out that grouping operations according to shared interface data models and according to shared internal data models would lead to services and to products, respectively. Kalyan Kumar, another friend who attended one of my workshops, suggested that I split my governance white paper into two to address the needs of two different audiences - designers and managers.

And so, sometime in 2013, the model crystallised. All I then had to do was write it down. On December 24th, I completed the two white papers and uploaded them to Slideshare. There has been a steady trickle of downloads since then, but it was only after their publication by InfoQ that the documents gained more visibility.

These are not timid, establishment-aligned documents. They are audacious and iconoclastic. I believe the IT industry has been badly misled by a wrongheaded notion of SOA, and that I have discovered (or re-discovered, if you will) the core principle that makes SOA practice dazzlingly simple and blindingly obvious. I have not just criticised an existing model. I have been constructive in proposing an alternative - a model that I have developed rigorously from first principles, validated against my decades of experience, and delineated in painstaking detail. This is not an edifice that can be lightly dismissed. Again, these are not statements of vanity, just honest conviction.

I believe that if an organisation adopts the method of "Dependency-Oriented Thinking" that I have laid out in these two books (after testing the concepts and being satisfied that they are sound), then it will obtain the many benefits of SOA that have been promised for years - business agility, sustainably lower operating costs, and reduced operational risk.

It takes an arc of enormous radius to turn around a gigantic oil tanker cruising at top speed, and I have no illusions about the time it will take to bring the industry around to my way of thinking. It may be 5-10 years before the industry adopts Dependency-Oriented Thinking as a matter of course, but I'm confident it will happen. This is an idea whose time has come.

Thursday, June 19, 2014

I've always held that Free and Open Source Software (FOSS) is one of the best aspects of the modern IT landscape. But like all software, FOSS needs constant effort to keep up to date, and this effort costs money. A variety of funding models have sprung up, where for-profit companies try to sell a variety of peripheral services while keeping software free.

However, one of the most obvious ways to fund the development of FOSS is government funding. Government funding is public money. If it is spent on proprietary software rather than on software that is freely available to the public, it's an unjustifiable waste of taxpayers' money.

It was therefore good to read that the Dutch government recently paid to develop better support for the WS-ReliableMessaging standard in the popular Open Source Apache CXF services framework. I was also gratified to read that the developer who was commissioned to make these improvements was Dennis Sosnoski, with whom I have been acquainted for many years, thanks mainly to his work on the JiBX framework for mapping Java to XML and vice-versa. It's good to know that talented developers can earn a decent dime while doing what they love and contributing to the world, all at the same time.

Monday, June 09, 2014

I was trying to get PostgreSQL's "pgagent" process (written to run as a daemon) to run on startup like other Linux services, and came upon this nice visual (i.e., curses) tool to manage services.

It's called "sysv-rc-conf" (install with "sudo apt-get install sysv-rc-conf"), and when run with "sudo sysv-rc-conf", brings up a screen like this:

It's not really "graphics", but to a command-line user, this is as graphical as it gets

All services listed in /etc/init.d appear in this table. The columns are different Unix runlevels. Most regular services need to be running in runlevels 2, 3, 4 and 5, and stopped in the others. Simply move the cursor to the desired cell and press the space bar to toggle it on or off. The 'K' (stop) and 'S' (start) symbolic links are automatically written into the respective rc.d directories. Press 'q' to quit the tool and satisfy yourself that the symbolic links are all correctly set up.
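If you would rather check the links from a script than eyeball the directories, something like the following would do. It is a minimal Python sketch, assuming the conventional /etc/rcN.d layout and the usual 'S' or 'K' plus two-digit prefix on link names; "myservice" is a placeholder and the function name is my own.

```python
import os
import re


def runlevel_links(service: str, etc_dir: str = "/etc") -> dict:
    """Map each runlevel to 'S' (start) or 'K' (stop) for the given service."""
    # Link names look like S20myservice or K20myservice.
    link_pattern = re.compile(r"^([SK])\d\d" + re.escape(service) + r"$")
    result = {}
    for entry in sorted(os.listdir(etc_dir)):
        level = re.fullmatch(r"rc([0-6S])\.d", entry)
        rc_dir = os.path.join(etc_dir, entry)
        if not level or not os.path.isdir(rc_dir):
            continue
        for link in os.listdir(rc_dir):
            match = link_pattern.match(link)
            if match:
                result[level.group(1)] = match.group(1)
    return result


if __name__ == "__main__":
    # "myservice" is a placeholder; substitute the name of your init script.
    for runlevel, action in sorted(runlevel_links("myservice").items()):
        print(f"runlevel {runlevel}: {'start' if action == 'S' else 'stop'}")
```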

You can manually start and stop as usual:

/etc/init.d$ sudo ./myservice start

/etc/init.d$ sudo ./myservice stop

Plus, your service will be automatically started and stopped when the system enters the appropriate runlevels.

Saturday, April 05, 2014

Although a big fan of Ubuntu Linux as a desktop OS, I've never been interested in their cloud storage platform Ubuntu One, and found it a bit of a nuisance when asked to sign up for it every time I installed the OS.

The linked article talks about mobile, and how new mobiles such as the Ubuntu-powered ones need cloud storage to succeed. If so, isn't it really bad timing for Canonical to walk away from a fully operational cloud platform just when its mobile devices are entering the market?

Ubuntu-powered smartphones (Do you know what the time on the middle phone refers to?)

I think it's about economics.

Ubuntu's statement says:

If we offer a service, we want it to compete on a global scale, and for Ubuntu One to continue to do that would require more investment than we are willing to make. We choose instead to invest in making the absolute best, open platform and to highlight the best of our partners’ services and content.

Hmm. I read this as Canonical trying to build a partner ecosystem that will substitute for having a big cloud-and-mobile story like Google does, without the investment that such a proprietary ecosystem will require. Let's see if they succeed.

The other side-story in the linked article is about telcos and their role. Having worked at a telco over the last two years, I can confirm that the major fear in the telco industry is being reduced to commodity carriers by "over the top" services. The telcos are fighting to offer content, and will want willing mobile wannabe partners like Mozilla and Canonical to offer smartphone platforms that will work with networking infrastructure and make the telcos more attractive (through content that both players source from content providers). It will be interesting to see how this four-way, federated partnership (between multiple telcos, independent smartphone platform vendors like Mozilla and Canonical, smartphone device OEMs and content providers) will play out. Many of these companies will think of themselves as the centre of the Universe and the others as partners.

"Nothing runs like a fox" - Well, let's see if the Firefox Smartphone has legs

In the meantime, some good news for startup cloud providers ("startup" only with respect to the cloud, since they will still need deep pockets to set up the infrastructure!): Canonical is open-sourcing its Ubuntu One storage code “to give others an opportunity to build on this code to create an open source file syncing platform.” This should be interesting.

"To Analyse, Understand and Explain"

This site presents the technology-related opinions (wise or otherwise) of Ganesh Prasad - software architect, Java devotee and Open Source aficionado. (For my views on other topics, see this blog instead.)

Disclaimer: Though I bear the name of the Hindu god of wisdom, Lord Ganesh, such cosmic wisdom is not always guaranteed to be transmitted through my writings. Reader beware!