machinelearning

Researchers areabletocreate fake fingerprints that result in a 20% false-positive rate.

The problem is that these sensors obtain only partial images of users’ fingerprints — at the points where they make contact with the scanner. The paper noted that since partial prints are not as distinctive as complete prints, the chances of one partial print getting matched with another is high.

The artificially generated prints, dubbed DeepMasterPrints by the researchers, capitalize on the aforementioned vulnerability to accurately imitate one in five fingerprints in a database. The database was originally supposed to have only an error rate of one in a thousand.

Another vulnerability exploited by the researchers was the high prevalence of some natural fingerprint features such as loops and whorls, compared to others. With this understanding, the team generated some prints that contain several of these common features. They found that these artificial prints were more likely to match with other prints than would be normally possible.

If this result is robust — and I assume it will be improved upon over the coming years — it will make the current generation of fingerprint readers obsolete as secure biometrics. It also opens a new chapter in the arms race between biometric authentication systems and fake biometrics that can fool them.

More interestingly, I wonder if similar techniques can be brought to bear against other biometrics are well.

James Mickens gave an excellent keynote at the USENIX Security Conference last week, talking about the social aspects of security — racism, sexism, etc. — and the problems with machine learning and the Internet.

Fascinating research de-anonymizing code — from either source code or compiled code:

Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt’s former PhD student and now an assistant professor at George Washington University, have found that code, like other forms of stylistic expression, are not anonymous. At the DefCon hacking conference Friday, the pair will present a number of studies they’ve conducted using machine learning techniques to de-anonymize the authors of code samples. Their work could be useful in a plagiarism dispute, for instance, but it also has privacy implications, especially for the thousands of developers who contribute open source code to the world.

A trained eye (or even a not-so-trained one) can discern when something phishy is going on with a domain or subdomain name. There are search tools, such as Censys.io, that allow humans to specifically search through the massive pile of certificate log entries for sites that spoof certain brands or functions common to identity-processing sites. But it’s not something humans can do in real time very well — which is where machine learning steps in.

StreamingPhish and the other tools apply a set of rules against the names within certificate log entries. In StreamingPhish’s case, these rules are the result of guided learning — a corpus of known good and bad domain names is processed and turned into a “classifier,” which (based on my anecdotal experience) can then fairly reliably identify potentially evil websites.