Caution: One resource in the top ten (#9) for “fuzzing software” is: Fuzzing: Brute Force Vulnerability Discovery, by Michael Sutton, Adam Greene, and Pedram Amini. It is a great historical reference, but it was published in 2007, some ten years ago. Look for more recent literature and software.

Fuzzing is obviously an important technique for finding subjects (read: vulnerabilities) in software, whether your intent is to fix those vulnerabilities or to use them for your own purposes.

While reading Vranken’s post, it occurred to me that “fuzzing” is also useful in discovering subjects in unmapped data sets.

Not all nine-digit numbers are Social Security Numbers, but if you find a column of such numbers alongside what look like street addresses and zip codes, it would not be a bad guess. Of course, if it is a 16-digit number, you may be looking at credit card numbers, and a criminal opportunity may be knocking at your door.

While TMDM topic maps emphasized the use of URIs for subject identifiers, we all know that subject identifications outside of topic maps are more complex than string matching and far messier.

How would you create “fuzzy” searches to detect subjects across different data sets? Are there general principles for classes of subjects?
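As one minimal sketch of such a “fuzzy” search over an unmapped data set, the SSN and credit-card guesses above can be expressed as pattern heuristics plus a checksum. The function names and the regexes here are my own illustration, not an established API; real subject detection would need many more signals.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum, which genuine credit card numbers satisfy."""
    digits = [int(d) for d in number][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def guess_column_type(values):
    """Rough guess at what a column of digit strings might represent."""
    if all(re.fullmatch(r"\d{3}-?\d{2}-?\d{4}", v) for v in values):
        return "possible SSN"
    if all(re.fullmatch(r"\d{16}", v) and luhn_valid(v) for v in values):
        return "possible credit card number"
    return "unknown"

print(guess_column_type(["078-05-1120", "219-09-9999"]))  # possible SSN
print(guess_column_type(["4532015112830366"]))            # possible credit card number
```

The point is not the specific patterns but the approach: classes of subjects (identifiers, addresses, dates) admit cheap structural tests that can be run speculatively across columns, much as a fuzzer runs speculative inputs against a program.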

While your results might be presented as a curated topic map, the grist for that map would originate in the messy details of diverse information.

This sounds like an empirical question to me, especially since most search engines offer API access.

Five months ago, we announced OSS-Fuzz, Google’s effort to help make open source software more secure and stable. Since then, our robot army has been working hard at fuzzing, processing 10 trillion test inputs a day. Thanks to the efforts of the open source community who have integrated a total of 47 projects, we’ve found over 1,000 bugs (264 of which are potential security vulnerabilities).

[graphic omitted]

Notable results

OSS-Fuzz has found numerous security vulnerabilities in several critical open source projects: 10 in FreeType2, 17 in FFmpeg, 33 in LibreOffice, 8 in SQLite 3, 10 in GnuTLS, 25 in PCRE2, 9 in gRPC, and 7 in Wireshark, etc. We’ve also had at least one bug collision with another independent security researcher (CVE-2017-2801). (Some of the bugs are still view restricted so links may show smaller numbers.)
…

A useful way to improve both the quality and the security of software. Better still, rewards are offered for projects that adopt the ideal integration guidelines.

Contributing to open source projects, here by bringing fuzzing into the development process, is a far cry from the labor-market-damaging “Hack the Air Force” program. The US Air Force can and does spend $millions if not $billions on insecure software and services.

Realizing it has endangered itself, but unwilling either to contract for better services or to hold its present contractors responsible for shabby work, the Air Force is attempting to damage the labor market for defensive cybersecurity services by soliciting free work. Or nearly free, given the ratio of the prizes to Air Force spending on software.

$Millions in contributions to open source projects, not a single dime for poorly managed government IT contract results.

The Internet has become an unlimited resource of knowledge, and is thus widely used in many applications. Web mining plays an important role in discovering such knowledge. This mining can be roughly divided into three categories, including Web usage mining, Web content mining, and Web structure mining. Data and knowledge on the Web may, however, consist of imprecise, incomplete, and uncertain data. Because fuzzy-set theory is often used to handle such data, several fuzzy Web-mining techniques have been proposed to reveal fuzzy and linguistic knowledge. This paper reviews these techniques according to the three Web-mining categories above—fuzzy Web usage mining, fuzzy Web content mining, and fuzzy Web structure mining. Some representative approaches in each category are introduced and compared.

Written to cover fuzzy web mining but generally useful for data mining and organization as well.
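For a flavor of the fuzzy-set machinery such techniques build on (this is a generic illustration of mine, not an example from the survey): membership in a fuzzy set is a matter of degree rather than a yes/no test, commonly modeled with a simple triangular membership function.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set:
    0 outside [a, c], rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# "About 30 years old": full membership at 30, fading to 0 at 20 and 40.
print(triangular(30, 20, 30, 40))  # 1.0
print(triangular(25, 20, 30, 40))  # 0.5
print(triangular(45, 20, 30, 40))  # 0.0
```

It is exactly this graded notion of membership that lets fuzzy web-mining techniques handle imprecise, incomplete, and uncertain data where crisp matching fails.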

Fuzzy techniques are probably closer to our mental processes than the precision of description logic.

Being mindful that mathematical and logical proofs are justifications for conclusions we already hold.

The CERT Basic Fuzzing Framework (BFF) is a software testing tool that finds defects in applications that run on the Linux and Mac OS X platforms. BFF performs mutational fuzzing on software that consumes file input. (Mutational fuzzing is the act of taking well-formed input data and corrupting it in various ways, looking for cases that cause crashes.) The BFF automatically collects test cases that cause software to crash in unique ways, as well as debugging information associated with the crashes. The goal of BFF is to minimize the effort required for software vendors and security researchers to efficiently discover and analyze security vulnerabilities found via fuzzing.

Traditionally fuzzing has been very effective at finding security vulnerabilities, but because of its inherently stochastic nature results can be highly dependent on the initial configuration of the fuzzing system. BFF applies machine learning and evolutionary computing techniques to minimize the amount of manual configuration required to initiate and complete an effective fuzzing campaign. BFF adjusts its configuration parameters based on what it finds (or does not find) over the course of a fuzzing campaign. By doing so it can dramatically increase both the efficacy and efficiency of the campaign. As a result, expert knowledge is not required to configure an effective fuzz campaign, and novices and experts alike can start finding and analyzing vulnerabilities very quickly.
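BFF itself targets native binaries, but the mutational fuzzing it describes (take well-formed input, corrupt it, watch for failures) can be sketched in a few lines. This toy version is my own, not BFF: it fuzzes Python’s json parser, with uncaught exceptions standing in for the crashes a native-code fuzzer would look for.

```python
import json
import random

def mutate(data: bytes, n_flips: int = 8) -> bytes:
    """Corrupt a well-formed input by overwriting random bytes."""
    buf = bytearray(data)
    for _ in range(n_flips):
        i = random.randrange(len(buf))
        buf[i] = random.randrange(256)
    return bytes(buf)

def fuzz_target(parse, seed: bytes, trials: int = 1000):
    """Feed mutated inputs to `parse`; collect the ones it chokes on."""
    crashers = []
    for _ in range(trials):
        case = mutate(seed)
        try:
            parse(case)
        except Exception as e:
            crashers.append((case, repr(e)))
    return crashers

seed = b'{"name": "fuzz", "count": 42}'   # a well-formed JSON input
crashers = fuzz_target(json.loads, seed)
print(f"{len(crashers)} of 1000 mutated inputs raised exceptions")
```

What BFF adds on top of this basic loop, per the description above, is the machine learning and evolutionary computing that tunes the mutation parameters during the campaign, rather than leaving them fixed as they are here.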

Topic maps would be useful for mapping vulnerabilities across networks by application and OS, among other uses.