A MILITARY HELICOPTER was on the ground when Russell Guy arrived at the helipad near Tallinn, Estonia, with a briefcase filled with $250,000 in cash. The place made him uncomfortable. It didn’t look like a military base, not exactly, but there were men who looked like soldiers standing around. With guns.

The year was 1989. The Soviet Union was falling apart, and some of its military officers were busy selling off the pieces. By the time Guy arrived at the helipad, most of the goods had already been off-loaded from the chopper and spirited away. The crates he’d come for were all that was left. As he pried the lid off one to inspect the goods, he got a powerful whiff of pine. It was a box inside a box, and the space in between was packed with juniper needles. Guy figured the guys who packed it were used to handling cargo that had to get past drug-sniffing dogs, but it wasn’t drugs he was there for.

Inside the crates were maps, thousands of them. In the top right corner of each one, printed in red, was the Russian word секрет. Secret.

The maps were part of one of the most ambitious cartographic enterprises ever undertaken. During the Cold War, the Soviet military mapped the entire world, parts of it down to the level of individual buildings. The Soviet maps of US and European cities have details that aren’t on domestic maps made around the same time, things like the precise width of roads, the load-bearing capacity of bridges, and the types of factories. They’re the kinds of things that would come in handy if you’re planning a tank invasion. Or an occupation. Things that would be virtually impossible to find out without eyes on the ground.

Given the technology of the time, the Soviet maps are incredibly accurate. Even today, the US State Department uses them (among other sources) to place international boundary lines on official government maps.
…

If you like stories of the intrigue of the Cold War and of maps, Greg’s post was made for you.

The maps have rarely been studied, but one person is trying to change that:

But one unlikely scholar, a retired British software developer named John Davies, has been working to change that. For the past 10 years he’s been investigating the Soviet maps, especially the ones of British and American cities. He’s had some help, from a military map librarian, a retired surgeon, and a young geographer, all of whom discovered the maps independently. They’ve been trying to piece together how they were made and how, exactly, they were intended to be used. The maps are still a taboo topic in Russia today, so it’s impossible to know for sure, but what they’re finding suggests that the Soviet military maps were far more than an invasion plan. Rather, they were a framework for organizing much of what the Soviets knew about the world, almost like a mashup of Google Maps and Wikipedia, built from paper.

I don’t know any more about Soviet maps than you can gain from reading this article, but the line:

they were a framework for organizing much of what the Soviets knew about the world, almost like a mashup of Google Maps and Wikipedia, built from paper.

has some of the qualities that I associate with topic maps. Granted, it chooses a geographic frame of reference, but every map has some frame of reference, stated or unstated.

Representing the knowledge of an old-style Soviet map as a topic map would make a great topic maps paper.

Lambert goes on a search for tools that come close to this presentation and also meet the requirements set forth above.

The idea of combining graphs with narrative snippets as links is a deeply intriguing one. Rightly or wrongly I think of it as illustrated narrative but without the usual separation between those two elements.

I commend this review of the history of backdoors to anyone interested in the technical issues and why “exceptional access” isn’t a workable idea.

I do not recommend it if you are debating pro-government access advocates because they are wrapped in an armor of invincible ignorance. You are wasting your time offering them contrary facts and opinions of genuine experts.

Your time is better spent documenting all the lies told by government officials in a policy area. When debating “exceptional access” point out the government and its minions are completely unworthy of belief.

Humor is a far better weapon than facts because the government advocates think of themselves as serious people, entrusted with defending civilization itself. Rather than the 24×7 news cycles solemnly intoning their latest positions, there should be a Punch and Judy show that reports their positions.

And mocks their positions as well. Few things are sharper than well-crafted humor. Especially when set to music, so every time a dunce proposal like “Clipper” comes up, you think of:

A massive 91% of successful data breaches at companies started with a social engineering and spear-phishing attack. A phishing attack usually involves an e-mail that manipulates a victim to click on a malicious link that could then expose the victim’s computer to a malicious payload.

…

Phish your Employees!

Yes, you heard me right… by this I mean that you should run a mock phishing campaign in your organization and find out which employees would easily fall victim to the phishing emails. Then step everyone through Internet Security Awareness Training.

Great idea but we can do better than that!

Phish your job applicants!

You can rank your current applicants by their vulnerability to phishing and in the long term, develop a phishing scale for all applicants.

Those that fail, you don’t call for an interview.

Any more than you would install a doorway into your corporate offices without a door.

Has anyone proposed a phishing rating service? Like a credit rating but it rates how likely you are to be a victim of phishing?

PS: I know your CEO and his buddies will fail the same test but the trick is to catch them before they become CEOs, etc.

With the free account, you can only see the first fifty (50) results for a search.

I’m not sure I agree that the pricing is “simple” but it is attractive. Note the difference between query credits and scan credits. The first applies to searches of the Shodan database and the second applies to networks you have targeted.

The 20K+ routers w/ default info could be a real hoot!

You know, this might be a cost effective alternative for the lower level NSA folks.

Due to recent data breaches, users want to replace the cyber vulnerability known as Windows XP. If Director Comey could make a deal with North Korea, he could secure distribution and labeling rights to North Korea’s Red Star Linux operating system.

The operating system, developed from 2002 as a replacement for Windows XP, was relaunched with a Mac-like interface in 2013’s version three. The newest version emerged in January 2015.

Grunow says files including Microsoft Word documents and JPEG images connected to but not necessarily executed in Red Star will have a tag introduced into their code that includes a number based on hardware serial numbers.
…

It’s not a perfect solution for the problems faced by Director Comey because it doesn’t track files created using OpenOffice.

Few things in life are as private as our romantic entanglements. So with hackers announcing they’ve made off with as many as 37 million records from the parent company of extramarital dating site AshleyMadison.com, you can be sure there are plenty of people sweating over the potential fallout.

The hackers, “The Impact Team,” have demanded the extramarital site shut down or all the data will be released. That’s an odd use of pirated data.

A better strategy would be to complete the profiles to discover spouses and sell the resulting list to divorce lawyers. The profiles of the offending partners would be extra.

There are any number of countries that want to help the United States police its loose morals. They could legalize the sale of the data (not its acquisition). You would not even have to launder the money.

The other upside would be a giant lesson to many users in protecting their own privacy.

Every organisational feature – including silos – is an outcome of some kind of optimisation. By talking about trying to destroy silos, we’re denying the sometimes very rational reasons behind their creation.

While working on a presentation for Balisage 2015, it occurred to me to ask: Why Are Data Silos Opaque?

A popular search engine reports that sans duplicates, there were three hundred and thirty-three (333) “hits” on “data silo” that were updated in the last year. Far more reports than I want to list or that you want to read.

The common theme, of course, is the difficulty of accessing data silos.

OK, I’ll bite, why are data silos opaque?

If our respective data silos are based on relational database technology (still a likely bet, even in the NoSQL era), surely our programmers know about JDBC drivers? Doesn’t connecting to the data silo solve the problem?

Can we assume that data silos are not opaque due to accessibility? That is, drivers exist for accessing data stores, modulo the necessity of system security. Yes?
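The accessibility point is easy to demonstrate: given a driver, querying a silo is a few lines of code. A minimal sketch using Python’s built-in sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

# Stand-in for a departmental "silo": an in-memory relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

# Accessing the data is trivial once you have a driver...
rows = conn.execute("SELECT name FROM customers ORDER BY id").fetchall()
print(rows)  # [('Acme',), ('Globex',)]

# ...but nothing in the result tells you what "name" means to the
# people who maintain the silo. That semantic gap, not the driver,
# is where the opacity lives.
```

The driver gets you the bytes; it does not get you the meaning.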

Data silos aren’t opaque to the users who use them or the DBAs who maintain them. So opacity isn’t something inherent in the data silo itself because we know of people who successfully use what we call a data silo.

What do you think makes data silos opaque?

If we knew where the problem comes from, it might be possible to discuss solutions.

Whether you are tracking the latest outrageous statements from the Republicans for U.S. President Clown Car or have more serious mapping purposes in mind, you need to take a look at MapFig. There are plugins for WordPress, Drupal, Joomla, and Omeka, along with a host of useful features.

There is one feature in particular I want to call to your attention: “Create highly customized leaflet maps quickly and easily.”

I stumbled over that sentence because I have never encountered “leaflet” maps before. Street, terrain, weather, historical, geological, archaeological, astronomical, etc., but no “leaflet” maps. Do they mean a format size? As in a leaflet for distribution? Seems unlikely because it is delivered electronically.

FAQ was no help. No hits at all.

Of course, you are laughing at this point because you know that “Leaflet” (note the uppercase “L”) is a JavaScript library developed by Vladimir Agafonkin.

So a “leaflet map” is one created using the Leaflet JavaScript library.

Nowadays most free-text searching is based on Lucene-like approaches, where the search text is parsed into its various components. For every keyword a lookup is done to see where it occurs. When looking for a couple of keywords this approach is great. But what if you are not looking for just a couple of keywords, but 100,000 of them? Like, for example, checking against a dictionary?

This is where the Aho-Corasick algorithm shines. Instead of chopping up the search text, it uses all the keywords to build up a construct called a Trie. There are three crucial components to Aho-Corasick:

goto

fail

output

Every character encountered is presented to a state object within the goto structure. If there is a matching state, that will be elevated to the new current state.

However, if there is no matching state, the algorithm signals a fail and falls back to states of lesser depth (i.e., a shorter match) and proceeds from there, until it finds a matching state or reaches the root state.

Whenever a state is reached that matches an entire keyword, it is emitted to an output set which can be read after the entire scan has completed.

The beauty of the algorithm is that it is O(n). No matter how many keywords you have, or how big the search text is, the running time grows linearly with the size of the input.

Some examples you could use the Aho-Corasick algorithm for:

looking for certain words in texts in order to URL link or emphasize them
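The goto/fail/output machinery described above fits in a few dozen lines. Here is a minimal, unoptimized Python sketch for illustration, not a production-tuned implementation:

```python
from collections import deque

def build_automaton(keywords):
    # goto: trie of character transitions; fail: fallback links;
    # output: the set of keywords recognized at each state.
    goto, fail, output = [{}], [0], [set()]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)
    # Breadth-first pass computes fail links (states of lesser depth).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, keywords):
    goto, fail, output = build_automaton(keywords)
    state, found = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # the "fail" step
        state = goto[state].get(ch, 0)   # the "goto" step
        for word in output[state]:       # the "output" step
            found.append((i - len(word) + 1, word))
    return found
```

Note that the single pass over the text never backs up, which is where the linear running time comes from.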

Apologies for the “lite” posting over the past several days. I have just completed a topic maps paper for the Balisage conference in collaboration with Sam Hunting. My suggestion is that you register before all the seats are gone.

Specifically, we found nearly 3,000 critical and high-risk vulnerabilities in hundreds of publicly accessible computers operated by these three Bureaus. If exploited, these vulnerabilities would allow a remote attacker to take control of publicly accessible computers or render them unavailable. More troubling, we found that a remote attacker could then use a compromised computer to attack the Department’s internal or non-public computer networks. The Department’s internal networks host computer systems that support mission-critical operations and contain highly sensitive data. A successful cyber attack against these internal computer networks could severely degrade or even cripple the Department’s operations, and could also result in the loss of sensitive data. These deficiencies occurred because the Department did not: 1) effectively monitor its publicly accessible systems to ensure they were free of vulnerabilities, or 2) isolate its publicly accessible systems from its internal computer networks to limit the potential adverse effects of a successful cyber attack.

It is hard to imagine anyone needing a vulnerability list in order to crack into the Interior Department. Rather than sanitize its reports, the Inspector General should publish a vulnerability by vulnerability listing. Years of concealing that type of information hasn’t improved the behavior of the Interior Department.

Time to see what charging upper management with criminal negligence can do after data breaches.

The title is from Mission Impossible, which in this case should be renamed: Mission Possible.

Mapping the Medieval Countryside is a major research project dedicated to creating a digital edition of the medieval English inquisitions post mortem (IPMs) from c. 1236 to 1509.

IPMs recorded the lands held at their deaths by tenants of the crown. They comprise the most extensive and important body of source material for landholding in medieval England. Describing the lands held by thousands of families, from nobles to peasants, they are a key source for the history of almost every settlement in England and many in Wales.

This digital edition is the most authoritative available. It is based on printed calendars of the IPMs but incorporates numerous corrections and additions: in particular, the names of some 48,000 jurors are newly included.

The site is currently in beta phase: it includes IPMs from 1418-1447 only, and aspects of the markup and indexing are still incomplete. An update later this year will make further material available.

One of the more fascinating aspects of the project is the list of eighty-nine (89) place types, which can be used for filtering. Just scanning the list I happened across “rape” as a place type, with four (4) instances recorded thus far.

The term “rape” in this context refers to a subdivision of the county of Sussex in England. The origin of this division is unknown but it pre-dates the Norman Conquest.

The “rapes of Sussex” and the eighty-eight (88) other place types are a great opportunity to explore place distinctions that may or may not be noticed today.

The Document Translator app and the associated source code demonstrate how Microsoft Translator can be integrated into enterprise and business workflows. The app allows you to rapidly translate documents, individually or in batches, with full fidelity—keeping formatting such as headers and fonts intact, and allowing you to continue editing if necessary. Using the Document Translator code and documentation, developers can learn how to incorporate the functionality of the Microsoft Translator cloud service into a custom workflow, or add extensions and modifications to the batch translation app experience. Document Translator is a showcase for use of the Microsoft Translator API to increase productivity in a multi-language environment, released as an open source project on GitHub.

Whether you are writing in Word, pulling together the latest numbers into Excel, or creating presentations in PowerPoint, documents are at the center of many of your everyday activities. When your team speaks multiple languages, quick and efficient translation is essential to your organization’s communication and productivity. Microsoft Translator already brings the speed and efficiency of automatic translation to Office, Yammer, as well as a number of other apps, websites and workflows. Document Translator uses the power of the Translator API to accelerate the translation of large numbers of Word, PDF*, PowerPoint, or Excel documents into all the languages supported by Microsoft Translator.
…

Typing the “related” link to say how they are related would be a step in the right direction. Apparently there is an organization with the title: “‘Sdim Curo Plant!” (other sources report Welsh for “Children are Unbeatable”.) Which turns out to be the preferred label.

The entire survey is worth a look but the Key Takeaways are real treasures:

Organizations still prioritize protection over detection and response, despite the fact that protection is fundamentally incapable of stopping today’s greatest cyber threats.

The biggest weakness of surveyed organizations is the ability to measure, assess, and mitigate cybersecurity risk, which makes it difficult or impossible to prioritize security activity and investment.

It is nice to have RSA confirm my “adding cybersecurity protection” graphic:

Software, including security software, is so broken that even attempting to add on security is worthless.

That doesn’t mean better software practices should not be developed, but in the meantime you are better off monitoring and responding to specific threats.

I don’t know of anyone who would disagree that being unable to “measure, assess, and mitigate cybersecurity risk,” makes setting security priorities impossible.

What consumers generally do not know is that they are shielded from liability for unauthorized transactions made with their credit cards via the combination of federal law and issuer/card network policy. As a result, financial institutions and merchants assume responsibility for most of the money lost as a result of fraud. For example, card issuers bore a 63% share of fraudulent losses in 2012 and merchants assumed the other 37% of liability, according to the Nilson Report, August 2013.

With credit card fraud at $11.2 billion in 2012, you would think card issuers and merchants would have plenty of incentive for reducing this loss.

Simple steps, like requiring a second form of identification, a slight delay as the transaction goes through fraud prevention, etc., could make a world of difference. But, they would also impact the convenience of using credit cards.

Do you care to guess what strategy credit card issuers chose? Credit card holders are exhorted to prevent credit card fraud, which has no impact on them in most events.

Does that offer a clue to the reason for the lack of proper preparation for cybersecurity?

Yes, breaches occur; yes, we sustain losses; yes, those losses are regrettable; but we have no ROI measure for an investment in effective cybersecurity.

Unless and until there are financial incentives and an ROI to be associated with cybersecurity, it is unlikely we will see significant progress on that front.

There are many resources to help you build Clojure applications. Most however use trivial examples that rarely span more than one project. What if you need to build a big Clojure application comprising many projects? Over the three years that we’ve been using Clojure at WalmartLabs, we’ve had to figure this stuff out. In this session, I’ll discuss some of the challenges we’ve faced scaling our team and code base as well as our experience using Clojure in the enterprise.

I first saw this mentioned by Marc Phillips in a post titled: Walmart Runs Clojure At Scale. Marc mentions a tweet from Anthony Marcar that reads:

Our Clojure system just handled its first Walmart black Friday and came out without a scratch.

“Black Friday” is the Friday after the Thanksgiving holiday in the United States. Since 2005, it has been the busiest shopping day of the year, and in 2014, $50.9 billion was spent on that one day. (Yes, billions with a “b.”)

Great narrative of issues encountered as this system was built to scale.

The more impressive of the two is the ProxyGambit, a $235 device that allows people to access an Internet connection from anywhere in the world without revealing their true location or IP address. One-upping the ProxyHam, its radio link can offer a range of up to six miles, more than double the 2.5 miles of the ProxyHam. More significantly, it can use a reverse-tunneled GSM bridge that connects to the Internet and exits through a wireless network anywhere in the world, a capability that provides even greater range.
…

A bit pricey and 2.5 miles doesn’t sound like a lot to me.

Using Charter Communications as my cable provider, my location is shown by router to be twenty (20) miles from my physical location. Which makes for odd results when sites try to show a store “nearest to” my physical location.

Of course, Charter knows the actual service address and I have no illusions about my cable provider throwing themselves on a grenade to save me. Or a national security letter.

With a little investigation you can get distance from your physical location for free in some instances, bearing in mind that if anyone knows where you are, physically, then you can be found.

Think of security as a continuum that runs from being broadcast live at a public event to lesser degrees of openness. The question always is how much privacy is useful to you at what cost?

A unique database of more than 300 investigative journalism reports from across Latin America is now available from The Institute for Press and Society (Instituto Prensa y Sociedad, or IPYS). Called BIPYS (Banco de Investigaciones Periodísticas, or Bank of Investigative Journalism) the UNESCO-backed initiative was announced July 6 at the annual conference of Abraji, Brazil’s investigative journalism association.

BIPYS is a repository of many of the best examples of investigative journalism in the region, comprised largely of winners of the annual Latin American Investigative Journalism Awards that IPYS and Transparency International have given out for the past 13 years.

See Gabriela’s post for more, but in summary: the site is still under development and fees are still being discussed.

An admirable effort considering that words in Latin America can and do have real consequences.

Unlike some places, where disagreement can be quite heated but the participants slip away for drinks together when the broadcast ends. Meanwhile, the subjects of their disagreement continue to struggle and die due to policy decisions made far, far away.

Less than 5% of nearly 220,000 individual requests made to Google to selectively remove links to online information concern criminals, politicians and high-profile public figures, the Guardian has learned, with more than 95% of requests coming from everyday members of the public.

The Guardian has discovered new data hidden in source code on Google’s own transparency report that indicates the scale and flavour of the types of requests being dealt with by Google – information it has always refused to make public. The data covers more than three-quarters of all requests to date.

Previously, more emphasis has been placed on selective information concerning the more sensational examples of so-called right to be forgotten requests released by Google and reported by some of the media, which have largely ignored the majority of requests made by citizens concerned with protecting their personal privacy.
…

It is a true data leak but not nearly as exciting as it sounds. If you follow the Explore the data link, you will find a link to “snapshots on WayBack Machine” that will provide access to the data now scrubbed from Google transparency reports. Starting about three months ago the data simply disappeared from the transparency reports.

Here is an example from the February 4th report as saved by the WayBack Machine:

Dr Paul Bernal, lecturer in technology and media law at the UEA School of Law, argues that the data reveals that the right to be forgotten seems to be a legitimate piece of law. “If most of the requests are private and personal ones, then it’s a good law for the individuals concerned. It seems there is a need for this – and people go for it for genuine reasons.”

On the contrary, consider this chart (from the Guardian explore the data page):

The data shows that 96% of the requests are likely to have one searcher, the person making the request.

If the EU wants to indulge such individuals, it should create a traveling “Board of the Right to Be Forgotten,” populate it with judges, clerks, transcribers, translators, etc. that visits every country in the EU on some regular schedule and holds televised hearings for every applicant and publishes written decisions (in all EU languages) on which links should be delisted from Google.

That would fund the travel, housing and entertainment industries in the EU, a perennial feature of EU funding and relieve Google of the distraction of such cases. It would establish a transparent record of the self-obsessed who request delisting of facts from a search engine and the facts deleted.

Decisions by a “Board of the Right to Be Forgotten” would also enable the monetization of requests to be forgotten, by easing the creation of search engines that only report facts “forgotten” by Google. Winners all the way around!

“Blue light special” is nearly a synonym for KMart. If you search for “blue light special” at Wikipedia, you will be redirected to the entry for Kmart.

A “blue light special” consisted of a blue police light being turned on and a KMart employee announcing the special to all shoppers in the store.

As of Tuesday, July 14, 2015, there are now blue light specials on Windows Server 2003. Well, sans the blue police light and the KMart employee. But hackers will learn of vulnerabilities in Windows Server 2003 and there will be no patches to close off those opportunities.

Although open sourcing Windows Server 2003 might cut into some of the maintenance contract income, it would greatly increase the pressure on businesses to migrate off of Windows Server 2003 as hackers get first hand access to this now ancient code base.

In some ways, open sourcing Windows XP and Windows Server 2003 could be a blue light special that benefits all shoppers.

Microsoft obtains the obvious benefits of greater demand, initially, for formal support contracts and in the long run, the decreasing costs of maintaining ancient code bases, plus new income from migrations.

People concerned with the security, or lack thereof in ancient systems gain first hand knowledge of those systems and bugs to avoid in the future.

IT departments benefit from having stronger grounds to argue that long delayed migrations must be undertaken or face the coming tide of zero-day vulnerabilities based on source code access.

Users benefit in the long run from the migration to modern computing architectures and their features. A jump comparable to going from a transistor radio to a smart phone.

To understand large legacy systems we need to look beyond the current structure of the code. We need to understand both how the system evolves and how the people building it collaborate. In this session you’ll learn about a Clojure application to mine social information such as communication paths, developer knowledge and hotspots from source code repositories (including Clojure itself). It’s information you use to improve both the design and the people-side of your codebase. We’ll also look at some interesting libraries like Incanter and Instaparse before we discuss the pros and cons of writing the tool in Clojure.

From the presentation:

“Laws” of Software Evolution

Continuing Change

“a system must be continually adapted or it becomes progressively less satisfactory”

Increasing Complexity

“as a system evolves, its complexity increases unless work is done to maintain or reduce it.”

Those two “laws” can be claimed for software but they are applicable to any system, including semantics.

Adam develops code that uses source control logs to identify “hot spots” in code, which is available at: Code Maat.

Early in his presentation Adam mentions that the majority of a programmer’s time isn’t spent programming but rather “…making changes to existing code and the majority of that, trying to understand what the code does….” Imagine capturing your “understanding” of existing code using a topic map.

Here is the full text of the Iran nuclear deal. The “E3/EU+3” is a reference to the world powers that negotiated the deal with Iran (three European Union states of UK, France, and Germany, plus three others of China, the US, and Russia). A lot of the text is highly technical, but it’s still surprisingly readable for an international arms control agreement that was hammered out in many past-midnight sessions.

This article describes a simple tool to display geophylogenies on web maps including Google Maps and OpenStreetMap. The tool reads a NEXUS format file that includes geographic information, and outputs a GeoJSON format file that can be displayed in a web map application.

From the introduction (with footnotes omitted):

The increasing number of georeferenced sequences in GenBank [ftnt omitted] and the growth of DNA barcoding [ftnt omitted] means that the raw material to create geophylogenies [ftnt omitted] is readily available. However, constructing visualisations of phylogenies and geography together can be tedious. Several early efforts at visualising geophylogenies focussed on using existing GIS software [ftnt omitted], or tools such as Google Earth [ftnt omitted]. While the 3D visualisations enabled by Google Earth are engaging, it’s not clear that they are easy to interpret. Another tool, GenGIS [ftnt omitted], supports 2D visualisations where the phylogeny is drawn flat on the map, avoiding some of the problems of Google Earth visualisations. However, like Google Earth, GenGIS requires the user to download and install additional software on their computer.

By comparison, web maps such as Google Maps [ftnt omitted] are becoming ubiquitous and work in most modern web browsers. They support displaying user-supplied data, including geometrical information encoded in formats such as GeoJSON, making them a light weight alternative to 3D geophylogeny viewers. This paper describes a tool that makes use of the GeoJSON format and the capabilities of web maps to create quick and simple visualisations of geophylogenies.
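Part of GeoJSON’s appeal is that it is plain JSON: no special software is needed to emit a file a web map can display. A hypothetical Python sketch, pairing invented taxa with (longitude, latitude) points as might be extracted from a NEXUS file’s geographic annotations:

```python
import json

# Hypothetical tip taxa with (longitude, latitude) coordinates.
taxa = {
    "Taxon_A": (147.3, -42.9),
    "Taxon_B": (174.8, -41.3),
}

# Each taxon becomes a GeoJSON Point feature.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": list(coords)},
        "properties": {"name": name},
    }
    for name, coords in taxa.items()
]

geojson = {"type": "FeatureCollection", "features": features}
print(json.dumps(geojson, indent=2))
```

The resulting FeatureCollection can be handed directly to a Leaflet or Google Maps data layer; drawing the phylogeny’s internal branches on top of those points is where a tool like the one described earns its keep.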

Whether you are interested in geophylogenies or in the use of GeoJSON, this is a post for you.

Making a beautiful app for news is great; making a beautiful reusable app for news is better. At least that’s the thinking behind a new project released by Vox Media today: Autotune is a system meant to simplify the creation and duplication of things like data visualizations, graphics, or games.

Autotune was designed by members of the Vox Media product team to cut down on the repetitive work of taking one project — say, an image slider — and making it easy to use elsewhere. It’s “a centralized management system for your charts, graphics, quizzes and other tools, brought to you by the Editorial Products team at Vox Media,” according to the project’s GitHub page. And, yes, that means Autotune is open source.
…

Sounds like a great project, but I will have to get a cellphone to pass judgement on apps. And I would have to get a Faraday cage to keep it in when not testing apps.

tl;dr: A recent NSDI paper argued that data analytics stacks don’t get much faster at tasks like PageRank when given better networking, but this is likely just a property of the stack they evaluated (Spark and GraphX) rather than generally true. A different framework (timely dataflow) goes 6x faster than GraphX on a 1G network, which improves by 3x to 15-17x faster than GraphX on a 10G network.

“Network optimizations can only reduce job completion time by a median of at most 2%. The network is not a bottleneck because much less data is sent over the network than is transferred to and from disk. As a result, network I/O is mostly irrelevant to overall performance, even on 1Gbps networks.” (§1)

The measurements were done using Spark, but the authors argue that they generalize to other systems. We thought that this was surprising, as it doesn’t match our experience with other data processing systems. In this blog post, we will look into whether these observations do indeed generalize.

One of the three workloads in the paper is the BDBench query set from Berkeley, which includes a “page-rank-like computation”. Moreover, PageRank also appears as an extra example in the NSDI slide deck (slide 38-39), used there to illustrate that at most a 10% improvement in job completion time can be had even for a network-intensive workload.

This was especially surprising to us because of the recent discussion around whether graph computations require distributed data processing systems at all. Several distributed systems get beat by a simple, single-threaded implementation on a laptop for various graph computations. The common interpretation is that graph computations are communication-limited; the network gets in the way, and you are better off with one machine if the computation fits.[footnote omitted]
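The “simple, single-threaded implementation” point is easy to appreciate once you see how little code a basic PageRank iteration takes. A toy Python sketch (invented four-edge graph, fixed iteration count, no handling of dangling nodes):

```python
def pagerank(edges, num_nodes, iterations=20, damping=0.85):
    # Out-degree of each node, from the edge list.
    out_deg = [0] * num_nodes
    for src, _ in edges:
        out_deg[src] += 1
    # Start with uniform ranks and iterate the update rule.
    ranks = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        new = [(1.0 - damping) / num_nodes] * num_nodes
        for src, dst in edges:
            new[dst] += damping * ranks[src] / out_deg[src]
        ranks = new
    return ranks

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
ranks = pagerank([(0, 1), (0, 2), (1, 2), (2, 0)], 3)
```

A tight single-threaded loop like this touches every edge once per iteration with no network traffic at all, which is why laptops embarrass distributed systems whenever the graph fits in memory.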

…

The authors introduce Rust and timely dataflow to achieve rather remarkable performance gains. That is, if you think a 6x-17x speedup over GraphX on the same hardware is a performance gain. (Most do.)

Code and instructions are available so you can test their conclusions for yourself. Hardware is your responsibility.

While you are waiting for part 2 to arrive, try Frank’s homepage for some fascinating reading.

Unfortunately I did not find a link to bulk data for presidential nominations nor an API for the search engine behind this webpage.

I say that because matching up nominees and/or their sponsors with campaign contributions would help get a price range on becoming the ambassador to Uruguay, etc.

I wrote to Ask a Law Librarian to check on the status of bulk data and/or an API. Will amend this post when I get a response.

Oh, there will be a response. For all the ills and failures of the U.S. government, which are legion, it is capable of assembling vast amounts of information and training people to perform research on it. Not in every case but if it falls within the purview of the Law Library of Congress, I am confident of a useful answer.

The Internet is full of fascinating information. While getting lost can sometimes be entertaining, most people still prefer somewhat guided surfing excursions. It’s always great to find new websites that can help out and be useful in one way or another.

Here are a few of our favorites.

Thomas does a great job of collecting; the links/tools run from the odd (photos taken at the same location but years later; haven’t they heard of Photoshop?) to notes that self-destruct after being read.

Remember that for the self-destructing notes, bytes did cross the Internet to arrive. Capturing them sans the self-destruction method is probably a freshman exercise at better CS programs.

My problem is that my bookmark list probably has hundreds of useful links, if I could just remember which subjects they were associated with and could easily retrieve them. Yes, the cobbler’s child with no shoes.