Session-replay scripts disrupt online privacy in a big way

Marketing companies place a high value on tracking what consumers do online, so they keep developing better ways to follow our internet activities, which in turn prompts privacy advocates to cry foul. The marketing tools currently of concern are a subset of third-party analytics called session-replay scripts.

What is a session-replay script, and why is it a privacy concern?

Session-replay scripts were originally developed to help website operators understand how visitors interact with their site and to identify user-interface problems. To do that, website developers embed one or more session-replay scripts in the website’s pages. The scripts make it possible to replay an individual’s browsing session, including every click, input, and scroll, as well as the entire contents of the pages visited, and therein lies the problem.

The original purpose of session-replay scripts is legitimate. “However, the extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits it, and precise mouse movements are saved, all without any visual indication to the user,” write Princeton security researchers and coauthors Steven Englehardt, Gunes Acar, and Arvind Narayanan, in their post titled No Boundaries: Exfiltration of personal data by session-replay scripts. “This data cannot reasonably be expected to be kept anonymous. In fact, some companies allow publishers to link recordings to a user’s real identity.”

The researchers offer an example of what is possible. “Collection of page content by third-party replay scripts may cause sensitive information such as medical records, credit-card details, and other personal information displayed on a page to leak to the third-party as a part of the recording,” note Englehardt, Acar, and Narayanan. “This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout or registration processes.”

Redaction tools work, but not always

Several of the recording services offer a redaction capability. The three researchers took this into consideration, setting up test pages with session-replay scripts from some of the companies offering the service and analyzing several live websites. They found the following types of vulnerabilities.

Passwords are included in session recordings: The services automatically exclude password input fields from recordings. However, mobile-friendly login boxes that use text inputs to store unmasked passwords are not redacted, and unfortunately, the information can be recorded even if the form is never submitted.

Sensitive user inputs are redacted in a partial and imperfect way: The recording services offer automated redaction, but the quality of redaction varies widely. Figure A displays the team’s findings: A filled circle denotes that the data is excluded, a half-filled circle denotes equivalent-length masking, and an empty circle indicates that the data is sent in the clear.

Manual redaction of personally identifying information displayed on a page is a fundamentally insecure model: Besides collecting user input, session-recording companies can collect rendered page content. “Unlike user input recording, none of the companies appear to provide automated redaction of displayed content by default; all displayed content in our tests ended up leaking,” state the researchers. “Sensitive user data has a number of avenues to end up in recordings, and small leaks over several pages can lead to a large accumulation of personal data in a single session recording.”

Recording services may fail to protect captured user data: Companies that offer recording services must handle the captured data using sound security practices, but that is not always the case. The researchers offer the following example:

“The publisher dashboards for Yandex, Hotjar, and Smartlook all deliver playbacks within an HTTP page, even for recordings which take place on HTTPS pages. This allows an active man-in-the-middle attack to inject a script into the playback page and extract all of the recording data. Worse yet, Yandex and Hotjar deliver the publisher page content over HTTP—data that was previously protected by HTTPS is now vulnerable to passive network surveillance.”

The researchers emphasize that the problems can be fixed (the leaks of user data and passwords can be patched), though they add this note of caution: “But, as long as the security of user data relies on publishers fully redacting their sites, these underlying vulnerabilities will continue to exist.”
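To make the input-redaction findings above more concrete, here is a minimal Python sketch of a type-based redaction rule. It is an illustration only, not any vendor's actual logic: the Field class, the manually_redacted flag, and the choice to mask email and phone fields are all assumptions made for the example. It shows the three outcomes the researchers chart (excluded, equivalent-length masking, sent in the clear) and why a rule keyed to the password input type misses a password typed into a plain text input.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Redaction(Enum):
    EXCLUDE = "excluded"   # value is dropped from the recording
    MASK = "masked"        # a same-length placeholder is recorded
    CLEAR = "clear"        # value is recorded exactly as typed


@dataclass
class Field:
    name: str
    input_type: str                   # the HTML "type" attribute
    value: str
    manually_redacted: bool = False   # hypothetical publisher-added tag


def auto_rule(field: Field) -> Redaction:
    """Hypothetical automated rule that looks only at the input type."""
    if field.manually_redacted or field.input_type == "password":
        return Redaction.EXCLUDE
    if field.input_type in {"email", "tel"}:   # assumed for illustration
        return Redaction.MASK
    return Redaction.CLEAR


def record(field: Field) -> Optional[str]:
    """Return what would end up in the recording for this field."""
    decision = auto_rule(field)
    if decision is Redaction.EXCLUDE:
        return None
    if decision is Redaction.MASK:
        return "*" * len(field.value)   # equivalent-length masking
    return field.value


fields = [
    Field("password", "password", "hunter2"),
    # A "show password" box that uses a plain text input.
    Field("password_visible", "text", "hunter2"),
    Field("email", "email", "a@b.co"),
]
recording = {f.name: record(f) for f in fields}

for name, captured in recording.items():
    print(name, "->", captured)
```

In this sketch the visible-password field is recorded as typed unless the publisher sets manually_redacted, which mirrors the researchers' caution about relying on publishers to redact their own sites.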

What are our options?

Englehardt, Acar, and Narayanan state that ad-blocking lists such as EasyList and EasyPrivacy can help, but not in every instance. “EasyPrivacy has filter rules that block Yandex, Hotjar, ClickTale, and SessionCam,” say the researchers. The three authors caution that neither EasyList nor EasyPrivacy blocks FullStory, Smartlook, or UserReplay scripts.
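The gap the researchers describe is easier to see with a rough sketch of how list-based blocking behaves. The Python below is a simplification under stated assumptions: real filter lists such as EasyPrivacy use a richer rule syntax, and the domains shown are made-up placeholders rather than actual list entries. The function is_blocked is a hypothetical helper that checks a script's host and each of its parent domains against a set of blocked domains.

```python
from urllib.parse import urlsplit

# Placeholder entries for illustration; not taken from any real filter list.
BLOCKED_DOMAINS = {"replay-vendor.example", "recorder.example"}


def is_blocked(script_url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the script's host or any parent domain is on the list."""
    host = (urlsplit(script_url).hostname or "").lower()
    labels = host.split(".")
    # Check "cdn.replay-vendor.example", then "replay-vendor.example", etc.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


for url in (
    "https://cdn.replay-vendor.example/r.js",   # listed vendor, subdomain
    "https://unlisted-replay.example/r.js",     # vendor missing from the list
):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

A script served from a domain that is not on the list loads normally, which is why a blocklist protects only against the services it already names.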

In the comment section of the Princeton researchers’ post, there was mention of using applications such as Ghostery and NoScript; Englehardt responded that both Ghostery and NoScript, if configured to block all scripts, would be effective since the recording scripts would not run. He added, “A blocker which includes the EasyPrivacy blocklist, like uBlock Origin, will also block most of the parties mentioned in this post.”

There is a problem with blocking all scripts: All means all, including scripts that websites need to conduct business. For now, that suggests there is no simple way to prevent user sessions from being recorded on websites that use session-replay scripts.

Note: The researchers updated the article to include a website that contains a list of sites with session-replay scripts, and sites where they have confirmed recording by third parties.