Tracking, marketing, and analytics firms have been exfiltrating the email addresses of internet users from web forms prior to submission and without user consent, according to security researchers.
Some of these firms are said to have also inadvertently grabbed passwords from these forms.
In a research paper scheduled to appear at the Usenix ’22 security conference later this year, authors Asuman Senol (imec-COSIC, KU Leuven), Gunes Acar (Radboud University), Mathias Humbert (University of Lausanne) and Frederik Zuiderveen Borgesius, (Radboud University) describe how they measured data handling in web forms on the top 100,000 websites, as ranked by research site Tranco.
The boffins created their own software to measure email and password data gathering from web forms – structured web input boxes through which site visitors can enter data and submit it to a local or remote application.
And many companies involved in data gathering and advertising appear to believe that they’re entitled to grab the information website visitors enter into forms with scripts before the submit button has been pressed.
“Our analyses show that users’ email addresses are exfiltrated to tracking, marketing and analytics domains before form submission and without giving consent on 1,844 websites in the EU crawl and 2,950 websites in the US crawl,” the researchers state in their paper, noting that the addresses may be unencoded, encoded, compressed, or hashed depending on the vendor involved.
Most of the email addresses grabbed were sent to known tracking domains, though the boffins say they identified 41 tracking domains that are not found on any of the popular blocklists.
“Furthermore, we find incidental password collection on 52 websites by third-party session replay scripts,” the researchers say.
Replay scripts are designed to record keystrokes, mouse movements, scrolling behavior, other forms of interaction, and webpage contents in order to send that data to marketing firms for analysis. In an adversarial context, they’d be called keyloggers or malware; but in the context of advertising, somehow it’s just session-replay scripts.
Gunes Acar, one of the report co-authors, was also the co-author of a similar research project in 2017 that looked at data gathering by session-replay companies Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam.
Evidently, not much has changed since then, except perhaps that email addresses have become more desirable as unique identifiers now that privacy-oriented browsers like Brave, Firefox, and Safari are taking more steps to block cookies and tracking scripts.
Email addresses, the researchers observe, represent a cookie replacement because they’re unique, persistent, and can be used to track people across applications, platforms, and even offline interactions that may be tied to an email address like loyalty card transactions.
The website categories with the most leaking forms include: Fashion/Beauty (11.1 per cent, EU; 19 per cent US); Online Shopping (9.4 per cent EU; 15.1 per cent US); and General News (6.6 per cent EU; 10.2 per cent US).
Websites categorized as Pornography had the best privacy when it comes to surreptitious form data harvesting.
“A somehow surprising result was the following: despite filling email fields on hundreds of websites categorized as Pornography, we have not a single email leak,” the researchers say, noting that previous studies of adult-oriented websites have relatively fewer third-party trackers than similarly popular general interest websites.
Those pesky regulations
The report authors say that EU websites practicing email exfiltration may be in violation of at least three GDPR requirements: transparency, purpose limitation, and prior consent. Firms found to be violating these rules can be fined up to $20m euros or 4 per cent of annual revenue, per Article 83(5).
The US doesn’t have a federal data privacy law, though it’s conceivable one of the handful of US states with applicable privacy rules could take action against pre-submission form harvesting. But given the toothlessness of US privacy regulation over the past decade, don’t expect much.
The authors say they attempted to contact 58 first-parties and 28 third-parties with GDPR requests. They report receiving 30 responses from the first-parties, which varied from surprise and remediation to justifications of one sort or another.
“fivethirtyeight.com (via Walt Disney’s DPO), trello.com (Atlassian), lever.co, branch.io and cision.com were among the websites that said they had not been aware of the email collection prior to form submission on their websites and removed the behavior,” the report says.
Marriott, meanwhile, said the information collected by digital analytics firm Glassbox helps with customer care, technical support, and fraud prevention.
Third-parties Taboola, Zoominfo, and ActiveProspect defended their data collection practices.
Facebook, aka Meta, is among the third-parties involved in this. The researchers say that email addresses or their hashes were spotted being sent to facebook.com from 21 different websites in the EU.
“On 17 of these, Facebook Pixel’s Automatic Advanced Matching feature was responsible for sending the SHA-256 of the email address in a
SubscribedButtonClick event, despite not clicking any submit button,” the report says.
Advanced Matching – called out recently for harvesting student loan data – is designed to collect hashed customer data, such as email addresses, phone numbers, and names from checkout, sign-in, and registration forms. The researchers speculate that on these sites, Facebook’s script treats clicks on non-submit buttons as a click event for the submit button.
Facebook did not respond to a request for comment.
The report concludes that browser vendors, regulators, and privacy tool makers need to deal with this issue because it isn’t going away. “Based on our findings, users should assume that the personal information they enter into web forms may be collected by trackers – even if the form is never submitted,” the report concludes. ®