Dennis Hackethal’s Blog
My blog about philosophy, coding, and anything else that interests me.
Images and Privacy for the Web
I’m not a security expert. Don’t rely on this article to make informed security decisions for your tech.
A common security feature for emails is the disabling of images. Why? Because trackers can embed hidden images in an email to determine whether you’ve opened it. For example, an email could contain this HTML:
<img src="http://some-tracker.com/email-opened?id=123" style="display: none;">
If your email client is configured to display images, it will send a GET request to http://some-tracker.com/email-opened?id=123, letting the server at some-tracker.com know that you’ve opened the email with id 123. Along with that request comes your IP address, which may divulge your approximate location, and other information about your client that can be used for fingerprinting. Since the img tag is set not to display, you will never even know.
That’s why it’s a good idea to configure your email client not to render images from the internet at all. That way, it will only render images that are attached to the email, in which case no requests need to be made and no information about you is divulged.
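To make the distinction between remote and attached images concrete, here is a minimal Python sketch of the check a mail client could run before rendering. It is an illustration only, not how any particular client works; the names RemoteImageFinder and find_remote_images are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RemoteImageFinder(HTMLParser):
    """Collects the src of every img tag that would trigger a network request."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        parts = urlsplit(src)
        # A src with a host (http:, https:, or protocol-relative "//host/...")
        # causes a request. Attached images (cid:) and inline data: do not.
        if parts.netloc and parts.scheme in ("", "http", "https"):
            self.remote_sources.append(src)


def find_remote_images(html):
    """Return the remote image URLs found in an email's HTML body."""
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return finder.remote_sources
```

A client that blocks remote images would skip or replace exactly the URLs this function returns and render everything else as usual.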
If that’s a good idea for emails, why not disable images when you browse the web generally? Chrome has such a setting under Settings → Site Settings → Images → Don’t allow sites to show images, but that seems a bit overkill. It’s one thing to block images in emails, which are very limited in scope and make up only a small part of your browsing experience, but to block images altogether when browsing the web will make for a rather boring (and sometimes dysfunctional) experience.
What I’d like to see is the ability to block images from websites other than the one you’re currently visiting. Say you visit website A, which instructs your browser to load an image from website B. It seems reasonable to assume that by visiting A on purpose, you made a conscious choice about divulging some of your data to A. But you may have never known about website B, not to mention that B might not render legitimate images but function as a tracker. Just like in the email example above. (I don’t think all trackers are bad – websites have a legitimate analytics need and they should use legitimate, privacy-conscious trackers. That includes Plausible, which I use, and excludes Google Analytics, which you shouldn’t use.)
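The decision such a browser setting would have to make for each image can be sketched in a few lines of Python. This assumes “same site” means same origin, that is, the same scheme, host, and port; the function names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_cross_origin(page_url, image_src):
    """True if loading image_src from page_url would contact another origin."""
    # Resolve relative paths such as "/logo.png" against the page first.
    absolute = urljoin(page_url, image_src)
    # Note: data: URLs make no request at all; a real blocker would
    # allow them separately instead of comparing origins.
    return origin(absolute) != origin(page_url)
```

With this check, a page at https://a.example/post could still show /logo.png, while an image at https://b.example/pixel.gif would be blocked.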
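Browsers give servers a fair amount of control over which images to render. For one, there are response headers that let the owner of a site restrict how other sites use its images. CORS is the best known of these, but it doesn’t stop embedding: browsers load cross-origin images in img tags by default, and CORS only governs whether the embedding page’s scripts may read the image data (say, by drawing the image onto a canvas). The header that actually prevents embedding is Cross-Origin-Resource-Policy: if site B sends Cross-Origin-Resource-Policy: same-origin with its images, browsers that support the header won’t render those images on site A. If you’re wondering why the owner of site A doesn’t just choose not to embed any images from site B, the images may have been included in user-generated content. And again, in this example, it’s on site B to send the requisite header, not A. Note that this protects B’s images from being used elsewhere; the browser still sends the request to B, so it does nothing for the visitor’s privacy.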
Second, there’s the content-security-policy response header, which gives a site owner fine-grained control over which sites images can be loaded from. For example, the following response header will instruct the browser to load images only from the same site:
Content-Security-Policy: img-src 'self'
That way, whenever an image’s src attribute points to another site, your browser will refuse to load the image. It will display an error in the console instead. You could also choose to include data: so that, say, base64-encoded images can render, or https: so that images can be loaded from any website that uses HTTPS but not from those that do not. (Think of the nightmare of thousands upon thousands of images being loaded as you browse the web over time while anyone on the same network can view them.) Note that https: allows every HTTPS origin, so adding it gives up the same-site restriction. For these extra settings, adjust your header to the following:
Content-Security-Policy: img-src 'self' data: https:
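How the header gets attached to responses depends on your stack. As a rough, framework-agnostic sketch, the following Python WSGI wrapper adds it to every response of an app; with_csp and hello_app are hypothetical names used only for this illustration.

```python
IMG_POLICY = "img-src 'self' data: https:"


def with_csp(app, policy=IMG_POLICY):
    """Wrap a WSGI app so that every response carries the given policy."""

    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Drop any policy set further down so that only one is sent.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "content-security-policy"
            ]
            headers.append(("Content-Security-Policy", policy))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_csp)

    return wrapped


def hello_app(environ, start_response):
    """A stand-in application that serves one page with a same-site image."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b'<img src="/logo.png">']


application = with_csp(hello_app)
```

To try it locally, the application object can be passed to wsgiref.simple_server.make_server from the standard library.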
So, while site B can send response headers that prevent site A from embedding B’s images (strictly speaking, that is the job of Cross-Origin-Resource-Policy rather than CORS), a content security policy allows site A to prevent the loading of images from site B. In other words, the former protects the server that hosts the images; the latter restricts what the embedding page lets the client request.
However, both of these mechanisms are available only to site owners. As a user surfing the web, you have little control over where images are loaded from. You can either render all images or none. And you shouldn’t have to rely on webmasters’ technical ability and trustworthiness to protect your privacy. That’s why, again, I look forward to a browser setting allowing you to render same-origin images while blocking cross-origin ones. At the time of writing, all I could find was a Chrome extension that lets you block images based on their size, but that’s a different issue. There exist open-source tracker blockers such as uBlock Origin, but that one doesn’t block cross-origin images either, at least not all of them (I’ve tested it). That means some trackers must fall through the cracks. If you use any such blockers, make sure they’re open source and have good reviews. And continue using them even after the technology to block cross-origin images exists.
Until then, if you operate a site, for the sake of your visitors’ privacy, consider loading all resources, including images, scripts, and fonts, from your own server. Use a strict content security policy – even if you don’t have user-generated content, it will prevent you from accidentally loading remote content in your code. An added benefit of a strict policy is increased security: cross-site scripting (XSS) becomes a lot harder to pull off. It is not a defense against cross-site request forgery (CSRF), though; for that you still need measures such as CSRF tokens or SameSite cookies. Nor can content security policies be relied upon alone, as older browsers do not support them. If you absolutely do need to load cross-origin images, do so on the server and then pass them on to the client for rendering. That seems like an idea for a SaaS business catering to privacy-conscious webmasters, if it doesn’t already exist.
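Loading a cross-origin image on the server could look roughly like the Python sketch below. The allowlist, the host images.example.org, and the function names are assumptions made for illustration. The allowlist matters because an endpoint that fetches arbitrary URLs on request can be abused to reach internal services.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical allowlist: the only remote hosts this site agrees to proxy.
ALLOWED_HOSTS = {"images.example.org"}


def fetch_over_network(url):
    """Download the resource from the server side and return its bytes."""
    with urlopen(url, timeout=5) as response:
        return response.read()


def proxy_image(url, fetch=fetch_over_network):
    """Fetch a remote image on the visitor's behalf, if its host is allowed."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"refusing to proxy {url!r}")
    return fetch(url)
```

The remote host still receives a request, but it comes from your server, so the visitor’s IP address and browser details stay with you. The fetch parameter exists only so the function can be exercised without network access.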
For full disclosure, here’s the content security policy I use at the time of writing:
default-src 'self'; font-src 'self'; img-src 'self' https://dh-podcasts.s3.us-east-2.amazonaws.com; object-src 'none'; script-src 'self' '<dynamically generated nonce>'; style-src 'self' '<dynamically generated nonce>'; media-src 'self' https://dh-podcasts.s3.us-east-2.amazonaws.com; connect-src 'self' https://plausible.io
This policy says, among other things, that I allow browsers to load images and other media from my specific Amazon S3 bucket and to connect to Plausible for analytics.
These values are put into a single line and sent to the client as the Content-Security-Policy header. I use Rails’ powerful DSL for generating content security policies.
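For readers not on Rails, the “single line” step is easy to reproduce by hand. The Python sketch below is a stand-in for what such a DSL does, not Rails’ actual implementation; build_csp is a hypothetical helper, and the directives are a subset of the policy above.

```python
def build_csp(directives):
    """Join a mapping of directive name -> sources into one header value."""
    return "; ".join(
        " ".join([name, *sources]) for name, sources in directives.items()
    )


policy = build_csp({
    "default-src": ["'self'"],
    "img-src": ["'self'", "https://dh-podcasts.s3.us-east-2.amazonaws.com"],
    "object-src": ["'none'"],
    "connect-src": ["'self'", "https://plausible.io"],
})

# The pair a server would send with each response.
header = ("Content-Security-Policy", policy)
```

Keeping the directives in a mapping makes it harder to lose a semicolon or quote than editing the long header string directly.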