Proof of Work Against Scrapers
Why this site makes your browser solve a small puzzle before it loads — and why more of the web should.
The thing that set me off
I check my logs more than I'd like to admit, and the pattern is always the same. A handful of real people, and then thousands of requests from bots that nobody asked for, chewing through every page as fast as my server will hand them over. Some of them are honest crawlers. Most are just there to copy whatever they can and sell it, or feed it into something.
What bugs me isn't that they look. The web is supposed to be open, and I like that. It's how cheap it is for them. A scraper fires off one tiny request and pays basically nothing, while I'm the one footing the bill for bandwidth. That imbalance is the whole problem, so I figured: fine, let's make the bots pay a little something to get in.
Proof of work, quickly
The idea is older than cryptocurrency and pretty simple. Before I let you in, I make your computer do a small, annoying chunk of math. Specifically: find a number that, when I glue it onto a random string and run the whole thing through SHA-256, produces a hash starting with a few zeros.
There's no clever shortcut for that. You just have to guess over and over until one sticks, which costs real CPU time. But checking your answer is one hash and it's instant. That gap is the magic. Expensive to solve, trivial for me to verify. Bitcoin does the same trick at an absurd scale; I just need a version small enough that a browser can finish it in about a second.
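Here is a minimal sketch of that solve/verify gap in Python. It is an illustration, not this site's actual code: the function names are made up, and it assumes "a few zeros" means leading zero characters in the hex digest and that the number is glued on as decimal text.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, zeros: int) -> bool:
    """One SHA-256 call: does challenge+nonce hash to `zeros` leading hex zeros?"""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * zeros)


def solve(challenge: str, zeros: int) -> int:
    """Brute force: try 0, 1, 2, ... and return the first nonce that verifies."""
    for nonce in count():
        if verify(challenge, nonce, zeros):
            return nonce


if __name__ == "__main__":
    challenge = "example-challenge"      # hypothetical; a real one would be random
    nonce = solve(challenge, zeros=3)    # about 16**3 = 4096 guesses on average
    print(nonce, verify(challenge, nonce, zeros=3))
```

The solver loops thousands of times; the verifier runs the same hash exactly once. In a browser the loop would be JavaScript in a worker, but the shape is the same.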
What this site actually does
You probably already saw it. That bunny on the loading screen when you first showed up? While you were watching it, your browser was quietly grinding through hashes in the background to solve one of these puzzles. Took maybe a second. Then the site appeared and you forgot about it.
Under the hood, the server hands out a random challenge and signs it, so it only accepts puzzles it actually issued. (Otherwise you could just invent your own, solve it once, and reuse it forever, which would defeat the point.) Your browser solves it in a worker thread and sends the answer back; the server re-checks it with one hash and drops a cookie. That cookie is what actually guards the stuff that matters, like posting on the message wall.
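One way to sketch the server side of that in Python, using an HMAC as the signature. Everything named here is an assumption for illustration: `SECRET_KEY`, `MAX_AGE_SECONDS`, the dot-separated challenge format, and the expiry check itself, which the description above doesn't mention but which is one common way to keep a solved puzzle from being replayed indefinitely.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-only secret
MAX_AGE_SECONDS = 120                 # assumed lifetime of a challenge


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_challenge(now=None) -> str:
    """Return 'random.issued_at.signature' for the browser to solve."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{secrets.token_hex(16)}.{issued_at}"
    return f"{payload}.{_sign(payload)}"


def accept(challenge: str, nonce: int, zeros: int, now=None) -> bool:
    """Check signature and age first, then the proof of work with one hash."""
    try:
        rand, issued, tag = challenge.split(".")
        issued_at = int(issued)
    except ValueError:
        return False  # malformed challenge
    expected = _sign(f"{rand}.{issued}")
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return False  # not a challenge this server handed out
    current = time.time() if now is None else now
    if current - issued_at > MAX_AGE_SECONDS:
        return False  # too old to redeem
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * zeros)
```

If `accept` returns true, the server would set its cookie. The signature check is what stops a client from inventing an easy or pre-solved challenge, and it means the server doesn't have to remember which challenges it handed out.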
Two things I cared about. The page itself is always in the HTML, so Google and anyone with JavaScript off still read everything fine; the puzzle only guards the interactive bits. And if anything goes wrong (a timeout, a slow phone, a worker that errors out), the site just lets you in anyway. I would rather a scraper occasionally slip through than lock out one real person who came to read.
Why I didn't just use a captcha
Because I find captchas genuinely insulting. They make the human do the work, squinting at fire hydrants to prove they're not a robot, and half of them quietly ship your behaviour off to some third party while they're at it. Cookie banners and "press and hold to continue" walls are the same energy: friction you feel every single time, all of it pointed at the visitor.
Proof of work flips who pays. The machine does the grinding, not you. It happens once, in the time it takes the page to fade in, and it doesn't need a single third-party script or a banner asking permission to track you. For a site that's trying to stay quiet and respect the people reading it, that trade felt obvious.
The catch, because there's always one
This won't stop a determined scraper, and I'm not going to pretend it will. If someone really wants my pages and has money to burn, they'll just pay the CPU cost or run a real headless browser that solves the puzzle like everyone else. What it kills is the cheap, lazy, scrape-a-million-pages-before-lunch stuff, which is most of it.
The dial is difficulty: more required zeros means more work for the bots, but also a longer wait on a cheap phone. Crank it too high and congratulations, you've reinvented the cookie wall you were trying to avoid. Finding the setting where a human never notices but a mass crawl can't afford it is most of the actual work, and I'm still tweaking mine.
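To get a feel for the dial, here is some rough arithmetic in Python. It assumes difficulty is counted in leading hex zeros, so each extra zero multiplies the expected work by 16. The hash rates are invented placeholders, not measurements; real numbers would have to come from timing actual devices.

```python
def expected_hashes(zeros: int) -> int:
    """Average guesses needed: each hex digit is zero with probability 1/16."""
    return 16 ** zeros


def expected_seconds(zeros: int, hashes_per_second: float) -> float:
    return expected_hashes(zeros) / hashes_per_second


# Hypothetical hash rates, for illustration only.
ASSUMED_RATES = {"cheap phone": 50_000, "laptop": 500_000}

if __name__ == "__main__":
    for zeros in range(3, 7):
        row = ", ".join(
            f"{name}: {expected_seconds(zeros, rate):.2f}s"
            for name, rate in ASSUMED_RATES.items()
        )
        print(f"{zeros} zeros (~{expected_hashes(zeros):,} hashes) -> {row}")
```

A jump of 16x per step is a coarse dial. Counting leading zero bits instead of hex digits doubles the work per step, which leaves more room to find a setting that a phone tolerates and a mass crawl doesn't.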
Anyway, you should try it
I think more sites should do this. It sits in the gap between doing nothing and hiding behind a captcha vendor, it's a couple hundred lines, it leans on a hash function every browser already ships, and it asks nothing of your visitors' privacy. Worst case for a real person is a one-second wait they'll never think about again.
Publishing on the open web means giving everyone a free copy, and I still think that's worth it. Charging the robots a sliver of electricity to take theirs isn't closing the door. It's just asking them to chip in. If you run anything, it's worth a look.
