I have a client with a WordPress site who is getting lots of spam form submissions. For a long time I have used Ben Marshall’s excellent WordPress Zero Spam plugin, which was originally a simple implementation of a simple idea from David Walsh for preventing spam.
David’s idea was kinda like a reverse honeypot. A traditional honeypot is a hidden HTML input that is empty. It tempts the bots into filling it in with a value. Because it’s hidden a human won’t see it and fill it in, so a submitted honeypot value probably means a bot filled the form in.
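For comparison, the server-side check for a traditional honeypot can be sketched in a few lines of Python. This is only an illustration: the field name `website` and the shape of `form_data` are assumptions, not taken from any of the plugins mentioned here.

```python
def is_probably_bot(form_data: dict, honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field came back with a value."""
    # A human never sees the hidden input, so it should be absent or blank.
    return str(form_data.get(honeypot_field, "")).strip() != ""


# Hypothetical submissions: the bot filled in every input it found.
human = {"name": "Ada", "message": "Hello", "website": ""}
bot = {"name": "x", "message": "buy now", "website": "http://spam.example"}
```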
David Walsh used JavaScript to ADD an input to the form, and the spam test was to detect this additional input. All this really did was say “this form was submitted by a browser that runs JavaScript”.
There are two problems with this method.
Firstly, it requires JavaScript to be working.
And secondly, spam is clearly an arms race, and the game is to stay one step ahead of the bots. Most bots, it seems, aren’t currently running headless browsers that run JavaScript. I always thought the David Walsh method would soon be defeated, but it’s been working well for the many years I’ve been using it.
BUT… Ben Marshall (note: I’m REALLY grateful for Ben’s work and I have every respect for him) removed the David Walsh method from version 5 of his plugin, and then added it back in, but it only currently seems to work for comment forms and WordPress registration forms. And I need it to integrate with WP Forms and Gravity Forms.
Another implementation of the David Walsh method exists in Ben Gillbanks’s excellent, multi-purpose “Toolbelt” plugin. But this only works for Gravity Forms and Jetpack in addition to the standard WordPress forms.
NOW… both these plugins are open source, so I could pull-request the features I need into the plugins. And I may do that, but it will take time to understand the plugin structure and add it and get the changes approved and published by the authors.
My client needs a quick fix.
And it’s always bugged me that JavaScript is needed for the David Walsh method. So I got thinking… can we do something without JavaScript?
An idea
So here’s the idea.
As the bots don’t seem to be running browsers, I assume they are just scraping the form, figuring out the input fields, and POST’ing responses to the action URLs.
So I thought: we don’t want JavaScript, but what might we do with CSS? I eventually settled on the idea that we can load an image with CSS and this can hit a server endpoint. If we can do this conditionally based on some user action then we can tell the server we are real.
The server can then add the IP address of the browser to an allow list temporarily (we’ll come back to this) and allow form submissions from it!
Initially I thought you could use the “active” state on the submit button element to trigger a load of a background image “pixel” on an adjacent element using something like:
input[type="submit"]:active + .pixel {
    background-image: url(<pixel-url>);
}
I chose active over focus because I wasn’t sure if a mobile device button press would trigger focus.
The problem with active is: does this give enough time before the form submits to do the allow-listing that the pixel URL/endpoint does?
In my tests the answer was no. I thought I would get a race condition, but it actually seemed that the browser cancelled the background-image load to submit the form.
So I needed another approach.
I eventually discovered the focus-within state. This state is set on an element when the element itself or any of its descendants has focus. And this meant that I could load the pixel and do the allow-listing while the user is filling in the form, using CSS like this:
form:focus-within .pixel {
    background-image: url(<pixel-url>);
}
We can then use the pixel URL endpoint to add the browser’s IP address to the allow-list for a set period of time (would have to be long enough for the user to fill in the form), and then check the allow list when the form is submitted.
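The two server-side halves of this (the pixel endpoint adds the IP address, the form handler checks it) can be sketched as a small expiring allow-list. This is a minimal in-memory Python illustration, not the Laravel or WordPress code; the class name, the 15-minute default, and the example IP addresses are all assumptions.

```python
import time


class TemporaryAllowList:
    """In-memory allow-list whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 900.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._expires_at = {}    # ip address -> expiry time

    def allow(self, ip: str) -> None:
        """Called by the pixel endpoint when the browser requests the image."""
        self._expires_at[ip] = self._clock() + self.ttl_seconds

    def is_allowed(self, ip: str) -> bool:
        """Called by the form handler before accepting a submission."""
        expiry = self._expires_at.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires_at[ip]  # lazily drop stale entries
            return False
        return True


allow_list = TemporaryAllowList(ttl_seconds=900)
allow_list.allow("203.0.113.7")                   # the pixel URL was requested
accepted = allow_list.is_allowed("203.0.113.7")   # form submission is accepted
rejected = allow_list.is_allowed("198.51.100.9")  # never loaded the pixel
```

The expiry window is the trade-off: too short and slow typists get blocked, too long and an allow-listed address stays open for longer than it needs to.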
I initially tested this as a proof of concept with a simple Laravel application form.
This worked and proved the concept. But as a real test I needed to get it quickly onto my client’s site that had lots of spam, so I quickly cooked up a one-file WordPress plugin that integrated this method with WP Forms.
I tested this locally with success.
Then I deployed it to the client site and waited to see if it worked.
Update: My client has had two days with no spam, the logs I’m creating show that spam is being blocked, and we’re also seeing genuine enquiries getting through. It WORKS!! (I think!!)
Questions and explanations
Here are some things I’ve got questions about and some things that I’ve already thought through.
How is this better than a traditional honeypot?
Bots seem to have figured out how to circumvent some traditional honeypots. I’ve always found the David Walsh technique to be more effective.
How is this better than the David Walsh technique?
It doesn’t use JavaScript. And it’s MINE!
Storing the allow-list
This, like all spam-prevention solutions, is not perfect. For one thing, I’m using transients to store the allow-listed IP addresses. The reason is that if I stored a single option containing a list of IP addresses, concurrent updates could race with each other and cause some IP addresses not to make it onto the list. The downside is that one transient per IP address will clutter up the options table. So I need to re-think that.
Generating and testing a nonce has been suggested as an alternative. These are not stored in the database so are cleaner. But there are probably page-caching issues. I will investigate this.
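The race condition with a single shared list is a classic lost update, and it can be shown deterministically. In this Python sketch a plain dict stands in for the options table, and the key names and IP addresses are hypothetical.

```python
# A dict stands in for the options table; all key names are hypothetical.
store = {"allowed_ips": []}

# Two pixel requests arrive together; each reads the list before either writes.
snapshot_a = list(store["allowed_ips"])
snapshot_b = list(store["allowed_ips"])
store["allowed_ips"] = snapshot_a + ["203.0.113.7"]
store["allowed_ips"] = snapshot_b + ["198.51.100.9"]  # overwrites request A
shared_list_result = store["allowed_ips"]

# One key per IP: each request writes its own entry, so nothing is overwritten.
per_ip_store = {}
per_ip_store["allow_203.0.113.7"] = True
per_ip_store["allow_198.51.100.9"] = True
```

With the shared list, only the second address survives. With one key per address, both do, at the cost of one stored entry per visitor.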
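To show what "not stored in the database" could look like, here is a generic signed, expiring token in Python. It is not WordPress's own nonce implementation, and how it would be wired into the pixel request is left open; `SECRET_KEY`, the function names, and the one-hour lifetime are assumptions.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep out of source control


def make_token(now=None) -> str:
    """Return 'timestamp.signature'; nothing is stored server-side."""
    issued_at = str(int(time.time() if now is None else now))
    signature = hmac.new(SECRET_KEY, issued_at.encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"


def token_is_valid(token: str, max_age_seconds: int = 3600, now=None) -> bool:
    """Check the signature first, then check the token has not expired."""
    issued_at, _, signature = token.partition(".")
    expected = hmac.new(SECRET_KEY, issued_at.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(issued_at) <= max_age_seconds
```

The expiry is also where page caching bites: a page cached for longer than `max_age_seconds` would hand out a token that is already stale.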
Won’t bots just allow-list themselves by hitting the pixel URL?
I guess the idea here is that this is not a widely-known spam prevention measure. The aim is to get another step ahead of the bots. The bots go for the easy wins – they attack the holes that are likely to get them access across a large number of sites. So diversity of spam prevention solutions seems good.
I think I could get another step ahead by periodically changing the pixel URL and the class name on the pixel element.
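One way to rotate both names without storing anything is to derive them from a secret and the current time window. This Python sketch only illustrates that idea; the secret, the daily rotation period, the `/px/` URL scheme, and the `c-` class prefix are all assumptions.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical
ROTATION_SECONDS = 24 * 60 * 60             # rotate daily; an arbitrary choice


def rotating_names(now=None):
    """Return (pixel_path, css_class) for the current rotation window."""
    current = time.time() if now is None else now
    window = int(current // ROTATION_SECONDS)
    digest = hmac.new(SECRET_KEY, str(window).encode(), hashlib.sha256).hexdigest()
    pixel_path = f"/px/{digest[:16]}.gif"  # hypothetical URL scheme
    css_class = f"c-{digest[16:24]}"       # letter prefix keeps the class name valid CSS
    return pixel_path, css_class
```

The endpoint would probably need to accept the previous window's path as well, so that a form loaded just before a rotation still works.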
Won’t this stop working once bots start using headless browsers?
Yes. But at that point, aside from implementing proper (Re)CAPTCHAs, I think we’re probably screwed.
Accessibility and screen-readers and browser support
I probably need to make a couple of accessibility improvements, mainly properly hiding the pixel element from screen readers.
I’m also not sure how this will work with some screen readers. Can I assume that screen readers set the focus state? Do screen readers support the focus-within state?
And focus-within doesn’t work on IE11.
One thought I’ve had is that you could detect focus and enable the pixel using JavaScript as well as CSS. Then CSS acts as a fallback for when JavaScript isn’t available.
What if CSS is disabled?
It seems this can be done. But I wonder what the probability is that both CSS and JS are unavailable in the real world? My CSS is inlined, so it doesn’t need an extra file to be loaded. The only case would be if someone had deliberately disabled CSS. Does anyone do that?
Are there privacy/data protection/GDPR issues?
IP addresses are classed as personal data. But if someone’s sending you a form submission then you’re probably already collecting personal data. I would just be clear in the privacy policy that the IP address is stored temporarily for spam prevention purposes.
Will you be making this into a WordPress plugin/Laravel package?
Well, you can see the simple WordPress plugin code I got working with WP Forms. That’s currently being tested on a site that I know gets spam form entries.
You can take that and try to make your own integrations/plugins/packages. I’d love people to take this concept and run with it.