Pulse
7IT Solutions
Custom Software

Smart Scraping Bots That Fake an iPhone: How Cloudflare and Vercel Fight Back

Lior Aharonov · 10 min read

It usually starts as a shape you cannot explain. Traffic climbs at odd hours, your serverless bill creeps up, the analytics stop matching reality, and the site feels a little slower for everyone, yet none of it lines up with real customers or real sales. Somewhere in the logs sits a request that swears it is an iPhone in someone's hand, except it arrived from a server farm on the other side of the world, and its twin shows up ten thousand times an hour. That is a scraping bot, and the modern ones are very good at hiding.

The reason this is worth your attention is not abstract. A scraper that lifts your prices hands a competitor your strategy. One that copies your catalog and content can republish it elsewhere or feed it into a model you never agreed to train. The raw traffic inflates your hosting and function costs, slows the experience for the customers who are actually trying to buy, and quietly poisons the numbers you use to make decisions. The encouraging part is that you do not have to accept any of it, and you do not need to become a security company to stop it. The platforms you are likely already using, Cloudflare and Vercel, ship the tools that turn this from a slow drain into a problem you can keep under control.

What a "smart" scraping bot actually looks like

The crude bots announce themselves. They send an obvious automated user agent, hammer one URL, and are easy to swat. The ones that cost you money do the opposite. They present a perfectly ordinary phone or desktop browser as their identity. They spread their requests across large pools of addresses so no single one looks busy. They pace themselves to resemble a human clicking around, and the more sophisticated ones drive a real headless browser so the page even runs JavaScript. And they go straight for what is valuable: your live prices, your product data, your availability, your written content.

The point is that the bot is not trying to look like a bot. It is trying to look exactly like your best customer, which is why the instinct to "just block the bad ones by name" never gets you very far.

The tell: a phone that lives in a datacenter

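Here is the specific pattern that gives them away, and the example worth understanding in detail. A real iPhone reaches your site over a mobile carrier or a home broadband connection. Those networks have a particular character. A request that presents itself as an iPhone but originates inside a major cloud provider's datacenter is almost never a real iPhone, because real people do not browse your store from inside a server rack. The exceptions are mostly VPN users and privacy relays such as Apple's iCloud Private Relay. That is why the rule should name the specific hosting networks that keep sending you scrapers, rather than every network that is not a home or mobile ISP.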

Every publicly routed address on the internet is announced by a network called an autonomous system. Each autonomous system is identified by its ASN (autonomous system number), so the ASN tells you which provider an IP comes from. That in turn tells you whether traffic is coming from a home ISP, a mobile carrier, or a datacenter, and that single fact is one of the cleanest signals you have. When the user agent claims a consumer device but the ASN belongs to a large cloud network, the outside and the inside of the request disagree. The big Chinese clouds such as Alibaba Cloud, Tencent Cloud, and Huawei Cloud are frequent examples, alongside many other datacenter networks worldwide. A consumer device on the surface, a server farm underneath. That mismatch is the thread you pull.
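As a minimal sketch of that check, the TypeScript below flags a request whose user agent claims a phone while its ASN sits on a datacenter list. It assumes you already have the client's ASN, either from your edge provider or from an IP-to-ASN database. The function names are hypothetical. The ASN numbers are illustrative entries for Alibaba Cloud, Tencent Cloud, and Huawei Cloud, and you should verify them against a current ASN lookup and your own logs before relying on them.

```typescript
// Hypothetical helper: does a request claim to be a phone while coming
// from a datacenter network? ASN values below are illustrative only.
const DATACENTER_ASNS: ReadonlySet<number> = new Set<number>([
  45102, 37963, // Alibaba Cloud (illustrative, verify before use)
  132203, 45090, // Tencent Cloud (illustrative, verify before use)
  136907, 55990, // Huawei Cloud (illustrative, verify before use)
]);

// Crude check for a mobile user agent; the UA is only the claim under test.
const MOBILE_UA = /\b(iPhone|iPad|Android)\b/i;

function claimsMobileDevice(userAgent: string): boolean {
  return MOBILE_UA.test(userAgent);
}

// True when the surface (user agent) says "consumer phone" but the
// network underneath (ASN) says "server farm".
function isPhoneFromDatacenter(userAgent: string, asn: number): boolean {
  return claimsMobileDevice(userAgent) && DATACENTER_ASNS.has(asn);
}

const iphoneUA =
  "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1";

console.log(isPhoneFromDatacenter(iphoneUA, 45102)); // true
console.log(isPhoneFromDatacenter(iphoneUA, 64500)); // false: ASN not on the list
console.log(isPhoneFromDatacenter("curl/8.4.0", 45102)); // false: no phone claim
```

The substring match on the user agent is deliberately crude. The user agent is only the claim being tested, and the ASN is the evidence.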

Why blocking by user agent never works

It is tempting to read that user agent and simply ban it. The problem is that a user agent is just a line of text the client chooses for itself, and anyone can set it to anything. Block a scraping library's default and it becomes "iPhone." Block that and it becomes "Chrome on Windows." You are playing whack-a-mole against a string the attacker rewrites for free, and along the way you risk blocking real customers whose browsers happen to share that text.

Durable protection comes from combining signals the bot cannot cheaply fake: where the request truly comes from, what its connection actually looks like underneath the claims, and how it behaves over time. That is exactly what the platforms below are built to do.
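Here is a hedged sketch of how combining those three kinds of signal might look in code. The signals are network origin, whether the TLS fingerprint fits the browser the user agent claims, and request rhythm. Every name and threshold here is a hypothetical placeholder, and in practice Cloudflare or Vercel evaluate such signals for you at the edge. The point is the shape of the decision, not the numbers.

```typescript
// Hypothetical sketch: combine signals that are expensive for a bot to fake.
type NetworkType = "residential" | "mobile" | "datacenter";
type Verdict = "allow" | "challenge" | "block";

interface RequestSignals {
  network: NetworkType;              // where the request truly comes from (ASN class)
  tlsMatchesClaimedBrowser: boolean; // does the TLS fingerprint fit the user agent?
  requestsLastMinute: number;        // how this client has behaved recently
}

// Placeholder thresholds for illustration, not recommendations.
const RATE_SOFT_LIMIT = 60;
const RATE_HARD_LIMIT = 600;

function decide(s: RequestSignals): Verdict {
  // An extreme request rate is decisive on its own.
  if (s.requestsLastMinute > RATE_HARD_LIMIT) return "block";

  let suspicion = 0;
  if (s.network === "datacenter") suspicion += 1;
  if (!s.tlsMatchesClaimedBrowser) suspicion += 2; // the costume does not fit
  if (s.requestsLastMinute > RATE_SOFT_LIMIT) suspicion += 1;

  if (suspicion >= 3) return "block";
  if (suspicion >= 1) return "challenge";
  return "allow";
}

console.log(decide({ network: "mobile", tlsMatchesClaimedBrowser: true, requestsLastMinute: 4 })); // allow
console.log(decide({ network: "datacenter", tlsMatchesClaimedBrowser: true, requestsLastMinute: 4 })); // challenge
console.log(decide({ network: "datacenter", tlsMatchesClaimedBrowser: false, requestsLastMinute: 4 })); // block
```

A datacenter origin on its own only earns a challenge, which keeps VPN users and similar edge cases reachable. A datacenter origin combined with a TLS fingerprint that contradicts the user agent escalates to a block.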

What Cloudflare gives you

Cloudflare sits in front of your site and sees a vast slice of the internet's traffic, which is what makes its detection strong. The features worth knowing, roughly from simplest to most advanced:

  • Bot detection and scoring. Bot Fight Mode on the free plan and Super Bot Fight Mode on Pro and Business plans sort traffic into automated and verified-bot categories. Bot Management on Enterprise plans goes further and gives every request a numeric bot score. All three are built on detection trained across Cloudflare's whole network. You can challenge or block traffic classified as automated while letting verified good bots like Googlebot pass untouched.
  • WAF custom rules that combine signals. This is where you encode the iPhone-from-a-datacenter rule in plain logic: if the user agent looks like a mobile device and the request comes from a datacenter ASN you never sell to, then challenge or block it. One rule can weigh ASN, country, bot score, threat score, user agent, and the path being requested together.
  • ASN and country targeting. Cloudflare lets you match on the exact network an address belongs to and on geography, so you can challenge entire datacenter networks that only ever send you scrapers, or regions you do not serve, without disturbing the visitors you want.
  • TLS fingerprinting (JA3 and JA4). Every client leaves a fingerprint in the way it negotiates its secure connection, and that fingerprint reveals the real software making the request no matter what user agent it claims. A scraping tool dressed up as Safari still has the fingerprint of the tool, and Cloudflare can act on the truth rather than the costume.
  • Managed Challenge and Turnstile. Instead of a hard block, you can present a lightweight challenge that most real browsers pass without any interaction and that automated clients generally fail. Turnstile is the privacy-friendly alternative to the old image puzzles. You can place it on logins, signups, and checkout to stop automated abuse with little or no friction for real people.
  • Rate limiting. Cap how many requests a single source can make to a sensitive path, so a scraper machine-gunning your product pages or an API endpoint gets throttled while a real shopper never comes close to the limit.
  • IP reputation. Cloudflare carries a reputation signal for addresses with a history of abuse across its network, so known-bad sources can be challenged automatically before they do anything.
  • Blocking AI scrapers and crawlers. Cloudflare added straightforward controls to block the recent wave of AI training crawlers, and tools that feed misbehaving crawlers decoy content instead of your real data. If you would rather your catalog and your writing not quietly become training material, this is a real lever.
  • Managed rulesets and DDoS protection. Underneath all of it is baseline protection that absorbs volumetric attacks and known exploit patterns before they ever reach your origin.
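To make the iPhone-from-a-datacenter rule concrete, the TypeScript helper below assembles the kind of expression you would paste into a Cloudflare WAF custom rule. It assumes the Cloudflare Rules language fields http.user_agent, ip.src.asnum, and cf.client.bot (the known-good-bots flag). Check these against Cloudflare's current fields reference, since field names do change over time. The helper and its ASN list are illustrative, not a drop-in rule.

```typescript
// Hypothetical helper that builds a Cloudflare Rules language expression for
// "user agent claims a phone AND the ASN is a datacenter you never sell to".
interface PhoneFromDatacenterRule {
  mobileTokens: string[]; // user-agent substrings that indicate a phone
  asns: number[];         // datacenter ASNs (illustrative; verify your own)
}

// Wrap a value as a double-quoted string literal, escaping \ and ".
function ruleString(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildPhoneFromDatacenterExpression(rule: PhoneFromDatacenterRule): string {
  if (rule.mobileTokens.length === 0 || rule.asns.length === 0) {
    throw new Error("need at least one user-agent token and one ASN");
  }
  const uaClause = rule.mobileTokens
    .map((token) => `http.user_agent contains ${ruleString(token)}`)
    .join(" or ");
  const asnClause = `ip.src.asnum in {${rule.asns.join(" ")}}`;
  // Exclude known good bots (e.g. Googlebot) so they are never challenged.
  return `(${uaClause}) and ${asnClause} and not cf.client.bot`;
}

const expression = buildPhoneFromDatacenterExpression({
  mobileTokens: ["iPhone", "Android"],
  asns: [45102, 37963],
});
console.log(expression);
// (http.user_agent contains "iPhone" or http.user_agent contains "Android") and ip.src.asnum in {45102 37963} and not cf.client.bot
```

The contains operator is case-sensitive. The action, ideally Managed Challenge rather than Block at first, is chosen in the rule's settings rather than in the expression.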

What Vercel gives you

If your app runs on Vercel, you have a capable firewall of your own at the edge, and it has grown into a serious bot-protection layer:

  • The Vercel Firewall and custom rules. You write rules on path, IP address, geography, user agent, request headers, and TLS fingerprint to challenge, block, or rate-limit traffic, the same combine-the-signals approach, native to the platform your app already lives on.
  • Attack Challenge Mode. A single switch that puts a verification step in front of suspicious traffic during a bot surge, separating real browsers from automation without taking the site offline for anyone.
  • BotID, invisible bot detection. A check you can put in front of your most sensitive actions (login, signup, checkout, and critical API routes) that catches sophisticated bots without ever showing a human a puzzle.
  • Rate limiting and IP deny lists. Throttle or block abusive sources per route, so one endpoint under attack does not drag down the rest.
  • Automatic DDoS mitigation. Baseline volumetric protection that comes with the platform rather than as an upsell.
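On either platform, rate limiting comes down to counting requests per source and route inside a time window. The in-memory fixed-window limiter below is a hypothetical sketch of that logic only. Cloudflare and Vercel enforce limits at the edge for you, and a real implementation would also need shared storage across instances and eviction of old keys.

```typescript
// Hypothetical in-memory fixed-window rate limiter keyed by client and route.
// Illustrates the logic only; edge platforms enforce limits for you.
class FixedWindowLimiter {
  private readonly limit: number;    // max requests allowed per window
  private readonly windowMs: number; // window length in milliseconds
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed, false if it should be throttled.
  allow(clientKey: string, route: string, nowMs: number): boolean {
    const key = `${clientKey}|${route}`;
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}

const limiter = new FixedWindowLimiter(3, 60_000);
const burst = [1, 2, 3, 4].map(() => limiter.allow("203.0.113.7", "/api/prices", 1_000));
console.log(burst); // → [true, true, true, false]
console.log(limiter.allow("203.0.113.7", "/checkout", 1_000)); // true: separate route
console.log(limiter.allow("203.0.113.7", "/api/prices", 61_000)); // true: new window
```

A fixed window is the simplest scheme, but it allows a burst at each window boundary. Sliding-window or token-bucket limiters smooth that out at the cost of a little more state.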

Defense in depth: Cloudflare in front of Vercel

For many projects the strongest setup is layered. Cloudflare sits at the very edge handling DNS, the firewall, the ASN and bot-score rules, and rate limiting, so the bulk of bad traffic is turned away before it costs you a single serverless invocation on Vercel. Vercel's own firewall and BotID then stand guard over the specific endpoints that matter most. Cheap, high-volume scraping dies at the Cloudflare edge, and anything clever enough to slip through meets BotID at the door of your checkout or your API. Fronting Vercel with Cloudflare also has setup details that must be right, or you risk breaking caching or origin verification. That is part of why this rewards deliberate configuration rather than flipping switches and hoping.

We run production backends on Vercel every day, including the customs-invoice.com platform, so this is familiar ground rather than a science project. The webhook and edge discipline behind it is the same as in our guides to Revolut on Vercel and Stripe in a headless storefront.
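A common way to handle the origin-verification detail is to have Cloudflare attach a secret header to every request it forwards, for example with a Transform Rule that sets a request header. The Vercel side then rejects anything that arrives without that header, since such traffic bypassed Cloudflare entirely. The sketch below shows only the check. The header name x-edge-auth, the helper names, and how you store the secret are all hypothetical. In a Next.js app you might call something like this from middleware, or express the same condition as a Vercel Firewall custom rule on the request header.

```typescript
// Hypothetical origin check: Cloudflare is assumed to add a secret header to
// every request it forwards; requests without it bypassed Cloudflare.
const EDGE_HEADER = "x-edge-auth"; // placeholder header name

type HeaderBag = Record<string, string | undefined>;

// Compares every character instead of stopping at the first mismatch, so the
// time taken does not reveal how much of the secret a guess got right.
// (Best effort: JavaScript engines make no constant-time guarantees.)
function sameSecret(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt past the end returns NaN; `| 0` turns NaN into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// Header names are assumed to be lowercase, as most frameworks provide them.
function cameThroughCloudflare(headers: HeaderBag, expectedSecret: string): boolean {
  const presented = headers[EDGE_HEADER];
  if (presented === undefined || expectedSecret.length === 0) return false;
  return sameSecret(presented, expectedSecret);
}

const edgeSecret = "replace-with-a-long-random-value";
console.log(cameThroughCloudflare({ [EDGE_HEADER]: edgeSecret }, edgeSecret)); // true
console.log(cameThroughCloudflare({}, edgeSecret)); // false: request skipped Cloudflare
```

Keep the secret out of source control, and rotate it if it ever leaks. Anyone who has it can reach your origin directly.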

An honest word on what protection can and cannot do

No setup blocks every bot, and anyone who promises otherwise is selling something. The most determined scrapers rent residential addresses and drive real browsers, which is genuinely expensive for them, and that expense is exactly the point. Good protection raises the attacker's cost and kills the cheap, high-volume majority outright, while challenges and fingerprinting catch most of what remains. The goal is not a flawless wall. It is to make your site a poor target, to protect the endpoints and the data that actually matter, and to do all of it without ever getting in the way of a real customer or a search engine you want to be found by.

How we set this up without breaking real traffic

Bot rules are powerful precisely because they can block traffic, which means a careless rule can turn away the customers you worked to earn. So the process is the safeguard.

  • Discovery from your real logs. We start by looking at what is actually hitting you, which ASNs, which paths, which fingerprints, and at what rhythm, so the rules target the real offenders rather than a guess.
  • A fixed-scope first phase. Usually that means standing up the highest-value protections first: the datacenter-pretending-to-be-a-phone pattern, rate limits on the paths under pressure, and a challenge on the forms being abused, measured against your traffic before and after so you can see the difference.
  • Demos and careful tuning. You watch the change in your own analytics and logs, and we tune deliberately to make sure no real customer, and no good bot like Googlebot, ever gets caught in the net.
  • You own all of it. Your Cloudflare account, your Vercel project, your rules, with direct access to the developer who wrote them and ongoing adjustment as the bots adapt, because they will.
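The discovery step can start as simply as the sketch below. It keeps only the log entries whose user agent claims a phone, then counts them by ASN and path to show who is really behind the traffic. The LogEntry shape and the function names are hypothetical, since every log export looks different. Request rhythm (the timing between hits) is left out to keep the example short.

```typescript
// Hypothetical discovery sketch: among requests whose user agent claims a
// phone, count hits per (ASN, path) and return the heaviest sources.
interface LogEntry {
  asn: number;
  path: string;
  userAgent: string;
}

interface SourceCount {
  asn: number;
  path: string;
  hits: number;
}

const PHONE_CLAIM = /\b(iPhone|iPad|Android)\b/i;

function topPhoneClaimSources(entries: LogEntry[], limit: number): SourceCount[] {
  const counts = new Map<string, SourceCount>();
  for (const entry of entries) {
    if (!PHONE_CLAIM.test(entry.userAgent)) continue; // only "I'm a phone" claims
    const key = `${entry.asn}|${entry.path}`;
    const row = counts.get(key) ?? { asn: entry.asn, path: entry.path, hits: 0 };
    row.hits += 1;
    counts.set(key, row);
  }
  // Heaviest first: these are the candidates to check against ASN ownership.
  return Array.from(counts.values())
    .sort((a, b) => b.hits - a.hits)
    .slice(0, limit);
}

const phoneUA = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)";
const sample: LogEntry[] = [
  { asn: 45102, path: "/api/prices", userAgent: phoneUA },
  { asn: 45102, path: "/api/prices", userAgent: phoneUA },
  { asn: 64500, path: "/", userAgent: phoneUA },
  { asn: 45102, path: "/api/prices", userAgent: "curl/8.4.0" },
];
console.log(topPhoneClaimSources(sample, 1)); // → one row: ASN 45102, /api/prices, 2 hits
```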

Proof, not promises

Keeping infrastructure fast, affordable to run, and protected from abuse is part of the everyday substance of our work, not a side service. We run real production software on Vercel, including the customs-invoice.com compliance platform and storefronts with live payments such as LeO-Optic, and the same care for clean, owned, well-defended infrastructure runs through our eCommerce work as WooSmiths. If you are weighing how much of your stack to keep under your own control, our piece on owning your project on Vercel makes the broader case.

If your traffic graphs have a shape you cannot explain, or you suspect someone is scraping your prices and your own bill is paying for the privilege, tell me what you are seeing and I will map out the Cloudflare and Vercel protection that fits your project, without locking out the customers and crawlers you actually want.
