Detecting curl_cffi after TLS impersonation: the ClientHello length signal
curl_cffi is the Python wrapper around curl-impersonate, a libcurl fork that ships browser-accurate TLS handshakes. It's a go-to tool for scraping sites protected by TLS fingerprinting: one pip install curl_cffi and two lines of code later, the resulting requests carry a JA3 hash identical to the target browser's. If your bot detection stops at "does the JA3 match Chrome?", curl_cffi defeats you.
What survives correct JA3 impersonation is the raw ClientHello byte length. Across the curl_cffi impersonation profiles we've observed, length values fall on a recognizable grid that real browsers don't produce. This post explains why the signal exists, what it does and doesn't catch, and where it sits in a multi-signal detection stack.
What this looks like in practice
A common scenario where this matters: an AI-image-generator SaaS whose free trial gives every new signup 10 generations on the house. Friday morning the founder wakes up to a $400 OpenAI bill alert. Logs show 247 signups overnight, all between 2:14am and 3:01am, all carrying Mozilla/5.0 ... Chrome/130.0.0.0 Safari/537.36 User-Agents. The email addresses look plausible (first.last.4729@temp-mail.org and similar variations): throwaway temp-mail providers, but well-formed, RFC-shaped addresses. Stripe Radar didn't flag anything because no payment method was required at signup. Cloudflare Turnstile is enabled on the signup page and returned "human" verdicts for all 247 requests because the bot's automation framework executed Turnstile's JavaScript correctly.
Investigation: dump cf.tlsClientCiphersSha1 + cf.tlsClientExtensionsSha1 + cf.tlsClientHelloLength for the 247 fraud signups. Every one of them matches a curl_cffi impersonation profile. The fingerprints look like Chrome 130 to JA3-style checks, but the ClientHello length sits at one of four specific values that real Chrome doesn't produce, all on a 32-byte grid. The bot was using curl_cffi with the impersonate="chrome130" option; whoever scripted it tested against JA3 detection but didn't know about the length signature.
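The investigation step is essentially a group-by over whatever was logged at signup time. The sketch below is minimal and assumes each signup row stored the three request.cf fields under their Cloudflare names. The function name and the sample rows are hypothetical, and the hash strings are placeholders rather than real fingerprints.

```python
from collections import Counter

FIELDS = ("tlsClientCiphersSha1", "tlsClientExtensionsSha1", "tlsClientHelloLength")

def fingerprint_breakdown(signups):
    """Count signups per (ciphers_sha1, extensions_sha1, hello_length) triple.

    `signups` is an iterable of dicts logged with the three request.cf
    fields at signup time (an assumed logging setup, not a Cloudflare API).
    Returns (triple, count, share) tuples, most common first.
    """
    counts = Counter(tuple(row[f] for f in FIELDS) for row in signups)
    total = sum(counts.values())
    return [(triple, n, n / total) for triple, n in counts.most_common()]

# Hypothetical rows; real hash values come from request.cf.
row_a = {"tlsClientCiphersSha1": "cA", "tlsClientExtensionsSha1": "eA", "tlsClientHelloLength": 508}
row_b = {"tlsClientCiphersSha1": "cA", "tlsClientExtensionsSha1": "eA", "tlsClientHelloLength": 540}
rows = [row_a] * 3 + [row_b]

for triple, n, share in fingerprint_breakdown(rows):
    print(triple, n, f"{share:.0%}")
```

If a handful of triples account for nearly all of the suspicious signups, those triples are the candidates to compare against known curl_cffi entries.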
The fix: a TLS-layer filter that flags requests whose (cipher_sha1, extensions_sha1, length) triple matches a known curl_cffi entry. Bot signups stop within 30 minutes of deploying the rule. Real user signup rate is unchanged because the rule doesn't match any real-browser fingerprint.
That pattern (JA3 looks fine, headers look fine, Turnstile passes, but TLS length betrays the library) is the gap the rest of this post explains.
What curl_cffi does at the TLS layer
A TLS ClientHello is the first message a client sends. It includes the cipher suites it supports, the order they're listed in, the TLS extensions present, GREASE values (random-looking placeholders real browsers add to keep middleboxes honest), and length-prefixed extension data inside each extension.
curl-impersonate forks libcurl to use BoringSSL with Chrome's exact configuration and patches the cipher/extension byte order to match a specific browser version. JA3, which hashes cipher suites, extensions, supported groups, and EC point formats, produces an identical value for "curl_cffi impersonating Chrome 136" and "real Chrome 136 on the same OS." JA4 (newer, more granular) likewise matches.
What JA3 and JA4 don't capture is the total ClientHello byte count, the raw length of the handshake record. Cloudflare exposes this as request.cf.tlsClientHelloLength. It's not part of any standard fingerprinting scheme, but it turns out to be highly informative as an extra dimension.
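The toy sketch below makes the JA3-versus-length distinction concrete. It serializes two simplified ClientHello messages from identical cipher, extension, and group lists. The only difference between them is the size of the padding extension (type 21, RFC 7685).

- JA3 is computed from the field lists directly rather than parsed back out of the bytes, which keeps the example short.
- The values are illustrative, not any real browser's or curl_cffi's profile.
- Whether a given edge's length counter includes the 4-byte handshake header (or the 5-byte record header) is an implementation detail this sketch doesn't settle. The 32-byte difference is the point.

```python
import hashlib
import os
import struct

def is_grease(value: int) -> bool:
    # GREASE code points (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def ja3_string(version, ciphers, extensions, groups, point_formats) -> str:
    # JA3 fields: TLSVersion,Ciphers,Extensions,EllipticCurves,EcPointFormats.
    # Values are decimal and '-'-joined, with GREASE values removed.
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), join(ciphers), join(extensions),
                     join(groups), join(point_formats)])

def ja3_hash(*fields) -> str:
    return hashlib.md5(ja3_string(*fields).encode()).hexdigest()

def build_client_hello(ciphers, extensions) -> bytes:
    """Serialize a simplified ClientHello handshake message.

    `extensions` is a list of (type, data) pairs, each written as
    type (2 bytes) + length (2 bytes) + data. Extension contents therefore
    change the total length without changing the JA3 inputs.
    """
    body = struct.pack("!H", 0x0303)                    # legacy_version (771)
    body += os.urandom(32)                              # client random
    body += b"\x00"                                     # empty legacy_session_id
    body += struct.pack("!H", 2 * len(ciphers))
    body += b"".join(struct.pack("!H", c) for c in ciphers)
    body += b"\x01\x00"                                 # one compression method: null
    ext = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    body += struct.pack("!H", len(ext)) + ext
    # Handshake header: msg_type 1 (client_hello) + 24-bit body length.
    return b"\x01" + len(body).to_bytes(3, "big") + body

# Toy values, not a real browser profile.
CIPHERS = [0x1A1A, 0x1301, 0x1302, 0x1303, 0xC02B]      # first entry is GREASE
GROUPS = [0x001D, 0x0017]                                # x25519, secp256r1
POINT_FORMATS = [0]                                      # uncompressed

def hello_with_padding(pad_len: int) -> bytes:
    groups_data = struct.pack("!H", 2 * len(GROUPS)) + b"".join(
        struct.pack("!H", g) for g in GROUPS)
    formats_data = bytes([len(POINT_FORMATS)]) + bytes(POINT_FORMATS)
    return build_client_hello(CIPHERS, [
        (10, groups_data),             # supported_groups
        (11, formats_data),            # ec_point_formats
        (21, b"\x00" * pad_len),       # padding
    ])

short, padded = hello_with_padding(0), hello_with_padding(32)
fields = (0x0303, CIPHERS, [10, 11, 21], GROUPS, POINT_FORMATS)
print(ja3_string(*fields))          # 771,4865-4866-4867-49195,10-11-21,29-23,0
print(len(short), len(padded))      # 75 107
```

Both messages share one JA3 string and hash. Their byte lengths still differ by exactly 32.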
The length pattern
Across the curl_cffi impersonation profiles we've collected in production (Chrome, Firefox, and Safari iOS targets), the ClientHello length is not arbitrary. Each profile produces a small finite set of length values, and within each profile the values fall on a strict 32-byte grid. Different curl_cffi session configurations (impersonate, default_headers, http_version, custom header overrides) cause the library to emit different optional extensions, and the resulting length lands on a predictable multiple-of-32 offset from a base value.
Real browsers don't show this pattern. Real Chrome's ClientHello length is fixed per build: the same Chrome 130 on the same OS produces the same length on every connection. The same holds for real Firefox. Across an entire browser version's userbase, the lengths cluster tightly, often within a 5–10 byte range, and they don't sit on a uniform 32-byte grid.
For one impersonation profile in our data, the pattern is even tighter: the same (cipher_sha1, extensions_sha1) pair produces multiple lengths in steps of 32 bytes. The underlying impersonation is unchanged across session configurations; only the length shifts. For the other profiles, different session configurations produce different (cipher, ext) pairs, but the length values still land on the 32-byte grid.
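The per-pair grid pattern can be checked offline over logged observations. The sketch below is minimal and assumes observations arrive as (cipher_sha1, ext_sha1, hello_length) tuples. It flags a pair as "on the grid" when it has several distinct lengths that all share one residue modulo 32, which is the same as every pairwise difference being a multiple of 32. It covers only the tighter same-pair case described above; the cross-pair case isn't handled here. Function names and sample data are hypothetical.

```python
from collections import defaultdict

GRID_BYTES = 32

def grid_report(observations):
    """Group (cipher_sha1, ext_sha1, hello_length) observations by
    fingerprint pair. For each pair, return (sorted distinct lengths,
    on_grid). on_grid is True when there are several lengths and all
    share one residue mod 32."""
    lengths_by_pair = defaultdict(set)
    for cipher_sha1, ext_sha1, length in observations:
        lengths_by_pair[(cipher_sha1, ext_sha1)].add(length)
    report = {}
    for pair, lengths in lengths_by_pair.items():
        residues = {n % GRID_BYTES for n in lengths}
        report[pair] = (sorted(lengths), len(lengths) > 1 and len(residues) == 1)
    return report

# Hypothetical observations: one family on the grid, one pair whose
# lengths sit a few bytes apart.
obs = [("cA", "eA", 508), ("cA", "eA", 540), ("cA", "eA", 572),
       ("cB", "eB", 1734), ("cB", "eB", 1739)]
for pair, (lengths, on_grid) in grid_report(obs).items():
    print(pair, lengths, "on 32-byte grid" if on_grid else "not on grid")
```

A single observed length is never reported as on the grid, because one sample can't establish the spacing.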
We have not bisected curl-impersonate's source to confirm the mechanism. The plausible candidates are TLS extension padding alignment (the padding extension rounding to 32-byte units instead of padding to a fixed target, as real Chrome does) and GREASE entry rotation (variable insertion of 16-bit GREASE values across configurations). Both are speculation. If you trace it to a specific patch in curl-impersonate, reach us at rozetyp@gmail.com.
Detection approach
A naive TLS fingerprint database stores (cipher_hash, extensions_hash, hello_length) triples and matches strictly. That works against any curl_cffi session whose exact length is in the database. It fails against new session configurations that produce a length we haven't catalogued.
A more useful match generalizes via the 32-byte grid. When a request's cipher and extension hashes match a known curl_cffi entry but the length differs by a non-zero multiple of 32, the request is flagged as the same library at a new session config. The detection generalizes from one observed sample to the broader length family for that fingerprint.
This works cleanly for impersonation profiles where the same (cipher, ext) pair generates multiple length variants. For profiles where each session config produces a different fingerprint pair (only the length lands on the grid), the generalization doesn't apply; those rely on exact matches against entries already in the database. New profiles that ship with future curl_cffi releases evade detection until we observe them and add them.
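The exact and grid-generalized matching described above fits in a small lookup structure. The sketch assumes fingerprints are stored as a map from (ciphers_sha1, extensions_sha1) to the set of lengths observed for that pair. The class, its methods, and the "exact"/"grid" labels are hypothetical. A pair that was never catalogued returns None even at a familiar length, which is the coverage limitation noted above.

```python
from typing import Dict, Optional, Set, Tuple

GRID_BYTES = 32

class CurlCffiFingerprintDB:
    """Hypothetical store of known curl_cffi fingerprints:
    (ciphers_sha1, extensions_sha1) -> set of observed ClientHello lengths."""

    def __init__(self) -> None:
        self._lengths: Dict[Tuple[str, str], Set[int]] = {}

    def add(self, ciphers_sha1: str, extensions_sha1: str, length: int) -> None:
        self._lengths.setdefault((ciphers_sha1, extensions_sha1), set()).add(length)

    def match(self, ciphers_sha1: str, extensions_sha1: str, length: int) -> Optional[str]:
        known = self._lengths.get((ciphers_sha1, extensions_sha1))
        if not known:
            return None        # pair never catalogued: no verdict from this layer
        if length in known:
            return "exact"     # the naive strict triple match
        if any((length - seen) % GRID_BYTES == 0 for seen in known):
            return "grid"      # same pair, length off by a non-zero multiple of 32
        return None

db = CurlCffiFingerprintDB()
db.add("cA", "eA", 508)
print(db.match("cA", "eA", 508))   # exact
print(db.match("cA", "eA", 572))   # grid (64 bytes above a known length)
print(db.match("cA", "eA", 510))   # None (same pair, off the grid)
print(db.match("cB", "eB", 508))   # None (unknown pair)
```

Python's `%` returns a non-negative result for a positive modulus, so lengths below a known value (for example 476) also generalize correctly.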
False-positive risk: a real Chrome handshake with cipher and extension hashes coincidentally identical to a curl_cffi entry, with a length differing by a multiple of 32. Across observed real-browser traffic this has not occurred: the cipher and extension hashes diverge between curl-impersonate and real browsers at the hash level (curl-impersonate replicates byte ordering but not the precise extension contents, which is what the hashes cover).
Limitations and the evasion ceiling
- Database coverage. The technique catches curl_cffi traffic against impersonation profiles we've observed. A curl_cffi update that ships a new impersonation target produces a new fingerprint pair we haven't catalogued. Until we observe and add it, only the generic "short ClientHello" fallback fires (libraries and CLI tools usually produce sub-400-byte handshakes vs Chrome's 1400+).
- Integration mode. This works only when the visitor's TLS handshake terminates at our edge. In server-to-server integrations where the customer's backend forwards visitor headers to our /v1/check endpoint, the TLS handshake we see is the customer's server's connection, not the visitor's. Header signals (RQ4) still fire in that mode; TLS signals don't.
- Real headless Chrome. A motivated attacker uses Playwright instead of curl_cffi. That sends a real Chrome TLS handshake: same length distribution as real users, same cipher and extension hashes. No curl_cffi-targeted signature applies. Defeating that requires a JS challenge for proof-of-execution, a browser fingerprint snippet for canvas/WebGL/audio variance, or behavioral analysis across multiple requests. None of these are TLS-layer signals.
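The limitations above suggest an ordering for the TLS layer's own verdict:

1. If the handshake isn't the visitor's, there is no TLS signal.
2. Otherwise, a database match beats everything else.
3. The generic short-ClientHello fallback applies only when nothing matched.

This is a sketch under those assumptions. It takes the result of a fingerprint lookup as input, and the 400-byte threshold comes from the rule of thumb above. The function name and verdict labels are hypothetical, not a real product API.

```python
from typing import Optional

# Rule of thumb from the list above: libraries and CLI tools usually send
# sub-400-byte ClientHellos, while Chrome sends 1400+.
SHORT_HELLO_BYTES = 400

def tls_verdict(db_match: Optional[str], hello_length: Optional[int],
                handshake_is_visitors: bool) -> str:
    """Combine TLS-layer evidence for one request (hypothetical labels).

    db_match: "exact", "grid", or None from a curl_cffi fingerprint lookup.
    handshake_is_visitors: False in server-to-server mode, where the
    handshake belongs to the customer's backend rather than the visitor.
    """
    if not handshake_is_visitors or hello_length is None:
        return "no_tls_signal"          # defer to header signals such as RQ4
    if db_match in ("exact", "grid"):
        return "curl_cffi"
    if hello_length < SHORT_HELLO_BYTES:
        return "short_client_hello"     # generic fallback for uncatalogued tools
    return "no_tls_signal"              # includes real (headless) Chrome

print(tls_verdict("grid", 540, True))       # curl_cffi
print(tls_verdict(None, 290, True))         # short_client_hello
print(tls_verdict(None, 1750, True))        # no_tls_signal
print(tls_verdict("exact", 540, False))     # no_tls_signal
```

The last call shows the integration-mode gap: even an exact match is discarded when the handshake belongs to the customer's server.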
The TLS layer catches commodity bot traffic: curl_cffi, scrapy, requests, default Selenium, basic tls-client. It doesn't catch sophisticated headless-browser-based attacks. That's the structural ceiling for any TLS-fingerprinting product; closing it requires layers above the TLS handshake.
Related techniques
The TLS length signal is one of several detection layers. The full stack also includes:
- RQ4: per-request header context analysis that catches Sec-Fetch combinations real browsers can't emit. See the RQ4 specification.
- RQ4-S: session-level transition detection that catches cookie-reuse handoff attacks (a real browser solves a JS challenge, hands the cookie to a bot, and the bot's subsequent requests show patterns the original browser never produced).
- ASN intelligence: curated VPN and datacenter ASN lists, plus hosting-org keyword matching.
- RTT physics: speed-of-light validation comparing the client's RTT against the great-circle distance between the IP's claimed geolocation and the Cloudflare PoP.
No single signal catches everything. Detection precision comes from the composite verdict across layers.
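The RTT physics layer listed above reduces to a lower bound. A round trip can't be faster than light in fiber covering the great-circle distance twice. This minimal sketch assumes:

- you already have a measured client RTT in milliseconds;
- you have a (lat, lon) for the IP's claimed geolocation and for the PoP;
- light in fiber travels at roughly 200 km per millisecond (about 2/3 c), which is an approximation.

Function names and the example coordinates are illustrative. A production check would likely add slack for measurement noise.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Approximation: light in optical fiber covers about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def min_possible_rtt_ms(distance_km):
    # A round trip covers the distance twice; real paths are longer, never shorter.
    return 2 * distance_km / FIBER_KM_PER_MS

def rtt_violates_physics(rtt_ms, claimed, pop):
    """True if the measured RTT is shorter than light in fiber allows
    between the IP's claimed (lat, lon) and the PoP's (lat, lon)."""
    return rtt_ms < min_possible_rtt_ms(great_circle_km(*claimed, *pop))

SYDNEY = (-33.87, 151.21)      # example claimed IP geolocation
FRANKFURT = (50.11, 8.68)      # example PoP that terminated the connection
print(round(great_circle_km(*SYDNEY, *FRANKFURT)))     # distance in km
print(rtt_violates_physics(12.0, SYDNEY, FRANKFURT))   # True
print(rtt_violates_physics(290.0, SYDNEY, FRANKFURT))  # False
```

Only the lower bound is a hard signal. A slow RTT is consistent with congestion or routing detours, but a fast one from the wrong continent means the claimed geolocation is wrong.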
Common questions
Why didn't Cloudflare Turnstile catch the bots in the scenario above?
Turnstile is a JavaScript challenge: it presents a small computation to the visitor's browser, which a real browser solves transparently in the background. In the scenario, a script loaded the signup page in a real Chromium instance to solve Turnstile, exported the resulting session cookie, and replayed the actual signup POST through curl_cffi (much faster and cheaper to run at scale than a full Chromium for every request). Turnstile saw a real browser solve its challenge; the subsequent signup request from curl_cffi inherited the "passed" cookie. This handoff pattern is increasingly common against any JS challenge. The TLS layer catches the curl_cffi half of the attack regardless of what the Chromium half passed.
Is curl_cffi illegal? Can I get in trouble for blocking it?
curl_cffi itself is open-source and legal. It's a library for making HTTP requests, used for legitimate purposes (testing, scraping public data within terms of service, etc.). What you're allowed to block on your own site is governed by your Terms of Service and applicable computer-misuse laws in your jurisdiction. In practice: blocking curl_cffi traffic to your signup form, login, or trial-credit-grant endpoints is uncontroversial and well within standard practice. Blocking curl_cffi as a blanket measure across all your routes could affect legitimate users of automation tools (e.g., your own API consumers); target the filter narrowly.
Can I just block all signups from datacenter IPs?
That handles part of the problem: bots running from Hetzner, OVH, AWS EC2, etc., would be blocked. But the attack pattern in the scenario above typically uses residential proxy services ($50-200/month, IPs rented from real ISPs), and ASN-based IP blocking misses residential-proxied traffic entirely. TLS fingerprinting is unaffected by the bot's IP: it sees the connection's TLS handshake regardless of where the IP comes from. The two layers (IP intelligence and TLS fingerprinting) catch different attack classes and complement each other.
Will my legitimate users on corporate VPNs or proxies get caught?
Corporate VPNs use real TLS stacks, typically the OS's TLS library (Windows Schannel, macOS Network.framework). These produce TLS fingerprints distinct from any browser AND distinct from curl_cffi. A user on a corporate VPN visiting your site over their actual Chrome browser produces a real Chrome TLS handshake; the VPN tunnels the encrypted traffic but doesn't re-handshake. False-positive risk on the curl_cffi signature specifically is essentially zero because the cipher and extension hashes of curl_cffi don't collide with any real browser's.
How fast does curl_cffi adapt to new detection?
curl-impersonate (the upstream project) ships new impersonation profiles every few months as target browsers update their TLS configurations. Each new profile produces new cipher and extension hashes that aren't in any pre-existing detection database. A detection system needs to either observe these in the wild and add them, or detect them via the structural length-pattern signal that's somewhat independent of which target is being impersonated. The structural signal (32-byte grid behavior) is more durable; the exact-fingerprint match decays over months as new profiles ship.
Does this catch Playwright or real headless Chrome?
No. Playwright (and Selenium driving real Chrome) launches an actual Chromium process that uses Chrome's own TLS library. The TLS handshake is indistinguishable from a real user's Chrome handshake: same cipher and extension hashes, same length. No curl_cffi-targeted signal applies. Catching that requires layers that observe browser behavior at the JavaScript or DOM layer, not the TLS handshake layer.
Related reading
- RQ4 specification and the reference implementation at github.com/rozetyp/rq4, the open standard for per-request and session-level header analysis
- Cloudflare Workers normalizes Accept-Encoding before your handler sees it, related gotcha that affects header-based detection signals
- Bot detection middleware for Next.js App Router, practical patterns for blocking automation traffic in Next.js, including the TLS-layer caveat that motivated the body-mode integration discussed above
- RQ4-S: detecting cookie-reuse handoff, session-level extension that catches a different attack class (cookie handoff) than the TLS-layer detection covered here
- Comparing fraud-check APIs, how TLS fingerprinting maps against other available signal layers in the market