
Beyond Basic Bot Detection: Humanizing Web Analytics through DX Spotlight

TL;DR

Bot traffic now accounts for nearly 50% of all web traffic and is growing more sophisticated each year, making traditional analytics data unreliable for business decisions. While existing solutions struggle to address this challenge, DX Spotlight’s AI-powered Traffic Classification module identifies and filters out non-human traffic, restoring the human perspective to your analytics data.

Bot Traffic Today

For over a decade, web analytics served as a reliable compass for understanding digital customer behaviour. But that reality has fundamentally shifted. Today, your analytics are probably lying to you – and increasingly sophisticated bots are the culprits.

A staggering 50% of all internet traffic now comes from bots, with malicious (‘bad’) bots alone accounting for 32% in 2023 (source). For businesses tracking their digital presence, this means one in two ‘visitors’ might not be human at all.

Types of Bot Traffic: Breaking Down the Numbers

Legitimate Bots (18% of total traffic):

  • Search engine crawlers indexing content e.g. Googlebot, Bingbot, GPTBot
  • Analytics and monitoring bots e.g. UptimeRobot, ArchiveBot
  • Social media crawlers e.g. Facebookexternalhit, PinterestBot 

Malicious Bots (32% of total traffic):

  • Traffic manipulation bots: Artificially inflate website metrics
  • Web scrapers: Systematically copy data and content
  • Ad fraud bots: Click on paid advertisements or interact with fake ads
  • Account takeover bots: Target login endpoints to gain unauthorized access

The Rising Challenge of Bot Detection

Traditional bot blocking is becoming increasingly ineffective. Modern bad bots don’t just conceal obvious automated markers – they actively mimic human behavior patterns. These bots have evolved to create traffic patterns nearly indistinguishable from legitimate users. 

Cressive DX’s analysis of client data regularly uncovers logically impossible patterns in raw analytics data – like pages being viewed without any corresponding sessions. These anomalies appear regardless of whether organizations use Google Analytics, Adobe Analytics, Matomo or other solutions, and so cannot simply be attributed to the recording systems themselves.
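
To make this concrete, here is a minimal sketch of how such logically impossible rows can be surfaced, assuming a daily per-page export with sessions and pageviews columns (the column names are ours for illustration, not any particular platform’s schema):

```python
import pandas as pd

# Illustrative daily per-page export; the column names are assumptions,
# not any specific analytics platform's schema.
df = pd.DataFrame({
    "date": ["2024-02-01", "2024-02-02", "2024-02-03"],
    "page": ["/pricing", "/blog/post-1", "/product"],
    "sessions": [120, 0, 95],
    "pageviews": [310, 57, 240],
})

# Flag the logically impossible pattern: pageviews recorded without any sessions.
impossible = df[(df["pageviews"] > 0) & (df["sessions"] == 0)]
print(impossible)
```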

When Numbers Lie: The Real Impact on Analytics

The current level of bot infestation in analytics data makes reliable analysis all but impossible. One of our clients, a B2B SaaS brand, noticed a large spike in all traffic metrics during February 2024. The marketing team initially – optimistically – attributed this to their seasonal campaign. However, analysis revealed that 90% of this traffic came from sophisticated bad bots, making the campaign’s true impact unclear.


Another client saw pageviews rise dramatically while sessions increased by 100%. The initial assumption was that their product campaign and updated internal linking structure were paying off. The reality? Bad bots were copying product information, creating artificial pageview inflation while maintaining somewhat realistic session counts.

Chart showing the traffic spike due to bots

These examples highlight a critical problem for digital brands: when bot traffic contaminates analytics data, it doesn’t just skew numbers – it fundamentally undermines the ability to make informed decisions. Marketing budgets risk getting misallocated chasing phantom engagement. Content strategies optimize for bots rather than humans. Most concerningly, brands lose sight of genuine customer behaviour patterns, making it impossible to improve real user experience and drive authentic growth.

“The impact is critical for digital brands: bot traffic contaminates analytics data and fundamentally undermines analysis”

Some Markers of Bot Traffic for Self Diagnosis

While sophisticated bad bot traffic is difficult to quantify and adjust for, there are some markers in your analytics data, whichever analytics source you use, that can indicate its presence. This analysis is best carried out over a month or longer; a rough sketch of these checks follows the list below.

  1. Traffic anomalies: Spikes in sessions and/or pageviews that are substantially higher than average traffic for the period.
  2. Traffic from multiple unexpected locations on the same day.
  3. Very high or very low bounce rates.
  4. 1, 2 and/or 3 above combined with direct or undefined channels.
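
The sketch below flags days whose sessions far exceed a rolling baseline (marker 1), that show traffic from unusually many locations (marker 2), or whose bounce rates are implausibly extreme (marker 3). The data, column names and thresholds are illustrative assumptions, so tune them against your own baselines.

```python
import numpy as np
import pandas as pd

# Synthetic month of daily analytics data standing in for a real export;
# column names and thresholds are illustrative assumptions.
rng = np.random.default_rng(0)
sessions = list(rng.integers(900, 1100, 29))
bounce = list(rng.uniform(0.35, 0.55, 29))
countries = list(rng.integers(8, 15, 29))
sessions[20], bounce[20], countries[20] = 5400, 0.97, 60  # one bot-like day

daily = pd.DataFrame({
    "date": pd.date_range("2024-02-01", periods=29, freq="D"),
    "sessions": sessions,
    "bounce_rate": bounce,
    "distinct_countries": countries,
})

baseline = daily["sessions"].rolling(28, min_periods=7).median()
flags = pd.DataFrame({
    # Marker 1: sessions far above the rolling baseline for the period.
    "session_spike": daily["sessions"] > 2 * baseline,
    # Marker 2: traffic from unusually many locations on the same day.
    "geo_spread": daily["distinct_countries"] > 2 * daily["distinct_countries"].median(),
    # Marker 3: implausibly extreme bounce rates.
    "bounce_extreme": (daily["bounce_rate"] < 0.05) | (daily["bounce_rate"] > 0.95),
})

# Days hitting any marker warrant a closer look, ideally cross-checked
# against direct or undefined channels (marker 4 in the list above).
print(daily[flags.any(axis=1)])
```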

The Inadequacy of Existing Solutions

Most available solutions for managing bot traffic focus on prevention – blocking bots before they can access your website. While this approach seems logical, it faces several fundamental challenges:

Prevention vs Reality

Standard advice typically includes:

  • Installing firewalls e.g. Azure WAF
  • Configuring robots.txt files
  • Blocking suspicious IP addresses
  • Customizing analytics platforms
  • Implementing cloud-based protection services e.g. Cloudflare

While these measures can help reduce bot traffic, they face a fundamental problem: public websites need to remain accessible. Complete bot prevention is nearly impossible without severely impacting legitimate users.
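
To make the limitation concrete, here is a minimal sketch of signature-based blocking, the idea behind many user-agent and IP rules (the signature list and function are illustrative, not any specific firewall’s rule set). It catches self-identifying scrapers but waves through a bot that spoofs a mainstream browser string:

```python
# Minimal sketch of signature-based blocking; the signatures are illustrative,
# not any specific firewall's rule set.
BLOCKED_SIGNATURES = ("python-requests", "curl", "scrapy", "headlesschrome")

def is_blocked(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(sig in ua for sig in BLOCKED_SIGNATURES)

# A self-identifying scraper is caught ...
print(is_blocked("python-requests/2.31.0"))  # True
# ... but a bot spoofing a real browser string is waved through.
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))  # False
```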

“Blocking-based solutions miss a critical point: public websites need to be accessible”

Analytics Platform Limitations

Analytics platforms like Google Analytics 4 have built-in options to counter bot traffic:

  • IP/host blacklisting
  • Referral exclusion lists
  • Built-in bot filtering
  • Custom filters and segments

However, these tools rely on identifying known patterns. Modern bots constantly evolve their behaviour, making static filtering rules increasingly ineffective. Moreover, managing these filters becomes a resource-intensive task with diminishing returns.
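
A minimal sketch of what such static filtering amounts to in practice, using an assumed exclusion list and column names rather than any platform’s built-in configuration, shows the maintenance problem directly:

```python
import pandas as pd

# Sketch of static, rule-based filtering applied to an analytics export;
# the exclusion list and column names are illustrative assumptions,
# not any platform's built-in configuration.
REFERRAL_EXCLUSIONS = {"bot-traffic.xyz", "spam-referrer.example"}

sessions = pd.DataFrame({
    "referrer": ["google.com", "bot-traffic.xyz", "(direct)", "new-bot-domain.example"],
    "pageviews": [3, 41, 1, 55],
})

# Known offenders are removed, but any referrer not yet on the list,
# such as "new-bot-domain.example", slips through until someone updates it.
filtered = sessions[~sessions["referrer"].isin(REFERRAL_EXCLUSIONS)]
print(filtered)
```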

The Missing Piece

Perhaps most importantly, traditional solutions don’t address a critical question: What happens when bots inevitably make it through? For businesses trying to understand their true customer engagement, blocking some bots isn’t enough – they need to identify and filter out non-human traffic from their analytics data. 

“Traditional solutions don’t address a critical question: What happens when bots inevitably make it through?”

Humanizing Analytics through DX Spotlight

Our Traffic Classification module, part of DX Spotlight, directly addresses this challenge. Rather than attempting to block bots – a strategy that often fails against sophisticated automation – the solution focuses on using AI to identify and filter out non-human traffic from analytics data. The module is currently in beta and undergoing continuous R&D, but the results already speak for themselves.

Looking at the B2B SaaS brand example from earlier, the difference between raw traffic (sessions and pageviews) and actual human traffic as determined by DX Spotlight is striking. While raw analytics showed a dramatic spike in sessions, the filtered data reveals a much more stable pattern of genuine user engagement. This clean, bot-free view of traffic provides the reliable foundation that brands need for data-driven decision making.

Bot vs Human-only traffic, as determined by DX Spotlight
Bot vs Human-only sessions, as determined by DX Spotlight
Bot vs Human-only pageviews, as determined by DX Spotlight

How DX Spotlight Traffic Classification Works

  1. Data Ingestion and ETL: Data is collected from any mainstream analytics source.
  2. Pattern Analysis and Classification: An ML-based pipeline classifies traffic as human or bot.
  3. Reporting: Human-only traffic is surfaced in DX Spotlight reports (a simplified, generic sketch of the flow follows below).
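
DX Spotlight’s models themselves are proprietary, but as a rough, generic sketch of what these three steps can look like, the example below ingests session-level rows, fits an off-the-shelf classifier and keeps only the sessions it labels as human. All features, labels and the model choice are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# A minimal, generic sketch of the three steps; features, labels and the
# model choice are illustrative assumptions, not DX Spotlight's internals.

# 1. Data Ingestion and ETL: session-level rows pulled from an analytics source
#    (synthetic here; labels would come from a curated training set).
rng = np.random.default_rng(1)
n = 2000
sessions = pd.DataFrame({
    "pages_per_session": rng.poisson(4, n),
    "avg_seconds_per_page": rng.exponential(30, n),
    "is_direct_channel": rng.integers(0, 2, n),
    "distinct_pages_ratio": rng.uniform(0, 1, n),
})
labels = rng.integers(0, 2, n)  # 1 = bot, 0 = human (synthetic)

# 2. Pattern Analysis and Classification: fit a generic classifier.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(sessions, labels)

# 3. Reporting: keep only sessions classified as human for downstream reports.
sessions["predicted_bot"] = model.predict(sessions)
human_only = sessions[sessions["predicted_bot"] == 0]
print(f"human-only sessions: {len(human_only)} of {len(sessions)}")
```

In practice a classification layer like this would be trained on curated labels and evaluated on held-out data before any traffic is filtered from reporting.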

Benefits for Brand Owners

With bot traffic identified and, most importantly, quantified, DX Spotlight enables analysis of clean, human-only analytics data.

High-level analytics KPIs and consultant’s insights – after removing all bot impact

DX Spotlight even provides adjusted, human-only traffic across key dimensions such as channels, locations, devices and more.

Analyse performance across dimensions with human-only data.

Having human-only data makes strategic month-by-month planning meaningful, for example through our Bowler report, which tracks our clients’ Governance and Site Performance progress against monthly targets.

Imagine how useless large spikes of bot traffic would make this critical report!

Your Analytics, Humanized through DX Spotlight

Clean analytics isn’t just about better numbers – it’s about understanding real customer behavior in a landscape where bots increasingly blur the line between automated and human traffic.

With DX Spotlight’s Traffic Classification module, brands finally see their digital presence through a clear lens: human traffic only.

Discover how much bot traffic is actually affecting your analytics and marketing decisions – schedule a demonstration today.
