Marketing Pilgrim's "Analytics Channel"

Tips for blocking SEMalt and botnet attacks



SUMMARY: Botnets are degrading Website performance and preventing real people from visiting your Websites.  Here is a brief update on the problem and some measures you can take to block one specific botnet operated by SEMalt.

You may know by now that the SEMalt “service” has been probing Websites with fake traffic generated by botnets.  The problem with fighting botnets is that they come from everywhere.  Many of the compromised nodes are Web hosting servers hiding in the cloud or in far eastern and European data centers.  We see them coming out of Africa, South America, and even New Zealand.  US military networks are compromised and rolled into the swarm every few months.

There is no simple defense against botnets.  Even hiding behind a Content Delivery Network (CDN) does not help much, because the botnets still eat up resources.  They whittle away at the available connections on your server(s), they waste CPU cycles, they eat up disk space with thousands of entries in security and Web traffic logs (on a daily basis), and they sometimes beat away at your user accounts and try every password they can generate.

Botnets and cloud services also power the link research tools that you diligent search marketers love so much.  These tools expose your Websites to link placement banditos who send you requests for guest posting, dead link replacement, and other bad practices.  But the SEO tools are nothing compared to what SEMalt is allegedly doing.  According to the latest research, if you download their tools and use them you’ll be installing malware on your computer.

You have few options for blocking botnets.  You can add IP addresses to a firewall but the list adds up quickly.  A few months back I came across a forum discussion where one Webmaster asked other admins if his 25,000 blocked IP addresses were too many (the consensus in replies was that if the server wasn’t choking then it wasn’t – your mileage may vary).

I don’t like blocking individual IP addresses because I have tracked some bots (not botnets, but individual lone agents) cycling through pools of IP addresses but coming from the same server.  Now that all the root IPv4 address blocks have been assigned, scarcity is driving up the cost of acquiring extra IP addresses; still, some bot-operators pay for 10, 20, or even 30 IP addresses so that they can pretend to be multiple visitors coming to your site from their servers.
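One way to spot a single operator cycling through a pool of addresses might be to group the addresses from your logs by their enclosing network and see which networks turn up repeatedly.  The sketch below is only an illustration: the function name count_by_prefix, the /24 grouping size, and the sample addresses (taken from the reserved documentation ranges) are all assumptions, not anything from our own tooling.

```python
import ipaddress
from collections import Counter


def count_by_prefix(addresses, prefix_len=24):
    """Count distinct IPv4 addresses per enclosing network of the given size.

    prefix_len=24 is an arbitrary choice for illustration; real pools may
    be larger or smaller than a /24.
    """
    counts = Counter()
    for addr in set(addresses):  # ignore repeat visits from the same address
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# Hypothetical visitor addresses (documentation ranges, not real offenders)
sample = ["203.0.113.5", "203.0.113.9", "203.0.113.200", "198.51.100.7"]
pools = count_by_prefix(sample)

for network, seen in pools.most_common():
    print(network, seen)
```

A network that shows up with many distinct addresses in a short period is a candidate for a range block rather than a string of individual entries.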

Normal Web traffic rarely comes in from another Web server unless that server is operating a proxy service.  But most Web hosting companies forbid their clients to run proxy services and the most well-known proxy services are being blocked by a growing number of Websites.  The botnets offer the most reliable pools of proxy IP addresses because they are endlessly creeping across the Internet.

WordPress administrators see botnets constantly attempting to leave comments, register accounts, crack passwords, and compromise “xmlrpc.php”.  Botnets will probe your site for vulnerabilities today and return two weeks later with a Brute Force Dictionary Attack (BFDA) that prevents people from seeing your Website.  Botnets significantly degrade the performance of low-cost shared hosting accounts.  Just by blocking compromised IP address ranges in .htaccess I have been able to speed up performance on hundreds of Websites significantly.

And botnets aren’t about to run out of IP addresses.  An Amazon employee once told me they have 4,000,000 IP addresses.   Blocking that many addresses in your firewall or .htaccess file simply isn’t practical unless you block by CIDR range.  A CIDR record describes the RANGE of IP addresses that any particular address is part of (e.g., 185.38.248.0/22).  An ASN lookup is a related but different thing: it identifies the network operator and lists the ranges that operator announces.  ISPs buy/lease these ranges of IP addresses and dole them out to their customers.  Blocking by CIDR rather than by individual IP address is more efficient and takes up less space on your server.
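To make the CIDR idea concrete, here is a small sketch using Python's standard ipaddress module with the example range above.  The helper name in_block and the two test addresses are made up for illustration.

```python
import ipaddress

# The example range from the text: a /22 leaves 10 host bits
block = ipaddress.ip_network("185.38.248.0/22")


def in_block(addr, network=block):
    """Return True if the address falls inside the CIDR range."""
    return ipaddress.ip_address(addr) in network


print(block.num_addresses)        # how many addresses one entry covers
print(block[0], block[-1])        # first and last address in the range
print(in_block("185.38.250.17"))  # an address inside the range
print(in_block("185.38.252.1"))   # an address just past the end
```

A single /22 entry covers 1,024 addresses, which is why one line of CIDR can replace a very long list of individual blocks.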

Unfortunately, some ISPs assign IP addresses to Web hosting and access customers from the same ranges.  When you block botnets there is a chance you are also blocking real visitors, but the rule of thumb is that if a botnet comes in from a Web server that whole CIDR block is probably just being assigned to Web servers.

Botnets are also now using wireless phone networks.  North American businesses may be able to safely block whole Asian mobile networks.  But what should people who serve the Asian Internet markets do when the botnets hammer their sites from Vietnam, China, Malaysia, Taiwan, Pakistan, and Bangladesh?  You think mobile is the future for North America and Western Europe?  I have news for you: the mobile Internet saturated Asia and Africa more than a decade ago.

The SEMalt botnet drives traffic toward a handful of Websites through hundreds, probably thousands of compromised computers.  Blocking those computers’ CIDR ranges is only practical if you don’t want to do business with people around the world.  It is better to block the fake traffic by referrer or user-agent, although you cannot do that in a firewall (iptables on Linux systems); you have to do it in your .htaccess or IIS configuration file.

Furthermore, public attempts to document the SEMalt problem to date have only served to confuse people because the shared notes available on the Web do not distinguish between which domains are used for fake referral traffic and which domains are being used to supply malware to the general (unsuspecting) public.  We are currently blocking four domains in .htaccess but at any time SEMalt can add another domain to their fake traffic referral scheme and we’ll have to go back in and update the .htaccess files.

We’re prepared to do that but unfortunately being prepared doesn’t quite cut the mustard.  Fighting botnets is an ongoing site administration task.  It never ends.  And the most frightening prospect for the future is that eventually the botnets will use only IPv6 addresses.  Most anti-spam/botnet tools cannot yet handle IPv6 addressing.

Apache 2.x installations allow you to use CIDR notation for IPv6 addresses in .htaccess files.  Most other software that I have reviewed to date makes no provision for managing remote connections via IPv6.
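If you maintain your own block list in a script, Python's standard ipaddress module understands IPv6 CIDR notation as well as IPv4.  The following is a minimal sketch under that assumption; the function name is_blocked and the networks shown (reserved documentation ranges) are placeholders, not real botnet ranges.

```python
import ipaddress

# Hypothetical block list mixing IPv6 and IPv4 ranges
blocked_networks = [
    ipaddress.ip_network("2001:db8::/32"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(addr, networks=blocked_networks):
    """Return True if addr (IPv4 or IPv6) falls inside any listed range."""
    ip = ipaddress.ip_address(addr)
    # Compare only against ranges of the same address family
    return any(ip.version == net.version and ip in net for net in networks)


print(is_blocked("2001:db8:abcd::1"))
print(is_blocked("2001:db9::1"))
print(is_blocked("203.0.113.77"))
```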

TIPS FOR BLOCKING BOTNETS

If you are tired of all the rogue traffic coming out of the Amazon AWS network you can stop it all with the following code in .htaccess:

Deny from amazonaws.com

However, a lot of the tools we use to send our RSS feeds to social media services run on the Amazon AWS network.  If you can identify their IP addresses you can “allow” them before the “deny”; otherwise you’ll have to block by CIDR.  Amazon, unfortunately, is just too easily exploited by rogue crawlers (some of which don’t pay for the service).
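Amazon publishes its address ranges as a JSON document, so blocking by CIDR does not have to mean typing ranges by hand.  The sketch below assumes you have already downloaded and parsed that document, and that it has a "prefixes" list whose entries carry "ip_prefix" and "service" fields.  The function name deny_lines and the sample data (documentation ranges, not real Amazon ranges) are hypothetical.

```python
def deny_lines(ip_ranges, service=None):
    """Turn parsed range data into Apache 2.2-style "Deny from" lines.

    ip_ranges is assumed to be a dict with a "prefixes" list of entries
    holding "ip_prefix" and "service" keys. Pass service to keep only
    one service's ranges.
    """
    lines = set()
    for entry in ip_ranges.get("prefixes", []):
        if service is None or entry.get("service") == service:
            lines.add(f"Deny from {entry['ip_prefix']}")
    return sorted(lines)


# Hypothetical sample in the assumed shape
sample_ranges = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "S3"},
    ]
}

for line in deny_lines(sample_ranges, service="EC2"):
    print(line)
```

The output can be pasted into .htaccess.  Any services you still want to reach your site need their own "allow" entries, which this sketch does not attempt to work out.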

If you’re struggling to block the SEMalt network, we have tested several methods in .htaccess files on more than 100 Websites.  We have had mixed results, probably because of different server configurations.  The following code is our latest universal block:

# Block fake traffic

Options +FollowSymLinks
RewriteEngine on
# Block all http and https referrals from "savetubevideo.com" and all subdomains of "savetubevideo.com"
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*savetubevideo\.com [NC,OR]
# Block all http and https referrals from "srecorder.com" and all subdomains of "srecorder.com"
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*srecorder\.com [NC,OR]
# Block all http and https referrals from "semalt.com" and all subdomains of "semalt.com"
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC,OR]
# Block all http and https referrals from "kambasoft.com" and all subdomains of "kambasoft.com"
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*kambasoft\.com [NC]
# Answer any request that matched one of the conditions above with 403 Forbidden
RewriteRule .* - [F]

We have been able to block most but not 100% of the SEMalt traffic using this and a few other variations.  I still see a few instances getting through in referral data.  We’re defending over 100 sites across 6 Web hosting providers and using 3-4 analytics packages, so it’s really hard to ensure that anything works everywhere, all the time.  I was hoping to have this pinned down by today but I see at least one straggling referral on one site today.  I have to wait 24 hours to test any specific fix.
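One way to check a rule without waiting for the next day's referral report might be to send your own server a request with a forged Referer header and look at the status code.  This is only a sketch: the function names build_probe and status_for are made up, the URLs are placeholders, and it only tests how the server treats a request carrying that header, not what your analytics package ends up recording.

```python
import urllib.error
import urllib.request


def build_probe(url, referrer):
    """Build a GET request that claims to come from the given referrer."""
    return urllib.request.Request(url, headers={"Referer": referrer})


def status_for(url, referrer, timeout=10):
    """Send the probe and return the HTTP status code the server answers with."""
    try:
        with urllib.request.urlopen(build_probe(url, referrer), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; a working block shows up as 403
        return err.code


# Placeholder values; substitute your own site and a referrer you block
probe = build_probe("http://example.com/", "http://semalt.semalt.com/crawler.php")
print(probe.get_header("Referer"))
```

Calling status_for against one of your own sites should return 403 for a blocked referrer and 200 for an ordinary one if the rule is doing its job.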

The objective is to block ANY occurrence of <any subdomain>.<any subdomain>.<domain name> as they have been rotating through new subdomains to get past the .htaccess rules.
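To see what the subdomain pattern does and does not catch, you can try the same style of regular expression outside Apache.  The sketch below rebuilds the pattern in Python for the four domains we block; the function name is_blocked_referrer and the sample URLs are illustrative, and Python's regex engine is not identical to the one Apache uses, so treat it as a rough check only.

```python
import re

# The four domains blocked in the .htaccess rules
BLOCKED_DOMAINS = ["savetubevideo.com", "srecorder.com", "semalt.com", "kambasoft.com"]

# "([^.]+\.)*" allows any number of subdomain labels before the blocked domain
PATTERN = re.compile(
    r"^https?://([^.]+\.)*(" + "|".join(re.escape(d) for d in BLOCKED_DOMAINS) + r")",
    re.IGNORECASE,  # plays the role of the [NC] flag
)


def is_blocked_referrer(referrer):
    """Return True if the referrer URL is on a blocked domain or any subdomain of it."""
    return PATTERN.search(referrer) is not None


for url in [
    "http://semalt.com/",
    "https://a.b.SEMALT.com/crawler.php",
    "http://notsemalt.com/",
    "https://www.example.org/page",
]:
    print(url, is_blocked_referrer(url))
```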

If you use low-budget Web hosting and you cannot browse your server logs, but must instead download them in .GZ format, you can download a proper tool for free at GZIP.ORG.  When you use the GZIP program in Windows the syntax is: gzip -d filename.  You can read the uncompressed file in Wordpad.
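If you would rather not decompress the logs by hand, a short script can read the .GZ file directly and tally the referrers.  This sketch assumes the log is in Apache's "combined" format, where the referrer is the quoted field just before the user-agent; the function name count_referrers and the file name in the usage line are hypothetical.

```python
import gzip
import re
from collections import Counter

# Assumes Apache "combined" format: "request" status bytes "referrer" "user-agent"
LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')


def count_referrers(path):
    """Count referrer values in a gzip-compressed access log."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE.search(line.rstrip())
            if match:  # skip lines that do not look like combined-format entries
                counts[match.group("referrer")] += 1
    return counts
```

With a log on disk you could then run something like count_referrers("access.log.gz").most_common(20) to list the busiest referrers.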

Some Web hosting companies won’t allow you to edit your .htaccess file.  It’s there; you just don’t have access to it.  You can install WordPress plugins to allow you to do that but if you make a mistake your entire site goes offline immediately and then you have to reinstall WordPress from scratch.  Make a full backup of your UPLOADS directory and SQL database before editing .htaccess through a plugin.

Some people are using plugins to maintain firewalls on their blogs.  These plugins may or may not be able to block SEMalt and similar rogue services.  You may be able to maintain the block list yourself.

Search for “CIDR record lookup” or “ASN record lookup” to find a tool where you type in a full IP address and see the CIDR range to which it is assigned.

UPDATE: Michael sent us new code to use to block SEMalt:

# Block SEMalt botnet
SetEnvIfNoCase Referer fbdownloader.com spammer=yes
SetEnvIfNoCase Referer descargar-musicas-gratis.com spammer=yes
SetEnvIfNoCase Referer baixar-musicas-gratis.com spammer=yes
SetEnvIfNoCase Referer savetubevideo.com spammer=yes
SetEnvIfNoCase Referer srecorder.com spammer=yes
SetEnvIfNoCase Referer kambasoft.com spammer=yes
SetEnvIfNoCase Referer semalt.com spammer=yes

Order allow,deny
Allow from all
Deny from env=spammer

About Michael Martinez

Michael Martinez has been developing and promoting Websites since 1996 and began practicing search engine optimization in 1998.  He is the principal author of the SEO Theory blog. 

  • http://www.brickmarketing.com/ Nick Stamoulis

    I’ve seen that SEMalt bot show up in every single one of my clients’ reports for the last few months…and not in small numbers! It stinks because it skews the data.

  • http://infolific.com/technology/ Marios Alexandrou

    I’m currently trying out blocking Amazon’s published list of cloud IPs. The reasoning is that bots are likely taking advantage of what’s arguably the biggest and most cost-effective resource. However, I know that there are also legitimate users such as social media monitoring tools. So do I want to exclude my site from those? Still thinking about that…

  • Emil

    Thanks for sharing, just what I was looking for.