  1.     
    #1
    Banned

    Protect Your Site with a Blackhole for Bad Bots

    One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS lookup and records the event in the blackhole data file, which doubles as the blacklist. Once added to that file, bad bots are immediately denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.

    The Blackhole is built with PHP, and uses a bit of .htaccess to protect the blackhole directory. The blackhole script combines heavily modified versions of the Kloth.net script (for the bot trap) and the Network Query Tool (for the whois lookups). Refined over the years and completely revamped for this tutorial, the Blackhole consists of a single plug-&-play directory that contains the following four files:

    • .htaccess – basic directory protection
    • blackhole.dat – server-writable log file (serves as the blacklist)
    • blackhole.php – checks requests against blacklist and blocks bad bots
    • index.php – generates blackhole page, performs whois lookup, sends email, and logs data
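To make the division of labor between these files more concrete, here is a minimal sketch of the logging step that index.php is described as performing. It is not the actual Blackhole source: the function names and the pipe-separated line format are assumptions chosen for illustration.

```php
<?php
// Hypothetical sketch, not the Blackhole's own code.

// Build one log line for a trapped request (the format is an assumption).
function blackhole_format_entry(string $ip, string $userAgent, int $timestamp): string
{
    // Strip characters that would break the one-entry-per-line format.
    $cleanAgent = str_replace(["\r", "\n", "|"], ' ', $userAgent);
    return $ip . ' | ' . gmdate('Y-m-d H:i:s', $timestamp) . ' | ' . $cleanAgent;
}

// Append the entry to the data file; returns true on success.
function blackhole_log(string $dataFile, string $entry): bool
{
    // FILE_APPEND keeps earlier entries; LOCK_EX avoids interleaved writes.
    return file_put_contents($dataFile, $entry . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}
```

Putting the IP first on each line keeps the later blacklist lookup simple, since only the first field has to be compared.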


    I set things up to make implementation as easy as possible. Here are the five basic steps:

    1. Upload the /blackhole/ directory to your site
    2. Ensure writable server permissions for the blackhole.dat file
    3. Add a single line to the top of your pages to include the blackhole.php file
    4. Add a hidden link to the /blackhole/ directory in the footer of your pages
    5. Prohibit crawling of the /blackhole/ by adding a line to your robots.txt file

    Implementation and Configuration

    Here are complete instructions for implementing and configuring the Perishable Press Blackhole:
    Step 1: Download the Blackhole zip file, unzip and upload to your site’s root directory. This location is not required, but it enables everything to work out of the box. To use a different location, edit the include path in Step 3.
    Step 2: Change file permissions for blackhole.dat to make it writable by the server. The permission settings may vary depending on server configuration. If you are unsure about this, ask your host. Note that the blackhole script needs to be able to read and write the blackhole.dat file; a plain data file does not normally need the execute bit.
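If you are not sure whether the permissions took effect, a quick check from PHP can help. The helper below is a hypothetical sketch (the function name and the status strings are mine, not part of the Blackhole); it only uses PHP's built-in file_exists(), is_readable(), and is_writable().

```php
<?php
// Hypothetical helper: report whether the data file is usable by the script.
function blackhole_data_file_status(string $path): string
{
    if (!file_exists($path)) {
        return 'missing';
    }
    if (!is_readable($path)) {
        return 'not readable';
    }
    if (!is_writable($path)) {
        return 'not writable';
    }
    return 'ok';
}
```

Run it from a page served by the web server rather than from the command line, because the two often run as different users. For a root install the path would be $_SERVER['DOCUMENT_ROOT'] . '/blackhole/blackhole.dat'.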
    Step 3: Include the bot-check script by adding the following line to the top of your pages:
    <?php include($_SERVER['DOCUMENT_ROOT'] . "/blackhole/blackhole.php"); ?>
    The blackhole.php script checks the request IP against the blacklist data file. If a match is found, the request is blocked with a customizable message. See the source code for more information.
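For readers who want a feel for what such a check involves before opening the source, here is a rough sketch. It is an illustration, not the code shipped in blackhole.php: the function names are hypothetical, and it assumes each line of the data file begins with the logged IP followed by whitespace.

```php
<?php
// Hypothetical sketch of the lookup; assumes "IP <whitespace> rest" per line.
function blackhole_is_listed(string $ip, array $lines): bool
{
    foreach ($lines as $line) {
        // Compare the first field exactly, so 10.0.0.1 does not match 10.0.0.10.
        $fields = preg_split('/\s+/', trim($line), 2);
        if ($fields[0] === $ip) {
            return true;
        }
    }
    return false;
}

// Load the data file (if readable) and test the given IP against it.
function blackhole_check(string $ip, string $dataFile): bool
{
    $lines = is_readable($dataFile)
        ? file($dataFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    return blackhole_is_listed($ip, $lines ?: []);
}

// Possible use at the top of a page:
// if (blackhole_check($_SERVER['REMOTE_ADDR'], __DIR__ . '/blackhole.dat')) {
//     http_response_code(403);
//     exit('Access denied.');
// }
```

The exact comparison on the first field matters: a naive substring search would ban 10.0.0.10 as soon as 10.0.0.1 was logged.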
    Step 4: Include a hidden link to the /blackhole/ directory in the footer of your pages:
    <a style="display:none;" href="yourdomain/blackhole/" rel="nofollow">Do NOT follow this link or you will be banned from the site!</a>
    This is the hidden link that bad bots will follow. It’s currently hidden with CSS, so 99% of visitors won’t ever see it. To hide the link from users without CSS, replace the anchor text with a transparent 1-pixel GIF image.
    Step 5: Finally, add a Disallow directive to your site’s robots.txt file:
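    User-agent: *
    Disallow: /blackhole/
    This step is pretty important. Without the proper robots directives, all bots would fall into the Blackhole because they wouldn’t know any better. If a bot wants to crawl your site, it must obey the rules! The robots rule that we are using basically says, “All bots: DO NOT visit the /blackhole/ directory or anything inside of it.” The rule is written for the root-directory install from Step 1 (a wildcard pattern such as /*/blackhole/* does not match a /blackhole/ directory at the site root), so adjust the path if you uploaded the directory somewhere else. More on this in the next section.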
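To see how a well-behaved crawler would apply a plain rule such as Disallow: /blackhole/, here is a small sketch of prefix matching. The function name is hypothetical, and the sketch deliberately ignores the wildcard extensions (* and $) that some crawlers support, as well as Allow lines and per-agent groups.

```php
<?php
// Sketch of plain prefix matching for Disallow rules.
// Wildcards (*, $) are an extension and are not handled here.
function robots_disallows(string $path, array $disallowRules): bool
{
    foreach ($disallowRules as $rule) {
        // An empty Disallow value means "nothing is disallowed".
        if ($rule !== '' && strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}
```

With the rule list ['/blackhole/'], both /blackhole/ and /blackhole/index.php are off limits, while a path like /blog/post is not. Any bot requesting the first two has therefore ignored the rule.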
    Further customization: The previous five steps will get the Blackhole working, but the index.php requires a few modifications. Open the index.php file and make the following changes:

    • Line #54: Edit the path to your site’s robots.txt file
    • Line #56: Edit the path to your contact page (or email address)
    • Lines #140/141: Edit email address with your own
    • And in blackhole.php, edit line #53 with your contact info

    These are the recommended changes, but the PHP is clean and generates valid HTML5, so feel free to modify the source code as needed. Note that beyond the items listed above, no other edits need to be made.










    Whitelisting Search Bots

    Initially, the Blackhole blocked any bot that disobeyed the robots.txt directives. Unfortunately, as discussed in the comments, Googlebot, Yahoo, and other major search bots do not always obey robots rules. And while blocking Yahoo! Slurp is debatable, blocking Google, MSN/Bing, et al would just be dumb. Thus, the Blackhole now “whitelists” any user agent identifying as any of the following:

    • googlebot (Google)
    • msnbot (MSN/Bing)
    • yandex (Yandex)
    • teoma (Ask)
    • slurp (Yahoo)
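A whitelist like this usually comes down to a case-insensitive substring test on the User-Agent header. The sketch below assumes that approach; the constant and function names are hypothetical and the Blackhole's own matching code may differ.

```php
<?php
// Substrings taken from the list above; matching is case-insensitive.
const BLACKHOLE_WHITELIST = ['googlebot', 'msnbot', 'yandex', 'teoma', 'slurp'];

// Returns true when the user agent contains any whitelisted substring.
function blackhole_is_whitelisted(string $userAgent): bool
{
    foreach (BLACKHOLE_WHITELIST as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

In a page this would typically be called with $_SERVER['HTTP_USER_AGENT'] ?? '', since the header can be missing entirely.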

    Whitelisting these user agents ensures that anything claiming to be a major search engine is allowed open access. The downside is that user-agent strings are easily spoofed, so a bad bot could crawl along and say, “hey look, I’m teh Googlebot!” and the whitelist would grant access. It is possible to verify the true identity of each bot, but as X3M explains in the comments, doing so consumes significant resources and could overload the server. Avoiding that scenario, the Blackhole errs on the side of caution: it’s better to allow a few spoofs than to block any of the major search engines.
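For anyone who does want to verify a claimed search bot, the usual technique is a reverse DNS lookup followed by a forward lookup that must return the original IP. The sketch below shows this for Googlebot only and is not part of the Blackhole. The function name is hypothetical, and the resolver arguments are injectable so the logic can be tried without network access; by default they are PHP's gethostbyaddr() and gethostbynamel().

```php
<?php
// Sketch of reverse-then-forward DNS verification for a claimed Googlebot.
function verify_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // the lookup fails, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the hostname must end in googlebot.com or google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: the forward lookup must map back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

Each verification costs two DNS lookups, which is the overhead the paragraph above refers to. Caching the result per IP would reduce the load, at the cost of extra bookkeeping.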


    Code: 
    http://www.filejungle.com/f/KF9gyn/Blackhole-v01.2.zip
    http://www.fileserve.com/file/a9vFqB4/Blackhole-v01.2.zip
    http://www.filesonic.com/file/4295785584/Blackhole-v01.2.zip
    http://www.wupload.com/file/2620056797/Blackhole-v01.2.zip

  3.     
    #2
    Member
    This is the staff, you have been banned

  4.     
    #3
    Banned
    ys

  5.     
    #4
    Member
