Playing in Googlebot's Sandbox with Slurp, Teoma & MSNbot
by Mike Banks Valentine
There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - the indexing delay for new domain names - rumored to last at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."

Ruminations on the algorithmic elements of this sandbox delay have ranged widely since it was first noticed in the spring of 2004. Some believe it hinges on a single element of good search engine optimization, such as linking campaigns. Link building has been the focus of most discussion, but others point to the size of a new site, its internal linking structure, or a simple fixed time delay as the most relevant algorithmic elements.

Rather than contribute to this speculation and further muddy the Sandbox, we'll look at a case study of a site on a new domain name, established May 11, 2005: its site structure, submission activity, and external and internal linking. We'll give dates and crawler activity in a daily list and see how spider activity compares with indexing dates at the top four search engines. Ready?

* May 11, 2005 - Basic text on a large site posted on a newly purchased domain name, going live by day's end. Search-friendly structure implemented with text links so that robots can discover all content. Home page updated with 10 new text content pages added daily. Submitted the site at Google's "Add URL" submission page.
* May 12-14 - No visits by Slurp, MSNbot, Teoma or Googlebot. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted a link on WebSite101 to the new domain at Publish101.com.
* May 15 - Googlebot arrives and eagerly crawls 245 pages on the new domain after looking for, but not finding, the robots.txt file. Oooops! Gotta add that robots.txt file!
* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes them.
* May 17 - Slurp finds 1409 more masking links and only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! We finally get around to adding robots.txt by day's end, to stop Slurp crawling the email masking links and to let MSNbot know it's safe to come in!
* May 23 - Teoma spider shows up for the first time and crawls 93 pages. The site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid-fire page requests. Added BecomeBot to the robots.txt exclusion list to keep 'em out.
* May 24 - MSNbot has not shown up for a week since finding the robots.txt file missing. Slurp shows up every few hours, looks at robots.txt and leaves again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to be honoring the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.
* May 25 - We realize that we need to reallocate server resources and change the database design. This requires changes to URLs, which means all previously crawled pages are now bad links! We implement subdomains and wonder, what now? Slurp shows up and finds thousands of new email masking links because robots.txt was not moved to the new directory structures. Spiders are getting error pages on new visits. Scrambling to put out fires after wide-ranging changes to the site, we miss this for a week. Spider action is spotty for 10 days until we fix robots.txt.
* June 4 - Teoma returns and crawls 590 pages! No others.
* June 5 - Teoma returns and crawls 1902 pages! No others.
* June 6 - Teoma returns and crawls 290 pages. No others.
* June 7 - Teoma returns and crawls 471 pages. No others.
* June 8-14 - Odd spider behavior, looking at robots.txt only.
* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.
* June 16 - Slurp still thirsty, gulps 1379 pages! No others.

So we'll take a break here at the five-week point and take note of the very different behavior of the top crawlers. Googlebot visits on two consecutive days and looks at a substantial number of pages, but doesn't return for over a month. Slurp finds bad links and seems addicted to them: it stops crawling good pages until robots.txt tells it to lay off the bad liquor, er, that is, links, and slaps Slurp to its senses. MSNbot visits looking for robots.txt and won't crawl any pages until that file tells it what NOT to do. Teoma just crawls like crazy, takes breaks, then comes back for more.

This behavior may mirror the differing personalities of the software engineers who designed the crawlers. Teoma is tenacious and hard-working. MSNbot is timid, needs instruction and some reassurance that it is doing the right thing, and picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when.

Now let's look at indexing by each engine. As of this writing on July 7, each engine shows different indexing behavior as well. Google shows no pages indexed, although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed, in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any other search engine, yet has not indexed a single page.

Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" without the quotes. MSN 187 pages, Ask none, Yahoo 3 pages, Google none.
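
Per-day crawler counts like those listed above are the kind of figure that can be tallied from a web server access log. The following is only a minimal sketch, assuming the common "combined" log format with the user agent as the last quoted field. The sample lines, IP addresses and user-agent substrings are hypothetical illustrations, not data from this case study.

```python
import re
from collections import Counter

# Assumed user-agent substrings for the crawlers named in the article.
CRAWLERS = ("Googlebot", "Slurp", "msnbot", "Teoma", "BecomeBot")

# Captures the request date and the last quoted field (the user agent)
# of a combined-format access log line.
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"\s*$'
)


def tally_crawler_hits(lines):
    """Count requests per (day, crawler) pair; ignore everything else."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format line
        agent = match.group("agent").lower()
        for crawler in CRAWLERS:
            if crawler.lower() in agent:
                counts[(match.group("day"), crawler)] += 1
                break
    return counts


# Hypothetical sample lines, made up for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/May/2005:10:00:00 -0700] "GET /robots.txt HTTP/1.0" '
    '404 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/May/2005:10:00:05 -0700] "GET /index.html HTTP/1.0" '
    '200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [16/May/2005:08:30:00 -0700] "GET /page1.html HTTP/1.0" '
    '200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.3 - - [16/May/2005:09:00:00 -0700] "GET /page1.html HTTP/1.1" '
    '200 900 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

if __name__ == "__main__":
    for (day, crawler), hits in sorted(tally_crawler_hits(SAMPLE_LOG).items()):
        print(day, crawler, hits)
```

Counting requests for /robots.txt separately would also surface the pattern described in the log, where a spider asks for that file many times without crawling any pages.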
The daily activity in the three weeks since June 16, not itemized above, has not varied dramatically: Teoma crawls a bit more than the other engines, Slurp goes erratically up and down, and MSNbot slowly gathers 30 to 50 pages daily. Googlebot is absent. The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity.

Looking back over this time, it is apparent that a listing delay is actually quite sensible from the point of view of the search engines. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine shows a distinctly different policy at each major player.

The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.

Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity, and be certain to implement robots.txt early and adjust it often for bad bots. Oh yes, and the sandbox belongs to all search engines.
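
The robots.txt lesson can be checked before a spider ever arrives. The sketch below uses Python's standard urllib.robotparser to test a hypothetical robots.txt that shuts out BecomeBot entirely and keeps all other robots away from the email masking links. The /mailmask/ path and the rules themselves are assumptions made for illustration; the site's actual robots.txt is not shown in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /mailmask/ path is an assumed location
# for the email masking links, not the site's real directory.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mailmask/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    checks = [
        ("BecomeBot", "http://publish101.com/index.html"),
        ("Slurp", "http://publish101.com/mailmask/contact"),
        ("Slurp", "http://publish101.com/articles/page1.html"),
        ("Googlebot", "http://publish101.com/articles/page1.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:10} {verdict:8} {url}")
```

Because robots.txt applies per host, each subdomain needs its own copy at its root, which is worth rechecking after a restructuring like the one on May 25.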
About the Author
Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue to report on this case study chronicling search indexing of http://Publish101.com