I installed a translator plugin on one of my WordPress blogs, but it wasn’t working properly, so I disabled it. Two days later I found that my Google Webmaster Tools account was reporting about 1100 ‘Not Found’ errors under the ‘Web crawl errors’ section. All of the errors were from translated versions of my blog. I used the ‘robots.txt’ file to fix the issue.
If you don’t know what a ‘robots.txt’ file is, read the article titled ‘How to control access of the web crawlers or web robots to your site’.
Basically, add rules to your ‘robots.txt’ file to Disallow spiders from crawling the translated versions of your pages. My ‘robots.txt’ file looks like the following. Depending on your situation you might need to block more languages; just look in Google Webmaster Tools, see which languages are causing the errors, and add a Disallow rule for each of them.
User-Agent: *
# Language pages
Disallow: /ar/*
Disallow: /bg/*
Disallow: /zh-hant/*
Disallow: /ca/*
Disallow: /cs/*
Disallow: /da/*
Disallow: /de/*
Disallow: /el/*
Disallow: /es/*
Disallow: /fi/*
Disallow: /fr/*
Disallow: /he/*
Disallow: /hi/*
Disallow: /hr/*
Disallow: /id/*
Disallow: /it/*
Disallow: /iw/*
Disallow: /ja/*
Disallow: /ko/*
Disallow: /lt/*
Disallow: /lv/*
Disallow: /mr/*
Disallow: /nl/*
Disallow: /no/*
Disallow: /pl/*
Disallow: /pt-br/*
Disallow: /pt/*
Disallow: /ro/*
Disallow: /ru/*
Disallow: /sk/*
Disallow: /sl/*
Disallow: /sr/*
Disallow: /sv/*
Disallow: /tl/*
Disallow: /tr/*
Disallow: /uk/*
Disallow: /vi/*
Disallow: /zh-CN/*
Allow: /
As far as I know, Google penalizes duplicate content. A translated version of a page can be treated as duplicate content, so for SEO it is best to use this method to block crawler access to the translated versions of your pages.
It took about two weeks for all of the errors to disappear from my Google Webmaster Tools account, but the number started to drop as soon as I updated my robots.txt file to block the spiders from crawling the translated versions of the site. Hope this helps.
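If you don’t want to wait for Webmaster Tools to catch up before knowing whether your rules work, you can simulate a crawler locally. Below is a minimal sketch using Python’s built-in urllib.robotparser; the post slugs are made-up examples, and because the standard-library parser only does simple prefix matching (it doesn’t understand the trailing asterisk), the sample rules are written as plain directory prefixes, which Google treats the same way as the wildcard form above.

from urllib.robotparser import RobotFileParser

# A few of the Disallow rules from above, written as plain directory prefixes
# because the standard-library parser does not support the trailing "*".
rules = """
User-agent: *
Disallow: /fr/
Disallow: /de/
Disallow: /zh-CN/
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Translated copies of a post should be blocked (the slugs are made-up examples)...
print(parser.can_fetch("*", "/fr/my-first-post/"))     # False
print(parser.can_fetch("*", "/zh-CN/my-first-post/"))  # False
# ...while the original post stays crawlable.
print(parser.can_fetch("*", "/my-first-post/"))        # True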
Thanks man, I was having a problem with the format for putting URLs in robots.txt; you solved my problem…
Thank you for clearing this up for me… I updated my site and a lot of errors came up, and I had no idea how to remove them. Thank you!
Thanks a lot… you saved me from these annoying not found errors.
If you use the Google Webmaster Tools option, it will only affect Google. The robots.txt file works for all search engine bots.
Hi, thanks for this great tutorial. Is there any difference if I use the robots.txt file for removing pages from the index instead of using Webmaster Tools?
Thanks for your post, it’s helpful…
Thank you for sharing. That’s good for me.
I faced this problem and this tip is so helpful for me.
thanks a lot.
Thanks for sharing the information!
Good share, thanks a lot
Thanks for sharing. They are very useful.
@Watson, I have cleaned up the code a bit in this post. The asterisk (*) is a wildcard, which means everything under that directory. So you are telling the bots to ignore everything under that language directory.
Yes, it is important to have the following two lines in the robots.txt file: the first line says who the rules apply to (again, the asterisk means everyone), and the second line allows access to everything else under the root (the bots still won’t access the paths you specified in the Disallow rules).
User-Agent: *
Allow: /
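To make that concrete, here is a small sketch (made-up paths, using Python’s built-in urllib.robotparser) showing that with the Disallow lines sitting between those two lines, a translated page is blocked while everything else stays crawlable. Google picks the most specific matching rule regardless of order, but simple first-match parsers such as this one apply the first rule that matches, so keeping Allow: / after the Disallows, as in the file above, is the safe layout. The standard-library parser also doesn’t understand the trailing asterisk, so the sample rules use plain prefixes.

from urllib.robotparser import RobotFileParser

def allowed(lines, path):
    # Parse an in-memory robots.txt and ask whether any bot ("*") may fetch the path.
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch("*", path)

# Layout as in the file above: User-Agent first, then the Disallows, then Allow: /
good = ["User-agent: *", "Disallow: /fr/", "Allow: /"]
print(allowed(good, "/fr/hello-world/"))  # False - translated copy is blocked
print(allowed(good, "/hello-world/"))     # True  - the original post is still crawlable

# If Allow: / came before the Disallow lines, a first-match parser would let everything through.
swapped = ["User-agent: *", "Allow: /", "Disallow: /fr/"]
print(allowed(swapped, "/fr/hello-world/"))  # True for this parser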
Is it crucial to have that code above and below the Disallows, and is it important to have the asterisk "*" after each Disallow?
Just want to double check to make sure I don’t mess it up.
Terrific. Thanks for the help!!!!!
A million thanks.
@Watson, You don’t need to delete the robots.txt file as it is a standard file for controlling access of the web crawlers. You can read more about this here:
https://www.tipsandtricks-hq.com/how-to-control-access-of-the-web-crawlers-or-web-robots-to-your-site-166
No, the “robots.txt” file will not affect your human visitors to the site.
I have had the same problem. Ever since the plugin was active I have had thousands of errors.
Thanks for laying out the details about the robots.txt file. Hope it helps.
Questions:
1) I assume the robots.txt file will fix the current errors (along with deleting the WP-google translator app). How long should the robots.txt file exist before it is deleted?
2) Will the robots.txt file prevent visitors from using their own translator tools to read the content?
Thanks,
Jay
Thanks for sharing,
I was looking for these tips.
Cheers,
Togrul
Thanks for sharing the information. It is very useful.
Thanks for these tips. I’ve always had trouble remembering to play with my robots.txt file when I make new sites, but fortunately this post helped remind me.
Sounds good, thanks a lot.
I had no idea that the duplicate content penalty was affected by different language versions of a site. Seems unfair, especially as having multiple language versions shows that you are actually presenting a more robust site.
Happy to see your blog, as it is just what I’ve been looking for, and I’m excited to read all the posts. After skimming through your website, I am looking forward to another great article from you.
Thanks for sharing this informative article. My website recently ran into this problem. Your info is very helpful.
Bookmarked, and I’ll be back to see your updates. Thanks again.
I really appreciate this article, for it will do me good in the future.
quite useful info, thanks. I will try it.
It helps a lot.
Many thanks,
robots.txt is very useful!
Well, personally, I use the plugin called All in One SEO. It is very effective and I recommend everyone use it.
Great blog,thank you for your sharing!!!
Thank God I was able to find this post. I have been searching for months for how to fix not found errors on my blog, and this post has come to my aid.
That’s a good article, thanks! I will be back soon.
Thanks for sharing this tip for .htta
Thank you for the reminder. It is certainly very useful; I’ll save it.
Thanks a lot for sharing the tip. It is certainly very useful. Most of the time I used to fix it manually, but now I will use this tip.
Very great idea, thanks for sharing. Very useful; tweeted and saved on Delicious.
Thank you for sharing these tips.
You are a genius for this idea.
Actually, people can read an article in another language through the Google translation tool; if that translated content is not on your website itself, it’s hard for Google to find it.
Thank you for sharing such a good passage
I have read all the articles. Very useful information was written. Thanks…
If you have translated content, it can get treated as duplicate content, as it is the same content in a different language. It’s best to minimize duplicate content.
Is it true that Google hates translated content? I ask because I use an auto-translate plugin on all of my blogs.
I didn’t know they don’t like sites that have the same content translated. I’ll have to use that robots file.
I have read all the articles. Very useful information was written. Thanks
Thanks for the great article, it is really useful. You should also deny access to all internal folders.
Hmmm, it sounds a little bit complicated. Sometimes I find some error reports in Google Webmaster Tools, but not too many, so I usually fix them manually and it doesn’t take too much time. But thanks for this information 🙂
Thank you for sharing; to me this is awesome information to enhance my blog.
Thanks for the tips, they are very useful.