    New Open Source Robots.txt projects

    • September 23, 2020
    • Nirlep Patel
    • 3 min read

    Google reached a milestone last year by open-sourcing its robots.txt parser. After nearly 25 years as a mere de facto standard, the Robots Exclusion Protocol, which Google follows when crawling web pages, was also submitted to the IETF as a draft for a formal internet standard. What does this mean? It means that Google has open-sourced the C++ library that its production systems have used for about two decades to parse and match the rules in robots.txt files, which makes those files easier to work with. Such is the importance of robots.txt that Google announced a further development concerning it this year.

    Google announced the launch of two new open-source robots.txt projects:

    1. Robots.txt Specification Test
    2. Java robots.txt parser and matcher

    Both projects were created by Google interns: Andreea Dutulescu built the Robots.txt Specification Test, and Ian Dolzhanskii built the Java robots.txt parser and matcher.

    What exactly are these two new open-source projects, and how are they useful? Before answering, let me first explain what a robots.txt file is and why it is used.

    What is a robots.txt file?

    The robots.txt file is part of the Robots Exclusion Standard, also called the Robots Exclusion Protocol, a standard that websites use to communicate with web crawlers or web robots. Simply stated, the robots.txt file tells crawlers such as Googlebot, Google’s web crawler, which pages or files they may request from your site and which they may not. It helps in managing the crawler traffic of your website; it is not a mechanism for keeping a page out of Google.
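
    As a rough illustration of the allow/disallow decision a crawler makes, the sketch below feeds a made-up robots.txt file to the urllib.robotparser module from Python's standard library. This is not Google's parser; the file contents, the example.com URLs and the may_crawl helper are assumptions chosen for the example, and the standard-library parser may not resolve every edge case the way Google's does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let `user_agent` request `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post",
                "https://example.com/private/page.html"):
        print(url, may_crawl("Googlebot", url))
```

    With these rules, the blog URL may be requested while anything under /private/ may not.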

    Now that you know what a robots.txt file does, let us look at the latest developments related to this file.

    1. Robots.txt Specification Test

    Developed by Google intern Andreea Dutulescu, this specification test is a testing framework for robots.txt parser developers. It checks whether a robots.txt parser follows the Robots Exclusion Protocol and, if so, to what extent. The test is useful because there is currently no official, thorough test suite for assessing the correctness of a parser. With its help, developers can create robots.txt parsers that follow the protocol thoroughly.
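
    To give a feel for what such a conformance test involves, here is a minimal sketch: a small table of expected allow/disallow outcomes run against a parser under test. The cases, the ExampleBot name and the helper names are invented for illustration and are not taken from Google's test suite; the parser being checked here is simply Python's standard-library urllib.robotparser.

```python
from typing import Callable, List, NamedTuple
from urllib.robotparser import RobotFileParser


class Case(NamedTuple):
    name: str
    robots_txt: str
    user_agent: str
    url: str
    expected_allowed: bool


BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"
BLOCK_ONE_BOT = "User-agent: ExampleBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

# Illustrative expectations only (assumptions, not an official suite).
CASES = [
    Case("disallowed prefix", BLOCK_ADMIN, "ExampleBot",
         "https://example.com/admin/login", False),
    Case("path outside rule", BLOCK_ADMIN, "ExampleBot",
         "https://example.com/about", True),
    Case("empty file allows all", "", "ExampleBot",
         "https://example.com/anything", True),
    Case("named agent is blocked", BLOCK_ONE_BOT, "ExampleBot",
         "https://example.com/page", False),
    Case("other agents fall through", BLOCK_ONE_BOT, "OtherBot",
         "https://example.com/page", True),
]

# A parser under test is any callable: (robots_txt, user_agent, url) -> bool.
Check = Callable[[str, str, str], bool]


def stdlib_check(robots_txt: str, user_agent: str, url: str) -> bool:
    """Adapter that exposes the standard-library parser as a Check."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def run_cases(check: Check) -> List[str]:
    """Return the names of the cases where `check` disagrees with the table."""
    return [c.name for c in CASES
            if check(c.robots_txt, c.user_agent, c.url) != c.expected_allowed]


if __name__ == "__main__":
    print("failing cases:", run_cases(stdlib_check))
```

    A different parser can be checked by writing another adapter with the same three-argument signature and passing it to run_cases.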

    2. Java robots.txt parser and matcher

    Developed by Ian Dolzhanskii, this tool is Google's official Java port of the C++ robots.txt parser. Java is the third most popular programming language to learn in 2020 and is used extensively at Google. The port reproduces the behaviour and functions of the C++ parser and has been thoroughly tested. Google is planning on using the Java robots.txt parser in its production systems.
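
    The two halves of a "parser and matcher" can be sketched in a few lines. The code below is not Google's C++ or Java code and does not use their APIs: parse_rules and is_allowed are hypothetical helpers written for this example. They read only the "User-agent: *" group, assume one User-agent line per group, ignore the * and $ wildcards, and apply the Robots Exclusion Protocol's precedence rule that the longest matching path wins, with Allow winning a tie.

```python
from typing import List, Tuple

Rule = Tuple[bool, str]  # (is_allow, path_prefix)


def parse_rules(robots_txt: str) -> List[Rule]:
    """Parser half: collect Allow/Disallow rules from the `User-agent: *` group."""
    rules: List[Rule] = []
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            in_star_group = (value == "*")
        elif in_star_group and key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))
    return rules


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Matcher half: the longest matching prefix decides; Allow wins a tie."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for is_allow, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    rules = parse_rules("User-agent: *\nDisallow: /shop/\nAllow: /shop/public/\n")
    for path in ("/shop/cart", "/shop/public/sale", "/about"):
        print(path, is_allowed(rules, path))
```

    Because the longest match decides, the order of the Allow and Disallow lines in the file does not affect the result in this sketch.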

    Google aims to simplify a web developer's job. Last year it open-sourced the C++ library used by its production systems for parsing and matching rules in robots.txt files and included a testing tool with the package.

    This year, with the introduction of two new open-source robots.txt projects, Google plans to make it easier to test the validity of robots.txt parsers, which in turn should create a better web crawling experience.
