Java library for reading and querying robots.txt files.
- Parse robots.txt:
RobotsTxt robotsTxt = RobotsTxtReader.read(inputStream);
- Query robotsTxt:
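// Ask whether the "GoogleBot" user agent may access "/path"
Grant grant = robotsTxt.query("GoogleBot", "/path");
boolean canAccess = grant.getAllowed();
// A MatchedGrant also exposes the rule group that matched, including its crawl delay
if (grant instanceof MatchedGrant) {
    Duration crawlDelay = ((MatchedGrant) grant).getMatchedRuleGroup().getCrawlDelay();
}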
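To give a feel for what a query decides, here is a minimal, self-contained sketch of robots.txt matching. It is not this library's implementation: `SimpleRobotsSketch` and `isAllowed` are hypothetical names invented for illustration, and the sketch assumes plain prefix rules (no `*` or `$` wildcards), a case-insensitive substring match on the user agent, and the common convention that the longest matching rule wins, with Allow winning ties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified matcher for illustration only; not part of this library. */
final class SimpleRobotsSketch {

    /** One Allow/Disallow line: a path prefix and whether it grants access. */
    private static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    /** Returns true if userAgent may fetch path according to the robots.txt content. */
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        List<Rule> specific = new ArrayList<>(); // rules from groups naming this agent
        List<Rule> wildcard = new ArrayList<>(); // rules from "User-agent: *" groups
        boolean inSpecific = false, inWildcard = false;
        boolean lastWasAgent = false, sawSpecific = false;
        String agent = userAgent.toLowerCase(Locale.ROOT);

        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                // A User-agent line that follows rule lines starts a new group.
                if (!lastWasAgent) { inSpecific = false; inWildcard = false; }
                if (value.equals("*")) {
                    inWildcard = true;
                } else if (!value.isEmpty() && agent.contains(value.toLowerCase(Locale.ROOT))) {
                    inSpecific = true;
                    sawSpecific = true;
                }
                lastWasAgent = true;
                continue;
            }
            lastWasAgent = false;
            boolean isAllow = key.equals("allow");
            if ((isAllow || key.equals("disallow")) && !value.isEmpty()) {
                Rule rule = new Rule(value, isAllow);
                if (inSpecific) specific.add(rule);
                if (inWildcard) wildcard.add(rule);
            }
        }

        // A group that names the agent takes precedence over the "*" group.
        List<Rule> rules = sawSpecific ? specific : wildcard;
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) continue;
            boolean longer = best == null || r.prefix.length() > best.prefix.length();
            boolean tieAllow = best != null
                    && r.prefix.length() == best.prefix.length() && r.allow;
            if (longer || tieAllow) best = r;
        }
        return best == null || best.allow; // no matching rule means access is allowed
    }
}
```

Calling `SimpleRobotsSketch.isAllowed(content, "GoogleBot", "/path")` on the text of a robots.txt file yields a single boolean, which roughly corresponds to `grant.getAllowed()` above. The library's `Grant` additionally reports which rule group matched, which this sketch does not attempt.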
Add the JitPack repository to your pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Add the following under your <dependencies>:
<dependencies>
    <dependency>
        <groupId>com.github.alturkovic</groupId>
        <artifactId>robots-txt</artifactId>
        <version>[insert latest version here]</version>
    </dependency>
</dependencies>