Let's keep things short and sweet, kids. If you run a website, open your robots.txt in your favorite text editor and add:

User-agent: GPTBot
Disallow: /

Source documentation for reference.

Save. Profit? Not sure at this point. But I can ramble some more about this topic.

Would you like to know more? Seriously? Okay, it's your funeral.

Update 10/02/2023

Here comes the deluge of bots all hungry for data. They just keep multiplying like bunnies.

User-agent: GPTBot 
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Facebookbot
Disallow: /
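Before you trust that block, you can sanity-check it locally. Below is a minimal sketch using Python's standard-library urllib.robotparser. It only tells you how one polite parser reads the rules. GPTBot and friends run their own parsers, and nothing forces them to obey. The check_agents helper and the example.com URL are hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The AI-crawler block from the update, pasted in as a string.
AI_BLOCK = """\
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Facebookbot
Disallow: /
"""


def check_agents(robots_txt, agents, url="https://example.com/some-post"):
    """Return {agent: True if allowed, False if blocked} for each user-agent.

    check_agents and the example.com URL are hypothetical, for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ChatGPT-User", "Google-Extended",
              "Omgilibot", "Facebookbot", "Googlebot"]
    for agent, allowed in check_agents(AI_BLOCK, agents).items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

If the rules parse the way they should, the five AI crawlers come back blocked. Googlebot isn't named anywhere in the file, so it comes back allowed. Python's parser doesn't need blank lines between groups either, because a new User-agent line after a Disallow line starts a fresh group.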

You don't seem all that convinced, S.

Not really. OpenAI sort of ran around the internet kicking everyone in the teeth, harvesting data for ChatGPT and GPT-4 with zero fucks given and without so much as fact-checking it. It's become the very thing tech bros cling to now that saying 'blockchain' no longer gets investors to hand them millions. Now OpenAI wants to get all top hat and monocle, making news about how we as web admins are supposed to swoon over the fact that it NOW honors the robots.txt file. Too little, too late. On top of that, there's absolutely no guarantee that GPTBot will even follow the rules a web admin throws down.

Does that mean we won't add it to our robots.txt? We sure as fuck will! In fact, why make anything easier for a network that collects information, accepts it as truth, and claims it as its own without giving any credit? We shot down Google AMP for far less bullshit than what OpenAI and their competitors are doing. However, the robots.txt thing is the equivalent of putting up a yard sign that says:

Do not insert into mouth or rectum!

I can't really stop anyone who tries, and there are no real internet police to arrest someone for shoving that sign into said mouth or rectum.

I don't fucking know anymore. DO NOT INSER- I don't fucking know anymore.

Of course, none of this stops other crawlers from coming out of nowhere, ignoring robots.txt entirely, and scraping data as aggressively as possible. Nor does it stop OpenAI's competitors from launching similar crawlers to scrape your data.

This might even make a case for bringing Flash back from the grave. Get all Homestar Runner with your website. If you build a site so UI-centric that search engines can't properly index it, ChatGPT won't be able to either.

In fact, the more thorough thing to put in your robots.txt would be:

User-agent: *
Disallow: /

Yes, it's the great middle finger telling every search engine (and every other crawler that bothers to read robots.txt) to fuck off. A lot of GPT-like programs will 'search' for your site's content first to determine how relevant it is to a topic. They can't really do that if you don't exist on the "Googles," right? They can't index you when they have no reference for where to even start. But that doesn't really build readership, now does it?
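Want proof that the wildcard flips off everybody, including the search engines you might actually want? Here's a minimal sketch with Python's urllib.robotparser. The is_allowed helper and the example.com URL are hypothetical placeholders, "SomeRandomScraper" is a made-up agent name, and the result only reflects what a crawler that honors robots.txt would do.

```python
from urllib.robotparser import RobotFileParser

# The "great middle finger": one wildcard group that disallows everything.
WILDCARD = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, agent, url="https://example.com/"):
    """True if a robots.txt-respecting crawler named `agent` may fetch `url`.

    is_allowed and the example.com URL are hypothetical placeholders.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # No agent is named in the file, so every one falls through to the "*" group.
    for agent in ("GPTBot", "Googlebot", "Bingbot", "SomeRandomScraper"):
        print(agent, "allowed" if is_allowed(WILDCARD, agent) else "blocked")
```

Every name falls through to the * group and comes back blocked, Googlebot included. That's the readership problem in a nutshell.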

Why not allow GPTBot?

Depending on your content, you may want to let GPTBot into your site. After all, there's a certain level of humor in a GPT ending up hating itself because its datasets are built on cynical articles about GPT as a whole. If you've got that kind of data and want GPT to feast on it, if only to make some tech bro sad, then change "Disallow:" to "Allow:" and you're golden.
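Here's what that swap looks like, sketched with Python's urllib.robotparser and assuming you want to keep blocking the rest. GPTBot gets "Allow: /" while Google-Extended stays on "Disallow: /". The MIXED policy, the is_allowed helper, and the example URL are hypothetical, and this only shows how a polite parser reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical mixed policy: GPTBot gets "Allow:", Google-Extended stays blocked.
MIXED = """\
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt, agent, url="https://example.com/cynical-gpt-rant"):
    """True if a robots.txt-respecting crawler named `agent` may fetch `url`.

    is_allowed and the example URL are made up for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("GPTBot:", is_allowed(MIXED, "GPTBot"))                    # Allow: / -> True
    print("Google-Extended:", is_allowed(MIXED, "Google-Extended"))  # Disallow: / -> False
```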

Other thoughts.

We highly doubt any of these GPT engines will reference data from this website. If they did, even we would question the integrity of a neural network that digs so deep it finds a site that (by design) doesn't even rank among the top 1 million websites people visit to find an answer. And more importantly: WHY?!? Nor are we concerned about AI crowding bloggers off the internet. As we discovered while doing our Intel A380 graphics card review, it's easy to generate an article that references the key points of a white paper. AI is great at that. But when it comes to legitimately doing the research, with video/audio to prove your work, AI can't competently handle it.

This means minimal text-only content, like this blog article, could easily be crowded out by the billions of other articles out there. Another problem with AI is that it takes the safest path when writing anything. It's sane, and it's boring, because it never calls a single one of its readers a pigfucker before the DMT kicks in, shutting us down and making us forget what we typed.

We'd like to think AI is an interesting tool to screw around with. It can help someone with a creative block think about things differently. But eventually, the hand of humanity has to be the guide, at which point it's no longer AI stories or AI art. It's cyborg art and stories.

Keep those implants coming, kids.

Server protect you.


5 thoughts on “ChatGPT Web Crawler”

  1. I like your content. It's weird, but at least it's not as polished as these Surfshark-riddled YouTube videos. Love exploring the spooky skeleton part of the web, cool place. Keep up the work! And wow, thanks a lot for the free music, it's awesome!


