This is very good advice, but (and this isn't a dig at OP, just a heads-up) a lot of AI companies will either straight-up ignore robots.txt or fake their crawler's user agent to slip past any blocking you do on the server side.
This isn't the only source, but it's one I could easily find on this specific issue:
I wish there was a silver bullet for that stuff but alas.
Here's a repo that tracks a bunch of separate blocklists, some of which cover bots and scrapers: https://github.com/firehol/blocklist-ipsets#list-of-ipsets-included
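If you just want to poke at what's in one of those lists, or quickly check an address against it, here's a rough Python sketch. The specific list name/URL is only an example (pick whichever .netset/.ipset fits your needs), and in a real setup you'd feed the list to ipset/iptables or your firewall rather than a script:

```python
import ipaddress
import urllib.request

# Example list only; swap in whichever .netset/.ipset file from the repo you want.
LIST_URL = "https://raw.githubusercontent.com/firehol/blocklist-ipsets/master/firehol_level1.netset"

def load_blocklist(url):
    """Fetch a FireHOL-style list: one IP or CIDR range per line, '#' lines are comments."""
    networks = []
    with urllib.request.urlopen(url) as resp:
        for raw in resp.read().decode().splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks

def is_blocked(ip, networks):
    """True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

if __name__ == "__main__":
    nets = load_blocklist(LIST_URL)
    print(is_blocked("203.0.113.7", nets))  # TEST-NET-3 address, just a demo
```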
Likewise, rate limiting and IP banning can be configured either in your favorite HTTP server (usually via a plugin or module), or with a tool like fail2ban, which watches your HTTP server's logs and temporarily bans clients that request too much too fast.
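If you end up rolling something yourself instead of leaning on your server's rate-limit module or fail2ban, the core idea is just a per-client token bucket. A minimal sketch (the numbers are made up, tune them to your traffic):

```python
import time

# Rough sketch of per-IP rate limiting, the same idea nginx's limit_req module
# or fail2ban enforce for you. Rate/burst values here are arbitrary examples.
RATE = 5.0    # tokens refilled per second
BURST = 20.0  # max tokens an IP can bank

_buckets = {}  # ip -> (tokens remaining, timestamp of last request)

def allow_request(ip):
    """Return True if this IP is under the limit, False if it should get a 429."""
    now = time.monotonic()
    tokens, last = _buckets.get(ip, (BURST, now))
    # refill tokens for the time elapsed since the last request, capped at BURST
    tokens = min(BURST, tokens + (now - last) * RATE)
    if tokens < 1.0:
        _buckets[ip] = (tokens, now)
        return False
    _buckets[ip] = (tokens - 1.0, now)
    return True
```

fail2ban does roughly the same counting against your access log, and then handles the actual firewall ban and un-ban for you.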
(Of course, all of this really only applies if you're self-hosting or on a cloud VPS, and it takes a fair bit more technical effort.)

