# Kangaroo Bot

Kangaroo Bot is a specialized web crawler from Kangaroo LLM, an Australian AI consortium. Its mission is to collect textual content exclusively from Australian websites to build a dataset for training Australia's first open-source large language model (LLM). The project emphasizes data sovereignty and aims to create an AI that understands the specific linguistic and cultural nuances of Australian English.

## What is Kangaroo Bot?

Kangaroo Bot is a data scraping web crawler operated by the Australian AI consortium Kangaroo LLM. Its purpose is to systematically collect text content from Australian websites to build a dataset that captures the unique characteristics of Australian English.

The bot identifies itself in server logs with the user-agent string `Kangaroo Bot` and primarily targets Australian domains or servers located in Australia. Unlike global AI scrapers, it uses geographic filtering to ensure the relevance of its data. It is designed to be a well-behaved bot, respecting `robots.txt` protocols and maintaining a reasonable request rate.

## Why is Kangaroo Bot crawling my site?

Kangaroo Bot is visiting your website because it has been identified as a source of Australian content that could be valuable for training Australia's first open-source large language model. The bot prioritizes websites based on factors like lexical density, update frequency, and the presence of user-generated content, with a focus on sites like forums and news outlets.

If your site is on an Australian domain (.au) or hosted on Australian servers, it is a prime target for this crawler. The frequency of visits depends on your site's content volume and update patterns.

## What is the purpose of Kangaroo Bot?
The main purpose of Kangaroo Bot is to collect data for the 'VegeMighty' dataset, which will be used to train an Australian large language model. This initiative aims to create an AI that understands Australian language and culture, promoting data sovereignty by processing and storing all information within Australia.

The project follows an 'opt-out-plus' model, allowing website owners to control how their content is used. The ultimate goal is to boost Australian AI innovation and ensure the country's digital future is shaped by a language model that accurately reflects its linguistic diversity.

## How do I block Kangaroo Bot?

To prevent Kangaroo Bot from collecting your website's content for its AI training dataset, you can add a specific disallow rule to your `robots.txt` file. This is the standard method for managing access for well-behaved web crawlers.

Add the following lines to your `robots.txt` file to block Kangaroo Bot:

```
User-agent: Kangaroo Bot
Disallow: /
```

## Canonical

A human-friendly reader version of this article is available at [Kangaroo Bot](https://plainsignal.com/agents/kangaroo-bot)

## Copyright

(c) 2025 [PlainSignal](https://plainsignal.com/ "Privacy-focused, simple website analytics")
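
## Checking the block rule locally

The `robots.txt` rule given under "How do I block Kangaroo Bot?" can be sanity-checked before deployment. The following is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how that parser interprets the rule. How Kangaroo Bot itself matches user-agent lines is not documented here, and the helper `is_allowed`, the domain `example.com.au`, and the agent name `SomeOtherBot` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The rule from this article, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Kangaroo Bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's stdlib parser would let `user_agent` fetch `url`.

    Hypothetical helper for illustration; it reflects the stdlib parser's
    matching, not necessarily the crawler's own behavior.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com.au is a placeholder domain, not a real target of the crawler.
page = "https://example.com.au/forum/thread-1"
blocked = is_allowed(ROBOTS_TXT, "Kangaroo Bot", page)
others = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)

print("Kangaroo Bot allowed:", blocked)
print("SomeOtherBot allowed:", others)
```

With this rule in place, the check should report that `Kangaroo Bot` is not allowed while agents with no matching rule remain allowed. To test a live file instead of an inline string, `RobotFileParser` also offers `set_url()` and `read()`.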