In-Short
- Baidu updates Baike service to block Google and Bing from scraping content.
- Change aims to protect data used for training AI, following industry trends.
- Chinese Wikipedia remains open to search engine crawlers, and Baidu Baike entries still appear in search results.
- AI industry’s demand for high-quality content leads to new data-sharing policies.
Summary of Baidu’s Recent Update
Baidu, the prominent Chinese internet search provider, has updated its Baike service to prevent the major search engines Google and Microsoft’s Bing from scraping its content. The move, detected through changes in the Baidu Baike robots.txt file, reflects a broader industry trend in which companies are increasingly protective of their data, especially given the surge in demand for large datasets to train artificial intelligence models.
The update, which the Wayback Machine dates to August 8, blocks the Googlebot and Bingbot crawlers from indexing Baidu Baike’s repository, something they were previously allowed to do. Baidu Baike has nearly 30 million entries. Some subdomains were already restricted, so this update is a more comprehensive measure.
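To illustrate how a robots.txt rule of this kind affects individual crawlers, here is a minimal sketch using Python's standard urllib.robotparser module. The rules in SAMPLE_ROBOTS_TXT are an assumption written for illustration, not the actual contents of Baidu Baike's file, and the URL path and the helper name crawl_permissions are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real Baidu Baike robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: Baiduspider
Allow: /

User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /
"""


def crawl_permissions(robots_txt, agents, url):
    """Return a dict mapping each crawler name to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# The path below is a made-up example entry.
result = crawl_permissions(
    SAMPLE_ROBOTS_TXT,
    ["Baiduspider", "Googlebot", "Bingbot"],
    "https://baike.baidu.com/item/example",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that robots.txt is advisory: it tells well-behaved crawlers what they may fetch, but it does not remove pages that a search engine has already indexed or cached.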
Other platforms have taken similar steps. Reddit limited search engine access to its content, making an exception for Google under a financial agreement for data access. Microsoft has also contemplated restricting competitors’ access to its internet-search data, particularly competitors that use the data for chatbots and generative AI services.
Despite Baidu’s restrictions, the Chinese Wikipedia, with over 1.43 million entries, remains open to search engine crawlers. Current searches still show Baidu Baike entries, likely due to the use of cached content by search engines.
The industry is witnessing a shift towards strategic partnerships for content access, as seen with OpenAI’s agreements with Time magazine and the Financial Times. Baidu’s decision underscores the escalating value of curated datasets in AI development and the consequent changes in online content management policies.
As the AI sector evolves, more companies are likely to reevaluate their data-sharing strategies, which could further change how information on the internet is indexed and accessed.