
Wikipedia offers its data to AI to avoid being siphoned off by robots

Wikimedia, the nonprofit foundation that hosts and supports Wikipedia, is struggling with data-harvesting bots operated by AI companies. These bots are extremely data-hungry and put a strain on the organization's infrastructure: since the beginning of the year, their activity has increased the bandwidth used for downloading multimedia content by 50%. In response, Wikimedia has released a structured dataset of Wikipedia content. Designed specifically for machine learning applications, it gives direct access to already processed articles that can be used immediately for tasks such as modeling, fine-tuning, alignment, and analysis.

Technically, the dataset is built on the Snapshot Structured Contents API, which delivers data in a machine-readable JSON format. This allows developers and researchers to work directly with well-segmented articles containing summaries, short descriptions, structured data such as infoboxes, links to images, and clearly delimited article sections (references and other non-text elements are excluded).
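For illustration, the short Python sketch below shows how such a JSON record might be read. The field names used here ("name", "abstract", "description", "infoboxes", "sections") are assumptions inferred from the description above rather than a confirmed schema, and the file name is hypothetical.

import json

# Minimal sketch of reading one article record from the structured dump.
# Field names are assumptions based on the article description, not a confirmed schema.
def summarize_record(line):
    article = json.loads(line)
    return {
        "title": article.get("name"),
        "summary": article.get("abstract"),
        "short_description": article.get("description"),
        "infoboxes": article.get("infoboxes", []),
        "section_titles": [s.get("name") for s in article.get("sections", [])],
    }

# Hypothetical file name; the dump is assumed here to be newline-delimited JSON.
with open("wikipedia_structured_contents.jsonl", encoding="utf-8") as f:
    for line in f:
        print(summarize_record(line))
        break  # inspect only the first article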

The data is published under open licenses, with some content in the public domain or under alternative licenses. It is hosted on Kaggle, the Google-owned platform that serves as a reference point for the machine learning community. Wikimedia already had a content-sharing partnership with Google, so this new initiative is a logical continuation of it.
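For readers who want to experiment, the dataset can be fetched programmatically with Kaggle's kagglehub client; the dataset handle below is an assumption for illustration and should be checked against the actual Kaggle listing.

import kagglehub

# Assumed dataset handle; verify the exact slug of Wikimedia's
# structured-contents dataset on Kaggle before running.
path = kagglehub.dataset_download("wikimedia-foundation/wikipedia-structured-contents")
print("Dataset downloaded to:", path)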

Source: Wikimedia
