This was originally published on Wikimedia’s blog.
Wikimedia’s new public service that exposes live streams of Wikimedia projects is already powering several visualizations, like DataWaltz.
We are happy to announce EventStreams, a new public service that exposes live streams of Wikimedia events. And we don’t mean the next big calendar event like the Winter Olympics or Wikimania. Here, an ‘event’ is defined to be a small piece of data usually representing a state change. An edit of a Wikipedia page that adds some new information is an ‘event’, and could be described like the following:
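As a rough sketch, such an event might be modeled as a small key-value record. The field names below (`event_type`, `page`, `project`, `time`, `user`) are hypothetical and chosen for readability; they are not taken from the actual EventStreams schema, and `describe` is just an illustrative helper.

```python
import json

# Hypothetical field names, for illustration only;
# the real EventStreams schema may differ.
event = {
    "event_type": "edit",
    "page": "Special Olympics",
    "project": "English Wikipedia",
    "time": "2017-03-07T09:31",
    "user": "TheBestEditor",
}


def describe(event: dict) -> str:
    """Render the event as a one-line, human-readable summary."""
    return (
        f"{event['user']} made an {event['event_type']} to "
        f"'{event['page']}' on {event['project']} at {event['time']}"
    )


# Events like this are typically serialized as JSON on the wire.
print(json.dumps(event, indent=2))
print(describe(event))
```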
This means: “a user named ‘TheBestEditor’ added some content to the English Wikipedia’s Special Olympics page on March 7, 2017 at 9:31am”.
While composing this blog post, we sought visualizations that use EventStreams, and found some awesome examples.
Open now in Los Angeles, DataWaltz is a physical installation that “creates a spatial feedback system for engaging with Wikipedia live updates, allowing visitors to follow and produce content from their interactions with the gallery’s physical environment.” You can see a photo of it at the top, and a 360 video of it over on Vimeo.
A little background—why EventStreams?
EventStreams is not the first service from Wikimedia to expose RecentChange events as a stream. irc.wikimedia.org and RCStream have existed for years. These all serve the same data: RecentChange events. So why add a third stream service?
Both irc.wikimedia.org and RCStream suffer from similar design flaws. Neither service can be restarted without interrupting client subscriptions. This makes it difficult to build comprehensive tools that cannot afford to miss an event, and makes the services hard for WMF engineers to maintain. Neither is easy to use, as both require several programming setup steps just to start subscribing to the stream. Perhaps more importantly, these services are RecentChanges-specific, meaning that they cannot serve other types of events. EventStreams addresses all of these issues.
EventStreams is built on the W3C standard Server-Sent Events (SSE). SSE is simply a streaming HTTP connection with event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.
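To make the "particular text format" concrete, here is a minimal sketch of an SSE parser. It assumes the basic rules of the format: `field: value` lines, a blank line ending each event, lines starting with `:` treated as comments, and multiple `data` lines joined with newlines. The function name `parse_sse` and the sample payload are made up for illustration, and the sketch omits details such as the `retry` field.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of SSE text lines."""
    fields: Dict[str, str] = {}
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data_lines:
                fields["data"] = "\n".join(data_lines)
                yield fields
            fields, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if name == "data":
            data_lines.append(value)
        else:
            fields[name] = value


# Made-up sample stream; real event ids and payloads will look different.
sample = [
    ": keep-alive comment\n",
    "event: message\n",
    "id: 42\n",
    'data: {"title": "Special Olympics"}\n',
    "\n",
]
events = list(parse_sse(sample))
print(events)
```

Because the format is line-oriented text, the same function works on lines read from any HTTP client's response body.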
The SSE standard defines a Last-Event-ID HTTP header, which allows clients to tell servers about the last event that they’ve consumed. EventStreams uses this header to begin streaming to a client from a point in the past. If EventSource clients are disconnected from servers (due to network issues or EventStreams service restarts), they will automatically reconnect, send this header to the server, and resume from where they left off.
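EventSource libraries handle this for you, but the resume logic can be sketched by hand. The example below is an illustration under stated assumptions, not the EventStreams implementation: `connect` stands in for any function that opens a stream with the given headers and yields `(event id, data)` pairs, and `make_flaky_stream` is a made-up in-memory server that drops the connection once. The integer-like ids are a simplification; real ids should be treated as opaque strings.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Event = Tuple[str, str]  # (event id, data)


def resume_headers(last_event_id: Optional[str]) -> Dict[str, str]:
    """Build request headers, adding Last-Event-ID once an event has been seen."""
    headers = {"Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = last_event_id
    return headers


def consume_with_resume(
    connect: Callable[[Dict[str, str]], Iterator[Event]],
    max_attempts: int = 3,
) -> List[str]:
    """Read events, reconnecting after a ConnectionError from the last seen id."""
    last_event_id: Optional[str] = None
    received: List[str] = []
    for _ in range(max_attempts):
        try:
            for event_id, data in connect(resume_headers(last_event_id)):
                received.append(data)
                last_event_id = event_id  # remember our position in the stream
            break  # the stream ended cleanly
        except ConnectionError:
            continue  # reconnect; the header tells the server where to resume
    return received


def make_flaky_stream() -> Callable[[Dict[str, str]], Iterator[Event]]:
    """Hypothetical server: three events, one simulated drop before the third."""
    log = [("1", "a"), ("2", "b"), ("3", "c")]
    state = {"dropped": False}

    def connect(headers: Dict[str, str]) -> Iterator[Event]:
        start = int(headers.get("Last-Event-ID", "0"))  # resume after this id
        for event_id, data in log[start:]:
            if event_id == "3" and not state["dropped"]:
                state["dropped"] = True
                raise ConnectionError("simulated drop")
            yield event_id, data

    return connect


print(consume_with_resume(make_flaky_stream()))
```

Despite the simulated disconnect, the client receives each event exactly once, because the second request carries the id of the last event it consumed.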
EventStreams can be used to expose any useful streams of events, not just RecentChanges. If there’s a stream you’d like to have, we want to know about it. For example, ORES revision score events may soon be exposed in their own stream. The service API docs have an up-to-date list of the (currently limited) available stream endpoints.
We’d like all RecentChange stream clients to switch to EventStreams, but we recognize that there are valuable bots out there running on irc.wikimedia.org whose maintainers we might not be able to find. We commit to supporting irc.wikimedia.org for the foreseeable future. However, we believe the list of (really important) RCStream clients is small enough that we can convince or help folks switch to EventStreams. We’ve chosen an official RCStream decommission date of July 7 this year. If you run an RCStream client, are reading this, and want help migrating, please reach out to us!
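A minimal sketch of consuming the RecentChange stream with nothing but Python's standard library might look like the following. The endpoint URL, the `wiki`, `user` and `title` field names, and the `User-Agent` string are assumptions to check against the service API docs; the sketch also assumes each event's JSON fits on a single `data:` line.

```python
import json
import urllib.request

# Assumed endpoint for the RecentChange stream; check the service API docs.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def summarize(change: dict) -> str:
    """Format one RecentChange event; missing fields fall back to '?'."""
    return "{wiki}: {user} changed {title}".format(
        wiki=change.get("wiki", "?"),
        user=change.get("user", "?"),
        title=change.get("title", "?"),
    )


def follow_recent_changes(limit: int = 10, url: str = STREAM_URL) -> None:
    """Print summaries of the first `limit` events from the live stream."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "text/event-stream",
            # Hypothetical identifier; replace with your own contact details.
            "User-Agent": "eventstreams-example/0.1 (you@example.org)",
        },
    )
    seen = 0
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response body can be read line by line
            line = raw.decode("utf-8").rstrip("\r\n")
            if not line.startswith("data:"):
                continue  # skip ids, event names, comments and blank lines
            print(summarize(json.loads(line[len("data:"):])))
            seen += 1
            if seen >= limit:
                break
```

Calling `follow_recent_changes()` needs network access and stops after ten events; raise `limit` to keep watching for longer.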
You should see RecentChange events fly by in your console.
That’s it! The EventStreams documentation has in-depth information and usage examples in other languages.
If you build something, please tell us, or add yourself to the Powered By EventStreams wiki page. There are already some amazing uses there!