
PDX Rust Meetup — Spidering Wikipedia Politely In Async Rust

Maseeh College of Engineering, Portland State University
1930 SW 4th Ave #500
Portland, OR 97201, United States

Please plan to arrive between 6:30 and 7:00. Because of the venue's limitations, someone has to stand outside and let people in, and we'd like that person to be able to attend the talk too, so the doors will effectively close at 7:00 unless you're a PSU student.


Description

[Rescheduled after snow day]

How many pages are reachable from Wikipedia's page on the Rust programming language in two hops? Around 30,000, it turns out, including pages on wheat flour, Welsh orthography, and the zombie apocalypse.
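The "two hops" question is a breadth-first search cut off at depth two. Here's a minimal, std-only sketch of that traversal. It assumes a hypothetical `links` closure that stands in for the real "fetch this page's links" API call (in the talk's setup that would presumably be an async Reqwest request). The `mock_links` helper and the tiny graph are made-up illustration data, not real Wikipedia links.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first search from `start`, stopping `max_hops` links away.
/// Returns every page found, excluding `start` itself.
/// `links` is a hypothetical stand-in for "fetch this page's outgoing links".
fn reachable_within<F>(start: &str, max_hops: usize, mut links: F) -> HashSet<String>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut seen = HashSet::from([start.to_string()]);
    let mut queue = VecDeque::from([(start.to_string(), 0)]);

    while let Some((page, depth)) = queue.pop_front() {
        // Pages at the last hop are recorded but never fetched.
        if depth == max_hops {
            continue;
        }
        for next in links(page.as_str()) {
            // `insert` returns false for pages already seen, so each page is queued once.
            if seen.insert(next.clone()) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    seen.remove(start);
    seen
}

/// Hypothetical in-memory link table used in place of the real API.
fn mock_links(graph: &HashMap<&str, Vec<&str>>, page: &str) -> Vec<String> {
    graph
        .get(page)
        .map(|targets| targets.iter().map(|t| t.to_string()).collect())
        .unwrap_or_default()
}

fn main() {
    // Made-up miniature link graph, not real Wikipedia data.
    let graph: HashMap<&str, Vec<&str>> = HashMap::from([
        ("Rust", vec!["Mozilla", "LLVM"]),
        ("Mozilla", vec!["Firefox", "Rust"]),
        ("LLVM", vec!["Clang"]),
        ("Firefox", vec!["Gecko"]),
    ]);
    let mut fetches = 0;
    let found = reachable_within("Rust", 2, |page| {
        fetches += 1;
        mock_links(&graph, page)
    });
    // Prints: 4 pages within two hops of Rust (3 link fetches)
    println!("{} pages within two hops of Rust ({} link fetches)", found.len(), fetches);
}
```

Because pages at the final depth are counted without being fetched, a two-hop count only needs link lookups for the start page and its direct neighbors; "Gecko" is three hops out and never appears.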

This exploration is super easy to do in asynchronous Rust. Wikipedia offers a cute little REST API for querying a page's links, and Serde makes it easy to build requests and parse the replies. And if you're feeling guilty about flooding a precious public resource with silly API requests, rate limiting is super easy too.
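Rate limiting can be as simple as spacing requests a fixed interval apart. The sketch below shows one way to do that, not necessarily the approach from the talk: each caller reserves the next free time slot and is told how long to wait. It sticks to `std`; in Tokio code you would presumably await `tokio::time::sleep(wait)` instead of blocking the thread, and share the limiter between tasks behind a mutex. The `RateLimiter` type and the 4-requests-per-second figure are illustrative assumptions, not Wikipedia's published limits.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limiter that spaces requests at least `interval` apart.
/// Callers reserve a slot and are told how long to wait before sending.
struct RateLimiter {
    interval: Duration,
    next_slot: Option<Instant>,
}

impl RateLimiter {
    /// Panics if `requests_per_second` is zero.
    fn new(requests_per_second: u32) -> Self {
        RateLimiter {
            interval: Duration::from_secs(1) / requests_per_second,
            next_slot: None,
        }
    }

    /// Claims the earliest free slot at or after `now` and returns the delay until it.
    fn reserve(&mut self, now: Instant) -> Duration {
        let slot = match self.next_slot {
            // Too soon after the previous reservation: queue behind it.
            Some(next) if next > now => next,
            // First request, or idle long enough: go immediately.
            _ => now,
        };
        self.next_slot = Some(slot + self.interval);
        slot - now
    }
}

fn main() {
    // 4 requests per second is an arbitrary illustrative rate.
    let mut limiter = RateLimiter::new(4);
    let start = Instant::now();
    for page in ["Rust", "Mozilla", "LLVM"] {
        let wait = limiter.reserve(Instant::now());
        // Blocking sleep keeps this sketch std-only; async code would await a timer.
        std::thread::sleep(wait);
        println!("{:?}: would fetch links for {}", start.elapsed(), page);
    }
}
```

Because `reserve` hands out slots instead of sleeping itself, a shared limiter only needs to be locked for the bookkeeping, never across the wait.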

Jim Blandy will show how to wire up Tokio, Reqwest, and Serde to do the spidering, and how to whip up a mock server for testing with Warp. The same techniques work nicely for scripting all kinds of REST APIs, GitHub's included.

