What a single ‘&’ in Rust taught me about HTTP

I’m a web software engineer who wants to learn more about low-level programming. Meanwhile, over the last few years, Rust has become the first language to be really plausible for both domains. So naturally, it has gone to the top of my “to learn” list.

I’ve heard that Rust’s ownership model is different enough from other languages that one can’t expect to just start writing a real program and pick it up along the way, so I wanted to make sure I got the basics first. I’ve been going slowly through the excellent Rust Book. It’s so clear and helpful — but I have really limited time and energy outside my day job, and nothing gives me the same jolt of motivation as actually building something I need. I have to update my Fringe Festival web scraper anyway. That seems like the right size of thing to build!

Reqwest looks to be the main crate (library) Rustaceans use for HTTP. (It’s also strangely satisfying to type; it feels like when I tried Colemak except I’m still in QWERTY.) Anyway, I hit my first mystery while simply trying to fetch a web page and dump the HTML to standard out:

Invalid Rust ❌
fn main() {
    let client = reqwest::blocking::Client::new();
    let play_url = "https://fringetoronto.com/next-stage/show/black-canada";
    print!("Fetching {play_url}");
    let response = client.get(play_url).send().unwrap();
    let status = response.status();
    println!("-> {status}");
    if let Ok(text) = response.text() {
        println!("Here is the HTML content for {}", response.url());
        println!("----------------------------------------");
        println!("{text}");
    } else {
        println!("Failed to decode response for {}", response.url());
    }
}

Compiling this fails with an ownership error (E0382, “borrow of moved value”): response is moved by the text() call on line 8, so borrowing it again with response.url() on lines 9 and 13 is rejected.

My first layer of understanding came from a Reddit post, pointing out that Response’s text() method takes self (moving ownership away) whereas other methods like url() take &self (borrowing). Therefore url() must be called before text(). No response methods can be called after it, because the ownership of response goes away permanently when you call text(). (Folks say text() “consumes” the response.)
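
To see the self versus &self difference in isolation, here is a minimal sketch that needs no network and no crates. FakeResponse and its fields are hypothetical stand-ins I made up for illustration; they are not Reqwest’s real types, only the same method shapes.

```rust
// Hypothetical stand-in for reqwest's Response; not the real type.
struct FakeResponse {
    url: String,
    body: String,
}

impl FakeResponse {
    // Takes `&self`: borrows the response, which stays usable afterwards.
    fn url(&self) -> &str {
        &self.url
    }

    // Takes `self`: the response is moved in, so the caller gives it up.
    fn text(self) -> String {
        self.body
    }
}

fn main() {
    let response = FakeResponse {
        url: String::from("https://example.com/show"),
        body: String::from("<!doctype html>"),
    };

    // Copy the URL into an owned String so no borrow of `response` is left over.
    let url = response.url().to_owned();

    // `response` is moved here and cannot be used again.
    let text = response.text();

    // Uncommenting the next line fails to compile with error E0382
    // (borrow of moved value: `response`).
    // println!("{}", response.url());

    println!("{url}: {text}");
}
```

The compiler enforces the ordering at build time, so calling things in the wrong order never makes it into a running program.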

Well, calling url() first did work. The value also has to be copied into an owned String with String::from(), because otherwise response is still borrowed when text() tries to take ownership of it:

Valid Rust ✅
fn main() {
    let client = reqwest::blocking::Client::new();
    let play_url = "https://fringetoronto.com/next-stage/show/black-canada";
    print!("Fetching {play_url}");
    let response = client.get(play_url).send().unwrap();
    let status = response.status();
    println!("-> {status}");
    let url = String::from(response.url().as_str()); // owned copy, taken before text() consumes response
    if let Ok(text) = response.text() {
        println!("Here is the HTML content for {url}");
        println!("----------------------------------------");
        println!("{text}");
    } else {
        println!("Failed to decode response for {url}");
    }
}

Okay, I thought, but … why is Reqwest like this? Why not just put “&” in front of self for the definition of text() too?

Having something that works without understanding why is just as unsatisfying as having something that doesn’t work. It had seemed very natural to call what I thought were “getter” methods on response in any order I wanted. My mental model for what’s happening here was coming from the abstractions I was used to in Python and JavaScript. My incorrect thinking was that the request “happens” on line 5 with client.get(...).send(), and all the resulting data is loaded into response, which is basically just a box of data. Why should it matter which pieces of data I take out of the box first?

This StackOverflow answer finally helped shift my mental model.

I came to understand the reason is that response is not a box but a handle. When I call client.get(...).send() and get a response object, the exchange is not complete. Rather, the server has sent back the headers and the network connection is still open. While I own the response value, I have the chance to inspect the status code, URL, or other data that can be known at this stage. When I am ready to proceed, I can call text() to download the response body. After that point I will have the body, but whatever data I didn’t copy out from the previous stage is gone.

This way of doing things makes sense to me on the level of memory management: the size of the response body is unknowable ahead of time (the content-length header is optional, and even if given, can be incorrect). Simply loading it into memory when I haven’t even asked for it yet has some clear downsides. I mean, in the final version of this scraper, I will not even bother loading the response body if the status code is not 200.
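
The “handle, not a box” idea can be sketched with the standard library alone. This is my own toy model, not how Reqwest or Hyper are implemented: LazyResponse and from_stream are hypothetical names, and an in-memory Cursor stands in for the open network connection. Only the status line and headers are read up front; the body stays in the stream until text() is called.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Hypothetical handle: the status is parsed, the body is still in the stream.
struct LazyResponse<R: Read> {
    status: u16,
    stream: BufReader<R>,
}

impl<R: Read> LazyResponse<R> {
    // Read only the status line and headers, stopping at the blank line.
    fn from_stream(stream: R) -> std::io::Result<Self> {
        let mut stream = BufReader::new(stream);
        let mut line = String::new();
        stream.read_line(&mut line)?;
        // The status line looks like "HTTP/1.1 200 OK"; 0 means "could not parse".
        let status = line
            .split_whitespace()
            .nth(1)
            .and_then(|code| code.parse().ok())
            .unwrap_or(0);
        // Skip the header lines; the body stays unread.
        loop {
            line.clear();
            let n = stream.read_line(&mut line)?;
            if n == 0 || line.trim_end().is_empty() {
                break;
            }
        }
        Ok(Self { status, stream })
    }

    fn status(&self) -> u16 {
        self.status
    }

    // Consumes the handle and drains whatever is left in the stream.
    fn text(mut self) -> std::io::Result<String> {
        let mut body = String::new();
        self.stream.read_to_string(&mut body)?;
        Ok(body)
    }
}

fn main() -> std::io::Result<()> {
    // An in-memory byte stream stands in for the open network connection.
    let raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<!doctype html>";
    let response = LazyResponse::from_stream(Cursor::new(&raw[..]))?;

    // Only pay for the body when the status says it is worth reading.
    if response.status() == 200 {
        let text = response.text()?;
        println!("{text}");
    }
    Ok(())
}
```

Because text() takes self, the type system rules out reading the stream a second time, and there is nothing left to read anyway.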

In fact, now it seems strange to ever download the response body ahead of time, in any language. “Do Python and JS really do that?” I wondered.

No, of course they don’t! Well, the built-ins and low-level libraries don’t — but the popular high-level libraries do by default.

Python

In Python, I’m used to the Requests library. With Requests, once you have a response object, you can access whichever properties you want, in whatever order you want:

Python
import requests

response = requests.get("https://fringetoronto.com/next-stage/show/black-canada")

response.text
# -> '<!doctype html>...'

# I can still check this
response.status_code
# -> 200

# And I can do this again if I want
response.text
# -> '<!doctype html>...'

But this is an abstraction that Requests intentionally provides on top of the urllib3 library. What happens under the hood is that text is really a function with a @property decorator, so it can be called without parentheses. In turn, text reads content, another property function. content checks if the response body has already been consumed, and if not, streams the data via urllib3 and stores it in a private variable so it can be retrieved as many times as you want.

By default, though, Requests reads response.content itself before handing the response back to you, so the body is downloaded ahead of time unless you pass stream=True.

Python
import requests

response = requests.get("https://fringetoronto.com/next-stage/show/black-canada")
response._content_consumed
# -> True (wait, already??)

response2 = requests.get(
  "https://fringetoronto.com/next-stage/show/black-canada",
  stream=True
)
response2._content_consumed
# -> False (darn right)
response2.text
# -> '<!doctype html>...'
response2._content_consumed
# -> True (aha!)
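
For comparison with the Rust code earlier, here is roughly what that read-once-then-cache behaviour could look like in Rust. It is a sketch of the general pattern, not a port of Requests: CachedResponse and content_consumed are hypothetical names, and a Cursor again stands in for the connection.

```rust
use std::io::{Cursor, Read};

// Hypothetical "box of data" wrapper, loosely modelled on the caching described above.
struct CachedResponse<R: Read> {
    stream: R,
    content: Option<String>, // None until the body has been read once
}

impl<R: Read> CachedResponse<R> {
    fn new(stream: R) -> Self {
        Self { stream, content: None }
    }

    fn content_consumed(&self) -> bool {
        self.content.is_some()
    }

    // Borrows mutably instead of consuming: the first call drains the stream,
    // later calls return the cached copy.
    fn text(&mut self) -> std::io::Result<&str> {
        if self.content.is_none() {
            let mut body = String::new();
            self.stream.read_to_string(&mut body)?;
            self.content = Some(body);
        }
        Ok(self.content.as_deref().unwrap_or_default())
    }
}

fn main() -> std::io::Result<()> {
    let mut response = CachedResponse::new(Cursor::new("<!doctype html>"));
    println!("consumed before: {}", response.content_consumed());

    let first = response.text()?.to_owned();
    println!("consumed after: {}", response.content_consumed());

    // The second call does not touch the stream; it returns the cached body.
    let second = response.text()?.to_owned();
    println!("same body twice: {}", first == second);
    Ok(())
}
```

The price of this convenience is that the whole body stays in memory for as long as the response value lives.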

So I learned that by default, the Requests library trades off some lower-level understanding for convenience. And Python generally supports this kind of developer experience via things like the @property decorator, which can hide the fact that a function is even being called at all. (Over the years, this one language feature has probably added the most “fun” to my efforts at understanding legacy code.)

JavaScript

In my own recent work, the equivalent to the Python Requests library in the browser-side JS world is Axios. My developer experience with Axios has generally taken this form:

JavaScript
import axios from 'axios';

// This actually fails the CORS check, but just to illustrate...
const response = await axios.get(
  'https://fringetoronto.com/next-stage/show/black-canada'
);

console.log(response.data);
// -> '<!DOCTYPE html>...'

// And of course, you can treat it as a box of data
console.log(response.status);
// -> 200

// This works repeatedly
console.log(response.data);
// -> '<!DOCTYPE html>...'

So by default Axios makes the same trade-off as Requests. What’s more, there is not even an option to stream the response. Although Axios has an adapter architecture, the only supported adapter for browser-side JS relies on the native XMLHttpRequest API, which does not support streaming. With XHR, the complete response body ends up in a response or responseText property, which is copied to the Axios response.data property.

It’s worth noting that the newer browser-native Fetch API exposes more of the inner workings of response handling. Its response.body property is a stream, not a preloaded string, and once consumed, it can’t be read again. It also provides the bodyUsed boolean property to tell whether the response body has already been read. And like the Reqwest crate in Rust, it provides convenience methods for decoding particular types of response body, like text() and json().

There is also discussion and work taking place around creating a Fetch-based adapter for Axios, which could allow response streaming, among other features. However, just using Fetch directly also seems like a great option for code that doesn’t need to be portable between the browser and server sides. Part of the value of Axios over XHR is the ergonomics of promises over callbacks, which Fetch already has built-in.

Rust

Each ecosystem discussed so far has a low-level library or API for talking HTTP, and another popular library that provides convenience and ergonomics on top of it.

| In… | this library… | provides abstraction on top of… |
|---|---|---|
| Python | Requests | urllib3 |
| Browser JS | Axios | XMLHttpRequest |
| Rust | Reqwest | Hyper |

However, it’s interesting that, although Reqwest identifies as “an ergonomic, batteries-included HTTP Client for Rust,” it diverges from its Python and JS counterparts by not providing a “box of data”-style Response object. When you use Reqwest, you do get many batteries included, like patterns for error handling and body decoding, but you still have to understand the basic mechanics of how the data comes and goes.

This is interesting to me because it seems to mirror what I take to be the general value proposition of Rust itself. Most previous languages made their design trade-offs based on the idea of a spectrum with ergonomics and expressiveness at one end, and system awareness and control at the other end. But Rust says you don’t have to choose; you can have both. That is why it’s becoming popular in such a wide variety of programming domains.

I thought it was pretty cool that although I have made HTTP requests in JS and Python for years, I never thought about this until making them in Rust. And it was all because of a single ampersand (or actually, the absence of one).

