[The Rust Programming Language] 8. Common Collections

8. Common Collections

Collections can contain multiple values. Unlike the built-in array and tuple types, the data these collections point to is stored on the heap, which means the amount of data does not need to be known at compile time and can grow or shrink as the program runs. Each kind of collection has different capabilities and costs, and choosing an appropriate one for your current situation is a skill you’ll develop over time.

We’ll discuss three collections that are used very often in Rust programs:

  • A vector allows you to store a variable number of values next to each other.

  • A string is a collection of characters. We’ve mentioned the String type previously, but in this chapter we’ll talk about it in depth.

  • A hash map allows you to associate a value with a particular key. It’s a particular implementation of the more general data structure called a map.

8.1 Storing Lists of Values with Vectors

Vectors allow you to store more than one value in a single data structure that puts all the values next to each other in memory. Vectors can only store values of the same type.

Creating a New Vector

To create a new, empty vector, we can call the Vec::new function

1
let v: Vec<i32> = Vec::new();

It’s more common to create a Vec that has initial values, and Rust provides the vec! macro for convenience. The macro will create a new vector that holds the values you give it.

1
let v = vec![1, 2, 3];

Updating a Vector

To create a vector and then add elements to it, we can use the push method

1
2
3
4
5
6
let mut v = Vec::new(); // need to make it mutable using the mut keyword

v.push(5); // The numbers we place inside are all of type i32, and Rust infers this from the data
v.push(6);
v.push(7);
v.push(8);

As with any variable, if we want to be able to change its value, we need to make it mutable using the mut keyword,

Dropping a Vector Drops Its Elements

Like any other struct, a vector is freed when it goes out of scope, as annotated in Listing 8-4.

1
2
3
4
5
{
let v = vec![1, 2, 3, 4];

// do stuff with v
} // <- v goes out of scope and is freed here

When the vector gets dropped, all of its contents are also dropped, meaning those integers it holds will be cleaned up.

Reading Elements of Vectors

1
2
3
4
5
6
7
8
9
let v = vec![1, 2, 3, 4, 5];

let third: &i32 = &v[2]; // Indexed by number, starting at zero, using & and []
println!("The third element is {}", third);

match v.get(2) {
Some(third) => println!("The third element is {}", third), // Using an Option<&T>.
None => println!("There is no third element."),
}

Rust has two ways to reference an element so you can choose how the program behaves when you try to use an index value that the vector doesn’t have an element for.

1
2
3
4
let v = vec![1, 2, 3, 4, 5];

let does_not_exist = &v[100]; // Cause the program to panic because it references a nonexistent element.
let does_not_exist = v.get(100); // Returns None without panicking.

Adding a new element onto the end of the vector might require allocating new memory and copying the old elements to the new space, if there isn’t enough room to put all the elements next to each other where the vector currently is. In that case, the reference to the first element would be pointing to deallocated memory. The borrowing rules prevent programs from ending up in that situation.

1
2
3
4
5
6
7
8
let mut v = vec![1, 2, 3, 4, 5];

// error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
let first = &v[0]; // immutable borrow occurs here

v.push(6); // immutable borrow later used here

println!("The first element is: {}", first);

Iterating over the Values in a Vector

We can iterate through all of the elements rather than use indices to access one at a time

1
2
3
4
let v = vec![100, 32, 57];
for i in &v {
println!("{}", i);
}

We can also iterate over mutable references to each element in a mutable vector in order to make changes to all the elements.

1
2
3
4
let mut v = vec![100, 32, 57];
for i in &mut v {
*i += 50; // use the dereference operator (`*`) to get to the value in i
}

To change the value that the mutable reference refers to, we have to use the dereference operator (*) to get to the value in i before we can use the += operator. We’ll talk more about the dereference operator in the “Following the Pointer to the Value with the Dereference Operator” - https://doc.rust-lang.org/book/ch15-02-deref.html#following-the-pointer-to-the-value-with-the-dereference-operator section of Chapter 15.

Using an Enum to Store Multiple Types

Fortunately, the variants of an enum are defined under the same enum type, so when we need to store elements of a different type in a vector, we can define and use an enum!

1
2
3
4
5
6
7
8
9
10
11
enum SpreadsheetCell {
Int(i32),
Float(f64),
Text(String),
}

let row = vec![
SpreadsheetCell::Int(3),
SpreadsheetCell::Text(String::from("blue")),
SpreadsheetCell::Float(10.12),
];

When you’re writing a program, if you don’t know the exhaustive set of types the program will get at runtime to store in a vector, the enum technique won’t work. Instead, you can use a trait object, which we’ll cover in Chapter 17.

8.2 Storing UTF-8 Encoded Text with Strings

Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str. String literals, are stored in the program’s binary and are therefore string slices.

The String type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type.

Rust’s standard library also includes a number of other string types, such as OsString, OsStr, CString, and CStr. Library crates can provide even more options for storing string data.

Creating a New String

1
2
3
4
5
6
7
8
9
10
11
12
13
// Using the new function
let mut s = String::new();

// Using the to_string method
let data = "initial contents";

let s = data.to_string();

// the method also works on a literal directly:
let s = "initial contents".to_string();

// Using the function String::from
let s = String::from("initial contents");

Updating a String

A String can grow in size and its contents can change, just like the contents of a Vec, if you push more data into it. In addition, you can conveniently use the + operator or the format! macro to concatenate String values.

Appending to a String with push_str and push

We can grow a String by using the push_str method to append a string slice

1
2
let mut s = String::from("foo");
s.push_str("bar"); // will contain foobar

The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter

The push method takes a single character as a parameter and adds it to the String. Listing 8-17 shows code that adds the letter “l” to a String using the push method.

1
2
let mut s = String::from("lo");
s.push('l'); // lol

Concatenation with the + Operator or the format! Macro

One way is to use the + operator

1
2
3
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used

The + operator uses the add method, whose signature looks something like this:

1
fn add(self, s: &str) -> String

When we call the add method, Rust uses a deref coercion, which here turns &s2 into &s2[…]. We’ll discuss deref coercion in more depth in Chapter 15. Because add does not take ownership of the s parameter, s2 will still be a valid String after this operation.

Second, we can see in the signature that add takes ownership of self, because self does not have an &. This statement actually takes ownership of s1, appends a copy of the contents of s2, and then returns ownership of the result.

For more complicated string combining, we can use the format! macro:

1
2
3
4
5
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");

let s = format!("{}-{}-{}", s1, s2, s3); // tic-tac-toe

The format! macro works in the same way as println!, but instead of printing the output to the screen, it returns a String with the contents. The version of the code using format! is much easier to read and doesn’t take ownership of any of its parameters.

Indexing into Strings

If you try to access parts of a String using indexing syntax in Rust, you’ll get an error.

1
2
3
// This code does not compile!
let s1 = String::from("hello");
let h = s1[0]; // error[E0277]: the type `String` cannot be indexed by `{integer}`

Internal Representation

A String is a wrapper over a Vec.

Bytes and Scalar Values and Grapheme Clusters! Oh My!

Another point about UTF-8 is that there are actually three relevant ways to look at strings from Rust’s perspective: as bytes, scalar values, and grapheme clusters (the closest thing to what we would call letters).

A final reason Rust doesn’t allow us to index into a String to get a character is that indexing operations are expected to always take constant time (O(1)). But it isn’t possible to guarantee that performance with a String, because Rust would have to walk through the contents from the beginning to the index to determine how many valid characters there were.

Slicing Strings

Indexing into a string is often a bad idea because it’s not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice. Therefore, Rust asks you to be more specific if you really need to use indices to create string slices. To be more specific in your indexing and indicate that you want a string slice, rather than indexing using [] with a single number, you can use [] with a range to create a string slice containing particular bytes:

1
2
3
4
5
let hello = "Здравствуйте";

let s = &hello[0..4];

&hello[0..1] // would panic at runtime in the same way as if an invalid index were accessed in a vector

You should use ranges to create string slices with caution, because doing so can crash your program.

Methods for Iterating Over Strings

If you need to perform operations on individual Unicode scalar values, the best way to do so is to use the chars method.

1
2
3
for c in "नमस्ते".chars() {
println!("{}", c);
}

The bytes method returns each raw byte, which might be appropriate for your domain:

1
2
3
for b in "नमस्ते".bytes() {
println!("{}", b);
}

But be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.

Getting grapheme clusters from strings is complex, so this functionality is not provided by the standard library. Crates are available on crates.io if this is the functionality you need.

8.3 Storing Keys with Associated Values in Hash Maps

The type HashMap<K, V> stores a mapping of keys of type K to values of type V. It does this via a hashing function, which determines how it places these keys and values into memory.

Creating a New Hash Map

You can create an empty hash map with new and add elements with insert. In Listing 8-20, we’re keeping track of the scores of two teams whose names are Blue and Yellow. The Blue team starts with 10 points, and the Yellow team starts with 50.

1
2
3
4
5
6
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

Another way of constructing a hash map is by using iterators and the collect method on a vector of tuples, where each tuple consists of a key and its value. We’ll be going into more detail about iterators and their associated methods in the ”Processing a Series of Items with Iterators” - https://doc.rust-lang.org/book/ch13-02-iterators.html section of Chapter 13.

1
2
3
4
5
6
7
use std::collections::HashMap;

let teams = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];

let mut scores: HashMap<_, _> =
teams.into_iter().zip(initial_scores.into_iter()).collect();

The type annotation HashMap<_, _> is needed here because it’s possible to collect into many different data structures and Rust doesn’t know which you want unless you specify. For the parameters for the key and value types, however, we use underscores, and Rust can infer the types that the hash map contains based on the types of the data in the vectors.

Hash Maps and Ownership

For types that implement the Copy trait, like i32, the values are copied into the hash map. For owned values like String, the values will be moved and the hash map will be the owner of those values.

1
2
3
4
5
6
7
let field_name = String::from("Favorite color");
let field_value = String::from("Blue");

let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point, try using them and
// see what compiler error you get!

If we insert references to values into the hash map, the values won’t be moved into the hash map. The values that the references point to must be valid for at least as long as the hash map is valid. We’ll talk more about these issues in the “Validating References with Lifetimes” - https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#validating-references-with-lifetimes section in Chapter 10.

Accessing Values in a Hash Map

We can get a value out of the hash map by providing its key to the get method.

1
2
3
4
5
6
7
8
9
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

let team_name = String::from("Blue");
let score = scores.get(&team_name); // Some(&10)

Here, score will have the value that’s associated with the Blue team, and the result will be Some(&10). The result is wrapped in Some because get returns an Option<&V>; if there’s no value for that key in the hash map, get will return None.

We can iterate over each key/value pair in a hash map in a similar manner as we do with vectors, using a for loop:

1
2
3
4
5
6
7
8
9
10
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

for (key, value) in &scores {
println!("{}: {}", key, value); // will print each pair in an arbitrary order:
}

Only Inserting a Value If the Key Has No Value

It’s common to check whether a particular key has a value and, if it doesn’t, insert a value for it. Hash maps have a special API for this called entry that takes the key you want to check as a parameter. The return value of the entry method is an enum called Entry that represents a value that might or might not exist.

1
2
3
4
5
6
7
8
9
use std::collections::HashMap;

let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);

scores.entry(String::from("Yellow")).or_insert(50); // will insert the key for the Yellow team with the value 50 because the Yellow team doesn’t have a value already.
scores.entry(String::from("Blue")).or_insert(50); // will not change the hash map because the Blue team already has the value 10.

println!("{:?}", scores); // will print {"Yellow": 50, "Blue": 10}

Updating a Value Based on the Old Value

1
2
3
4
5
6
7
8
9
10
11
12
use std::collections::HashMap;

let text = "hello world wonderful world";

let mut map = HashMap::new();

for word in text.split_whitespace() {
let count = map.entry(word).or_insert(0); // returns a mutable reference (&mut V) to the value for this key
*count += 1;
}

println!("{:?}", map); // {"world": 2, "hello": 1, "wonderful": 1}

This code will print {“world”: 2, “hello”: 1, “wonderful”: 1}. The or_insert method actually returns a mutable reference (&mut V) to the value for this key. Here we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*). The mutable reference goes out of scope at the end of the for loop, so all of these changes are safe and allowed by the borrowing rules.

Hashing Functions

By default, HashMap uses a hashing function called SipHash that can provide resistance to Denial of Service (DoS) attacks involving hash tables1. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it. If you profile your code and find that the default hash function is too slow for your purposes, you can switch to another function by specifying a different hasher. A hasher is a type that implements the BuildHasher trait. We’ll talk about traits and how to implement them in Chapter 10. You don’t necessarily have to implement your own hasher from scratch; crates.io has libraries shared by other Rust users that provide hashers implementing many common hashing algorithms.

References

[1] The Rust Programming Language - The Rust Programming Language - https://doc.rust-lang.org/book/title-page.html

[2] Common Collections - The Rust Programming Language - https://doc.rust-lang.org/book/ch08-00-common-collections.html