[The Rust Programming Language] 8. Common Collections
8. Common Collections
Collections can contain multiple values. Unlike the built-in array and tuple types, the data these collections point to is stored on the heap, which means the amount of data does not need to be known at compile time and can grow or shrink as the program runs. Each kind of collection has different capabilities and costs, and choosing an appropriate one for your current situation is a skill you’ll develop over time.
We’ll discuss three collections that are used very often in Rust programs:
-
A vector allows you to store a variable number of values next to each other.
-
A string is a collection of characters. We’ve mentioned the String type previously, but in this chapter we’ll talk about it in depth.
-
A hash map allows you to associate a value with a particular key. It’s a particular implementation of the more general data structure called a map.
8.1 Storing Lists of Values with Vectors
Vectors allow you to store more than one value in a single data structure that puts all the values next to each other in memory. Vectors can only store values of the same type.
Creating a New Vector
To create a new, empty vector, we can call the Vec::new
function
1 | let v: Vec<i32> = Vec::new(); |
It’s more common to create a Vec
1 | let v = vec![1, 2, 3]; |
Updating a Vector
To create a vector and then add elements to it, we can use the push
method
1 | let mut v = Vec::new(); // need to make it mutable using the mut keyword |
As with any variable, if we want to be able to change its value, we need to make it mutable using the mut keyword,
Dropping a Vector Drops Its Elements
Like any other struct, a vector is freed when it goes out of scope, as annotated in Listing 8-4.
1 | { |
When the vector gets dropped, all of its contents are also dropped, meaning those integers it holds will be cleaned up.
Reading Elements of Vectors
1 | let v = vec![1, 2, 3, 4, 5]; |
Rust has two ways to reference an element so you can choose how the program behaves when you try to use an index value that the vector doesn’t have an element for.
1 | let v = vec![1, 2, 3, 4, 5]; |
Adding a new element onto the end of the vector might require allocating new memory and copying the old elements to the new space, if there isn’t enough room to put all the elements next to each other where the vector currently is. In that case, the reference to the first element would be pointing to deallocated memory. The borrowing rules prevent programs from ending up in that situation.
1 | let mut v = vec![1, 2, 3, 4, 5]; |
Iterating over the Values in a Vector
We can iterate through all of the elements rather than use indices to access one at a time
1 | let v = vec![100, 32, 57]; |
We can also iterate over mutable references to each element in a mutable vector in order to make changes to all the elements.
1 | let mut v = vec![100, 32, 57]; |
To change the value that the mutable reference refers to, we have to use the dereference operator (*
) to get to the value in i before we can use the +=
operator. We’ll talk more about the dereference operator in the “Following the Pointer to the Value with the Dereference Operator” - https://doc.rust-lang.org/book/ch15-02-deref.html#following-the-pointer-to-the-value-with-the-dereference-operator section of Chapter 15.
Using an Enum to Store Multiple Types
Fortunately, the variants of an enum are defined under the same enum type, so when we need to store elements of a different type in a vector, we can define and use an enum!
1 | enum SpreadsheetCell { |
When you’re writing a program, if you don’t know the exhaustive set of types the program will get at runtime to store in a vector, the enum technique won’t work. Instead, you can use a trait object, which we’ll cover in Chapter 17.
8.2 Storing UTF-8 Encoded Text with Strings
Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str. String literals, are stored in the program’s binary and are therefore string slices.
The String type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type.
Rust’s standard library also includes a number of other string types, such as OsString, OsStr, CString, and CStr. Library crates can provide even more options for storing string data.
Creating a New String
1 | // Using the new function |
Updating a String
A String can grow in size and its contents can change, just like the contents of a Vec
Appending to a String with push_str and push
We can grow a String by using the push_str
method to append a string slice
1 | let mut s = String::from("foo"); |
The push_str
method takes a string slice because we don’t necessarily want to take ownership of the parameter
The push method takes a single character as a parameter and adds it to the String. Listing 8-17 shows code that adds the letter “l” to a String using the push method.
1 | let mut s = String::from("lo"); |
Concatenation with the + Operator or the format! Macro
One way is to use the + operator
1 | let s1 = String::from("Hello, "); |
The + operator uses the add method, whose signature looks something like this:
1 | fn add(self, s: &str) -> String |
When we call the add method, Rust uses a deref coercion, which here turns &s2 into &s2[…]. We’ll discuss deref coercion in more depth in Chapter 15. Because add does not take ownership of the s parameter, s2 will still be a valid String after this operation.
Second, we can see in the signature that add takes ownership of self, because self does not have an &. This statement actually takes ownership of s1, appends a copy of the contents of s2, and then returns ownership of the result.
For more complicated string combining, we can use the format! macro:
1 | let s1 = String::from("tic"); |
The format!
macro works in the same way as println!
, but instead of printing the output to the screen, it returns a String with the contents. The version of the code using format!
is much easier to read and doesn’t take ownership of any of its parameters.
Indexing into Strings
If you try to access parts of a String using indexing syntax in Rust, you’ll get an error.
1 | // This code does not compile! |
Internal Representation
A String is a wrapper over a Vec
Bytes and Scalar Values and Grapheme Clusters! Oh My!
Another point about UTF-8 is that there are actually three relevant ways to look at strings from Rust’s perspective: as bytes, scalar values, and grapheme clusters (the closest thing to what we would call letters).
A final reason Rust doesn’t allow us to index into a String to get a character is that indexing operations are expected to always take constant time (O(1)). But it isn’t possible to guarantee that performance with a String, because Rust would have to walk through the contents from the beginning to the index to determine how many valid characters there were.
Slicing Strings
Indexing into a string is often a bad idea because it’s not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice. Therefore, Rust asks you to be more specific if you really need to use indices to create string slices. To be more specific in your indexing and indicate that you want a string slice, rather than indexing using [] with a single number, you can use [] with a range to create a string slice containing particular bytes:
1 | let hello = "Здравствуйте"; |
You should use ranges to create string slices with caution, because doing so can crash your program.
Methods for Iterating Over Strings
If you need to perform operations on individual Unicode scalar values, the best way to do so is to use the chars method.
1 | for c in "नमस्ते".chars() { |
The bytes method returns each raw byte, which might be appropriate for your domain:
1 | for b in "नमस्ते".bytes() { |
But be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.
Getting grapheme clusters from strings is complex, so this functionality is not provided by the standard library. Crates are available on crates.io if this is the functionality you need.
8.3 Storing Keys with Associated Values in Hash Maps
The type HashMap<K, V> stores a mapping of keys of type K to values of type V. It does this via a hashing function, which determines how it places these keys and values into memory.
Creating a New Hash Map
You can create an empty hash map with new and add elements with insert. In Listing 8-20, we’re keeping track of the scores of two teams whose names are Blue and Yellow. The Blue team starts with 10 points, and the Yellow team starts with 50.
1 | use std::collections::HashMap; |
Another way of constructing a hash map is by using iterators and the collect method on a vector of tuples, where each tuple consists of a key and its value. We’ll be going into more detail about iterators and their associated methods in the ”Processing a Series of Items with Iterators” - https://doc.rust-lang.org/book/ch13-02-iterators.html section of Chapter 13.
1 | use std::collections::HashMap; |
The type annotation HashMap<_, _>
is needed here because it’s possible to collect into many different data structures and Rust doesn’t know which you want unless you specify. For the parameters for the key and value types, however, we use underscores, and Rust can infer the types that the hash map contains based on the types of the data in the vectors.
Hash Maps and Ownership
For types that implement the Copy trait, like i32, the values are copied into the hash map. For owned values like String, the values will be moved and the hash map will be the owner of those values.
1 | let field_name = String::from("Favorite color"); |
If we insert references to values into the hash map, the values won’t be moved into the hash map. The values that the references point to must be valid for at least as long as the hash map is valid. We’ll talk more about these issues in the “Validating References with Lifetimes” - https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#validating-references-with-lifetimes section in Chapter 10.
Accessing Values in a Hash Map
We can get a value out of the hash map by providing its key to the get method.
1 | use std::collections::HashMap; |
Here, score will have the value that’s associated with the Blue team, and the result will be Some(&10). The result is wrapped in Some because get returns an Option<&V>; if there’s no value for that key in the hash map, get will return None.
We can iterate over each key/value pair in a hash map in a similar manner as we do with vectors, using a for loop:
1 | use std::collections::HashMap; |
Only Inserting a Value If the Key Has No Value
It’s common to check whether a particular key has a value and, if it doesn’t, insert a value for it. Hash maps have a special API for this called entry that takes the key you want to check as a parameter. The return value of the entry method is an enum called Entry that represents a value that might or might not exist.
1 | use std::collections::HashMap; |
Updating a Value Based on the Old Value
1 | use std::collections::HashMap; |
This code will print {“world”: 2, “hello”: 1, “wonderful”: 1}. The or_insert method actually returns a mutable reference (&mut V) to the value for this key. Here we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*). The mutable reference goes out of scope at the end of the for loop, so all of these changes are safe and allowed by the borrowing rules.
Hashing Functions
By default, HashMap uses a hashing function called SipHash that can provide resistance to Denial of Service (DoS) attacks involving hash tables1. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it. If you profile your code and find that the default hash function is too slow for your purposes, you can switch to another function by specifying a different hasher. A hasher is a type that implements the BuildHasher trait. We’ll talk about traits and how to implement them in Chapter 10. You don’t necessarily have to implement your own hasher from scratch; crates.io has libraries shared by other Rust users that provide hashers implementing many common hashing algorithms.