Just How do Derive Macros Work?
A macro, as you probably heard before, is code that writes code. Procedural macros are one of two types of macros in Rust (the other being declarative macros). These macros give you complete control over how code is generated. They are procedural in the sense that you have to write how you want the code to be generated or transformed given a token tree input
Whenever you annotate your struct or enum (or union??) with #[derive(Debug)]
, the Debug
proc macro responsible for implementing this trait is invoked to automatically create the impl Debug for MyStruct { ... }
, directly accessing things code can’t usually access, such as the name of the struct and its fields. Let’s try to understand how they do this, by making our own macro. All of the code is available here
Example: Deriving the IntoMap Trait
We want to create a derive macro to simplify the implementation of the IntoMap
trait. Our macro will only be valid when derived from a struct with named fields
This trait has one member function as_map
, that “serializes” the struct and maps its fields and values to a BTreeMap
. We define it as follows:
1
2
3
pub trait IntoMap {
fn as_map(&self) -> BTreeMap<String, String>;
}
To make our derive macro, we first need to make a new crate, and specify that it’s a procedual macro crate
1
2
[lib]
proc-macro = true
In our lib file, we create a function with the #[proc_macro_derive(Trait)]
attribute. This function is what will be called with we annotate a type with #[derive(Trait)]
We start by declaring our function with the predefined signature of a derive macro: TokenStream
as input, TokenStream
as output, because it’s taking in source code, and outputting new source code
1
2
3
4
#[proc_macro_derive(IntoMap)]
pub fn intomap_derive(input: TokenStream) -> TokenStream {
todo!()
}
In order to leverage Rust’s elegant error handling, and because of the fixed derive function signature, we typically export a macro’s implementation to another function that returns a
syn::Result<TokenStream2>
, whereTokenStream2
is an aliasedproc_macro2::TokenStream
1 2 3 4 5 mod expand { pub fn intomap_impl(parsed_input: DeriveInput) -> Result<TokenStream2> { todo!() } }We use
parse_macro_input!
from the syn crate to parse theinput
into aDeriveInput
, and extract the struct’s name (identifier)We will then call this in the main derive function, and convert any
Err
into a compile error. The.into()
is to convert fromTokenStream2
intoTokenStream
1 2 3 4 5 6 7 #[proc_macro_derive(IntoMap)] pub fn intomap_derive(input: TokenStream) -> TokenStream { let parsed_input = parse_macro_input!(input as DeriveInput); expand::intomap_impl(parsed_input) .unwrap_or_else(Error::into_compile_error) .into() }
Now, back in expand::intomap_impl
, We extract the struct’s name, which we will need later. We then perform nested pattern matching to parse only structs with named fields. We invalidate everything else by returning an error
1
2
3
4
5
6
7
8
9
10
11
12
13
let struct_name = parsed_input.ident;
match parsed_input.data {
Data::Struct(DataStruct {
fields: Fields::Named(FieldsNamed { ref named, .. }),
..
}) => todo!(),
_ => {
return Err(Error::new(
Span::call_site(),
"#[derive(IntoMap)] is only valid for structs with named fields",
))
}
};
named
is of type Punctuated<Field, Comma>
, which implements the Iterator
trait. We now want to return an iterator over TokenStream
, where each iteration contains the generated code for inserting each field in named
into a BTreeMap
The variables map
and self
do not exist within the current context, which might seem like we are making a non-hygienic macro. That is not the case, as we are just creating a TokenStream
for now. These tokens will be passed to a context which does have map
and self
We obtain our lazy iterator of tokens impl Iterator<Item = TokenStream>
1
2
3
4
5
6
7
8
9
let insert_tokens = named.iter().map(|field| {
let field_name = field.ident.clone().unwrap();
quote! {
map.insert(
stringify!(#field_name).to_string(),
self.#field_name.to_string()
);
}
}
Finally, we create the TokenStream
that our function will be returning. We first bring the necessary types into scope, then implement IntoMap
for our struct, whose name (identifier) we extracted earlier
Inside intomap
, we find ourselves in the previously mentioned context that has both map
and self
. We then consume the iterator to add our tokens that perform the insertions using a syntax similar to that of declarative macro repetitions, courtesy of quote!
1
2
3
4
5
6
7
8
9
10
11
12
let tokens = quote! {
use std::collections::BTreeMap;
use intomap::IntoMap;
impl IntoMap for #struct_name {
fn as_map(&self) -> BTreeMap<String, String> {
let mut map = BTreeMap::new();
#(#insert_tokens)*
map
}
}
};
An Attribute to Ignore Fields
We want to extend the capability of our macro by adding the option to ignore some struct fields when converting it to a BTreeMap
. We will do this by tagging the fields with the #[intomap_ignore]
attribute. To do so, we first have to define the attribute in the function definition
1
2
3
4
5
6
const ATTR_IGNORE: &str = "intomap_ignore";
#[proc_macro_derive(IntoMap, attributes(intomap_ignore))]
pub fn intomap_derive(input: TokenStream) -> TokenStream {
todo!()
}
We then will modify how we obtain insert_tokens
so that it skips the fields tagged with our new attribute. We will replace our map
over the struct fields with a filter_map
, and only return Some(TokenStream)
if the field is not tagged with #[intomap_ignore]
We do that by iterating over the attributes of each field, validating that all
of its attributes are not "intomap_ignore"
, then returning the insert tokens wrapped in a Some
if the condition is valid
1
2
3
4
5
6
7
8
9
10
11
12
13
let insert_tokens = named.iter().filter_map(|field| {
let field_name = field.ident.clone().unwrap();
field
.attrs
.iter()
.all(|attr| !attr.path().is_ident(ATTR_IGNORE))
.then_some(quote! {
map.insert(
stringify!(#field_name).to_string(),
self.#field_name.to_string()
);
})
})
A Better Ignore
We previously used #[intomap_ignore]
as #[ignore]
already exists (for ignoring a test) and will cause conflict if we give the same name to our attribute. But #[intomap_ignore]
is ugly and if we want to add other attributes, we’d have to follow this style of #[intomap_...]
A more elegant attribute would look like #[intomap(ignore)]
. This way, we can pseudo-namespace all of our attributes
1
2
3
4
5
const ATTR_INTOMAP: &str = "intomap";
const IDENT_IGNORE: &str = "ignore";
#[proc_macro_derive(IntoMap, attributes(intomap))]
pub fn intomap_derive(input: TokenStream) -> TokenStream {}
The way to do it is slightly more complex, but it only requires changing the predicate inside our all
call. If the field’s attribute’s path is intomap
, we parse its arguments as an identifier, and match on the result such that all
returns true only if all of the attribute arguments do not match ignore
.
.path()
returns the path that identifies the interpretation of an attribute, which istest
in#[test]
,derive
in#[derive(Copy)]
, andpath
in#[path = "sys/windows.rs"]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
field
.attrs
.iter()
.filter(|attr| attr.path().is_ident(ATTR_INTOMAP))
.all(|attr| match attr.parse_args::<Ident>() {
Ok(ident) => ident != IDENT_IGNORE,
Err(_) => true,
})
.then_some(quote! {
map.insert(
stringify!(#field_name).to_string(),
self.#field_name.to_string()
);
})
This way, we have a prettier attribute that allows for later consistency with our library. We can even add later attributes that will make more sense with this structure like #[intomap(rename = "new_name")]
. Let’s do that next
An Attribute to Rename Fields
All that we will need to change is the key name when inserting into the map
1
2
3
4
5
6
7
let field_rename = todo!();
quote! {
map.insert(
stringify!(#field_rename).to_string(),
self.#field_name.to_string()
);
})
Now to parse our rename syntax. Let’s export the functionality to a new function. Our derive function is starting to get long. We want to take a field as input, go through its attributes, and return Ident(new_name)
if there is a rename attribute
1
2
3
fn field_rename(field: &Field) -> Option<Ident> {
todo!()
}
We then iterate over the field’s attributes, filter out the attributes whose paths aren’t intomap
, and use a find_map
to return the first instance of a renaming. We then parse its arguments as an assignment expression because our rename syntax is expression-like
1
2
3
4
5
6
7
8
9
field
.attrs
.iter()
.filter(|attr| attr.path().is_ident(ATTR_INTOMAP))
.find_map(|attr| {
attr.parse_args::<ExprAssign>().ok().and_then(|expr| {
todo!()
})
})
Finally, we match the left and right sides of the expression, validating that the left side is the rename
identifier, and converting the string literal on the right side to an identifier.
1
2
3
4
5
6
7
8
9
match (*expr.left, *expr.right) {
(
Expr::Path(ExprPath { path, .. }),
Expr::Lit(ExprLit { lit: Lit::Str(s), .. }),
) if path.is_ident(IDENT_RENAME) => {
Some(Ident::new(s.value().as_str(), s.span()))
}
_ => None,
}
Now to use this function, we call it, and default to the original field name in case of a None
1
let field_rename = field_rename(&field).unwrap_or(field_name.clone());
We’re done! Derive macros are one Rust’s greatest features, and understanding them lifts the veil of complexity. No more magic, or rather, you are now a magician
Once again, all of the code is available in this repo. If there is an issue or if you want to contribute, please write an issue here