Rust traits, dynamic dispatch and upcasting

October 13, 2023

I recently hit a limitation of Rust when working with trait objects. I had a function that returned a trait object and I needed a trait object for one of its supertraits.

use std::sync::Arc;

trait Super {}

trait Sub: Super {}

fn upcast(obj: Arc<dyn Sub>) -> Arc<dyn Super> {
    obj
}

To my surprise, the code did not compile:

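error[E0308]: mismatched types
 --> src/lib.rs:8:5
  |
7 | fn upcast(obj: Arc<dyn Sub>) -> Arc<dyn Super> {
  |                                 -------------- expected `std::sync::Arc<…>`
…
             found struct `std::sync::Arc<…>`

use std::sync::Arc;

trait Super {}

fn to_trait_object<T: Super>(t: Arc<T>) -> Arc<dyn Super> {
    t
}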

This compiled just fine, but this doesn't:

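use std::sync::Arc;

trait Super {}

fn to_trait_object<T: Super + ?Sized>(t: Arc<T>) -> Arc<dyn Super> {
    t
}

error[E0277]: the size for values of type `T` cannot be known at compilation time
 --> src/lib.rs:6:5
  |
5 | fn to_trait_object<T: Super + ?Sized>(t: Arc<T>) -> Arc<dyn Super> {
  |                    - this type parameter needs to be `std::marker::Sized`
6 |     t
  |     ^ doesn't have a size known at compile-time
  |
  = help: the trait `std::marker::Sized` is not implemented for `T`
  = note: to learn more, visit
  = note: required for the cast to the object type `dyn Super`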

So, the reference is clearly misleading here. I set out to explore why this doesn't work and whether I can do something about it.

Let's start with some basics.

1. What is dynamic dispatch?

The most common and idiomatic way to use traits in Rust is through generics:

trait TypeDescription {
    fn get_description(&self) -> String;
}

impl TypeDescription for u8 {
    fn get_description(&self) -> String {
        format!("{} is an unsigned 8 bit integer.", self)
    }
}

impl TypeDescription for i64 {
    fn get_description(&self) -> String {
        format!("{} is a signed 64 bit integer.", self)
    }
}

fn print_description<T: TypeDescription>(t: &T) {
    println!("{}", t.get_description());
}

fn main() {
    print_description(&42u8);
    print_description(&42i64);
}

The output of this program is as follows:

42 is an unsigned 8 bit integer.
42 is a signed 64 bit integer.

So what actually happens here? The signature fn print_description<T: TypeDescription>(t: &T) defines a generic function. Every time the function is called, the compiler determines the type T of the first argument, checks whether T implements the TypeDescription trait and, if so, generates code for this function specific to that type argument. Depending on the precise size, alignment and layout of the type, and the specifics of the trait implementation, the generated code might differ considerably. This process is often called monomorphisation.

Because the code for the function call is generated statically at compile time, this method of calling generic code is sometimes called static dispatch. If you think that this is all very similar to C++ templates, you are right: A similar monomorphisation process happens when C++ templates are instantiated.

Static dispatch is considered efficient, and it is a major reason why Rust performs so well in the presence of generic code: You don't pay an additional runtime cost compared to writing multiple almost identical functions for different types.

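To make monomorphisation a little more tangible, here is a small sketch that writes out by hand the two specialised functions the compiler conceptually generates for a generic function. The names describe, describe_u8 and describe_i64 are made up for this illustration, and the function returns the string instead of printing it so that the results can be compared.

```rust
trait TypeDescription {
    fn get_description(&self) -> String;
}

impl TypeDescription for u8 {
    fn get_description(&self) -> String {
        format!("{} is an unsigned 8 bit integer.", self)
    }
}

impl TypeDescription for i64 {
    fn get_description(&self) -> String {
        format!("{} is a signed 64 bit integer.", self)
    }
}

// Generic version: statically dispatched, one instantiation per concrete `T`.
fn describe<T: TypeDescription>(t: &T) -> String {
    t.get_description()
}

// Hand-written equivalents of the two instantiations used below
// (hypothetical names; the compiler generates these internally).
fn describe_u8(t: &u8) -> String {
    <u8 as TypeDescription>::get_description(t)
}

fn describe_i64(t: &i64) -> String {
    <i64 as TypeDescription>::get_description(t)
}

fn main() {
    // The generic call and the hand-specialised call behave identically.
    assert_eq!(describe(&42u8), describe_u8(&42u8));
    assert_eq!(describe(&42i64), describe_i64(&42i64));
    println!("{}", describe(&42u8));
    println!("{}", describe(&42i64));
}
```

In both cases the callee is fixed at compile time, so no lookup is needed at run time.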

However, in order for static dispatch to work, the compiler must know all types at compile time. But that is not always possible. A common example is having a collection of items of different type that all implement a common trait. In Rust, this can be achieved with trait objects.

fn print_descriptions(ts: &Vec<Box<dyn TypeDescription>>) {
    for t in ts {
        println!("{}", t.get_description());
    }
}

fn main() {
    let ts: Vec<Box<dyn TypeDescription>> = vec![Box::new(42u8), Box::new(42i64)];

    print_descriptions(&ts);
}

The output of this program is the same as above. But how does this work, and how is it different from the first program? To understand this, we need to talk about dynamically sized types.

2. Dynamically sized types

A dynamically sized type (or DST, sometimes also referred to as an unsized type) is a type whose size is unknown at compile time. But how can such a type even exist? After all, the compiler creates the values, so it must know their size, right?

Let's look at the previous code example: We create a few values of different types, put them in Boxes and put those into a Vec. From then on, all we know about these values is their address and ... something else.

The first rule of DSTs is that they can only exist behind some kind of pointer, since their size is not known at compile time. But what is a pointer anyway? This is a non-comprehensive list of pointer types in Rust:

  • *const T, *mut T
  • &T, &mut T
  • Box<T>
  • Rc<T>
  • Arc<T>
  • Pin<P>, where P is any of the above.

What all these types have in common is that their in-memory representation is a simple pointer, i.e. an integer the size of a machine word that refers to a memory address. The only difference between them is what the compiler allows you to do with them and what code it generates for them.

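As a quick sanity check of this claim, the following sketch compares the size of several pointer types to the sized type u8 with the size of a machine word. The helper thin_pointer_sizes is a name I made up for the example.

```rust
use std::mem::size_of;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;

/// Sizes (in bytes) of several pointer types that point to the sized type `u8`.
fn thin_pointer_sizes() -> Vec<usize> {
    vec![
        size_of::<*const u8>(),
        size_of::<*mut u8>(),
        size_of::<&u8>(),
        size_of::<&mut u8>(),
        size_of::<Box<u8>>(),
        size_of::<Rc<u8>>(),
        size_of::<Arc<u8>>(),
        size_of::<Pin<Box<u8>>>(),
    ]
}

fn main() {
    let word = size_of::<usize>();
    // Every pointer to a sized type occupies exactly one machine word.
    assert!(thin_pointer_sizes().iter().all(|&size| size == word));
    println!("every pointer above occupies {} bytes", word);
}
```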

For the purpose of this section, it is sufficient to consider two types of DSTs: Slices and trait objects. A DST is created by a process that is sometimes called unsizing:

  • First you create a pointer (see above) to a value of a sized type.

  • This pointer is then coerced into the corresponding trait object type, which is a tuple of two values: the original pointer and something else. This coercion happens implicitly.

What's important about unsizing coercions is that the information needed to generate the second value must be known at compile time.

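The two steps can be sketched as follows. The functions as_slice and as_debug are hypothetical helpers for this example. They show the implicit coercion, and the size checks show that the resulting pointer carries a second value.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Coerces a reference to a sized array into a slice reference.
fn as_slice(array: &[u8; 3]) -> &[u8] {
    array // implicit unsizing coercion: the length 3 is known at compile time
}

/// Coerces a reference to a sized value into a trait object reference.
fn as_debug(value: &u8) -> &dyn Debug {
    value // implicit unsizing coercion: the vtable for `u8: Debug` is known at compile time
}

fn main() {
    let word = size_of::<usize>();

    // Pointers to sized types take one machine word ...
    assert_eq!(size_of::<&[u8; 3]>(), word);
    assert_eq!(size_of::<&u8>(), word);

    // ... while pointers to DSTs carry a second value and take two.
    assert_eq!(size_of::<&[u8]>(), 2 * word);
    assert_eq!(size_of::<&dyn Debug>(), 2 * word);

    assert_eq!(as_slice(&[1, 2, 3]).len(), 3);
    assert_eq!(format!("{:?}", as_debug(&42)), "42");
}
```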

2.1 Arrays coerce to slices

Arrays implicitly coerce to slices. The second value is simply the length of the slice.

Slices can also be created manually from a pointer and a length, but this is not a coercion, so in this case, the length need not be known at compile time. The standard library does this, for example in the Deref implementation of Vec.

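The difference between the two ways of obtaining a slice can be sketched like this. The helper first_n is a made-up name. It uses std::slice::from_raw_parts with a length that is only known at run time, whereas the coercion takes the length from the array type.

```rust
use std::slice;

/// Returns the first `n` elements of `values`, where `n` is a run-time value.
fn first_n(values: &[i32], n: usize) -> &[i32] {
    assert!(n <= values.len(), "n must not exceed the slice length");
    // SAFETY: the pointer comes from a live slice and the check above
    // guarantees that `n` elements are readable behind it.
    unsafe { slice::from_raw_parts(values.as_ptr(), n) }
}

fn main() {
    let array = [10, 20, 30, 40];

    // Coercion: the length 4 is taken from the array type at compile time.
    let all: &[i32] = &array;
    assert_eq!(all.len(), 4);

    // Manual construction: the length is an ordinary run-time value.
    let n = all.iter().filter(|&&v| v < 35).count();
    assert_eq!(first_n(all, n).to_vec(), vec![10, 20, 30]);
}
```

In everyday code you would write &values[..n] instead. The unsafe call is only here to show that a slice pointer is nothing more than an address plus a length.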

While slices are definitely interesting, we won't discuss them further in this article.

2.2 Sized types coerce to trait objects

Pointers to values of sized types implicitly coerce to trait objects for any object-safe trait the type implements. The second value is a pointer to the so-called vtable.

The vtable has many more names, Wikipedia says the following:

A virtual method table (VMT), virtual function table, virtual call table, dispatch table, vtable, or vftable is a mechanism used in a programming language to support dynamic dispatch (or run-time method binding).

We'll explore the vtable in more detail.

3. Trait objects and the vtable

The vtable is what allows Rust to call trait methods on a value without knowing its type. The vtable is generated at compile time and stored as part of the binary. As of Rust 1.43, the layout of the vtable is as follows (although rustc makes no guarantees about it):

Field                            Type      Meaning
drop_in_place implementation     Pointer   Points to the value's destructor
size of the value                usize     Size of the value in memory
minimum alignment of the value   usize     Alignment of the value
first trait function             Pointer   Pointer to the first trait function
...                              ...
n'th trait function              Pointer   Pointer to the n'th trait function

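In memory, a trait object is a fat pointer made up of a pointer to the value and a pointer to a table representing that value's type. Each trait object therefore occupies two machine words, as shown in Figure 11-1 (note the members of the vtable).

Figure 11-1. Image source: the book Programming Rust.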

Let's go through these items:

3.1 drop_in_place

When a Box is dropped or the strong count of an Rc or Arc drops to zero, the standard library calls drop_in_place on the value it points to, which runs the value's destructor. For sized types, the compiler statically knows how to drop a value. For slices, it calls drop_in_place for every element. For trait objects, it calls the drop_in_place implementation from the vtable.

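The behaviour described above can be observed with a small sketch. The trait Named, the type Loud and the function drop_events are invented for this example. The destructor records its name in a shared log so we can see when it runs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

trait Named {}

struct Loud {
    name: &'static str,
    log: Log,
}

impl Named for Loud {}

impl Drop for Loud {
    fn drop(&mut self) {
        // Record that the destructor ran.
        self.log.borrow_mut().push(self.name);
    }
}

/// Drops trait objects behind `Box` and `Rc` and returns the recorded events.
fn drop_events() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));

    let boxed: Box<dyn Named> = Box::new(Loud { name: "boxed", log: log.clone() });
    let shared: Rc<dyn Named> = Rc::new(Loud { name: "shared", log: log.clone() });
    let shared2 = Rc::clone(&shared);

    drop(boxed); // the destructor runs immediately
    drop(shared); // strong count 2 -> 1: nothing is destroyed yet
    log.borrow_mut().push("one Rc handle left");
    drop(shared2); // strong count 1 -> 0: the destructor runs now

    let events = log.borrow().clone();
    events
}

fn main() {
    assert_eq!(drop_events(), vec!["boxed", "one Rc handle left", "shared"]);
}
```

At the point of each drop call the static type is only dyn Named, so the correct destructor has to be found at run time.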

3.2 Size and alignment

The size and alignment are used to implement std::mem::size_of_val and std::mem::align_of_val. They are also used during code generation in the internals of the compiler.

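A short sketch of these two functions in action. The helper layout_of is a hypothetical name. It only sees a &dyn Debug, yet it can report the size and alignment of the concrete value behind it.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Returns (size, alignment) of the value behind a trait object.
fn layout_of(value: &dyn Debug) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

fn main() {
    // The concrete type is erased inside `layout_of`, but the answers
    // still match what `size_of` and `align_of` report statically.
    assert_eq!(layout_of(&42u8), (size_of::<u8>(), align_of::<u8>()));
    assert_eq!(layout_of(&42i64), (size_of::<i64>(), align_of::<i64>()));
    assert_eq!(layout_of(&[0u16; 5]), (10, align_of::<u16>()));
}
```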

Since the size of the value is stored in the vtable as a compile-time constant, logic dictates that you cannot create a trait object from a DST (e.g. a slice), whose size is not known at compile time.

3.3 Pointers to the trait functions

To dynamically dispatch method calls, rustc needs function pointers to all trait methods (including those of supertraits). The order in which they appear in the vtable is unspecified.

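That supertrait methods are reachable through a trait object of the subtrait can be checked with a few lines. The method names super_name and sub_name and the type Unit are made up for this sketch.

```rust
trait Super {
    fn super_name(&self) -> &'static str;
}

trait Sub: Super {
    fn sub_name(&self) -> &'static str;
}

struct Unit;

impl Super for Unit {
    fn super_name(&self) -> &'static str {
        "Super for Unit"
    }
}

impl Sub for Unit {
    fn sub_name(&self) -> &'static str {
        "Sub for Unit"
    }
}

/// Both calls are dispatched dynamically through the `dyn Sub` trait object.
fn names(obj: &dyn Sub) -> (&'static str, &'static str) {
    (obj.super_name(), obj.sub_name())
}

fn main() {
    assert_eq!(names(&Unit), ("Super for Unit", "Sub for Unit"));
}
```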

This brings us to object safety: In order to create a vtable, the compiler needs to create a function pointer for all trait methods and the first argument must always be a pointer to the object itself. Object safety makes sure that this is always possible. In particular, you cannot dynamically dispatch generic methods.

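One common way to keep a trait object safe despite a generic method is to exclude that method from trait objects with a where Self: Sized bound. The following is a minimal sketch with made-up names (Shape, Square, scaled_area, dynamic_area).

```rust
trait Shape {
    fn area(&self) -> f64;

    // A generic method cannot get a vtable entry. The `where Self: Sized`
    // bound excludes it from trait objects, so `dyn Shape` remains usable.
    fn scaled_area<F: Into<f64>>(&self, factor: F) -> f64
    where
        Self: Sized,
    {
        self.area() * factor.into()
    }
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn dynamic_area(shape: &dyn Shape) -> f64 {
    // `shape.scaled_area(2u8)` would not compile here,
    // because the method requires `Self: Sized`.
    shape.area()
}

fn main() {
    let square = Square(3.0);
    assert_eq!(square.scaled_area(2u8), 18.0); // statically dispatched
    assert_eq!(dynamic_area(&square), 9.0); // dynamically dispatched
}
```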

4. Upcasting

Let's come back to the original problem. If you come from object-oriented languages, you take upcasting for granted. Imagine the following C++ code:

class Super {};

class Sub : public Super {};

void func_taking_super(Super& obj) {
    // ...
}

void func_taking_sub(Sub& obj) {
    func_taking_super(obj);
}

Or the following C# code:

class Super {}

class Sub : Super {}

static class Methods {
    static void FuncTakingSuper(Super obj) {
        // ...
    }

    static void FuncTakingSub(Sub obj) {
        FuncTakingSuper(obj);
    }
}

In Rust, as we've seen in the beginning, this isn't always possible:

trait Super {}

trait Sub: Super {}

fn func_taking_super<T: Super + ?Sized>(obj: &T) {
    // ...
}

fn func_taking_super_dyn(obj: &dyn Super) {
    // ...
}

fn func_taking_sub(obj: &dyn Sub) {
    func_taking_super(obj); // passed through the generic bound T: Super
    // func_taking_super_dyn(obj); // &dyn Sub would have to be converted to &dyn Super here
}

This compiles and works. But uncommenting the second line in func_taking_sub leads to the following compiler error:

error[E0308]: mismatched types
  --> src/lib.rs:18:27
   |
18 |     func_taking_super_dyn(obj);
   |                           ^^^ expected trait `Super`, found trait `Sub`
   |
   = note: expected reference `&dyn Super`
              found reference `&dyn Sub`

But why doesn't it work? It is completely reasonable to expect that it should. After all, the compiler knows how to call any method of the trait Super on values of type &dyn Sub.

The problem is that the vtable has to be generated at compile time and the compiler does not know the actual type of obj when compiling the function func_taking_sub.

I can see two solutions to this:

4.1 Solution 1

Change the layout of the vtable so that the vtables of supertraits are sub-tables of the main vtable. As far as I know, this is what C++ compilers do. For the following traits ...

trait Super { /* ... */ }
trait Sub: Super { /* ... */ }

... a vtable of Sub would look like this:

Field                            Type      Meaning
drop_in_place implementation     Pointer   Points to the value's destructor
size of the value                usize     Size of the value in memory
minimum alignment of the value   usize     Alignment of the value
first trait function of Super    Pointer   Pointer to the first trait function of Super
...                              ...
n'th trait function of Super     Pointer   Pointer to the n'th trait function of Super
first trait function of Sub      Pointer   Pointer to the first trait function of Sub
...                              ...
m'th trait function of Sub       Pointer   Pointer to the m'th trait function of Sub

Then you can use the same vtable pointer when upcasting Sub to Super.

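To illustrate the idea, here is a hand-written model of such a layout. This is only a sketch: the structs SuperVtable and SubVtable, the static U8_SUB_VTABLE and the functions are all hypothetical, the drop_in_place entry is left out for brevity, and rustc's real vtables are not exposed like this. The point is that, with the Super part as a prefix, "upcasting" the vtable does not change its address.

```rust
// Model of a vtable for `Super`.
#[repr(C)]
struct SuperVtable {
    size: usize,
    align: usize,
    super_method: fn(*const ()) -> &'static str,
}

// Model of a vtable for `Sub`: it starts with a complete `SuperVtable`.
#[repr(C)]
struct SubVtable {
    as_super: SuperVtable,
    sub_method: fn(*const ()) -> &'static str,
}

fn u8_super_method(_this: *const ()) -> &'static str {
    "Super for u8"
}

fn u8_sub_method(_this: *const ()) -> &'static str {
    "Sub for u8"
}

// The table a compiler could emit for `u8: Sub` under this scheme.
static U8_SUB_VTABLE: SubVtable = SubVtable {
    as_super: SuperVtable { size: 1, align: 1, super_method: u8_super_method },
    sub_method: u8_sub_method,
};

/// "Upcasts" a Sub vtable to a Super vtable by taking its prefix.
fn upcast_vtable(sub: &'static SubVtable) -> &'static SuperVtable {
    // With #[repr(C)] the first field sits at offset 0.
    &sub.as_super
}

/// Checks that the upcast vtable lives at the same address as the original.
fn same_address() -> bool {
    let sub: *const SubVtable = &U8_SUB_VTABLE;
    let sup: *const SuperVtable = upcast_vtable(&U8_SUB_VTABLE);
    sub as usize == sup as usize
}

fn main() {
    assert!(same_address());
    let sup = upcast_vtable(&U8_SUB_VTABLE);
    assert_eq!((sup.size, sup.align), (1, 1));
    assert_eq!((sup.super_method)(std::ptr::null()), "Super for u8");
    assert_eq!((U8_SUB_VTABLE.sub_method)(std::ptr::null()), "Sub for u8");
}
```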

The problem here is that as soon as any trait in the chain has more than one supertrait, you'd have to repeat the drop_in_place pointer, the size and the alignment multiple times to allow upcasting to all possible supertraits. Consider the following traits ...

trait Super1 { /* ... */ }
trait Super2 { /* ... */ }
trait Sub: Super1 + Super2 { /* ... */ }

                              Field                            Type
vtable of Sub and Super1 ->   drop_in_place implementation     Pointer
                              size of the value                usize
                              minimum alignment of the value   usize
                              first trait function of Super1   Pointer
                              ...                              ...
                              m'th trait function of Super1    Pointer
vtable of Super2 ->           drop_in_place implementation     Pointer
                              size of the value                usize
                              minimum alignment of the value   usize
                              first trait function of Super2   Pointer
                              ...                              ...
                              n'th trait function of Super2    Pointer
                              first trait function of Sub      Pointer
                              ...                              ...
                              o'th trait function of Sub       Pointer

This way, the compiler would still be able to determine a vtable of both Super1 and Super2 from a vtable of Sub. However, this gets pretty complex when more traits are involved.

This is also part of why C++ class inheritance is so complex and I can understand why rustc developers would not want this complexity.

4.2 Solution 2

When creating a vtable, generate the vtables of all possible supertraits, and include pointers to those supertrait vtables in the vtable itself.

Field                            Type
drop_in_place implementation     Pointer
size of the value                usize
minimum alignment of the value   usize
vtable of Super1                 Pointer
vtable of Super2                 Pointer
first trait function of Super1   Pointer
...                              ...
m'th trait function of Super1    Pointer
drop_in_place implementation     Pointer
size of the value                usize
minimum alignment of the value   usize
first trait function of Super2   Pointer
...                              ...
n'th trait function of Super2    Pointer
first trait function of Sub      Pointer
...                              ...
o'th trait function of Sub       Pointer

This is not nearly as complex, but it would (probably unnecessarily) increase binary size.

You could also combine these solutions: choose solution 1 where easily possible and fall back to solution 2 otherwise. Both of these solutions add complexity to the compiler that may be undesirable.

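For comparison, solution 2 can be modelled by hand like this. Again, this is only a sketch under my own assumptions: all names are hypothetical, the drop_in_place, size and alignment entries are omitted, and this is not how rustc lays out vtables. Here, upcasting means following a pointer stored in the Sub vtable.

```rust
// Model of a separately generated supertrait vtable.
struct SuperVtable {
    trait_name: &'static str,
    method: fn() -> &'static str,
}

// Model of a vtable for `Sub: Super1 + Super2`.
struct SubVtable {
    // Pointers to the vtables of the supertraits.
    super1: &'static SuperVtable,
    super2: &'static SuperVtable,
    method: fn() -> &'static str,
}

fn call_super1() -> &'static str {
    "Super1::method"
}

fn call_super2() -> &'static str {
    "Super2::method"
}

fn call_sub() -> &'static str {
    "Sub::method"
}

static SUPER1_VTABLE: SuperVtable = SuperVtable { trait_name: "Super1", method: call_super1 };
static SUPER2_VTABLE: SuperVtable = SuperVtable { trait_name: "Super2", method: call_super2 };

static SUB_VTABLE: SubVtable = SubVtable {
    super1: &SUPER1_VTABLE,
    super2: &SUPER2_VTABLE,
    method: call_sub,
};

/// "Upcasts" to Super1 by following the stored pointer.
fn upcast_to_super1(sub: &SubVtable) -> &'static SuperVtable {
    sub.super1
}

/// "Upcasts" to Super2 by following the stored pointer.
fn upcast_to_super2(sub: &SubVtable) -> &'static SuperVtable {
    sub.super2
}

fn main() {
    assert_eq!(upcast_to_super1(&SUB_VTABLE).trait_name, "Super1");
    assert_eq!((upcast_to_super2(&SUB_VTABLE).method)(), "Super2::method");
    assert_eq!((SUB_VTABLE.method)(), "Sub::method");
}
```

Every supertrait needs its own table plus a pointer to it, which is where the extra binary size comes from.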

5. A practical solution

Due to the added complexity, I am unsure if Rust will ever allow upcasting trait objects. After all, I seem to be the first one to have cared about this.

(It seems that I may have been wrong about this. There is a tracking issue and an experimental pull request about this topic.)

However, there is a neat trick to solve this problem, at least for traits that you define yourself.

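use std::sync::Arc;

trait Super: AsDynSuper {}

trait AsDynSuper {
    fn as_dyn_super<'a>(self: Arc<Self>) -> Arc<dyn Super + 'a>
    where
        Self: 'a;
}

// Implement AsDynSuper (and its as_dyn_super method) for every T: Super + Sized.
impl<T: Super + Sized> AsDynSuper for T {
    fn as_dyn_super<'a>(self: Arc<Self>) -> Arc<dyn Super + 'a>
    where
        Self: 'a,
    {
        self
    }
}

trait Sub: Super {}

fn upcast(obj: Arc<dyn Sub>) -> Arc<dyn Super> {
    obj.as_dyn_super()
}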

This compiles and works. And whoever implements the Super trait does not need to do anything, if the type is sized.

The downside is that AsDynSuper is not automatically implemented for DSTs that implement Super. If you want to implement Super for a DST, then you need to implement AsDynSuper and panic! in the implementation of as_dyn_super, since you cannot create a trait object. This is inconvenient, but not an issue for many use cases.

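Here is a small runnable sketch of the trick with an actual implementor. The type Widget and the methods greet and id are invented for the example. The AsDynSuper part follows the pattern described above.

```rust
use std::sync::Arc;

trait AsDynSuper {
    fn as_dyn_super<'a>(self: Arc<Self>) -> Arc<dyn Super + 'a>
    where
        Self: 'a;
}

// Blanket implementation: every sized implementor of `Super` gets the
// conversion for free, because for a sized `T` the compiler can build
// the `dyn Super` vtable right here.
impl<T: Super + Sized> AsDynSuper for T {
    fn as_dyn_super<'a>(self: Arc<Self>) -> Arc<dyn Super + 'a>
    where
        Self: 'a,
    {
        self
    }
}

trait Super: AsDynSuper {
    fn greet(&self) -> String;
}

trait Sub: Super {
    fn id(&self) -> u32;
}

// Hypothetical implementor; note that it never mentions `AsDynSuper`.
struct Widget(u32);

impl Super for Widget {
    fn greet(&self) -> String {
        format!("widget {}", self.0)
    }
}

impl Sub for Widget {
    fn id(&self) -> u32 {
        self.0
    }
}

fn upcast(obj: Arc<dyn Sub>) -> Arc<dyn Super> {
    // Dispatched through the vtable of `dyn Sub` to the blanket impl for `Widget`.
    obj.as_dyn_super()
}

fn main() {
    let sub: Arc<dyn Sub> = Arc::new(Widget(7));
    assert_eq!(sub.id(), 7);

    let sup: Arc<dyn Super> = upcast(sub);
    assert_eq!(sup.greet(), "widget 7");
}
```

The conversion costs one dynamic call, and the implementor does not have to write anything extra.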

And because I am such a huge fan of macros, I created the as-dyn-trait crate that solves this problem automatically for your traits:

#[as_dyn_trait]
trait Super {}

trait Sub: Super {}

fn upcast(obj: Arc<dyn Sub>) -> Arc<dyn Super> {
    obj.as_dyn_super()
}

6. Closing

I understand now why upcasting a trait object in Rust is problematic, and I found a workaround for my use case. On the way, I also explored traits, generics and DSTs in more detail.

If you think any part of this article is confusing, misleading or even incorrect, please file an issue or open a pull request on GitHub.

Thanks for reading!
