| commit | 01164dbcc1dcd5b8a1f09af225ed5582fbb0f3cb | |
|---|---|---|
| author | David Stone <[email protected]> | Fri Nov 21 23:12:54 2025 |
| committer | David Stone <[email protected]> | Fri Nov 21 23:12:54 2025 |
| tree | 4f0aa837aa051653e998925b01b130ef97aa63c9 | |
| parent | c3bc565c959ee690db386241d2c16d808b961faf | |
[llvm][clang] Remove `llvm::OwningArrayRef`
`OwningArrayRef` has several problems.
The naming is strange: `ArrayRef` is specifically a non-owning view, so the name means "owning non-owning view".
It has a const-correctness bug that is inherent to the interface. `OwningArrayRef<T>` publicly derives from `MutableArrayRef<T>`. This means that the following code compiles:
```c++
void const_incorrect(llvm::OwningArrayRef<int> const a) {
    a[0] = 5;
}
```
It's surprising for a non-reference type to allow modification of its elements even when it's declared `const`. However, the problems from this inheritance (which ultimately stem from the same issue as the weird name) are even worse. The following function compiles without warning but corrupts memory when called:
```c++
void memory_corruption(llvm::OwningArrayRef<int> a) {
    a.consume_front();
}
```
This happens because `MutableArrayRef::consume_front` modifies the internal data pointer to advance the referenced array forward. That's not an issue for `MutableArrayRef` because it's just a view. It is an issue for `OwningArrayRef` because that pointer is passed as the argument to `delete[]`, so when it's modified by advancing it forward it ceases to be valid to `delete[]`. From there, undefined behavior occurs.
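Both problems can be reproduced without LLVM headers. The sketch below uses two hypothetical stand-in types, `ViewSketch` and `OwningSketch`; they are not the LLVM classes and only mimic the relevant shape (a view holding a pointer and a length, and an owning type that publicly derives from it and allocates with `new[]`). To keep the example free of undefined behavior, the stand-in records the original allocation in a separate member and frees that. This extra member is an assumption made for illustration: per the description above, `OwningArrayRef` passed the view's own pointer to `delete[]`.

```c++
#include <cstddef>

// Hypothetical stand-in for a mutable view: a pointer plus a length.
struct ViewSketch {
    int *ptr;
    std::size_t len;

    ViewSketch(int *p, std::size_t n) : ptr(p), len(n) {}

    int *data() const { return ptr; }
    std::size_t size() const { return len; }

    // Element access is const because a view does not own its elements.
    int &operator[](std::size_t i) const { return ptr[i]; }

    // Advances the view past its first element.
    void consume_front() {
        ++ptr;
        --len;
    }
};

// Hypothetical stand-in for an owning type that publicly derives from the view.
struct OwningSketch : ViewSketch {
    int *allocation; // kept only so this sketch can free its memory safely

    explicit OwningSketch(std::size_t n)
        : ViewSketch(new int[n](), n), allocation(ptr) {}
    OwningSketch(const OwningSketch &) = delete;
    OwningSketch &operator=(const OwningSketch &) = delete;
    ~OwningSketch() { delete[] allocation; }

    // True while data() still points at the start of the allocation,
    // that is, while `delete[] data()` would still be valid.
    bool data_is_deletable() const { return data() == allocation; }
};

// A const owning object still allows its elements to be modified,
// because operator[] is inherited from the view.
bool const_object_was_modified() {
    const OwningSketch a(3);
    a[0] = 5;
    return a[0] == 5;
}

// After consume_front(), data() no longer matches the allocated pointer.
bool consume_front_breaks_ownership() {
    OwningSketch a(3);
    a.consume_front();
    return !a.data_is_deletable();
}
```

Both functions return `true`: the inherited interface lets a `const` owner mutate its elements, and `consume_front` leaves the view pointer one element past the address that `new[]` returned.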
It is mostly less convenient than `std::vector` for construction. By combining the `size` and the `capacity` together without going through `std::allocator` to get memory, it's not possible to fill in data with the correct value to begin with. Instead, the user must construct an `OwningArrayRef` of the appropriate size, then fill in the data. This has one of two consequences:
1. If `T` is a class type, we have to first default construct all of the elements when we construct `OwningArrayRef` and then in a second pass we can assign to those elements to give what we want. This wastes time and for some classes is not possible.
2. If `T` is a built-in type, the data starts out uninitialized. If we forget the fill step, which is easy to do, we read uninitialized memory.
`std::vector`, by contrast, has well-known constructors that can fill in the data that we actually want on construction.
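For reference, here is a short sketch of the standard `std::vector` constructors in question. The helper function names (`filled`, `from_range`, `from_list`, `zeroed`) are hypothetical and exist only to give each constructor a label.

```c++
#include <cstddef>
#include <string>
#include <vector>

// n copies of a given value, constructed in place with no separate
// default-construct-then-assign pass.
std::vector<std::string> filled(std::size_t n) {
    return std::vector<std::string>(n, "x");
}

// A copy of an existing range of elements.
std::vector<int> from_range(const int *first, const int *last) {
    return std::vector<int>(first, last);
}

// An explicit list of elements.
std::vector<int> from_list() {
    return {1, 2, 3};
}

// n value-initialized elements: built-in types start at zero rather than
// holding indeterminate values.
std::vector<int> zeroed(std::size_t n) {
    return std::vector<int>(n);
}
```

In each case the elements hold their intended values as soon as the constructor returns, so there is no window in which the container holds placeholder or uninitialized data.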
`OwningArrayRef` has slightly different performance characteristics than `std::vector`, but the difference is minimal.
The first difference is a theoretical negative for `OwningArrayRef`: by implementing in terms of `new[]` and `delete[]`, the implementation has less room to optimize these calls. However, I say this is theoretical because for clang, at least, the extra freedom of optimization given to `std::allocator` is not yet taken advantage of (see https://github.com/llvm/llvm-project/issues/68365).
The second difference is slightly in favor of `OwningArrayRef`: `sizeof(std::vector<T>) == sizeof(void *) * 3` on pretty much any implementation, whereas `sizeof(OwningArrayRef) == sizeof(void *) * 2`, which seems like a win. However, this is just a misdirection of the accounting costs: array-new sticks bookkeeping information in the allocated storage. There are some cases where this is beneficial to reduce stack usage, but that minor benefit doesn't seem worth the costs. If we actually need that optimization, we'd be better served by writing a `DynamicArray` type that implements a full vector-like feature set (except for operations that change the size of the container) while allocating through `std::allocator` to avoid the pitfalls outlined earlier.
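To make that last suggestion concrete, here is a minimal sketch of what such a type might look like. `DynamicArraySketch` is a hypothetical name and this is not an existing LLVM type; it covers only a small part of a vector-like interface (fill construction, move construction, const-correct element access, iteration) and assumes C++14 or later for `std::exchange`.

```c++
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch only: a fixed-size owning array that stores a pointer
// and a size, and obtains memory through std::allocator instead of new[].
template <typename T>
class DynamicArraySketch {
public:
    // Constructs n copies of value directly in raw storage.
    DynamicArraySketch(std::size_t n, const T &value)
        : data_(std::allocator<T>().allocate(n)), size_(n) {
        try {
            // Destroys any already-built elements itself if a copy throws.
            std::uninitialized_fill_n(data_, n, value);
        } catch (...) {
            std::allocator<T>().deallocate(data_, n);
            throw;
        }
    }

    DynamicArraySketch(DynamicArraySketch &&other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    // Copying and assignment are omitted to keep the sketch short.
    DynamicArraySketch(const DynamicArraySketch &) = delete;
    DynamicArraySketch &operator=(const DynamicArraySketch &) = delete;
    DynamicArraySketch &operator=(DynamicArraySketch &&) = delete;

    ~DynamicArraySketch() {
        if (data_ == nullptr)
            return; // moved-from object owns nothing
        for (std::size_t i = size_; i != 0; --i)
            data_[i - 1].~T();
        std::allocator<T>().deallocate(data_, size_);
    }

    std::size_t size() const { return size_; }

    // Const-correct access: a const array yields const elements.
    T &operator[](std::size_t i) { return data_[i]; }
    const T &operator[](std::size_t i) const { return data_[i]; }

    T *begin() { return data_; }
    T *end() { return data_ + size_; }
    const T *begin() const { return data_; }
    const T *end() const { return data_ + size_; }

    // Deliberately no push_back, resize, or consume_front: nothing in the
    // interface can change the size or move the owned pointer.

private:
    T *data_;
    std::size_t size_;
};
```

Because `std::allocator` is stateless, the object stores only a pointer and a size, and the allocated size is handed back to `deallocate` rather than being recorded by array-new. Elements are constructed with their final values, so neither of the two construction pitfalls above applies.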