Previously, md5 was used which is significantly slower. In almost all cases
this does not have a significant performance impact in practice. However,
it's possible to build geometry nodes setups that become a few percent
faster ( by combining lots of cheap node groups). Using xxhash instead of
md5 should never be slower.
Pull Request: https://projects.blender.org/blender/blender/pulls/120225
This is intended to be used in the new exact mesh boolean algorithm by @howardt.
The new `BLI_fixed_width_int.hh` header provides types like `Int256` and
`UInt256` which are like e.g. `uint64_t` but with higher precision. The code
supports many different integer sizes.
The following operations are supported:
* Addition
* Subtraction
* Multiplication
* Comparisons
* Negation
* Conversion to and from other number types
* Conversion to and from string (based on `GMP`)
Division is not implemented. It could be implemented, but it's more complex and
is not required for the new mesh boolean algorithm.
Some alternatives to having a custom implementation have been discussed in
https://devtalk.blender.org/t/fixed-length-multiprecision-arithmetic/29189/.
Generally, the implementation is fairly straight forward. The main complexity is
the addition/multiplication algorithm which isn't too complicated. It's nice to
have control over this part as it allows us to optimize the code more if
necessary. Also, from what I understand, we might be able to benefit from some
special cases like multiplying a large integer with a smaller one.
I tried some different ways to optimize this already, but so far the normal
compiler optimization turned out to work best. Not sure if the same is true on
windows though, as it doesn't have native support for an `int128` which helps
the compiler understand what I'm doing. Alternatives I tried so far are using
intrinsics directly (mainly `_addcarry_u64` and similar), writing inline
assembly manually and copying the assembly output from the compiler. I assume
the assembly implementation didn't help for me because it prohibited other
compiler optimizations.
Pull Request: https://projects.blender.org/blender/blender/pulls/119528
The `IndexMask` data structure was designed to allow us to implement set
operations like `union`, `intersection` and `difference` efficiently
(2cfcb8b0b8). This patch adds an evaluator for
arbitrary expressions involving the mentioned operations. The evaluator makes
use of the design of the `IndexMask` data structure to be quite efficient.
In some common cases, the evaluator runs in constant time. So it's very fast
even if the mask contains many millions of indices. If possible the evaluator
works on entire segments at once instead of looking at the individual indices.
This results in a very low constant factor even if the evaluation time is
linear. If the evaluator has to look at the individual indices to be able to
perform the operation, it can make use of multi-threading.
The evaluation consists of the following steps:
1. A coarse evaluation that looks at entire segments at once.
2. All segments that couldn't be fully evaluated by the coarse evaluation are
evaluated exactly by looking at the actual indices. There are two evaluators
for this case. One that is based on `std::set_union` etc. The other one first
converts the index masks to bit spans, then does bit operations to evaluate
the expression, and then converts the bits back into indices. Depending on
the expression, one or the other can be more efficient.
3. Construct an index mask from the evaluated segments.
Showing the performance of the evaluator is kind of difficult because it highly
depends on the input data. Comparing the performance to something that does not
short-circuit when there are full ranges is meaningless, because one can
construct an example where the new evaluator is arbitrarily faster. I'm still
working on a case where performance can be compared to e.g. using
`std::set_union`. This comparison is only fair when the input data when
constructing a case where the new evaluator can't short-circuit.
One of the main remaining bottlenecks are the calls to `slice_content` on large
index masks. I think the impact of those can still be reduced.
We are not using this evaluator much yet, except through `IndexMask::complement`
calls. I intend to use it when I get to refactoring the field evaluator for
geometry nodes to optimize the evaluation of selections.
Pull Request: https://projects.blender.org/blender/blender/pulls/117805
This adds a new special purpose container data structure that can be
used to gather many elements into many (potentially small) lists efficiently.
I originally worked on this data structure because I might want to use it
in #118772. However, also it's useful in the geometry nodes logger already.
I'm measuring a 10-20% speed improvement in my many-math-nodes file
when I enable logging for all sockets (not just the ones that are currently visible).
Pull Request: https://projects.blender.org/blender/blender/pulls/118774
The difficulty of implementing this iterator is that it requires lots of operator
overloads which are usually very simple to implement, but result in a lot of code.
The goal of this patch is to abstract the common parts so that it becomes easier
to implement random accessor iterators. Many algorithms can work more
efficiently with random access iterators than with other iterator types.
Also see https://en.cppreference.com/w/cpp/iterator/random_access_iterator
Pull Request: https://projects.blender.org/blender/blender/pulls/118113
Use EXPECT_NEAR instead of EXPECT_EQ to account for a differences in
atan2 implementation on macOS, more generally relying on exact
float comparison for tests is error prone.
Move BLI_convexhull_aabb_fit_points_2d to a public function to be able
to compare compare fitting one convex hull with a simple reference
method.
One test is disabled as it exposes an error in convex hull calculation
which needs further investigation.
`UUID` generally stands for "universally unique identifier". The session identifier that
we use is neither universally unique, nor does it follow the standard. Therefor, the term
"session uuid" is confusing and should be replaced.
In #116888 we briefly talked about a better name and ended up with "session uid".
The reason for "uid" instead of "id" is that the latter is a very overloaded term in Blender
already.
This patch changes all uses of "uuid" to "uid" where it's used in the context of a
"session uid". It's not always trivial to see whether a specific mention of "uuid" refers
to an actual uuid or something else. Therefore, I might have missed some renames.
I can't think of an automated way to differentiate the case.
BMesh also uses the term "uuid" sometimes in a the wrong context (e.g. `UUIDFaceStepItem`)
but there it also does not mean "session uid", so it's *not* changed by this patch.
Pull Request: https://projects.blender.org/blender/blender/pulls/117350
The term `PIL` stands for "platform independent library." It exists since the `Initial Revision`
commit from 2002. Nowadays, we generally just use the `BLI` (blenlib) prefix for such code
and the `PIL` prefix feels more confusing then useful. Therefore, this patch renames the
`PIL` to `BLI`.
Pull Request: https://projects.blender.org/blender/blender/pulls/117325
Part of overall "improve filtering situation" (#116980) task:
* Add Bicubic filtering option to strip Transform "Filter" setting.
Previously this option only existed in Transform Effect "Interpolation"
setting.
- With this addition, it feels like the transform effect could
possibly be marked as legacy/deprecated, since the regular Transform
that is on all strips can do everything that Transform Effect did?
* Speed up bicubic filtering (used now in VSE, but also in CPU Compositor,
image paint, etc.) by slightly simplifying the code and using some SIMD.
Upscaling 96x54 image to 3840x2160 resolution, using Bicubic filtering:
- Windows (VS2022, Ryzen 5950X): 35.5ms -> 15.1ms
- Mac (clang 15, M1 Max): 29.6ms -> 24.4ms
* Add gtest coverage for bicubic functionality.
Pull Request: https://projects.blender.org/blender/blender/pulls/117100
Adds a header that defines the same constants as the C++ 20
<numbers> header.
Benefits:
- Decouple our C++ and C math APIs
- Avoid using macros everywhere, nicer syntax
- Less header parsing during compilation
- Can be replaced by `std::numbers` with C++ 20
Downsides:
- There are fewer numbers defined in the C++ standard header
- Maybe we should just wait until we can use C++ 20
Pull Request: https://projects.blender.org/blender/blender/pulls/116805
Bundling many tests in a single binary reduces build time and disk space
usage, but is less convenient for running individual tests command line
as filter flags need to be used.
This adds WITH_TESTS_SINGLE_BINARY to generate one executable file per
source file. Note that enabling this option requires a significant amount
of disk space.
Due to refactoring, the resulting ctest names are a bit different than
before. The number of tests is also a bit different depending if this
option is used, as one uses gtests discovery and the other is organized
purely by filename, which isn't always 1:1.
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/114604
There are some tragic design flaws with the Microsoft STL
implementation of `std::dequeue`. Unless we implement our
own similar data structure or use an implementation from
another library, the change isn't worth it.
This reverts commit b26cd6a4b9.
This reverts commit cc11ba33d9.
This reverts commit c929d75054.
This reverts commit bd3d5a750d.
GSQueue dates back over 21 years, past the initial git commit. Nowadays
we generally prefer to use data structures from the C++ standard library
or our own C++ data structures. Previous commits replaced this container
with `std::queue` in a few areas. Now it is unused and can be removed.
IMB_transform is used by Sequencer (and other places) to do image
translation/rotation/scale on the CPU. This PR speeds up parts of it,
particularly when bilinear filtering is used. No behavior changes are
expected.
- Don't use virtual function calls inside inner loop. The code was using
class hierarchies with virtual calls just to do equivalent of "outside
of image? ignore" and "wrap UV coordinates or not?" decisions. Make those
use non-virtual function based code.
- Simplify pixel sampling functions to only do the work as needed by
anything within Blender codebase. For example, bilinear sampling of uchar
images always uses 4 RGBA channels and never does "UV wrap" logic.
- Bilinear interpolation uchar: completely branchless SIMD code now.
- Bilinear interpolation float: 2x floor() calls instead of 4x floor() +
2x ceil(), and final sample blending is done with SIMD.
Sequencer at 4K UHD resolution, with two image strips that need a transform,
playback framerate:
- Windows Ryzen 5950X: 18.7fps -> 26.2fps (IMB_transform time per frame goes
26.3ms -> 11.2ms)
- Mac M1 Max: 27.3fps -> 31.4fps
At that point the IMB_transform is not the slowest part of where playback
takes time (but rather sequencer effect application etc.).
Note: the amount of _actual code_ got a bit smaller. But I've added 100 lines
of unit tests in BLI_math_interp_test.cc, the bilinear interpolation
functions were only tested very indirectly by CPU compositor template
image tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/115653
Use blender::Set which is similar but offsers better type safety
and likely better performance as well. The only remaining user
was the mesh edit mode knife tool, and replacing that usage
with `Set` and `Map` was straightforward.
This adds a new function, `compare_meshes`,
as a replacement for `BKE_mesh_cmp`.
The main benefits of the new version are the following:
- The code is written in c++, and makes use of the new attributes API.
- It adds an additional check, to see if the meshes only differ by
their indices. This is useful to verify correctness of new algorithmic
changes in mesh code, which might produce mesh elements in a different
order than the original algorithm. The tests will still fail, but the
error will show that the indices changed.
Some downsides:
- The code is more complex, due to having to be index-independent.
- The code is probably slower due to having to do comparisons "index-
independently". I have not tested this, as correctness was my priority
for this patch. A future update could look to improve the speed,
if that is desired.
- This is technically a breaking API change, since it changes the
returned values of `rna_Mesh_unit_test_compare`. I don't think that
there are many people (if any) using this, besides our own unit tests.
All tests that pass with `BKE_mesh_cmp` still pass with the new version.
**NOTE:**
Currently, mesh edge indices are allowed to be different in the
comparison, because `BKE_mesh_cmp` also allowed this. There are some
tests which would fail otherwise. These tests should be updated, and
then the corresponding code as well.
I wrote up a more detailed explanation of the algorithm here:
https://hackmd.io/@bo-JY945TOmvepQ1tAWy6w/SyuaFtay6
Pull Request: https://projects.blender.org/blender/blender/pulls/112794
Especially on windows, direct output to `cout` via `<<` is very expensive.
Instead, use fmtlib to do all formatting into a no-alloc `fmt::memory_buffer`,
and output that with one call to `cout`.
timeit utilities are not used much by default, but during development or
profiling one often uncomments macros like `DEBUG_TIME` that then enable
`SCOPED_TIMER` or `SCOPED_TIMER_AVERAGED`.
Having one `SCOPED_TIMER_AVERAGED` inside sequencer `draw_channels`, with
empty timeline and all default channels; the overhead in % of `draw_channels`
duration of said scoped timer before and after this change:
- Windows: 29% -> 5%
- Mac: 5.0% -> 4.4%
Pull Request: https://projects.blender.org/blender/blender/pulls/115233
Windows allows people to pin an application to their taskbar, when a
user pins blender, the data we set in
`GHOST_WindowWin32::registerWindowAppUserModelProperties` is used
which includes the path to the `blender-launcher.exe`. Now once that
shortcut is created on the taskbar, this will never be updated, if
people remove blender and install it again to a different path
(happens often when using nightly builds) this leads to the
situation where the shortcut on the taskbar points to a no longer
existing blender installation. Now you may think, just un-pin and
re-pin that should clear that right up! It doesn't, it'll keep using
the outdated path till the end of time and there's no window API call
we can do to update this information. However this shortcut is stored
in the user profile in a sub-foder we can easily query, from there, we
can iterate over all files, look for the one that has our appid in it, and
when we find it, update the path to the blender launcher to the
current installation, bit of a hack, but Microsoft seemingly offers no
other way to deal with this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/113859
The last good commit was 8474716abb.
After this commits from main were pushed to blender-v4.0-release. These are
being reverted.
Commits a4880576dc from to b26f176d1a that happend afterwards were meant for
4.0, and their contents is preserved.
winstuff.c got converted to .cc recently, and a merge was done of
#113674 from 4.0 which still was using C style code.
Small update in code style was required here.
Some of these devices are not capable of running >=4.0, due to issues
with Mesa's Compute Shaders and their D3D drivers.
This PR marks those GPUs as unsupported, and prints info to stdout.
A driver update will be available for 8cx Gen3 on the 17th October
from here:
https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-8-series-mobile-compute-platforms/snapdragon-8cx-gen-3-compute-platform#Software
It will take longer via the standard MS Windows Update channels,
as there is certification, testing, etc required, but it is possible
to get the drivers, at least.
This issue applies even when using emulated x64.
If this does not get merged, all WoA devices will break with 4.0,
where older ones will just launch a grey screen and crash, and newer
ones will open, but scenes will not render correctly in Workbench.
These devices work by using Mesa's D3D12 Gallium driver ("GLOn12"),
which is why we have to read the DirectX driver version - the version
reported by OpenGL is the mesa version, which is independent of the
driver (which is the part with the bug).
Pull Request: https://projects.blender.org/blender/blender/pulls/113674