Questions tagged [compiler-optimization]

Compiler optimization involves adapting a compiler to reduce run-time, object size, or both. This can be accomplished using compiler arguments (e.g. CFLAGS, LDFLAGS), compiler plugins (Dehydra, for instance), or direct modifications to the compiler (such as modifying its source code).
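As a minimal sketch of the "compiler arguments" route, the function below can be built at different optimization levels without touching the source, for example with `gcc -O0 -S sum.c` and then `gcc -O2 -S sum.c`, and the two assembly listings compared. The function name `sum_to_n` and the file name `sum.c` are hypothetical, chosen only for illustration; what the optimizer actually does with the loop depends on the compiler and version.

```c
#include <stdint.h>

/* Hypothetical example: sums the integers 1..n with a plain loop.
 * Compile once with -O0 and once with -O2 and compare the assembly
 * to see how much the optimization level changes the generated code. */
uint64_t sum_to_n(uint32_t n) {
    uint64_t total = 0;
    for (uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

The source is identical in both builds; only the flag passed to the compiler differs, which is why such questions usually need to state the exact compiler, version, and options used.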

1
vote
1 answer
78 views

Does the order of the constant operand with any arithmetic operator affect optimization?

Assume we define a const or constexpr value and apply a simple arithmetic operator to it several times, where the other variables and function calls do not yield constants. #define NU 3 //Macro or const ...
-1
votes
0 answers
30 views

Do functions in large loops get inlined? [duplicate]

Look at this code: inline void someFunction() { //more than several lines of code } ----------------------- for (int i = 0; i < 1000000; ++i) { someFunction(/*several arguments*/); } I ...
2
votes
1 answer
53 views

GCC is neither sinking assignments nor eliminating partially dead code

I am trying to see whether GCC applies the partial dead code elimination optimization. Partial dead code elimination can be considered the result of two optimization passes: assignment sinking + ...
29
votes
1 answer
1k views

Array initialization optimization

When compiling the following code snippet (clang x86-64 -O3) std::array<int, 5> test() { std::array<int, 5> values {{0, 1, 2, 3, 4}}; return values; } It produced the typical ...
0
votes
0 answers
24 views

What does “phantom” mean in the JavaScriptCore compiler?

In the JavaScriptCore optimizing compiler code, there are some operations which have the prefix "phantom" (for example, PhantomNewArrayBuffer, PhantomNewObject, PhantomNewFunction, etc.). I don't know ...
2
votes
1 answer
180 views

Why is vector length SIMD code slower than plain C?

Why is my SIMD vector4 length function 3x slower than a naive vector length method? SIMD vector4 length function: __extern_always_inline float vec4_len(const float *v) { __m128 vec1 = ...
3
votes
1 answer
115 views

Changing dynamic type of an object in C++

In the following question one of the answers suggested that the dynamic type of an object cannot change: When may the dynamic type of a referred to object change? However, I've heard that it is not ...
2
votes
1 answer
50 views

atomic operations take longer than locking (without contention)

I'm trying to measure the overhead of the various synchronization options when there is no contention. I use the following program: #include <atomic> #include <chrono> #include <...
1
vote
1 answer
64 views

Is there efficiency lost when declaring an automatic variable in a frequently opening block scope { }?

In K&R it is stated: An automatic variable declared and initialized in a block is initialized each time the block is entered. Here is a code snippet solely for communicating the question. ...
1
vote
1 answer
44 views

What is lost in going from AVX512 on Intel Xeon Phi to AVX2 on Intel i5-8259U?

Trying to follow a course on Coursera, I tried to optimize a sample C++ code for my Intel i5-8259U CPU, which I believe supports the AVX2 SIMD instruction set. Now, AVX2 supplies 16 registers per core (...
0
votes
0 answers
41 views

Should “#pragma optimize(“”, off)” be in header, .cpp or both?

Should #pragma optimize reside in B.h :- class B{ #pragma optimize( "", off ) public: void f(); #pragma optimize( "", on ) }; , or B.cpp :- #include "B.h" #pragma optimize( "", off ) ...
3
votes
1 answer
97 views

Am I seeing an optimisation bug when iterating through a list in C?

I have been working on integrating a colleague's software library into our larger application. He has been writing and testing his library under -O0 on gcc 4.9.3. This is embedded software for an ...
67
votes
2 answers
3k views

Inconsistent behavior of compiler optimization of unused string

I am curious why the following piece of code: #include <string> int main() { std::string a = "ABCDEFGHIJKLMNO"; } when compiled with -O3 yields the following code: main: ...
-1
votes
0 answers
57 views

Can floating point contraction be made repeatable for a given compilation?

This question is about how to avoid non-determinism of a function in a single compilation. It is not about the practical or theoretical consequences of non-determinism. It is not about dealing with ...
8
votes
0 answers
77 views

Missed optimization in clang only on powers of two

When compiling with -Ofast, clang correctly deduces that the following function will always return 0. int zero(bool b) { const int x = 5; return (x * b) + (-x * b); } compiles to zero(bool):...
3
votes
0 answers
62 views

Options to reduce the number of template instantiations for std::vector of trivial type

Suppose I have the following sample code just to instantiate the insert method of std::vector for at least two trivial types: #include <vector> void insert(std::vector<int>& v, int ...
3
votes
0 answers
112 views

Creating a mask using C++ bit shift intrinsics

Let's say we want to create a mask of type unsigned short with a contiguous set of len 1s starting from position a, with as few instructions as possible. The most popular way to do this is T mask = ((...
0
votes
2 answers
45 views

Can a sequential always block be triggered by a short pulse from a combi block

Could a sequential always block be triggered by a short-lived pulse coming from a combi block? I have tried to trigger the always block by assigning a value and setting the value back to 0 in an ...
4
votes
1 answer
94 views

Multi-threaded degradation of performance with newer versions of g++?

I've written some C++ backpropagation code which I'm running on an i9-9900K in Ubuntu 18.04. The issue I'm seeing is that I'm getting progressively worse multithreaded performance with newer versions ...
2
votes
0 answers
66 views

Should we expect the compiler to optimize away variables that are only used once in the function body?

I've seen it recommended (see here, for example) that if you want to create a copy of a parameter to a function, then you should pass it by value so that the copy is done automatically in the ...
0
votes
0 answers
29 views

Stripping out static global objects with LLVM LTO

I am working on a static library which gets linked into multiple binaries. My objective is to reduce the memory footprint of my library when it gets linked in. The users of my library require certain ...
0
votes
0 answers
58 views

Does C# allocate Array of Structs as Structure of Arrays in memory?

I am wondering if there is any difference between these two constructions in memory when writing C#. I have made an example and both constructions run at the same speed. for (int i = 0; i < ...
6
votes
0 answers
141 views

Function pointer indirection not optimized by gcc or clang - bug or intended?

Given the following two functions: int f() { return 0; } int g() { return 1; } And the following code to invoke one of them depending on a boolean b: int t0(bool b) { return (b ? &f : &g)();...
1
vote
1 answer
25 views

Create React App Typescript: Transpile only, do not type check, do not lint

I have a small React project I want to deploy on a Google Compute Engine instance with limited RAM, under 1.5 GB. When building a production version of my app, the TypeScript linter and compiler ...
0
votes
0 answers
92 views

Is Lambda Lifting necessary?

Many functional programming language compilers have a lambda lifting pass, and I'm wondering: is this the best way to implement a functional programming language? Let's consider the following example:...
0
votes
0 answers
48 views

Different evaluation for object equality when running Python with file or in a shell [duplicate]

I know that Python has optimizations regarding numbers and the generation of objects, for example: >>> a = 255 # or another number from -5 to 256 >>> b = 999 # or another number ...
0
votes
1 answer
21 views

GraalVM: How to implement compiler optimizations?

I want to develop a tool that performs certain optimizations in a program based on the program structure. For example, let's say I want to identify if-else within a loop, and my tool shall rewrite it ...
10
votes
1 answer
194 views

Which of these pointer comparisons should a conforming compiler be able to optimize to “always false”?

In an attempt to get a better understanding of how pointer aliasing invariants manifest during optimization, I plugged some code into the renowned Compiler Explorer, which I'll repeat here: #include <...
6
votes
0 answers
156 views

Why are unnecessary atomic loads not optimized away?

Let's consider this trivial code: #include <atomic> std::atomic<int> a; void f(){ for(int k=0;k<100;++k) a.load(std::memory_order_relaxed); } MSVC, Clang and GCC all ...
0
votes
0 answers
32 views

Swift - Ensure CVarArg parameters are of the correct type and count

In Objective-C I use NS_FORMAT_FUNCTION(...) (explanation), and in Swift I've found some mention of printflike somewhere, but I can't find a way to use it. Any idea on how to ensure that the CVarArg ...
0
votes
1 answer
21 views

arm-none-eabi-gcc not inferring floating point multiply-accumulate from code

The ARM fpv5 instruction set supports double precision floating point operations, including single cycle multiply accumulate instructions (VMLA/VMLS) as detailed in their ISA documentation. ...
0
votes
1 answer
64 views

Can we implement tail recursion modulo cons et al. through trampolines?

You can regard trampolines as compiler optimizations reified in the program. So what is stopping us from adapting more general optimization techniques in exactly the same manner? Here is a sketch of ...
2
votes
0 answers
42 views

Is it possible to determine the compiler / linker options used to generate a binary (exe, dll)?

Assume the following situation: you have either a DLL or an executable file, all you know is that this is the binary result created by a specific Visual Studio Version (i.e. VS 2017) using C++. ...
0
votes
3 answers
107 views

Is it bad if my program only achieves the results I want when compiler optimizations are turned off?

I have a larger project written in C++; I will not post the code here as it is too much for one post. It is a checkers AI which uses Minimax and an evaluation function of my own design to find the ...
5
votes
1 answer
76 views

How can I work around GCC optimization-miss bug 90271?

GCC versions released before May 2019 (and maybe later) fail to optimize this piece of code: // Replace the k'th byte within an int int replace_byte(int v1 ,char v2, size_t k) { memcpy( (void*) ((...
0
votes
1 answer
53 views

What is better in terms of CPU cost: shifting bits at runtime or storing all possible values in an array?

I'm writing C++ code for an ESP8266 MCU on the Arduino platform and I'm trying to make my code as efficient as possible. To operate another MCU via I2C, I need to configure its internal registers which ...
1
vote
2 answers
67 views

Why does the gnu_inline attribute affect code generation so much compared to general inlining?

Why does using extern inline __attribute__((gnu_inline)) over static inline affect GCC 8.3 code generation so much? The example code is based on glibc bsearch code (built with -O3): #include <...
1
vote
0 answers
79 views

How to correctly mix debug and release binaries involving std::string among others?

I have a C++ project that I want to use as a debug binary, but I want its dependencies built in release rather than debug. The problem is that it does not seem possible, in particular when I have std:...
0
votes
0 answers
67 views

Reduce executable size by mixing dynamic and static compilation

I am coding a project in C++ under Visual C++ 2017, and I would like to know if I could compile my project dynamically and add #pragma comment(lib,"ws2_32") only once for the whole project my ...
9
votes
1 answer
119 views

Can compilers (specifically rustc) really simplify triangle-summation to avoid a loop? How?

On page 322 of Programming Rust by Blandy and Orendorff is this claim: ...Rust...recognizes that there's a simpler way to sum the numbers from one to n: the sum is always equal to n * (n+1) / 2. ...
26
votes
1 answer
482 views

Why might a C++ compiler duplicate a function exit basic block?

Consider the following snippet of code: int* find_ptr(int* mem, int sz, int val) { for (int i = 0; i < sz; i++) { if (mem[i] == val) { return &mem[i]; } } ...
12
votes
2 answers
338 views

Is there a flaw in how clang implements char8_t or does some dark corner of the standard prohibit optimization?

clang 8.0.0 introduces support for the char8_t type from c++20. However, I would expect the following functions to have the same compiler output #include <algorithm> bool compare4(char const* ...
4
votes
2 answers
69 views

Can I rely on the compiler finding and optimizing simple boolean loop invariants?

I have a loop like the one below which has an invariant, here the never changing value of scaleEveryValueByTwo. Can I rely on the compiler finding this invariant and not checking the condition in ...
3
votes
2 answers
55 views

Does JavaScript optimise multiple pure filters/maps/etc?

I have this JavaScript code written in functional style: someArray .filter((element) => element) .map((element) => element.property) .map((property) => doSomethingWithIt) Now, a naïve ...
4
votes
0 answers
45 views

embed string via header that cannot be optimized away

While developing a header-only library, I'd like to make sure that a given string is embedded in all binaries that use my header, even if the compiler is configured to optimize away unused constants, ...
2
votes
1 answer
105 views

Irritating performance decrease by a simple additional branch

In my project I have a function in which a code path should be conditionally skipped for performance reasons. If the condition is true, I have an increase of up to 50 % as expected. But if the ...
3
votes
2 answers
87 views

Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code?

So I am referencing this paper: Binary Stirring: Self-randomizing Instruction Addresses of Legacy x86 Binary Code https://www.utdallas.edu/~hamlen/wartell12ccs.pdf Code interleaved with data: ...
2
votes
1 answer
43 views

Different behavior when running a program compiled with g++ in Docker

The behavior of the executable is different depending on whether it is run inside Docker or on the host. But this only happens when we change the optimization level of G++. Compiler: g++ (Ubuntu 7.3.0-27ubuntu1~...
1
vote
1 answer
51 views

Optimization bug in Apple's LLVM, or bug in code?

I have some iOS C++ code that compiles correctly on my local machine (LLVM 9.0) but compiles incorrectly on my build server (LLVM 10.0). The project is generated via CMake (same version on both) so ...
1
vote
2 answers
102 views

How can I get clang to vectorize a simple loop?

I have the following loop: float* s; float* ap; float* bp; ... // initialize s, ap, bp for(size_t i=0;i<64;++i) { s[i] = ap[i]+bp[i]; } Seems like a good candidate for vectorization. Though ...