Why is the “alignment” the same on 32-bit and 64-bit systems?
I was wondering whether the compiler would use different padding on 32-bit and 64-bit systems, so I wrote the code below in a simple VS2019 C++ console project:

    #include <iostream>

    struct Z
    {
        char s;
        __int64 i;
    };

    int main()
    {
        std::cout << sizeof(Z) << "\n";
    }

What I expected on each "Platform" setting:
x86: 12
X64: 16
Actual result:
x86: 16
X64: 16
Since the memory word size on x86 is 4 bytes, this means it has to store the bytes of i in two different words. So I thought the compiler would do padding this way:

    struct Z
    {
        char s;
        char _pad[3];
        __int64 i;
    };

So may I know what the reason behind this is?
- For forward-compatibility with the 64-bit system?
- Due to the limitation of supporting 64-bit numbers on the 32-bit processor?
c++ visual-c++ 32bit-64bit memory-alignment abi
asked Apr 30 at 11:38 by Shen Yuan (new contributor)
edited Apr 30 at 19:17 by Peter Cordes
Related: Why is the default alignment for int64_t 8 byte on 32 bit x86 architecture?, Memory alignment on a 32-bit Intel processor.
– Daniel Langr
Apr 30 at 12:40
4 Answers
Size and alignof() (the minimum alignment that any object of that type must have) for each primitive type is an ABI design choice (see Footnote 1), separate from the register width of the architecture.
Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.
MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct (for non-aggregate types only). That's not a direct quote; it's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction of the pragma and the command-line option?)
(An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, and a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)
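The aggregate rule in that parenthetical can be checked in portable C++. This is only a sketch, assuming a mainstream ABI where char has 1-byte alignment; the type names Bytes8, Over16, HoldsBytes8 and HoldsOver16 are made up for illustration and do not come from the MSVC docs.

```cpp
#include <cstddef>

// An 8-byte aggregate made only of chars: its alignment is that of its
// most-aligned member (1), not its size.
struct Bytes8 { char c[8]; };

// One over-aligned member raises the alignment of the whole struct.
struct Over16 { alignas(16) char c; };

// Hypothetical outer structs, used only to observe where the inner
// aggregate lands relative to the start of the enclosing struct.
struct HoldsBytes8 { char tag; Bytes8 inner; };
struct HoldsOver16 { char tag; Over16 inner; };

static_assert(sizeof(Bytes8) == 8 && alignof(Bytes8) == 1,
              "size 8, but only 1-byte alignment");
static_assert(alignof(Over16) == 16,
              "alignas(16) on a member propagates to the struct");

constexpr std::size_t bytes8_offset() { return offsetof(HoldsBytes8, inner); }
constexpr std::size_t over16_offset() { return offsetof(HoldsOver16, inner); }
```

On the ABIs I'm aware of, bytes8_offset() is 1 (no padding at all) and over16_offset() is 16, regardless of the default packing for primitive types.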
Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment (see Footnote 2).
So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (E.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)
However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)
The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.
In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.
(I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack(1) or -Zp1, but that changes the ABI. Otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)
If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp ends up 8-byte aligned.
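To see what a given local actually got, the address can be tested at run time. A minimal sketch, not taken from the MSVC docs: is_aligned and requested_alignment_holds are hypothetical helper names, and the alignment argument is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// True if the address held in p is a multiple of alignment.
// Assumption: alignment is a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// With alignas(8), the compiler has to put tmp on an 8-byte boundary,
// realigning the stack pointer first if the ABI does not already
// guarantee that much.
inline bool requested_alignment_holds() {
    alignas(8) std::int64_t tmp = 0;
    return is_aligned(&tmp, 8);
}
```

Calling is_aligned(&x, 8) on a plain int64_t local in a 32-bit MSVC build is one way to observe the "luck" described above, although an optimizer is free to fold such a check, so looking at the generated asm is more reliable.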
Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices
Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).
This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.
When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.
Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?
Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)
Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.
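When two separately compiled modules must agree on a layout, one defensive option is to spell out the alignment and pin the result with static_assert, so a mismatch becomes a compile error instead of a wrong offset at run time. A sketch only: Record and record_value_offset are hypothetical names, and the alignas(8) is an explicit request, not what any particular ABI does by default.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record passed between separately compiled modules.
struct Record {
    std::uint8_t tag;
    alignas(8) std::int64_t value;  // explicit, so it does not rely on the ABI default
};

// If a compiler or packing option lays this out differently, the build
// fails here rather than a callee reading value from the wrong offset.
static_assert(offsetof(Record, value) == 8, "unexpected offset for value");
static_assert(sizeof(Record) == 16, "unexpected size for Record");
static_assert(alignof(Record) == 8, "unexpected alignment for Record");

constexpr std::size_t record_value_offset() { return offsetof(Record, value); }
```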
Footnote 2: GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8, some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance, i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.
(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)
The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.
That's kind of the opposite of 32-bit Windows with MSVC, where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.
32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.

    #include <cstdint>
    void extfunc(int64_t *);
    void foo_align8(void)
    {
        alignas(int64_t) int64_t tmp;
        extfunc(&tmp);
    }

(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):

    _tmp$ = -8                                  ; size = 8
    void foo_align8(void) PROC                  ; foo_align8, COMDAT
        push    ebp
        mov     ebp, esp
        and     esp, -8                         ; fffffff8H  align the stack
        sub     esp, 8                          ; and reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]     ; get a pointer to those 8 bytes
        push    eax                             ; pass the pointer as an arg
        call    void extfunc(__int64 *)         ; extfunc
        add     esp, 4
        mov     esp, ebp
        pop     ebp
        ret     0

But without the alignas(), or with alignas(4), we get the much simpler

    _tmp$ = -8                                  ; size = 8
    void foo_noalign(void) PROC                 ; foo_noalign, COMDAT
        sub     esp, 8                          ; reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]     ; "calculate" a pointer to it
        push    eax                             ; pass the pointer as a function arg
        call    void extfunc(__int64 *)         ; extfunc
        add     esp, 12                         ; 0000000cH
        ret     0

It could just push esp instead of LEA/push; that's a minor missed optimization.
Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.
If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.
But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
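The "0 or 1 elements" arithmetic, and the case where it breaks down, can be written out directly. A sketch under stated assumptions: scalar_prologue_length is a hypothetical helper (not part of any intrinsics API), it works on a raw address value, and it hard-codes 8-byte elements and a 16-byte boundary. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Number of 8-byte elements to handle one at a time before addr reaches a
// 16-byte boundary, or std::nullopt if stepping by 8 can never get there.
inline std::optional<std::size_t> scalar_prologue_length(std::uintptr_t addr) {
    constexpr std::uintptr_t elem = 8;
    constexpr std::uintptr_t boundary = 16;
    if (addr % elem != 0) {
        // Under-aligned start: every element spans an 8-byte boundary,
        // so no element ever begins on a 16-byte boundary.
        return std::nullopt;
    }
    return static_cast<std::size_t>((boundary - addr % boundary) % boundary / elem);
}
```

For an address ending in 0x0 the result is 0, for one ending in 0x8 it is 1, and for one ending in 0x4 or 0xC there is no answer, which is the situation a merely 4-byte-aligned int64_t array creates for code that assumed 8-byte alignment.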
BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.
In MSVC, int64_t will be the same type as __int64.
On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and register widths that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway, so it doesn't matter whether they're the exact same type or not, just that they use the same representation.
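Which built-in type sits behind int64_t on a given implementation can be checked with type traits. A small sketch, needing C++14 for the multi-statement constexpr function; int64_underlying is a made-up helper name.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Holds wherever std::int64_t is provided at all.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64,
              "int64_t is exactly 64 bits with no padding");

// Reports which built-in type this implementation uses for std::int64_t.
constexpr const char* int64_underlying() {
    if (std::is_same<std::int64_t, long>::value) return "long";
    if (std::is_same<std::int64_t, long long>::value) return "long long";
    return "other";
}
```

Code that overloads on both long and long long should not assume which of the two int64_t aliases, since that differs between platforms.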
OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!
– Shen Yuan
2 days ago
@ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.
– Peter Cordes
2 days ago
The padding is not determined by the word size, but by the alignment of each data type.
In most cases, the alignment requirement is equal to the type's size. So for a 64 bit type like int64 you will get an 8 byte (64 bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.
You may see a difference in padding between 32 bit and 64 bit when using built-in datatypes that have different sizes on both architectures, for instance pointer types (int*).
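A pointer member is the easy way to see padding that does change with the target. A sketch, assuming an ABI where alignof(int*) == sizeof(int*), which holds for the common x86 and x86-64 ABIs; Node and node_padding are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical struct whose padding follows the pointer size of the target.
struct Node {
    char tag;
    int* next;
};

// Bytes of padding inserted between tag and next.
constexpr std::size_t node_padding() {
    return offsetof(Node, next) - sizeof(char);
}
```

Under that assumption node_padding() is 3 in a 32-bit build and 7 in a 64-bit build, and sizeof(Node) is 8 and 16 respectively.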
Default alignment is determined by word size, however. The reason being that words are addressed in memory so that they fit into registers perfectly. On x86(_64), unaligned data requires a shift operation to work with it. On other platforms like Sun SPARC, unaligned data will cause a bus exception. If you want to remove padding, try adding __attribute__((packed)) (GCC) to the struct definition.
– Nefrin
Apr 30 at 12:20
@Nefrin Do you have a reference for that? I am not aware of any such behavior for the C or C++ built-in datatypes.
– ComicSansMS
Apr 30 at 12:22
Also, this is not C/C++ language behaviour but rather compiler behaviour.
– Nefrin
Apr 30 at 12:50
@Nefrin: x86-64 has efficient unaligned loads that handle the required shift in hardware. Yes, it's normal for types wider than an integer register to only get aligned to the register width in some ABIs (like i386 System V, and 32-bit Windows). But for x86-64 System V, alignof(__int128) = 16 so it can be copied with SSE vectors, or for lock cmpxchg16b. But as far as the C++ standard is concerned, that's all up to the implementation. And the struct-packing rules are allowed to be different from what you'd expect based on just alignof(member), as P.W's answer shows is the case for MSVC.
– Peter Cordes
Apr 30 at 19:58
@ComicSansMS: alignof(int64_t) == 8 in 32-bit MSVC, but it doesn't actually bother to ensure that for locals on the stack, so that's not really the minimum alignment necessary for any int64_t object. If you use alignas(8) int64_t tmp;, you get extra instructions to align the stack pointer which you don't get with just int64_t tmp. godbolt.org/z/lsuXAQ. The struct-packing rules are allowed to be more complicated than just padding to alignof(T) relative to the start of the struct, as @P.W's answer shows.
– Peter Cordes
Apr 30 at 20:13
This is a matter of the alignment requirement of the data type, as specified in Padding and Alignment of Structure Members:
Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).
And the default value for structure member alignment is specified in /Zp (Struct Member Alignment):
The available packing values are described in the following table:
/Zp argument   Effect
1              Packs structures on 1-byte boundaries. Same as /Zp.
2              Packs structures on 2-byte boundaries.
4              Packs structures on 4-byte boundaries.
8              Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
16             Packs structures on 16-byte boundaries (default for x64).
Since the default for x86 is /Zp8, which is 8 bytes, the output is 16.
However, you can specify a different packing size with the /Zp option.
Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.
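The same effect can be had for a single struct with the pack pragma, which MSVC, GCC and Clang all accept. A sketch: Z mirrors the struct from the question but uses std::int64_t instead of __int64 so it builds outside MSVC, and Z4, default_size and packed_size are names invented here.

```cpp
#include <cstddef>
#include <cstdint>

// Default packing: sizeof is 16 with MSVC (x86 and x64) and on x86-64
// System V, but 12 on i386 System V, where int64_t is only 4-byte aligned
// inside structs.
struct Z {
    char s;
    std::int64_t i;
};

#pragma pack(push, 4)   // cap member alignment at 4 bytes, like /Zp4
struct Z4 {
    char s;
    std::int64_t i;     // lands at offset 4 instead of 8
};
#pragma pack(pop)       // restore the previous packing

constexpr std::size_t default_size() { return sizeof(Z); }
constexpr std::size_t packed_size()  { return sizeof(Z4); }
```

Changing packing changes the layout that other code sees, so a struct like Z4 is only safe to share with code compiled against the same definition.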
A struct's alignment is the size of its largest member.
That means if you have an 8-byte (64-bit) member in the struct, then the struct will align to 8 bytes.
In the case that you are describing, if the compiler allows the struct to align to 4 bytes, it possibly leads to an 8-byte member lying across the cache line boundary.
Say we have a CPU that has a 16-byte cache line.
Consider a struct like this:

    struct Z
    {
        char s;      // 1-4 byte
        __int64 i;   // 5-12 byte
        __int64 i2;  // 13-20 byte, need two cache line fetches to read this variable
    };

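Whether a member crosses a line boundary is simple arithmetic on its offset and size. A sketch with a hypothetical helper, straddles, using 0-based byte offsets (so the "13-20 byte" member above starts at offset 12) and assuming the struct itself starts on a line boundary.

```cpp
#include <cstddef>

// True if a field of size bytes starting at byte offset touches more than
// one block of line bytes. Assumption: size is non-zero.
constexpr bool straddles(std::size_t offset, std::size_t size, std::size_t line) {
    return offset / line != (offset + size - 1) / line;
}

// i2 with 4-byte member alignment sits at offset 12 and crosses byte 16.
static_assert(straddles(12, 8, 16), "offset 12, size 8 crosses a 16-byte line");
// i2 with 8-byte member alignment sits at offset 16 and stays in one line.
static_assert(!straddles(16, 8, 16), "offset 16, size 8 fits in one line");
```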
Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, but primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.
– Peter Cordes
Apr 30 at 19:08
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Size and alignof()
(minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.
Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.
MSVC targeting 32-bit x86 gives __int64
a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T))
relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)
(An 8-byte struct containing a char[8]
still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16)
member still gets 16-byte alignment inside another struct.)
Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T)
. Also note that MSVC's definition of alignof()
doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8
, but some __int64
objects have less than that alignment2.
So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas()
on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp
on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8
to round the stack pointer down to an 8-byte boundary.)
However, new
/ malloc
does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T))
, so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas
.)
The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.
In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double
vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t
with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t*
or double*
being naturally aligned.
(I'm not sure if MSVC will ever create even less aligned int64_t
or double
objects on its own. Certainly yes if you use #pragma pack 1
or -Zp1
, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t
out of a buffer manually and don't bother to align it. But assuming alignof(int64_t)
is still 8, that would be C++ undefined behaviour.)
If you use alignas(8) int64_t tmp
, MSVC emits extra instructions to and esp, -8
. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp
ends up 8-byte aligned or not.
Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4
but sizeof(long long) = 8
. These choices
Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t
to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild
to do int64_t -> double conversion).
This is one reason why modern version of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.
When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double
in a single cache access if it's 64-bit aligned.
Or for fild
/ fistp
, load/store a 64-bit integer when converting to/from double
. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?
Footnote 1: An ABI also includes a calling convention, or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall
), but the sizes and alignment-requirements for primitive types like long long
are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)
Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&x);
might write to a different offset relative to the base of the struct than separately-compiled foo()
(maybe in a DLL) was expecting to read it at.
Footnote 2:
GCC had this C++ alignof()
bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof()
. See that bug report for some discussion based on quotes from the standard which conclude that alignof(T)
should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t*
with less than alignof(int64_t)
alignment is undefined behaviour.
(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t
iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)
The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t
and double
objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.
Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8
but locals on the stack are always potentially under-aligned unless you use alignas()
to specifically request alignment.
32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp; is not the same as int64_t tmp; the former emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.
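#include <cstdint>

void extfunc(int64_t *);  // defined in another translation unit

void foo_align8(void) {
    alignas(int64_t) int64_t tmp;  // request alignof(int64_t), which MSVC reports as 8
    extfunc(&tmp);                 // the pointer escapes, so tmp must really exist in memory
}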
(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):
_tmp$ = -8 ; size = 8
void foo_align8(void) PROC ; foo_align8, COMDAT
push ebp
mov ebp, esp
and esp, -8 ; fffffff8H align the stack
sub esp, 8 ; and reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
push eax ; pass the pointer as an arg
call void extfunc(__int64 *) ; extfunc
add esp, 4
mov esp, ebp
pop ebp
ret 0
But without the alignas(), or with alignas(4), we get the much simpler:
_tmp$ = -8 ; size = 8
void foo_noalign(void) PROC ; foo_noalign, COMDAT
sub esp, 8 ; reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
push eax ; pass the pointer as a function arg
call void extfunc(__int64 *) ; extfunc
add esp, 12 ; 0000000cH
ret 0
It could just push esp instead of LEA/push; that's a minor missed optimization.
Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.
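One common way for a callee to be robust against a possibly under-aligned pointer is to read the value through memcpy instead of dereferencing it. This is a general technique, not something MSVC or the question's code is confirmed to do, and load_i64_unaligned is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads 8 bytes from p without assuming any particular alignment.
// Compilers typically turn the memcpy into plain loads on x86.
std::int64_t load_i64_unaligned(const void* p) {
    assert(p != nullptr);
    std::int64_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

The cost is that the signature no longer says int64_t*, so this only helps where you control both sides of the call.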
If alignof(int64_t) were really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.
But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
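The "0 or 1 elements" reasoning can be sketched as plain address arithmetic. The function name is hypothetical, and it assumes 8-byte elements stored back to back.

```cpp
#include <cstdint>

// How many 8-byte elements must be handled one at a time before the address
// reaches a 16-byte boundary. Returns -1 if no element ever starts on one,
// which happens when the array base is not even 8-byte aligned.
constexpr int elements_until_16B_boundary(std::uintptr_t addr) {
    return (addr % 8 != 0)  ? -1   // e.g. base only 4-byte aligned
         : (addr % 16 == 0) ? 0    // already on a 16-byte boundary
                            : 1;   // one scalar element, then aligned
}
```

A vectorized loop written under the assumption alignof(int64_t) == 8 only plans for the results 0 and 1. The -1 case is the one that 32-bit MSVC stack locals can produce, and an aligned 16-byte load would fault there.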
BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.
In MSVC, int64_t will be the same type as __int64.
On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++11 require long long to be at least 64-bit, and on machines with 8-bit bytes and register widths that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.
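If you want the compiler to confirm these properties rather than take my word for it, a few compile-time checks are enough. This is a sketch; the two constant names are made up for this example.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Exactly 64 bits of storage, as the fixed-width typedef promises.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t must be exactly 64 bits");

// The asymmetric range that goes with 2's complement.
static_assert(INT64_MIN == -INT64_MAX - 1, "int64_t should have a 2's complement range");

// Which standard type the typedef resolves to on this implementation.
constexpr bool int64_is_long      = std::is_same<std::int64_t, long>::value;
constexpr bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
```

On MSVC the same std::is_same test could be applied to __int64 to check the assumption above; I've left it out so the snippet stays portable.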
OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!
– Shen Yuan
2 days ago
@ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.
– Peter Cordes
2 days ago
Size and alignof()
(minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.
Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.
MSVC targeting 32-bit x86 gives __int64
a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T))
relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)
(An 8-byte struct containing a char[8]
still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16)
member still gets 16-byte alignment inside another struct.)
Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T)
. Also note that MSVC's definition of alignof()
doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8
, but some __int64
objects have less than that alignment2.
So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas()
on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp
on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8
to round the stack pointer down to an 8-byte boundary.)
However, new
/ malloc
does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T))
, so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas
.)
The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.
In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double
vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t
with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t*
or double*
being naturally aligned.
(I'm not sure if MSVC will ever create even less aligned int64_t
or double
objects on its own. Certainly yes if you use #pragma pack 1
or -Zp1
, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t
out of a buffer manually and don't bother to align it. But assuming alignof(int64_t)
is still 8, that would be C++ undefined behaviour.)
If you use alignas(8) int64_t tmp
, MSVC emits extra instructions to and esp, -8
. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp
ends up 8-byte aligned or not.
Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4
but sizeof(long long) = 8
. These choices
Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t
to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild
to do int64_t -> double conversion).
This is one reason why modern version of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.
When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double
in a single cache access if it's 64-bit aligned.
Or for fild
/ fistp
, load/store a 64-bit integer when converting to/from double
. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?
Footnote 1: An ABI also includes a calling convention, or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall
), but the sizes and alignment-requirements for primitive types like long long
are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)
Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&x);
might write to a different offset relative to the base of the struct than separately-compiled foo()
(maybe in a DLL) was expecting to read it at.
Footnote 2:
GCC had this C++ alignof()
bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof()
. See that bug report for some discussion based on quotes from the standard which conclude that alignof(T)
should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t*
with less than alignof(int64_t)
alignment is undefined behaviour.
(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t
iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)
The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t
and double
objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.
Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8
but locals on the stack are always potentially under-aligned unless you use alignas()
to specifically request alignment.
32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp
is not the same as int64_t tmp;
, and emits extra instructions to align the stack. That's because alignas(int64_t)
is like alignas(8)
, which is more aligned than the actual minimum.
void extfunc(int64_t *);
void foo_align8(void)
alignas(int64_t) int64_t tmp;
extfunc(&tmp);
(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):
_tmp$ = -8 ; size = 8
void foo_align8(void) PROC ; foo_align8, COMDAT
push ebp
mov ebp, esp
and esp, -8 ; fffffff8H align the stack
sub esp, 8 ; and reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
push eax ; pass the pointer as an arg
call void extfunc(__int64 *) ; extfunc
add esp, 4
mov esp, ebp
pop ebp
ret 0
But without the alignas(), or with alignas(4), we get the much simpler:
_tmp$ = -8 ; size = 8
void foo_noalign(void) PROC ; foo_noalign, COMDAT
sub esp, 8 ; reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
push eax ; pass the pointer as a function arg
call void extfunc(__int64 *) ; extfunc
add esp, 12 ; 0000000cH
ret 0
It could just push esp instead of LEA/push; that's a minor missed optimization.
Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.
If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.
But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
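To make the "reach an alignment boundary" reasoning concrete, here is a small sketch that inspects addresses at run time. The helpers is_aligned, alignas8_local_is_aligned and count_16byte_aligned are hypothetical names invented for this example, and the code assumes the usual flat mapping from pointers to std::uintptr_t, which is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address p is a multiple of `alignment`.
inline bool is_aligned(const void *p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// With an explicit alignas(8), the compiler has to put tmp on an 8-byte
// boundary, even on a target that only keeps the stack 4-byte aligned.
inline bool alignas8_local_is_aligned()
{
    alignas(8) std::int64_t tmp = 0;
    return is_aligned(&tmp, 8);
}

// Counts how many elements of an int64_t array start on a 16-byte boundary.
// If the array itself is only 4-byte aligned, this can be zero.
inline std::size_t count_16byte_aligned(const std::int64_t *a, std::size_t n)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        hits += is_aligned(a + i, 16) ? 1 : 0;
    return hits;
}
```

For an array that starts on a 16-byte boundary, every second element is counted. A result of zero for a non-empty array would be the situation described above, where a peel loop handling 0 or 1 elements never reaches an aligned address.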
BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.
In MSVC, int64_t will be the same type as __int64.
On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.
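The guarantees listed above can be spot-checked at compile time. This is only a sketch: the constants int64_is_long and int64_is_long_long are hypothetical names for this example, and which of them is true depends on the implementation.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Properties that hold wherever int64_t is provided at all.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits, no padding");
static_assert(std::is_signed<std::int64_t>::value, "int64_t is a signed type");
static_assert((std::int64_t(-1) & 3) == 3, "two's complement representation");

// long long must be at least 64 bits wide.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64-bit");

// Which standard type the typedef names is up to the implementation
// (hypothetical constants, for illustration only).
constexpr bool int64_is_long = std::is_same<std::int64_t, long>::value;
constexpr bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
```

On mainstream implementations exactly one of the two constants is true: typically long on 64-bit Linux and long long on Windows.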
answered 2 days ago
Peter Cordes
OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!
– Shen Yuan
2 days ago
1
@ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.
– Peter Cordes
2 days ago
The padding is not determined by the word size, but by the alignment of each data type.
In most cases, the alignment requirement is equal to the type's size. So for a 64-bit type like int64 you will get an 8-byte (64-bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.
You may see a difference in padding between 32-bit and 64-bit builds when using built-in datatypes that have different sizes on the two architectures, for instance pointer types (int*).
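The padding rule can be observed directly with offsetof and sizeof. The sketch below assumes struct Z has the same shape as the struct in the question (a char followed by a 64-bit integer); struct P and the helper padding_before_i are hypothetical additions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed to have the same shape as the struct in the question.
struct Z {
    char s;
    std::int64_t i;
};

// Hypothetical second struct: its layout follows the target's pointer size.
struct P {
    char c;
    int *p;
};

// Number of padding bytes the compiler inserted between Z::s and Z::i.
constexpr std::size_t padding_before_i()
{
    return offsetof(Z, i) - sizeof(char);
}

// Each member starts at a multiple of its own alignment, and the struct size
// is a multiple of the struct's alignment so that arrays of it stay aligned.
static_assert(offsetof(Z, i) % alignof(std::int64_t) == 0, "Z::i is aligned");
static_assert(offsetof(P, p) % alignof(int *) == 0, "P::p is aligned");
static_assert(sizeof(Z) % alignof(Z) == 0, "sizeof is a multiple of alignof");
```

Where alignof(int64_t) is 8 this gives 7 bytes of padding and sizeof(Z) == 16; on an ABI such as i386 System V, where alignof(int64_t) is 4, it gives 3 bytes and sizeof(Z) == 12.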
2
Default alignment is determined by word size however. The reason being that words are addressed in memory so that they fit into registers perfectly. On x86(_64) unaligned data requires a shift operation to work with it. On other platforms like Sun SPARC unaligned data will cause a bus exception. If you want to remove padding try adding __attribute__((packed)) (GCC) to the struct definition.
– Nefrin
Apr 30 at 12:20
1
@Nefrin Do you have a reference for that? I am not aware of any such behavior for the C or C++ built-in datatypes.
– ComicSansMS
Apr 30 at 12:22
2
Also this is not C/C++ Language behaviour but rather compiler behaviour
– Nefrin
Apr 30 at 12:50
2
@Nefrin: x86-64 has efficient unaligned loads that handle the required shift in hardware. Yes it's normal for types wider than an integer register to only get aligned to the register width in some ABIs (like i386 System V, and 32-bit Windows). But for x86-64 System V, alignof(__int128) = 16 so it can be copied with SSE vectors, or for lock cmpxchg16b. But as far as the C++ standard is concerned, that's all up to the implementation. And the struct-packing rules are allowed to be different from what you'd expect based on just alignof(member), as P.W's answer shows is the case for MSVC.
– Peter Cordes
Apr 30 at 19:58
2
@ComicSansMS: alignof(int64_t) == 8 in 32-bit MSVC, but it doesn't actually bother to ensure that for locals on the stack so that's not really the minimum alignment necessary for any int64_t object. If you use alignas(8) int64_t tmp;, you get extra instructions to align the stack pointer which you don't get with just int64_t tmp. godbolt.org/z/lsuXAQ. The struct-packing rules are allowed to be more complicated than just padding to alignof(T) relative to the start of the struct, as @P.W's answer shows.
– Peter Cordes
Apr 30 at 20:13
edited Apr 30 at 11:49
answered Apr 30 at 11:43
ComicSansMS
This is a matter of the alignment requirement of the data type, as specified in Padding and Alignment of Structure Members:
Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).
And the default value for structure member alignment is specified in /Zp (Struct Member Alignment):
The available packing values are described in the following table:
/Zp argument   Effect
1              Packs structures on 1-byte boundaries. Same as /Zp.
2              Packs structures on 2-byte boundaries.
4              Packs structures on 4-byte boundaries.
8              Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
16             Packs structures on 16-byte boundaries (default for x64).
Since the default for x86 is /Zp8, which is 8 bytes, the output is 16.
However, you can specify a different packing size with the /Zp option.
Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.
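The same effect can be sketched in source code with the pack pragma mentioned in the quoted documentation, which limits the packing size for the structs declared inside it. The struct names DefaultPacked and Packed4 are hypothetical, and both are assumed to mirror the char-plus-64-bit-integer struct from the question.

```cpp
#include <cstddef>
#include <cstdint>

// Laid out with the compiler's default packing size.
struct DefaultPacked {
    char s;
    std::int64_t i;
};

#pragma pack(push, 4)   // cap member alignment at 4 bytes, similar to /Zp4
struct Packed4 {
    char s;
    std::int64_t i;
};
#pragma pack(pop)       // restore the previous packing size

// With a packing size of 4, the 64-bit member only needs a 4-byte boundary.
static_assert(offsetof(Packed4, i) == 4, "i follows 3 bytes of padding");
static_assert(sizeof(Packed4) == 12, "1 + 3 padding + 8");
```

Unlike the /Zp switch, the pragma only affects the declarations between push and pop, which makes it easier to keep the rest of a program on the default layout.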
edited Apr 30 at 12:28
answered Apr 30 at 11:57
P.W
A struct's alignment is the size of its largest member.
That means if you have an 8-byte(64bit) member in the struct, then the struct will align to 8 bytes.
In the case that you are describing, if the compiler allows the struct to align to 4 bytes, it possibly leads to an 8-byte member lying across the cache line boundary.
Say we have a CPU that has a 16-byte cache line.
Consider a struct like this:
struct Z
{
    char s;      // 1-4 byte
    __int64 i;   // 5-12 byte
    __int64 i2;  // 13-20 byte, need two cache line fetches to read this variable
};
3
Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, and even primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.
– Peter Cordes
Apr 30 at 19:08
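The distinction between the largest member and the most-aligned member can be sketched with a struct whose biggest member is a char array. WithArray and max_of are hypothetical names for this example, and the assertion on alignof(WithArray) reflects how mainstream ABIs lay out plain structs rather than a guarantee of the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// The largest member is 13 bytes, but its alignment is only 1.
struct WithArray {
    char buf[13];
    std::int32_t x;
};

// Hypothetical helper: the larger of two alignments.
constexpr std::size_t max_of(std::size_t a, std::size_t b)
{
    return a > b ? a : b;
}

static_assert(sizeof(char[13]) == 13, "the array member is 13 bytes");
static_assert(alignof(char[13]) == 1, "an array has the alignment of its element type");

// The struct follows its most-aligned member, not its largest one.
static_assert(alignof(WithArray) == alignof(std::int32_t), "alignment comes from x");
static_assert(sizeof(WithArray) % alignof(WithArray) == 0, "size is a multiple of alignment");
```

If the rule really were "size of the largest member", this struct would need 13-byte alignment, which is not even a power of two.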
answered Apr 30 at 15:02
mituba
3
Related: Why is the default alignment for int64_t 8 byte on 32 bit x86 architecture?, Memory alignment on a 32-bit Intel processor.
– Daniel Langr
Apr 30 at 12:40