Why is the “alignment” the same on 32-bit and 64-bit systems?


I was wondering whether the compiler would use different padding on 32-bit and 64-bit systems, so I wrote the code below in a simple VS2019 C++ console project:



#include <iostream>

struct Z
{
    char s;
    __int64 i;
};

int main()
{
    std::cout << sizeof(Z) << "\n";
}



What I expected on each "Platform" setting:



x86: 12
x64: 16


Actual result:



x86: 16
x64: 16


Since the memory word size on 32-bit x86 is 4 bytes, the bytes of i have to be stored in two different words anyway. So I thought the compiler would pad it this way:



struct Z
{
    char s;
    char _pad[3];
    __int64 i;
};


So what is the reason behind this?



  1. For forward-compatibility with the 64-bit system?

  2. Due to the limitation of supporting 64-bit numbers on the 32-bit processor?
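A quick way to see exactly which padding the compiler chose is to print the member offset and the alignments next to sizeof. This is an illustrative sketch, not part of the original test: it assumes C++11 and swaps __int64 for the portable std::int64_t so it also builds with GCC and Clang; the helper name print_layout is hypothetical.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::int64_t
#include <cstdio>   // std::printf

// Same members as the question's struct, using std::int64_t instead of the
// MSVC-specific __int64.
struct Z {
    char s;
    std::int64_t i;
};

// Prints the layout the compiler actually chose. All of these values are
// implementation-defined: they depend on the target ABI, not just the word size.
void print_layout() {
    std::printf("sizeof(Z)             = %zu\n", sizeof(Z));
    std::printf("alignof(Z)            = %zu\n", alignof(Z));
    std::printf("alignof(std::int64_t) = %zu\n", alignof(std::int64_t));
    std::printf("offsetof(Z, i)        = %zu\n", offsetof(Z, i));
    std::printf("padding after s       = %zu\n", offsetof(Z, i) - sizeof(char));
}
```

Calling print_layout() from main() under each Platform setting shows whether the extra bytes come from the member's offset, from trailing padding, or both.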









    Related: Why is the default alignment for int64_t 8 byte on 32 bit x86 architecture?, Memory alignment on a 32-bit Intel processor.

    – Daniel Langr
    Apr 30 at 12:40
c++ visual-c++ 32bit-64bit memory-alignment abi

edited Apr 30 at 19:17 by Peter Cordes

asked Apr 30 at 11:38 by Shen Yuan



4 Answers


Size and alignof() (the minimum alignment that any object of that type must have) for each primitive type are an ABI¹ design choice, separate from the register width of the architecture.



Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.



MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct (for non-aggregate types only). That's not a direct quote; it's my paraphrase of the MSVC docs linked in @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parentheses, but maybe they're making a different point about the interaction of the pragma and the command-line option?)



(An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, and a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)
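A small sketch of both cases from that parenthetical, using portable types. The struct names are hypothetical, and the asserted values are what mainstream ABIs (MSVC, i386 and x86-64 System V) produce rather than something ISO C++ spells out.

```cpp
#include <cstddef>  // offsetof

struct Bytes8 { char c[8]; };              // 8 bytes, but only char alignment
struct Aligned16 { alignas(16) char c; };  // alignas on a member raises the struct's alignment

struct Outer1 { char tag; Bytes8 b; };
struct Outer2 { char tag; Aligned16 a; };

// An aggregate's alignment comes from its most-aligned member, not from its size.
static_assert(sizeof(Bytes8) == 8, "");
static_assert(alignof(Bytes8) == 1, "");
static_assert(offsetof(Outer1, b) == 1, "no padding before an 8-byte char-array struct");

// The alignas(16) requirement carries over wherever Aligned16 is embedded.
static_assert(alignof(Aligned16) == 16, "");
static_assert(offsetof(Outer2, a) == 16, "15 bytes of padding after tag");
```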



Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment².
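To see how size and alignment diverge on a given target, a sketch like this prints both for a few primitive types (show and show_primitives are hypothetical helpers). The only relationship it checks is one ISO C++ does guarantee: sizeof(T) is a multiple of alignof(T), so array elements stay aligned. The comments give typical values, not guarantees; under MSVC the printed alignof(std::int64_t) is the "preferred" 8 discussed here, not the minimum you can actually observe.

```cpp
#include <cstdint>
#include <cstdio>

template <typename T>
void show(const char* name) {
    static_assert(sizeof(T) % alignof(T) == 0, "sizeof must be a multiple of alignof");
    std::printf("%-12s sizeof=%2zu alignof=%2zu\n", name, sizeof(T), alignof(T));
}

void show_primitives() {
    show<char>("char");
    show<std::int32_t>("int32_t");
    show<std::int64_t>("int64_t");     // alignof: 4 with GCC 8+ on i386 System V, 8 on x86-64
    show<double>("double");
    show<long double>("long double");  // typically 12/4 on i386 System V, 16/16 on x86-64 System V
    show<void*>("void*");              // 4 on 32-bit targets, 8 on 64-bit ones
}
```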




So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)



However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
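A sketch of that guarantee in action; is_aligned and check_heap_alignment are hypothetical helpers, and Z uses std::int64_t in place of __int64. new and malloc return storage aligned for any type with fundamental alignment, i.e. at least alignof(std::max_align_t), which is 8 on 32-bit MSVC and 16 on x86-64 System V.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// True if p is a multiple of `align` bytes.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

struct Z {
    char s;
    std::int64_t i;
};

// Heap storage is aligned for any fundamental type, so MSVC's 8-byte member
// padding actually pays off for dynamically allocated structs.
void check_heap_alignment() {
    std::unique_ptr<Z> z(new Z());
    assert(is_aligned(z.get(), alignof(std::max_align_t)));

    void* raw = std::malloc(sizeof(Z));
    assert(raw != nullptr);
    assert(is_aligned(raw, alignof(std::max_align_t)));
    std::free(raw);
}
```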




This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)




The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.



In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.



(I'm not sure if MSVC will ever create even less-aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack(1) or -Zp1, but that changes the ABI. Otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)



If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether tmp ends up 8-byte aligned.
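The difference between a plain local and an alignas(8) local can be observed at run time with a sketch like this (is_aligned and check_stack_alignment are hypothetical names). Only the alignas(8) check is guaranteed; the plain local's result is what varies between ABIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// True if p is a multiple of `align` bytes.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

void check_stack_alignment() {
    alignas(8) std::int64_t forced = 0;  // compiler must realign the stack if necessary
    std::int64_t plain = 0;              // only alignof(std::int64_t) is promised

    // Guaranteed on any conforming compiler.
    assert(is_aligned(&forced, 8));

    // Where alignof(std::int64_t) == 8 is actually honoured (e.g. x86-64) this
    // always prints "yes"; 32-bit MSVC may print "no", as described above.
    std::printf("plain local is 8-byte aligned: %s\n",
                is_aligned(&plain, 8) ? "yes" : "no");
}
```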




Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices



Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).



This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.




When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.



Or, with fild / fistp, load or store a 64-bit integer when converting to or from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86 since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?




Footnote 1: An ABI also includes a calling convention (or, in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)



Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.
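One common defensive technique for structs that cross a DLL or shared-memory boundary (my addition, not something proposed above) is to spell out the padding explicitly and pin the intended layout with static_assert, so a compiler or ABI that disagrees fails at compile time instead of silently reading the wrong offset. SharedRecord is a hypothetical example.

```cpp
#include <cstddef>
#include <cstdint>

struct SharedRecord {
    char s;
    char pad_[7];    // explicit padding instead of relying on each ABI's rules
    std::int64_t x;
};

// If any compiler lays this out differently, the build fails here instead of
// a separately compiled foo() reading x from the wrong offset at run time.
static_assert(offsetof(SharedRecord, x) == 8, "unexpected offset of x");
static_assert(sizeof(SharedRecord) == 16, "unexpected size of SharedRecord");
```

With the padding written out, the layout comes out the same under 32-bit MSVC, x86-64, and i386 System V, because x already sits at an 8-byte offset.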




Footnote 2:



GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.



(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)



The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.



This is kind of the opposite of 32-bit Windows with MSVC, where the struct-packing rules are compatible with alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.



32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.



#include <stdint.h>

void extfunc(int64_t *);

void foo_align8(void) {
    alignas(int64_t) int64_t tmp;
    extfunc(&tmp);
}



(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):



_tmp$ = -8 ; size = 8
void foo_align8(void) PROC ; foo_align8, COMDAT
push ebp
mov ebp, esp
and esp, -8 ; fffffff8H align the stack
sub esp, 8 ; and reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
push eax ; pass the pointer as an arg
call void extfunc(__int64 *) ; extfunc
add esp, 4
mov esp, ebp
pop ebp
ret 0


But without the alignas(), or with alignas(4), we get much simpler code:



_tmp$ = -8 ; size = 8
void foo_noalign(void) PROC ; foo_noalign, COMDAT
sub esp, 8 ; reserve 8 bytes
lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
push eax ; pass the pointer as a function arg
call void extfunc(__int64 *) ; extfunc
add esp, 12 ; 0000000cH
ret 0


It could just push esp instead of LEA/push; that's a minor missed optimization.



Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.



If alignof(int64_t) were really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.



But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
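The alignment-peeling idea can be sketched in portable C++ without intrinsics: count how many 8-byte elements must be handled one at a time before a pointer reaches a 16-byte boundary. The function and buffer are hypothetical illustrations; they only inspect addresses and never dereference anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// How many int64_t elements a vectorized loop would handle one at a time
// before reaching a 16-byte boundary (what aligned 16-byte loads need).
// Returns SIZE_MAX if stepping by 8 bytes can never land on such a boundary,
// which is the case whenever the first element is not 8-byte aligned.
std::size_t peel_count_to_16(const void* first_elem) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(first_elem);
    // Stepping by 8 bytes only ever visits two residues mod 16.
    for (std::size_t count = 0; count < 2; ++count) {
        if ((addr + count * sizeof(std::int64_t)) % 16 == 0) {
            return count;
        }
    }
    return SIZE_MAX;
}

// Usage: take addresses at chosen offsets inside a 16-byte-aligned buffer.
void peel_demo() {
    alignas(16) unsigned char buf[32] = {};
    assert(peel_count_to_16(buf) == 0);
    assert(peel_count_to_16(buf + 8) == 1);
    // Only 4-byte aligned, like an under-aligned int64_t on 32-bit MSVC.
    assert(peel_count_to_16(buf + 4) == SIZE_MAX);
}
```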




BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.



In MSVC, int64_t will be the same type as __int64.



On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
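Those guarantees can be checked at compile time. A sketch assuming C++11; the _MSC_VER block encodes the statement that int64_t and __int64 are the same type in MSVC and only compiles there.

```cpp
#include <climits>      // CHAR_BIT
#include <cstdint>      // std::int64_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::is_same

// If std::int64_t exists at all, it is exactly 64 bits, has no padding bits,
// and uses two's complement.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits of storage");
static_assert(std::numeric_limits<std::int64_t>::digits == 63, "63 value bits plus a sign bit");
static_assert(std::numeric_limits<std::int64_t>::min() ==
                  -std::numeric_limits<std::int64_t>::max() - 1,
              "two's complement range");

// long long is required to be at least 64 bits wide.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");

#if defined(_MSC_VER)
static_assert(std::is_same<std::int64_t, __int64>::value, "int64_t is __int64 under MSVC");
#endif
```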



I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.





























  • OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!

    – Shen Yuan
    2 days ago











    @ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.

    – Peter Cordes
    2 days ago

































The padding is not determined by the word size, but by the alignment of each data type.



In most cases, the alignment requirement is equal to the type's size. So for a 64-bit type like int64_t you will get an 8-byte (64-bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.



You may see a difference in padding between 32 bit and 64 bit when using built-in datatypes that have different sizes on both architectures, for instance pointer types (int*).
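The padding rule described here can be modelled in a few lines. This is a simplified sketch (align_up and z_size are hypothetical names) that assumes each member is placed at the next multiple of its alignment and the struct's alignment is that of its most-aligned member; it ignores packing pragmas and ABI-specific rules like MSVC's. It reproduces both numbers from the question.

```cpp
#include <cstddef>

// Rounds offset up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Layout of struct { char s; 8-byte integer i; } when i has alignment a:
// i goes at align_up(1, a), and the total size is rounded up to a multiple
// of a so that array elements stay aligned.
constexpr std::size_t z_size(std::size_t a) {
    return align_up(align_up(1, a) + 8, a);
}

static_assert(z_size(4) == 12, "alignment 4: 3 padding bytes, as the question expected");
static_assert(z_size(8) == 16, "alignment 8: 7 padding bytes, as MSVC produced");
```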































    Default alignment is determined by word size, however. The reason is that words are addressed in memory so that they fit into registers perfectly. On x86(_64), unaligned data requires a shift operation to work with it. On other platforms like Sun SPARC, unaligned data will cause a bus exception. If you want to remove padding, try adding __attribute__((packed)) (GCC) to the struct definition.

    – Nefrin
    Apr 30 at 12:20











    @Nefrin Do you have a reference for that? I am not aware of any such behavior for the C or C++ built-in datatypes.

    – ComicSansMS
    Apr 30 at 12:22











    Also this is not C/C++ Language behaviour but rather compiler behaviour

    – Nefrin
    Apr 30 at 12:50











    @Nefrin: x86-64 has efficient unaligned loads that handle the required shift in hardware. Yes it's normal for types wider than an integer register to only get aligned to the register width in some ABIs (like i386 System V, and 32-bit Windows). But for x86-64 System V, alignof(__int128) = 16 so it can be copied with SSE vectors, or for lock cmpxchg16b. But as far as the C++ standard is concerned, that's all up to the implementation. And the struct-packing rules are allowed to be different from what you'd expect based on just alignof(member), as P.W's answer shows is the case for MSVC.

    – Peter Cordes
    Apr 30 at 19:58












    @ComicSansMS: alignof(int64_t) == 8 in 32-bit MSVC, but it doesn't actually bother to ensure that for locals on the stack so that's not really the minimum alignment necessary for any int64_t object. If you use alignas(8) int64_t tmp;, you get extra instructions to align the stack pointer which you don't get with just int64_t tmp. godbolt.org/z/lsuXAQ. The struct-packing rules are allowed to be more complicated than just padding to alignof(T) relative to the start of the struct, as @P.W's answer shows.

    – Peter Cordes
    Apr 30 at 20:13

































This is a matter of the alignment requirement of the data type, as specified in Padding and Alignment of Structure Members:




Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).




And the default value for structure member alignment is specified in /Zp (Struct Member Alignment)




The available packing values are described in the following table:



/Zp argument   Effect
1              Packs structures on 1-byte boundaries. Same as /Zp.
2              Packs structures on 2-byte boundaries.
4              Packs structures on 4-byte boundaries.
8              Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
16             Packs structures on 16-byte boundaries (default for x64).




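Since the default for x86 is /Zp8 (8 bytes), the alignment requirement of __int64 inside the struct is min(8, 8) = 8, so 7 bytes of padding follow s and the output is 16.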
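The rule quoted above amounts to min(size, packing value). A sketch of that, plus the same effect as /Zp4 requested for a single struct with #pragma pack(push, 4), a form MSVC, GCC, and Clang all accept; packed_member_align and Z4 are hypothetical names, and std::int64_t stands in for __int64.

```cpp
#include <cstddef>
#include <cstdint>

// A non-aggregate member is aligned to the smaller of its size and the
// current packing value.
constexpr std::size_t packed_member_align(std::size_t size, std::size_t pack) {
    return size < pack ? size : pack;
}

static_assert(packed_member_align(8, 8) == 8, "/Zp8, the x86 default: __int64 at offset 8");
static_assert(packed_member_align(8, 4) == 4, "/Zp4: __int64 at offset 4");

#pragma pack(push, 4)
struct Z4 {
    char s;
    std::int64_t i;
};
#pragma pack(pop)

static_assert(offsetof(Z4, i) == 4, "3 bytes of padding after s");
static_assert(sizeof(Z4) == 12, "same size as the /Zp4 build");
```

Unlike /Zp4, the pragma only affects structs declared between push and pop, so the rest of the program keeps the default ABI layout.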



However, you can specify a different packing size with /Zp option.

Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.




















































    A struct's alignment is the size of its largest member.



    That means if you have an 8-byte(64bit) member in the struct, then the struct will align to 8 bytes.



    In the case that you are describing, if the compiler allows the struct to align to 4 bytes, it possibly leads to an 8-byte member lying across the cache line boundary.






    Say we have a CPU that has a 16-byte cache line.
    Consider a struct like this:



    struct Z
    {
        char s;        // bytes 1-4 (1 byte + 3 bytes of padding)
        __int64 i;     // bytes 5-12
        __int64 i2;    // bytes 13-20: needs two cache-line fetches to read this variable
    };
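Whether a member straddles a cache line can be checked with integer division. A sketch assuming the struct starts on a line boundary and using the answer's hypothetical 16-byte line; offsets here are 0-based, unlike the 1-based byte comments in the struct. It also shows that the 8-byte-aligned layout avoids the split in this example.

```cpp
#include <cstddef>

// True if an object of `size` bytes starting at byte `offset` (0-based)
// spans two cache lines of `line` bytes.
constexpr bool crosses_line(std::size_t offset, std::size_t size, std::size_t line) {
    return offset / line != (offset + size - 1) / line;
}

// 4-byte alignment: i at bytes 4..11, i2 at bytes 12..19.
static_assert(!crosses_line(4, 8, 16), "i fits in the first 16-byte line");
static_assert(crosses_line(12, 8, 16), "i2 straddles the 16-byte boundary");

// 8-byte alignment: i at bytes 8..15, i2 at bytes 16..23; neither straddles.
static_assert(!crosses_line(8, 8, 16), "");
static_assert(!crosses_line(16, 8, 16), "");
```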




























      Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, but primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.

      – Peter Cordes
      Apr 30 at 19:08












    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );






    Shen Yuan is a new contributor. Be nice, and check out our Code of Conduct.









    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55920103%2fwhy-is-the-alignment-the-same-on-32-bit-and-64-bit-systems%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4














    Size and alignof() (minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.



    Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.



    MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)



    (An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)



    Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment2.




    So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)



    However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.




    This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)




    The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.



    In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.



    (I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack 1 or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)



    If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp ends up 8-byte aligned or not.




    Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices



    Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).



    This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible without extra instructions to align the stack.




    When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.



    The same goes for fild / fistp, which load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86 since Pentium; see Why is integer assignment on a naturally aligned variable atomic on x86?
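    If you want that atomicity guarantee in C++, use std::atomic<int64_t> rather than relying on a plain int64_t happening to be aligned. As far as I know, mainstream standard libraries give std::atomic<int64_t> 8-byte alignment even on 32-bit x86 for exactly this reason. Treat that as an assumption: the static_assert below checks it instead of trusting it. counter and bump are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// A plain int64_t may be only 4-byte aligned on 32-bit x86, so a 64-bit access
// to it could straddle a cache-line boundary and isn't guaranteed atomic.
// Assumption checked here: the library over-aligns std::atomic<int64_t> to 8.
static_assert(alignof(std::atomic<std::int64_t>) >= 8,
              "std::atomic<int64_t> is not naturally aligned on this target");

std::atomic<std::int64_t> counter{0};

// fetch_add returns the previous value; add `by` to report the new one.
std::int64_t bump(std::int64_t by) {
    return counter.fetch_add(by) + by;
}
```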




    Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)



    Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) expects to read it from.




    Footnote 2:



    GCC had this C++ alignof() bug, too, until it was fixed in 2018 for GCC 8, some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard, which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. In other words, using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.



    (It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16- or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)



    The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.



    This is roughly the opposite of 32-bit Windows with MSVC, where the struct-packing rules are compatible with alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.
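    A quick way to see which struct-packing rule you're getting is to print the offset of an int64_t member that follows a char. The offsets in the comments below are what I'd expect from the rules described above; the code doesn't guarantee them. The static_assert encodes the invariant the GCC bug report was about: a correct alignof(int64_t) has to divide the offset of every int64_t member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Z { char c; std::int64_t x; };

// Expected offsetof(Z, x): 8 with MSVC (x86 and x64) and x86-64 System V,
// 4 with i386 System V (e.g. GCC/Clang -m32 on Linux), where sizeof(Z) is 12.
// GCC's pre-8 C++ alignof reported 8 on i386 and would trip this assert.
static_assert(offsetof(Z, x) % alignof(std::int64_t) == 0,
              "alignof(int64_t) overstates the guaranteed alignment");

void print_layout() {
    std::printf("alignof(int64_t)=%zu offsetof(Z,x)=%zu sizeof(Z)=%zu\n",
                alignof(std::int64_t), offsetof(Z, x), sizeof(Z));
}
```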



    32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp; is not the same as int64_t tmp;: it emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.



    #include <stdint.h>

    void extfunc(int64_t *);   // defined elsewhere, so &tmp escapes

    void foo_align8(void) {
        alignas(int64_t) int64_t tmp;
        extfunc(&tmp);
    }



    (32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):



    _tmp$ = -8 ; size = 8
    void foo_align8(void) PROC ; foo_align8, COMDAT
    push ebp
    mov ebp, esp
    and esp, -8 ; fffffff8H align the stack
    sub esp, 8 ; and reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
    push eax ; pass the pointer as an arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 4
    mov esp, ebp
    pop ebp
    ret 0


    But without the alignas(), or with alignas(4), we get much simpler code:



    _tmp$ = -8 ; size = 8
    void foo_noalign(void) PROC ; foo_noalign, COMDAT
    sub esp, 8 ; reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
    push eax ; pass the pointer as a function arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 12 ; 0000000cH
    ret 0


    It could just push esp instead of LEA/push; that's a minor missed optimization.



    Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.



    If alignof(int64_t) were really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.



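    But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16. If the array starts 4 bytes past an 8-byte boundary, every element does too. All of them span an 8-byte boundary, so peeling off 1 element never reaches a 16-byte boundary.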
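    The prologue calculation such a function would do can be sketched as follows. It works on the integer address, so it never forms an under-aligned pointer; peel_count is a hypothetical helper. A real loop would use _mm_load_si128 in the aligned case. It would fall back to _mm_loadu_si128 or scalar code when peel_count returns -1, which is the case code assuming alignof(int64_t) == 8 never checks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// How many leading int64_t elements to handle one at a time before the address
// reaches a 16-byte boundary. Returns -1 if stepping by 8 bytes can never get
// there, e.g. when the array starts 4 bytes past an 8-byte boundary.
int peel_count(std::uintptr_t addr) {
    const std::uintptr_t mis = addr % 16;
    if (mis % sizeof(std::int64_t) != 0)
        return -1;            // addr % 16 is 4 or 12 (or odd): never aligned
    return mis == 0 ? 0 : 1;  // 0 or 1 element, as described above
}
```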




    BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.



    In MSVC, int64_t will be the same type as __int64.



    On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)



    I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.
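    The guarantees mentioned above for int64_t can be checked at compile time. This is a minimal sketch. The two constexpr flags only report which standard type int64_t is a typedef for on the current target. That is typically long on x86-64 Linux and long long on Windows, but it's implementation-specific and nothing to depend on.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>
#include <type_traits>

// What <cstdint> promises about int64_t, if the implementation provides it at all.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits");
static_assert(std::numeric_limits<std::int64_t>::digits == 63,
              "63 value bits plus a sign bit, no padding");
static_assert(std::numeric_limits<std::int64_t>::min() ==
                  -std::numeric_limits<std::int64_t>::max() - 1,
              "two's complement range");

// Which standard type it aliases varies by platform (typically long or long long).
constexpr bool int64_is_long = std::is_same<std::int64_t, long>::value;
constexpr bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
```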





























    • OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!

      – Shen Yuan
      2 days ago






    • @ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.

      – Peter Cordes
      2 days ago
















    4














    Size and alignof() (minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.



    Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.



    MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)



    (An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)



    Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment2.




    So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)



    However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.




    This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)




    The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.



    In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.



    (I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack 1 or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)



    If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp ends up 8-byte aligned or not.




    Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices



    Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).



    This is one reason why modern version of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.




    When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.



    Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?




    Footnote 1: An ABI also includes a calling convention, or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment-requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)



    Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&x); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.




    Footnote 2:



    GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.



    (It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)



    The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.



    Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.



    32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.



    void extfunc(int64_t *);

    void foo_align8(void)
    alignas(int64_t) int64_t tmp;
    extfunc(&tmp);



    (32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):



    _tmp$ = -8 ; size = 8
    void foo_align8(void) PROC ; foo_align8, COMDAT
    push ebp
    mov ebp, esp
    and esp, -8 ; fffffff8H align the stack
    sub esp, 8 ; and reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
    push eax ; pass the pointer as an arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 4
    mov esp, ebp
    pop ebp
    ret 0


    But without the alignas(), or with alignas(4), we get the much simpler



    _tmp$ = -8 ; size = 8
    void foo_noalign(void) PROC ; foo_noalign, COMDAT
    sub esp, 8 ; reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
    push eax ; pass the pointer as a function arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 12 ; 0000000cH
    ret 0


    It could just push esp instead of LEA/push; that's a minor missed optimization.



    Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.



    If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.



    But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.




    BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.



    In MSVC, int64_t will be the same type as __int64.



    On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)



    I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.






    share|improve this answer























    • OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!

      – Shen Yuan
      2 days ago






    • 1





      @ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.

      – Peter Cordes
      2 days ago














    4












    4








    4







    Size and alignof() (minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.



    Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.



    MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)



    (An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)



    Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment2.




    So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)



    However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.




    This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)




    The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.



    In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.



    (I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack 1 or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)



    If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp ends up 8-byte aligned or not.




    Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices



    Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).



    This is one reason why modern version of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.




    When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.



    Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?




    Footnote 1: An ABI also includes a calling convention, or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment-requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)



    Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&x); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.




    Footnote 2:



    GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.



    (It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)



    The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.



    Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.



    32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.



    void extfunc(int64_t *);

    void foo_align8(void)
    alignas(int64_t) int64_t tmp;
    extfunc(&tmp);



    (32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):



    _tmp$ = -8 ; size = 8
    void foo_align8(void) PROC ; foo_align8, COMDAT
    push ebp
    mov ebp, esp
    and esp, -8 ; fffffff8H align the stack
    sub esp, 8 ; and reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
    push eax ; pass the pointer as an arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 4
    mov esp, ebp
    pop ebp
    ret 0


    But without the alignas(), or with alignas(4), we get the much simpler



    _tmp$ = -8 ; size = 8
    void foo_noalign(void) PROC ; foo_noalign, COMDAT
    sub esp, 8 ; reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
    push eax ; pass the pointer as a function arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 12 ; 0000000cH
    ret 0


    It could just push esp instead of LEA/push; that's a minor missed optimization.



    Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.



    If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.



    But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.




    BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.



    In MSVC, int64_t will be the same type as __int64.



    On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)



    I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.






    share|improve this answer













    Size and alignof() (minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.



    Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.



    MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)



    (An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)



    Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment2.




    So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)



    However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.




    This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)




    The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.



    In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.



    (I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack 1 or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)



    If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's luck whether or not tmp ends up 8-byte aligned or not.




    Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8. These choices



    Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).



    This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.




    When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.



    Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?
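In C++ you don't normally depend on that hardware guarantee directly. std::atomic<std::int64_t> asks the implementation for whatever alignment and instructions make 64-bit loads and stores atomic. This applies even on targets where a plain int64_t might be only 4-byte aligned. In this sketch the names are illustrative. Whether the atomic is lock-free is implementation-defined, so the code only reports it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<std::int64_t> counter{0};

// Reports whether this implementation does 64-bit atomics without a lock.
// On x86 it normally is lock-free, but that's up to the implementation.
bool int64_atomics_lock_free() {
    return counter.is_lock_free();
}

// A single atomic store of a full 64-bit value: concurrent readers see
// either the old or the new value, never a torn mix of the two halves.
void publish(std::int64_t v) {
    counter.store(v, std::memory_order_release);
}

std::int64_t read_published() {
    return counter.load(std::memory_order_acquire);
}
```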




    Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)



    Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.




    Footnote 2:



    GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.



    (It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)
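A pointer might be less aligned than alignof(std::int64_t). In that case, a portable way to avoid the undefined behaviour is to copy the bytes with std::memcpy instead of dereferencing an std::int64_t*. With optimization enabled, compilers typically turn a fixed-size memcpy like this into a single unaligned load or store on x86. The helper names in this sketch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy

// Reads an int64_t from any address, aligned or not. memcpy has no
// alignment requirement, so this is well-defined even where
// *reinterpret_cast<const std::int64_t*>(p) would not be.
std::int64_t load_i64(const void* p) {
    std::int64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes an int64_t to any address, aligned or not.
void store_i64(void* p, std::int64_t v) {
    std::memcpy(p, &v, sizeof v);
}
```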



    The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.



    Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.



    32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.



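    #include <stdint.h>

    void extfunc(int64_t *);

    void foo_align8(void) {
        alignas(int64_t) int64_t tmp;   // 32-bit MSVC emits extra code to align ESP
        extfunc(&tmp);
    }

    void foo_noalign(void) { int64_t tmp; extfunc(&tmp); }   // no realignment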



    (32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):



    _tmp$ = -8 ; size = 8
    void foo_align8(void) PROC ; foo_align8, COMDAT
    push ebp
    mov ebp, esp
    and esp, -8 ; fffffff8H align the stack
    sub esp, 8 ; and reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; get a pointer to those 8 bytes
    push eax ; pass the pointer as an arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 4
    mov esp, ebp
    pop ebp
    ret 0


    But without the alignas(), or with alignas(4), we get the much simpler



    _tmp$ = -8 ; size = 8
    void foo_noalign(void) PROC ; foo_noalign, COMDAT
    sub esp, 8 ; reserve 8 bytes
    lea eax, DWORD PTR _tmp$[esp+8] ; "calculate" a pointer to it
    push eax ; pass the pointer as a function arg
    call void extfunc(__int64 *) ; extfunc
    add esp, 12 ; 0000000cH
    ret 0


    It could just push esp instead of LEA/push; that's a minor missed optimization.



    Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.



    If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.



    But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
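The arithmetic is easy to check. This sketch uses a hypothetical helper that works on integer addresses rather than real pointers. It counts how many 8-byte elements a loop would have to peel off before reaching a 16-byte boundary. Starting from an address that is 4 mod 8, such as 0x1004, it never reaches one, which is the situation described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNever = static_cast<std::size_t>(-1);

// How many elements of size `elem` must be skipped from address `addr`
// before an element starts on a multiple of `boundary` (a power of 2)?
// Returns kNever if stepping by `elem` can never land on the boundary.
std::size_t peel_count(std::uintptr_t addr, std::size_t elem, std::size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    // (addr + k*elem) % boundary repeats with a period of at most
    // `boundary` steps, so checking that many candidates is enough.
    for (std::size_t k = 0; k < boundary; ++k) {
        if ((addr + k * elem) % boundary == 0) return k;
    }
    return kNever;
}
```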




    BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.



    In MSVC, int64_t will be the same type as __int64.



    On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
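Those guarantees can be checked at compile time using only standard headers. In this sketch, kBig is just an example value.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int64_t, INT64_C
#include <limits>

// int64_t is optional, but where it exists it is exactly 64 bits wide,
// with no padding bits, and uses two's complement.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits");
static_assert(std::numeric_limits<std::int64_t>::digits == 63, "63 value bits + sign");
static_assert(std::numeric_limits<std::int64_t>::min() == -INT64_C(9223372036854775807) - 1,
              "two's complement range");

// INT64_C builds a constant of the matching type without compiler-specific suffixes.
constexpr std::int64_t kBig = INT64_C(1) << 40;
```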



    I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.

















    answered 2 days ago









    Peter Cordes













    • OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!

      – Shen Yuan
      2 days ago






    • @ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.

      – Peter Cordes
      2 days ago
































    The padding is not determined by the word size, but by the alignment of each data type.



    In most cases, the alignment requirement is equal to the type's size. So for a 64-bit type like int64 you will get 8-byte (64-bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.

    You may see a difference in padding between 32-bit and 64-bit builds when using built-in data types that have different sizes on the two architectures, for instance pointer types (int*).
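For example, a struct holding a pointer typically changes size between a 32-bit and a 64-bit build, because both the pointer's size and its alignment change. This is a minimal sketch. WithPtr is a made-up struct, and the 8 vs. 16 figures assume typical 32-bit and 64-bit x86 targets.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

struct WithPtr {
    char c;
    int* p;   // usually 4 bytes (4-aligned) on 32-bit, 8 bytes (8-aligned) on 64-bit
};

// Typically 8 on 32-bit x86 and 16 on x86-64.
constexpr std::size_t kWithPtrSize = sizeof(WithPtr);
```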














    edited Apr 30 at 11:49

























    answered Apr 30 at 11:43









    ComicSansMS








    • Default alignment is determined by word size, however. The reason being that words are addressed in memory so that they fit into registers perfectly. On x86(_64), unaligned data requires a shift operation to work with it. On other platforms like Sun SPARC, unaligned data will cause a bus exception. If you want to remove padding, try adding __attribute__((packed)) (GCC) to the struct definition.

      – Nefrin
      Apr 30 at 12:20






    • @Nefrin Do you have a reference for that? I am not aware of any such behavior for the C or C++ built-in datatypes.

      – ComicSansMS
      Apr 30 at 12:22






    • Also this is not C/C++ Language behaviour but rather compiler behaviour

      – Nefrin
      Apr 30 at 12:50






    • @Nefrin: x86-64 has efficient unaligned loads that handle the required shift in hardware. Yes it's normal for types wider than an integer register to only get aligned to the register width in some ABIs (like i386 System V, and 32-bit Windows). But for x86-64 System V, alignof(__int128) = 16 so it can be copied with SSE vectors, or for lock cmpxchg16b. But as far as the C++ standard is concerned, that's all up to the implementation. And the struct-packing rules are allowed to be different from what you'd expect based on just alignof(member), as P.W's answer shows is the case for MSVC.

      – Peter Cordes
      Apr 30 at 19:58







    • @ComicSansMS: alignof(int64_t) == 8 in 32-bit MSVC, but it doesn't actually bother to ensure that for locals on the stack so that's not really the minimum alignment necessary for any int64_t object. If you use alignas(8) int64_t tmp;, you get extra instructions to align the stack pointer which you don't get with just int64_t tmp. godbolt.org/z/lsuXAQ. The struct-packing rules are allowed to be more complicated than just padding to alignof(T) relative to the start of the struct, as @P.W's answer shows.

      – Peter Cordes
      Apr 30 at 20:13



























    This is a matter of the alignment requirement of the data type, as specified in Padding and Alignment of Structure Members:




    Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).




    And the default value for structure member alignment is specified in /Zp (Struct Member Alignment)




    The available packing values are described in the following table:



    /Zp argument   Effect
    1              Packs structures on 1-byte boundaries. Same as /Zp.
    2              Packs structures on 2-byte boundaries.
    4              Packs structures on 4-byte boundaries.
    8              Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
    16             Packs structures on 16-byte boundaries (default for x64).




    Since the default for x86 is /Zp8 which is 8 bytes, the output is 16.



    However, you can specify a different packing size with the /Zp option.

    Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.
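The quoted rule can be modelled in a few lines. This is a simplified sketch, not MSVC's actual implementation. It handles only primitive (non-aggregate) members and ignores alignas and empty structs. Each member is aligned to min(sizeof(member), pack), and the total is rounded up to the largest alignment used. packed_struct_size is a hypothetical name.

```cpp
#include <algorithm>  // std::min, std::max
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of /Zp packing for structs of primitive members.
// member_sizes: sizeof each member, in declaration order.
// pack: the /Zp value (1, 2, 4, 8 or 16).
std::size_t packed_struct_size(const std::vector<std::size_t>& member_sizes,
                               std::size_t pack) {
    std::size_t offset = 0;
    std::size_t struct_align = 1;
    for (std::size_t size : member_sizes) {
        assert(size != 0);
        std::size_t align = std::min(size, pack);
        offset = (offset + align - 1) / align * align;  // pad up to the member's alignment
        offset += size;
        struct_align = std::max(struct_align, align);
    }
    // Round the total up so every element of an array of the struct stays aligned.
    return (offset + struct_align - 1) / struct_align * struct_align;
}
```

For a char followed by an __int64, packed_struct_size({1, 8}, 8) gives 16 and packed_struct_size({1, 8}, 4) gives 12, matching the /Zp8 and /Zp4 results above.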














        edited Apr 30 at 12:28

























        answered Apr 30 at 11:57









        P.W















            A struct's alignment is the size of its largest member.



            That means if you have an 8-byte (64-bit) member in the struct, then the struct will align to 8 bytes.



            In the case that you are describing, if the compiler allows the struct to align to 4 bytes, it possibly leads to an 8-byte member lying across the cache line boundary.






            Say we have a CPU that has a 16-byte cache line.
            Consider a struct like this:



            struct Z {
                // byte positions (1-based) if __int64 were only 4-byte aligned
                char s;      // 1-4 byte
                __int64 i;   // 5-12 byte
                __int64 i2;  // 13-20 byte, need two cache line fetches to read this variable
            };
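The cache-line argument can be checked with a little arithmetic. This sketch uses the answer's hypothetical 16-byte line; real x86 CPUs use 64-byte lines. It also uses 0-based offsets, so bytes 13-20 above become offsets 12-19. splits_cache_line is a made-up name.

```cpp
#include <cassert>
#include <cstddef>

// True if the `size` bytes starting at `offset` touch two different cache
// lines of `line` bytes (assuming the struct itself starts on a line boundary).
bool splits_cache_line(std::size_t offset, std::size_t size, std::size_t line) {
    assert(size != 0 && line != 0);
    return offset / line != (offset + size - 1) / line;
}
```

With these assumptions, splits_cache_line(12, 8, 16) is true, while an 8-byte-aligned member at offset 8 gives false.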























            • Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, but primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.

              – Peter Cordes
              Apr 30 at 19:08


























            answered Apr 30 at 15:02









            mituba

            255




            • 3





              Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, but primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.

              – Peter Cordes
              Apr 30 at 19:08












