What is the most efficient way to write 'for' loops in Matlab?What are the advantages and disadvantages of the particle decomposition and domain decomposition parallelization algorithms?Trying to implement a simple/efficient combinations function in MATLABPicking grain size or efficiently solving nonlinear equation via another way in MATLABWindows C++ library for operations on mesh with a mex-interface in Matlab?For loops in matlab with a specific starting pointCurve Fitting problem MATLABFast computation on 2D rectangular cartesian grid using matlabMATLAB Matrix Multiply EfficiencyAccelerate computation speed by using a different syntaxHow to make this parfor loop work properly in Matlab?

Are certificates without DNS fundamentally flawed?

Probably terminated or laid off soon; confront or not?

Making pause in a diagram

I was contacted by a private bank overseas to get my inheritance

How to draw a flow chart?

Can I enter a rental property without giving notice if I'm afraid a tenant may be hurt?

What is a Casino Word™?

Our group keeps dying during the Lost Mine of Phandelver campaign. What are we doing wrong?

Where to pee in London?

How can I refer to something in a book?

Purchased new computer from DELL with pre-installed Ubuntu. Won't boot. Should assume its an error from DELL?

Should I take out a personal loan to pay off credit card debt?

Does the United States guarantee any unique freedoms?

What are the examples (applications) of the MIPs in which the objective function has nonzero coefficients for only continuous variables?

Did Captain America make out with his niece?

Print only the last three columns from file

Deadlock Priority High Chosen as deadlock victim

Does a 4 bladed prop have almost twice the thrust of a 2 bladed prop?

Does bottle color affect mold growth?

How to check a file was encrypted (really & correctly)

"How do you solve a problem like Maria?"

Differentiability of operator norm

Is it true that control+alt+delete only became a thing because IBM would not build Bill Gates a computer with a task manager button?

Game schedule where each player meets only once



What is the most efficient way to write 'for' loops in Matlab?


What are the advantages and disadvantages of the particle decomposition and domain decomposition parallelization algorithms?Trying to implement a simple/efficient combinations function in MATLABPicking grain size or efficiently solving nonlinear equation via another way in MATLABWindows C++ library for operations on mesh with a mex-interface in Matlab?For loops in matlab with a specific starting pointCurve Fitting problem MATLABFast computation on 2D rectangular cartesian grid using matlabMATLAB Matrix Multiply EfficiencyAccelerate computation speed by using a different syntaxHow to make this parfor loop work properly in Matlab?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








11












$begingroup$


I have read that if, for example, I have a double for loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:



a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end


What is the most efficient way to code it if I have three or more for loops?



For example:



a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end









share|cite|improve this question











$endgroup$









  • 4




    $begingroup$
    For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:33






  • 3




    $begingroup$
    @PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
    $endgroup$
    – TensoR
    Jul 28 at 17:56






  • 3




    $begingroup$
    @PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
    $endgroup$
    – leftaroundabout
    Jul 29 at 9:33







  • 5




    $begingroup$
    The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
    $endgroup$
    – ohreally
    Jul 29 at 10:58










  • $begingroup$
    @leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
    $endgroup$
    – Graham
    Jul 30 at 9:04

















11












$begingroup$


I have read that if, for example, I have a double for loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:



a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end


What is the most efficient way to code it if I have three or more for loops?



For example:



a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end









share|cite|improve this question











$endgroup$









  • 4




    $begingroup$
    For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:33






  • 3




    $begingroup$
    @PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
    $endgroup$
    – TensoR
    Jul 28 at 17:56






  • 3




    $begingroup$
    @PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
    $endgroup$
    – leftaroundabout
    Jul 29 at 9:33







  • 5




    $begingroup$
    The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
    $endgroup$
    – ohreally
    Jul 29 at 10:58










  • $begingroup$
    @leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
    $endgroup$
    – Graham
    Jul 30 at 9:04













11












11








11


4



$begingroup$


I have read that if, for example, I have a double for loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:



a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end


What is the most efficient way to code it if I have three or more for loops?



For example:



a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end









share|cite|improve this question











$endgroup$




I have read that if, for example, I have a double for loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:



a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end


What is the most efficient way to code it if I have three or more for loops?



For example:



a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end






matlab efficiency






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Jul 28 at 17:45









Anton Menshov

5,6333 gold badges22 silver badges80 bronze badges




5,6333 gold badges22 silver badges80 bronze badges










asked Jul 28 at 4:56









TensoRTensoR

818 bronze badges




818 bronze badges










  • 4




    $begingroup$
    For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:33






  • 3




    $begingroup$
    @PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
    $endgroup$
    – TensoR
    Jul 28 at 17:56






  • 3




    $begingroup$
    @PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
    $endgroup$
    – leftaroundabout
    Jul 29 at 9:33







  • 5




    $begingroup$
    The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
    $endgroup$
    – ohreally
    Jul 29 at 10:58










  • $begingroup$
    @leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
    $endgroup$
    – Graham
    Jul 30 at 9:04












  • 4




    $begingroup$
    For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:33






  • 3




    $begingroup$
    @PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
    $endgroup$
    – TensoR
    Jul 28 at 17:56






  • 3




    $begingroup$
    @PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
    $endgroup$
    – leftaroundabout
    Jul 29 at 9:33







  • 5




    $begingroup$
    The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
    $endgroup$
    – ohreally
    Jul 29 at 10:58










  • $begingroup$
    @leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
    $endgroup$
    – Graham
    Jul 30 at 9:04







4




4




$begingroup$
For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
$endgroup$
– Peter Mortensen
Jul 28 at 16:33




$begingroup$
For loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
$endgroup$
– Peter Mortensen
Jul 28 at 16:33




3




3




$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56




$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56




3




3




$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33





$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33





5




5




$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58




$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58












$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04




$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04










2 Answers
2






active

oldest

votes


















17












$begingroup$

Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.






share|cite|improve this answer









$endgroup$










  • 2




    $begingroup$
    Or use the built-in function ones().
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:28






  • 4




    $begingroup$
    @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
    $endgroup$
    – Matt
    Jul 28 at 17:09










  • $begingroup$
    @Matt You are correct.
    $endgroup$
    – TensoR
    Jul 28 at 17:57


















11












$begingroup$

A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.



First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order



A(1,1,1)



A(2,1,1)



A(1,2,1)



A(2,2,1)



A(1,3,1)



A(2,3,1)



A(1,1,2)



A(2,1,2)



...



A(2,3,10)



This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.



The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)



Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.



If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.



If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.



Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.



Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.






share|cite|improve this answer









$endgroup$

















    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "363"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fscicomp.stackexchange.com%2fquestions%2f33129%2fwhat-is-the-most-efficient-way-to-write-for-loops-in-matlab%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    17












    $begingroup$

    Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.






    share|cite|improve this answer









    $endgroup$










    • 2




      $begingroup$
      Or use the built-in function ones().
      $endgroup$
      – Peter Mortensen
      Jul 28 at 16:28






    • 4




      $begingroup$
      @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
      $endgroup$
      – Matt
      Jul 28 at 17:09










    • $begingroup$
      @Matt You are correct.
      $endgroup$
      – TensoR
      Jul 28 at 17:57















    17












    $begingroup$

    Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.






    share|cite|improve this answer









    $endgroup$










    • 2




      $begingroup$
      Or use the built-in function ones().
      $endgroup$
      – Peter Mortensen
      Jul 28 at 16:28






    • 4




      $begingroup$
      @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
      $endgroup$
      – Matt
      Jul 28 at 17:09










    • $begingroup$
      @Matt You are correct.
      $endgroup$
      – TensoR
      Jul 28 at 17:57













    17












    17








    17





    $begingroup$

    Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.






    share|cite|improve this answer









    $endgroup$



    Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.







    share|cite|improve this answer












    share|cite|improve this answer



    share|cite|improve this answer










    answered Jul 28 at 7:16









    whpowell96whpowell96

    3311 silver badge3 bronze badges




    3311 silver badge3 bronze badges










    • 2




      $begingroup$
      Or use the built-in function ones().
      $endgroup$
      – Peter Mortensen
      Jul 28 at 16:28






    • 4




      $begingroup$
      @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
      $endgroup$
      – Matt
      Jul 28 at 17:09










    • $begingroup$
      @Matt You are correct.
      $endgroup$
      – TensoR
      Jul 28 at 17:57












    • 2




      $begingroup$
      Or use the built-in function ones().
      $endgroup$
      – Peter Mortensen
      Jul 28 at 16:28






    • 4




      $begingroup$
      @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
      $endgroup$
      – Matt
      Jul 28 at 17:09










    • $begingroup$
      @Matt You are correct.
      $endgroup$
      – TensoR
      Jul 28 at 17:57







    2




    2




    $begingroup$
    Or use the built-in function ones().
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:28




    $begingroup$
    Or use the built-in function ones().
    $endgroup$
    – Peter Mortensen
    Jul 28 at 16:28




    4




    4




    $begingroup$
    @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
    $endgroup$
    – Matt
    Jul 28 at 17:09




    $begingroup$
    @Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
    $endgroup$
    – Matt
    Jul 28 at 17:09












    $begingroup$
    @Matt You are correct.
    $endgroup$
    – TensoR
    Jul 28 at 17:57




    $begingroup$
    @Matt You are correct.
    $endgroup$
    – TensoR
    Jul 28 at 17:57













    11












    $begingroup$

    A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.



    First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order



    A(1,1,1)



    A(2,1,1)



    A(1,2,1)



    A(2,2,1)



    A(1,3,1)



    A(2,3,1)



    A(1,1,2)



    A(2,1,2)



    ...



    A(2,3,10)



    This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.



    The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)



    Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.



    If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.



    If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.



    Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.



    Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.






    share|cite|improve this answer









    $endgroup$



















      11












      $begingroup$

      A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.



      First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order



      A(1,1,1)



      A(2,1,1)



      A(1,2,1)



      A(2,2,1)



      A(1,3,1)



      A(2,3,1)



      A(1,1,2)



      A(2,1,2)



      ...



      A(2,3,10)



      This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.



      The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)



      Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.



      If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.



      If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.



      Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.



      Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.






      share|cite|improve this answer









      $endgroup$

















        11












        11








        11





        $begingroup$

        A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.



        First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order



        A(1,1,1)



        A(2,1,1)



        A(1,2,1)



        A(2,2,1)



        A(1,3,1)



        A(2,3,1)



        A(1,1,2)



        A(2,1,2)



        ...



        A(2,3,10)



        This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.



        The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)



        Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.



        If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.



        If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.



        Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.



        Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.






        share|cite|improve this answer









        $endgroup$



        A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.



        First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order



        A(1,1,1)



        A(2,1,1)



        A(1,2,1)



        A(2,2,1)



        A(1,3,1)



        A(2,3,1)



        A(1,1,2)



        A(2,1,2)



        ...



        A(2,3,10)



        This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.



        The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)



        Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.



        If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.



        If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.



        Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.



        Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered Jul 28 at 19:05









        Brian BorchersBrian Borchers

        13.8k1 gold badge22 silver badges53 bronze badges




        13.8k1 gold badge22 silver badges53 bronze badges






























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Computational Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fscicomp.stackexchange.com%2fquestions%2f33129%2fwhat-is-the-most-efficient-way-to-write-for-loops-in-matlab%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Category:9 (number) SubcategoriesMedia in category "9 (number)"Navigation menuUpload mediaGND ID: 4485639-8Library of Congress authority ID: sh85091979ReasonatorScholiaStatistics

            Circuit construction for execution of conditional statements using least significant bitHow are two different registers being used as “control”?How exactly is the stated composite state of the two registers being produced using the $R_zz$ controlled rotations?Efficiently performing controlled rotations in HHLWould this quantum algorithm implementation work?How to prepare a superposed states of odd integers from $1$ to $sqrtN$?Why is this implementation of the order finding algorithm not working?Circuit construction for Hamiltonian simulationHow can I invert the least significant bit of a certain term of a superposed state?Implementing an oracleImplementing a controlled sum operation

            Magento 2 “No Payment Methods” in Admin New OrderHow to integrate Paypal Express Checkout with the Magento APIMagento 1.5 - Sales > Order > edit order and shipping methods disappearAuto Invoice Check/Money Order Payment methodAdd more simple payment methods?Shipping methods not showingWhat should I do to change payment methods if changing the configuration has no effects?1.9 - No Payment Methods showing upMy Payment Methods not Showing for downloadable/virtual product when checkout?Magento2 API to access internal payment methodHow to call an existing payment methods in the registration form?