What is the most efficient way to write 'for' loops in Matlab?What are the advantages and disadvantages of the particle decomposition and domain decomposition parallelization algorithms?Trying to implement a simple/efficient combinations function in MATLABPicking grain size or efficiently solving nonlinear equation via another way in MATLABWindows C++ library for operations on mesh with a mex-interface in Matlab?For loops in matlab with a specific starting pointCurve Fitting problem MATLABFast computation on 2D rectangular cartesian grid using matlabMATLAB Matrix Multiply EfficiencyAccelerate computation speed by using a different syntaxHow to make this parfor loop work properly in Matlab?
Are certificates without DNS fundamentally flawed?
Probably terminated or laid off soon; confront or not?
Making pause in a diagram
I was contacted by a private bank overseas to get my inheritance
How to draw a flow chart?
Can I enter a rental property without giving notice if I'm afraid a tenant may be hurt?
What is a Casino Word™?
Our group keeps dying during the Lost Mine of Phandelver campaign. What are we doing wrong?
Where to pee in London?
How can I refer to something in a book?
Purchased new computer from DELL with pre-installed Ubuntu. Won't boot. Should assume its an error from DELL?
Should I take out a personal loan to pay off credit card debt?
Does the United States guarantee any unique freedoms?
What are the examples (applications) of the MIPs in which the objective function has nonzero coefficients for only continuous variables?
Did Captain America make out with his niece?
Print only the last three columns from file
Deadlock Priority High Chosen as deadlock victim
Does a 4 bladed prop have almost twice the thrust of a 2 bladed prop?
Does bottle color affect mold growth?
How to check a file was encrypted (really & correctly)
"How do you solve a problem like Maria?"
Differentiability of operator norm
Is it true that control+alt+delete only became a thing because IBM would not build Bill Gates a computer with a task manager button?
Game schedule where each player meets only once
What is the most efficient way to write 'for' loops in Matlab?
What are the advantages and disadvantages of the particle decomposition and domain decomposition parallelization algorithms?Trying to implement a simple/efficient combinations function in MATLABPicking grain size or efficiently solving nonlinear equation via another way in MATLABWindows C++ library for operations on mesh with a mex-interface in Matlab?For loops in matlab with a specific starting pointCurve Fitting problem MATLABFast computation on 2D rectangular cartesian grid using matlabMATLAB Matrix Multiply EfficiencyAccelerate computation speed by using a different syntaxHow to make this parfor loop work properly in Matlab?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
I have read that if, for example, I have a double for
loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:
a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end
What is the most efficient way to code it if I have three or more for
loops?
For example:
a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end
matlab efficiency
$endgroup$
|
show 3 more comments
$begingroup$
I have read that if, for example, I have a double for
loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:
a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end
What is the most efficient way to code it if I have three or more for
loops?
For example:
a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end
matlab efficiency
$endgroup$
4
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
$endgroup$
– Peter Mortensen
Jul 28 at 16:33
3
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
3
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
5
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04
|
show 3 more comments
$begingroup$
I have read that if, for example, I have a double for
loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:
a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end
What is the most efficient way to code it if I have three or more for
loops?
For example:
a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end
matlab efficiency
$endgroup$
I have read that if, for example, I have a double for
loop that runs over the indexes of a matrix, then putting the column running index in the outer loop is more efficient. For example:
a=zeros(1000);
for j=1:1000
for i=1:1000
a(i,j)=1;
end
end
What is the most efficient way to code it if I have three or more for
loops?
For example:
a=zeros(100,100,100);
for j=1:100
for i=1:100
for k=1:100
a(i,j,k)=1;
end
end
end
matlab efficiency
matlab efficiency
edited Jul 28 at 17:45
Anton Menshov♦
5,6333 gold badges22 silver badges80 bronze badges
5,6333 gold badges22 silver badges80 bronze badges
asked Jul 28 at 4:56
TensoRTensoR
818 bronze badges
818 bronze badges
4
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
$endgroup$
– Peter Mortensen
Jul 28 at 16:33
3
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
3
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
5
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04
|
show 3 more comments
4
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).
$endgroup$
– Peter Mortensen
Jul 28 at 16:33
3
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
3
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
5
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04
4
4
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).$endgroup$
– Peter Mortensen
Jul 28 at 16:33
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).$endgroup$
– Peter Mortensen
Jul 28 at 16:33
3
3
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
3
3
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
5
5
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04
|
show 3 more comments
2 Answers
2
active
oldest
votes
$begingroup$
Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.
$endgroup$
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
add a comment |
$begingroup$
A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.
First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order
A(1,1,1)
A(2,1,1)
A(1,2,1)
A(2,2,1)
A(1,3,1)
A(2,3,1)
A(1,1,2)
A(2,1,2)
...
A(2,3,10)
This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.
The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)
Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.
If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.
If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.
Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.
Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.
$endgroup$
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "363"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fscicomp.stackexchange.com%2fquestions%2f33129%2fwhat-is-the-most-efficient-way-to-write-for-loops-in-matlab%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.
$endgroup$
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
add a comment |
$begingroup$
Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.
$endgroup$
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
add a comment |
$begingroup$
Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.
$endgroup$
Short answer, you want to have the leftmost index on the innermost loop. In your example, the loop indices would go k, j, i and the array indices would be i, j, k. This has to do with how MATLAB stores the different dimensions in memory. For more, see #13 of this reddit post.
answered Jul 28 at 7:16
whpowell96whpowell96
3311 silver badge3 bronze badges
3311 silver badge3 bronze badges
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
add a comment |
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
2
2
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
$begingroup$
Or use the built-in function ones().
$endgroup$
– Peter Mortensen
Jul 28 at 16:28
4
4
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Peter OP's example is almost certainly just a toy example of a for loop that does something and not the actual use case.
$endgroup$
– Matt
Jul 28 at 17:09
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
$begingroup$
@Matt You are correct.
$endgroup$
– TensoR
Jul 28 at 17:57
add a comment |
$begingroup$
A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.
First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order
A(1,1,1)
A(2,1,1)
A(1,2,1)
A(2,2,1)
A(1,3,1)
A(2,3,1)
A(1,1,2)
A(2,1,2)
...
A(2,3,10)
This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.
The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)
Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.
If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.
If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.
Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.
Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.
$endgroup$
add a comment |
$begingroup$
A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.
First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order
A(1,1,1)
A(2,1,1)
A(1,2,1)
A(2,2,1)
A(1,3,1)
A(2,3,1)
A(1,1,2)
A(2,1,2)
...
A(2,3,10)
This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.
The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)
Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.
If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.
If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.
Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.
Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.
$endgroup$
add a comment |
$begingroup$
A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.
First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order
A(1,1,1)
A(2,1,1)
A(1,2,1)
A(2,2,1)
A(1,3,1)
A(2,3,1)
A(1,1,2)
A(2,1,2)
...
A(2,3,10)
This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.
The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)
Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.
If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.
If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.
Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.
Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.
$endgroup$
A somewhat longer answer that explains why it's more efficient to have the left most index varying most rapidly. There are two key things that you need to understand.
First, MATLAB (and Fortran, but not C and most other programming languages) stores arrays in memory in "column major order." e.g. if A is a 2 by 3 by 10 matrix, then the entries will be stored in memory in the order
A(1,1,1)
A(2,1,1)
A(1,2,1)
A(2,2,1)
A(1,3,1)
A(2,3,1)
A(1,1,2)
A(2,1,2)
...
A(2,3,10)
This choice of column major order is arbitrary- we could just easily adopt a "row major order" convention, and in fact that's what is done in C and some other programming languages.
The second important thing that you need to understand is that modern processors don't access memory one location at a time, but rather load and store "cache lines" of 64 or even 128 contiguous bytes (8 or 16 double precision floating point numbers) at a time from memory. These chunks of data are temporarily stored in a fast memory cache and written back out as needed. (In practice the cache architecture is now quite complicated with as many as 3 or 4 levels of cache memory, but the basic idea can be explained with a one-level cache of the sort that computers had in my younger days.)
Now, suppose that $A$ is an array with 10,000 rows and columns, and I'm looping over all of the entries.
If the loops are nested so that the innermost loop updates the row subscript, then the array entries will be accessed in the order A(1,1), A(2,1), A(3,1), ... When the first entry A(1,1) is accessed, the system will bring a cache line containing A(1,1), A(2,1), ..., A(8,1) into the cache from main memory. The next 8 iterations of the innermost loop work on this data without any additional main memory transfers.
If in the alternative, we structure the loops so that the column index varies in the innermost loop, then the entries of A would be accessed in the order A(1,1), A(1,2), A(1,3), ... In this case, the first access would bring A(1,1), A(2,1), ..., A(8,1) into the cache from main memory, but 7/8 of these entries wouldn't be used. The access to A(1,2) in the second iteration would then bring another 8 entries in from main memory, and so on. By the time the code got around to working on row 2 of the matrix, the A(2,1) entry might well be flushed out of the cache to make way for other needed data. As a result, the code is generating 8 times as much traffic as necessary.
Some optimizing compilers are capable of automatically restructuring loops to avoid this problem.
Many numerical linear algebra algorithms for matrix multiplication and factorization can be optimized to work efficiently with the row-major or column-major ordering scheme depending on the programming language. Doing this the wrong way can have a significant negative impact on performance.
answered Jul 28 at 19:05
Brian BorchersBrian Borchers
13.8k1 gold badge22 silver badges53 bronze badges
13.8k1 gold badge22 silver badges53 bronze badges
add a comment |
add a comment |
Thanks for contributing an answer to Computational Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fscicomp.stackexchange.com%2fquestions%2f33129%2fwhat-is-the-most-efficient-way-to-write-for-loops-in-matlab%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
$begingroup$
For
loops are very slow in MATLAB. You should avoid explicit loops in MATLAB whenever possible. Instead, usually a problem can expressed in terms of matrix/vector operations. That is the MATLABic way. There are also a lot of built-in functions to initialise matrices, etc. For instance, there is a function, ones(), that will set all elements of a matrix to 1 (by extension, to any value by multiplication (a scalar multiplied by the all-ones matrix)). It also works on 3-D arrays (which I think covers the example here).$endgroup$
– Peter Mortensen
Jul 28 at 16:33
3
$begingroup$
@PeterMortensen By what factor (roughly) the efficiency of loops in Matlab is smaller compared to C and Python? And why is that? Also, havn't the efficiency of loops in Matlab gotten better in the last several years?
$endgroup$
– TensoR
Jul 28 at 17:56
3
$begingroup$
@PeterMortensen “usually a problem can be expressed in terms of matrix/vector operations” – for certain values of “usually”, yes. IMO it's more accurate to say that people working in Matlab and the like have a decades-long culture of ignoring all the things that can't be done with matrix/vector operations, so much that everything looks like a nail to them for that hammer. And we shouldn't just say “for loops in Matlab are slow” but “Matlab is slow” (it only happens to be linked to a fast library of LA primitives written in C and Fortran).
$endgroup$
– leftaroundabout
Jul 29 at 9:33
5
$begingroup$
The performance of for loops is controversial: matlabtips.com/matlab-is-no-longer-slow-at-for-loops
$endgroup$
– ohreally
Jul 29 at 10:58
$begingroup$
@leftaroundabout True. Being concerned about speed in an interpreted (or semi-interpreted) language is a pretty clear indication you have an X-Y problem where the actual solution is "don't use this language". The exception of course is if you're using code generation in Simulink, but then the question is what C the code generator builds and how efficient that is.
$endgroup$
– Graham
Jul 30 at 9:04