Huge performance difference of the command find with and without using %M option to show permissions

On my CentOS 7.6 system, I created a directory (called many_files) containing 3,000,000 files by running:



for i in {1..3000000}; do echo $i>$i; done


I am using find to write information about the files in this directory into a file. This works surprisingly fast:



$ time find many_files -printf '%i %y %p\n' >info_file

real 0m6.970s
user 0m3.812s
sys 0m0.904s


Now if I add %M to get the permissions:



$ time find many_files -printf '%i %y %M %p\n' >info_file

real 2m30.677s
user 0m5.148s
sys 0m37.338s


The command takes much longer. This is very surprising to me, since in a C program we can use struct stat to get both the inode number and the permissions of a file, and in the kernel the struct inode stores both pieces of information.
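To illustrate the point about struct stat, here is a minimal Python sketch (not part of the original measurements). It assumes a Linux system; os.lstat wraps the lstat(2) call, and the function name inode_and_mode and the sample file are made up for illustration. A single stat call returns both the inode number and the mode:

```python
import os
import stat
import tempfile


def inode_and_mode(path):
    """Return (inode number, ls-style permission string) from one lstat call."""
    st = os.lstat(path)  # a single lstat(2) call fills in the whole stat result
    return st.st_ino, stat.filemode(st.st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        sample = os.path.join(demo_dir, "sample")  # hypothetical file name
        with open(sample, "w") as f:
            f.write("1\n")
        os.chmod(sample, 0o644)
        ino, perms = inode_and_mode(sample)
        print(ino, perms)
```

So once a file has been stat'ed, the permissions come for free. The open question is whether find needs to stat the file at all.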



My Questions:



  1. What causes this behavior?

  2. Is there a faster way to get file permissions for so many files?









  • The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.

    – Kusalananda
    2 days ago












  • @Kusalananda, Why is it wrong to ask that? If you're faced with an unexpected 20x slowdown, then surely you want to know if it can be avoided? find -perm will still need to look at the permissions, even if not output them, so would using it affect the slowdown in any way?

    – ilkkachu
    yesterday











  • @ilkkachu You are correct. I assumed that the slowdown was due to the extra data produced, just like 0xSheepdog initially thought (which seems to not be the case). I would still not want to get the permissions as text like that if the intention is to process the files based on the permissions though.

    – Kusalananda
    yesterday

















linux files permissions find performance






edited 2 days ago by Jeff Schaller

asked 2 days ago by Bahram












1 Answer














The first version only needs to readdir(3)/getdents(2) the directory, provided the filesystem stores the file type in its directory entries (ext4: the filetype feature, shown by tune2fs -l /dev/xxx; xfs: ftype=1, shown by xfs_info /mount/point ...).



The second version additionally has to stat(2) each file. That requires an extra inode lookup per file, and thus more seeks on the filesystem and device, which can be considerably slower on a rotating disk when the data is not already cached. This stat is not required when only the name, inode number and file type are requested, because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
};




The same information is available through readdir(3):




struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
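As a rough illustration of listing from directory entries only, here is a Python sketch using os.scandir, which exposes the same d_ino/d_type data. It assumes Linux and a filesystem that fills in d_type; the function name list_without_stat and the demo directory are made up for illustration, and this is not how find itself is implemented:

```python
import os
import tempfile


def list_without_stat(directory):
    """Yield (inode, type letter, path) using only directory-entry data."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # On Linux these checks are answered from d_type when the
            # filesystem provides it; if d_type is unknown, Python falls
            # back to a stat call behind the scenes.
            if entry.is_symlink():
                kind = "l"
            elif entry.is_dir(follow_symlinks=False):
                kind = "d"
            elif entry.is_file(follow_symlinks=False):
                kind = "f"
            else:
                kind = "?"
            # inode() returns d_ino from the directory entry on Unix.
            yield entry.inode(), kind, entry.path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:  # hypothetical small sample
        for i in range(1, 4):
            with open(os.path.join(demo_dir, str(i)), "w") as f:
                f.write(f"{i}\n")
        for ino, kind, path in list_without_stat(demo_dir):
            print(ino, kind, path)
```

This corresponds roughly to the '%i %y %p\n' format: nothing requested here needs more than the directory entry.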



I suspected this and confirmed it by comparing (on a smaller sample...) the strace output of the two commands:



strace -o v1 find many_files -printf '%i %y %p\n' >info_file
strace -o v2 find many_files -printf '%i %y %M %p\n' >info_file


On my Linux amd64 system (kernel 5.0.x), the main difference between the two traces is:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0


[...]
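The per-file stat seen in the second trace can be mimicked with a short Python sketch. Again this assumes Linux; the function name list_with_mode and the demo directory are made up for illustration, and it only approximates what find does for %M. The permission bits are not in the directory entry, so each entry costs one extra stat-family system call:

```python
import os
import stat
import tempfile


def list_with_mode(directory):
    """Yield (inode, permission string, path); needs one stat call per entry."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # The mode bits are not part of the directory entry, so on Unix
            # this performs one stat-family system call for every file.
            st = entry.stat(follow_symlinks=False)
            yield st.st_ino, stat.filemode(st.st_mode), entry.path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:  # hypothetical small sample
        for i in range(1, 4):
            with open(os.path.join(demo_dir, str(i)), "w") as f:
                f.write(f"{i}\n")
        for ino, perms, path in list_with_mode(demo_dir):
            print(ino, perms, path)
```

Running such a script under strace should show one stat-type call per file, much like the newfstatat lines above, although the exact system call may differ between Python versions.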






  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    2 days ago












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    2 days ago












  • Hum actually xfs (CentOS' default) support isn't quite clear...

    – A.B
    2 days ago











  • added how to check if the filetype feature is present on xfs, in case xfs is in use.

    – A.B
    2 days ago











  • I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    2 days ago











1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









10














The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer

























  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    2 days ago












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    2 days ago












  • Hum actually xfs (CentOS' default) support isn't quite clear...

    – A.B
    2 days ago











  • added how to check if the filetype feature is present on xfs, in case xfs is in use.

    – A.B
    2 days ago











  • I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    2 days ago















10














The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer

























  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    2 days ago












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    2 days ago












  • Hum actually xfs (CentOS' default) support isn't quite clear...

    – A.B
    2 days ago











  • added how to check if the filetype feature is present on xfs, in case xfs is in use.

    – A.B
    2 days ago











  • I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    2 days ago













10












10








10







The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", 0644, st_size=4, ..., AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer















The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
}




The same information is available to readdir(3):




struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
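
To make the point about the directory entry concrete, here is a minimal sketch in C of a listing that prints inode, type and name from readdir(3) alone, roughly what the first find command asks for. The function names type_letter and list_without_stat are hypothetical, the one-letter codes only imitate the style of find's %y, and this is not how find itself is implemented.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: expose d_type and the DT_* constants */
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Map a dirent d_type value to a one-letter type code (hypothetical helper). */
char type_letter(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG:  return 'f';
    case DT_DIR:  return 'd';
    case DT_LNK:  return 'l';
    case DT_CHR:  return 'c';
    case DT_BLK:  return 'b';
    case DT_FIFO: return 'p';
    case DT_SOCK: return 's';
    default:      return 'U';   /* DT_UNKNOWN: the filesystem reported no type */
    }
}

/*
 * Print "inode type path/name" for every entry of `path`, using only the
 * data returned by readdir(3). No stat call is made here.
 * Returns the number of entries whose type was unknown (a real tool would
 * have to stat those), or -1 if the directory cannot be opened.
 */
int list_without_stat(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int unknown = 0;
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries. */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char t = type_letter(e->d_type);
        if (t == 'U')
            unknown++;
        fprintf(out, "%lu %c %s/%s\n",
                (unsigned long)e->d_ino, t, path, e->d_name);
    }
    closedir(dir);
    return unknown;
}
```

Calling list_without_stat("many_files", stdout) on a filesystem that fills in d_type should return 0, meaning no entry would have needed a stat. A non-zero return suggests the filesystem does not provide the file type in its directory entries.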



I suspected the extra stat(2) calls and confirmed them by comparing (on a smaller sample...) the strace output of:



strace -o v1 find many_files -printf '%i %y %p\n'>info_file
strace -o v2 find many_files -printf '%i %y %M %p\n'>info_file


On my Linux amd64 system (kernel 5.0.x), the main difference between the two traces is:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
 close(4) = 0
 fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0

[...]

+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0


[...]
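
The per-file newfstatat calls in the second trace are what a permission string costs: the mode is kept in the inode, not in the directory entry. As a rough sketch of that extra step in C (the helper names mode_to_string and permissions_of are hypothetical, setuid/setgid/sticky bits are ignored, and find's real %M code is more complete):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: expose S_IFMT-style macros in strict modes */
#endif
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>   /* fstatat, struct stat, S_IS* macros */

/*
 * Fill buf (at least 11 bytes) with an ls-style mode string such as
 * "-rw-r--r--". Special bits (setuid, setgid, sticky) are not rendered.
 */
void mode_to_string(mode_t mode, char buf[11])
{
    static const char rwx[] = "rwxrwxrwx";

    buf[0] = S_ISDIR(mode)  ? 'd' :
             S_ISLNK(mode)  ? 'l' :
             S_ISCHR(mode)  ? 'c' :
             S_ISBLK(mode)  ? 'b' :
             S_ISFIFO(mode) ? 'p' :
             S_ISSOCK(mode) ? 's' : '-';

    /* Walk the nine permission bits from owner-read (0400)
       down to other-execute (0001). */
    for (int i = 0; i < 9; i++)
        buf[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';
    buf[10] = '\0';
}

/*
 * Look up the permissions of `name` relative to the open directory `dirfd`.
 * This is one extra system call per file, the same kind of call that shows
 * up as newfstatat in the trace. Returns 0 on success, -1 on error.
 */
int permissions_of(int dirfd, const char *name, char buf[11])
{
    struct stat st;
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return -1;
    mode_to_string(st.st_mode, buf);
    return 0;
}
```

With AT_FDCWD as dirfd the name is resolved relative to the current directory. In a directory walk one would normally pass the descriptor of the directory being read, as the trace does with descriptor 5.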















edited 2 days ago

























answered 2 days ago









A.B













  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    2 days ago












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    2 days ago












  • Hum actually xfs (CentOS' default) support isn't quite clear...

    – A.B
    2 days ago











  • added how to check if the filetype feature is present on xfs, in case xfs is in use.

    – A.B
    2 days ago











  • I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    2 days ago

































