Posted by & filed under developers.

I needed to list all files in a directory, but ls, find, and os.listdir all hung. This is my story.

NOTE: there is no good reason that you should ever have 8 million files in the same directory, but if you do, this is your solution :-) .

TLDR: Write a C program that calls the syscall getdents directly, with a large buffer size, ignore entries with inode == 0.

Why doesn’t ls work?

ls and practically every other method of listing a directory (including python os.listdir, find .) rely on libc readdir(). However readdir() only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory (i.e. 500M of directory entries) it is going to take an insanely long time to read all the directory entries, especially on a slow disk. For directories containing a large number of files, you’ll need to dig deeper than tools that rely on readdir(). You will need to use the getdents() syscall directly, rather than helper methods from libc.

How to quickly list a directory with 8 million files

The trick is to understand getdents(), the low level system call that reads directory entries from disk, and returns a directory entry (dirent) data structure.

GETDENTS (filehandle for directory entries, *directory entry pointer, number of bytes to read)

Luckily the man page has a lot of detail, and provides C source code to list all the files in a directory. Read it here and copy the source code to a file, like listdir.c.

There are two modifications you will need to do in order quickly list all the files in a directory. First, increase the buffer size from X to something like 5 megabytes.
#define BUF_SIZE 1024*1024*5

Then modify the main loop where it prints out the information about each file in the directory to skip entries with inode == 0. I did this by adding
if (dp->d_ino != 0) printf(...);

In my case I also really only cared about the file names in the directory so I also rewrote the printf() statement to only print the filename.
if(d->d_ino) printf("%sn ", (char *) d->d_name);

Compile it (it doesn’t need any external libraries, so it’s super simple to do)
gcc listdir.c -o listdir

Now just run
./listdir [directory with insane number of files]

Presto, you should see all the files in your insanely large directory :-) .

In my case I did
./listdir [directory with insane number of files] > output.txt
and then used the contents of output.txt to process the files that were previously “unlistable”.

Why did we run into this problem in the first place?

Without going into too many details, we needed a simple datastore where we could find an entry based on a key, append values to each key, and expire keys that were older than a certain threshold, while performing a cleanup operation on the stored values. The filesystem actually works really well for this cases as long as there aren’t that many keys active at the same time. (i.e. on the file system listing all keys is the slowest operation, so if there aren’t many keys there’s nothing to worry about).

Something got screwed up, and caused the expiration action to fail. So instead of keeping a small number of files in the directory, we started generating a lot of entries that were not being cleaned up. The next time the cleanup operation ran, it could no longer list all the keys in a reasonable amount of time (reading 32K of directory entries at a time). This compounded the problem and lead to a state where new files were being created, but old files were not cleaned up. Soon we had 8 million files in a single directory. We were able to fix the root cause fairly quickly by resolving a bug in our “sweeper” and creating a new cache directory, but we still have 8 million files that needed to be processed. Keep in mind the timescale here is a matter of hours. (Luckily this problem happened on a weekend, was caught fast, and had very little effect on our customers). One important thing to keep in mind, is at this point we had no idea how many files needed to to be processed, all we knew was that ls and os.listdir() were both hanging when trying to list all the files in a directory.

I asked in IRC, and tried searching google without much luck. Other people had run into this problem before, but no one had a good solution. In fact I recall running into a similar problem listing a mailqueue in Qmail many years ago, but I have no idea how we solved it. (If you like solving this sort of problem, we are hiring: jobs@olark.com)

Diagnosing the unknown

Whenever I see something hang without debugging output I turn to strace. strace lets you watch the system calls made during program execution and let’s you see what’s going on when no output is being printed.

I tried
strace find .
strace ls
and
strace python
import os
os.listdir(".")

All of these functions produced basically the same strace output:

First they open the file containing information about a directory
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5

Then they make the getdents syscall which returns directory entries
getdents(5, /* 586 entries */, 32768)  = 32752

It took me a while to figure this out, but basically the getdents call
http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html

looks like this:
int getdents(unsigned int fd, struct linux_dirent *dirp,                     unsigned int count);

where count is really size of buffer. (in this case 32K)

Of course I didn’t notice this until someone in IRC mention that you could do
ls -dl .
to see the size of the file storing directory entries.

In my directory I got:
drwxr-xr-x 2 root root 537919488 Jul 29 04:55 . (513M)

Putting two and two together I could see that the reason it was taking forever to list the directory was because ls was reading the directory entries file 32K at a time, and the file was 513M.  So it would take around 16416 system calls of getdents() to list the directory.  That is a lot of calls, especially on a slow virtualized disk.

This lead me to the solution above, increase the read buffer size for getdents() to decrease the number of system calls and speed up all the disk access :-) .

Take aways:

1) It is possible to list a directory with 8 million files in it.

2) strace is your friend

3) Don’t be afraid to compile code and modify it (hell, simple C compiles so fast it could be interpreted)

4) There is no good reason to have 8 million files in a directory :-) , but this was a good learning experience (and possibly a good interview question).

Appendix:

After all this I was still a little curious about where the 32K buffer size in ls came from so I downloaded to source to coreutils (contains ls) and poked around. (NOTE: this is mostly just some notes I took, and not a detailed analysis)

Search through the code, and you’ll find that ls called readdir() with a directory pointer.
while (1)
{
/* Set errno to zero so we can distinguish between a readdir failure
and when readdir simply finds that there are no more entries.  */
errno = 0;
next = readdir (dirp);
if (next)

readdir() is defined in libc. So I continued my search and downloaded the source to libc to figure out how the buffer size for readdir() was determined.

getdents() is called inside of readdir() (as expected)
bytes = __GETDENTS (dirp->fd, dirp->data, maxread);
if (bytes <= 0)

And the byte size comes from the size of the directory entry struct, or the dirp->allocation. Given that we were reading multiple entries, I am pretty sure the maxread variable was being set from dirp->allocation.
/* Fixed-size struct; must read one at a time (see below).  */
maxread = sizeof *dp;
#else
maxread = dirp->allocation;
#endif

If you poke around in
sysdeps/unix/dirstream.h
sysdeps/unix/opendir.c

You can see where this value gets set. However, it certainly doesn’t appear to changed based on the size of the directory entry file. (Perhaps the buffer should be dynamically set based on the size of the directory entry file)

Finally, to learn about the trick for skipping deleted files, I noticed a line:
if (dp->d_ino == 0) .. inside of ls, which is used to filter out deleted files from appearing when a directory is listed.

7 Responses to “You can list a directory containing 8 million files! But not with ls..”

  1. George Cooke

    You can use this guys delents.c source code to use your method to list/delete files in one directory quickly: http://serverfault.com/a/328305/117043

    Also I set my ext3 mount options to data=writeback (temporarily) and noatime to improve a high volume deletion (be careful if your / partition), 11million, 4k files in one directory, directory entry 730MB, it has taken 10 hours so far to delete 3.5million (dedicated server, Xeon, 16GB RAM, 2TB SATA etc). So slow :(

    Reply
  2. Sergey

    LOL!

    ls -f > output.txt

    -f disable sorting, so ls doesn’t store all the list in memory.

    Reply
    • Petr Tesařík

      You’re as wrong as you can be. Yes, it will skip sorting, but it will still read the complete directory content before printing out anything (tested with coreutils-8.16).

      Reply
  3. Nick Welch

    I tweaked the code into a recursive version that will descend into subdirectories. It prints out less information; just the file type and the filename.

    Instead of allocating the buffer on the stack (since that will easily exhaust the stack), it allocates another 5MB dynamically each time it descends into a subdirectory.

    Here’s the code: https://gist.github.com/4229933

    Reply

Leave a Reply

  • (will not be published)