Linux readdir怎样实现目录遍历优化

DIR readdir entry include 432 来源： 2025-05-07

Linux readdir 目录遍历优化方案

基本 readdir 使用

在 Linux 中，readdir 是用于遍历目录内容的常用函数，基本用法如下：

#include <dirent.h>
#include <stdio.h>

void traverse_directory(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        // 过滤 . 和 ..
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        printf("%s\n", entry->d_name);
    }

    closedir(dir);
}

优化方案

1. 使用 readdir_r (线程安全版本)

在多线程环境中，应使用 readdir_r 替代 readdir 以避免竞争条件：

struct dirent entry, *result;
while (readdir_r(dir, &entry, &result) == 0 && result != NULL) {
    // 处理 entry
}

注意：在较新的 glibc 版本中(>= 2.24)，readdir_r 已被弃用，因为 readdir 本身已通过内部锁实现了线程安全。

2. 批量读取优化

getdents 系统调用比 readdir 更底层，可以一次性获取更多目录项：

#include <sys/syscall.h>
#include <linux/dirent.h>

#define BUF_SIZE 1024*1024

void fast_traverse(const char *path) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("open");
        return;
    }

    char buf[BUF_SIZE];
    while (1) {
        int nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1) {
            perror("getdents");
            break;
        }
        if (nread == 0) break;

        for (int bpos = 0; bpos < nread;) {
            struct linux_dirent *d = (struct linux_dirent *)(buf + bpos);
            char d_type = *(buf + bpos + d->d_reclen - 1);

            // 处理目录项 d->d_name

            bpos += d->d_reclen;
        }
    }
    close(fd);
}

3. 文件类型缓存

利用 dirent 结构中的 d_type 字段（如果支持），避免不必要的 stat 调用：

if (entry->d_type == DT_DIR) {
    // 是目录，递归处理
} else if (entry->d_type == DT_REG) {
    // 是普通文件
}

4. 目录遍历策略优化

广度优先搜索(BFS): 使用队列而非递归实现，减少内存使用
并行遍历: 对子目录使用多线程处理（注意线程数限制）

5. 文件系统特性利用

对于 ext4 等现代文件系统，可以使用 O_DIRECTORY | O_NOATIME 打开目录
考虑使用 posix_fadvise 预读提示

最佳实践示例

#include <stdio.h>
#include <dirent.h>
#include <queue>
#include <string>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool done = false;

void worker(std::queue<std::string>& dir_queue) {
    while (true) {
        std::string dir_path;
        {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [&]{ return !dir_queue.empty() || done; });

            if (done && dir_queue.empty()) return;

            dir_path = dir_queue.front();
            dir_queue.pop();
        }

        DIR *dir = opendir(dir_path.c_str());
        if (!dir) continue;

        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            if (entry->d_name[0] == '.') continue;

            std::string full_path = dir_path + "/" + entry->d_name;

            if (entry->d_type == DT_DIR) {
                std::lock_guard<std::mutex> lock(mtx);
                dir_queue.push(full_path);
                cv.notify_one();
            } else {
                // 处理文件
                printf("%s\n", full_path.c_str());
            }
        }
        closedir(dir);
    }
}

void parallel_traverse(const char *root, int num_threads) {
    std::queue<std::string> dir_queue;
    dir_queue.push(root);

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(worker, std::ref(dir_queue));
    }

    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
        cv.notify_all();
    }

    for (auto& t : threads) {
        t.join();
    }
}

性能对比

方法	优点	缺点	适用场景
基本 readdir	简单易用	性能一般	简单目录遍历
getdents	高性能	实现复杂	需要极致性能
多线程	利用多核	同步复杂	深层目录结构
广度优先	内存友好	实现稍复杂	超大目录

选择优化方案时应根据具体场景权衡实现复杂度和性能收益。

上一篇：nginx日志中的“timeout while reading response header from upstream”怎么解决

下一篇：Linux strings命令在安全领域有何用途