2024-04-05
原文作者:文先生的博客 原文地址: http://wenfh2020.com/2020/01/15/redis-sds/

为了节省内存空间,灵活处理不同长度范围的字符串,redis 定义了几种 sdshdr(X) 数据结构,对不同长度的字符串数据进行存储。

1. 数据结构

202404052231164191.png

为了节省内存空间,灵活处理不同长度范围的字符串

  • redis 定义了sdshdr(X)几种数据结构,可以查看数据结构大小
  • 同时 struct 数据结构没有进行内存对齐。
 
    typedef char *sds;
    #define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
    #define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
    
    /* Note: sdshdr5 is never used, we just access the flags byte directly.
     * However is here to document the layout of type 5 SDS strings. */
    struct __attribute__ ((__packed__)) sdshdr5 {
        // 当字符串很小时, `flags` 是一个8 个字节的组合字符,前 3 bit 是字符串类型,后面5bit是字符串长度。
        unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr8 {
        uint8_t len; /* used */
        uint8_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr16 {
        uint16_t len; /* used */
        uint16_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr32 {
        uint32_t len; /* used */
        uint32_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr64 {
        uint64_t len; /* used */
        uint64_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
成员 描述
len 当前已使用的内存空间长度
alloc 分配的内存空间长度
flags 数据结构类型或者(数据结构类型+字符串长度例如:sdshdr5)
buf uf如果有数据,是以‘\0’结束的字符串。

1.1. sds 数据结构视图

202404052231173522.png

制作图表方法可以用 processon,参考视频

1.2. 结构大小

可以通过函数 sdsReqType 知道,sds 数据结构,是根据数据长度范围去确定数据结构类型的。下面列出的数据结构的比较。

结构类型 大小 字符串长度
sdshdr5 1 1« 5-1
sdshdr8 3 1« 8-1
sdshdr16 5 1« 16-1
sdshdr32 9 1« 32-1
sdshdr64 17 大于1« 32
 
    static inline char sdsReqType(size_t string_size) {
        if (string_size < 1<<5)
            // 1 << 5 == 32,所以长度最大 31,二进制 11111,占 5 位。结合数据结构可以查看 flags 的组合,左移 5 位,存储字符串长度,右边3位存储字符串长度。
            return SDS_TYPE_5;
        if (string_size < 1<<8)
            return SDS_TYPE_8;
        if (string_size < 1<<16)
            return SDS_TYPE_16;
    #if (LONG_MAX == LLONG_MAX)
        if (string_size < 1ll<<32)
            return SDS_TYPE_32;
        return SDS_TYPE_64;
    #else
        return SDS_TYPE_32;
    #endif
    }
  • 例如 sdshdr32 数据结构, sizeof(sdshdr32) == 9 ,如果是字节对齐,应该 12 才对。
 
    struct __attribute__((__packed__)) sdshdr32 {
        uint32_t len;        /* used */
        uint32_t alloc;      /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };

2. 核心接口

sds 主要的逻辑是对字符串内存管理。可以参考下面接口进行理解。

接口 描述
sdsnew 创建字符串对象
sdsfree 释放字符串结构对象
sdsavail 查询字符串对象空闲内存大小
sdsnewlen 根据字符串长度,分配合适的内存空间,设置数据结构的相关的成员数据
sdsMakeRoomFor 为对象分配增长的空间,增长小于1M,newlen*=2,否则newlen+=SDS_MAX_PREALLOC

3. 工作流程

我们依旧可以用 gdb 对 sds 进行调试,熟悉它对工作流程。作者在 sds.c 文件就设置了测试宏SDS_TEST_MAIN,我们可以编译一个文件进行调试。

 
    gcc -g  -DSDS_TEST_MAIN sds.c zmalloc.c -o sds

调试方法,可以参考视频

  • 堆栈信息
 
    #0  sdsnewlen (init=0x100006a71, initlen=3) at sds.c:99
    #1  0x00000001000018a6 in sdsnew (init=0x100006a71 "foo") at sds.c:156
    #2  0x0000000100004cb7 in sdsTest () at sds.c:1130
    #3  0x0000000100006124 in main () at sds.c:1294
  • sdsnewlen 根据字符串长度,用不同数据结构进行存储,每个数据结构有不同类型。
 
    /* Create a new sds string starting from a null terminated C string. */
    sds sdsnew(const char *init) {
        size_t initlen = (init == NULL) ? 0 : strlen(init);
        return sdsnewlen(init, initlen);
    }
  • 根据字符串长度,分配合适的内存空间,设置数据结构的相关的成员数据
 
    /* Create a new sds string with the content specified by the 'init' pointer
     * and 'initlen'.
     * If NULL is used for 'init' the string is initialized with zero bytes.
     * If SDS_NOINIT is used, the buffer is left uninitialized;
     *
     * The string is always null-termined (all the sds strings are, always) so
     * even if you create an sds string with:
     *
     * mystring = sdsnewlen("abc",3);
     *
     * You can print the string with printf() as there is an implicit \0 at the
     * end of the string. However the string is binary safe and can contain
     * \0 characters in the middle, as the length is stored in the sds header. */
    sds sdsnewlen(const void *init, size_t initlen) {
        void *sh;
        sds s;
        char type = sdsReqType(initlen);
        /* Empty strings are usually created in order to append. Use type 8
         * since type 5 is not good at this. */
        if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
        int hdrlen = sdsHdrSize(type);
        unsigned char *fp; /* flags pointer. */
    
        // 申请数据结构内存。+ 1 是为了字符串的结束符 '\0'。
        sh = s_malloc(hdrlen+initlen+1);
        if (init==SDS_NOINIT)
            init = NULL;
        else if (!init)
            memset(sh, 0, hdrlen+initlen+1);
        if (sh == NULL) return NULL;
        s = (char*)sh+hdrlen;
        fp = ((unsigned char*)s)-1;
        switch(type) {
            case SDS_TYPE_5: {
                // SDS_TYPE_BITS 
                *fp = type | (initlen << SDS_TYPE_BITS);
                break;
            }
            case SDS_TYPE_8: {
                SDS_HDR_VAR(8,s);
                sh->len = initlen;
                sh->alloc = initlen;
                *fp = type;
                break;
            }
            case SDS_TYPE_16: {
                SDS_HDR_VAR(16,s);
                sh->len = initlen;
                sh->alloc = initlen;
                *fp = type;
                break;
            }
            case SDS_TYPE_32: {
                SDS_HDR_VAR(32,s);
                sh->len = initlen;
                sh->alloc = initlen;
                *fp = type;
                break;
            }
            case SDS_TYPE_64: {
                SDS_HDR_VAR(64,s);
                sh->len = initlen;
                sh->alloc = initlen;
                *fp = type;
                break;
            }
        }
        if (initlen && init)
            memcpy(s, init, initlen);
        s[initlen] = '\0';
        return s;
    }
  • 获取字符串长度
 
    #define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)
    
    static inline size_t sdslen(const sds s) {
        unsigned char flags = s[-1];
        switch(flags&SDS_TYPE_MASK) {
            case SDS_TYPE_5:
                // 一个字节高 5 位是长度,通过向右移动 3 位获得大小。
                return SDS_TYPE_5_LEN(flags);
            case SDS_TYPE_8:
                return SDS_HDR(8,s)->len;
            case SDS_TYPE_16:
                return SDS_HDR(16,s)->len;
            case SDS_TYPE_32:
                return SDS_HDR(32,s)->len;
            case SDS_TYPE_64:
                return SDS_HDR(64,s)->len;
        }
        return 0;
    }
  • 释放内存,因为 sds struct 是一个连续的内存数据结构,根据 sds 指向的 buf,往回找 struct 的起始地址,进行释放。

看看 sdsnewlen 是如何申请内存的。

 
    /* Free an sds string. No operation is performed if 's' is NULL. */
    void sdsfree(sds s) {
        if (s == NULL) return;
        s_free((char*)s-sdsHdrSize(s[-1]));
    }
 
    static inline int sdsHdrSize(char type) {
        switch(type&SDS_TYPE_MASK) {
            case SDS_TYPE_5:
                return sizeof(struct sdshdr5);
            case SDS_TYPE_8:
                return sizeof(struct sdshdr8);
            case SDS_TYPE_16:
                return sizeof(struct sdshdr16);
            case SDS_TYPE_32:
                return sizeof(struct sdshdr32);
            case SDS_TYPE_64:
                return sizeof(struct sdshdr64);
        }
        return 0;
    }
  • 查询数据结构多少空闲内存空间
 
    static inline size_t sdsavail(const sds s) {
        unsigned char flags = s[-1];
        switch(flags&SDS_TYPE_MASK) {
            case SDS_TYPE_5: {
                // 小于 32 长度的内存,都是直接申请的,没有空余内存。
                return 0;
            }
            case SDS_TYPE_8: {
                SDS_HDR_VAR(8,s);
                return sh->alloc - sh->len;
            }
            case SDS_TYPE_16: {
                SDS_HDR_VAR(16,s);
                return sh->alloc - sh->len;
            }
            case SDS_TYPE_32: {
                SDS_HDR_VAR(32,s);
                return sh->alloc - sh->len;
            }
            case SDS_TYPE_64: {
                SDS_HDR_VAR(64,s);
                return sh->alloc - sh->len;
            }
        }
        return 0;
    }
  • 追加内存
 
    /* Append the specified null termianted C string to the sds string 's'.
     *
     * After the call, the passed sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call. */
    sds sdscat(sds s, const char *t) {
        return sdscatlen(s, t, strlen(t));
    }
  • redis sds 习惯先根据长度,分配合适的内存,再进行数据拷贝等操作。
 
    /* Append the specified binary-safe string pointed by 't' of 'len' bytes to the
     * end of the specified sds string 's'.
     *
     * After the call, the passed sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call. */
    sds sdscatlen(sds s, const void *t, size_t len) {
        size_t curlen = sdslen(s);
    
        // 根据当前数据和追加的数据,分配合适长度的内存资源。
        s = sdsMakeRoomFor(s,len);
        if (s == NULL) return NULL;
        memcpy(s+curlen, t, len);
        sdssetlen(s, curlen+len);
        s[curlen+len] = '\0';
        return s;
    }
  • 根据增长的长度,为 sds 申请合适长度的空间。
 
    /* Enlarge the free space at the end of the sds string so that the caller
     * is sure that after calling this function can overwrite up to addlen
     * bytes after the end of the string, plus one more byte for nul term.
     *
     * Note: this does not change the *length* of the sds string as returned
     * by sdslen(), but only the free buffer space we have. */
    sds sdsMakeRoomFor(sds s, size_t addlen) {
        void *sh, *newsh;
        // 获取剩余的内存
        size_t avail = sdsavail(s);
        size_t len, newlen;
        char type, oldtype = s[-1] & SDS_TYPE_MASK;
        int hdrlen;
    
        /* Return ASAP if there is enough space left. */
        if (avail >= addlen) return s;
    
        len = sdslen(s);
        sh = (char*)s-sdsHdrSize(oldtype);
        newlen = (len+addlen);
        // 小于 1 M 内存的翻倍增加,否则每次增加 1M
        if (newlen < SDS_MAX_PREALLOC)
            newlen *= 2;
        else
            newlen += SDS_MAX_PREALLOC;
    
        type = sdsReqType(newlen);
    
        // 如果小数据,遇到 cat 操作,类型升级到 SDS_TYPE_8,方便 cat 的后续操作。这里作者估计是根据很多场景结合的经验得出的结论。
        /* Don't use type 5: the user is appending to the string and type 5 is
         * not able to remember empty space, so sdsMakeRoomFor() must be called
         * at every appending operation. */
        if (type == SDS_TYPE_5) type = SDS_TYPE_8;
    
        // 根据对应类型的对象申请相应的空间。
        hdrlen = sdsHdrSize(type);
        if (oldtype==type) {
            newsh = s_realloc(sh, hdrlen+newlen+1);
            if (newsh == NULL) return NULL;
            s = (char*)newsh+hdrlen;
        } else {
            /* Since the header size changes, need to move the string forward,
             * and can't use realloc */
            newsh = s_malloc(hdrlen+newlen+1);
            if (newsh == NULL) return NULL;
            memcpy((char*)newsh+hdrlen, s, len+1);
            s_free(sh);
            s = (char*)newsh+hdrlen;
            s[-1] = type;
            sdssetlen(s, len);
        }
        sdssetalloc(s, newlen);
        return s;
    }
  • 空数据结构 sdsempty(),一些不定长的字符串,例如 sdscatprintf,格式化的字符串,经常性有很长的字符串。所以在 sdsnewlen 中给申请 SDS_TYPE_8 类型进行处理。
 
    sds sdsnewlen(const void *init, size_t initlen) {
        if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    }
  • 去掉字符串头尾出现在字串的所有字符
 
    /* Remove the part of the string from left and from right composed just of
     * contiguous characters found in 'cset', that is a null terminted C string.
     *
     * After the call, the modified sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call.
     *
     * Example:
     *
     * s = sdsnew("AA...AA.a.aa.aHelloWorld     :::");
     * s = sdstrim(s,"Aa. :");
     * printf("%s\n", s);
     *
     * Output will be just "HelloWorld".
     */
    sds sdstrim(sds s, const char *cset) {
        char *start, *end, *sp, *ep;
        size_t len;
    
        // 通过两次遍历,从头向尾,
        sp = start = s;
        ep = end = s+sdslen(s)-1;
        while(sp <= end && strchr(cset, *sp)) sp++;
        while(ep > sp && strchr(cset, *ep)) ep--;
        len = (sp > ep) ? 0 : ((ep-sp)+1);
        if (s != sp) memmove(s, sp, len);
        s[len] = '\0';
        sdssetlen(s,len);
        return s;
    }
  • 两个 sds 字符串比较

取最短字符串的长度,该长度的两个字符串相比较,在这个条件基础上,对余下字符串进行比较。返回相应结果。

 
    /* Compare two sds strings s1 and s2 with memcmp().
     *
     * Return value:
     *
     *     positive if s1 > s2.
     *     negative if s1 < s2.
     *     0 if s1 and s2 are exactly the same binary string.
     *
     * If two strings share exactly the same prefix, but one of the two has
     * additional characters, the longer string is considered to be greater than
     * the smaller one. */
    int sdscmp(const sds s1, const sds s2) {
        size_t l1, l2, minlen;
        int cmp;
    
        l1 = sdslen(s1);
        l2 = sdslen(s2);
        minlen = (l1 < l2) ? l1 : l2;
        cmp = memcmp(s1, s2, minlen);
        if (cmp == 0) return l1 > l2 ? 1 : (l1 < l2 ? -1 : 0);
        return cmp;
    }

4. 后记

源码走读系列,通过调试手段,走读源码,是自己流水账式的记录,从而理解了更多的实现细节。

阅读全文