cpp:S5000 too broad

The cpp:S5000 rule (“memcmp” should only be called with pointers to trivially copyable types with no padding), I feel, is overly broad, detecting two very different classes of potential problem - one I would agree is a bug, the other I would classify more as a code smell.

The first class is detecting memcmp being called on pointers to types that are NOT trivially copyable. E.g.

struct S1
{
    std::string str;
};

bool is_equal(const S1& a, const S1& b)
{
    return memcmp(&a, &b, sizeof(a)) == 0;    // cpp:s5000 raised - useful
}

The second is on trivially copyable types that have padding. Because trivially copyable types can be default initialised and the padding is required to be zero initialised (meaning that memcmp can safely be called) - this is less useful. E.g.

struct S2
{
    unsigned char a;
    unsigned int b;
};

bool is_equal(const S2& a, const S2& b)
{
    return memcmp(&a, &b, sizeof(a)) == 0;    // cpp:s5000 raised - but this is less useful
}

I think it would make sense for these different cases to be different rules with the latter S2 case being a code smell rather than a bug by default.

At present, I want to keep the rule to detect instances of the first S1 case - but that means manually resolving all the instances of S2 that I consider to be false positives.

Using SonarQube Enterprise v9.9.7

Hi @giles45,

To clarify, trivially constructible types can be either:

  • default initialized (no initializer), where the values or members and padding bits are left uninitialized
  • value initialized (e.g. T()) , that will indeed trigger zero-initialization, and set padding bits to zero
  • aggregate initialization (e.g. T{'a', 10}) that is not required to set padding bits.

So, indeed, it is possible to set padding bits to zero. However, it requires that every variable is value-initialized first, and members to be assigned later. Furthermore, the values of padding are not required to be preserved by the copy operations of such object, and for example:

void foo() {
   S1 s1{}; // zero-initialization, sets padding bits to zero
   S2 s2 = s1; // padding bits has unspecified values
   is_equal(s1, s2); // may return false
   S3 s3{'\0', 0}; // padding bits have unspecified values
}

As a consequence, the user of the memcpy is valid only if the programmer guarantees that the objects passed as arguments to this constructor are never aggregate initialized, copied or passed by value. Such a construct can be easily introduced into the program, and this is why we consider such uses to be bugs in either case.

For the standard reference:

Interestingly there are special requirements for std::atomic<T>::compare_exchange functions to ignore (zero) padding bits, as seen here and here.

Thanks for your reply. To clarify, I wasn’t suggesting that scenario S2 wasn’t a potential issue. I have been stung by that particular problem before - just that it is different from scenario S1 for which it is difficult to think of any occasion when that could be valid.
In our code base, we are dealing with many structures that are or were shared with C code and it is perfectly possible to use memcmp with such structures in ways that are well-defined so my point was that having a different rule would make it possible keep the rule for the former whilst disabling the latter if it was not useful for a particular codebase.

I was also interested in some of your examples.

S2 s2 = s1;

Is it really the case that if S2 is trivially copyable that this not equivalent to a memcpy?

For the question regarding:

S2 s2 = s1;

The standard does not require the padding bit to be copied, only the value of member. This for example allows passing the object by the registers when returned from functions (i.e. Scalar Replacement for the Aggregates):

S create();
S2 s3 = create(); // May pass members via registers
S2 s2 = s1; 

Furthermore, if values of s1 members are already know to be in register, the compiler may just store them in memory for s2, without calling the memcpy. However, I am not sure if the later is performed by compiler.
In both cases, padding bits may not match source, as only actual member values need to kept in register.

Regarding, the split, it is possible (and I would say even more robust) to make a memcmp valid for type that does not have trivial-constructor, just by implementing them by performing memcpy. So as you can see depending on the case, each of them can be see as FP, and different project would want to split on different level.

Finally, you have mentioned that this classes are shared with C files. Are the functions that uses memcmp inside extern "C"?
Because, I think I would agree, that we should consider not raising this rule in C and also by extension functions declared inside extern "C". In C is it normal to copy larger object by using memcpy, that preserve the values of bit fields.

Thanks for the explanation even though it makes me feel quite despondent and wondering how much of our code just happens to work!

Regarding extern “C” functions - it varies, there are some functions that have it and others that don’t. I haven’t checked specifically, but my general impression was that Sonar didn’t change it’s behaviour for extern “C” blocks. But yes, I think it would useful to be able to mark certain types and functions as C compatible not just for this but, e.g., to avoid ‘use std::array’ issues and the like.

I have created a following ticket to capture the problem with the rule that you have raised CPP-5938, and you may use it to track the progress. However, we do not have plans to work on this issue in near future.

2 Likes