Int a = 5; a = a++ + ++a; a =? (2011) (gynvael.coldwind.pl)
131 points by e-topy 22 hours ago | 216 comments



The code in the post seems very similar to the one in my own post from 2010: https://susam.net/sequence-points.html

  int a = 5;
  a += a++ + a++;
I do remember that this particular code snippet (with a = 5, even) used to be popular as an interview question. I found such questions quite annoying because most interviewers who posed them seemed to believe that whatever output they saw with their compiler version was the correct answer. If you tried explaining that the code has undefined behaviour, the reactions generally ranged from mild disagreement to serious confusion. Most of them neither cared about nor understood 'undefined behaviour' or 'sequence points'.

I remember one particular interviewer who, after I explained that this was undefined behaviour and why, listened patiently to me and then explained to me that the correct answer was 17, because the two post-increments leave the variable as 6, so adding 6 twice to the original 5 gives 17.

I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?


IMO, the only reasonable answer if asked this in an interview is “I would not write code where I have to know the answer to this question”

These sorts of puzzles are neat trivia for learning about things like sequence points, but 99.9% of the time, if the answer matters in your codebase, you're writing something unmaintainable.


> IMO, The only reasonable answer if asked this in an interview is “I would not write code where I have to know the answer to this question”

That's half of a reasonable answer. The other half is "but I do know the answer so if I see it when reviewing or working on someone else's code I can flag it or rewrite it, and explain to them why it is bad".


  > The other half is "but I do know the answer
Except you don't!

If you claim to know the answer you've made a grave mistake and fooled yourself.

If you ran the code in a compiler and used that to conclude "this is the answer" rather than "this is an answer", then now is a great time to learn how easy it is to fool yourself. You just need to ask yourself what assumptions you made. I'll wager you assumed all compilers process this line in the same way.

Or just RTFA, or Susam's, as that's exactly what they are about. They explain why this is undefined behavior.

  | The first principle is that you must not fool yourself — and you are the easiest person to fool.
  - Feynman

> I'll wager you assumed all compilers process this line in the same way

You would lose that wager.

What I mean by "I do know the answer" is that I know that this is undefined behavior and why it is undefined behavior and that different compilers can give different results and also that even if I test the compiler I use to see what it does I can't count on that not changing any time the compiler gets updated.


Fair, but it was not clear to me that that's what you meant. It's clear that others interpreted it the way you intended (since there are multiple replies saying exactly what you said), but I also don't think I'm the only one who misinterpreted. Sorry that I did.

> Except you don't!

Except you can do, because "The answer is that this isn't a valid C program." is a sentence you can know.


I think you're misinterpreting "I know the answer". The GP is suggesting rewriting it, so they know the issue.

No it isn't. You don't need to know the answer to know that it is bad code. The very fact that it isn't clear shows that.

Right, the feedback I'd expect in a code review interview is something like "This is unclear or wrong, write what you actually meant".

That's the feedback I would want, and it's the feedback I give to my colleagues in reviews. Actually I tend to be too verbose, so you might get a full paragraph explaining what the ISO document says and that you shouldn't assume it does whatever it is your compiler says.

My actual feelings for this specific case are that the language is defective, but if we're wedded to a defective language then the reviews need to call out such usage.


  > Actually I tend to be too verbose, so you might get a full paragraph explaining what the ISO document
I'm verbose too, but I love it when others are. Honestly, it's usually easy to triage (and I write to try to make it easy). I like verbosity because learning why means I not only won't make that mistake again but I won't make any similar mistakes again.

Verbosity isn't bad. Not everything needs to be a fucking tweet


If you know this code is bad, but don't know that it is UB, I think you are rating code on feelings and cargo-culting.

I mean the good answer is:

I am not sure this code would be interpreted the same way by different programmers and compilers alike, so I would never write it.


You might still make a mistake, even if you think you know the answer. It's much better to instrument the code to figure it out, or write a short test program.

It's Undefined Behavior. So you can instrument all you want, the answer will still be wrong. You'll capture what your particular compiler does under some particular conditions (opt flags, surrounding code, etc.) but that will not be representative of what can happen in the general case (hint : anything can happen with UB).

Not nasal demons in this case (https://groups.google.com/g/comp.std.c/c/ycpVKxTZkgw/m/S2hHd...): thaumasiotes shows that we can expect a numeric answer.

I don't see the name "thaumasiotes" at that link, nor do I see anything relevant to the code in the title.

The behavior of "int a = 5; a = a++ + ++a;" is undefined. There is no guarantee of a numeric result, because there is no guarantee of anything.


I believe they were referring to thaumasiotes's thread here: https://news.ycombinator.com/item?id=48141294

I think the objection thaumasiotes has raised there is valid and I have made an attempt to answer it as well in the same thread.


It's only the order of evaluation that is undefined.

No, the behavior is undefined. That means, quoting the ISO C standard, "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements".

A conforming implementation could reject it at compile time, or generate code that traps, or generate code that set a to 137, or, in principle, generate code that reformats your hard drive. Some of these behaviors are unlikely, but none are forbidden by the language standard.


I was wrong.

I was looking at this:

https://en.cppreference.com/cpp/language/eval_order

I'm not sure where precisely this sequencing exception to the default "eval order undefined" rule is given, but after the 24(!) sequencing rules they do give this "++i + i++" as an explicit example of undefined behavior.

Interestingly that page says that since C++17 f(++i, ++i) is "unspecified" rather than "undefined", whatever that means, and presumably plus(++i, i++) would be too, which seems a bit inconsistent.


Nope, there is no sequence point in the middle and modifying an object more than once between sequence points is undefined behavior.

It doesn't matter if the answer is wrong. You run the test program and then replace the code by the answer. This basically weeds out the UB.

That's a valid approach, if you only use high-level language to generate assembly faster, and the assembly is your source of truth.

But since it is a UB, there's no guarantee that your test program produces the same result as the same code running on production, even if you have the same compiler.

That's very unlikely, and in the worst case you've reduced a difficult bug into an easier to understand bug.

> It's Undefined Behavior.

Susam's post doesn't make this clear. The quotes from K&R say that the modifications to the variable may take place in any order, but they don't directly say that doing this is Undefined Behavior, which would make it permissible to do anything, including e.g. interpreting the increments as decrements.

The C99 standard is quoted saying this:

>> Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.

It's possible that something else in the standard defines noncompliance with this clause as Undefined Behavior. But that's not the most intuitive interpretation; what this seems to say, to me, is that the line of code `a = a++ + ++a` should fail to compile, because it's not in compliance with a requirement of the language. Compilers that produce any result at all are suffering from a bug.

(It seems more likely that the actual intent is to specify that, given the line of code `b = a++ + ++a`, with a initially equal to 5, the compiler is required to ensure that the value stored at the address of a is never equal to 6 - that it begins at 5, and at some indefinite point it becomes 7, but that there is no intermediate stage between them. But I find the 'compiler failure on attempt to put multiple modifications between two sequence points' interpretation preferable.)


The "shall" in the standard means it's undefined behavior. This is explained in the "Conformance" section,

> 2. If a ‘‘shall’’ or ‘‘shall not’’requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.

Compilers will not refuse to compile the code, indeed the blog post we are all commenting on reports the results from a bunch of different compilers. Historically the reason the C standard specified a lot of undefined behavior is that the actually existing C compilers at the time compiled the code but disagreed about the output.


> Compilers will not refuse to compile the code, indeed the blog post we are all commenting on reports the results from a bunch of different compilers.

Yes, I see that. I just said they should refuse.


Because this specific UB is statically detectable (not usually the case), both gcc and clang will flag it if `-Wsequence-point` is enabled (and it is part of `-Wall`). Technically the clang warning is `-Wunsequenced`, but it is aliased to the GCC name.

edit: apparently `-Wunsequenced` is enabled by default, so clang should warn you out of the box.


Compilers are not able to prevent you from violating must/shall in the general case. So they're not held to that bar. Unless the standard says not to compile it, it's not a compiler bug.

Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable. And even optimizations that make sense here could corrupt the value pretty badly and possibly lead to worse errors.


> Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable.

OK. What is the value of a spec to which compliance is impossible?


The compiler does comply to the spec. It's the program that fails to comply with the spec. It's definitely possible to write programs that have no undefined behavior.

The compiler is supposed to compile programs that comply with the spec, and not compile programs that don't.

The concept of "compiling a program that doesn't comply with the spec" doesn't even exist! A text file that doesn't comply with the C spec isn't a C program. That's what it means to be "the spec".


> The compiler is supposed to compile programs that comply with the spec

Yes.

> and not compile programs that don't.

No, there is no limitation on what a compiler does in this case.

> The concept of "compiling a program that doesn't comply with the spec" doesn't even exist!

It does, it is called "undefined behaviour".

> A text file that doesn't comply with the C spec isn't a C program.

That's the point. A program that contains UB is not a valid C program. That's what UB means.


> The concept of "compiling a program that doesn't comply with the spec" doesn't even exist!

Wrong. Lots of spec violations only happen at runtime and can't be predicted at compile time.

Here's an easy example. You're the compiler. I hand you what appears to be valid C code that allocates an array and then asks the user which slot to use. It doesn't verify the slot is in bounds, just puts a number in array[slot], does some math with it, and then prints the result. Does my program comply with the spec? Do you compile it?


This is "Wrong" in the sense that C++ does really work like this, but it's not wrong in the sense that this is somehow unavoidably the case.

For example if you attempt an equivalent mistake in WUFFS that will be rejected.

Your WUFFS compiler will say this variable named slot must be a non-negative integer smaller than the length of the array, but as far as it can tell you didn't ensure that was true, therefore this code is nonsense, do better.

As I explained in my sister reply, in a broader context some of these are semantic properties and so there's a dilemma and C++ chooses to resolve that dilemma by accepting nonsense programs, but that wasn't the only available resolution and I am confident it's the wrong choice.


No, there's a fun C++ talk - by I want to say Chandler Carruth - in which the speaker points out that C++ is a language defined to have false positives for the question "Is this a valid program?"

The mechanism in the ISO document is phrases of the form "Ill-formed, No Diagnostic Required", often shortened to IFNDR. Let's break that down. "Ill-formed" means this is not a valid C++ program. On its own that means the compiler should provide a diagnostic (an error message) explaining that your program isn't valid. For example, if the program text were to just consist of the word "fuck", that's ill-formed and will be diagnosed. "No Diagnostic Required" says that in this case, though, we don't require the compiler to report the problem.

Why do that? So originally there's a purely practical reason, but ultimately there's a philosophical one. C++ like C before it wants to translate many individual program files and then somehow cobble the resulting output into a single executable. So this means function A over here, using type T from a different file cannot know for sure about type T, instead C++ has a thing called the "One Definition Rule" which says you must somewhat define T each time it's needed, but all the definitions must be the same. What if you don't (by mistake or on purpose)? Well that will cause chaos, so, IFNDR.

Philosophically IFNDR is a way to resolve the dilemma from Rice's Theorem. Back in about 1950 this guy named Henry Rice got his PhD for proving that any non-trivial semantic property of a program is Undecidable. This isn't "Oh no, it's quite hard to do this" it's a straight up mathematical proof that it can't be done. Deciding reliably whether a program has any† semantic property isn't possible. Sometimes we're sure, and that's fine, but the dilemma is for the tricky cases: What do we do when we're not sure?

IFNDR is C++ choosing "Fuck it, it's fine" for this case. Maybe your program is nonsense, it might do absolutely anything, but you don't get even a warning from the compiler. This is Chandler's "false positive".

Rust chooses the opposite. When the compiler can't see why your program is sense it will be rejected, even if you and a room full of compiler experts agree it should work too bad, it doesn't compile. You get a diagnostic explaining why your program was rejected.

† Trivial means either all programs have the property or none do and so isn't interesting. As a result the restriction to "non-trivial" properties isn't much help.


> OK. What is the value of a spec to which compliance is impossible?

It lets you tell people you have a spec? It makes it easy for compiler developers to dismiss bug reports with "your code violated the spec"?


Welcome to C.

But more seriously it's the job of the program to not do undefined things.


It's the job of a language designer to define everything.

C should do better about the things that could be readily defined, but there's no way to have arbitrary pointers and define everything.

> but there's no way to have arbitrary pointers and define everything.

What's the undefined behavior in assembly?


Assembly is kind of at the crossroads of everything being defined and nothing being defined, when you consider things like writing random data to memory and executing it... But anyway here's the first thing I found to answer that: https://news.ycombinator.com/item?id=9578178

Probably more important, way too many things in assembly vary by exact model. Can you name a portable language that fits those criteria?


>What is the value of a spec to which compliance is impossible?

Are you saying, what's the value of a language spec that allows undefined behavior, as C does?

Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

It's also difficult to define a formal system that won't have undefined expressions. Mathematics itself is full of them (in logic, "this sentence is a lie" has no truth value; you can't define the set of all sets, or the set of all sets that don't contain themselves; etc.).

That said, I think we've settled on a rather silly choice here with the "++" operator.

Personally, I'd do away with the ++ operator in either pre- or post- increment forms, or at least disallow it in arithmetic expressions.

The only thing it realistically accomplishes is saving a few characters when writing a for-loop in C.

Even for that it's not necessary.

The problem with it is that, unlike normal arithmetic operators, it both returns a value and assigns one, which means that you can assign values to several variables in a single arithmetic expression, as in

     a = b++;
...which C, in general, allows, as in:

     a = (b = b + 1);
The result of these two expressions, of course, is different.

Now, I have the following religious belief, and it's that arithmetic operators shouldn't have side effects. That's to say, assignment and evaluation should be separate.

So that when I write

     x = (arithmetic);
..I could be sure that the only outcome of this computation is changing the value of x.

Perhaps calling the function sqrt(x) would summon Cthulhu — I'll read the documentation for it to be sure. But in general, I'd hope that calling abs(x) wouldn't change the value of x to |x| in addition to returning it.

But K&R decided to have fun by saying that "x = 5;" is both an assignment and an expression with a value. Which allows one to write:

      x = y = z = 5;
as a parlor trick.

That's it, that's the only utility.

Instead of defining this as a special initialization syntax and otherwise disallowing it (as Python does), they went YOLO and made assignment an expression rather than a mere statement.

Which means that the very useful statement "increase the value of this variable by one" became two expressions with different values.

In an ideal world, the following would be equivalent, and would not evaluate to anything you can assign to a variable:

     ++x;
     x += 1;
     x = x + 1;
...while "x++" would not exist at all (or would be equivalent to ++x).

And that's how it is in Go. Thompson fixed the design mistake after 4-5 decades of it giving everyone headaches.

Sadly, C++, Java, C# all wanted to be "like C" in basic syntax, so we're stuck with puzzles like this to this day.

TL;DR: if you're asking "what's the value of the spec that makes assignment an expression", i.e. why is making "a = (b = c + d);" valid syntax a good idea, the answer is:

It isn't. It's a bad decision made in 1970s that modern languages like Go no longer support.


Assigning to multiple variables in a single expression is fine and useful. Take

    target[i++] = source1[j++] + source2[k++];
That's idiomatic; it shows the intent to read and consume the value in a single expression. You can write it longer, but not more clearly.

It's only when you assign to the same variable multiple times, or read it after it was assigned, that it introduces ordering issues.

A single `i++` or `++i`/`i += 1` is safe and useful.


>A single `i++` or `++i`/`i += 1` is safe and useful

Sure, and you don't need the assignment to be an expression with a value for it to be useful.

>target[i++] = source1[j++] + source2[k++]; That's idiomatic

That's idiomatic to C for sure.

Also idiomatically horrible. Why are you using three index variables here?

>You can write it longer, but not more clearly.

    target[i] = source1[i] + source2[i];

    i++;
This is absolutely more clear to any sane person, and less prone to error.

You can't forget to increase one of the indices when all three are meant to go in lockstep.

It's longer by one semicolon, and requires far less cognitive overhead to parse.

There's a reason why they did away with it in Go. What do you think that reason was if it's so useful?


> Why are you using three index variables here?

> You can't forget to increase one if the indices when all three are meant to go in lockstep.

Obviously they are not in this example.

The next line might contain:

    i++; j *= 42; k = srandom (k), random ();

> Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

> It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

I can see the first of these. The second appears to be untrue; if you removed the concept of undefined behavior from C, it wouldn't get farther away from the hardware.

Is that first point actually something that somebody wants? Who benefits from the idea that it's easy to write a "standards-compliant" compiler, because you are technically "standards-compliant" whether you comply with the standard or not?

At that point, you've given up on having a standard, and the interviewers Susam calls out, who say that the correct answer is whatever their compiler says it is, are correct in fact. Susam is the one who's wrong, for reading the standard.

You can run a language that way just fine. I had the impression that Perl was defined by a reference implementation. But it's the opposite of having a standard.


>The second appears to be untrue; if you removed the concept of undefined behavior from C, it wouldn't get farther away from the hardware

My understanding is that even common CPU instruction sets can have undefined behavior[1].

When C was written, the CPU architectures were more of a Wild West. It might have made sense to leave some parts up to the compiler authors on a particular architecture.

>Is that first point actually something that somebody wants?

When C was written — absolutely.

Portability of C code is almost taken for granted these days.

Things were different then. Portability was a big challenge.

All that said, this is my non-authoritative understanding of the reasons why it's a thing. Take it with a grain of salt.

>At that point, you've given up on having a standard

Sure. Just treat C as a family of languages which have a common standardized part.

Proprietary compiler extensions are/were common anyway, so that's not an unusual situation.

[1] https://www.os2museum.com/wp/undefined-isnt-unpredictable/


I searched K&R to see if there is any language that implies a += a++ + a++ to be undefined. I couldn't find anything. I found the following excerpt which is closest to what I claim, in spirit. But still, it does not explicitly spell out that an object must not be modified more than once between sequence points. From § A.7 Expressions:

> The precedence and associativity of operators is fully specified, but the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation. However, each operator combines the values produced by its operands in a way compatible with the parsing of the expression in which it appears. This rule revokes the previous freedom to reorder expressions with operators that are mathematically commutative and associative, but can fail to be computationally associative. The change affects only floating-point computations near the limits of their accuracy, and situations where overflow is possible.

So I think, the text in K&R serves as warning against writing such code, at best. The C99 draft has more relevant language. From § 4. Conformance:

> If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

This along with the § 6.5 excerpt already mentioned in my post implies a += a++ + a++ to be undefined. When I get some more time later, I'll make an update to my post to include the § 4. Conformance language too for completeness.

Thank you for the nice comment!


On one hand, I used almost this exact statement 25 years ago in my Flash (ECMAScript) tutorials to drive home the point about operator precedence.

I still believe it's a good slide if you want to teach. It's an easy trap to fall into, easy to grasp, and easy to unroll all the rules from (that is, if the rules are actually set in stone).

On the other hand, I've been through a couple of FAANG interviews, and twice I was presented with something similar; after I had glanced at it for half a minute, the interviewer quickly proceeded to "aha, you don't know! The interview is over, but I'm happy to tell you the right answer".

That part is not cool.


The answers to some questions must be known in order to be able to write a correct program.

In the vast majority of the programming languages, the order of evaluation for the actual parameters passed to a function is undefined. In the few programming languages where the order of evaluation is defined, that is actually a mistake in the design of that programming language.

This is something about which any programmer must be well aware, because when composing function invocations it is very easy to write a function invocation where the result would depend on the order of evaluation of the expressions passed as actual parameters. The arithmetic operators are also function invocations, so that applies to them too.


> when composing function invocations it is very easy to write a function invocation where the result would depend on the order of evaluation of the expressions passed as actual parameters

this simply means your functions aren't pure functions and are performing side effects. If you rewrite those functions to not have side effects (including the ones used to generate the parameters), there would be zero issues of this nature.


In some sense, and without the interviewer knowing, that is actually a great scenario for an interview.

If you can convince someone in a position of authority that they’re wrong about something technical without upsetting them then you’re probably a good culture fit and someone who can raise the average effectiveness of your team.


Or, also, in the reverse direction, if the interviewer is wrong about it and can't be convinced otherwise, it's probably not a great place to work.

I know I did recommend someone after the interview because I looked it up and they were right. Great person to work with. Though I fully understand why most would hesitate.

The best interview questions spawn discussions. This is a pretty good one for that. We could dive into what makes it UB, why a particular compiler might do it a certain way, what results we'd likely see from other compilers, and why the standard might say that this sort of thing is UB.

"What does this produce?" and expecting an answer of "17" is a bad question even if UB didn't mean the expected answer is wrong.


I don’t work a ton with C, but I wonder how C programmers keep track of what behavior is and is not defined. It seems like there are many possible edge cases.

They don't. In the culture some kinds of undefined behaviour are taken seriously and some aren't. If you want to write code that "works", you emulate what popular performance benchmarks do (whether their code is undefined according to the standard or not), since those are the thing that C compiler developers actually care about.

Personally, whenever I write a modifying statement, I think about the domains of the input and ensure that the condition necessary to stay within the expected range is evaluated. If it is not, I either write the condition, reduce the input domain, or increase the output domain.

We get by on a combination of matching patterns (any pointer cast gets a lot of scrutiny, for example), compiler warnings, tools like UBSan, debugging when things go wrong, and sheer dumb luck.

Having an understanding of how the code gets transformed into machine code helps. For this case, there's the basic idea that `a++` will boil down to three basic conceptual operations: fetch, add, and store, and those can potentially be interleaved with other parts of the statement. In something like `a++ + ++b` the interleaving doesn't affect the outcome no matter how it's done. In `a++ + ++a` the interleaving can affect the outcome, and that's your sign that something might be wrong.

Any memory safety issue in C code had to involve UB at some point. And you can see how prevalent those are, and deduce how not-particularly-great we are at keeping track of UB.


> Having an understanding of how the code gets transformed into machine code helps

I'm not sure about that. Knowing assembly is not a substitute for knowing how the language is defined. Sometimes C/C++ programmers with some assembly knowledge reason themselves into thinking that what they're asking of the language must have well-defined behaviour, when in fact it's undefined behaviour. It doesn't really matter whether interleaving order can change the output. (++i)++ is, apparently [0], undefined behaviour in C but has well defined behaviour in C++.

[0] https://stackoverflow.com/a/58841107


I don't mean assembly in this case, but something more like the compiler's view of the code. a++ can be broken down into more primitive operations, and might actually be, depending on how the compiler is implemented. The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.

They don't really. In fact there are many things that are technically UB but are so common that compilers can't really treat them as UB. E.g. type punning via unions.

Type punning via unions is not UB in C in general, but it is in C++ IIRC.

I write "in general" because, as with other forms of memory reinterpretation (memcpy or copy through a character type), evaluating a trap representation triggers UB.


The short version is that it's fine in C++ as long as you only read the member that was last written to or a char type.

And a slightly longer version is that there are three types involved: the type of access, the effective type of the object[0], and the declared type of the variable. The declared type of the variable is only there for the compiler to emit warnings; as long as the effective type and the type of access are equal, it isn't UB.

[0] the C meaning of an object, not the C++ one


Yeah, undefined behavior just means behavior on which the specification imposes no requirements at all.

I would argue that most languages only have one compiler so it doesn't matter what is in the specification.


Do you want a job at a place where someone who doesn't understand UB makes the hiring decisions?

Sometimes, even in tech, you just need a job.

In the land of the blind, the one eyed man is King.

I think your options are very limited if you look for places that have people that truly understand UB, even less so the hiring people.

Genuinely curious, so this is undefined behavior and depends on the compiler. I get that. Java, and other languages, can do these same operations but their compilers produce bytecode that runs on a virtual machine (JVM) compiled to machine code just-in-time. Would this same code in Java possibly yield different results based on the platform the JVM was running on because of the platform specific JIT compiler? Maybe that's part of the origin of the phrase "write once, test everywhere".

The UB comes from how the C++ standard defines expression sequencing, which is not relevant for Java. Languages other than C and C++ typically define such details more strictly, so there is no UB, or even a concept of UB. JIT compilers don't change this: any non-toy JIT will generate native instructions directly or through an intermediate representation (instead of generating C++ text and passing it through a regular C++ compiler), both of which have much stricter semantics than what C++ guarantees.

> Would this same code in Java possibly yield different results based on the platform the JVM was running on because of the platform specific JIT compiler?

No, and it's also well defined in languages like C#.

If we're talking about this specific example at least. No sequence point issues like that in Java.


I'd be badly surprised if the JVM JIT went through C, so if this monstrosity is well defined in Java, it's well defined everywhere: "write once, well defined everywhere".

but still, if it were, it was and remained, as gp points out, bad practice...


It's been quite a while, but IIRC, in Java these statements actually do have a defined behavior.

The ++x is a "pre-increment", meaning the value of the variable is incremented prior to evaluating the expression, while the "post-increment" "x++" is the other way around: the expression evaluates to x, then x is incremented afterwards.

All expressions are evaluated left-to-right.


That behavior is inherited from C. The pre/post increment behavior is actually the same in every language that uses them. The priority of operation is also usually the same as well.

The reason the question is tricky is because those operators change the value of a as the full expression is progressively executed.

It's not immediately clear to me what the answer in Java would be.

Just take a++ + ++a for example:

If the value of `a` is hoisted by the JVM then it could be 5++ + ++5, so 5 + 6.

But if it's executed left to right and `a` is looked up every time, then it becomes 5++ + ++6, so 5 + 7.


The value of the variable is not hoisted by the Java compiler. (It's not the JVM that matters here; the JVM only executes the byte code, which doesn't have that kind of ambiguity.)

The semantics of Java is not undefined on multiple assignments to the same variable in an expression, so it can't hoist something if it would change the outcome.

Now, I don't actually know what the outcome is, because I don't remember whether `a += e` reads the value of `a` before or after evaluating `e`. The code is still confusing and unreadable to humans, so you shouldn't write it, but the compiler behavior is not undefined.

And if your variable is accessed from multiple threads, it may be undefined which intermediate values might be seen.


    $ cat a.java 
    class a {
        public static void main (String[] args) {
            int a = 4;
            int b = a++ + ++a;
            System.out.println(b);
            System.out.println(a);
        }
    }

    $ javac a.java 
    $ java a
    10
    6

> I found such questions quite annoying because most interviewers who posed them seemed to believe that whatever output they saw with their compiler version was the correct answer.

Other than that, the job for most programmers has nothing to do with knowing the outcome, because hopefully they'd never write something like it, or they'd clean it up. And IF they found it, they'd hopefully test it, given that it appears to be compiler dependent anyway.


Both major compilers yell at you for this nowadays... it's pretty unforgivable IMHO for somebody to be asking it as an exam or interview question if the right answer isn't "undefined":

    <source>:5:10: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
        5 |     a = a++ + ++a;
          |          ^    ~~


    <source>:5:7: warning: operation on 'a' may be undefined [-Wsequence-point]
        5 |     a = a++ + ++a;
          |     ~~^~~~~~~~~~~

How many tennis balls can fit in a bus?

Under what pressure?

obviously under the maximum allowable pressure that each surface of the bus can withstand.

> I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?

Are you referring to the type of interview questions where the question is ill-defined and no one should know the answer, or the type where the question is reasonable and well-defined, but the interviewer doesn't know the answer?

I had a phone screen with Google once where they asked how to determine the length of a stretch of contiguous 1s within an infinite array of 0s. I suggested that, given the starting index i, you can check the index i+2 and then repeatedly square it until you find yourself among the zeroes, after which you can do binary search to find the transition from ones to zeroes.

The interviewer objected that this will grow the candidate end index too quickly, and the correct thing to do is to check index i+1 and then successively double it until you find the zeroes. We moved on.

I passed that phone screen. But I still resent it, because I checked the math later and "successive squaring followed by binary search" and "successive doubling followed by binary search" take exactly the same amount of time.


I meant the latter. I think the question is fine. It can lead to a good discussion, similar to what we are having in this thread. It has been a long time (almost 20 years), but I remember that most interviewers who asked this seemed to be convinced that the output they had seen with their compiler version was the correct answer. What could be a nice and relevant discussion, especially considering that some classes of bugs and security issues result from it, was seen only as a trivia quiz by the interviewers, with the expectation of an answer that was incorrect, no less.

Your phone screen story is quite nice. When I read your question, I would have answered with successive doubling as well. In fact, I faced the same question at an AWS interview a long time ago. The question was mathematically the same question but formulated differently. I answered with the doubling solution too, which leads to an O(log n) time solution, asymptotically. Your interviewer's immediate objection to your squaring solution seems like a major failure in their intuition. When I read your solution, purely by intuition, that is, without resorting to any rigorous reasoning, I felt: wow, that's interesting, your solution would land on the zero region in merely O(log log n) time. Why didn't I think of it? I think your solution should spark interest rather than dismissal in a curious person. Of course, the binary search after that to find the exact transition point blows up the time consumed back to O(log n).

Once again, thanks for these really interesting comments!


From first principles, it seems unlikely that interviewers selecting their own questions would be able to eliminate this class of question, since by definition they cannot know whether the answer they believe is correct really is correct or not.

I would be 100% behind a movement to replace interviewer freedom with externally-set, vetted questions.


Heh, one time when I got this style of question[1] (but for JavaScript), I took a glance at it and said "Um ... you really shouldn't write code like that." The interviewer replied, "Oh. Yeah. Fair point." And then went on to another question.

[1] By which I mean predicting the behavior of error-prone code that requires good knowledge of all the quirks of the language to correctly answer.


The interviewer asking stuff like that is a good sign to leave immediately.

Maybe the interviewer seeks to hear something like "This is UB; this code needs to be rewritten and should not pass code review. What prevents you from using -Wall when compiling?"

>I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?

I just refuse to do interviews like that any more.


Well... tried it on macOS using vanilla gcc, the results surprised me:

  $ /bin/cat x.c; gcc -w -o x x.c; ./x
  #include <stdio.h>
  
  int main()
  {
      int a = 5;
      a += a++ + a++;
      printf("a = %d\n", a);
  }
  a = 18
Not what I expected. This must be how it works:

- The first a++ expression results in 5, after which a = 6
- The second a++ expression results in 6, after which a = 7
- Only then is the LHS a evaluated for the addition-assignment, so we get: a = 7 + 5 + 6 = 18


the original question has a=, you have a+=

They're using the version from the top comment, not the post. It also switches the pre-increment to post-increment.

I was just replying to the comment ¯\_(ツ)_/¯

What's the reason that C didn't define the order of this?

The horrible undefined behavior of signed integer overflow can at least be explained by the fact that multiple CPU architectures handled it differently (though the fact that C drags you into its ill-defined signed integers even when you're using unsigned ones, by promoting to a signed int when left shifting a uint16_t by a uint16_t for example, is not as forgivable imho)

But this here is something that could be completely defined at the language level, there's nothing CPU dependent here, they could have simply stated in the language specification that e.g. the order of execution of statements is from left to right (and/or other rules like post increment happens after the full statement is finished for example, my point is not whether the rule I type here is complete enough or not but that the language designers could have made it completely defined).


The short answer is because C was designed to give leeway to really dumb compilers on really diverse hardware.

This isn't quite the same case, but it's a good illustration of the effect: on gcc, if you have an expression f(a(), b()), the order that a and b get evaluated is [1] dependent on the architecture and calling-convention of f. If the calling convention wants you to push arguments from right to left, then b is evaluated first; otherwise, a is evaluated first. If you evaluate the arguments in the right order, then as each one is computed you can immediately push it on the stack; in the wrong order, the result is now a live value that needs to be carried over another function call, which is a couple more instructions. I don't have a specific example for increment/decrement instructions, but considering extremely register-poor machines and hardware instruction support for increment/decrement addressing modes, it's not hard to imagine that there are similar cases where forcing the compiler to insert the increment at the 'wrong' point is similarly expensive.

Now, with modern compilers using cross-architecture IRs as their main avenue of optimization, the benefit from this kind of flexibility is very limited, especially since the penalties on modern architectures for the 'wrong' order of things can be reduced to nothing with a bit more cleverness. But compiler developers tend to be loath to change observable behavior, and the standards committee unwilling to mandate that compiler developers have to modify their code, so the fact that some compilers have chosen to implement it in different manners means it's going to remain that way essentially forever. If you were making a new language from scratch, you could easily mandate a particular order of evaluation, and I imagine that every new language in the past several decades has in fact done that.

[1] Or at least was 20 years ago, when I was asked to look into this. GCC may have changed since then.


gcc used to do this back in the day. Parameter expressions left to right on x86, and right to left on Sparc. I spent a week modifying a bunch of source code, removing expressions with side effects from parameter lists, into my own temporary variables, so that they would all evaluate in the same order.

I'd say it's more like C was designed from really dumb compilers on really diverse hardware. The standard, at least the early versions of it, was more to codify what was out there than to declare what was correct. For most things like this in the standard, you can point to two pre-standardization compilers that did it differently.

Kind of both? There were pre-standard compilers, but when they created the standard, they tried to make it so that one could write really dumb compilers and still fulfill the standard.

I suspect it was also the case that if they didn't make it easy for platform vendors to implement compilers then they wouldn't do it.

Sethi-Ullman register allocation reorders subexpression evaluation to achieve efficient register allocation: https://dl.acm.org/doi/10.1145/321607.321620

With modern register allocators and larger register sets, code generation impact from following source evaluation is of course lower than it used to be. Some CPUs can even involve stack slots in register renaming: https://www.agner.org/forum/viewtopic.php?t=41

On the other hand, even modern Scheme leaves evaluation order undefined. It's not just a C issue.


Applying the increment or decrement operators to the same variable more than once in the same expression should be a compile-time error.

Anyway, yes, this one example has an obvious order in which it should be applied. But still, something like it shouldn't be allowed.


> Applying the increment or decrement operators over the same variable more than once on the same line should be a compile-time error

That would be nice, but don't forget the more general case of pointers and aliasing:

    int a = 5;
    int *pa = &a;
    printf("%d", (a++ + ++*pa));
The compiler cannot statically catch every possible instance of a statement where a variable is updated more than once.

Well, aliased updates are undefined behavior already.

Not in C, unless at least one of the pointers were marked `restrict`.

Honestly, having increment in expressions rather than a statement feels like more of a footgun than a benefit. Expressions shouldn't mutate things.

I think the history of this is that these operations were common with assembly programmers, so when C came along, these were included in the language to allow these developers to feel they weren't leaving lots of performance behind.

Look at the addressing modes for the PDP-11 in https://en.wikipedia.org/wiki/PDP-11_architecture and you'll see you can write (R0)+ to read the contents of the location pointed to by R0, and then increment R0 afterwards (so a post increment).

Back in the day, compilers were simple and optimisations weren't that common, so folding two statements into one and working out that there were no dependencies would have been tough with single pass compilers.

You could argue that without such instructions, C wouldn't have been embraced quite so enthusiastically for systems programming, and the world would have looked rather different.


Additionally, those indirect memory instructions ended up disappearing because it complicated virtual memory implementations. It was a pain in the ass to describe the multiple places in memory an instruction could be accessing and which actually faulted to a fault handler, not to mention having to roll back all that state on more complex designs.

I worked on a more recent custom AI ISA that had that too. Pretty neat; I'm surprised it's not more common. I guess it doesn't matter so much now that memory is so much slower than ALU ops.

Python recently went the other way and added an assignment expression. I actually wish more languages would go further and add statement expressions instead of having to imitate them with IIFEs.

C just wouldn't be C without things like a[i++]


If the past few weeks of CVEs indicate anything, it's that C being C maybe isn't a good thing...

Those things are for pointer golf and writing your entire logic inside the if statement.

Both are favorite idioms of C developers. And they are ok if done correctly, clearer than the alternative. They are also unnecessary in modern languages, so those shouldn't copy it (yeah, Python specifically).


int d = foo ? bar() : baz();

I think if anything people have been leaning more and more into expressions over statements, because when everything is an expression you end up being able to walk the gradient of complexity a bit more nicely than when you end up with a thing that just has to be broken down to a bunch of statements.


Expressions are nice specifically because they don't tend to mutate things. The ternary operator is not at all the same as `a++` because you have the assign the result.

In any language where the practice of iteration isn't achieved via C-style for-loops, having an operator devoted to increment just doesn't make sense (let alone four operators, for each of pre/post-increment/decrement). This is one of those backwards things that just needs to be chucked in the bin for any language developed post-2010.

When used well it makes for compact readable code. I don't see what it has to do with for loops or operators specifically. For example you can do the same in scheme while iterating by means of tail recursion.

> I don't see what it has to do with for loops or operators specifically.

The reason that these operators pull their weight in C is because iteration over arrays is achieved by manual incrementation (usually via the leading clauses of the for-loop) followed by direct indexing. Languages with a first-class notion of iteration don't directly index in this way, which overwhelmingly eliminates not only the vast majority of array indexing operations in codebases but also the need to manually futz with the inductive loop variable. Case in point, Rust doesn't have `++` in any form, and it doesn't miss it, because Rust has first-class iteration; on the then relatively rare occasion where you want do want to increment, you can do `+=1`, which doesn't have the footguns of `++` due to assignment being a statement rather than expression, while leading to a simpler language due to leveraging the existing `+=` syntax rather than needing a whole new set of operators.


For loops are hardly the only usecase and built in iteration constructs frequently fall short. For example any mildly complex loop that involves pointer juggling can benefit.

> which doesn't have the footguns of `++` due to assignment being a statement rather than expression,

So then I implement the local equivalent of inc( v ) and ... same issue, right? Plus with rust macros is there any technical reason you can't trivially implement ++ for yourself? That's the case for most lisps that I touched on earlier.


> For example any mildly complex loop that involves pointer juggling can benefit.

I'd say that when you're writing a mildly complex loop that involves pointer juggling, one should prefer to be defensive and explicit rather than cleverly trying to compress everything into one-liners.

> So then I implement the local equivalent of inc( v ) and ... same issue, right?

This isn't done in Rust because there's no benefit. It's rare to find an occasion where it's necessary to do something tricky enough to forego using iterators, and when working with raw pointers Rust code just plain doesn't use basic addition for pointer arithmetic; instead it has a variety of pointer arithmetic methods for being explicit about the desired semantics (e.g. ptr::add, ptr::offset, ptr::wrapping_add, etc).

> Plus with rust macros is there any technical reason you can't trivially implement ++ for yourself?

There's not, but people might look at you sideways. Here, I implemented it for you: https://play.rust-lang.org/?version=stable&mode=debug&editio... . It expands to nested blocks with internal assignments, which results in a well-defined semantics following the defined order of evaluation in Rust.


In Rust you hide all kinds of error prone iterations behind the "iterator" interface. Both the "for(int x=0;..." and the "while(list[i++])" are implemented at the standard library.

People tend to use FP abstractions for the "x[i++] = f(y[j++])" though, not iteration.


I always hate C-style for-loops because even though I learned C over 40 years ago, I can never remember whether the increment comes before the test or the test comes after the increment. Fortunately, modern IDEs let me continue to be ignorant on those occasions when they're necessary (usually because I need the index for some reason).

Then we wouldn't have pearls like while (*dst++ = *src++);

It's valuable for compilers to be able to choose the instruction scheduling order. Standards authors try not to unnecessarily bind implementors. If post increment happened after the full statement is finished, then the original value has to be maintained until the next sequence point. Maybe the compiler will be smart enough to elide that, maybe not, but it's a lot more difficult to fix those kinds of edge cases than to say sequencing is undefined.

But this is not valuable if it results in different numerical results, and I think that will always happen if ++ is executed at different times; there's no point in a compiler optimizing pointless code that can silently give different results elsewhere

The same rule which makes the evaluation order of a++ + ++a unsequenced also applies to (x+y+z+a+b+c) where x,y,z could be any expression (in a sane case on separate variables and without mix of pre/post increments). Breaking questionable code and introducing UB where reordering changes result is just a side effect of this.

Just switching between left to right or right to left wouldn't be that useful but it also permits to interleave the subexpression evaluation. Grouping memory fetches/writes, taking into account how many execution units and registers of different kinds a CPU has can have some performance benefits.

For example if you have something like `++a[0] + ++a[1] + ++a[2] + ++a[3]` instead of evaluating each increment one by one both GCC and Clang will vectorize it loading all 4 values from memory using single simd instruction, incrementing and then writing result back to memory. And if you add fifth one (but not 8) which needs to be handled using regular instruction, that will be done after the first 4. If standard defined that left subexpression of addition is fully evaluated before the right expression that wouldn't be allowed.


> If standard defined that left subexpression of addition is fully evaluated before the right expression that wouldn't be allowed.

I'm no expert, but surely this would still be allowed so long as the compiler can prove that incrementing a[0] has no effect on the value of a[1]?


Your compiler does many optimizations that break numerical reproducibility, especially in floats. I reviewed a PR the other day that wrote X = A*B + (C*D) + E;

And when I checked 3 different compilers, each of them chose a different way to use FMAs.

Even with integer math, you can get different numerical results via UB (e.g. expressions with signed overflow one way and not another).


Floating point reproducibility and cross platform determinism requires strict adherence to the IEEE standard and disabling of fused instructions.

The point being that there are many other places where reproducibility fails to hold, especially when optimizations are involved. The standard doesn't mandate a way to disable contraction, nor the existence (or absence) of -ffast-math and others. They're simply different, legal compilations within the broad scope allowed under the standard.

It would only make a difference in cases that are currently UB, so there is no program valid under current C that would be pessimized by this change.

It's a language feature that was in K&R, and the rules around sequencing were introduced in C89. There were good reasons to believe that defining the order would pessimize code in the following decades. Dennis Ritchie himself pointed out that Thompson probably added the operators because compilers of the time were able to generate better code that way.

The C standard doesn't define things where two or more historical compilers disagreed and there wasn't an obviously correct way. This is defined behavior (left to right, assignment last) in Java, which is a different language.

Probably because when C was standardised there were already multiple implementations, and this was an area where implementations differed but it wasn't viewed as important enough to bring them in line with one approach.

The only other reasonable option is to make such garbage a compile-time error. There is no reasonable definition of what code like that should do, and if you write it in the real world you need to find a better job fit. I'd normally say McDonald's is hiring, but they don't want people like that either

> What's the reason that C didn't define the order of this?

The OP article provides experimental details but annoyingly does not give the big picture w.r.t. C language specifications (provided in the links though).

There are three concepts at interplay here, at the root of the problem: 1) expressions (each evaluates to a single value), 2) statements (tell the computer to perform an action), and 3) sequence points (specific moments during execution when all previous side effects are guaranteed to be complete).

It is the sequence points during the evaluation of expressions which is important to understand here. From https://en.wikipedia.org/wiki/Sequence_point;

In C and C++, a sequence point defines any point in a computer program’s execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed. They are a core concept for determining the validity of and, if valid, the possible results of expressions...

1) An expression's evaluation can be "sequenced before" the evaluation of another expression. (Equivalently, the other expression's evaluation can be "sequenced after" that of the first.)

2) The expression's evaluation is "indeterminately sequenced", meaning that one is "sequenced before" the other, but which is unspecified.

3) The expression's evaluation is "unsequenced", meaning the operations in each expression may be interleaved.

The "Order of Evaluation" states; (from https://en.cppreference.com/c/language/eval_order)

"Order of evaluation of the operands of any C operator, including the order of evaluation of function arguments in a function-call expression, and the order of evaluation of the subexpressions within any expression is unspecified (except where noted below)."

The "Single Update Rule" states; (from https://www.accellera.org/images/eda/sv-bc/0282.html)

Between consecutive "sequence points" an object's value can be modified only once by an expression. The C language defines the following sequence points:

       Left operand of the logical-AND operator (&&).
       Left operand of the logical-OR operator (||).
       Left operand of the comma operator.
       Function-call operator.
       First operand of the conditional operator.
       The end of a full initialization expression.
       The expression in an expression statement.
       The controlling expression in a selection (if or switch) statement.
       The controlling expression of a while or do statement.
       Each of the three expressions of a for statement.
       The expression in a return statement.
Putting all of the above together in OP's code snippet; The "single update rule" fails for the expression since the variable a is modified multiple times between two consecutive sequence points and hence the result is UB.

For more detailed explanations, see Angelika Langer's Sequence Points and Expression Evaluation in C++ - https://angelikalanger.com/Articles/VSJ/SequencePoints/Seque...


> What's the reason that C didn't define the order of this?

I didn't open TFA but my first thought was "Is this even defined?".

It kinda makes sense that such fucktardedness could be left undefined.


It's defined. And called "operator precedence", both post/pre-increment have a higher precedence than the single "+".

At least according to this: https://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Exp...

I think the main confusion here comes from the fact that "a" is just a value, not a pointer, where it matters whether the value/address which the pointer points at is accessed before or after the increment of the pointer's own 'value'.

Anyway… my C skills are rusty. Maybe I get it wrong. :) In any case I would always use brackets to avoid any ambiguity in constructs like this.


Nope. Order of evaluation and operator precedence are completely unrelated. They should have been defined to be the same, but instead order of evaluation was left undefined. So if you write ++a + a++, operator precedence means this will be interpreted as (++a) + (a++), not say ++(a + a)++, but it is up to the compiler whether to execute ++a or a++ first, rather than executing them left to right.

Sometimes it helps to test. Which I just did. :-)

Actually the compiler (at least clang) warns about this:

    $ gcc -W -Wall test.c -o test
    test.c:8:7: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
            a = a++ + ++a;
                 ^    ~~
    1 warning generated.
The undefined behaviour stems from the fact that "a" is modified multiple times between "sequence points" (so it's irrelevant to the actual problem whether this happens with ++, --, pre- or post-increment, or in which order). We can only modify the variable safely once on the right side without entering bizarro world.

A construct like this certainly can be confusing.


I was hoping this article would conclude with, “and the C language spec in K&R says THIS which is the correct answer”. Apparently not. So the appendix in K&R is ambiguous? And yet we use ++ so often! I can see people crawling the Linux source tree using LLM-bots looking for bad uses of ++ …

I had to fight through school and university in India with my teachers who believed these were legit questions to ask in written exams. Can't 100% blame them since almost all standard-issue textbooks had them and claimed they'd give predictable output. I thought the same until I noticed the weirdness when running them across different compilers and after I read about UB, sequence points and similar quirks in books that are not total garbage.

Luckily, I ended up with smug smiles in all those cases after showing them the output from different compilers.


But what answer did they expect? A specific number or that it's UB?

> The interesting thing here is the Undefined Behavior (UB), well... actually two UBs, thanks to which there are three possible correct answers: 11, 12 and 13.

No, if you invoke undefined behavior any result at all is possible.


Hey! Author here :)

So let me start by saying that that blog post was written 15 years ago and I don't even remember the details of it and what I've written there. But, I have a hot-take on this topic you've touched on!

From a programmer perspective, you are absolutely right. The behaviour is undefined, end of discussion. A programmer should never rely on what they observe as the effective behaviour of an UB. A programmer must avoid creating situations in code that could result in the execution flow venturing into the areas of UB. And - per C and C++ standards - results of UB can be anything (insert the old joke about UB formatting one's disk being a formally correct behaviour).

However, I'm a security researcher, and from the security point of view - especially on the offensive side - we need to know and understand the effective behaviours of UBs. This is because basically all "low-level" vulnerabilities in C/C++ are formally effects of UBs. As such, for the security crowd, it still makes sense to investigate, understand, and discuss the actual observed effects of UBs, especially why a compiler does this, what are the real-world actual variants of generated code (if any) for a given UB for this and other compilers, how can this be abused and exploited, and so on.

My point being - there are two sides to this coin.


Agreed.

As a programmer, the solution to "int a = 5; a = a++ + ++a;" is to decide what result you want, write code that will produce that result, and probably pass options to the compiler that tell it to detect this kind of problem and print a warning. (On my system, the result happens to be 12; if that's what I want, I'll write "int a = 12;".)

But if you have an existing program that includes that code, it can be useful to look into the actual behavior (for all the compilers that might be used to compile the code, with all possible options, on all possible target systems). Fixing the code should be part of that process, but you might still have running systems with the old bad code, and you need to understand the risks.

But producing some numeric result is not the only possible behavior, even in real life. Compilers can assume that the code being compiled does not have undefined behavior, and generate code based on that assumption. The results can be surprising.

As for formatting your disk, that's not just a theoretical risk. If a program has enough privileges that it can format your disk deliberately, it's possible that it could do so accidentally due to undefined behavior (for example, if a function pointer is corrupted).


> My point being - there are two sides to this coin.

No, you're simply wrong. UB means that anything can happen. And from a security perspective, that is vital to understand.

The only proper response to this code (or similar UB due to ambiguous sequence points) if found in production is to rewrite it and fire or reeducate the author.

Sorry, but some people just aren't competent.


Actually, I do think I'm right ;)

There are two layers to this. On the formal, C and C++ standard-lawyering layer, UB can have any result. I of course agree with this as per my previous comment.

However, the compilers are an actual implementation, and actual implementations do things in deterministic ways (even if randomness is involved, realistically it is limited to a certain set of outcomes). As such, in case of UBs it's not "anything can happen" - there is actually a limited set of things that can happen.

And I do believe you've missed the "especially on the offensive side" part of my comment. What you are saying about "if found in production is to rewrite it and fire or reeducate the author" is the defensive security perspective, not the offensive security one. From the offensive security perspective you aren't there to fix the code - you are there to exploit it and hack into the system / leak info / raise your privileges.


I feel we need another category - unspecified behavior. I think everyone would agree the compiler should put out ONE of those answers and that nasal demons would be out of spec.

The problem is that it’s not specified which should be picked, but all pick something.


I agree what you say seems reasonable at a glance. But (IIUC) the issue is that for optimization we want the compiler to assume that UB doesn't happen in order to constrain the possible code paths. So if it goes some distance down a possible execution branch and discovers UB it can trim the subtree. At that point "anything can happen" becomes an (approximate) reality.

The obvious counterpoint in this particular instance is that there's no good reason not to make such an awful expression a compile time error.

I also personally think that evaluation order should be strictly defined. I'm unclear if the current arrangement ever offers noticable benefits but it is abundantly clear that it makes the language more difficult to reason about.


As I understand it UB was not really intended to be for optimisation. It was so that C could compile on wildly different architectures that existed at the time.

Today we don't have nearly the variety of architectures, so in theory C doesn't need nearly as much UB (like more modern languages).

Although there is one modern case where C's "anything goes" attitude has actually helped: CHERI works pretty well with C/C++ even though pointers are double the size they normally are, because doing so many things with pointers is UB (I assume because of segmented memory). CHERI is a slightly awkward target for Rust because Rust makes more assumptions about pointers - specifically that pointers and addresses are the same size.


Which is a form of optimization- if you don’t require something that may be incredibly difficult on a given CPU it makes portability easier.

The reality is these are all edge conditions rarely encountered.


The C and C++ standards include "Implementation defined behavior", which means that a conforming implementation can do whatever it wants, as long as it specifically documents and sticks to that behavior.

This doesn't really help portability all that much.


That's a different category. The standard defines and uses all three: "undefined", "implementation-defined" and "unspecified" behavior. The difference between the last two is that for unspecified behavior the compiler isn't required to document the exact behavior. Unlike UB, triggering either doesn't automatically summon nasal demons, and the range of possible behaviors is usually described by the standard.

That already exists, and it is in fact called unspecified behavior. Order of function argument evaluation is unspecified, for instance.

With undefined behavior, a conforming compiler can do anything it wants at all, including generating a program that segfaults or something else.

But what often happens in practice is that "Bill's Fly-By-Night-C-Compiler-originally-written-in-the-mid-nineties" implemented it in some specific way (probably by accident) and maintains it as a (probably informal) extension. And almost certainly has users who depend on it, and can't migrate for a myriad of reasons. Anyway, it's hard to sell an upgrade when users can't just drop the new compiler in and go.

At the language level, it is undefined-behavior, and any code that relies on it is buggy at the language level, and non-portable.

Defining it would make those compilers non-conforming, instead of just dependent on defining something that is undefined.

Probably the best way forward is to make this an error, instead of defining it in some way. That way you don't get silent changes in behavior.

Undefined behavior allows that to happen at the language level, but good implementations at least try not to break user code without warning.

Modern compilers with things like UBSan make changing the result of undefined behavior much less of an issue. But most UB is also "no diagnostic required", so users don't even know they have it in their code without the modern tools.


> including generating a program that segfaults or something else.

UB = run nethack or Emacs:

https://feross.org/gcc-ownage/

We should have kept this behaviour. It would make UB a lot more unpalatable and easy to find.


Perhaps I'm just naive and/or have forgotten too much C, not that I knew that much, but I'm a bit perplexed as to why this is UB.

It seems like something that should trigger a "we should specify this" reaction when adding these operators, and there is at least one reasonable way to define it which is fairly trivial and easily implementable.


Yeah, like left-to-right as in JS for example.

The final value of a is that if you write this you are fired. It's worse than a racist joke.

The only point you can conclude out of these discussions, especially in an interview, that it doesn't matter what the answer happens to be on $CC and $ARCH but you wouldn't want anyone to write stuff like that in the first place.

Failing to recognize the dangers would be an instant fail; knowing that something reeks of undefined behaviour, or even potential UB, is enough: you just write out explicitly what you want and skip the mind games.


> The interesting thing here is the Undefined Behavior (UB), well... actually two UBs, thanks to which there are three possible correct answers: 11, 12 and 13.

There’s UB, so any answer is possible, isn’t it?


Hey! Author here :)

I'm going top-to-bottom through comments, and there was a similar question, so I'll link my answer here: https://news.ycombinator.com/item?id=48140821 (TL;DR: you are right, but there's another perspective on this)


If a behavior is undefined, the theoretical answer to this could be anything, including -123, 500, or 0. We are just lucky that the compilers choose a more sane version of undefined behavior in practice.

I have always hated this crap; the fact that I'm not 100% sure of the result of this suggests that maybe the ++ operator (pre or postfix) is something that should be avoided?

I don't do a lot of C anymore, but even when I did, I always would do increments on separate lines, and I would do a +=1, or just a = a + 1. I never noticed a performance degradation, and I also don't think my code was harder to read. In fact I think it was easier since I think the semantics were less ambiguous.


I also started doing this. I feel that "b = expr(a); a++;" expresses what I mean better than "b = expr(a++)": store expr(a) in b, then store a+1 in a. Any good compiler will optimize the same.

After separating a++ onto its own line, replacing a++ with a+=1 or a=a+1 comes down to personal taste in syntax sugar. I vote for a+=1.


Yeah exactly, especially for newer people.

I wouldn't be surprised if someone read `b = expr(a++)` to indicate that `a` is incremented, and then passed into `expr`, especially considering that it is within parentheses. The fact that it does it after passing it in is weird, and not obvious, at least not in my opinion. In my mind, there's no reason not to do what you suggested, or do the increment of `a` on the line before if you want the prefix.


The statement is valid C#, which has left-to-right execution order and no undefined behavior. The answer is 5 + 7 = 12.

Awk also says it's 12.

  awk 'BEGIN{a=5; a = a++ + ++a; print a}'
  12

https://www.gnu.org/software/gawk/manual/gawk.html#index-pre...

"When side effects happen is implementation-defined. In other words, it is up to the particular version of awk."


Android appears to use the One True AWK.

  :/ $ awk 'BEGIN{a=5; a = a++ + ++a; print a}'
  12

  :/ $ which awk
  /system/bin/awk

  :/ $ awk --version
  awk version 20240728

In my CS lectures, the algorithms professor used this pseudo-language when writing algorithms on the whiteboard:

  I <- I++
On the next hour another professor was giving lecture on C++ programming. I asked him the question: what would happen if we compiled

  i = i++

He went into some deep elaboration on it, but concluded that only an idiot would write like this...

Out of curiosity, I checked if gcc would optimize i = i++ out, and it does!

What can be optimized out depends on the context.

If you write:

    int i = 0;
    i = i++;
and never use the value of i, the declaration and assignment are likely to be optimized out. (The behavior of the assignment is undefined, so this is a valid choice).

If you print the value of i, the compiler can still optimize away the computation, but is perhaps less likely to do so.

The solution, of course, is not to write code like that. Decide what you want to do, and write code that does that. "i = i++" will never be the answer to "how do I do this?", and wouldn't be even if the behavior were well defined. If you want i to be 1, write "int i = 1;".


My expectation was none of the four presented. Evaluate left to right, a is five, post-increment, pre-increment, a is seven, 5 + 7 = 12. For right to left I would expect pre-increment, a is six, a is still six, post-increment, a is 7, overwrite with 6 + 6 = 12.

There is a possibility that the two increments, as well as the assignment, happen in parallel, and conflict at the bit level, resulting in a value that is neither a + 1, nor a + 2.

(When I say "possibility", I meant that it would not be nonconforming, not that I have in mind a specific implementation where such a result can be reproduced.)

A side-effected object may be modified at most once in one evaluation phase.

But this problem already occurs in something simpler like:

  b = a++ + a
where the problem is that a modified object is observed by a subexpression in the same evaluation phase, but that subexpression is independent of the side effect.

If a is updated in some piecewise, non-atomic way, then it's possible that the right side of the + obtains a half-baked snapshot. Say that a is unsigned and wraps from FF..FF to 00..00, but say this happens byte by byte. The right side of the assignment could access a torn value like FF..00.


For understanding this type of question, I highly recommend the C FAQ compiled by Steve Summit based on Usenet discussions in the comp.lang.c newsgroup.

You should start here:

https://c-faq.com/expr/evalorder2.html

I cannot recommend the C FAQ enough. It is written in an accessible way and contains proper references to textbooks and standards.

Disclosure: I was one of the contributors.


I am, thankfully, out of this craziness now but it was fun solving ton of such puzzles from Yashavant Kanetkar books while preparing for campus hiring interviews back in 2000. "Test Your C Skills" in particular. Fun times.

https://www.scribd.com/document/235004757/Test-Your-C-Skills...


"Test Your C Skills" is a published book by Yashavant Kanetkar, apparently published in 2005, and still available in paperback. The document you linked to appears to be a scan of a printed copy of that book, and is almost certainly in violation of copyright. The cover and the title and copyright pages are notably missing.

> apparently published in 2005

No. It was published in late 90s. As per this copy on Archive.org 1997

https://archive.org/details/testyourcskills00yash


That's worth being aware of, though somewhere around 20 years is where I start caring a lot less about copyright from a moral or practical point of view. Yeah, that book stays out of the public domain until the next century, but it shouldn't.

One of the things I like about Python is the lack of the increment operator.

That's why allowing ++, = and += in expressions is a language design mistake. They should be statements, with no possibility of result reuse. The same goes for = in branch conditions and loops.

Although Rust doesn't have either ++ increment operator, it does have both the assignment = and the add-assign += and they're both expressions because almost everything in Rust is an expression.

Crucially however these expressions have the unit type () aka the empty tuple as their result, ruling out hard to follow C like int a = 1 + (b = 2 + (c = d * 3));

Similarly the "Oops I wrote = instead of ==" gets so rare as to be negligible when you stop coercing everything to a boolean. In Rust only true is true and only false is false. So when you write if k = 5 rather than checking if k == 5 that's a type error. The expression k == 5 is a boolean, that would be fine, but the expression k = 5 is just () and that's neither true nor false it's just the wrong type.


Some C++ quiz with ++a and a++? It's always about sequence points, or rather the lack of sequence points.

It's the standard technical C++ blog post everybody seems to write.


As a guy who writes compilers, the answer to this is that it depends on the type of parser you are using.

LL and LR parsers generate different derivations, and as such it is deterministically non-deterministic, hence UB.


But you can still change the parser to output the expression in the AST (or otherwise) so it is evaluated left to right or right to left. Just that doing it in a way that is not natural for the algorithm will require extra code.

/sarcastic

This is how to keep simpletons out of your code base. Every numeric constant is defined in terms of a different lang quiz. Works well in JS as well of course.

  const DEFAULT_SELECTION = true + true
  const BASE_PRICE = 4 * parseInt(0.0000001)
  const BILLING_DAY_OF_MONTH = a++ + ++a

Ah the kind of undefined behaviour questions that incompetent Indian professors ask to assert their dominance on poor freshman.

Why do you need Java four times in tests? They are all the same.

The main, I would say defining, feature of Java is the absence of undefined behavior. Aka "write once, run everywhere".


Hacker News capitalising Int makes this question kind of confusing, because the question is meant to be about C++ behaviour but `Int` is not a standard C++ type. `Int` is not Java, because Java uses Integer, and `Int` is not C#, because C# uses the explicit bit-size Int types. I think maybe the only languages where the title makes sense are Scala or Swift.

I don’t have gcc available so I can’t test it, but I wonder what it does with

     int a = 5;
     int b = a++;
if it gives b==5 in this circumstance (which I would say is the correct value), then it seems that giving 13 for a++ + ++a is a bug in the compiler. I kind of feel like giving 6 as an answer would also be a bug in the compiler since postfix-++ should return the old value and then increment.

Your code:

    int a = 5;
    int b = a++;
has well defined behavior. The first line initializes a to 5. The second initializes b to 5 and sets a to 6. (The language doesn't specify the order of the two operations of assigning a value to b and incrementing a, but in this case it doesn't matter.)

Giving 13 for a++ + ++a is not a bug in the compiler. It's a bug in the code.

The correct answer to "what does a++ + ++a do" is "it gets rejected in code review and replaced with code that expresses the actual intent."


The trick here is that the original expression contains undefined behavior. Your example does not.

The trait of an experienced C developer is to avoid creating expressions such as this.

I love such puzzles! I used to use a lot of ternary operators in C++, but one day a friend of mine told me that I shouldn't nest ternary operators too much because the code becomes too complicated to read - he understood the code perfectly, he was just worried about younger programmers. Since then I have used longer versions of code instead of smart shortcuts - to improve readability.

++ should be banned, just like goto

Are you volunteering to update all the code that would be broken?

The a=13 was most surprising to me but in retrospect obvious and amusing.

I still don't understand why programmers seemed to get off on this sort of shit. Doing `while((*dest++ = *src++));` is great and all (maybe fine because it's kinda idiomatic now, but should you really be using that over `strncpy`?), but being clever like that in real code makes it harder to review, and harder to understand months down the line. I've mentally cussed out 'whoever wrote this confusing shit' only to `git blame` myself.

Does anyone know if these are already explained in Eskil Steenberg's dependablec.org?

The smart nerd will know precisely how to decode that line's results.

The wise nerd will not allow lines like it in their codebase, in the first place and, having seen one, will refactor it (probably involving more lines or parentheses) to make it more clear and easier to maintain.

The latter approach scales better, in the long run.


This is true. What's also true is that if that smart nerd works in cybersec, they'll feel right at home :)

(this is related to my other comment here https://news.ycombinator.com/item?id=48140821)


Tried it on https://www.onlinegdb.com/online_c_compiler Returns 12. If I were designing C, it would return 13. But then again, I'm an assembly programmer.

Godbolt: clang and gcc compilers give 12. msvc compilers yield 13.

    #include <stdio.h>

    int main() {
        int a = 5;
        a = a++ + ++a;
        printf("%d\n", a);
        return 0;
    }
x64 msvc v19.50 VS18.2 output:

    example.c
    ASM generation compiler returned: 0
    example.c
    Execution build compiler returned: 0
    Program returned: 0
    13
x86-64 gcc 16.1 output:

    ASM generation compiler returned: 0
    Execution build compiler returned: 0
    Program returned: 0
    12
armv8-a clang 22.1.0 output:

    <source>:5:10: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
        5 |     a = a++ + ++a;
          |          ^    ~~
    1 warning generated.
    ASM generation compiler returned: 0
    <source>:5:10: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
        5 |     a = a++ + ++a;
          |          ^    ~~
    1 warning generated.
    Execution build compiler returned: 0
    Program returned: 0
    12

> If you would like to test your compiler (posting back the results in the comments is really appreciated, especially from strange/uncommon compilers and other languages which support pre- / post- increment ....

Uh, 85% of them show the wrong result so 85% of them clearly do not support pre and post increment.


If the behavior is undefined, there is no wrong result.

hmm surprising. I assumed it would be 12 since 5+5+1+1 doesn't really matter what order you do it in. But I suppose this really is undefined behavior.

Please tell me the answer is somehow 42!

int a = 5; a = (++a * a++) + --a; a = ?


Oh god. How long before yet another UB-based question ends up in technical coding interviews?

The nice thing about these is that all answers are correct.

the correct answer is that the program will launch nethack, duh

Haha this comment is spot on :)



