Greedy problems are a common category in competitive programming problem sets. In the simplest terms, these problems ask us to optimize a value in some scenario, and it turns out that a series of small greedy decisions can achieve the optimum. This is, of course, not always the case. But in many cases, it is. I find greedy solutions to problems fun. Not the solving part, rather the proving part (yes, one should prove why the solution works, at least when practising).

The exchange argument is a general technique for proving greedy algorithms. In these arguments, we show that any optimal output (which doesn’t match the one produced by our greedy algorithm) can be transformed, one exchange at a time, into the output produced by our algorithm, without making the target value worse. Since each exchange leaves the score the same or better, our algorithm must have produced an output as good as the optimal one.

This article is about the application of exchange arguments in the context of sequences. More precisely, I’ll be talking about problems where we are given a list of items and we can shuffle the items however we want. We are also given a scoring function on the shuffled list, and we need to find the shuffle or order of items where the score is optimized (i.e. minimized or maximized, whatever the problem wants). There will be some examples where the problem won’t be exactly like this, but we’ll see how those can be handled.

A warning: this article will get a bit formal and talk about logical inference a lot. So put on your mathematician hats and let’s go.

Formal Definitions First

First, let’s define the kind of problems we are handling. Suppose we are given a sequence $S$ of objects (not necessarily distinct) and a scoring function $f$ which provides a score on any permutation of the sequence $S$ (for the remainder of the article, permutation means multiset permutation). Our task is to find the permutation $S^\prime$ such that $f(S^\prime)$ is maximum (or minimum). The actual target is generally irrelevant, formally speaking. So I’ll say that the goal is to optimize the score instead of maximizing or minimizing.

Here’s an example of the problem: the sequence $S$ is $[40, 10, 30, 20]$ and the scoring function is $f(S) = S_1 \times 1 + S_2 \times 2 + S_3 \times 3 + S_4 \times 4$. The goal is to find the permutation producing the maximum result. With a bit of playing around, we find the answer is $[10, 20, 30, 40]$.
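With only $4! = 24$ orderings, this claim is easy to verify by brute force; here’s a quick Python sketch (the function name `f` just mirrors the scoring function above):

```python
from itertools import permutations

def f(s):
    # score: sum of (1-indexed position) times element
    return sum(i * x for i, x in enumerate(s, start=1))

best = max(permutations([40, 10, 30, 20]), key=f)
print(best, f(best))  # (10, 20, 30, 40) 300
```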

Now, the set of all problems following this pattern is huge; let’s call them permutation optimizing problems. Luckily, there’s a trick that solves some specific cases of them. The trick is called the exchange argument, and it is itself a vast topic. I will be focusing on a narrower scope. I don’t know if the following has any known formal name (let me know if there is one). I call the following theorem the Adjacent Exchange Theorem.

Adjacent Exchange Theorem: In a permutation maximizing problem as above, suppose we find a relational operator $\preceq$ with the following property: for any two permutations $S_1, S_2$ which have $a, b$ adjacent ($b, a$ in $S_2$) and all other elements matching, i.e. of the form $$ S_1 = [\cdots, a, b, \cdots] \text{ and } S_2 = [\cdots, b, a, \cdots]$$ $a \preceq b$ implies $f(S_1) \geq f(S_2)$. Then the sequence $S^\prime$ found by sorting the original sequence using $\preceq$ produces the maximum score. Similarly, for a minimizing problem, if $a \preceq b$ implies $f(S_1) \leq f(S_2)$, then $f(S^\prime)$ attains the minimum score.

The theorem is a mouthful. What it basically states is that if we can find an operator such that swaps agreeing with it never make the permutation worse, then sorting by it will get us the best permutation. This makes sense if we think about it. Since swapping according to the operator never makes the score worse, from any optimal permutation of the original sequence we can perform adjacent swaps according to $\preceq$ (think bubble sort) to reach $S^\prime$, and while making the swaps, we never make the score worse. So $S^\prime$ will definitely be among the best.

For example, suppose we have $a \preceq b \preceq c$, but an optimal solution is $[b, c, a]$. Then we can swap $c, a$ to get $[b, a, c]$, and the score hasn’t gotten worse due to the required property of $\preceq$. Then we can swap $b, a$ to get $[a, b, c]$, and again the score hasn’t gotten worse. So $[a, b, c]$ is as good as $[b, c, a]$.

Note the theorem does not say it will be the only best sequence, but it will be one of the best sequences. In fact, when both $a \preceq b$ and $b \preceq a$, there’s no requirement of one coming before the other, so the sorting itself can produce multiple optimal answers. Also, there’s no requirement that there is only one unique $\preceq$. One example later will highlight this.

Now, the way to find this $\preceq$ is not really fixed and that is the interesting thing. Let’s see some examples.

Example 1: Score Is Sum Over Position Times Elements

Let’s try the example from the previous section. The scoring function is $f(S) = \sum \limits_{i=1}^{len(S)} i \times S_i$, and the goal is to maximize this result. How can we find the mysterious $\preceq$? The idea is to write down the requirement first. Suppose the two elements are $a, b$ and $a \preceq b$. $S_1$ has $a, b$ at positions $i, i+1$, while $S_2$ has $a, b$ at positions $i+1, i$ (and all other elements are the same). We should have $f(S_1) \geq f(S_2)$. Now if we compare $f(S_1)$ and $f(S_2)$, we notice that most of the terms in the sums are the same except those for positions $i$ and $i+1$. In fact, if we cancel out the common terms, the two scores have $ai + b(i+1)$ and $bi + a(i+1)$ left. So if it is the case that $a \preceq b$, then we need $f(S_1) \geq f(S_2) \iff ai + b(i+1) \geq bi + a(i+1) \iff b \geq a$. Soooo, $a \preceq b \implies a \leq b$.

You might be celebrating that we have found our elusive $\preceq$. But this is a one way implication; this detail will be important in a later example. For now, we can say that we define $a \preceq b$ as $a \leq b$, because after all, $\preceq$ is an operator we get to choose. So if we sort a sequence using $\preceq$, better known as $\leq$, we find an optimal sequence. In fact, for the example in the previous section, we had to sort $[40, 10, 30, 20]$ by $\leq$ to find $[10, 20, 30, 40]$, the answer.
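As a sanity check, we can compare sorting against brute force on small random inputs; a quick sketch (nothing beyond the Python standard library):

```python
import random
from itertools import permutations

def f(s):
    # score: sum of (1-indexed position) times element
    return sum(i * x for i, x in enumerate(s, start=1))

for _ in range(200):
    seq = [random.randint(-20, 20) for _ in range(6)]
    greedy = sorted(seq)  # sort ascending, i.e. by our operator <=
    brute = max(f(p) for p in permutations(seq))
    assert f(greedy) == brute
print("sorted order always matched the brute force optimum")
```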

By the way, this toy problem is inspired by the rearrangement inequality.

Example 2: Score Is Sum Of Deadline Penalty

This example is part of the CSES problem Tasks and Deadlines. This time, we are working with a sequence of tasks. Each task is described by two values $t$ and $d$: $t$ is the time taken to finish the task (irrespective of when we start it), and $d$ is the deadline of that particular task. When we choose a permutation of the tasks, we do the tasks starting from time $0$, one by one. Suppose we finish the $i$th task at time $T_i$ (basically the sum of all $t$ up to that task); then the penalty of that task is $T_i - d_i$ (it can be negative, i.e. a reward in that case). The score of the permutation is the sum of all the penalties. Our goal is to minimize this score.

Ok, let’s try the approach of the last example. Suppose we have two tasks $(t, d)$ and $(t^\prime, d^\prime)$. In sequence $S_1$, they are at positions $i, i+1$ and in sequence $S_2$, they are at positions $i+1, i$ (and all other elements are the same). Then $(t, d) \preceq (t^\prime, d^\prime)$ should mean $f(S_1) \leq f(S_2)$. And that means $T_1 - d + T_1^\prime - d^\prime \leq T_2^\prime - d^\prime + T_2 - d$ (all other terms cancel out). Here, $T_1, T_1^\prime$ are the end times of the tasks in $S_1$ and similarly $T_2, T_2^\prime$ are the end times of the tasks in $S_2$. Now since $S_1, S_2$ are the same before position $i$, we can actually say $T_1 = C + t, T_1^\prime = C + t + t^\prime$ and $T_2^\prime = C + t^\prime, T_2 = C + t^\prime + t$, where $C$ is the sum of the time durations of all tasks before position $i$. So $f(S_1) \leq f(S_2)$ means $$C + t - d + C + t + t^\prime - d^\prime \leq C + t^\prime - d^\prime + C + t^\prime + t - d$$

Don’t be afraid of the big inequality. Most things cancel out and we get $t \leq t^\prime$. Sooo, $(t, d) \preceq (t^\prime, d^\prime)$ implies $t \leq t^\prime$. Again, this is a one way implication, but our goal is to define $\preceq$. So we can define it simply as the normal order on $t$. Sort by $t$, and we’ll get the optimal sequence.
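A sketch of the resulting algorithm for this subproblem (sort by duration, then accumulate penalties), cross-checked against brute force on small random instances; the helper name `total_penalty` is my own:

```python
import random
from itertools import permutations

def total_penalty(tasks):
    # sum over tasks of (finish time - deadline), done in the given order
    total, now = 0, 0
    for t, d in tasks:
        now += t
        total += now - d
    return total

for _ in range(200):
    tasks = [(random.randint(1, 9), random.randint(1, 9)) for _ in range(5)]
    greedy = sorted(tasks, key=lambda task: task[0])  # shortest duration first
    brute = min(total_penalty(list(p)) for p in permutations(tasks))
    assert total_penalty(greedy) == brute
```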

So far, easy examples; let’s make things more difficult.

Example 3: Score Is Number of Inversions Between 0s And 1s

As far as I know, I haven’t seen this problem anywhere. I made a competitive programming problem based on it at Inversions and Goodbye. Anyways, this time we are given a sequence of binary strings. We can permute them, and the score of a permutation is the number of inversions in the string generated by concatenating all these small strings. We have to minimize this score. A reminder that in a list, a pair of indices $(i, j)$ is an inversion if the value at $i$ is greater than the value at $j$ while $i < j$.

The problem might seem a bit complicated, so let’s see an example. Suppose the original sequence is $[01, 100, 11]$. In this original state, the score is the number of inversions in $0110011$, which is $4$. The inversions are at the pairs $(1, 3), (1, 4), (2, 3), (2, 4)$ ($0$-indexed).

Anyways, the process is the same as before. Suppose we have two strings $a, b$. Sequences $S_1, S_2$ are the same except at positions $i, i+1$: $S_1$ has $a, b$ there, while $S_2$ has $b, a$ there. Now $a \preceq b$ should mean $f(S_1) \leq f(S_2)$. It’s a bit difficult to reason about the score here due to the conditional nature of inversions, but it can be done. Notice that in $f()$, some inversions come from the element strings themselves. For example, in the $100$ string of the above example, $2$ inversions will contribute to the final score no matter what. So we can forget about all the inversions coming from the element strings themselves. Next, in $f(S_1)$ and $f(S_2)$, both cases consider every element other than $a, b$. Observe that their relative positions with respect to positions $i, i+1$ do not change, so those inversion contributions also do not change. The only inversions that matter in comparing $f(S_1), f(S_2)$ are the inversions between $a$ and $b$.

Suppose $a_0, b_0$ are the numbers of $0$’s and $a_1, b_1$ are the numbers of $1$’s in $a, b$. Then in $f(S_1)$, the inversions contributed will be $a_1 \times b_0$. In $f(S_2)$, the inversions contributed will be $a_0 \times b_1$. Sooo, $a \preceq b$ means $a_1b_0 \leq a_0b_1$. Ok, this is new. So far, all our implications had very straightforward representations; they were some known relation. This time, can we say that $a_1b_0 \leq a_0b_1$ is a well defined ordering operator? This is important. If we try to define $a \preceq b$ as $a_1b_0 \leq a_0b_1$, this needs to be a proper ordering operator. Luckily $a_1b_0 \leq a_0b_1 \implies \frac{a_1}{a_0} \leq \frac{b_1}{b_0}$, so we can define $a \preceq b$ as $\frac{a_1}{a_0} \leq \frac{b_1}{b_0}$, and we can be confident that this will be a proper order.

If you have keen observation, you will have noticed a blatant mistake I made. We can’t say $a_1b_0 \leq a_0b_1 \implies \frac{a_1}{a_0} \leq \frac{b_1}{b_0}$ because it might be that $a_0 = 0$ or $b_0 = 0$ (though one can say the reverse). But we can define $\frac{x}{0} = \infty$, and we see that it works out after all, i.e. both inequalities behave equivalently. Using $\frac{x}{0} = \infty$, we can prove $a_1b_0 \leq a_0b_1 \iff \frac{a_1}{a_0} \leq \frac{b_1}{b_0}$, as we are only working with non-negative numbers where the numerator and denominator can’t both be $0$.

Notice this time, we first came to the relation $a_1b_0 \leq a_0b_1$, then we did some massaging to get another relation $\frac{a_1}{a_0} \leq \frac{b_1}{b_0}$. Which one do we define $a \preceq b$ as? In this case we can actually use both. From the second relation, we’re sure that the definition will work as a proper order. And since it is equivalent to the first relation, we can use that as a definition too. This will not always happen, as you will see later.
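In code, the cross-multiplied form is the nicer definition since it needs no division at all; a sketch using `functools.cmp_to_key`, verified against brute force on small random inputs (the helper names are my own):

```python
import random
from functools import cmp_to_key
from itertools import permutations

def inversions(s):
    # count pairs i < j with s[i] = '1' and s[j] = '0'
    ones = inv = 0
    for ch in s:
        if ch == '1':
            ones += 1
        else:
            inv += ones
    return inv

def compare(a, b):
    # a should come before b iff a1 * b0 <= a0 * b1
    a0, a1 = a.count('0'), a.count('1')
    b0, b1 = b.count('0'), b.count('1')
    return a1 * b0 - a0 * b1

for _ in range(200):
    strs = [''.join(random.choice('01') for _ in range(random.randint(1, 4)))
            for _ in range(5)]
    greedy = inversions(''.join(sorted(strs, key=cmp_to_key(compare))))
    brute = min(inversions(''.join(p)) for p in permutations(strs))
    assert greedy == brute
```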

Example 4: There Is No Score? (But There Are Weights And Capacities)

Ok, this problem is inspired by a math problem a friend showed me long ago. There might be some competitive programming problems based on this out there; if there are, let me know. Anyways, this problem isn’t straightforward like the previous ones. Suppose we are given $n$ objects which we need to stack in some order. Each object has a weight $w$ and a capacity $c$. The stacking is stable if, for each object, the total weight above it (i.e. the sum of the weights of the objects over it) is less than or equal to its capacity. Is it possible to stack all the objects so that the stack is stable?

Now, you should be thinking to yourself, this is not an optimization problem, there is no score! But that is the fun part: we will first make this a permutation optimizing problem, and then look into the solution. Stacking in some order just means that we define a permutation of the objects where the bottom-most object is in the first position. In that order, an object being stable means $W \leq c$, where $c$ is the object’s capacity and $W$ is the sum of the weights of the objects after it in the sequence. More specifically, we want the expression $c - W$ to be greater than or equal to $0$ for each object. And all of the values being greater than or equal to $0$ is the same as the minimum of them being greater than or equal to $0$. This means that we want $\min \limits_{1 \leq i \leq len(S)} \left(c_i - \sum \limits_{j=i+1}^n w_j \right)$ to be greater than or equal to $0$. So, we can just target this as the score! i.e. find a sequence that maximizes $$f(S) = \min \limits_{1 \leq i \leq len(S)} \left(c_i - \sum \limits_{j=i+1}^n w_j \right)$$ If the score of the optimal sequence is greater than or equal to $0$, we’ve found a stable stack! If not, it’s not possible.

Phew, that was some complex reasoning, but more is coming, hehe. Anyways, we now have a permutation optimizing problem, and the process is familiar. Suppose $a, b$ are two objects. In sequences $S_1, S_2$, all objects are the same except at positions $i, i+1$. In the first, $a, b$ are at the $i, i+1$th positions; in the second, $a, b$ are at the $i+1, i$th positions. Now $a \preceq b$ should imply $f(S_1) \geq f(S_2)$, and that means $$\min \limits_{1 \leq i \leq len(S_1)} \left(c_{1, i} - \sum \limits_{j=i+1}^n w_{1, j} \right) \geq \min \limits_{1 \leq i \leq len(S_2)} \left(c_{2, i} - \sum \limits_{j=i+1}^n w_{2, j} \right)$$ I have used $c_{x, y}, w_{x, y}$ to express the capacity and weight of the $y$th object in $S_x$.

Ok, this time the situation is a bit complex. So far, we could cancel out terms not related to positions $i, i+1$. This time, it’s not so straightforward. In fact, it might happen that due to the other objects in $S_1, S_2$, the terms related to positions $i, i+1$ don’t even affect the relation. For example, what if $c_{1, 1} = c_{2, 1}$ is too small and the rest of the weights are too big? Then $f(S_1) = c_{1, 1} - \sum \limits_{j=2}^n w_{1, j} = f(S_2)$ might happen regardless of $a, b$’s positioning. So it might be that $a \preceq b$, but $f(S_1) > f(S_2)$ in some cases and $f(S_1) = f(S_2)$ in others, depending on $S_1, S_2$. This did not happen in any of the previous problems. And this is where my earlier warning is important. This is an implication, not an equivalence. We don’t have to exactly decipher when $f(S_1) \geq f(S_2)$ holds. In fact, we can do some creative thinking and come up with something like $g(a) \preceq^\prime g(b)$ which also implies $f(S_1) \geq f(S_2)$. Then we can define $a \preceq b$ as $g(a) \preceq^\prime g(b)$. This is the important point: our operator doesn’t have to be equivalent to $f(S_1) \geq f(S_2)$, it only has to imply it! If it just so happens that the implication goes both ways, that’s just a happy coincidence (which happened the previous three times).

Enough talking, let’s analyze $f(S_1), f(S_2)$. While all the terms inside $\min$ affect the ultimate scores, most of them are the same; only the terms at positions $i, i+1$ differ. So let’s forget about the other terms and consider those terms only, i.e. $\min \left(c_a - (w_b + W), c_b - W\right) \geq \min\left(c_b - (w_a + W), c_a - W\right)$, where $W$ is the sum of the weights of all objects after position $i+1$. If this relation is true, $f(S_1) \geq f(S_2)$ will still hold, because the terms we stopped caring about can’t make $f(S_1) < f(S_2)$; at most, they make $f(S_1) = f(S_2)$. The relation can be turned into $\min \left(c_a - w_b, c_b\right) \geq \min \left(c_b - w_a, c_a\right)$ by subtracting $W$ from all terms. So can we define $a \preceq b$ as $\min \left(c_a - w_b, c_b\right) \geq \min\left(c_b - w_a, c_a\right)$? Not yet, since we don’t know if this is a proper operator. After all, our goal is to sort using the defined operator, so it should be well defined as a sorting operator (more on this later). Next, note that $c_a - w_b \leq c_a$ and $c_b - w_a \leq c_b$. So if we have $c_a - w_b \geq c_b - w_a$, then that, coupled with $c_b \geq c_b - w_a$, means $\min(c_a - w_b, c_b) \geq c_b - w_a \geq \min(c_b - w_a, c_a)$. So we can actually target $c_a - w_b \geq c_b - w_a$, which is equivalent to $c_a + w_a \geq c_b + w_b$. And this is a proper operator, since we map each object to a single number and compare those using greater than or equal. Soooo, let’s define $a \preceq b$ as $c_a + w_a \geq c_b + w_b$. This implies $\min(c_a - w_b, c_b) \geq \min(c_b - w_a, c_a)$, which in turn implies $f(S_1) \geq f(S_2)$.
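A sketch of the resulting algorithm: sort the objects by $w + c$ in descending order (bottom-most first) and check the worst $c - W$ margin. The helper name `stability` is my own; brute force over permutations confirms the sort on small random cases:

```python
import random
from itertools import permutations

def stability(order):
    # order[0] is the bottom-most object; each object is (weight, capacity).
    # Returns min over objects of (capacity - total weight above it).
    above, worst = 0, float('inf')
    for w, c in reversed(order):  # walk from the top of the stack down
        worst = min(worst, c - above)
        above += w
    return worst

for _ in range(200):
    objs = [(random.randint(1, 9), random.randint(0, 15)) for _ in range(5)]
    greedy = sorted(objs, key=lambda o: o[0] + o[1], reverse=True)
    brute = max(stability(list(p)) for p in permutations(objs))
    assert stability(greedy) == brute  # a stable stack exists iff brute >= 0
```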

Phew, that was a lot of reasoning. But we’ve found our elusive sorting operator. Now, you might have been scratching your head thinking, what did I mean by a well defined sorting operator? Let’s see.

What’s A Well Defined order?

Well, mathematically speaking, to sort by a binary operator, it needs to be a total order. Total orders have 4 properties that you can find on the Wikipedia page. Let’s list them,

  1. Reflexivity: $a \preceq a$
  2. Transitivity: $a \preceq b, b\preceq c \implies a \preceq c$
  3. Antisymmetry: $a \preceq b, b \preceq a \implies a = b$
  4. Strong connectedness: $a \preceq b$ or $b \preceq a$ for all $a, b$.

Only when an order is a total order can it be used to sort. Antisymmetry is not important for us. The reason is trust me bro (actually the reason is we can treat our domain as a set of equivalence classes, but that is another rabbit hole). Reflexivity and strong connectedness are trivial most of the time. Transitivity is the real issue. Most of the time, reasoning from $f(S_1) \geq f(S_2)$, we can get to a candidate relation $g_1(a, b) \geq g_2(a, b)$ for $a \preceq b$. But if this is not transitive, the sorting will not work. If we run some sorting algorithm anyway, the algorithm will spit out some shuffled up sequence, but without transitivity, the adjacent exchange theorem won’t apply.

In the previous example, we tried to define $a \preceq b$ as $\min \left(c_a - w_b, c_b\right) \geq \min\left(c_b - w_a, c_a\right)$, would that have been transitive? The answer is a bit complex and left as an exercise to the reader (hehe). But let’s see one last example which will highlight the importance of transitivity.

Example 5: Score Is Number of Inversions According To Product Order

This example is a bit artificial; it’s made only to highlight the requirement of transitivity. We are given $n$ two dimensional points. The score of a permutation of these points is the number of inversions in it according to the product order. In the product order, we say $(x_1, y_1) < (x_2, y_2)$ if and only if $x_1 < x_2$ and $y_1 < y_2$. Note, this order is actually not a total order! It’s transitive but not strongly connected. For example, $(2, 5)$ and $(3, 1)$ are not comparable. For a pair to count as an inversion, we need $(x_1, y_1) > (x_2, y_2)$ in the product order while the first point comes before the second one in the sequence. We need to minimize the number of inversions.

This problem is similar to example 3, and we follow the same process. Assume $a, b$ are two points, and in sequences $S_1, S_2$, all elements are the same except at positions $i, i+1$. In $S_1$, $a, b$ are at positions $i, i+1$; in $S_2$, $a, b$ are at positions $i+1, i$. Now $a \preceq b$ should mean $f(S_1) \leq f(S_2)$. And like example 3, most of the inversions do not matter (they cancel out). In fact, the only thing that is left is $i(a, b) \leq i(b, a)$ where

$$ i(p, q) = i((p_x, p_y), (q_x, q_y)) = \begin{cases} 1 &\quad p_x > q_x \text{ and } p_y > q_y \\ 0 &\quad \text{ else} \\ \end{cases} $$

i.e. just the inversion contributed by $a, b$. An overly eager fellow might just say that we’ll define $a \preceq b$ as $i(a, b) \leq i(b, a)$. But this relation is not transitive. Consider $p = (10, 10), q = (2, 20), r = (5, 5)$. Then $i(p, q) = i(q, p) = i(q, r) = i(r, q) = i(r, p) = 0$ and $i(p, r) = 1$. From these values, we’d have $p \preceq q$ and $q \preceq r$ but $p \npreceq r$. So sorting by this operator will not do! What is the problem? If $i(a, b) < i(b, a)$, then it does indeed mean $f(S_1) < f(S_2)$, so keeping $a$ before $b$ makes sense. But if $i(a, b) = i(b, a)$, we can’t really say $a$ can come before $b$. The ties are the problem, as they don’t give us information.

So what’s the solution? Well, a good idea is to derive something better from this not-so-well-defined relation. We know that $i(a, b) < i(b, a)$ gives us good information about the permutation. In that case, $a_x < b_x$ and $a_y < b_y$, which implies $a_x + a_y < b_x + b_y$. Certainly, $a \preceq b \equiv a_x + a_y \leq b_x + b_y$ is a well defined operator. But we came to it from $f(S_1) < f(S_2)$; can we go in the opposite direction? It turns out, yes we can. If $a_x + a_y \leq b_x + b_y$, then it must be that $i(a, b) \leq i(b, a)$, because otherwise $i(a, b) > i(b, a)$, and that would mean $a_x + a_y > b_x + b_y$! And $f(S_1) \leq f(S_2)$ is equivalent to $i(a, b) \leq i(b, a)$. So we’re done! $a_x + a_y \leq b_x + b_y$ is the elusive operator we are looking for!
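Both halves of the argument are easy to check in code: the tie-laden relation $i(a, b) \leq i(b, a)$ fails transitivity on the triple $p, q, r$ from above, while sorting by $x + y$ matches brute force; a sketch (helper names are my own):

```python
import random
from itertools import permutations

def inv(p, q):
    # 1 if p > q in the product order, i.e. p placed before q is an inversion
    return 1 if p[0] > q[0] and p[1] > q[1] else 0

def score(seq):
    return sum(inv(seq[i], seq[j])
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

# the naive relation is not transitive on p, q, r
p, q, r = (10, 10), (2, 20), (5, 5)
assert inv(p, q) <= inv(q, p) and inv(q, r) <= inv(r, q)  # p <= q and q <= r
assert not inv(p, r) <= inv(r, p)                         # ... but not p <= r

# sorting by x + y matches brute force on small random inputs
for _ in range(200):
    pts = [(random.randint(0, 10), random.randint(0, 10)) for _ in range(5)]
    greedy = sorted(pts, key=lambda a: a[0] + a[1])
    assert score(greedy) == min(score(list(s)) for s in permutations(pts))
```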

You might be screaming at me now. Just a few paragraphs earlier I said we can’t depend on implications, and now I basically used an implication blindly and it magically happened to go the other way too. Well, I can’t blame you; mathematics is like that sometimes. But in this particular case, we knew that our operator needed to imply $i(a, b) \leq i(b, a)$. And $i(a, b) < i(b, a)$ was good to work with, but $i(a, b) = i(b, a)$ didn’t help us. In this kind of case, due to the symmetric nature of the relation, deriving any $g(a) < g(b)$ from $i(a, b) < i(b, a)$ is enough! Why? Prove it yourself 😉. In fact, we could have used $a_x * a_y \leq b_x * b_y$ too. See, I told you there can be multiple good operators.

(Anti) Example 6: There Is No Elusive Operator

Ok, I lied a tiny bit; there is one more example. But this one is an anti-example. Some permutation optimizing problems don’t really have a good operator to find. Here’s an example of that. Suppose we are given $n$ numbers, and the score of a permutation is defined as $f(S) = \sum \limits_{i=2}^{len(S)} |S_i - S_{i-1}|$. We need to maximize this score. Before you start writing up equations to find the mysterious $\preceq$, let me stop you: there isn’t one. Why? Suppose the $n$ numbers are $[1, 1, 1, 10, 10]$. After some trial and error, you will find that there is only one best permutation, $[1, 10, 1, 10, 1]$. And here’s the problem: if there was an operator $\preceq$ which resulted in this sequence, then that would mean $1 \preceq 10 \preceq 1$. That’d mean all of our elements are equivalent, and any permutation should also produce the optimal answer. But that is of course false. So our theorem can’t help.
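The uniqueness claim is small enough to check exhaustively; a quick sketch:

```python
from itertools import permutations

def f(s):
    return sum(abs(s[i] - s[i - 1]) for i in range(1, len(s)))

perms = set(permutations([1, 1, 1, 10, 10]))  # distinct multiset permutations
best = max(f(p) for p in perms)
winners = [p for p in perms if f(p) == best]
print(best, winners)  # 36 [(1, 10, 1, 10, 1)]
```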

By the way, while no such operator can be found, can you still find the common pattern or algorithm to come up with the optimal sequence in this problem? It won’t be sorting by some operator, but there is an algorithm.

Conclusion

Ok, this was a difficult article to write. I had planned it back in March, but things came up. I started the article but couldn’t even finish the first section. Then recently something incredible happened: my country overthrew a dictator (haha, tangential story, I know). And I decided to finish this article. Anyways, I hope you learnt something good from it. The best way to internalize the adjacent exchange theorem is to solve problems with it and also figure out why the solutions work, why some relations that come up don’t work, etc.

I have a part 2 planned, which focuses on a very specific type of permutation optimizing problem that I have faced many times over my competitive programming career. But that will have to wait; writing math heavy articles is difficult for me. If you find any new problems that utilize the theorem, let me know!


I try to share things that I have learnt recently, and in the process, I can obviously make mistakes. So if you think you found something wrong, feel free to create an issue in the GitHub repository for this blog.