## Introduction

Sometimes we’ll encounter DP in forms like: $$dp_i=\mathop{\max/\min}_{j\lt i} \left\{ cost(j + 1, i) \right\}$$ which will usually take quadratic time or more (depends on how fast we can calculate $cost$)to calculate it. However, sometimes the following observations is true: $$H_i=\mathop{\mathop{\arg\max}/\mathop{\arg\min}}_{j\lt i} \left\{ cost(j + 1, i) \right\} \implies H_{i+1} \le H_i\space /\space H_{i+1} \ge H_i$$ i.e. the best transition point is monotone increasing / decreasing, then we usually can reduce the time complexity by a $\log$ factor using divide and conquer. For example, if the original time complexity is $O(N^2)$, then we can reduced it to $O(N\log N)$.

Usually, the monotone property is find either by instinct, print out the DP table, or by Monge condition(described in this post). Actually, divide and conquer optimization is a special case of 1D/1D convex/concave Knuth optimization(cost function doesn’t depends on previous DP values).

The implementation part is given in the first example problem.

## Example problems

### Atcoder ARC067D - Yakiniku Restaurants

#### Problem description

There’re $N$ restaurants along a street. The restaurants are numbered $1$ through $N$ from west to east, and the distance between restaurant $i$ and $i+1$ is $A_i$.

Joisino has $M$ tickets, numbered $1$ to $M$. Every restaurant offers meals in exchange for these tickets. Restaurant $i$ offers a meal of deliciousness $B_{i, j}$ in exchange for ticket $j$. Each ticket can only be used once, but any number of tickets can be used at a restaurant.

Joisino wants to have $M$ barbecue meals by starting from a restaurant of her choice, then repeatedly traveling to another barbecue restaurant and using unused tickets at the restaurant at her current location. Her eventual happiness is calculated by the following formula: “(The total deliciousness of the meals eaten) - (The total distance traveled)”. Find her maximum possible eventual happiness.

$2\le N\le 5\times 10^3, 1\le M\le 200, 1\le N_i\le 10^9, 1\le B_{i, j} \le 10^9$

#### Problem analysis

First, let’s try to calculate the maximum possible eventual happiness if Joisino starts at restaurant $i$ and ends at restaurant $j$. We can observe that Joisino will walk directly from $i$ to $j$ without zigzaging (to minimize total distance traveled). Thus, for every tickets, Joisino will choose the restaurant that have the largest deliciousness of the corresponding meal. So the final happiness (represented by $f(i, j)$) is: $$f(i, j)=\left( \sum_{c=1}^{M} \max_{i\le k\le j} B_{k, c} \right) - \left( \sum_{k=i+1}^{j}A_k \right)$$ which can be calculated in $O(M)$ if we use prefix sums and Sparse table (turorial).

Therefore, we now have a $O(NM^2)$ solution:

• DP state : $dp_i$ represents the maximum happiness if Joisino ends at $i^{th}$ restaurant
• DP transition : $dp_i=\max_{j\le i} f(j, i)$
• Final answer : $\max_{1\le i\le N} dp_i$

However, this is not good enough. To optimize it further, we need an observation: $$H_i=\mathop{\arg\max}_{j\le i} f(i, j)\implies H_i\le H_{i+1}$$ In other words, the best transition point will not decrease as our destination increase, which means that the best transition point have monoticity. Prove is omitted.

Given the observation above, we can optimize it to $O(NM\log N)$ by divide and conquer:

1. Suppose we’re at $(l, r, ql, qr)$, meaning that we want to calculate $dp_i, l\le i\le r$, and those best transition points satisfies $ql\le H_i\le qr$
2. Let $mid=\lfloor \frac{l+r}{2} \rfloor$, calculate $H_{mid}$ by enumerating $j=ql, ql+1, \dots, \min\{qr, mid\}$ and fill $dp_{mid}$.
3. By the observation above, we can split our calculation into $(l, mid-1, ql, H_{mid})$, $(mid+1, r, H_{mid}, qr)$.

As every time we split our interval into half, and our calculation is at most $O(N)$ every level, so we only need $O(N\log N)$ calculations of $f(i, j)$, which implies that our final time complexity is $O(NM\log N)$.

#### Problem solution

Refer to the code below for more details.

code
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const int N = 5000 + 5;
const int lgN = 13;
const int M = 200 + 5;
const ll inf = (ll)1e18;

int n, m, B[M][N][lgN];
ll dp[N], A[N];

ll query(int x, int l, int r) {
int k = __lg(r - l + 1);
return max(B[x][l][k], B[x][r - (1 << k) + 1][k]);
}
ll calc(int l, int r) {
ll sum = 0;
for(int i = 1 ; i <= m ; ++i) sum += query(i, l, r);
return sum - (A[r] - A[l]);
}
void DP(int l, int r, int ql, int qr) {
if(l > r) return;
int mid = l + (r - l) / 2;
ll val = -inf, pos = 0;
for(int i = ql ; i <= min(mid, qr) ; ++i) {
ll ret = calc(i, mid);
if(ret > val) val = ret, pos = i;
}
dp[mid] = val;
DP(l, mid - 1, ql, pos);
DP(mid + 1, r, pos, qr);
}

void init() {
cin >> n >> m;
for(int i = 2 ; i <= n ; ++i) cin >> A[i], A[i] += A[i - 1];
for(int i = 1 ; i <= n ; ++i) {
for(int j = 1 ; j <= m ; ++j) cin >> B[j][i][0];
}
for(int k = 1 ; k <= m ; ++k) {
for(int j = 1 ; (1 << j) <= n ; ++j) {
for(int i = 1 ; i + (1 << j) - 1 <= n ; ++i) {
B[k][i][j] = max(B[k][i][j - 1], B[k][i + (1 << (j - 1))][j - 1]);
}
}
}
}
void solve() {
DP(1, n, 1, n);
ll ans = -inf;
for(int i = 1 ; i <= n ; ++i) ans = max(ans, dp[i]);
cout << ans << '\n';
}

int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}



### CF321E - Ciel and Gondolas

#### Problem description

There’re $N$ people numbered from $1$ to $N$ and $K$ cars. Today, they want to go out driving those $K$ cars, so they need to split them into $K$ groups. For any two people $i, j, i\neq j$, we know how much they hate each other, which is given as $u_{ij}$. The level of hate of a group $G$ is calculated as following: $$\sum_{i, j\in G, i\lt j} u_{ij}$$ Now, please find a way to split those $N$ peoples into $K$ groups satisfying

• Each part is consisted of people with consecutive indexes
• The total sum of level of hate of each group is minimized

$1\le N\le 4000, 1\le K\le \min(N, 800), 0\le u_{ij}\le 9, u_{ij}=u_{ji}, u_{ii}=0$

#### Problem analysis

Similar to the previous problem, we need to first calculate the level of hate if we group $i, i+1, \dots, j$ together: $$f(i, j)=\sum_{k=i}^{j} \sum_{l=k+1}^{j} u_{kl}=\frac{1}{2}\left( pre_{j, j} - pre_{i - 1, j} - pre_{j, i - 1} + pre_{i-1, i-1}\right)$$ where $pre$ is the two-dimensional prefix sum. Thus, $f(i, j)$ can be calculated in $O(1)$.

Now, let’s try DP on it:

• DP state : $dp_{i, j}$ represents mimimum level of hate when we split people from $1$ to $i$ into $j$ groups
• DP transition : $dp_{i, j}=\min_{0\le k\lt i} \left\{ dp_{k, j - 1} + f(k + 1, i) \right\}$
• Final answer : $dp_{N, K}$

The complexity will be $O(N^2K)$ if we do it directly. However, like the previous problem, the transition point here is also monotone! In other words, $$H_{i, j}=\mathop{\arg\max}_{0\le k\lt i} \left\{ dp_{k, j - 1} + f(k + 1, i) \right\} \implies H_{i, j} \le H_{i+1, j}$$ It can be prove formally, however let’s try to explain it intuitively. The level of hatred is positively correlated (in some sense) to the number of people in the group. Therefore, if $H_{i, j}$ is the best transition point for $dp_{i, j}$, this means that all transition points before $H_{i, j}$ have higher cost then $H_{i, j}$. As the level of hatred is positively correlated to the number of people in the group, it’s impossible for them to become the best transition point for $H_{i+1, j}$ ($H_{i, j}$ is better then all of them).

Finally, we can use divide and conquer similar to the previous problem to reduce the time complexity to $O(NK\log N)$.

#### Problem solution

Note that I used fast I/O to pass this problem. Details in code.

code
#pragma GCC optimize ("O3,unroll-loops,no-stack-protector")
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const int N = 4000 + 5;
const int K = 800 + 5;
const ll inf = (ll)1e18;

ll n, k, pre[N][N], dp[K][N];

inline ll cost(int a, int b) {
a--;
return pre[b][b] - pre[a][b] - pre[b][a] + pre[a][a];
}
void DP(int l, int r, int ql, int qr, int kk) {
if(l > r) return;
int mid = l + (r - l) / 2;
ll val = inf, pos = 0;
for(int i = ql ; i <= min(mid - 1, qr) ; ++i) {
ll ret = dp[kk - 1][i] + cost(i + 1, mid);
if(ret < val) val = ret, pos = i;
}
dp[kk][mid] = val;
DP(l, mid - 1, ql, pos, kk);
DP(mid + 1, r, pos, qr, kk);
}

#define getchar getchar_unlocked
#define putchar putchar_unlocked
void rit(ll &x) {
char c;
while(!isdigit(c=getchar()));
x = c - '0';
while(isdigit(c=getchar())) x = x * 10 + c - '0';
}
char buf[100];
void wit(ll x) {
int i = 0;
while(x > 0) buf[i++] = '0' + (x % 10), x /= 10;
if(i == 0) return void(putchar('0'));
while(i > 0) putchar(buf[--i]);
}
void init() {
rit(n); rit(k);
//cin >> n >> k;
for(int i = 1 ; i <= n ; ++i) {
for(int j = 1 ; j <= n ; ++j) {
rit(pre[i][j]);
pre[i][j] += pre[i - 1][j] + pre[i][j - 1] - pre[i - 1][j - 1];
}
}
}
void solve() {
fill(dp[0], dp[0] + N, inf);
dp[0][0] = 0;
for(int i = 1 ; i <= k ; ++i) {
DP(1, n, 0, n - 1, i);
}
cout << dp[k][n] / 2 << '\n';
}

int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}



### CF868F - Yet Another Minimization Problem

#### Problem description

You are given an array of $N$ integers $a_1, a_2, \dots a_N$. The cost of a subsegment is the number of unordered pairs of distinct indices within the subsegment that contain equal elements. Split the given array into $K$ non-intersecting non-empty subsegments so that the sum of their costs is minimum possible. Each element should be present in exactly one subsegment.

$2\le N\le 10^5, 2\le K\le \min(N, 20), 1\le a_i\le N$

#### Problem analysis

Let’s write down the DP first in this problem:

• DP state : $dp_{i, j}$ represents the minimum cost splitting $a_1, a_2, \dots, a_i$ into $j$ subsegments
• DP transition : $dp_{i, j}=\min_{0\le k\lt i} \left\{ dp_{k, j - 1} + f(k + 1, i) \right\}$
• Final answer : $dp_{N, K}$

where $f(i, j)$ is the cost of subsegment $a_i, a_{i+1}, \dots, a_j$.

For similar reasons (feeling) above, the transition point in this problem also have monoticity: $$H_{i, j}=\mathop{\arg\max}_{0\le k\lt i} \left\{ dp_{k, j - 1} + f(k + 1, i) \right\} \implies H_{i, j} \le H_{i+1, j}$$ So the time complexity is reduced to $O(xNK\log N)$, where $x$ is the time needed to calculate $f(i, j)$ given $i$ and $j$.

Now, how do we calculate $f(i, j)$ efficiently? Surprisingly, we can calculate directly and the amortized cost will be $O(1)$!

#### Problem solution

Of course we didn’t calculate it “directly” :p. Instead, we maintain three global variable $sum, nl, nr$ representing that $sum=f(nl, fr)$. Every time we need $f(i, j)$, we just move $nl$ to $i$, $nr$ to $j$ and update $sum$ in $O(1)$. The movement of $nl$ and $nr$ is $O(N\log N)$, which implies the calculation of every $f(i, j)$ is $O(1)$ after amortization. See the code for more understanding and details.

code
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const int N = (int)1e5 + 5;
const int K = 20 + 5;
const ll inf = (ll)1e15;

int n, k, a[N], cnt[N];
ll dp[K][N], sum, nl, nr;

void add(int x) {
sum += cnt[x] * 2, cnt[x]++;
}
void sub(int x) {
sum -= cnt[x] * 2 - 2, cnt[x]--;
}
void DP(int l, int r, int ql, int qr, int now) {
if(l > r) return;
int mid = l + (r - l) / 2;
ll val = inf, pos = 0;
while(nr < mid) add(a[++nr]);
while(nr > mid) sub(a[nr--]);
for(int i = ql ; i <= min(mid - 1, qr) ; ++i) {
while(nl > i + 1) add(a[--nl]);
while(nl < i + 1) sub(a[nl++]);
ll ret = dp[now - 1][i] + sum;
if(ret < val) val = ret, pos = i;
}
dp[now][mid] = val;
DP(l, mid - 1, ql, pos, now);
DP(mid + 1, r, pos, qr, now);
}

void init() {
cin >> n >> k;
for(int i = 1 ; i <= n ; ++i) cin >> a[i];
}
void solve() {
fill(dp[0], dp[0] + N, inf);
dp[0][0] = 0;
nl = 1, nr = 0;
for(int i = 1 ; i <= k ; ++i) DP(1, n, 0, n - 1, i);
cout << dp[k][n] / 2 << '\n';
}

int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}