DP optimization - Monotone-Queue Optimization
tags:icpc
algorithm
dp
dp-optimization
monotone-queue
Outline
Monotone Queue & Relation with DP
To understand what it is, we first look at a problem:
Minimum value in a sliding window
Given an sequence $A_1, A_2, \dots, A_N$ and a integer $K$, for every $1\le i\le N - K + 1$, find $\min_{i\le j \le i + k - 1} A_j$.
This can be solved with monotone queue in $O(N)$ time complexity. This link explained it thoroughly. My implementation is given below:
deque<pair<int, int> > dq; // <id, value>
for(int i = 1, j = 1 ; i <= n - k + 1 ; ++i) {
// pop : remove expired elements
while(!dq.empty() && dq.front().first < i) dq.pop_front();
// push : add new element
while(j <= n && j <= i + k - 1) {
// We want to put A_j into the deque
// First, remove elements that is useless
while(!dq.empty() && A[j] <= dq.back().second) dq.pop_back();
// Then, put it in
dq.push_back({j, A[j]}), ++j;
}
// value of min { A_i, A_{i + 1}, ..., A_{i + k - 1} }
cout << dq.front().second << '\n';
}
So what’s the relation between Monotone queue and DP ? Sometimes calculating DP you may encounter similar problems above i.e. maximum / minimum value for consecutive $k$ elements. See example problems below to understand more.
Example problems
CF1077F - Pictures with Kittens
Problem description
You’re given a sequence of number $A_1, A_2, \dots, A_N$, please choose exactly $X$ numbers out of them. Moreover, if the index you chose is $x_1, x_2, \dots, x_X, x_1 \lt x_2 \lt \dots x_X$, they need to satisfy $x 3, \dots, X$. In other words, at least one number in every $K$ consecutive number must been chosen.
$1\le K, X \le N\le 5000, 1\le A_i \le 10^9$
Problem analysis
- DP state : $dp_{i, j}$ represents maximum value if you choose $j$ from $1, 2, \dots, i$
- DP transition : $dp_{i, j} = \max \left\{ dp_{i-1, j-1}, dp_{i-2, j-1}, \dots, dp_{i-k, j-1} \right\} = \max_{1\le k\le K} dp_{i - k, j - 1}$
- Final answer : $\max \left\{ dp_{N-k+1, x}, dp_{N-k+2, x}, \dots, dp_{N, x} \right\}$
If we calculate it directly the time complexity will be $O(NK\times K) = O(NK^2)$, which is unacceptable.
However, if we look more carefully, we may observe that DP transition is exactly the problem above (maximum value for consecutive $K$ elements) ! Therefore, the time complexity can be improved to $O(NK)$.
Problem solution
Remember to initialize dp values to $-\infty$.
code
#include <bits/stdc++.h>
#define F first
#define S second
using namespace std;
typedef long long ll;
const int N = 5000 + 5;
const int inf = (int)1e9;
int n, k, x;
ll dp[N][N], a[N];
void init() {
cin >> n >> k >> x;
for(int i = 1 ; i <= n ; ++i) cin >> a[i];
}
void solve() {
// dp[i][j] = max { dp[i - 1][j - 1], dp[i - 2][j - 1], ..., dp[i - k][j - 1] } + a[i]
memset(dp, -0x3f, sizeof(dp));
dp[0][0] = 0;
for(int j = 1 ; j <= x ; ++j) {
deque<pair<int, ll> > dq;
dq.push_back({j - 1, dp[j - 1][j - 1]});
for(int i = j ; i <= n ; ++i) {
while(!dq.empty() && dq.front().F < i - k) dq.pop_front();
dp[i][j] = dq.front().S + a[i];
while(!dq.empty() && dp[i][j - 1] >= dq.back().S) dq.pop_back();
dq.push_back({i, dp[i][j - 1]});
}
}
ll ans = -1;
for(int i = 0 ; i < k ; ++i) ans = max(ans, dp[n - i][x]);
cout << ans << '\n';
}
int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}
CF445A - Boredom
Problem description
Given a sequence $A$ consisting of $N$ integers. The player can make several steps. In a single step he can choose an element of the sequence (let’s denote it $A_k$) and delete it, at that all elements equal to $A_k + 1$ and $A_k - 1$ also must be deleted from the sequence. That step brings $A_k$ points to the player.
Now, calculate the maximum point a player can get on $A$ if arbitrary steps can be made.
$1\le N\le 10^5, 1\le A_i \le 10^5$
Problem analysis
First, if at a step we choose $A_i$ to delete it, then all other $j$ with $A_j = A_i$ will also be chosen as there’s no cost choosing them (choosing $A_i$ will remove all elements equal to $A_i + 1$ and $A_i - 1$). Base on this observation, we can create a sequence $cnt_i$ represents how many time $i$ appears in $A$.
Another observation is that you always start deleting elements from the largest or the smallest. This is because deleting the largest element (or smallest) will minimize the numbers being removed (as $largest + 1$ doesn’t exist).
Base on the two observations above, we can write our dp equations:
- DP state : $dp_i$ represents maximum point you can achieve considering numbers from $1$ to $i$
- DP transition : $$ dp_i=\max \begin{cases} dp_{i-1} &, \text{ don’t choose } i \\ cnt_i\times i + \max_{j\lt i-1} \left\{ dp_j \right\} &, \text{ choose } i \text{, so } i-1 \text{ can’t be chosen } \\ \end{cases} $$
- Final answer : $\max_{1\le i\le C} \left\{ dp_i \right\}$, where $C$ is the range of $A_i$
Similarly, the $\max$ term in the DP transition is exactly the problem mentioned in the beginning.
Problem solution
Note that the time complexity is $O(C)$, which can be optimized to $O(N)$ (My implementation is $O(C)$, though).
code
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const int N = (int)1e5 + 5;
ll n, c, cnt[N], dp[N];
void init() {
cin >> n;
for(int i = 1 ; i <= n ; ++i) {
ll x; cin >> x; cnt[x]++;
c = max(c, x);
}
}
void solve() {
// dp[i] = max { dp[i - 1], cnt[i] * i + max_{j, j < i - 1} { dp[j] } }
dp[0] = 0;
ll mx = 0, ans = 0;
for(int i = 1 ; i <= c ; ++i) {
dp[i] = max(dp[i - 1], cnt[i] * i + mx);
mx = max(mx, dp[i - 1]);
ans = max(ans, dp[i]);
}
cout << ans << '\n';
}
int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}
CF372C - Watching Fireworks is Fun
Problem description
$M$ fireworks are going to be launched on a line with length $N$. The $i^{th}$ firework will be launched at time $t_i$, position $a_i$. If you’re at $x$ when firework $i$ launched, then you gain happiness value $b_i - |a_i - x|$ (might be negetive).
Now, you can move $d$ unit every second and choose arbitrary starting point between $1$ and $N$. Please maximize your happiness value (note that you cannot move to position $x$ if $x < 1$ or $x > N$).
$1\le N\le 1.5\times 10^5, 1\le M\le 300, 1\le d\le N, 1\le a_i\le N, 1\le b_i, t_i\le 10^9, t_i\le t_{i+1}$
Problem analysis
- DP state : $dp_{i, j}$ represents maximum happiness value after watching fireworks $1\sim i$ and standing at $j$
- DP transition : $dp_{i, j}=\max_{|k-j|\le d(t_i-t_{i-1})} dp_{i-1, k} + b_i - |a_i - j|$. In other words, we considered all possible position aftre watching fireworks $i-1$.
- Final answer : $\max_{1\le i\le N} dp_{M, i}$
If we calculate DP transition directly, the time complexity will be $O(NM\times N)=O(N^2M)$, which is too slow. How can we optimize it ?
Observe that $|k-j|\le d(t_i-t_{i-1})$ can be rewritten to: $$ \max\left\{1, j - d(t_i-t_{i-1})\right\} \le k \le \min\left\{N, j + d(t_i-t_{i-1})\right\} $$ which is exactly maximum value in a sliding window ! Thus, the time complexity can be optimized to $O(NM)$.
Problem solution
Note that we can’t declare $dp[150000][300]$ directly. Rolling array techniques is needed to optimize the space complexity to $O(N)$. Details in code.
code
#include <bits/stdc++.h>
#define F first
#define S second
using namespace std;
typedef long long ll;
const int N = (int)1.5e5 + 5;
const int M = 300 + 5;
const ll inf = (ll)1e18;
int n, m, d;
struct Event { int a, b, t; } events[M];
ll dp[2][N];
void init() {
cin >> n >> m >> d;
for(int i = 1 ; i <= m ; ++i) cin >> events[i].a >> events[i].b >> events[i].t;
sort(events + 1, events + 1 + m, [&](auto x, auto y) { return x.t < y.t; });
events[0] = {0, 0, 0};
}
void solve() {
// dp[i][j] = b_i - |a_i - j| + max_{ |k-j| <= d * (t_i - t_{i - 1}) } { dp[i - 1][k] }
int cur = 1;
for(int i = 1 ; i <= m ; ++i) {
ll tt = events[i].t - events[i - 1].t, k = 1;
deque<pair<int, ll> > dq;
for(int j = 1 ; j <= n ; ++j) {
while(!dq.empty() && abs(dq.front().F - j) > tt * d) dq.pop_front();
while(k <= n && abs(k - j) <= tt * d) {
while(!dq.empty() && dp[cur ^ 1][k] > dq.back().S) dq.pop_back();
dq.push_back({k, dp[cur ^ 1][k]}), k++;
}
dp[cur][j] = (ll)events[i].b - abs(events[i].a - j) + dq.front().S;
}
cur ^= 1;
}
ll ans = -inf;
for(int i = 1 ; i <= n ; ++i) ans = max(ans, dp[cur ^ 1][i]);
cout << ans << '\n';
}
int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}
CF939F - Cutlet
Problem description
You’re given a steak. Your mission is to cook each side exactly $N$ minutes. However, you can’t flip at anytime you want. You can only flip in $K$ time intervals $[l_i, r_i]$. Now, please calculate the minimum number of flips to achieve the goal or determine it is impossible to do it.
$1\le N\le 10^5, 1\le K\le 100, 1\le l_i\le r_i\le 2N$
Problem analysis
- DP state : $dp_{i, j}$ represents the minimum number of flipes if in the first $i$ minutes, the side which isn’t cooked right now has already been cooked $j$ minutes.
- DP transition : $$ dp_{i, j}=\min \begin{cases} dp_{i-1, j} &, \text{ don’t flip at minute} i \\ dp_{i-1, i-j-1}&, \text{ (if we can) flip at minute } i \\ \end{cases} $$
- Final answer : $dp_{2N, N}$
Is this solution feasible ? We have $O(N^2)$ states and the transition takes $O(1)$, so the time complexity is $O(N^2)\implies$ Not feasible QQ. How do we improve it ?
Observe that there are at most $100$ intervals that we can flip, and in every interval we will either filp one or two times or not flip at all. Therefore, we can redesign of DP:
- DP state : $dp_{i, j}$ represents the minimum number of flipes if in the first $r_i$ minutes, the side which isn’t cooked right now has already been cooked $j$ minutes.
- DP transition : $$ dp_{i, j}=\min \begin{cases} dp_{i-1, j} &, \text{ don’t flip in } [l_i, r_i] \\ \min_{0\le k\le r_i-l_i}+1\left\{ dp_{i-1, r_i-j-k}\right\} &, \text{ flip once at time } l_i+k\\ \min_{0\le k\le r_i-l_i}+2\left\{ dp_{i-1, j-k}\right\} &, \text{ flip twice, and the time between the two flip is } k\\ \end{cases} $$
- Final answer : $dp_{M, N}$
Now the time complexity is $O(NK^2)$, and can be reduced to $O(NK)$ with techniques mentioned in the beginning (the range $\min$ parts in the transition).
Problem solution
Like the previous problem, I used rolling array techniques to reduce space complexity. Also, be careful when calculating the indexes (I debugged for like two hours…).
code
#include <bits/stdc++.h>
#define F first
#define S second
using namespace std;
typedef long long ll;
const int N = (int)1e5 + 5;
const int M = 100 + 5;
const int inf = (int)1e9;
int n, k, dp[2][2 * N];
struct Seg { int l, r; } seg[M];
void init() {
cin >> n >> k;
for(int i = 1 ; i <= k ; ++i) cin >> seg[i].l >> seg[i].r;
}
void solve() {
// dp[i][j] = min { dp[i - 1][j], min_{ 0 <= k <= r_i - l_i } { dp[i - 1][r_i - j - k] + 1, dp[i - 1][j - k] + 2 } }
int cur = 1, ans = inf;
fill(dp[0], dp[0] + n + 1, inf); dp[0][0] = 0;
for(int i = 1 ; i <= k ; ++i) {
deque<pair<int, int> > dq;
copy(dp[cur ^ 1], dp[cur ^ 1] + n + 1, dp[cur]);
int tt = seg[i].r - seg[i].l;
for(int j = 0 ; j <= min(n, seg[i].r) ; ++j) {
while(!dq.empty() && j - dq.front().F > tt) dq.pop_front();
while(!dq.empty() && dp[cur ^ 1][j] <= dq.back().S) dq.pop_back();
dq.push_back({j, dp[cur ^ 1][j]});
dp[cur][j] = min(dp[cur][j], dq.front().S + 2);
}
dq.clear();
for(int j = seg[i].r ; j >= 0 ; --j) {
while(!dq.empty() && dq.front().F < seg[i].l - j) dq.pop_front();
while(!dq.empty() && dp[cur ^ 1][seg[i].r - j] <= dq.back().S) dq.pop_back();
dq.push_back({seg[i].r - j, dp[cur ^ 1][seg[i].r - j]});
dp[cur][j] = min(dp[cur][j], dq.front().S + 1);
}
cur ^= 1;
}
ans = dp[cur ^ 1][n];
if(ans == inf) cout << "Hungry\n";
else cout << "Full\n" << ans << '\n';
}
int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}
More problems
Changelog
- [2023-04-19] Fix error in the first example problem. Thank Hossam (hossamsherif2008@gmail.com) for pointing out the error.