## Outline

Do binary search parallelly(of course 😃). Usually this kind of problems needs to do lots of binary search. See problems for more understanding.

## Example problems

### POI XVIII MET - Meteors

#### Problem description

You’re given $N$ people and $M$ lands, every land has an owner $o_i$, and every person has a target resource level $g_i$. Now, $K$ events will happen sequentially. Every event is described by $l_i, r_i, x_i$, meaning that lands from $l_i$ to $r_i$ will generate $x_i$ resources ($l_i$ may be greater than $r_i$ as the land is arranged as a circle), and the owner of the land will get the resource it generate. In other words, the person that have $c$ lands in $[l_i, r_i]$ will get $cx_i$ resources. For each person, please output after which event does he/she achieved his/her target resource level. If he/she can’t achieve his/her target, then output “NIE”(polish for no).

$1\le N, M, K \le 3\times 10^5, 1\le o_i, l_i, r_i\le N, 1\le g_i\le 10^9, 1\le x_i\le 10^9$

#### Problem analysis

Suppose that we only have one person to calculate, than we can do binary search on the answer and maintain the events using BIT(range update, single query). The complexity is $O((K+C_i)\log K)$, where $C_i$ is the number of lands he/she owns.

If we do this $N$ times, the complexity will rise to $O(NK+M)\log K)$, which is unacceptable.

So we cannot do $N$ times binary search. This is where parallel binary search can be used.

#### Problem solution

The pseudocode of the solution is:

parallel_binary_search(L, R, candidates): // its called totBS in code
if L + 1 == R:
the answer of all people in candidates is L
return
mid = (L + R) / 2
Add events in [L, mid) into BIT
split candidates into two groups, left(done) and right(undone)
Remove events in [L, mid) from BIT
parallel_binary_search(L, mid, left)
parallel_binary_search(mid, R, right)


The time complexity is $O((N+M+K)\log M\log K)$, and the space complexity is $O(N\log N)$. Note that the space complexity can be reduced to $O(N)$.

code
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef unsigned long long ull;

const int N = (int)3e5 + 5;

namespace bit {
ll b[N], n;

void init(int _n) {
n = _n;
fill(b, b + n + 1, 0);
}
inline int lowbit(int x) {
return x & (-x);
}
void upd(int x, int v) {
for(int i = x ; i <= n ; i += lowbit(i)) b[i] += v;
}
void update(int l, int r, int v) {
upd(r + 1, -v), upd(l, v);
if(l > r) upd(1, v), upd(n + 1, -v);
}
ll query(int x) {
ll ans = 0;
for(int i = x ; i > 0 ; i -= lowbit(i)) ans += b[i];
return ans;
}
}

int n, m, k, target[N], ans[N];
vector<tuple<int, int, int> > event;
vector<int> lands[N];

void totBS(int l, int r, vector<int>& candi) {
if(l + 1 == r || candi.empty()) {
for(auto c : candi) ans[c] = l;
return;
}
int mid = (l + r) >> 1;
// do things: events from [l, mid)
for(int i = l ; i < mid ; ++i) {
auto [el, er, ea] = event[i];
bit::update(el, er, ea);
}
// split candi into two parts
vector<int> ok, not_ok;
for(auto c : candi) {
ull sum = 0;
for(auto land : lands[c]) sum += bit::query(land);
if(sum >= (ull)target[c]) ok.push_back(c);
else {
target[c] -= sum;
not_ok.push_back(c);
}
}
// undo things
for(int i = l ; i < mid ; ++i) {
auto [el, er, ea] = event[i];
bit::update(el, er, -ea);
}
// continue binary search and free memory
totBS(l, mid, ok); vector<int> ().swap(ok);
totBS(mid, r, not_ok); vector<int> ().swap(not_ok);
}

void init() {
cin >> n >> m;
for(int i = 1 ; i <= m ; ++i) {
int x; cin >> x; lands[x].push_back(i);
}
for(int i = 1 ; i <= n ; ++i) cin >> target[i];
cin >> k;
for(int i = 0 ; i < k ; ++i) {
int l, r, a; cin >> l >> r >> a;
event.push_back({l, r, a});
}
}
void solve() {
bit::init(m);
vector<int> all;
for(int i = 1 ; i <= n ; ++i) all.push_back(i);
totBS(0, k + 1, all);
vector<int> ().swap(all);
for(int i = 1 ; i <= n ; ++i) {
if(ans[i] == k) cout << "NIE\n";
else cout << ans[i] + 1 << '\n';
}
}

int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
init();
solve();
}



### HDU5412 - CRB and Queries

#### Problem description

You are given $N$ elements $A_1, A_2, \dots, A_N$, and you need to handle $Q$ queries of two type:

• $1\space l\space v$ : Change $A_l$ to $v$
• $2\space l\space r\space k$ : Answer the $k^{th}$ smallest value in $A_l, A_{l+1}, \dots, A_r$

$1\le N, Q \le 10^5, 1\le A_i, v \le 10^9, 1\le l \le r \le N, 1\le k \le r-l+1$

#### Problem analysis

Let’s view every number in this problem as a four-tuple $(i, v, t, val)$, representing that after time $t$, $A_i=v$. Then, the first type of query can be transformed to:

• $1\space l\space v$ : Create $(l, old\space A_l, t, -1)$ and $(l, v, t, 1)$

Now, a second query $(l, r, k)$ can be handled with a binary search. We can do binary search on answer. For a value $x$, we sum up the $val$ part of tuples $(i, v, t, val)$ that satisfy $l\le i\le r$, $v\le x$, $t\le \text{time of the query }$. If the sum of $val$ is smaller than $k$, then we know the answer is greater than $x$, else the answer is smaller than $x$. The time complexity is $O((N+Q)\log N\log C)$ if we use BIT to maintain interval sums.

So now we can solve each type $2$ query with one binary search. Similar to the previous problem, instead of doing $Q$ binary searches, we can do all binary searches at the same time i.e. use parallel binary search technique! The time compelxity will be $O((N + Q)\log N\log C)$, which is same as one binary search.

#### Problem solution

The pseudocode of the main part of the solution is:

def totBS(L, R, events, queries):
if L + 1 == R:
the answer to query in queries is L
return
mid = (L + R) / 2
split events into l_event, r_event and queries into l_query, r_query
totBS(L, mid, l_event, l_query)
totBS(mid, R, r_event, r_query)


Note that the events and queries are sorted by time, and all values are discretized. Thus the time complexity is $O((N+Q)\log N\log(N+Q))$.

code
#include <iostream>
#include <vector>
#include <algorithm>
#include <tuple>
using namespace std;
typedef long long ll;

const int N = (int)1e5 + 5;

namespace bit {
int b[N * 2], n;

void init(int _n) {
n = _n;
fill(b, b + n + 1, 0);
}
inline int lowbit(int x) { return x & (-x); }
inline void update(int x, int v) {
for(int i = x ; i <= n ; i += lowbit(i)) b[i] += v;
}
inline int qry(int x) {
int ans = 0;
for(int i = x ; i > 0 ; i -= lowbit(i)) ans += b[i];
return ans;
}
int query(int l, int r) {
return qry(r) - qry(l - 1);
}
}
struct Event {
int p, x, v, t;
};
struct Query {
int l, r, k, t, id;
};

int n, q, q2, rg, a[N], ans[N];
vector<int> num;
vector<Event> events;
vector<Query> queries;

int id(int x, vector<int>& v) {
return lower_bound(v.begin(), v.end(), x) - v.begin() + 1;
}
void totBS(int l, int r, vector<Event>& event, vector<Query>& query) {
/*
cerr << "l=" << l << ", r=" << r << ": \n";
cerr << " event: "; for(auto [p, x, v, t] : event) fprintf(stderr, "(%d,%d,%d,%d), ", p, x, v, t); cerr << '\n';
cerr << " query: "; for(auto [l, r, k, t, id] : query) fprintf(stderr, "(%d,%d,%d,%d,%d), ", l, r, k, t, id); cerr << '\n';
*/
if(l + 1 == r || query.empty()) {
for(int i = 0 ; i < (int)query.size() ; ++i) ans[query[i].id] = num[l - 1];
return;
}

int mid = (l + r) >> 1, i = 0, j = 0;
vector<Event> le, re;
vector<Query> lq, rq;
while(i < (int)event.size() || j < (int)query.size()) {
if(j == (int)query.size() || (i < (int)event.size() && event[i].t <= query[j].t)) {
if(event[i].x < mid) {
//cerr << "e: " << event[i].p << ' ' << event[i].v << '\n';
bit::update(event[i].p, event[i].v);
le.push_back(event[i]);
}
else re.push_back(event[i]);
++i;
}
else {
int cnt = bit::query(query[j].l, query[j].r);
//cerr << "q: " << query[j].l << ' ' << query[j].r << ' ' << cnt << '\n';
if(cnt >= query[j].k) lq.push_back(query[j]);
else {
rq.push_back(query[j]);
rq.back().k -= cnt;
}
++j;
}
}

for(int k = 0 ; k < (int)le.size() ; ++k) bit::update(le[k].p, -le[k].v);
totBS(l, mid, le, lq); vector<Event> ().swap(le); vector<Query> ().swap(lq);
totBS(mid, r, re, rq); vector<Event> ().swap(re); vector<Query> ().swap(rq);
}

bool init() {
if(!(cin >> n)) return 0;
num.clear();
events.clear();
queries.clear();
for(int i = 1 ; i <= n ; ++i) {
cin >> a[i];
events.push_back({i, a[i], 1, 0});
num.push_back(a[i]);
}
cin >> q; q2 = 0;
for(int i = 1 ; i <= q ; ++i) {
int op; cin >> op;
if(op == 1) {
int l, v; cin >> l >> v;
events.push_back({l, a[l], -1, i});
events.push_back({l, a[l]=v, 1, i});
num.push_back(v);
}
else {
int l, r, k; cin >> l >> r >> k;
queries.push_back({l, r, k, i, q2++});
}
}
sort(num.begin(), num.end());
num.resize(unique(num.begin(), num.end()) - num.begin());
rg = (int)num.size();
for(int i = 0 ; i < (int)events.size() ; ++i) events[i].x = id(events[i].x, num);
return true;
}
void solve() {
bit::init(n);
totBS(1, rg + 1, events, queries);
for(int i = 0 ; i < q2 ; ++i) cout << ans[i] << '\n';
}

int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
while(init()) solve();
}