第八章 Stackelberg Game - Applications to Seccurity

[TOC]

8.1 Supplements for Nash Equilibrium

Player 1: $P_1 = (p_{11}, p_{12})$

Payoff for Player 1: $U_1 = P_1 L P_2^\top$

Player 2: $P_1 = (p_{21}, p_{22})$

Payoff for Player 2: $U_1 = P_1R P_2^\top$

8.1.1 NE

$
\text{Two players with payoff matrices } L \text{ and } R \text{ of size } m \times n
$

$
P_1 = (p_{11}, p_{12}, \dots, p_{1m})
$
$
P_2 = (p_{21}, p_{22}, \dots, p_{2n})
$

$
(P_1, P_2) \text{ is a Nash equilibrium (NE) if and only if }
$

$
P_1 L P_2^\top \geq P_1’ L P_2^\top \text{ for all mixed strategies } P_1’
$
$P_1 R P_2^\top \geq P_1 R [P_2’]^\top \text{ for all mixed strategies } P_2’$

或者可以这么说：

$
(P_1, P_2) \text{ is a NE if and only if}
$

$
P_1 L P_2^\top \geq e_i L P_2^\top \text{ for every } i \in [m] \quad e_i = (0, \dots, 0, 1, 0, \dots, 0)
$
$
P_1 R P_2^\top \geq P_1 R e_i^\top \text{ for every } i \in [n] \quad e_i = (0, \dots, 0, 1, 0, \dots, 0)
$

8.1.2 Strategy game

$
\text{Player 1’s mixed strategy: } \mathbf{x} = (x_1, 1 - x_1)
$
$
\text{Player 2’s mixed strategy: } \mathbf{y} = (y_1, 1 - y_1)
$
$
\text{Player 1’s payoff: } \mathbf{x} L \mathbf{y}^\top
$
$
\text{Player 2’s payoff: } \mathbf{x} R \mathbf{y}^\top
$

8.2 Stackelberg game (主从博弈)

Player 1: “leader” Player 2 “follower”

1) Leader selects a (possibly mixed) strategy $x_1$

2) Follower learns about $x_1$, selects the best response $x_2$

8.2.1 如何表示主从博弈

Mixed strategies are hard to visually represent

Continuous spectrum of possible actions
行动的连续谱

8.2.2 主从博弈的纳什均衡

Player 1 takes mixed strategies, the advantage is more:

If Player 1 plays Up and Down with probabilities 0.49 and 0.51, respectively
Player 2 is still better off playing Right than Left, in expectation
The Reward of Player 1 >2

Can the leader lose in Stackelberg equilibrium compared to a Nash equilibrium?

In Stackelberg, the leader must take action in advance, while in Nash, he can change his strategy at any point.
The answer: No
- The optimal reward for the leader in the Stackelberg game is always greater than or equal to his maximum reward under any Nash equilibrium.
If (x,y) is a NE, then Player 1 can always select x, ensure that Player 2 will play y and achieve the reward in NE
- Player 1 may be able to select a better strategy than x
- 后选择的总是被动

8.2.3 Stackelberg vs Nash

It is important to note that:

the leader can take mixed strategies
the follower knows (and trusts) the leader’s strategies
the leader knows the follower’s reward structure

8.3 Stackelberg in zero-sum game

Recall the minimax theorem:

$\max_{\mathbf{x}} \min_{\mathbf{y}} \mathbf{x} M \mathbf{y}^\top = \min_{\mathbf{y}} \max_{\mathbf{x}} \mathbf{x} M \mathbf{y}^\top$

Player 1 goes first $\rightarrow$ Player 1 selects the maxmin strategy.

Player 2 goes first $\rightarrow$ Player 2 selects the minmax strategy.

The minimax theorem: make no difference.

Strategy game and Stackelberg game in zero-sum are essentially identical.

策略博弈和零和中的斯塔克伯格博弈本质上是相同的。

Proof:

$U_{1st}\geq U_{1NE}$
$U_{2st}\geq U_{2NE}$
$-U_{1st}\geq U_{2NE}=-U_{1NE}$
$U_{1st}\leq U_{1NE}$
$U_{1st}=U_{1NE}$

8.3.1 Stackelberg for general 2-persons game

2-persons game: the payoff matrices are L and R for player 1 and 2, respectively.

$\max_{\mathbf{x}} \mathbf{x} L \mathbf{y}_*^\top \\\text{subject} \quad \text{to}\\ \mathbf{y}_* = \arg \max_{\mathbf{y}} \mathbf{x} R \mathbf{y}^\top$

8.3.2 example

Mixed strategy x=(p,1-p) over ‘Up’ and ‘Down’. Consider two cases:

Player 2 selects ‘Left’
Player 2 selects ‘Right’

case 1

Player 2 selects ‘Left’ :

$\max_{p \in [0, 1]} \left( 1p + 0(1 - p) \right)$

subject to $p \cdot 1 + 0 \cdot (1 - p) \geq 0 \cdot p + 1 \cdot (1 - p)$

That is:

$\max_{p \in [0, 1]} p$
subject to : $p \geq 1 - p$

Solution : p=1,the payoff : 1

case 2

Player 2 selects ‘Right’ :

$\max_{p \in [0, 1]} \left( 3p + 2(1 - p) \right)$
subject to : $p \cdot 1 + 0 \cdot (1 - p) \leq 0 \cdot p + 1 \cdot (1 - p)$

That is:

$\max_{p \in [0, 1]} \left( p + 2 \right)$
subject to : $p \leq 1 - p$

Solution : p=1/2,the payoff : 2.5

8.3.3 Stackelberg via linear programs

High-level idea:

Fix a strategy $ a_2^* \in A_2 $ for Player 2.

Write a linear program with mixed strategy $\mathbf{x}$ of Player 1.

Maximize the payoff of Player 1 with respect to mixed strategy $\mathbf{x} $, and Player 2 selects the best response $ a_2^* $.

Solve this linear program.

Solve linear programs for every $ a_2^* \in A_2 $.

8.3.4 Stackelberg via linear programs (formally)

$A$, $B$ for player 1 (Leader) and player 2 (Follower)

$|A| = m$, $|B| = n$

$\mathbf{x}$ is a mixed strategy over $A$

$\mathbf{x}(a_i)$ denotes the probability of Leader selecting $a_i \in A$

$u_1(a, b)$ and $u_2(a, b)$: payoff for Leader and Follower

One LP for fixed $ b^* \in A_2 $

Fixed strategy $ b^* \in A_2 $ for player 2 (Follower)

$U_1(b^) = \max \sum_{a \in A} \mathbf{x}(a) u_1(a, b^)$

subject to

$\forall b \in B, \sum_{a \in A} \mathbf{x}(a) u_2(a, b^*) \geq \sum_{a \in A} \mathbf{x}(a) u_2(a, b)$

$\sum_{a \in A} \mathbf{x}(a) = 1, \quad \mathbf{x}(a) \geq 0$

Taking the maximum over $b \in A_2 $ : $\max_{b} U_1(b)$

exercise

2选择Left

$\max p+2q+3(1-p-q) \\\text{subject \quad to}\\ 2p\geq3q\\ 2p\geq1-p-q$

结果：$[\frac{1}{3},0,\frac{7}{3}]$,payoff:$\frac{7}{3}$

2选择middle

$\max 3p+q+2(1-p-q) \\\text{subject \quad to}\\ 3q\geq2p\\ 3q\geq1-p-q$

结果：$[\frac{3}{5},\frac{2}{5},0]$,payoff:$\frac{11}{5}$

2选择right

$\max p+2q+(1-p-q) \\\text{subject \quad to}\\ 1-p-q\geq p\\ 1-p-q\geq3q$

结果：$[\frac{1}{3},0,\frac{2}{3}]$,payoff:$1$

因此选择left有最大收益$\frac{7}{3}$

上面的s.t.指的是在follower有这种选择必须满足的条件：即当前选择的利好必须不小于其他的选择

8.4 Computing Stackelberg

Theorem [Conitzer and Sandholm 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in polynomial time.

Theorem: In (>)3-player normal form games, an optimal Stackelberg strategy is an NP-Hard problem.

8.5 Cournot Competition (Strategy game)

Two firms compete by choosing how much to produce:

$G = \{ {1, 2}, {q_1, q_2}, {u_1, u_2} \}$

价格函数为：$p(q_1 + q_2) = \max \left( 0, a - b(q_1 + q_2) \right)$

公司的成本函数为：$c_i(q_i) = c q_i$

收益函数为：$u_i(q_1, q_2) = \left( \max \left( 0, a - b(q_1 + q_2) \right) - c \right) q_i$

条件为：$a > b, , c > 0, , q_1 \geq 0, , q_2 \geq 0$

纳什均衡为：$\left( \frac{a - c}{3b}, \frac{a - c}{3b} \right)$

8.5.1 Stackelberg Competition (主从博弈)

Here’s the version you requested in LaTeX format with the modifications made:

$G = \{ {1, 2}, {q_1, q_2}, {u_1, u_2} \}$

The price function is:$p(q_1 + q_2) = \max \left( 0, a - b(q_1 + q_2) \right)$

The cost functions for the firms are: $c_i(q_i) = c q_i$

The payoff functions are: $u_i(q_1, q_2) = \left( \max \left( 0, a - b(q_1 + q_2) \right) - c \right) q_i$

The conditions are: $a > b, , c > 0, , q_1 \geq 0, , q_2 \geq 0$

Difference: Player 1 chooses $q_1$ first, then Player 2 chooses $q_2$ after observing $q_1$.

8.5.2 Stackelberg Competition (Continued)

This is an extensive game, and we look for Subgame Perfect Equilibrium (SPE).

Back Induction – Not a finite game but with finite length.

We look at a subgame where player 1 chooses $q_1$. Then, player 2 ’s maximization problem is:

$\max_{q_2 \geq 0} \, u_2(q_1, q_2) = (a - b(q_1 + q_2) - c) q_2$

This gives the best response for player 2:

$q_2 = \frac{a - c - bq_1}{2b}$

No difference.

8.5.3 Stackelberg Competition (Continued)

The difference: Player 1 will choose $q_1$ after recognizing player 2 ’s best response. Player 1 is the leader; player 2 is the follower.

The problem for player 1 is:

$\max_{q_1 \geq 0} \, u_1(q_1, q_2) = (a - b(q_1 + q_2) - c) q_1$

subject to

$q_2 = \frac{a - c - bq_1}{2b}$

This implies that:

$\max_{q_1 \geq 0} \left( a - b\left( q_1 + \frac{a - c - bq_1}{2b} \right) - c \right) q_1$

We get the best response for player 1:

$q_1 = \frac{a - c}{2b}$