CPT302 Multi-Agent Systems 题型

article/2025/7/13 23:18:41

Agent games

Wumpus World 乌普斯世界

设定

环境:一个二维网格状的洞穴(cave),由多个房间(rooms)组成。
起点:智能体(agent)总是从左下角的 Room[1,1] 开始。
连接方式:每个房间通过上下左右的通道walkways 与其他房间连接(不包括对角线diagonally)。

内容与感知线索

Wumpus(怪兽)
静止不动,但致命:进入其所在房间的智能体会被吃掉,游戏失败。
它周围相邻的房间会**发出恶臭(stench)**作为提示。
智能体有箭(arrow),如果面朝它可以使用空格键射杀。

Pits(陷阱)
掉进坑里会卡住,游戏结束。
陷阱周围相邻的房间会有**微风(breeze)**提示。

Gold(宝藏)
目标是不死的情况下尽可能收集所有宝藏。
有宝藏的房间会发出**闪光(glitter)**提示。
使用 Enter 键捡起宝藏。


游戏规则与限制
1. 每次移动(↑↓←→)都会扣分(代表“代价”)( The agent is penalized and must pay to move to a room.
),鼓励智能体做出最优路径决策
2. 需要利用感知(微风、恶臭、闪光)来推理未知房间是否安全。
3. 游戏目标:在不死的情况下尽可能多地收集宝藏

控制方式

↑ ↓ ← →:移动智能体。
Enter:拾取当前房间中的金子。
Space:射箭(如果面前是 Wumpus 则可以杀死)。

The Agent starts from Room[1, 1].

four aspecs of rationality

Lec1
o    Performance measure

The criteria by which the agent’s success is judged(评估智能体“表现好坏”的标准).

  • Collect gold

  • Stay alive (avoid Wumpus and pits)

  • Use minimal actions and resources

o    Percept sequence

所有感知信息的历史序列

This is the complete history of what the agent has perceived. At each step, the agent receives sensory input from the environment.

Stench (Wumpus nearby)
Breeze (Pit nearby)
Glitter (Gold here)
Bump (Hit a wall)
Scream (Wumpus killed)

(1,1,E),(1,1,N),shooting, capture

o    Agent's knowledge about environment 智能体对环境的知识

  • Starts at Room[1,1]

  • Rules: Stench → Wumpus nearby, Breeze → Pit nearby

o    Actions

  • Move (↑ ↓ ← →)

  • Grab (Enter)

  • Shoot arrow (Space)

sensors

SensorPerceptMeaning
StenchSmellA Wumpus is in an adjacent room
BreezeWindA pit is in an adjacent room
GlitterVisualGold is in the current room

properties of the environment

From T2

1. Identify the properties of the environment:
        o    Fully observable vs partially observable
        o    Deterministic vs non-deterministic
        o    Static vs dynamic
        o    Discrete vs continuous
        o    Episodic vs non-episodic
        o    Real Time

PropertyWumpus World Description
Partially ObservableThe agent only perceives local clues (e.g., stench, breeze), not the full map.
DeterministicActions (e.g., move, shooting an arrow) may have certain results.
StaticThe environment doesn’t change.
DiscreteThe world is a grid of distinct cells.
Non-EpisodicCurrent decisions depend on the entire history (percepts, moves).
not Real-Time如果有timer, 则是real time


2. What are the elements of the set E and Ac?

Elements of the set E (Percepts)

Each percept can include:

  • stench, breeze, glitter, bump, scream
    So an element e ∈ E could look like:
    e = ⟨stench, breeze, none, none, none⟩

Elements of the set Ac (Actions)

Possible actions:

  • Up, Down, Left, Right, Grab, Shoot, Climb
    So Ac = {Up, Down, Left, Right, Grab, Shoot, Climb}


    Give an example of e0.

3. Example of e₀ (initial percept)

The agent starts at Room[1,1], which is safe and clean.
So:
e₀ = ⟨none, none, none, none, none⟩


    Give an example of a run r.

A run is a sequence of percepts and actions.
Example:
r = ⟨e₀, Right, e₁, Up, e₂, Grab, e₃, Left, e₄⟩

This represents:

  • perceive e₀ → move Right (action, a0)→ perceive e₁ → move Up  (action, a1)→ perceive e₂Grab gold → ...

Environment and Notation 2

from T3

1. An agent can do the following actions (one at a time): Turn(Right), Turn(Left), Forward, Shoot, Grab. Suggest predicates 谓词 to describe the domain!

    An agent can do the following actions (one at a time):
Turn(Right), Turn(Left), Forward, Shoot, Grab

    Predicates to describe the domain:
In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i,j).
S(i, j) means there is a stench in square(i,j).
P(i, j) means there is a pit in square(i,j).
B(i, j) means there is a breeze in square (i,j).
G(i, j) means there is a treasure (and a glitter) in square (i,j).
V(i, j) means that square (i,j) has been visited.
OK(i, j) means that square(i,j) is safe.
Arrow() means that there is still an arrow left (assume there's only one arrow).

2.    Describe the following rules:
o    A square is safe iff it contains no Wumpus and no pit.

OK(i, j) :- ¬W(i, j) ∧ ¬P(i, j)
 


o    A stench iff a Wumpus is in an adjacent square.

S(i, j) :- W(i+1, j) ∨ W(i-1, j) ∨ W(i, j+1) ∨ W(i, j-1)
 


o    A breeze iff a pit is in an adjacent square.

B(i, j) :- P(i+1, j) ∨ P(i-1, j) ∨ P(i, j+1) ∨ P(i, j-1)

    Create rules for grabbing a treasure and shooting an arrow.

Grab :- In(i, j) ∧ G(i, j)

Shoot :- In(i, j) ∧ Arrow() ∧ Facing(d) ∧ W(i1, j1)
 

Predicate

In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i,j).
S(i, j) means there is a stench in square(i,j).
P(i, j) means there is a pit in square(i,j).
B(i, j) means there is a breeze in square (i,j).
G(i, j) means there is a treasure (and a glitter) in square (i,j).
Arrow() means that there is still an arrow left (assume there's only one arrow).

safe(i, j) means that square(i,j) is safe.

V(i, j) means that square (i,j) has been visited.

Actions

Turn () : P: Facing(d) D: Facing(d) A: Facing(d')

Forward, P: In(i, j) ∧ Facing(d) ∧ safe(i', j')   D: In(i, j)  A: In(i', j'), V(i', j')

Shoot, P: Arrow()  D: Arrow()

Grab  P: In(i, j) ∧ G(i, j)  D: G(i, j)

Plan

From T4

Q1. Give the pre-condition list(执行该动作前需要满足的条件, delete list(动作执行后变为不成立的事实, and the add list(动作执行后变为成立的事实 of the actions.

Pre: In(i,j) ∧ V(i,j)∧G(i,j)

  • In(i,j):代理必须在该格子;

  • V(i,j):说明该格子已被访问,表示代理“已经感知到这里有宝藏”,这是一种常规约束;

  • G(i,j):该格子确实有金子,否则就不需要执行 Grab。

Del:G(i,j)

G(i,j):金子被拿走后,格子不再含有金子

Add:--

没有新增谓词(即空 Add 列表),因为在你提供的谓词系统中没有定义像 HasGold() 这样的谓词,因此不能添加它。


Q2. Formally describe the initial state / beliefs B0 of the figure above.

Initial State整个环境的“真实状态”,包括所有格子的内容(哪有坑、金子、Wumpus 等)
Initial B0代理(Agent)在游戏开始时“知道的部分信息”,通常是当前格子和感知到的提示

initial state:

In(1,1)  
Facing(E)  
Arrow()  
W(1,3)  
G(2,3)  
OK(1,1) ∧ OK(1,2) ∧ OK(2,1)  
¬W(1,1), ¬W(1,2), ¬W(2,1)  
¬P(1,1), ¬P(1,2), ¬P(2,1)  

...(总之是初始状态就行)

Beliefs B0​,也叫做初始信念状态,表示智能代理在游戏一开始时对世界的已知信息

In(1,1)
Facing(E)
Arrow()
V(1,1)
OK(1,1)
¬W(1,1)
¬P(1,1)

 


Q3  Calculate a plan π that would achieve killing the Wumpus and getting the treasure without dying, given the beliefs B0.

e.g 这里随便编的
Forward         // (1,1) → (1,2)
Forward         // (1,2) → (1,3)
Shoot           // kill Wumpus at (1,3)
Turn(Right)     // now facing South
Forward         // (1,3) → (2,3)
Grab            // pick up the treasure

 

2324 q1 ab

(a) Is the environment of the Wumpus World above static or dynamic? Explain your answer!

static, the environment doesn’t change. 

(b) Given the following possible actions: Turn_Right, Turn_Left, Forward, Shoot, Grab;
And the following predicates:
In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i, j).
S(i, j) means there is a stench in square(i, j).
P(i, j) means there is a pit in square(i, j).
B(i, j) means there is a breeze in square (i, j).
G(i, j) means there is a treasure (and a glitter) in square (i, j).
Give a rule to the deductive reasoning agent in the Wumpus World above for the case of "if there is a treasure to the right of the agent while it is already facing east, and the next room to the right is safe, then move towards the treasure".    ???

G(i+1, j): In(i,j) ∧ Facing(E) ∧ ¬W(i+1, j) ∧ ¬P(i+1, j) → Do(Forward)

Vacuum World

Compact Rules 

from T3

The goal in the Vacuum World is for the robot to clear up all the dirt. There are 3 domain predicates: 
    In(x,y) agent is at (x,y).
    Dirt(x,y) there is dirt at (x,y).
    Facing(d) the agent is facing direction d = N, E, W or S.
    The actions are Ac = {turn, forward, suck}, where turn means "turn right". 

Suggest compact decision making rules for the vacuum world using first-order logic that works for arbitrary grid sizes.  
Note that you can use the usual quantifiers for all ∀ and there exists ∃, equality =, integers and logical operations on them as well as a constant s to denote denote the size of the grid. 

Actions

Go(x, y) P:In(Shakey, x) ∧ Location(x) ∧ Location(y), D: In(Shakey, y), A: In(Shakey, y)

Push(b, x, y) P:In(Shakey, x) ∧ In(b, x) ∧ Box(b) ∧ Location(y) D: In(Shakey, x), In(b, x) A: In(Shakey, y), In(b, y)

ClimbUp(b) P:In(Shakey, x) ∧ In(b, x) ∧ Box(b) D: In(Shakey, x) A: On(Shakey, b)

ClimbDown(b) P:On(Shakey, b) ∧ In(b, x) D: On(Shakey, b) A: In(Shakey, x)

TurnOn(s)   P:On(Shakey, b) ∧ In(b, s) A: light(s)

TurnOff(s)   P: On(Shakey, b) ∧ In(b, s) ∧ light(s), D:light(s)

Rules

(a) 吸尘规则(如果当前位置有脏点,则吸尘)

(b) 向前移动规则(如果当前位置没有脏点,且前方格子在网格内且未清理,则向前移动)

(c) 转向规则(当前格子无脏点,且前方无脏点或不可走,则转向)

subsumption architecture

Design a solution for the vacuum-world example following Brooks’ subsumption architecture. Identify how the design needs to change to allow an additional action to charge the robot at the home location whenever it runs low on power. Discuss how this design compares with the design based on deductive reasoning. 

Overview of action selection in the subsumption architecture:

  • Set of all behaviours Beh = {(c, a) | c ⊆ P er and a ∈ Ac}

  • Behaviour will fire in state s iff see(s) ∈ c

  • Agent’s set of behaviours R ⊆ Beh, inhibition relation <⊆ R × R

  • < is a strict total ordering (transitive, irreflexive, antisymmetric)

  • If b1 < b2, b1 will get priority over b2

  • Action selection in the subsumption architecture:

1.  function action(p: P er) : Ac
2.      var fired := ∅, selected : A
3.      begin
4.          fired ← {(c, a) | (c, a) ∈ R and p ∈ c}
5.          for each (c, a) ∈ fired do
6.              if ¬∃(c', a') ∈ fired such that (c', a') < (c, a) then
7.                  return a
8.          return null
9.      end

LayerBehaviorDescription
0Obstacle AvoidanceImmediately avoids obstacles using sensors; cannot be interrupted.
1RechargeWhen battery is low, suppresses all non-survival behaviors and returns to home to charge.
2Clean Dirty SpotsUses dirt sensors to clean dirty locations.
3Random WanderingRoams randomly when no higher-priority task is active.

In deductive systems, charging must be explicitly defined as a goal. When the battery is low, the system re-plans its actions, which allows flexibility but adds complexity and slower response.
In contrast, the subsumption design handles charging by priority—it's faster, simpler, and more robust in dynamic environments.

inhibition relation

Suppose we allow an additional action “move in random direction” to avoid having to specify conditions for each and every situation and to analyse whether the robot can get stuck.
We consider the following percepts: “There is dirt in the robot current location”, and “The robot is facing a wall”.
Describe the robot's behaviour and the corresponding inhibition relation.
Modify further and introduce behaviors related to battery state and the corresponding inhibition relation!

Domain Predicates:
In(x, y): Robot is at (x, y).
Dirt(x, y): There is dirt at (x, y).
Facing(d): Robot is facing direction d ∈ {N, E, W, S}.

Actions (original):
suck: clean dirt at current position
turn: turn right(90)
forward: move forward (if no wall)

Actions (New):
move_random: pick a random direction and move (used as a fallback)
go_home: go to charging station
charge: recharge battery

Rules:

In(x, y) ∧ Dirt(x, y) → do(suck)
FacingWall(d) → do(turn)
¬Dirt(x, y) ∧ ¬FacingWall(d) → do(forward)
Default → do(move_random)
LowBattery ∧ AtHome → do(charge)
LowBattery ∧ ¬AtHome → do(go_home)

 

Priority

charge < turn < go_home < suck < forward < move_random
 

22-23 q1a

(a) Consider a 4-by-4 cell Vacuum World as follows:

where “R” represents a robot agent, “H” represents a hole, “B” represents a battery station and “*” represents dirt. 

1.    Develop a set of rules (including predicates and actions) that can be used to describe the above 4-by-4 cell Vacuum World. 

Domain Predicates:
In(x, y): Robot is at (x, y).
Dirt(x, y): There is dirt at (x, y).
Facing(d): Robot is facing direction d ∈ {N, E, W, S}.
Hole(x,y) there is hole at(x,y)
Battery(x,y) there is battery at(x,y)

Actions (original):
suck: clean dirt at current position
turn: turn right(90)
forward: move forward (if no wall)

Actions (New):
move_random: pick a random direction and move (used as a fallback)
charge: recharge battery
go_home: go to charging station


2.    Use these rules to instruct the following: plan
    the robot to clean up all the dirt starting from (4, a) while avoiding falling into any hole;
    the robot must pass the battery station to get recharged for at least one time;
    after having cleaned all the dirt, the robot is located at (3,d).

3. Design a subsumption architecture for the 4-by-4 cell Vacuum World and use inhibition to coordinate the behaviors. 

LayerBehaviorCan InhibitInhibited by
1Avoid Hole2, 3, 4
2Charge Battery3, 41
3Clean Dirt41, 2
4Explore/Move1, 2, 3

Shakey's World 

Shakey’s world consisting of four rooms lined up along a corridor

each room has a door and a light switch.

The actions in Shakey’s world include moving from place to place, pushing movable objects (such as boxes), climbing onto and down from rigid objects (such as boxes), and turning light switches on and off. To turn a light on or off, Shakey must be on top of a box at the light switch’s location. 
Shakey’s six actions are the following: 
    Go(x, y) form x to y, which requires that Shakey be at location x.
    Push a box b from location x to location y:  Push(b, x, y).
    Climb onto a box: ClimbUp(b); climb down from a box: ClimbDown(b).
    Turn a light switch on: TurnOn(s); turn it off: TurnOff(s).

e.g desceibe predicates

You are given the following constants: Room1,... , Room4, Corridor, Shakey, Box1,... , Box4, Switch1, ... , Switch4, and Floor.  
Suggest predicates to describe the domain!  

1) In (o,x) object o Shakey/box/switch is in location x e.g In(Shakey, Room3)=True
2) light(x) light is on in location x,
3) Box(o) denote the object o is a box
4)On(o1, o2) object o1 is stacked on object o2
5)  Location(o) denote the object o is a location

plan

Q1 Give the pre-condition list, delete list, and the add list of the six actions.


Q2 Formally describe the initial state / beliefs B0 of the figure above.


Q3 Calculate a plan π that would achieve in intention i = Box2 in Room2, given the beliefs B0.

form T4

Blocks world

from T4

Observe the picture for the Initial State/Beliefs B0 and Goal/Intention i in a particular Blocksworld. Calculate an efficient plan π that would achieve i, given the beliefs B0. 

predicates

On(x,y) object x on top of object y

OnTable(x) object x is on the table

Clear(x) nothing is on top of object x

Holding(x) arm is holding x

ArmEmpty()

actions

Stack(x, y)

Clear(y) ∧ Holding(x)

Clear(y) ∧ Holding(x)

ArmEmpty ∧ On(x, y)

Place object x (held) on top of object y

UnStack(x, y)

On(x, y) ∧ Clear(x) ∧ ArmEmpty

On(x, y) ∧ ArmEmpty

Holding(x) ∧ Clear(y)

Pick up object x from on top of object y

Pickup(x)

Clear(x) ∧ OnTable(x) ∧ ArmEmpty

OnTable(x) ∧ ArmEmpty

Holding(x)

Pick up object x from the table

PutDown(x)

Holding(x)

Holding(x)

OnTable(x) ∧ ArmEmpty ∧ Clear(x)

Put down object x onto the table

plan

1. UnStack(A, B) 2. PutDown(A) 3. Pickup(B) 4. Stack(B, C) 5. Pickup(A) 6. Stack(A, B)

Utilitie function

Tutorial 3

Redo the Lecture Example 2 on finding the best agent where we have non-deterministic/stochastic world and utilities over runs (Page 71 of Lecture 2 Slides). 

Concurrent MetateM 

lec 4

In Concurrent MetateM, each agent is programmed by giving it a temporal logic specification so the truth of proposition changes over time. 
We describe some of the operators with example in the lecture, as summarized here: 

Beginning of time: special nullary operator start satisfied only at the beginning. Agent interface consists of 

    unique agent identifier
    (environment propositions) i.e. messages accepted by the agent
    [component propositions] i.e. messages agent will send

For example: stack(pop, push)[popped, full] 

Computational engine based on MetateM, based on program rules: antecedent about past ⇒ consequent about present and future 

Agents are trying to make present and future true given past 

    Collect constraints with old commitments
    These taken together form current constraints
    Next state is constructed by trying to fulfil these
    Disjunctive formula ⇒ choices
    Unsatisfied commitments are carried over to the next cycle

Given the following program "Snow White" in Concurrent MetateM. 

    The idea is that Snow White has some sweets (resources), which she will give to the Dwarves (resource consumers) who ask.
    She will only give to one dwarf at a time.
    She will always eventually give to a dwarf that asks.

Describe more completely what the program does (the dwarves do) and trace its operation in a table for the first three time steps! 

commitment

2425 mock q2a

Explain what you understand by Blind commitment, Single-minded commitment and Open-minded commitment. (Lec 4)

1. Blind Commitment

Keeps an intention until it believes the goal is achieved, no matter what.

Ignores whether the goal is still possible.

2. Single-minded Commitment

Keeps an intention until it believes the goal is achieved or impossible.

Gives up only if success is clearly not possible.

3. Open-minded Commitment

Keeps an intention as long as it believes the goal is still possible.

More flexible and responsive to new information.

2425 mock q2a

The following pseudo-code defines a control loop for a practical reasoning(BDI) agent:

Recall that “Practical Reasoning = deliberation + means ends reasoning”. With reference to the above code, answer the following questions:
1.    What commitment protocol is used in this code?


2.    What should be modified in this code if the commitment protocol “Opened-minded commitment” is used?


3.    What should be modified in this code if the commitment protocol “Single-minded commitment” is used?


4.    Assume the commitment protocol “Single-minded commitment” is used in the above code. When should an agent stop to reconsider its intentions?

subsumption architecture

Describe what is an “subsumption architecture”. Give an example of subsumption architecture and explain how such a subsumption architecture operates. 

definition: A subsumption architecture is a hierarchy of task-accomplishing behaviours. Each behavior is a basic rule that competes for control.
Lower layers (e.g., obstacle avoidance) take priority over higher ones.

auction

wk9 

Q1. Consider the statement:  “Bidding one’s own valuation in a Vickrey  auction is the dominant strategy for a rational agent.”. If you think the  statement is true, show that statement is true. Otherwise, give a counter- example to show the statement is false.  

wk9

Briefly comment on the English, Vickrey, first-price sealed bid, or Dutch auction protocols which guards better against bidder collusion.

2425 Mock Q1b auction

Explain briefly the principal characteristics and differences between a Vickrey auction and a Dutch auction. 

A Vickrey auction is a sealed-bid auction where the highest bidder wins but pays the second-highest price.

A Dutch auction is an open descending-price auction where the price drops until a bidder accepts. 

The key difference is that Vickrey auctions use sealed bids and promote honest bidding, while Dutch auctions reveal prices publicly and rely on timing decisions.

2324 Q3c

(e) If you believe that there is a dominant strategy for bidders in the English auction, then justify that. If not, give a counterargument.

Dominant strategy: bid a small amount above highest current bid until one’s own valuation is reached

2223 Q4 c auction

What do you understand by a Vickrey auction and a Dutch auction. Also, how can auctioneer and bidder cheat in an English auction?

Vickrey auction: - Highest bidder wins but pays second-highest bid 
Dutch auction: price decreases until a bidder accepts
Cheat in English auction: Highest bidder wins but pays second-highest bid

payoff matrix

2425 Mock Q5b

 Player 1 can play A, B, C or D; and Player 2 can play X or Y. Consider the following payoff matrix:

Determine if either player has any dominant strategy and justify your answer.

Player 1 (row) dominant strategy:

  • Compare strategy A with others:

    • A vs B: A better in both X (6>4) and Y (5>4)

    • A vs C: A better or equal in both X (6>3) and Y (5=5)

    • A vs D: A better in X (6>5), but worse in Y (5<6)

  • Since A is not better than D in all cases, Player 1 has no strictly dominant strategy


Player 2 (column) dominant strategy:

  • Compare strategy Y with X:

    • For Player 2’s payoffs: Y is equal or better than X in all cases, and strictly better for B (3>2) and D (5>4).

  • Therefore, Player 2’s strategy Y is strictly dominant.

2425 Mock Q5 c

Consider the game where Player 1 plays A, B, C, D; and Player 2 plays R, Q, S, T with the following payoff matrix:

With reference to the matrix above, answer the following questions:

1.    Identify with justification, if any, the pairs that are in Nash equilibrium.

Step 1: Player 1's best responses to Player 2's strategies:

  • To R: best response is A (payoff 2)

  • To Q: best response is B (payoff 4)

  • To S: best response is C (payoff 3)

  • To T: best response is D (payoff 5)

Step 2: Player 2's best responses to Player 1's strategies:

  • To A: best response is R (payoff 2)

  • To B: best response is Q (payoff 3)

  • To C: best response is S (payoff 4)

  • To D: best response is T (payoff 5)

nash equilibria are strategy pairs where both players' strategies are mutual best responses:

(A,R),(B,Q),(C,S),(D,T)

At these points, neither player can improve payoff by unilaterally deviating.


2.    State whether any outcomes are Pareto optimal. Justify your answer.

To find Pareto optimal outcomes, eliminate all strategy pairs that are dominated by others (where both players get equal or higher payoffs and at least one strictly higher).

In this game, all low-payoff pairs like (1,1) are dominated.

The four Nash equilibria (A,R),(B,Q),(C,S),(D,T) with payoffs (2,2),(4,3),(3,4),(5,5)are not dominated by any other pairs.

Therefore, these four are Pareto optimal.


3.    Identify with justification, if any, the pairs that maximizes the social welfare.

PairPayoffs (uj,ui)Total W
(A, R)(2, 2)4
(B, Q)(4, 3)7
(C, S)(3, 4)7
(D, T)(5, 5)10
Others(1, 1)2

The pair (D, T) maximizes social welfare with a total payoff of 10.

2324 Q2  abc 

(a) Is the strategy (Invest, Invest) a Pareto optimal outcome? Justify your answer.

(b) Give all Pareto optimal outcomes. Justify your answer.

(c) Give all pure strategy Nash equilibria. Justify your answer.

2324  Q2d

 (d) Give an example of a payoff matrix where all individual payoff values are different, but all strategies are equally good in terms of social welfare. Justify your answer.

2324  Q2e

 Tutorial 5

Given the following payoff matrix. 

a)    For the matrix above, is (2, 1) a Pareto optimal outcome? Explain. If not, find one or more Pareto optimal outcomes!
b)    Find one or more pure strategies NE!

Tutorial 5

Find a pure strategy Nash Equilibrium of this game/payoff matrix: 

Game of Chicken 

tutorial5

In the Game of Chicken, two people drive two very fast cars towards each other from opposite ends of a long straight road. If one of them swerves before the other, he is called a chicken. Of course, if neither swerves, they will crash.  

    The worst possible payoff is to crash into each other, so we assign this a value 0.
    The best payoff is to have your opponent be the chicken, so we assign this a value 3.
    The next to worst possibility is to be the chicken, so we assign this a value 1.
    The last possibility is that both drivers swerve. Then, neither has less honor than the other, so this is preferable to being the chicken. However, it is not quite as good as being the victor, so we assign it a value 2.

S: Swerve 
D: Drive straight 

Find a pure strategy Nash Equillibrium! 

 The Prison's Dilemma

Lec5

Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. 
They are told that: 

    If one confesses and the other denies (C,D) or (D,C), the confessor will be freed, and the other will be jailed for three years;
    If both confess (C,C), then each will bejailed for two years.
    Both prisoners know that if neither confesses (D, D), then they will each bejailed for one year

The corresponding payoff matrix is given below. 

a)    An outcome is Pareto optimal if there is no other outcome where at least one player is better off and no player is worse off compared to that outcome.
For the matrix above, is (-2, -2) a Pareto optimal outcome? Explain.
If not, what is the Pareto optimal outcome?
b)    A strategy is a pure strategy Nash Equilibrium (NE) if neither player can improve their outcome by switching their strategy.
Is (D, D) a pure strategy NE? Explain.
If not, what is a pure strategy NE for that matrix?
c)    What strategy maximizes the social welfare?

gpt

XYZ
P(3, 2)(1, 4)(0, 1)
Q(2, 3)(2, 2)(1, 0)
R(0, 1)(3, 3)(4, 4)

Nash:

j-i best: X-P,Y-R, Z-R

i-j best: P-Y, Q-X, R-Z

Nash e: Z-R

Pareto:
Z-R

social warfare

组合收益对 (uj,ui)社会福利总和 uj+ui
(P, X)(3, 2)5
(P, Y)(1, 4)5
(P, Z)(0, 1)1
(Q, X)(2, 3)5
(Q, Y)(2, 2)4
(Q, Z)(1, 0)1
(R, X)(0, 1)1
(R, Y)(3, 3)6
(R, Z)(4, 4)8

dominant

j: 

比较p和q,没有绝对占优

q和r,没有绝对占优

p和r,没有绝对占优

i:

x-y,没有

y-z,没有

z-x 没有

无玩家列占优策略

coalitional game

concepts

2425 mock Q4a coalitional game

Explain what you understand by the Core of a coalitional game and what happens when the core is empty?

2324 Q3 ad

(a) Explain the two concepts: the Core of a coalitional game and the Shapley value.

(d) For the statement: The Shapley value of any characteristic function belongs to the core. If you think that the statement is correct, then justify it. Otherwise, give a counterargument.

Shapley

wk8

 

calculate Sh({1,2},1) and Sh({1,2},2),分别计算玩家 1 和 2 的 Shapley 值

1.对每个排列,计算该玩家加入时带来的 边际贡献(Marginal Contribution),然后对所有排列求平均(因为两个玩家,有 2!=22! = 22!=2 个排列,所以平均就是除以 2)

wk 8 

 Consider the following characteristic function:  

v({1})=100,v({2})=125,v({3})=50,

v({1,2})=270,v({1,3})=375,v({2,3})=350 and v({1,2,3})=500

Compute the Shapley values for the agents 1, 2 and 3.

sol

天哪这个三个的太多了,到时候慢慢写
 

wk8

Three students share a taxi. Here are the costs for each individual journey: 

    Player 1 - Zhao: 6 (rmb)
    Player 2 - Wang: 12 (rmb)
    Player 3 - Xu: 42 (rmb)

(a). Construct the characteristic function of the above game. 

 (b)Find a fair way of sharing taxi fare for Zhao, Wang and Xu. 

徐的 Shapley 值 ϕ3​:

2425 mock q4c

(c)    Consider the coalitional game with agents Ag = {a, b, c } and characteristic function ν defined by the following:

Compute the Shapley values for the agents a, b, and c. You are required to show the relevant steps in your answers about how you have obtained the values. 

weighted subgraph

25 mock Q4b Weight graph of characteristic function

Consider the following weighted subgraph representation of a characteristic function:

Let µ be the characteristic function defined by the above subgraph. Give the values of µ({v1}), µ({v2}), µ({v3}), µ({v1, v2}), µ({v1, v3}), µ({v2, v3}) and µ({v1, v2, v3}).

µ({v1}) = 1,µ({v2})= 1, µ({v3}) = 1

µ({v1, v2}) = 2+1+1 = 4  , µ({v1, v3})= 4+1+1 = 6 , µ({v2, v3})= 3+1+1 = 5

µ({v1, v2, v3}) =2 + 4 + 3 + 1 + 1 + 1 = 12

Voting

wk 9.5

Plurality voting is NOT a good approach to select among four candidates. If you think it is true, explain the statement. Otherwise, give a counterexample to show that the statement is false.

wk10

A majority candidate: A candidate with more than half of the first place votes is a majority candidate. 
Plurality-With-Elimination Method: This method is a preferential voting method and is a variation of the Plurality Method. Plurality with Elimination is carried out in rounds. After each round of voting the candidate (or alternative) with the fewest first place votes is eliminated and a new round of voting is done with the remaining candi-dates. When only two candidates remain in a round, the candidate with the most votes wins the election. 
The Method of Pairwise Comparisons: Each candidate (or alternative) is matched head-to-head (one-on-one) with each of the other candidates. Each candidate (alternative) gets 1 point for a one-on-one win and a half a point for a tie. The candidate (alternative) with the most total points is the winner. 

Q1.  How do you decide the winner of an election using the Plurality Method
The candidate with the most first place votes wins. 
Q2. How do you decide the winner of an election using the Borda Count Method?
1 point is awarded for last place, 2 points for second to last place, etc. You can then find a point total for every candidate and the candidate with the highest point total wins. 
Q3. How do you decide the winner of an election using the Method of Pairwise Com-parisons
Every candidate goes head to head. The candidate with the most victories wins. 
Q4. Does every election have a majority candidate
Not every election has one. 

wk10

What is a Condorcet candidate? Does every election have a Condorcet candidate?
A candidate who wins against every other candidate in a pairwise comparison (head to head match up) is a Condorcet candidate. Not every election has one. 
 

wk10

For this question, use the following preference schedule: 

(a).  Find the winner of the election using the Plurality Method

(b). Find the winner of the election using the Borda Count Method.

(c). Find the winner of the election using the Method of Pairwise Comparisons. Is there any Condorcet winner in the election? 

PairWinnerScore
A vs BB23–14
A vs CC23–14
A vs DD23–14
B vs CC19–18
B vs DB28–9
C vs DC25–12

Candidate C wins the most pairwise comparisons so he wins as a Condorcet candidate 

wk 9.5

A class of 18 CPT302 students uses the Borda count method (starting with 0) to select one of four candidates: Adam, Boby, Chuck or David. If Adam receives 35 Borda counts, Boby receives 28 Borda counts, and Chuck receives 20 Borda counts, how many Borda counts does David receive? Who wins this Borda election? 

wk 9.5

There are 6 PhD students are for the best Teaching Assistant (TA) of the year award. The decision is made by 100 module leaders using the plurality method. After the first 60 votes have been cast, the situation is as follows:

What is the minimal numbers of the remaining votes Peter needs to be guaranteed to win? Note that you are required to include the relevant steps in your answers to show all your work.

p denotes the number of votes Peter needs to make sure at least a tie with Diana for first place.
Then p+22=18+(40-p) and p=18.
So, if Peter gets 18+1=19 votes from the remaining 40 to guarantee that Peter will be the best TA.

 

Contract Net

wk11

Explain the five stages of task-sharing protocol in Contract Net. 

Recognition: An agent realizes it needs help to complete a task due to lack of resources or skills.
Announcement: The agent broadcasts a task description and requirements to other agents.
Bidding: Interested agents evaluate the task and decide whether to submit a bid based on their capabilities and costs.
Awarding: The announcing agent selects the best bid that meets its criteria and assigns the task.
Expediting: The chosen agent executes the task and may initiate further contracts if needed.


 

CDPS

wk11

This question considers Cooperative Distributed Problem Solving (CDPS). Answer the following two questions

With the aid of an example, explain what is meant by CDPS and why the concept of “cooperation” plays a relevant role in CDPS.

Ans: 
CDPS (Cooperative Distributed Problem Solving) refers to a system where multiple intelligent, independent agents work together to solve problems too complex for any one of them alone. Cooperation is essential because each agent has only partial knowledge or ability. For example, aircraft tracking by sensors in different locations requires combining their distinct data to understand overall movements.


Discuss also the main issues that arise in CDPS.
How to divide problems into sub-tasks for agents.
How to combine sub-solutions into a complete answer.
How to optimize agent activities for coherent results.
How to coordinate agents to avoid conflict and enhance cooperation.


http://www.hkcw.cn/article/jhaCbrfnWB.shtml

相关文章

太阳诱电多层陶瓷电容器的优势和特点

基于电容器市场需求或将扩大的方向性战略所开展的产品研发 除多层陶瓷电容器外&#xff0c;电容器还包括电解电容器和薄膜电容器等类型。随着节能化、物联网化的进一步加速发展&#xff0c;可以预见高性能电容器的需求量将在中长期内有所增长。 多层陶瓷电容器对于实现电子设…

Chrome v131.0.6778.86 绿色便携版 下载

Google Chrome浏览器增强版&#xff0c;采用shuax便携式Dll劫持补丁加入原版打包而成&#xff0c; Chrome增强软件模块&#xff0c;强制实现flash插件支持&#xff0c;解除Adobe Flash Player地区不相容限制和移除警告提示&#xff0c;增强标签页功能。 百度网盘&#xff1a;ht…

PYTHON调用讯飞唤醒实现麦克风说话机器人离线唤醒

引言 语音唤醒技术是现代智能语音交互系统中的重要组成部分&#xff0c;它允许设备在待机状态下通过特定的唤醒词进入交互状态。本文将介绍如何使用Python结合讯飞语音SDK实现一个简单的语音唤醒系统。 技术背景 语音唤醒技术主要依赖于以下几个关键技术点&#xff1a; 声学…

做销售讲究接地气

你有没有遇到过这种情况&#xff1f;两个人聊了半天&#xff0c;你越说对方越皱眉&#xff0c;最后礼貌地说"我再考虑考虑"。其实不是产品不够好&#xff0c;而是没戳中对方心里那根弦。做销售最讲究的就是"接地气"。 和人打交道就像炒菜&#xff0c;火候…

ImBatch 7.6.3 中文版 - 高效图片批量处理工具

ImBatch是一款专业高效的图片批量处理工具。它提供强大的图像编辑功能&#xff0c;包括裁剪、尺寸调整、旋转等操作&#xff0c;并内置数十种专业工具&#xff0c;能满足各类复杂的图像处理需求。软件界面已全面中文化&#xff0c;操作更加便捷直观。 ImBatch下链接&#xff1…

Python+requests+pytest接口自动化测试框架的搭建(全)

&#x1f345; 点击文末小卡片&#xff0c;免费获取软件测试全套资料&#xff0c;资料在手&#xff0c;涨薪更快 框架的设计思路 首先要明确进行接口自动化需要的步骤&#xff0c;如下图所示&#xff1a; 然后逐步拆解需要完成的工作&#xff1a; 1&#xff09;了解分析需求&…

C#定时器深度对比:System.Timers.Timer vs System.Threading.Timer性能实测与选型指南

本文通过真实基准测试揭秘两种常用定时器的性能差异&#xff0c;助你做出最佳选择 一、C#定时器全景概览 在C#生态中&#xff0c;不同定时器适用于不同场景。以下是主流定时器的核心特性对比&#xff1a; 定时器类型命名空间适用场景触发线程精度内存开销依赖框架System.Wind…

简单配置RHEL9.X

切换默认运行级别 将系统默认启动模式从多用户的图形界面调整为多用户的文本界面&#xff0c;适用于优化系统资源占用或进行远程服务器管理的场景。 注意&#xff1a;安装选择“带GUI的服务器”部分常用命令默认安装&#xff1b;如果选择“最小安装”时&#xff0c;部分常用命…

【运维实战】Linux 中su和sudo之间的区别以及如何配置sudo!

Linux 系统相比其他操作系统具有更高的安全性&#xff0c;其安全机制的核心之一在于用户管理策略和权限控制--普通用户默认无权执行任何系统级操作。 若普通用户需要进行系统级变更&#xff0c;必须通过su或sudo命令提权。 1.su与sudo的本质区别 su 要求直接共享 root 密码&…

基于Android的记录生活APP_springboot+vue

开发语言&#xff1a;Java框架&#xff1a;springboot AndroidJDK版本&#xff1a;JDK1.8服务器&#xff1a;tomcat7数据库&#xff1a;mysql 5.7数据库工具&#xff1a;Navicat12开发软件&#xff1a;eclipse/myeclipse/ideaMaven包&#xff1a;Maven3.6 系统展示 APP登录 A…

2025年渗透测试面试题总结-匿名[校招]攻防研究员(应用安全)(题目+回答)

安全领域各种资源&#xff0c;学习文档&#xff0c;以及工具分享、前沿信息分享、POC、EXP分享。不定期分享各种好玩的项目及好用的工具&#xff0c;欢迎关注。 目录 匿名[校招]攻防研究员(应用安全) 基础部分 1. HTTP状态码 2. HTTP请求方法及作用 3. 网络分层及协议 OW…

区域未停留检测算法AI智能分析网关V4打造铁道/工厂/机场等场景应用方案

一、背景 在工业生产、公共场所管理等场景中&#xff0c;特定区域的人员/物体停留时间管控关乎作业效率与安全。传统监控系统仅能录像存证&#xff0c;无法主动分析停留行为。AI智能分析网关V4的区域未停留检测功能&#xff0c;依托智能算法实现实时监测与异常告警&#xff0c…

第Y5周:yolo.py文件解读

&#x1f368; 本文为&#x1f517;365天深度学习训练营 中的学习记录博客&#x1f356; 原作者&#xff1a;K同学啊 本次任务&#xff1a;将YOLOv5s网络模型中的C3模块按照下图方式修改形成C2模块&#xff0c;并将C2模块插入第2层与第3层之间&#xff0c;且跑通YOLOv5s。 任务…

无人机桥梁3D建模、巡检、检测的航线规划

无人机桥梁3D建模、巡检、检测的航线规划 无人机在3D建模、巡检和检测任务中的航线规划存在显著差异&#xff0c;主要体现在飞行高度、航线模式、精度要求和传感器配置等方面。以下是三者的详细对比分析&#xff1a; 1. 核心目标差异 任务类型主要目标典型应用场景3D建模 生成…

【FlashRAG】本地部署与demo运行(一)

FlashRAG 简介 FlashRAG 是一种高效检索增强生成&#xff08;Retrieval-Augmented Generation, RAG&#xff09;技术&#xff0c;旨在优化大规模语言模型&#xff08;LLMs&#xff09;的推理性能&#xff0c;尤其在处理长上下文或复杂查询时。其核心特点是结合了快速检索与动态…

低功耗架构突破:STM32H750 与 SD NAND (存储芯片)如何延长手环续航至 14 天

低功耗架构突破&#xff1a;STM32H750 与 SD NAND &#xff08;存储芯片&#xff09;如何延长手环续航至 14 天 卓越性能强化安全高效能效图形处理优势丰富集成特性 模拟模块实时监控保障数据完整性提升安全性与可靠性测量原理采样率相关结束语 在智能皮电手环及数据存储技术不…

MySQL之约束和表的增删查改

MySQL之约束和表的增删查改 一.数据库约束1.1数据库约束的概念1.2NOT NULL 非空约束1.3DEFAULT 默认约束1.4唯一约束1.5主键约束和自增约束1.6自增约束1.7外键约束1.8CHECK约束 二.表的增删查改2.1Create创建2.2Retrieve读取2.3Update更新2.4Delete删除和Truncate截断 一.数据库…

在线制作幼教早教行业自适应网站教程

你想知道怎么做自适应网站吗&#xff1f;今天就来教你在线用模板做个幼教早教行业的网站哦。 首先得了解啥是自适应网站。简单说呢&#xff0c;自适应网站就是能自动匹配不同终端设备的网站&#xff0c;像手机、平板、电脑等。那如何做幼早教自适应网站呢&#xff1f; 在乔拓云…

[特殊字符] 超强 Web React版 PDF 阅读器!支持分页、缩放、旋转、全屏、懒加载、缩略图!

在现代 Web 项目中&#xff0c;PDF 浏览是一个常见需求&#xff1a;从政务公文到合同协议&#xff0c;PDF 文件无处不在。但很多方案要么体验不佳&#xff0c;要么集成复杂。今天&#xff0c;我给大家带来一个开箱即用、功能全面的 PDF 预览组件 —— [PDFView](https://www.np…

裂缝仪在线监测装置:工程安全领域的“实时守卫者”

在基础设施运维领域&#xff0c;裂缝扩展是威胁建筑结构安全的核心隐患之一。传统人工巡检方式存在效率低、时效性差、数据主观性强等局限&#xff0c;而裂缝仪在线监测装置通过技术迭代&#xff0c;实现了对结构裂缝的自动化、持续性追踪&#xff0c;为工程安全评估提供科学依…