osc.viz.rollout.self_attn_rollout
- self_attn_rollout(attns, head_reduction='mean', adjust_residual=True, global_avg_pool=True)
Self-attention rollout: estimates how much the output token(s) attend to the input tokens, accumulated across layers.
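The layer-by-layer accumulation can be pictured with the attention-rollout recurrence of Abnar and Zuidema (2020). The sketch below is a minimal illustration, not this library's implementation. It assumes head_reduction='mean', self-attention maps with Q == K, and that adjust_residual means mixing each layer's map with the identity as 0.5 * A + 0.5 * I before re-normalizing rows. `rollout_sketch` is a hypothetical name.

```python
from typing import List

import torch


def rollout_sketch(attns: List[torch.Tensor], adjust_residual: bool = True) -> torch.Tensor:
    """Hypothetical sketch of attention rollout (not the library's code).

    attns: per-layer tensors of shape [B, heads, Q, K] with Q == K,
    ordered from the first layer to the last. Returns [B, Q, K].
    """
    rollout = None
    for attn in attns:
        a = attn.mean(dim=1)  # average over heads -> [B, Q, K]
        if adjust_residual:
            # Assumption: model the residual connection as 0.5 * A + 0.5 * I.
            eye = torch.eye(a.shape[-1], dtype=a.dtype, device=a.device)
            a = 0.5 * a + 0.5 * eye
        # Re-normalize so each query's weights over keys sum to 1.
        a = a / a.sum(dim=-1, keepdim=True)
        # Compose with earlier layers: the later layer multiplies on the left.
        rollout = a if rollout is None else a @ rollout
    return rollout


torch.manual_seed(0)
# Three layers, batch of 2, 4 heads, 5 tokens; softmax makes rows sum to 1.
attns = [torch.softmax(torch.randn(2, 4, 5, 5), dim=-1) for _ in range(3)]
r = rollout_sketch(attns)
print(r.shape)  # torch.Size([2, 5, 5])
```

Because every per-layer matrix is row-stochastic, their product is too, so each row of the result can be read as a distribution over input tokens.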
- Parameters
attns (Union[Mapping[str, Tensor], Sequence[Tensor]]) – dict or list of per-layer attention maps, each of shape [B heads Q K]
head_reduction (Union[str, Callable]) – 'mean', 'max', or a callable that reduces the head dimension
adjust_residual – bool, whether to add 0.5 for the self (residual) connection
global_avg_pool – bool, whether the output of the final attention layer is average-pooled into a single feature vector
- Returns
Rollout, shape [B Q K] if global_avg_pool=False, else [B K]
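The accepted parameter forms and the two return shapes can be illustrated with a few small helpers. All names here (`layers_from_attns`, `reduce_heads`, `pool_rollout`, and the "layer0"... dict keys) are hypothetical. Two readings are assumptions rather than documented behavior: a dict is consumed in insertion order (first layer first), and global_avg_pool weights every query position equally, so the pooled [B K] rollout is the mean of the [B Q K] rollout over its query axis.

```python
import collections.abc
from typing import Callable, List, Mapping, Sequence, Union

import torch
from torch import Tensor


def layers_from_attns(attns: Union[Mapping[str, Tensor], Sequence[Tensor]]) -> List[Tensor]:
    """Hypothetical helper: accept a dict or a list of [B heads Q K] maps.

    Assumption: a dict is read in insertion order, first layer first.
    """
    if isinstance(attns, collections.abc.Mapping):
        return list(attns.values())
    return list(attns)


def reduce_heads(attn: Tensor, head_reduction: Union[str, Callable]) -> Tensor:
    """Hypothetical helper: reduce [B heads Q K] to [B Q K]."""
    if callable(head_reduction):
        return head_reduction(attn)  # the callable must drop dim 1 itself
    if head_reduction == "mean":
        return attn.mean(dim=1)
    if head_reduction == "max":
        return attn.amax(dim=1)  # values only; Tensor.max(dim) also returns indices
    raise ValueError(f"unknown head_reduction: {head_reduction!r}")


def pool_rollout(rollout: Tensor, global_avg_pool: bool) -> Tensor:
    """Hypothetical helper mapping a [B Q K] rollout to the documented shapes.

    Assumption: pooling weights every query equally, so the pooled rollout
    is the mean over the query axis (dim 1).
    """
    return rollout.mean(dim=1) if global_avg_pool else rollout


torch.manual_seed(0)
attns = {f"layer{i}": torch.softmax(torch.randn(2, 4, 5, 5), dim=-1) for i in range(3)}
layers = layers_from_attns(attns)
print(reduce_heads(layers[0], "max").shape)  # torch.Size([2, 5, 5])
median = reduce_heads(layers[0], lambda a: a.median(dim=1).values)
print(median.shape)  # torch.Size([2, 5, 5])

# Stand-in for a rollout: the head-averaged last layer (rows sum to 1).
rollout = reduce_heads(layers[-1], "mean")
print(pool_rollout(rollout, global_avg_pool=False).shape)  # torch.Size([2, 5, 5])
print(pool_rollout(rollout, global_avg_pool=True).shape)   # torch.Size([2, 5])
```

Under the equal-weight pooling assumption, averaging row-stochastic rows yields a [B K] vector that still sums to 1 per batch element.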