osc.viz.rollout.self_attn_rollout
- self_attn_rollout(attns, head_reduction='mean', adjust_residual=True, global_avg_pool=True)
Self-attention rollout: how much each output token attends to each input token, accumulated across layers
- Parameters
attns (Union[Mapping[str,Tensor],Sequence[Tensor]]) – dict or list where each entry has shape [B heads Q K]
head_reduction (Union[str,Callable]) – 'mean', 'max', or a callable that reduces the head dimension
adjust_residual – bool, whether to add 0.5 for the self connection
global_avg_pool – bool, whether the output of the final attention layer is average-pooled into a single feature vector
- Returns
Rollout, shape [B Q K] if global_avg_pool=False, else [B K]
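
To illustrate the parameters and return shapes, here is a minimal sketch of the general attention-rollout technique. It is not the library's implementation. It uses NumPy arrays instead of Tensors, and the name `rollout_sketch` is hypothetical. It assumes that Q == K in every layer, that layers are supplied in order from first to last, that a callable `head_reduction` maps [B heads Q K] to [B Q K], and that "add 0.5 for the self connection" means mixing each attention map equally with the identity before re-normalising rows.

```python
import numpy as np


def rollout_sketch(attns, head_reduction="mean", adjust_residual=True,
                   global_avg_pool=True):
    """Hypothetical NumPy sketch of attention rollout (not the osc code)."""
    # Accept a dict (layer name -> array) or a sequence of arrays.
    # Assumption: dict insertion order is the layer order.
    layers = list(attns.values()) if isinstance(attns, dict) else list(attns)

    rollout = None
    for attn in layers:  # each attn has shape [B, heads, Q, K], with Q == K
        # Reduce the head dimension to get one [B, Q, K] map per layer.
        if head_reduction == "mean":
            a = attn.mean(axis=1)
        elif head_reduction == "max":
            a = attn.max(axis=1)
        else:
            # Assumption: the callable maps [B, heads, Q, K] -> [B, Q, K].
            a = head_reduction(attn)

        if adjust_residual:
            # Assumed reading of "add 0.5 for the self connection":
            # equal mix of attention and identity (the residual path).
            a = 0.5 * a + 0.5 * np.eye(a.shape[-1])

        # Re-normalise so every query row sums to 1.
        a = a / a.sum(axis=-1, keepdims=True)

        # Compose with earlier layers: later layers multiply on the left.
        rollout = a if rollout is None else a @ rollout

    if global_avg_pool:
        # Every output token contributes equally to the pooled feature,
        # so average over the query dimension: [B, Q, K] -> [B, K].
        rollout = rollout.mean(axis=1)
    return rollout


# Usage: two layers, batch of 2, 3 heads, 4 tokens, rows normalised like softmax.
rng = np.random.default_rng(0)
raw = rng.random((2, 2, 3, 4, 4))
example_attns = [layer / layer.sum(axis=-1, keepdims=True) for layer in raw]

pooled = rollout_sketch(example_attns)                           # shape (2, 4)
per_token = rollout_sketch(example_attns, global_avg_pool=False)  # shape (2, 4, 4)
```

Because each per-layer map is re-normalised to be row-stochastic, each row of the result sums to 1 and can be read as a distribution over input tokens.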