news 2026/6/9 21:01:29

Kronos Analysis


张小明

Front-end Development Engineer


Model Architecture

Output of print(model):

Kronos(
  (token_drop): Dropout(p=0.0, inplace=False)
  (embedding): HierarchicalEmbedding(
    (emb_s1): Embedding(1024, 832)
    (emb_s2): Embedding(1024, 832)
    (fusion_proj): Linear(in_features=1664, out_features=832, bias=True)
  )
  (time_emb): TemporalEmbedding(
    (minute_embed): Embedding(60, 832)
    (hour_embed): Embedding(24, 832)
    (weekday_embed): Embedding(7, 832)
    (day_embed): Embedding(32, 832)
    (month_embed): Embedding(13, 832)
  )
  (transformer): ModuleList(
    (0-11): 12 x TransformerBlock(
      (norm1): RMSNorm()
      (self_attn): MultiHeadAttentionWithRoPE(
        (q_proj): Linear(in_features=832, out_features=832, bias=True)
        (k_proj): Linear(in_features=832, out_features=832, bias=True)
        (v_proj): Linear(in_features=832, out_features=832, bias=True)
        (out_proj): Linear(in_features=832, out_features=832, bias=True)
        (rotary): RotaryPositionalEmbedding()
        (resid_dropout): Dropout(p=0.2, inplace=False)
      )
      (norm2): RMSNorm()
      (ffn): FeedForward(
        (w1): Linear(in_features=832, out_features=2048, bias=False)
        (w3): Linear(in_features=832, out_features=2048, bias=False)
        (w2): Linear(in_features=2048, out_features=832, bias=False)
        (ffn_dropout): Dropout(p=0.2, inplace=False)
      )
    )
  )
  (norm): RMSNorm()
  (dep_layer): DependencyAwareLayer(
    (cross_attn): MultiHeadCrossAttentionWithRoPE(
      (q_proj): Linear(in_features=832, out_features=832, bias=True)
      (k_proj): Linear(in_features=832, out_features=832, bias=True)
      (v_proj): Linear(in_features=832, out_features=832, bias=True)
      (out_proj): Linear(in_features=832, out_features=832, bias=True)
      (rotary): RotaryPositionalEmbedding()
      (resid_dropout): Dropout(p=0.0, inplace=False)
    )
    (norm): RMSNorm()
  )
  (head): DualHead(
    (proj_s1): Linear(in_features=832, out_features=1024, bias=True)
    (proj_s2): Linear(in_features=832, out_features=1024, bias=True)
  )
)
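A quick way to sanity-check the printed shapes is to tally the parameters by hand. The sketch below is a hypothetical helper, not part of Kronos: it counts only the Linear and Embedding layers listed above (weight matrix, plus bias where bias=True) and leaves out the RMSNorm modules, whose parameters the repr does not show.

```python
D_MODEL, FFN_DIM, N_LAYERS, VOCAB = 832, 2048, 12, 1024  # read off the printed repr


def linear(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters of nn.Linear(n_in, n_out): weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def embedding(num: int, dim: int) -> int:
    """Parameters of nn.Embedding(num, dim)."""
    return num * dim


def kronos_param_count() -> int:
    """Hypothetical tally of the Linear/Embedding parameters in the repr (RMSNorm excluded)."""
    hier_emb = 2 * embedding(VOCAB, D_MODEL) + linear(2 * D_MODEL, D_MODEL)  # emb_s1, emb_s2, fusion_proj
    time_emb = sum(embedding(n, D_MODEL) for n in (60, 24, 7, 32, 13))
    attn = 4 * linear(D_MODEL, D_MODEL)  # q_proj, k_proj, v_proj, out_proj
    ffn = 2 * linear(D_MODEL, FFN_DIM, bias=False) + linear(FFN_DIM, D_MODEL, bias=False)  # w1, w3, w2
    blocks = N_LAYERS * (attn + ffn)
    dep_layer = 4 * linear(D_MODEL, D_MODEL)  # cross-attention projections
    head = 2 * linear(D_MODEL, VOCAB)  # proj_s1, proj_s2
    return hier_emb + time_emb + blocks + dep_layer + head


if __name__ == "__main__":
    print(f"{kronos_param_count():,}")
```

It prints 102,288,960, i.e. roughly 102.3M parameters, of which the 12 Transformer blocks account for about 94.6M. If each RMSNorm holds a single 832-element weight, the 26 norms would add another 21,632.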

def forward(self, s1_ids, s2_ids, stamp=None, padding_mask=None, use_teacher_forcing=False, s1_targets=None):

After tokenization, s1_ids and s2_ids each have shape [1, 400] (batch size 1, sequence length 400).

x = self.embedding([s1_ids, s2_ids])

HierarchicalEmbedding

token_ids (torch.Tensor): Composite token IDs of shape [batch_size, seq_len] or [N], each in range [0, 2^(s1_bits + s2_bits) - 1].

Where does the upper bound 2^(s1_bits + s2_bits) - 1 come from? The answer is in the tokenizer:
BSQuantizer
def bits_to_indices(self, bits):
    bits = (bits >= 0).to(torch.long)
    indices = 2 ** torch.arange(
        0,
        bits.shape[-1],
        1,
        dtype=torch.long,
        device=bits.device,
    )
    return (bits * indices).sum(-1)

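Bit i contributes 2^i when it is set, so for an N-bit code bits_to_indices(bits) = sum_{i=0}^{N-1} b_i * 2^i ∈ [0, 2^N - 1]. Packing all s1_bits + s2_bits bits into a single index therefore gives the composite range [0, 2^(s1_bits + s2_bits) - 1] quoted in the docstring. Here emb_s1 and emb_s2 each have 1024 = 2^10 rows, which suggests s1_bits = s2_bits = 10 in this configuration.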
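A pure-Python mirror of bits_to_indices makes the range concrete. It is a sketch, not the BSQuantizer code: it takes one code vector as a list of signed values, maps non-negative entries to 1 and negative entries to 0, and weights bit i by 2**i, as the torch version does with torch.arange. The 20-bit case assumes s1_bits = s2_bits = 10, as the 1024-entry embedding tables suggest.

```python
def bits_to_indices(bits: list[float]) -> int:
    """Map a signed code vector to an integer: entry i contributes 2**i when it is >= 0."""
    return sum((1 if b >= 0 else 0) << i for i, b in enumerate(bits))


def index_range(n_bits: int) -> tuple[int, int]:
    """Smallest and largest index reachable with n_bits binary digits."""
    return bits_to_indices([-1.0] * n_bits), bits_to_indices([1.0] * n_bits)


if __name__ == "__main__":
    print(bits_to_indices([1.0, -1.0, 1.0]))  # bit0 = 1, bit2 = 1 -> 1 + 4 = 5
    print(index_range(10))                    # (0, 1023): one 1024-entry vocabulary
    print(index_range(20))                    # (0, 1048575): assumed s1_bits + s2_bits = 20
```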

s1_emb = self.emb_s1(s1_ids) * math.sqrt(self.d_model)
s2_emb = self.emb_s2(s2_ids) * math.sqrt(self.d_model)
return self.fusion_proj(torch.cat([s1_emb, s2_emb], dim=-1))
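To see the shapes, here is a toy pure-Python version of the same computation for a single sequence: look up each sub-token in its own table, scale by sqrt(d_model), concatenate to length 2 * d_model, then apply the fusion projection y = Wx + b back to d_model. The tiny sizes and hand-picked weights are illustrative only; in Kronos the tables are 1024 x 832 and fusion_proj maps 1664 -> 832.

```python
import math


def hierarchical_embed(s1_ids, s2_ids, emb_s1, emb_s2, weight, bias, d_model):
    """Toy HierarchicalEmbedding for one sequence.

    emb_s1 / emb_s2: embedding tables as lists of d_model-length rows
    weight: d_model rows of length 2 * d_model (fusion_proj weight); bias: length d_model
    Returns one d_model-length vector per position.
    """
    scale = math.sqrt(d_model)
    out = []
    for i1, i2 in zip(s1_ids, s2_ids):
        # Scaled lookups concatenated along the feature axis -> length 2 * d_model
        cat = [v * scale for v in emb_s1[i1]] + [v * scale for v in emb_s2[i2]]
        # fusion_proj: y[r] = sum_j W[r][j] * cat[j] + b[r]
        out.append([sum(w * c for w, c in zip(row, cat)) + b for row, b in zip(weight, bias)])
    return out


if __name__ == "__main__":
    d = 4
    emb_s1 = [[float(i)] * d for i in range(3)]       # row i is all i's
    emb_s2 = [[float(10 * i)] * d for i in range(3)]  # row i is all (10 * i)'s
    weight = [[1.0] * (2 * d) for _ in range(d)]      # every output sums both halves
    bias = [0.0] * d
    y = hierarchical_embed([1, 2], [0, 1], emb_s1, emb_s2, weight, bias, d)
    print(len(y), len(y[0]))  # 2 positions, d_model features each
```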
if stamp is not None:
    time_embedding = self.time_emb(stamp)
    x = x + time_embedding
TemporalEmbedding
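The repr shows five lookup tables, one per calendar field. The sketch below shows how a timestamp could be turned into those indices and combined. Two things are assumptions, not read from the source: the field layout of stamp, and that the five lookups are summed (as in the Informer codebase's temporal embedding, which uses the same 24/7/32/13 table sizes). The sizes line up with the natural ranges: 60 minutes, 24 hours, 7 weekdays, and 32 and 13 rows so that 1-based day (1-31) and month (1-12) values can be used as indices directly.

```python
from datetime import datetime

# Table sizes from the printed TemporalEmbedding.
TABLE_SIZES = {"minute": 60, "hour": 24, "weekday": 7, "day": 32, "month": 13}


def stamp_features(ts: datetime) -> dict[str, int]:
    """Hypothetical stamp layout: one integer index per calendar field."""
    return {
        "minute": ts.minute,      # 0-59
        "hour": ts.hour,          # 0-23
        "weekday": ts.weekday(),  # 0-6, Monday = 0
        "day": ts.day,            # 1-31, fits a 32-row table
        "month": ts.month,        # 1-12, fits a 13-row table
    }


def temporal_embed(ts: datetime, tables: dict[str, list[list[float]]]) -> list[float]:
    """Sum the five per-field embedding rows (assumed combination rule)."""
    feats = stamp_features(ts)
    dim = len(tables["minute"][0])
    out = [0.0] * dim
    for name, idx in feats.items():
        if not 0 <= idx < TABLE_SIZES[name]:
            raise ValueError(f"{name}={idx} is outside its table")
        out = [o + v for o, v in zip(out, tables[name][idx])]
    return out
```

With toy tables whose row i is filled with float(i), datetime(2024, 3, 15, 9, 30) (a Friday) maps to indices 30, 9, 4, 15 and 3, so every output feature equals 61.0.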
x = self.token_drop(x)
for layer in self.transformer:
    x = layer(x, key_padding_mask=padding_mask)
x = self.norm(x)
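The final self.norm (like norm1 and norm2 inside each block) is an RMSNorm. Its repr, RMSNorm(), shows no details, so the sketch below uses the standard RMSNorm definition with an assumed eps of 1e-5 and an optional per-feature gain; it is not copied from Kronos.

```python
import math


def rms_norm(x, gain=None, eps=1e-5):
    """RMSNorm over one feature vector: x / sqrt(mean(x**2) + eps), times a per-feature gain.

    Unlike LayerNorm there is no mean subtraction and no bias.
    eps=1e-5 and the optional gain are assumptions; the Kronos repr shows neither.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)
    return [v / rms * g for v, g in zip(x, gain)]


if __name__ == "__main__":
    y = rms_norm([3.0, 4.0], eps=0.0)
    print(y)                               # rescaled to unit root-mean-square
    print(sum(v * v for v in y) / len(y))  # approximately 1.0
```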
s1_logits = self.head(x)
DualHead
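if use_teacher_forcing:
    sibling_embed = self.embedding.emb_s1(s1_targets)
else:
    s1_probs = F.softmax(s1_logits.detach(), dim=-1)
    sample_s1_ids = torch.multinomial(s1_probs.view(-1, self.s1_vocab_size), 1).view(s1_ids.shape)
    sibling_embed = self.embedding.emb_s1(sample_s1_ids)

x2 = self.dep_layer(x, sibling_embed, key_padding_mask=padding_mask)  # Dependency Aware Layer: Condition on s1 embeddings
s2_logits = self.head.cond_forward(x2)
return s1_logits, s2_logits

The DependencyAwareLayer uses cross-attention so that one sub-representation (the sibling sub-token, here the s1 embedding) attends to the main sequence's hidden states and injects their dependency information, explicitly modeling the structural dependency between the two sub-tokens: s2 is predicted conditioned on s1. With teacher forcing, the ground-truth s1 ids provide that condition; otherwise s1 ids are sampled from the model's own s1 distribution, and s1_logits.detach() keeps gradients from flowing through the sampling branch.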
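The two branches above differ only in where the s1 ids that condition s2 come from. The pure-Python sketch below mirrors that choice for a flattened list of positions; random.choices with k=1 plays the role of torch.multinomial(probs, 1). The function name pick_s1_ids is hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_s1_ids(s1_logits, s1_targets=None, use_teacher_forcing=False, rng=random):
    """Choose the s1 ids that condition s2, mirroring the branch in forward().

    Teacher forcing: return the ground-truth s1 ids unchanged.
    Otherwise: draw one id per position from softmax(logits).
    """
    if use_teacher_forcing:
        return list(s1_targets)
    return [rng.choices(range(len(row)), weights=softmax(row), k=1)[0] for row in s1_logits]


if __name__ == "__main__":
    logits = [[0.0, 50.0, 0.0], [50.0, 0.0, 0.0]]
    print(pick_s1_ids(logits, rng=random.Random(0)))                         # sampled ids
    print(pick_s1_ids(logits, s1_targets=[2, 2], use_teacher_forcing=True))  # [2, 2]
```

With logits this peaked the sampled ids are effectively deterministic ([1, 0]); with real logits the sampled s1 can differ from the target, which is the gap between teacher forcing and free-running generation.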
Computing the loss

def compute_loss(self, s1_logits, s2_logits, s1_targets, s2_targets, padding_mask=None):
    if padding_mask is not None:
        valid_mask = (padding_mask == 0)
        s1_logits = s1_logits[valid_mask]
        s2_logits = s2_logits[valid_mask]
        s1_targets = s1_targets[valid_mask]
        s2_targets = s2_targets[valid_mask]
        ce_s1 = F.cross_entropy(s1_logits, s1_targets)
        ce_s2 = F.cross_entropy(s2_logits, s2_targets)
    else:
        ce_s1 = F.cross_entropy(s1_logits.reshape(-1, self.vocab_s1), s1_targets.reshape(-1))
        ce_s2 = F.cross_entropy(s2_logits.reshape(-1, self.vocab_s2), s2_targets.reshape(-1))
    ce_loss = (ce_s1 + ce_s2) / 2
    return ce_loss, ce_s1, ce_s2
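The loss logic can be checked without torch. The sketch below flattens batch and time into one list of positions, drops positions whose mask value is non-zero (matching valid_mask = (padding_mask == 0)), and averages per-position negative log-likelihoods, which is what F.cross_entropy does with its default reduction='mean'.

```python
import math


def cross_entropy(logits_rows, targets):
    """Mean negative log-likelihood over positions (like F.cross_entropy, reduction='mean')."""
    losses = []
    for row, t in zip(logits_rows, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # log-sum-exp
        losses.append(log_z - row[t])
    return sum(losses) / len(losses)


def compute_loss(s1_logits, s2_logits, s1_targets, s2_targets, padding_mask=None):
    """Average of the s1 and s2 cross-entropies; positions with mask != 0 are dropped first."""
    if padding_mask is not None:
        keep = [i for i, m in enumerate(padding_mask) if m == 0]  # 0 marks a real token
        s1_logits = [s1_logits[i] for i in keep]
        s2_logits = [s2_logits[i] for i in keep]
        s1_targets = [s1_targets[i] for i in keep]
        s2_targets = [s2_targets[i] for i in keep]
    ce_s1 = cross_entropy(s1_logits, s1_targets)
    ce_s2 = cross_entropy(s2_logits, s2_targets)
    return (ce_s1 + ce_s2) / 2, ce_s1, ce_s2
```

Because padded positions are removed before averaging, they neither add loss nor dilute the mean; equal weighting of ce_s1 and ce_s2 treats both sub-tokens as equally important.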
decode_s1
def decode_s1(self, s1_ids, s2_ids, stamp=None, padding_mask=None):
    """
    Decodes only the s1 tokens.

    This method performs a forward pass to predict only s1 tokens. It returns the s1 logits
    and the context representation from the Transformer, which can be used for subsequent s2 decoding.

    Args:
        s1_ids (torch.Tensor): Input tensor of s1 token IDs. Shape: [batch_size, seq_len]
        s2_ids (torch.Tensor): Input tensor of s2 token IDs. Shape: [batch_size, seq_len]
        stamp (torch.Tensor, optional): Temporal stamp tensor. Shape: [batch_size, seq_len]. Defaults to None.
        padding_mask (torch.Tensor, optional): Mask for padding tokens. Shape: [batch_size, seq_len]. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]:
            - s1 logits: Logits for s1 token predictions. Shape: [batch_size, seq_len, s1_vocab_size]
            - context: Context representation from the Transformer. Shape: [batch_size, seq_len, d_model]
    """
    x = self.embedding([s1_ids, s2_ids])
    if stamp is not None:
        time_embedding = self.time_emb(stamp)
        x = x + time_embedding
    x = self.token_drop(x)

    for layer in self.transformer:
        x = layer(x, key_padding_mask=padding_mask)

    x = self.norm(x)

    s1_logits = self.head(x)
    return s1_logits, x
decode_s2
def decode_s2(self, context, s1_ids, padding_mask=None):
    """
    Decodes the s2 tokens, conditioned on the context and s1 tokens.

    This method decodes s2 tokens based on a pre-computed context representation (typically from
    `decode_s1`) and the s1 token IDs. It uses the dependency-aware layer and the conditional s2 head
    to predict s2 tokens.

    Args:
        context (torch.Tensor): Context representation from the transformer (output of decode_s1).
            Shape: [batch_size, seq_len, d_model]
        s1_ids (torch.Tensor): Input tensor of s1 token IDs. Shape: [batch_size, seq_len]
        padding_mask (torch.Tensor, optional): Mask for padding tokens. Shape: [batch_size, seq_len]. Defaults to None.

    Returns:
        torch.Tensor: s2 logits. Shape: [batch_size, seq_len, s2_vocab_size]
    """
    sibling_embed = self.embedding.emb_s1(s1_ids)
    x2 = self.dep_layer(context, sibling_embed, key_padding_mask=padding_mask)
    return self.head.cond_forward(x2)
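decode_s1 and decode_s2 split forward() in two so that, at inference time, s1 can be sampled before s2 is computed, and the Transformer context is reused rather than recomputed. The sketch below shows that two-stage step inside a generic autoregressive loop. decode_s1 and decode_s2 here are hypothetical stand-ins reduced to the last position, and the toy decoders exist only to make the loop runnable; this is not Kronos' own inference routine.

```python
import random


def sample(probs, rng):
    """Draw one index from a probability vector (like torch.multinomial(probs, 1))."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(n_steps, s1_hist, s2_hist, decode_s1, decode_s2, seed=0):
    """Autoregressive loop: at each step sample s1 first, then s2 conditioned on that s1.

    decode_s1(s1_hist, s2_hist) -> (s1 probs for the next position, context)
    decode_s2(context, s1_next) -> s2 probs for the next position
    """
    rng = random.Random(seed)
    s1_hist, s2_hist = list(s1_hist), list(s2_hist)
    for _ in range(n_steps):
        s1_probs, context = decode_s1(s1_hist, s2_hist)  # one Transformer pass
        s1_next = sample(s1_probs, rng)
        s2_probs = decode_s2(context, s1_next)           # reuses the context
        s2_next = sample(s2_probs, rng)
        s1_hist.append(s1_next)
        s2_hist.append(s2_next)
    return s1_hist, s2_hist


# Toy decoders over a 4-token vocabulary, purely for illustration.
def toy_decode_s1(s1_hist, s2_hist):
    probs = [0.0] * 4
    probs[(s1_hist[-1] + 1) % 4] = 1.0  # always predict "previous s1 + 1"
    return probs, s1_hist[-1]           # the toy "context" is just the last s1 id


def toy_decode_s2(context, s1_next):
    probs = [0.0] * 4
    probs[s1_next] = 1.0                # toy rule: s2 copies the freshly sampled s1
    return probs


if __name__ == "__main__":
    print(generate(3, [0], [0], toy_decode_s1, toy_decode_s2))  # ([0, 1, 2, 3], [0, 1, 2, 3])
```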