Missing images · Issue #30 · kepano/defuddle · GitHub

Missing images #30

Open
chenbihao opened this issue Apr 7, 2025 · 0 comments

Comments

@chenbihao

Images in the main text are lost after extraction:

Example URLs:

https://zhuanlan.zhihu.com/p/1891834264254932686
https://zhuanlan.zhihu.com/p/647108274
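
For context, a minimal reproduction sketch in TypeScript (Node 18+ for the global fetch). The defuddle API used here (default export, new Defuddle(document).parse() returning HTML in result.content) is assumed from the README, and countImages is a hypothetical helper name:

   import { JSDOM } from 'jsdom';
   import Defuddle from 'defuddle'; // assumed default export per the README

   // Count <img> elements in the original article body vs. in defuddle's output.
   async function countImages(url: string): Promise<void> {
     const html = await (await fetch(url)).text(); // a saved copy of the page works too
     const dom = new JSDOM(html, { url });

     // ".Post-RichText" is the article body class shown in the full div code below.
     const source = dom.window.document.querySelectorAll('.Post-RichText img').length;

     // Assumed API: new Defuddle(document).parse() -> { content: string, ... }
     const result = new Defuddle(dom.window.document).parse();
     const extracted = new JSDOM(result.content).window.document.querySelectorAll('img').length;

     console.log(`${url}: ${source} images in source, ${extracted} after extraction`);
   }

   countImages('https://zhuanlan.zhihu.com/p/1891834264254932686');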

Lost image HTML:

<figure data-size="normal">
   <div><img src="https://pica.zhimg.com/v2-00ed257f3d687d8784f2a22964adb9b8_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="1032" data-original-token="v2-00ed257f3d687d8784f2a22964adb9b8" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pica.zhimg.com/v2-00ed257f3d687d8784f2a22964adb9b8_r.jpg"></div>
</figure>
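
Note that the <img> above carries the full-resolution URL in data-original and the zh-lightbox-thumb class, so Zhihu's lazy-loading markup may be what trips the cleanup. A minimal pre-processing sketch (a workaround idea of ours, not an existing defuddle option) that promotes data-original to src before extraction:

   // Hypothetical workaround: normalize Zhihu's lazy-image markup before extraction.
   function normalizeZhihuImages(doc: Document): void {
     doc.querySelectorAll<HTMLImageElement>('img[data-original]').forEach((img) => {
       // Prefer the full-resolution URL over the lazy-load thumbnail attributes.
       img.src = img.getAttribute('data-original') ?? img.src;
       img.removeAttribute('data-original');
       img.classList.remove('zh-lightbox-thumb');
     });
   }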

Full div code:

<div class="RichText ztext Post-RichText css-ob6uua" options="[object Object]">
   <blockquote data-first-child="" data-pid="IvTqR581">金磊 发自 凹非寺<br>量子位 | 公众号 QbitAI</blockquote>
   <p data-pid="Ecowpa2A">有点意思。</p>
   <p data-pid="RM_nlOpr">
      这不
      <b>
         <span>
            <a class="RichContent-EntityWord css-b7erz1" data-za-not-track-link="true" data-paste-text="true" href="https://zhida.zhihu.com/search?content_id=256017430&amp;content_type=Article&amp;match_order=1&amp;q=DeepSeek&amp;zd_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ6aGlkYV9zZXJ2ZXIiLCJleHAiOjE3NDQxNjMzNzAsInEiOiJEZWVwU2VlayIsInpoaWRhX3NvdXJjZSI6ImVudGl0eSIsImNvbnRlbnRfaWQiOjI1NjAxNzQzMCwiY29udGVudF90eXBlIjoiQXJ0aWNsZSIsIm1hdGNoX29yZGVyIjoxLCJ6ZF90b2tlbiI6bnVsbH0.jaWZTFN7_GD3DvPbBlWOsn_q4MHX6W815YfyuB256U0&amp;zhida_source=entity" target="_blank">
               DeepSeek
               <svg width="10px" height="10px" viewBox="0 0 16 16" class="ZDI ZDI--FourPointedStar16 css-1dvsrp" fill="currentColor">
                  <path d="m5.068 9.267-3.08-.77a.512.512 0 0 1 0-.994l3.08-.77a2.289 2.289 0 0 0 1.665-1.665l.77-3.08a.512.512 0 0 1 .994 0l.77 3.08c.205.82.845 1.46 1.665 1.665l3.08.77a.512.512 0 0 1 0 .994l-3.08.77a2.29 2.29 0 0 0-1.665 1.665l-.77 3.08a.512.512 0 0 1-.994 0l-.77-3.08a2.289 2.289 0 0 0-1.665-1.665Z"></path>
               </svg>
            </a>
         </span>
      </b>
      前脚刚刚上新了一篇关于推理时Scaling Law的论文嘛,引得大家纷纷联想<b>是不是R2马上要来了</b>。
   </p>
   <p data-pid="X0_AY36R">然鹅……奥特曼这边却发了一条“变卦”的消息:</p>
   <blockquote data-pid="VTHa0acF">计划改变:我们可能在几周之后先发布o3和o4-mini。</blockquote>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pica.zhimg.com/v2-00ed257f3d687d8784f2a22964adb9b8_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="1032" data-original-token="v2-00ed257f3d687d8784f2a22964adb9b8" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pica.zhimg.com/v2-00ed257f3d687d8784f2a22964adb9b8_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="EQiUz6g7">
      至于大家翘首以盼的
      <b>
         <span>
            <a class="RichContent-EntityWord css-b7erz1" data-za-not-track-link="true" data-paste-text="true" href="https://zhida.zhihu.com/search?content_id=256017430&amp;content_type=Article&amp;match_order=1&amp;q=GPT-5&amp;zd_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ6aGlkYV9zZXJ2ZXIiLCJleHAiOjE3NDQxNjMzNzAsInEiOiJHUFQtNSIsInpoaWRhX3NvdXJjZSI6ImVudGl0eSIsImNvbnRlbnRfaWQiOjI1NjAxNzQzMCwiY29udGVudF90eXBlIjoiQXJ0aWNsZSIsIm1hdGNoX29yZGVyIjoxLCJ6ZF90b2tlbiI6bnVsbH0.gpxz4ZZ5d2E4QVPPHrVqxnmtHUqQpAKBBpQCRdGuZDU&amp;zhida_source=entity" target="_blank">
               GPT-5
               <svg width="10px" height="10px" viewBox="0 0 16 16" class="ZDI ZDI--FourPointedStar16 css-1dvsrp" fill="currentColor">
                  <path d="m5.068 9.267-3.08-.77a.512.512 0 0 1 0-.994l3.08-.77a2.289 2.289 0 0 0 1.665-1.665l.77-3.08a.512.512 0 0 1 .994 0l.77 3.08c.205.82.845 1.46 1.665 1.665l3.08.77a.512.512 0 0 1 0 .994l-3.08.77a2.29 2.29 0 0 0-1.665 1.665l-.77 3.08a.512.512 0 0 1-.994 0l-.77-3.08a2.289 2.289 0 0 0-1.665-1.665Z"></path>
               </svg>
            </a>
         </span>
      </b>
      ,奥特曼表示:
   </p>
   <blockquote data-pid="axHwj09d">将在几个月之后,而且效果会比我们最初设想的还要好。</blockquote>
   <p data-pid="yz907XB8">至于原因,奥特曼也做出了解释。</p>
   <p data-pid="KE6CVBrC">大概意思就是,顺利整合所有内容比他们想象的要困难得多,希望确保有足够的能力来支持预期的需求。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic1.zhimg.com/v2-b23fc77d68f3da96b3617ea9255eb69e_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="579" data-original-token="v2-b23fc77d68f3da96b3617ea9255eb69e" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic1.zhimg.com/v2-b23fc77d68f3da96b3617ea9255eb69e_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="Fm6h4c3C">咱就是说啊,现在真的是DeepSeek这边一有点声响,OpenAI那边就得有点动作来紧跟一下了。</p>
   <h2>DeepSeek新论文</h2>
   <p data-pid="btguT0fo">在这个小插曲之后呢,我们还是把目光聚焦在DeepSeek这篇新论文身上。</p>
   <p data-pid="K8TasddW">这篇论文的名字叫做<b>Inference-Time Scaling for Generalist Reward Modeling</b>,由DeepSeek和清华大学共同提出。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pica.zhimg.com/v2-8a088d5dfca8f5c4344530160d1800d4_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="332" data-original-token="v2-8a088d5dfca8f5c4344530160d1800d4" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pica.zhimg.com/v2-8a088d5dfca8f5c4344530160d1800d4_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="sqJI-5eP">
      这篇研究核心的亮点,就是提出了一个叫做
      <b>
         <span>
            <a class="RichContent-EntityWord css-b7erz1" data-za-not-track-link="true" data-paste-text="true" href="https://zhida.zhihu.com/search?content_id=256017430&amp;content_type=Article&amp;match_order=1&amp;q=SPCT&amp;zd_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ6aGlkYV9zZXJ2ZXIiLCJleHAiOjE3NDQxNjMzNzAsInEiOiJTUENUIiwiemhpZGFfc291cmNlIjoiZW50aXR5IiwiY29udGVudF9pZCI6MjU2MDE3NDMwLCJjb250ZW50X3R5cGUiOiJBcnRpY2xlIiwibWF0Y2hfb3JkZXIiOjEsInpkX3Rva2VuIjpudWxsfQ.qbktTuW_NlC_6ZKssSWwzOSihShHJppb8hpejiRNw7M&amp;zhida_source=entity" target="_blank">
               SPCT
               <svg width="10px" height="10px" viewBox="0 0 16 16" class="ZDI ZDI--FourPointedStar16 css-1dvsrp" fill="currentColor">
                  <path d="m5.068 9.267-3.08-.77a.512.512 0 0 1 0-.994l3.08-.77a2.289 2.289 0 0 0 1.665-1.665l.77-3.08a.512.512 0 0 1 .994 0l.77 3.08c.205.82.845 1.46 1.665 1.665l3.08.77a.512.512 0 0 1 0 .994l-
7D90
3.08.77a2.29 2.29 0 0 0-1.665 1.665l-.77 3.08a.512.512 0 0 1-.994 0l-.77-3.08a2.289 2.289 0 0 0-1.665-1.665Z"></path>
               </svg>
            </a>
         </span>
         方法
      </b>
      (Self-Principled Critique Tuning)的方法——
   </p>
   <p data-pid="XaldjGNA">首次提出通过在线强化学习(RL)优化原则和批判生成,实现推理时扩展。</p>
   <p data-pid="9kn8QrJx">之所以要做这么一项研究,是因为之前大家用奖励模型(Reward Model, RM)在RL中为大语言模型生成奖励信号。</p>
   <p data-pid="YvP2uK_r">但现有的RM在通用领域却表现出受限的情况,尤其是在面对复杂、多样化任务的时候。</p>
   <p data-pid="H5JEqI1m">因此,就出现了两个关键挑战点。</p>
   <p data-pid="-56WPTze">一个是通用RM需要灵活性(支持单响应、多响应评分)和准确性(跨领域高质量奖励)。</p>
   <p data-pid="6j5Cg_C_">另一个则是现有RM(如标量RM、半标量RM)在推理时扩展性差,无法通过增加计算资源显著提升性能。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic2.zhimg.com/v2-95edf7d7b13208a87ce2f3aa50b65725_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="692" data-original-token="v2-95edf7d7b13208a87ce2f3aa50b65725" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic2.zhimg.com/v2-95edf7d7b13208a87ce2f3aa50b65725_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="wMYu2biQ">为了解决这个问题,DeepSeek和清华大学团队便提出了SPCT。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic3.zhimg.com/v2-14aefa70c0e6271863f2c3d4e6d7c6fa_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="784" data-original-token="v2-14aefa70c0e6271863f2c3d4e6d7c6fa" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic3.zhimg.com/v2-14aefa70c0e6271863f2c3d4e6d7c6fa_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="y_to5iYU">整体来看,这项研究主要包含三大核心技术点。</p>
   <p data-pid="FF9u6qMf">
      首先就是
      <b>
         <span>
            <a class="RichContent-EntityWord css-b7erz1" data-za-not-track-link="true" data-paste-text="true" href="https://zhida.zhihu.com/search?content_id=256017430&amp;content_type=Article&amp;match_order=1&amp;q=%E7%94%9F%E6%88%90%E5%BC%8F%E5%A5%96%E5%8A%B1%E6%A8%A1%E5%9E%8B&amp;zd_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ6aGlkYV9zZXJ2ZXIiLCJleHAiOjE3NDQxNjMzNzAsInEiOiLnlJ_miJDlvI_lpZblirHmqKHlnosiLCJ6aGlkYV9zb3VyY2UiOiJlbnRpdHkiLCJjb250ZW50X2lkIjoyNTYwMTc0MzAsImNvbnRlbnRfdHlwZSI6IkFydGljbGUiLCJtYXRjaF9vcmRlciI6MSwiemRfdG9rZW4iOm51bGx9.HElLBR7_J9UtuHDEjWysW-gXnvj4aKI74Frde7NCE-A&amp;zhida_source=entity" target="_blank">
               生成式奖励模型
               <svg width="10px" height="10px" viewBox="0 0 16 16" class="ZDI ZDI--FourPointedStar16 css-1dvsrp" fill="currentColor">
                  <path d="m5.068 9.267-3.08-.77a.512.512 0 0 1 0-.994l3.08-.77a2.289 2.289 0 0 0 1.665-1.665l.77-3.08a.512.512 0 0 1 .994 0l.77 3.08c.205.82.845 1.46 1.665 1.665l3.08.77a.512.512 0 0 1 0 .994l-3.08.77a2.29 2.29 0 0 0-1.665 1.665l-.77 3.08a.512.512 0 0 1-.994 0l-.77-3.08a2.289 2.289 0 0 0-1.665-1.665Z"></path>
               </svg>
            </a>
         </span>
      </b>
      (GRM)。
   </p>
   <p data-pid="i7cche0h">它采用点式生成奖励模型(Pointwise GRM),通过生成文本形式的奖励(如critiques)而非单一标量值,支持灵活输入(单响应、多响应)和推理时扩展。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic1.zhimg.com/v2-776c2ab94ff9f2e9c2f6be98aa1da4d2_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="69" data-original-token="v2-776c2ab94ff9f2e9c2f6be98aa1da4d2" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic1.zhimg.com/v2-776c2ab94ff9f2e9c2f6be98aa1da4d2_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="U57ipv4z">其中,C是生成的critique,fextract从中提取分数。</p>
   <p data-pid="0dEqpLi_">接下来,是关键的<b>SPCT</b>了。</p>
   <p data-pid="kbTHrvAz">主要是通过在线强化学习(RL)训练GRM,使其能动态生成高质量的原则(principles)和批判(critiques),从而提升奖励质量。</p>
   <p data-pid="QS48wr4E">整体来看,SPCT是一个两阶段的过程,它们分别是:</p>
   <ul>
      <li data-pid="9z6Xm1Cl"><b>拒绝式微调(Rejective Fine-Tuning)</b></li>
      <li data-pid="TqGmkQGm">:冷启动阶段,通过采样和拒绝策略生成初始数据。</li>
      <li data-pid="LvqWgir5"><b>基于规则的在线RL</b></li>
      <li data-pid="r-DkH-hd">:使用规则化奖励函数优化原则和批判的生成,鼓励模型区分最佳响应。</li>
   </ul>
   <p data-pid="voV0C-rU">在此基础上,便是第三个技术点,即<b>推理时扩展技术</b>。</p>
   <p data-pid="hko11NH2">先是通过多次采样生成多样化的原则和批判,投票聚合最终奖励,扩展奖励空间。</p>
   <p data-pid="vLtBKg6a">再训练一个辅助模型过滤低质量采样,进一步提升扩展效果。</p>
   <p data-pid="17Lbi0B8">基于上述的方法,团队也对结果做了一波测试。</p>
   <p data-pid="dETzJuYR">在Reward Bench、PPE、RMB等基准上,DeepSeek-GRM-27B显著优于基线方法(如LLM-as-a-Judge、标量RM),且通过推理时扩展(32次采样)性能进一步提升(如Reward Bench准确率从86.0%提升至90.4%)。</p>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic4.zhimg.com/v2-a228f4eb76d1862fa5bab71143fede6d_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="698" data-original-token="v2-a228f4eb76d1862fa5bab71143fede6d" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic4.zhimg.com/v2-a228f4eb76d1862fa5bab71143fede6d_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="5FbuxZk7">总而言之,这篇研究证明了推理时扩展在通用RM中的有效性,性能超越训练时扩展。</p>
   <h2>One More Thing</h2>
   <p data-pid="h-RXLsEL">奥特曼发布“变卦”消息之外,还不忘给自己带一波货,称有两本他亲自参与的书即将发布:</p>
   <ul>
      <li data-pid="-mnXJqqc">一本是Keach Hagey写的关于奥特曼本人的书</li>
      <li data-pid="UP5-QFkV">一本是Ashlee Vance写的关于OpenAI的书</li>
   </ul>
   <p class="ztext-empty-paragraph"><br></p>
   <figure data-size="normal">
      <div><img src="https://pic1.zhimg.com/v2-41b8e7ba562857cc5cc1cd4e1046eb6e_1440w.jpg" data-caption="" data-size="normal" data-rawwidth="1080" data-rawheight="1069" data-original-token="v2-41b8e7ba562857cc5cc1cd4e1046eb6e" class="origin_image zh-lightbox-thumb" width="1080" data-original="https://pic1.zhimg.com/v2-41b8e7ba562857cc5cc1cd4e1046eb6e_r.jpg"></div>
   </figure>
   <p class="ztext-empty-paragraph"><br></p>
   <p data-pid="LlCZlzrL">论文地址:<a href="https://link.zhihu.com/?target=https%3A//arxiv.org/abs/2504.02495" class=" external" target="_blank" rel="nofollow noreferrer"><span class="invisible">https://</span><span class="visible">arxiv.org/abs/2504.0249</span><span class="invisible">5</span><span class="ellipsis"></span></a></p>
   <p data-pid="ROT3iwaP">参考链接:<br>[1]<a href="https://link.zhihu.com/?target=https%3A//x.com/sama/status/1908167621624856998" class=" external" target="_blank" rel="nofollow noreferrer"><span class="invisible">https://</span><span class="visible">x.com/sama/status/19081</span><span class="invisible">67621624856998</span><span class="ellipsis"></span></a><br>[2]<a href="https://link.zhihu.com/?target=https%3A//techcrunch.com/2025/04/04/openai-says-itll-release-o3-after-all-delays-gpt-5/" class=" external" target="_blank" rel="nofollow noreferrer"><span class="invisible">https://</span><span class="visible">techcrunch.com/2025/04/</span><span class="invisible">04/openai-says-itll-release-o3-after-all-delays-gpt-5/</span><span class="ellipsis"></span></a><br>[3]<a href="https://link.zhihu.com/?target=https%3A//x.com/sama/status/1908163013192069460" class=" external" target="_blank" rel="nofollow noreferrer"><span class="invisible">https://</span><span class="visible">x.com/sama/status/19081</span><span class="invisible">63013192069460</span><span class="ellipsis"></span></a></p>
   <p data-pid="JjOdr3Wj">—完—</p>
   <p data-pid="rynOGfNu"><a href="https://www.zhihu.com/org/liang-zi-wei-48/columns" class="internal" target="_blank">@量子位</a> · 追踪AI技术和产品新动态</p>
   <p data-pid="z-_VaKLq">深有感触的朋友,欢迎赞同、关注、分享三连վ'ᴗ' ի ❤</p>
</div>