Computer Science - Haibin's blog

MINEDRAFT: A Framework for Batch Parallel Speculative Decoding

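MINEDRAFT: A Framework for Batch Parallel Speculative Decoding splits speculative decoding into mini-batches, which are then processed batch by batch on the drafter and the verifier. It is implemented by modifying vLLM, which is a considerable engineering effort. A nice attempt and a good idea. Architecture overview of MINEDRAFT. (Left) The Scheduler manages r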

Paper Reading
Haibin
2026-03-23
108 Views
0 Comments

ISCA25 Neoscope: How Resilient Is My SoC to Workload Churn?

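How should future hardware cope with constantly evolving software? https://dl.acm.org/doi/pdf/10.1145/3695053.3731014 This is the ISCA 2025 paper "Neoscope: How Resilient Is My SoC to Workload Churn?". At its core it answers a very systems/architecture-oriented question: when software and workloads keep evolving (churn), an SoC design, over its entire life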

Paper Reading
Haibin
2026-02-01
224 Views
0 Comments

ATC25 Colocating ML Inference and Training with Fast GPU Memory Handover

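Today yf presented an ATC25 paper from IPADS: Colocating ML Inference and Training with Fast GPU Memory Handover. Quick take: another large engineering effort in the usual IPADS style, combining TVM + vLLM + NCCL + PyTorch. Everyone asked a lot of questions at the group meeting. https://ipads.se.sjtu.edu.cn/_media/publications/si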

Paper Reading
Haibin
2026-01-15
363 Views
0 Comments

STOC81 I/O Complexity: The Red-Blue Pebble Game

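STOC81 I/O Complexity: The Red-Blue Pebble Game. This is a theoretical computer science paper, but it poses a very interesting question: just as we have time complexity, can we define an I/O complexity that measures the minimum number of I/O operations a program must perform? Paper link: https://www.eecs.harvard.edu/~htk/publication/1981-stoc-hong-kung.pdf Com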

Paper Reading
Haibin
2026-01-09
329 Views
0 Comments

In-depth analysis: RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

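I used to read papers with an LLM, but later found that in the same 20 minutes I actually learn less that way than by reading carefully myself and asking about the key questions. Can the KVCache make use of RAG techniques? The idea of this paper is: can we "build KVCache as a Vector Storage System"? In long-context scenarios the KVCache often exceeds GPU memory, so the excess KVCache has to be stored in CPU memory. But that is slow (CPU-GPU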

Paper Reading
Haibin
2026-01-08
518 Views
0 Comments

Distributed and Cloud Computing Assignment 4

Feedback to Learner (12/30/25 3:55 PM): 82+5=87 (extra: 0). Summary: As we demonstrated in the lab, you should pre-assign labels and taints to cluster nodes using Kind config YAML. Other parts

Distributed Systems
Haibin
2026-01-07
299 Views
0 Comments

Distributed Systems and Cloud Computing: Review 1

This is the self-review pack for Distributed Systems and Cloud Computing. It covers lessons 1-5. Lesson 1 Presentation – Effective communication of information rather than of data – Code and number conver

Distributed Systems
Haibin
2025-12-30
367 Views
0 Comments

DnCC3: Introduction to Spark

In this assignment, we use Spark to analyze the Parking dataset. Preparing: install pyspark and Java. pip install pyspark sudo apt-get update sudo apt-get install openjdk-17-jdk export JAVA_HOME=

Distributed Systems
Haibin
2025-12-30
460 Views
0 Comments

DnCC Assignment 1: Parallel Matrix Multiplication

https://github.com/HaibinLai/Distributed-and-Cloud-Computing.git [Distributed and Cloud Computing - DnCC Review] https://www.bilibili.com/video/BV1eovaBTEW9/?share_source=copy_web&vd_source=72eac555730ba7e7a64f9fa1d7f2b2d4 Setup

Distributed Systems
Haibin
2025-12-30
345 Views
0 Comments

A Simple Merch Store Backend: Distributed and Cloud Computing Assignment 2

Scores 95+10=105 (extra: 5) Summary: The impl is nice in general, and the report is awesome! Yes, this is an assignment where you should follow certain instructions and submit certain stuff, but just

Distributed Systems
Haibin
2025-12-30
692 Views
0 Comments

You and your research | Richard W. Hamming

You and Your Research https://gwern.net/doc/science/1986-hamming Great work is something else than mere brains. Brains are measured in various ways. In mathematics, theoretical physics, astrophysics, typically brain

Paper Reading
Haibin
2025-12-30
243 Views
0 Comments

How to Write a 2000-Line Course Project with AI

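Update, March 16, 2026: take a look at this article, "The Working Paradigm of Infra Developers in the New Era, as Seen Through FAST26 SPECFS", by SPtuan on Zhihu: https://zhuanlan.zhihu.com/p/2015537008425055371 Humans have already moved from low-level coding toward the role of orchestrator. We need to orchestrate agents and build a sound control system. Recently the distributed systems course had an assignment: write the backend for an online store. Store customers, through a web page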

Computer Science
Haibin
2025-11-16
701 Views
0 Comments

I Fix PMUs on the CPU: Can We Trust Profiling Results?

Can We Trust Profiling Results? Understanding and Fixing the Inaccuracy in Modern Profilers https://par.nsf.gov/servlets/purl/10122098 After reading the blog post "Where Do Interrupts Happen?" last time (my Chinese-language analysis: https://www.haibi

Paper Reading
Haibin
2025-11-11
531 Views
0 Comments

AI Compiler Group Meeting

A 109-page PPT, from TVM to Mirage: an AI Compiler 101 introduction. The talk takes 90 minutes. PPT and videos: https://drive.google.com/drive/folders/1eKcHZKMpix31EcioiNCf16AzLIHkvGyy?usp=sharing

Paper Reading
Haibin
2025-11-11
364 Views
0 Comments

Can Tensor Cores Benefit Memory-Bound Kernels? (NO!)

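This post is based on Can Tensor Cores Benefit Memory-Bound Kernels? (NO!) https://dl.acm.org/doi/pdf/10.1145/3725798.3725803 The paper makes a somewhat surprising claim: Tensor Cores do not help much on memory-bound kernels/operators! It supports the claim with a solid theoretical analysis plus experimental validation. Understanding this paper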

Paper Reading
Haibin
2025-11-02
365 Views
0 Comments

NSDI26: Can we use MLFQ in LLM Serving?

This paper was on arXiv for 2 years before it got into NSDI26. It may be worth comparing the 2023 and 2026 versions. Paper link: https://arxiv.org/pdf/2305.05920 Main idea: Can we use MLFQ

Paper Reading
Haibin
2025-10-21
1212 Views
0 Comments

Distributed System 5: Bayou Algorithm

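How can a distributed system keep events consistent over a weak network? Here a weak network means one where devices can only connect intermittently. Bayou (1995): Bayou is a remarkable paper. In 1995, before the Internet was widespread, it was already discussing weak consistency in distributed systems. The scenario Bayou considers is mobile devices without a stable network connection: when such devices form a cluster and handle reads and writes, how do we ensure that the data users see is reasonable? Dynamo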

Distributed Systems
Haibin
2025-10-11
537 Views
0 Comments

Distributed System 4: Chandy-Lamport Algorithm

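Snapshots: save the data. We want to capture a consistent global state of the system at some time T, including: the local state of each process; the state of the messages on each channel (i.e. messages "in flight"). Common use cases: checkpoint / rollback recovery; detecting global deadlock; checking global invariants (e.g. whether all account balances sum to a constant); debugging / stable-state detection (e.g. termination detection). The problem: in a distributed system there is no global time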

Distributed Systems
Haibin
2025-10-11
439 Views
0 Comments

Distributed System 3: Vector Clock

Review: Time matters in distributed systems for determining the order of events, but we can't find a single synchronized time for everyone. Vector Clock. What Lamport's clock didn't solve: Solution: use a vector clock. Two vectors are equal: the same event. One vector is less than the other:

Distributed Systems
Haibin
2025-10-11
333 Views
0 Comments

GridFTP: SC25 Test of Time Award

How do we move massive data from server to client? How do we let many users around the world use the same compute machines? This technology was invented not in cloud computing but in grid computing. And th

Paper Reading
Haibin
2025-10-10
474 Views
0 Comments