Why does Python use reference counting, which has an obvious flaw, instead of mark-sweep like JavaScript?

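Reference counting has an obvious flaw: it cannot handle reference cycles. So why does Python still use reference counting instead of mark-sweep?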

It's a design trade-off. As long as CPython's users are happy with it, that's what matters.

Plenty of other Python implementations don't use reference counting; if you don't like it, use one of those (ducks).

Besides, CPython's reference counting is backed up by a mark-sweep-style cycle collector, so circular references are not a problem.
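A minimal sketch of that backup at work, assuming CPython (other implementations free objects on their own schedule). `Node` and `make_cycle` are hypothetical helpers. Automatic collection is paused with `gc.disable()` only so you can see what pure reference counting leaves behind before `gc.collect()` runs.

```python
import gc
import weakref


class Node:
    """Hypothetical helper: an object that can point at another object."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other = b  # a -> b
    b.other = a  # b -> a: each keeps the other's count above zero
    return weakref.ref(a)  # a weak reference does not raise the count


gc.disable()  # pause automatic cycle collection (manual gc.collect() still works)
try:
    ref = make_cycle()
    # The local names are gone, but the cycle keeps both counts at 1.
    alive_after_refcount = ref() is not None
    gc.collect()  # the backup cycle collector finds the unreachable cycle
    alive_after_gc = ref() is not None
finally:
    gc.enable()

print(alive_after_refcount, alive_after_gc)  # True False on CPython
```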

The official documentation explains the choice (though it doesn't really say much):

docs.python.org/2/faq/d
The details of Python memory management depend on the implementation. The standard C implementation of Python uses reference counting to detect inaccessible objects, and another mechanism to collect reference cycles, periodically executing a cycle detection algorithm which looks for inaccessible cycles and deletes the objects involved. The gc module provides functions to perform a garbage collection, obtain debugging statistics, and tune the collector’s parameters.

In the absence of circularities and tracebacks, Python programs do not need to manage memory explicitly.

Why doesn’t Python use a more traditional garbage collection scheme? For one thing, this is not a C standard feature and hence it’s not portable. (Yes, we know about the Boehm GC library. It has bits of assembler code for most common platforms, not for all of them, and although it is mostly transparent, it isn’t completely transparent; patches are required to get Python to work with it.)
...
Traditional GC also becomes a problem when Python is embedded into other applications. While in a standalone Python it’s fine to replace the standard malloc() and free() with versions provided by the GC library, an application embedding Python may want to have its own substitute for malloc() and free(), and may not want Python’s. Right now, Python works with anything that implements malloc() and free() properly.
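The FAQ mentions that the `gc` module can report debugging statistics. A minimal sketch using `gc.callbacks`, which CPython invokes with a phase (`"start"` or `"stop"`) and an info dict around every collection. The callback name `record` and the `events` list are illustrative only.

```python
import gc

events = []


def record(phase, info):
    # CPython calls each callback before ("start") and after ("stop") a collection;
    # info carries "generation", "collected" and "uncollectable".
    if phase == "stop":
        events.append((info["generation"], info["collected"], info["uncollectable"]))


gc.callbacks.append(record)
try:
    # Explicit full collection; returns the number of unreachable objects found.
    found = gc.collect()
finally:
    gc.callbacks.remove(record)  # always unregister the callback

print("collections observed:", events)
print("unreachable objects found:", found)
```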

Next, see this article, which explains that the "mark-sweep GC" in newer versions of Python is actually generational:

patshaughnessy.net/2013
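The generational structure can be inspected from Python itself. A minimal sketch with the documented `gc.get_threshold()`, `gc.get_count()`, `gc.set_threshold()` and `gc.collect(generation)` calls; the concrete numbers differ between CPython versions, so none are assumed here.

```python
import gc

thresholds = gc.get_threshold()  # collection thresholds, one per generation
counts = gc.get_count()          # current counters, one per generation
print("thresholds:", thresholds)
print("counts:", counts)

# Collecting only the youngest generation scans recently created containers,
# which keeps most collections cheap.
young = gc.collect(0)
# The default generation is 2: a full collection over every tracked container.
full = gc.collect()
print("young:", young, "full:", full)

# Tuning: raising the first threshold makes young collections rarer.
gc.set_threshold(thresholds[0] * 2, *thresholds[1:])
print("tuned thresholds:", gc.get_threshold())
gc.set_threshold(*thresholds)  # restore the original settings
```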

Also, JavaScript doesn't necessarily use mark-sweep either: the language specification doesn't mandate it, and real implementations don't all use it.

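The biggest advantage of reference counting is timely reclamation: the moment an object's reference count drops to zero is the moment it becomes garbage, and also the moment it is reclaimed. This is exactly where tracing GC algorithms such as mark-sweep are weaker: after an object becomes garbage, it stays in memory until the next GC cycle cleans it up (floating garbage).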
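You can watch this timeliness directly in CPython: a finalizer runs at the very statement that drops the last reference. A minimal sketch; `Resource` and the `log` list are hypothetical, and the ordering shown is CPython behaviour that the language specification does not guarantee (PyPy, for example, may run `__del__` later).

```python
class Resource:
    """Hypothetical object that records when it is finalized."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        # CPython calls this as soon as the reference count drops to zero.
        self.log.append("freed")


log = []
res = Resource(log)
log.append("before del")
del res  # last reference gone: CPython finalizes the object right here
log.append("after del")
print(log)  # ['before del', 'freed', 'after del'] on CPython
```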

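Python's GC design works like this: objects that hold no references to other objects (strings, numbers, and so on) are handled by reference counting, since they cannot possibly form reference cycles. Objects such as List and Map, which can form cycles, are handled by mark-sweep. So my understanding is that Python's GC combines the strengths of both families of algorithms to some extent: it guarantees that collection is complete while striving to keep it timely.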
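The split between atomic objects and containers shows up in `gc.is_tracked()`, which reports whether CPython's cycle collector watches an object at all. A minimal sketch assuming CPython; `Point` is a hypothetical class. Being tracked only means the cycle collector looks at the object; it says nothing about whether reference counting also applies to it.

```python
import gc


class Point:
    """Hypothetical user-defined class; its instances can hold references."""

    def __init__(self, x, y):
        self.x, self.y = x, y


samples = {
    "int": 42,
    "str": "hello",
    "list": [1, 2, 3],
    "Point": Point(1, 2),
}

for name, obj in samples.items():
    # Atomic objects cannot take part in a cycle, so CPython does not track them.
    print(f"{name:6} tracked={gc.is_tracked(obj)}")
```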

Update

The description above is inaccurate; here is R大's clarification from the comments:

Reference counting is at work for List, Map, Set and the like as well. Mark-sweep is only the backup.

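The whole reference-counting machinery is embedded in ceval.c, and every object goes through it.
In other words, it's not that a List isn't reference counted. If a List's reference count drops to zero on its own, the List is freed through reference counting right away; otherwise, it gets handled when the cycle GC runs.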
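A short sketch to check that lists really are reference counted, assuming CPython. `WeakList` is a hypothetical subclass that exists only because plain `list` objects cannot be weakly referenced, and automatic collection is paused so that any freeing observed must come from reference counting rather than the cycle collector.

```python
import gc
import sys
import weakref


class WeakList(list):
    """Hypothetical list subclass; plain lists do not support weak references."""


def count_and_free():
    lst = WeakList([1, 2, 3])
    base = sys.getrefcount(lst)
    holder = [lst, lst]  # two more references to the same list
    extra = sys.getrefcount(lst) - base
    del holder           # those two references go away again
    ref = weakref.ref(lst)
    del lst              # last strong reference dropped: count reaches zero
    return extra, ref() is None


gc.disable()  # rule out the cycle collector as the one doing the freeing
try:
    extra_refs, freed = count_and_free()
finally:
    gc.enable()

print(extra_refs, freed)  # 2 True on CPython
```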

That said, the point that reference counting makes reclamation more timely still stands.