Python の組み込みコンテナ、collections.deque と heapq

2023-01-09 python

(C)Python には list, tuple, dict, set が組み込まれており、collections モジュールで deque が、heapq モジュールで優先度付きキューが実装されている。

$ python3 --version
Python 3.10.7

list

連結リストではなく可変長配列で実装されている。

C++ STLのContainersとAlgorithms - sambaiz-net

a = ['d', 'ae', 'ff', 'b', 'c']

print(a[2:-1]) # => ['ff', 'b']

a.append('x')
print(a.pop()) # => 'x'
print(a) # => ['d', 'ae', 'ff', 'b', 'c']

print('ae' in a) # => True

print(sorted(a, key=len, reverse=True)) # => ['ae', 'ff', 'd', 'b', 'c']

v, w, x, y, z = a
print(x) # => ff
print(*a) # => d ae ff b c

tuple は一致しない最初の要素で、object は __lt__() が呼ばれることよって比較され sort される。

sorted([(1, 2), (2, 1), (1, 2, 3)]) # => [(1, 2), (1, 2, 3), (2, 1)]

class Value:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value

list(map(lambda x: x.value, sorted([Value(3), Value(1), Value(2)]))) # => [1, 2, 3]

ちなみに binary_search の lower_bound, upper_bound 相当の関数は bisect モジュールにある。

import bisect

a = [1, 2, 2, 3, 5]
print(bisect.bisect_left(a, 2)) # => 1 
print(bisect.bisect_right(a, 2)) # => 3
print(bisect.bisect_left(a, 4)) # => 4
print(bisect.bisect_right(a, 4)) # => 4

tuple

list と似たようなことができるが、immutable で hashable なオブジェクト。

a = ('d', 'ae', 'ff', 'b', 'c')

# a.append('x') => AttributeError: 'tuple' object has no attribute 'append'
print(a + a) # => ('d', 'ae', 'ff', 'b', 'c', 'd', 'ae', 'ff', 'b', 'c')

# print({[100]: 100}) => TypeError: unhashable type: 'list'
print({tuple([100]): 100}) # => {(100,): 100}

hash table で実装されており格納順を保持しない、と思いきや実は Python 3.6 からの実装では保持するようになっており、 Python 3.7 で言語仕様にも入った。元々 sparse array に直接データの配列を格納していたものが、データは別の配列に順番に格納し、sparse array にはその index のみを持つことでメモリの使用率が 20% 以上減り、CPUのキャッシュメモリの効率も高まっているそうだ。

a = {'foo': 100, 'bar': 200}

print([k+str(v) for k, v in a.items()]) # => ['foo100', 'bar200']

print('foo' in a) # => True

print(sorted(a)) # => ['bar', 'foo']

del a['foo'] # => {'bar': 'a'}

なお、collections.defaultdict だとキーがない値が自動で default_factory で初期化される。

from collections import defaultdict

a = defaultdict(set)

a["a"].add(100)
a["a"].add(100)
a["a"].add(200)

print("b" in a) # => False
print(a) # => defaultdict(<class 'set'>, {'a': {200, 100}})

set

hash table で実装されている。

a = {'d', 'ae'} | {'ff', 'b', 'c'}

print(len(a)) # => 5
print('c' in a) # => True

a.add('x')
a.remove('y') # => KeyError: 'y'
a.discard('y')

collections.deque

両端から O(1) でデータを追加できるキュー。

from collections import deque
d = deque()

d.append('x')
d.appendleft('y')
d.append('z')
print(d) # => deque(['y', 'x', 'z'])
d.pop() # => 'z'
d.popleft() # => 'y

heapq

優先度付きキューのデータ構造があるわけではなく、list を heapq のように扱えるようにする。

import heapq

a = [3, 1, 2, 4, 5]
heapq.heapify(a)
print(a) # => [1, 3, 2, 4, 5]

heapq.heappush(a, 0)
heapq.heappop(a) # => 0

参考

Python のタプルとリストの違い、タプルの使いどころ - Life with Python