Python's built-in containers, collections.deque, and heapq

2023-01-09 python

(C)Python has deque in the collections module and priority queue in the heapq module in addition to build-in containers such as list, tuple, dict and set.

$ python3 --version
Python 3.10.7

list

list is implemented with not linked-list but variable-length arrays.

C++ STLのContainersとAlgorithms - sambaiz-net

a = ['d', 'ae', 'ff', 'b', 'c']

print(a[2:-1]) # => ['ff', 'b']

a.append('x')
print(a.pop()) # => 'x'
print(a) # => ['d', 'ae', 'ff', 'b', 'c']

print('ae' in a) # => True

print(sorted(a, key=len, reverse=True)) # => ['ae', 'ff', 'd', 'b', 'c']

v, w, x, y, z = a
print(x) # => ff
print(*a) # => d ae ff b c

Tuples are compared with the first different elements, whereas objects are ordered by the result of their __lt__() method to be sorted.

sorted([(1, 2), (2, 1), (1, 2, 3)]) # => [(1, 2), (1, 2, 3), (2, 1)]

class Value:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value

list(map(lambda x: x.value, sorted([Value(3), Value(1), Value(2)]))) # => [1, 2, 3]

By the way, functions that correspondents to lower_bound and upper_bound in binary_search are implented in bisect module.

import bisect

a = [1, 2, 2, 3, 5]
print(bisect.bisect_left(a, 2)) # => 1 
print(bisect.bisect_right(a, 2)) # => 3
print(bisect.bisect_left(a, 4)) # => 4
print(bisect.bisect_right(a, 4)) # => 4

tuple

tuple can be used similar to list, but it is immutable and hashable.

a = ('d', 'ae', 'ff', 'b', 'c')

# a.append('x') => AttributeError: 'tuple' object has no attribute 'append'
print(a + a) # => ('d', 'ae', 'ff', 'b', 'c', 'd', 'ae', 'ff', 'b', 'c')

# print({[100]: 100}) => TypeError: unhashable type: 'list'
print({tuple([100]): 100}) # => {(100,): 100}

I thought dict is implemented with a hash table and so does not keep the insertion order but in fact, the implementation from Python 3.6 keeps it, and that behavior was added as a language feature in Python 3.7. Originally, the data was stored directly in the sparse array, but now it is stored in another array in order and the sparse array has only the index. As a result, the memory usage rate was reduced by more than 20%, and the cache memory in CPUs can be used efficiently.

a = {'foo': 100, 'bar': 200}

print([k+str(v) for k, v in a.items()]) # => ['foo100', 'bar200']

print('foo' in a) # => True

print(sorted(a)) # => ['bar', 'foo']

del a['foo'] # => {'bar': 'a'}

Besides, if you use collections.defaultdict, the value of not found key is automatically initialized with default_factory.

from collections import defaultdict

a = defaultdict(set)

a["a"].add(100)
a["a"].add(100)
a["a"].add(200)

print("b" in a) # => False
print(a) # => defaultdict(<class 'set'>, {'a': {200, 100}})

set

set is implemented with hash table.

a = {'d', 'ae'} | {'ff', 'b', 'c'}

print(len(a)) # => 5
print('c' in a) # => True

a.add('x')
a.remove('y') # => KeyError: 'y'
a.discard('y')

collections.deque

deque can be appended data with O(1) to either direction.

from collections import deque
d = deque()

d.append('x')
d.appendleft('y')
d.append('z')
print(d) # => deque(['y', 'x', 'z'])
d.pop() # => 'z'
d.popleft() # => 'y

heapq

There is no priority queue data structure, but it allows you to treat a list as it.

import heapq

a = [3, 1, 2, 4, 5]
heapq.heapify(a)
print(a) # => [1, 3, 2, 4, 5]

heapq.heappush(a, 0)
heapq.heappop(a) # => 0

References

Python のタプルとリストの違い、タプルの使いどころ - Life with Python