#PythonTip: List Vs Set performance experiments

Lucas Magnum
2 min read · Apr 3, 2020

Hi, I’m Lucas Magnum and today we will run some experiments with Python’s list, tuple and set data structures :)

Let’s work in the following scenario:

  • There are 30 million records in our collection
  • We have 100 records to validate whether they are present in our collection or not
  • We want to find the fastest way to check whether an element is in a collection using the in operator.
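The scenario above can be sketched in a few lines. This is only an illustration: the post does not show how its data was generated, so the integer records, the helper names build_records and build_candidates, and the scaled-down collection size are all assumptions.

```python
import random

# Assumption: the original post does not show how its data was generated,
# so integer records and these sizes are illustrative only.
COLLECTION_SIZE = 100_000   # the article uses 30_000_000
SAMPLE_SIZE = 100           # records we want to validate


def build_records(size):
    """Return `size` distinct integer records (0 .. size - 1)."""
    return list(range(size))


def build_candidates(count, upper_bound, seed=42):
    """Return `count` random integers drawn from twice the collection's range,
    so some of them are expected to be missing from the collection."""
    rng = random.Random(seed)
    return [rng.randrange(upper_bound * 2) for _ in range(count)]


records = build_records(COLLECTION_SIZE)
candidates = build_candidates(SAMPLE_SIZE, COLLECTION_SIZE)

# The membership test that the experiments in this post rely on.
found = [value for value in candidates if value in records]
print(f"{len(found)} of {len(candidates)} candidates are in the collection")
```

Raising COLLECTION_SIZE to 30_000_000 should bring the setup closer to the one described here, at the cost of a much longer run.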

List solution

The list solution took between 20 and 30 seconds to run and used 252 megabytes to store the list.
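A minimal sketch of how such a measurement could look is shown here. It is not the code behind the numbers above: the data, the scaled-down size and the use of time.perf_counter and sys.getsizeof are assumptions. What is certain is that the in operator on a list scans the elements one by one, so a value that is missing costs a full pass over the list.

```python
import sys
import time

SIZE = 100_000  # scaled down; the article uses 30_000_000

collection = list(range(SIZE))
# Hypothetical inputs: 100 values, every other one missing from the list.
to_validate = [i if i % 2 == 0 else -i - 1 for i in range(SIZE - 100, SIZE)]

start = time.perf_counter()
hits = sum(1 for value in to_validate if value in collection)
elapsed = time.perf_counter() - start

# sys.getsizeof reports the list object itself (its array of references),
# not the integers it points to.
size_mb = sys.getsizeof(collection) / 1024 ** 2
print(f"list: {hits} hits in {elapsed:.3f}s, container size {size_mb:.1f} MB")
```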

Tuple solution

Instead of using a list to store our data, let’s try storing it in a tuple.

The tuple solution took between 17 and 25 seconds to run and used 228 megabytes to store the tuple.

A slight improvement in both running time and memory.
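The memory difference can be checked with a small sketch like the one below. It assumes CPython and a scaled-down collection, and it uses sys.getsizeof, which may not be how the figures above were obtained. A list that grows step by step may reserve spare capacity, while a tuple is sized exactly once.

```python
import sys

SIZE = 100_000  # scaled down; the article uses 30_000_000

as_list = [n for n in range(SIZE)]   # grown step by step, may over-allocate
as_tuple = tuple(as_list)            # sized exactly once, immutable

list_bytes = sys.getsizeof(as_list)
tuple_bytes = sys.getsizeof(as_tuple)
print(f"list:  {list_bytes:,} bytes")
print(f"tuple: {tuple_bytes:,} bytes")

# Lookups behave the same way: both scan element by element.
last_in_list = (SIZE - 1) in as_list
last_in_tuple = (SIZE - 1) in as_tuple
print(f"last element found: list={last_in_list}, tuple={last_in_tuple}")
```

Since both containers are searched linearly, the tuple saves some memory but should not be expected to change lookup time much.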

Set solution

To keep the comparison fair, I’ll run this test in two ways. First, I’ll create a list with 30 million elements and convert it to a set. Second, I’ll create the set with 30 million elements directly.

Convert list to set

Converting the list to a set and then searching for the values took 20 to 30 seconds and used 252 megabytes for the list plus 1 gigabyte for the set.

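Not much improvement here: we still pay for building the list, and on top of that the conversion has to hash all 30 million elements before the first lookup can run.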
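To see where the time goes, the conversion and the search can be timed separately. The sketch below does that under assumed conditions (integer data, a scaled-down size, hypothetical variable names); it is not the original benchmark.

```python
import time

SIZE = 100_000  # scaled down; the article uses 30_000_000

source_list = list(range(SIZE))
to_validate = list(range(SIZE - 50, SIZE + 50))  # 50 present, 50 missing

start = time.perf_counter()
lookup = set(source_list)          # one-off cost: hashes every element
convert_seconds = time.perf_counter() - start

start = time.perf_counter()
hits = sum(1 for value in to_validate if value in lookup)
search_seconds = time.perf_counter() - start

print(f"conversion: {convert_seconds:.4f}s, search: {search_seconds:.6f}s, hits: {hits}")
```

With only 100 lookups, the one-off conversion is likely to dominate. It pays off more when the same set is reused for many searches.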

Create a set and search

Creating a set with 30 million items and searching on it took 2–5 seconds and used 1 gigabyte memory to store the collection.

We are searching ~10x faster but using ~4x more memory.
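The trade-off can be reproduced on a smaller scale with a sketch like this one. The size, the data and the measuring approach are assumptions, and the exact ratio will differ from the figures above. On CPython, sys.getsizeof again counts only the container, not the integers stored in it.

```python
import sys
import time

SIZE = 100_000  # scaled down; the article uses 30_000_000

start = time.perf_counter()
collection = set(range(SIZE))              # no intermediate list is kept around
to_validate = range(SIZE - 50, SIZE + 50)  # 50 present, 50 missing
hits = sum(1 for value in to_validate if value in collection)
elapsed = time.perf_counter() - start

set_bytes = sys.getsizeof(collection)
list_bytes = sys.getsizeof(list(range(SIZE)))
print(f"set: built and searched in {elapsed:.4f}s, {hits} hits")
print(f"set container is {set_bytes / list_bytes:.1f}x the size of the list")
```

A set lookup hashes the value and jumps to its slot instead of scanning, which is why the search is fast. The hash table behind it keeps spare slots, which is where the extra memory goes.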

That’s it for today! When we can trade memory for performance, sets can come in handy!

TL;DR

  • Sets need a lot more memory than lists to store a huge collection of items (~4x in this experiment)
  • Sets are way faster than lists when using the in operator
  • Tuples use slightly less memory than lists but have almost the same lookup performance
