11  Containers

Julia offers a wide selection of container types with largely similar interfaces. This chapter introduces Tuple, Range, and Dict; the next chapter covers Array, Vector, and Matrix.

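These containers are:

  • iterable: for x ∈ container ... end
  • indexable: x = container[i]

and some are also

  • mutable: container[i] = x

Furthermore, there are several common functions, e.g., length(container), eltype(container), and isempty(container).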
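
As a small sketch of this shared interface, the following loop applies a few generic Base functions (length, eltype, isempty; picked here only for illustration) to a tuple, a range, and a dictionary. The example values are made up.

```julia
# Three different containers: a tuple, a range, and a dictionary
containers = ((33, 4.5, "Hello"), 1:10, Dict("a" => 1, "b" => 2))

for c in containers
    println(typeof(c), ": length = ", length(c),
            ", eltype = ", eltype(c), ", isempty = ", isempty(c))
end
```

The same three calls work unchanged on all three container types, which is what "largely similar interfaces" means in practice.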

11.1 Tuples

A tuple is an immutable container of elements. You cannot add new elements or change existing values.

t = (33, 4.5, "Hello")

@show   t[2]               # indexable

for i ∈ t println(i) end   # iterable
t[2] = 4.5
33
4.5
Hello
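
Immutability can be checked directly. The following sketch tries to overwrite an element, catches the resulting error, and then builds a new tuple instead; the variable names err and t2 are only illustrative.

```julia
t = (33, 4.5, "Hello")

# Assigning to an element fails because tuples define no setindex! method.
err = try
    t[2] = 7.0
catch e
    e
end
println(typeof(err))          # MethodError

# Instead, build a new tuple from the old one.
t2 = (t[1], 7.0, t[3])
println(t2)                   # (33, 7.0, "Hello")
```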

A tuple is a heterogeneous container: each element has its own type, which is reflected in the tuple’s type:

typeof(t)
Tuple{Int64, Float64, String}

Tuples are frequently used as function return values to return more than one object.

# Integer division and remainder:
# Assign quotient and remainder to variables `q` and `r`:

q, r = divrem(71, 6)
@show  q  r;
q = 11
r = 5
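
User-defined functions can return several values in the same way. A minimal sketch, with a hypothetical helper extrema_and_sum that is not part of Julia:

```julia
# Hypothetical helper: returns three values packed into one tuple
function extrema_and_sum(v)
    return minimum(v), maximum(v), sum(v)
end

# Unpack the returned tuple into three variables
lo, hi, total = extrema_and_sum([3, 1, 4, 1, 5])
@show lo hi total;

# Without unpacking, the result is an ordinary tuple
result = extrema_and_sum([3, 1, 4, 1, 5])
@show typeof(result);
```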

Parentheses can be omitted in certain constructs. This implicit tuple packing/unpacking is commonly used in multiple assignments:

x, y, z = 12, 17, 203
(12, 17, 203)
y
17

Some functions require tuples as arguments or always return tuples. A single-element tuple is written as:

x = (13,)         # a 1-element tuple
(13,)

The comma, not the parentheses, makes the tuple.

x = (13)         # not a tuple
13

11.2 Ranges

We have already used range objects in numerical for loops.

r = 1:1000
typeof(r)
UnitRange{Int64}

There are several range types; UnitRange, for example, is a range with step size 1. They are usually constructed with the colon syntax or with the function range().

The colon is special syntax for constructing ranges:

  • a:b is equivalent to range(a, b)
  • a:b:c is equivalent to range(a, c, step=b)
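
The correspondence between the two notations can be checked in the REPL. This sketch assumes Julia 1.7 or later (where range accepts two positional arguments) and a 64-bit system for the type names in the comments.

```julia
# Colon syntax and range() produce equal ranges
@show (1:5) == range(1, 5)
@show (1:2:9) == range(1, 9, step = 2)

# The concrete range type depends on the arguments:
@show typeof(1:5)                      # UnitRange{Int64}
@show typeof(1:2:9)                    # StepRange{Int64, Int64}
@show (0.0:0.25:1.0) isa AbstractRange
```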

Ranges are iterable, immutable, and indexable.

(3:100)[20]   # the 20th element
22

Recall the semantics of the for loop: for i in 1:1000 does not mean ‘increment the loop variable i by one each iteration’; rather, it means ‘successively assign the values 1, 2, 3, …, 1000 to the loop variable from the container’.

Creating this container explicitly would be very inefficient.

  • Ranges are “lazy” vectors never stored as concrete lists. This makes them ideal as for loop iterators: memory-efficient and fast.
  • They are “recipes” or generators that respond to the query “Give me your next element!”.
  • In fact, the supertype AbstractRange is a subtype of AbstractVector.
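
A short sketch of what "lazy" means here: a range with a billion elements can be indexed and measured like a vector, although only its endpoints are stored. The size comment assumes a 64-bit system.

```julia
r = 1:10^9                      # a billion "elements", but nothing is stored

@show r isa AbstractVector      # ranges behave like read-only vectors
@show length(r)
@show r[123_456_789]            # computed on demand, not looked up
@show sizeof(r)                 # two integers (start and stop)

@show AbstractRange <: AbstractVector
```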

The macro @allocated returns the number of bytes of memory allocated during the evaluation of an expression.

@allocated r = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
224
@allocated r = 1:20
32

The collect() function converts a range to a concrete vector.

collect(20:-3:1)
7-element Vector{Int64}:
 20
 17
 14
 11
  8
  5
  2

The range type LinRange is particularly useful, for example when preparing data for plotting.

LinRange(2, 50, 300)
300-element LinRange{Float64, Int64}:
 2.0, 2.16054, 2.32107, 2.48161, 2.64214, …, 49.5184, 49.6789, 49.8395, 50.0

LinRange(start, stop, n) generates n equidistant values from start to stop. Use collect() to obtain the corresponding vector if needed.
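
As a minimal sketch of the plotting use case, the following code samples the function x^2 (chosen only for illustration) on five equidistant points. Broadcasting over a LinRange yields an ordinary vector.

```julia
x = LinRange(0, 1, 5)          # 5 equidistant points from 0 to 1
y = x .^ 2                     # broadcasting works as with vectors

@show collect(x)
@show y
@show step(x)                  # distance between neighbouring points
```

The pair (x, y) could then be passed to a plotting function.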

11.3 Dictionaries

  • Dictionaries (also known as associative arrays or lookup tables) are special containers.
  • Whereas vector entries are addressed by integer indices, v[i], dictionary entries are addressed by more general keys.
  • A dictionary is a collection of key-value pairs with parameterized type Dict{S,T}, where S is the key type and T is the value type.
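
To illustrate "more general keys", here is a small sketch in which the keys are tuples of two integers. The dictionary labels and its contents are hypothetical; keytype and valtype report the type parameters S and T.

```julia
# Keys need not be strings or integers; here they are tuples of two Ints
# (hypothetical grid coordinates).
labels = Dict((1, 1) => "origin", (1, 2) => "right", (2, 1) => "down")

@show typeof(labels)
@show keytype(labels) valtype(labels)
@show labels[(1, 2)]
```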

A dictionary can be created explicitly:

# Population in 2020 in millions, source: Wikipedia

Ppl = Dict("Berlin" => 3.66,  "Hamburg" => 1.85, 
          "München" => 1.49, "Köln" => 1.08)
Dict{String, Float64} with 4 entries:
  "München" => 1.49
  "Köln"    => 1.08
  "Berlin"  => 3.66
  "Hamburg" => 1.85
typeof(Ppl)
Dict{String, Float64}

Entries are accessed by key:

Ppl["Berlin"]
3.66

Querying a non-existent key throws an error.

Ppl["Leipzig"]
KeyError: key "Leipzig" not found

Check beforehand with haskey():

haskey(Ppl, "Leipzig")
false

Or use get(dict, key, default), which returns the default value instead of throwing an error.

@show get(Ppl, "Leipzig", -1)   get(Ppl, "Berlin", -1);
get(Ppl, "Leipzig", -1) = -1
get(Ppl, "Berlin", -1) = 3.66
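
Both approaches can be wrapped in small lookups that never throw a KeyError. This is only a sketch: the dictionary cities is a stand-alone copy of part of the example data, and describe is a hypothetical helper.

```julia
cities = Dict("Berlin" => 3.66, "Hamburg" => 1.85)

# Hypothetical helper: check with haskey before indexing
function describe(d, city)
    haskey(d, city) || return "$city: no data"
    return "$city: $(d[city]) million"
end

println(describe(cities, "Berlin"))    # Berlin: 3.66 million
println(describe(cities, "Leipzig"))   # Leipzig: no data

# With get, the fallback can also be a marker such as `nothing`
pop = get(cities, "Leipzig", nothing)
println(pop === nothing)               # true
```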

You can also request all keys and values as special containers.

keys(Ppl)
KeySet for a Dict{String, Float64} with 4 entries. Keys:
  "München"
  "Köln"
  "Berlin"
  "Hamburg"
values(Ppl)
ValueIterator for a Dict{String, Float64} with 4 entries. Values:
  1.49
  1.08
  3.66
  1.85

Iterate over the keys:

for i in keys(Ppl)
    n = Ppl[i]
    println("The city $i has $n million inhabitants.")
end
The city München has 1.49 million inhabitants.
The city Köln has 1.08 million inhabitants.
The city Berlin has 3.66 million inhabitants.
The city Hamburg has 1.85 million inhabitants.

Or iterate directly over key-value pairs.

for (city, pop) ∈ Ppl
    println("$city : $pop  Million.")
end
München : 1.49  Million.
Köln : 1.08  Million.
Berlin : 3.66  Million.
Hamburg : 1.85  Million.

11.3.1 Extending and Modifying

Add key-value pairs to a Dict:

Ppl["Leipzig"] = 0.52
Ppl["Dresden"] = 0.52 
Ppl
Dict{String, Float64} with 6 entries:
  "Dresden" => 0.52
  "München" => 1.49
  "Köln"    => 1.08
  "Berlin"  => 3.66
  "Leipzig" => 0.52
  "Hamburg" => 1.85

Change a value:

# Update: Leipzig data was from 2010, not 2020

Ppl["Leipzig"] = 0.597
Ppl
Dict{String, Float64} with 6 entries:
  "Dresden" => 0.52
  "München" => 1.49
  "Köln"    => 1.08
  "Berlin"  => 3.66
  "Leipzig" => 0.597
  "Hamburg" => 1.85

Delete a pair by its key:

delete!(Ppl, "Dresden")
Dict{String, Float64} with 5 entries:
  "München" => 1.49
  "Köln"    => 1.08
  "Berlin"  => 3.66
  "Leipzig" => 0.597
  "Hamburg" => 1.85

Many functions work with Dicts just as they do with other containers.

maximum(values(Ppl))
3.66
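
A few more generic functions, sketched on a stand-alone copy of the data (named cities here so that it does not overwrite Ppl). The threshold of 1.5 million is arbitrary.

```julia
# Stand-alone copy of the example data
cities = Dict("Berlin" => 3.66, "Hamburg" => 1.85, "München" => 1.49,
              "Köln" => 1.08, "Leipzig" => 0.597)

@show length(cities)                       # number of key-value pairs
@show sum(values(cities))                  # total population in millions
@show count(>(1.5), values(cities))        # cities above 1.5 million
@show ("Köln" in keys(cities))

# filter receives each key-value pair and returns a new Dict
big = filter(p -> p.second > 1.5, cities)
@show sort(collect(keys(big)))
```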

11.3.2 Creating an Empty Dictionary

Without explicit types:

d1 = Dict()
Dict{Any, Any}()

With explicit types:

d2 = Dict{String, Int}()
Dict{String, Int64}()
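
The difference between the two becomes visible on insertion. In this sketch, the typed dictionary converts values to Int where that is possible exactly and rejects them otherwise, while Dict{Any, Any} accepts everything; the keys and values are made up.

```julia
d2 = Dict{String, Int}()

d2["a"] = 2.0            # 2.0 is converted to the Int 2
@show d2["a"]

# A value that cannot be converted exactly is rejected.
err = try
    d2["b"] = 2.5
catch e
    e
end
@show typeof(err)        # InexactError
@show haskey(d2, "b")    # false: nothing was stored

d1 = Dict()              # Dict{Any, Any} accepts anything
d1["b"] = 2.5
d1[:c] = "text"
@show length(d1)
```

Explicit types cost a little more typing but catch such mistakes early and usually lead to faster code.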

11.3.3 Conversion to Vectors: collect()

  • keys(dict) and values(dict) return special container types.
  • collect() converts them to Vectors.
  • collect(dict) returns a Vector{Pair{S,T}}.

collect(Ppl)
5-element Vector{Pair{String, Float64}}:
 "München" => 1.49
    "Köln" => 1.08
  "Berlin" => 3.66
 "Leipzig" => 0.597
 "Hamburg" => 1.85
collect(keys(Ppl)), collect(values(Ppl))
(["München", "Köln", "Berlin", "Leipzig", "Hamburg"], [1.49, 1.08, 3.66, 0.597, 1.85])

11.3.4 Ordered Iteration over a Dictionary

The iteration order of a Dict is not defined. To iterate in a fixed order, sort the keys first. String keys are sorted alphabetically; the keyword argument rev = true reverses the order.

for k in sort(collect(keys(Ppl)), rev = true)
    n = Ppl[k]
    println("$k has $n million inhabitants  ")
end
München has 1.49 million inhabitants  
Leipzig has 0.597 million inhabitants  
Köln has 1.08 million inhabitants  
Hamburg has 1.85 million inhabitants  
Berlin has 3.66 million inhabitants  

Next we sort collect(dict), a vector of pairs, by value. The by keyword specifies the sort key, here the second element of each pair.

for (k,v) in sort(collect(Ppl), by = pair -> last(pair), rev=false)
    println("$k has $v million inhabitants")
end
Leipzig has 0.597 million inhabitants
Köln has 1.08 million inhabitants
München has 1.49 million inhabitants
Hamburg has 1.85 million inhabitants
Berlin has 3.66 million inhabitants
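
The same pattern gives a "top 3" ranking. A sketch on a stand-alone copy of the data (cities); since last applied to a pair returns its value, the anonymous function can be shortened to by = last.

```julia
cities = Dict("Berlin" => 3.66, "Hamburg" => 1.85, "München" => 1.49,
              "Köln" => 1.08, "Leipzig" => 0.597)

# Sort the pairs by value, largest first
ranking = sort(collect(cities), by = last, rev = true)

# Print the three largest cities with their rank
for (rank, (city, pop)) in enumerate(ranking[1:3])
    println("$(rank). $city ($pop million)")
end
```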

11.3.5 An Application of Dictionaries: Counting Frequencies

Let’s do “experimental stochastics” with 2 dice:

Let l be a vector containing 100,000 sums of two dice rolls (numbers from 2 to 12).

How frequently does each number from 2 to 12 occur?

Roll the dice:


l = rand(1:6, 100_000) .+ rand(1:6, 100_000)
100000-element Vector{Int64}:
  2
  5
 11
  3
  3
  9
  9
  7
 11
  6
  ⋮
  8
  5
  6
 11
  5
  6
  9
  5
  3

Count event frequencies using a dictionary. Use the event as the key and its frequency as the value.

# In this case, a simple vector would also work.
# A better use case for dictionaries is word frequency in texts,
# where keys are strings instead of integers.

d = Dict{Int,Int}()     # dictionary for counting

for i in l                    # for each i, increment d[i]
    d[i] = get(d, i, 0) + 1   
end
d
Dict{Int64, Int64} with 11 entries:
  5  => 11049
  12 => 2716
  8  => 13822
  6  => 13961
  11 => 5521
  9  => 11251
  3  => 5625
  7  => 16634
  4  => 8346
  2  => 2902
  10 => 8173
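
The word-frequency use case mentioned in the comment follows exactly the same pattern. A minimal sketch with a made-up sentence; only the key type changes from Int to String.

```julia
text = "the quick brown fox jumps over the lazy dog and the fox sleeps"

counts = Dict{String, Int}()      # word => number of occurrences
for w in split(text)              # split on whitespace
    word = String(w)              # SubString -> String
    counts[word] = get(counts, word, 0) + 1
end

@show counts["the"] counts["fox"] counts["dog"];
```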

Result:

using Plots

plot(collect(keys(d)), collect(values(d)), seriestype=:scatter)

Explanatory image:

https://math.stackexchange.com/questions/1204396/why-is-the-sum-of-the-rolls-of-two-dices-a-binomial-distribution-what-is-define
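
The observed frequencies can also be compared with the exact probabilities. The sketch below repeats the experiment under new names (rolls, freq) so that l and d are not overwritten. The formula (6 - |s - 7|) / 36 is the standard result for two fair dice and is not taken from the text above.

```julia
n = 100_000
rolls = rand(1:6, n) .+ rand(1:6, n)

# Count how often each sum occurs
freq = Dict{Int, Int}()
for s in rolls
    freq[s] = get(freq, s, 0) + 1
end

# Exact probability of the sum s for two fair dice
p(s) = (6 - abs(s - 7)) / 36

# Compare relative frequencies with the exact values, in sorted key order
for s in sort(collect(keys(freq)))
    println(s, ": observed ", freq[s] / n, ", expected ", round(p(s), digits = 4))
end
```

Because the rolls are random, the observed values differ slightly from run to run.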