I've been working through the Elixir track on Exercism for a while to improve my approach to problem solving with Elixir. Coming from an OO background, I've found that functional programming requires some rewiring of how you approach problems.
So what I aim to do with this series is go over a single problem per post and not only work out the solution, but also walk through the thought process and reasoning behind it. By doing so I hope to share some information with newcomers to the language, and also receive some feedback from the Elixir community.
I've decided to skip the Hello World problem, and jump to Nucleotide Count as the first problem.
Given a single-stranded DNA string, compute how many times each nucleotide occurs in the string.
The genetic language of every living thing on the planet is DNA. DNA is a large molecule that is built from an extremely long sequence of individual elements called nucleotides. Four types exist in DNA; these differ only slightly and can be represented by the following symbols: 'A' for adenine, 'C' for cytosine, 'G' for guanine, and 'T' for thymine.
The solution requires two functions: count, which simply counts the occurrences of a given nucleotide in the sequence, and histogram, which returns a map of each nucleotide and its count in the sequence. Let's start with the simpler one: count.
The count function takes a charlist (a single-quoted string, which is a list of character code points) and a character as its arguments. The function is expected to return an integer representing the number of occurrences of the character in the charlist. From the tests:
NucleotideCount.count('CCCCC', ?C) == 5
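The quoting in that test matters for everything that follows. As a minimal sketch (the variable name strand is only for illustration, and the ~c sigil is used here as the equivalent of the single-quoted form in the test), this shows that the strand is a plain list of integers and that ?C is an integer too:

```elixir
# ~c"CCCCC" is the sigil form of the single-quoted charlist 'CCCCC'.
strand = ~c"CCCCC"

# A charlist is an ordinary list of integer code points.
IO.inspect(is_list(strand))
IO.inspect(strand == [67, 67, 67, 67, 67])

# ?C is the code point of the character C, so it is just an integer.
IO.inspect(?C == 67)

# A double-quoted string is a binary, not a list, so it cannot be
# handed to Enum functions directly.
IO.inspect(is_list("CCCCC"))
```

The first three checks evaluate to true and the last to false, which is why the Enum module is a natural fit for the strand as the tests supply it.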
So what's the best approach here? Likely we're going to iterate over the characters in the strand and count how many of them match the second argument. A quick check of the Enum module shows us Enum.count/2:
count(enumerable, fun)
Returns the count of elements in the enumerable for which fun returns a truthy value.
Sounds perfect for our case, so let's write our function:
def count(strand, nucleotide) do
  strand
  |> Enum.count(fn x -> x == nucleotide end)
end
We pipe the strand into the Enum.count/2 function, and each character that matches nucleotide gets counted. Can we do better? We can make use of the capture operator, &, to simplify the anonymous function:
def count(strand, nucleotide) do
  strand
  |> Enum.count(&(&1 == nucleotide))
end
This does exactly the same thing, but it is a little easier to read and is a common pattern for replacing the more verbose anonymous function syntax. The docs have more information on the capture operator if you want to read more.
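To check that the two forms really behave the same, here is a small self-contained sketch. The module name CountSketch, the function names count_fn and count_capture, and the sample strand are hypothetical, chosen only so both versions can sit side by side:

```elixir
defmodule CountSketch do
  # Hypothetical module, used only for this comparison.

  # Verbose form: an explicit anonymous function.
  def count_fn(strand, nucleotide) do
    Enum.count(strand, fn x -> x == nucleotide end)
  end

  # Capture form: &1 stands for the function's first argument.
  def count_capture(strand, nucleotide) do
    Enum.count(strand, &(&1 == nucleotide))
  end
end

strand = ~c"GATTACA"

IO.inspect(CountSketch.count_fn(strand, ?A))      # 3
IO.inspect(CountSketch.count_capture(strand, ?A)) # 3
IO.inspect(CountSketch.count_capture(strand, ?G)) # 1
```

Both functions give the same count for any nucleotide, so which one to use is a matter of style rather than behaviour.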
So, moving on to histogram. Looking at the tests, we can see what's expected from us:
test "repetitive sequence has only guanine" do
  expected = %{?A => 0, ?T => 0, ?C => 0, ?G => 8}
  assert NucleotideCount.histogram('GGGGGGGG') == expected
end
We return the count of each nucleotide in the given strand, including those with 0 occurrences. As we need to report every nucleotide, even if it is absent from the strand, let's start by defining a module attribute to act as a constant at the top of our module:
@nucleotides [?A, ?C, ?G, ?T]
We can enumerate over this and use our previous count function to do the counting, can't we? OK, so we need to return a Map, so it's best to head to the Map docs and see what we can make use of. After some perusing we find:
new(enumerable, transform)
Creates a map from an enumerable via the given transformation function.
and the example given
Map.new([:a, :b], fn x -> {x, x} end)
%{a: :a, b: :b}
Seems ideal. As we need to return a map in the format %{?A => 0, ?T => 0, ?C => 0, ?G => 8}, we should use the previously defined constant @nucleotides as the enumerable, and write a function that returns the nucleotide mapped to its count. We've already written the count function, so let's make use of that to give us:
def histogram(strand) do
  Map.new(@nucleotides, &{&1, count(strand, &1)})
end
I've used the capture operator & here instead of an anonymous function. What we've said here is that for each item in @nucleotides we want to return a {key, value} tuple, where the key is the nucleotide character and the value is the result of counting that nucleotide in the provided strand. Map.new/2 then collects those tuples into the map.
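Putting the pieces from this post together gives something like the following. The module name comes from the tests; the calls at the bottom and their sample strands are my own additions for a quick check outside the test suite, and the strand is written with the ~c sigil, which is equivalent to the single-quoted form used in the tests:

```elixir
defmodule NucleotideCount do
  # Module attribute acting as a constant list of nucleotide code points.
  @nucleotides [?A, ?C, ?G, ?T]

  # Counts how many times nucleotide appears in the strand charlist.
  def count(strand, nucleotide) do
    strand
    |> Enum.count(&(&1 == nucleotide))
  end

  # Builds a map from each known nucleotide to its count, zeros included.
  def histogram(strand) do
    Map.new(@nucleotides, &{&1, count(strand, &1)})
  end
end

IO.inspect(NucleotideCount.histogram(~c"GGGGGGGG"))
IO.inspect(NucleotideCount.histogram(~c""))
```

Note that the keys print as the integers 65, 67, 71 and 84, since ?A, ?C, ?G and ?T are just code points. An empty strand still yields all four keys, each with a count of 0, because the map is built from @nucleotides rather than from the strand itself.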
If we run all the tests now we should see them all pass. Excellent, we've solved our first Elixir challenge on Exercism. Feel free to leave questions or other comments below.
Next week we'll tackle the Secret Handshake problem. See you then.