Dennis Hackethal’s Blog
My blog about philosophy, coding, and anything else that interests me.
Tally in Ruby vs. Clojure
I saw that Ruby has a tally method I wasn't aware of. From the docs:
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
["a", "b", "c", "b"].tally #=> {"a"=>1, "b"=>2, "c"=>1}
In other words, it counts the number of occurrences of each element in the array and then returns a dictionary containing that information.
At this point, I am interested in two things:
- Can I write such a function in Ruby?
- What would an equivalent function look like in Clojure?
First pass in Ruby:
def tally a
  a.reduce({}) do |acc, curr|
    if acc.key?(curr)
      acc[curr] += 1
    else
      acc[curr] = 1
    end
  end
end
reduce seems like the natural thing to use here. I run it:
tally ["a", "b", "c", "b"]
# raises exception: NoMethodError (undefined method `key?' for 1:Integer)
I use key? on the hash, i.e., the initial value, so maybe I didn't set the initial value correctly. I check the (very slowly loading) docs but no, I set it correctly.
What else could it be? I debug further and add a print statement to the beginning of the block so I can inspect the accumulated value:
def tally a
  a.reduce({}) do |acc, curr|
    puts acc # <== added this thing
    if acc.key?(curr)
      acc[curr] += 1
    else
      acc[curr] = 1
    end
  end
end
I run the function again, with the same input. It prints:
{}
1
So I really did set the initial value correctly. It’s the empty hash, as it should be. But the second line contains the information we need: the accumulated value changes to 1. Why? Because in Ruby, an assignment to a hash returns the assigned value, not the resulting hash. That means acc[curr] += 1 returns the new count and (the offending line) acc[curr] = 1 returns 1. Since reduce passes the block's return value to the next iteration as acc, the second iteration receives 1 instead of the hash.
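To see this behavior in isolation, here is a minimal sketch outside of reduce. The variable names (counts, first, second) are mine and only serve the illustration:

```ruby
# Minimal sketch: what a hash assignment evaluates to in Ruby.
counts = {}

# An assignment expression evaluates to the assigned value, not to the hash.
first = (counts["a"] = 1)

# `+=` expands to `counts["a"] = counts["a"] + 1`,
# so it evaluates to the new count.
second = (counts["a"] += 1)

p first                   # prints 1
p second                  # prints 2
p counts == {"a" => 2}    # prints true
```

Inside the block passed to reduce, that assigned value is the last expression evaluated, so it is what the block returns.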
For variable assignment, it makes sense to return the assigned value (better than to return undefined (?!) like JavaScript does when initializing a newly declared variable in a single line). But for assignments to a hash, that strikes me as a design flaw. The data structure has changed, and in 99% of cases, I’ll need to know what it looks like as a result. I already know what I’m assigning.
Or maybe I'm just a Clojure snob. I correct my code:
def tally a
  a.reduce({}) do |acc, curr|
    if acc.key?(curr)
      acc[curr] += 1
      acc # <== return the hash that results from the assignment
    else
      acc[curr] = 1
      acc # <== return the hash that results from the assignment
    end
  end
end
Those double accs are an annoyance, but it works:
tally ["a", "b", "c", "b"]
# => {"a"=>1, "b"=>2, "c"=>1}
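As an aside, one way to sidestep the return-value issue while keeping the mutation is each_with_object, which ignores the block's return value and passes the same accumulator object to every iteration. This is only a sketch of an alternative, not what I ended up with; the name tally_with_object is made up for illustration, and I am assuming a hash with a default of 0 is acceptable as the result:

```ruby
# Sketch of an alternative: each_with_object hands the same accumulator
# to every iteration, regardless of what the block returns.
def tally_with_object(a)
  # Hash.new(0) yields 0 for missing keys, so no key? check is needed.
  # Note the block parameter order: element first, accumulator second.
  a.each_with_object(Hash.new(0)) do |curr, acc|
    acc[curr] += 1
  end
end

result = tally_with_object(["a", "b", "c", "b"])
p result == {"a" => 1, "b" => 2, "c" => 1} # prints true
```

The returned hash keeps its default of 0, so looking up a key that was never tallied gives 0 rather than nil.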
So, what would the same function look like in Clojure? Like this:
(defn tally [v]
  (reduce
    (fn [acc curr]
      (if (contains? acc curr)
        (update acc curr inc)
        (assoc acc curr 1)))
    {}
    v))
And it works:
(tally ["a" "b" "c" "b"])
; => {"a" 1, "b" 2, "c" 1}
I like that much more. Eight lines instead of eleven, no double accs, and Clojure gives me the functions Ruby makes me write manually: inc, update, and assoc. The reason there are no double accs is that update and assoc return the resulting data structure of their respective operations rather than the assigned value. Also, there's no mutation, whereas my Ruby code mutates the hash at every iteration. That's not a big deal, since the only code that has access to the hash is the block, but having no mutation at all just feels cleaner.
There is a way to avoid the double accs and mutations in Ruby using merge:
def tally a
  a.reduce({}) do |acc, curr|
    if acc.key?(curr)
      acc.merge(curr => acc[curr] + 1)
    else
      acc.merge(curr => 1)
    end
  end
end
And it works:
tally ["a", "b", "c", "b"]
# => {"a"=>1, "b"=>2, "c"=>1}
That feels better. The first call to merge still feels a bit low level, but we're down to nine lines and have gotten rid of mutation.
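The two branches could probably be collapsed further by giving the lookup a default of 0. The following is a sketch under that assumption, using Hash#fetch with a default value; the name tally_merge is hypothetical:

```ruby
# Sketch: collapse the if/else by supplying a default count via fetch.
def tally_merge(a)
  a.reduce({}) do |acc, curr|
    # fetch returns the current count, or 0 if curr is not yet a key.
    # merge returns a new hash and leaves acc itself untouched.
    acc.merge(curr => acc.fetch(curr, 0) + 1)
  end
end

p tally_merge(["a", "b", "c", "b"]) == {"a" => 1, "b" => 2, "c" => 1} # prints true

# merge does not modify its receiver, so it even works on a frozen hash.
frozen = {"a" => 1}.freeze
updated = frozen.merge("a" => 2)
p frozen == {"a" => 1}  # prints true
p updated == {"a" => 2} # prints true
```

The trade-off is that merge allocates a new hash on every iteration, which the mutating version avoids.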
Overall, I still prefer the Clojure version. It's more convenient and concise. But it's possible to bring the Ruby version pretty close to that.
What people are saying
Looking at this again, I noticed that I mix having and skipping parentheses in Ruby for method invocations. It would probably be better to settle on one approach (maybe parentheses because those are never ambiguous) and then use it consistently.
You can actually avoid the if in Clojure. Just make the body of your function:
(reduce (fn [acc cur] (update acc cur (fnil inc 0))) {} v)
Yes, nice.
Were you aware of Clojure's clojure.core/frequencies function? It appears quite similar to tally.
I wasn't. Here's the link for those interested: https://clojuredocs.org/clojure.core/frequencies
The source code looks fairly close to what I do above. One difference is that they use another solution to the problem of inferring zero when a key does not yet exist:
Namely, passing 0 to get. They also use transient and persistent!, which my solution lacks. It appears that without them, my solution will break when the map gets large enough.