Functional programming, AI, Lisp, Neovim and Emacs, Principal SE


fullmoon
@fullmoon

One of my favorite packages that I've authored is my foldl package which allows you to combine multiple folds into a single fold that still processes the input in one pass.

To motivate this, the naive way that you'd write a function to compute an average in Haskell would be:

average xs = sum xs / genericLength xs

… but that is undesirable because if you call it on a large list like this:

>>> average [ 0 .. 1000000000 ]

… it has to materialize the entire list since it goes over the list in two passes. This is very inefficient both from a RAM perspective (having to load the entire list into memory) and a CPU perspective (the generated code is not very "tight").

If you're new to Haskell, you might wonder: "Wait! Don't you have to materialize the entire list even if you go over the list in one pass? Why does two passes change anything?"

The answer is: Haskell is very smart and if you process a list generated using the syntax [ m .. n ] in a single pass it won't materialize the list!1 Instead, it will behave more like a generator in other languages: elements are consumed as they are generated so you never have more than one element in memory at a time.

In other words, in Haskell if you write something like:

result = sum [ 0 .. 1000000000 ]

… it's the exact same as if you had written the following loop with a counter with no list at all:

result = loop 0 0
  where
    loop !n !total
        | n > 1000000000 = total
        | otherwise      = loop (n + 1) (total + n)

How can I be so sure? Because I can store both of those functions in the same module:

{-# LANGUAGE BangPatterns #-}

module Example where

result₀ = sum [ 0 .. 1000000000 ]

result₁ = loop 0 0
  where
    loop !n !total
        | n > 1000000000 = total
        | otherwise      = loop (n + 1) (total + n)

… and output the intermediate GHC core for that module:

$ ghc -O2 -ddump-simpl -dsuppress-all Example.hs
[1 of 1] Compiling Example          ( Example.hs, Example.o )

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 43, types: 16, coercions: 0, joins: 0/0}

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
lim_r1PP = 1000000000

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
lvl_r1Ry = 1

Rec {
-- RHS size: {terms: 16, types: 3, coercions: 0, joins: 0/0}
result₀_go9
  = \ x_a1OR eta_B0 ->
      case integerGt# x_a1OR lim_r1PP of {
        __DEFAULT ->
          result₀_go9
            (integerAdd x_a1OR lvl_r1Ry) (integerAdd eta_B0 x_a1OR);
        1# -> eta_B0
      }
end Rec }

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
result₀1 = 0

-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
result₀ = result₀_go9 result₀1 result₀1

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
result₁ = result₀

…

ghc's optimizer is smart enough to optimize away result₁ to be a synonym for result₀, as you can see above.

However, the above naive average function does not generate efficient core like that (I won't bother to show it here; it's large, inefficient and lousy). However, if you use my foldl package, you can define average like this:

import Control.Foldl

import Prelude hiding (sum)

average = sum / genericLength

main = print (fold average [ 0 .. 1000000000 ])

… that will process the list in a single pass.

Note: This is not the best way to compute an average because it is not numerically stable, but it's for illustrative purposes.

In fact, you can also use do notation to chain together folds, like this:

{-# LANGUAGE ApplicativeDo #-}

import Control.Foldl

import Prelude hiding (minimum, maximum)

example = do
    lo <- minimum
    hi <- maximum
    return (lo, hi)

main = print (fold example [ 0 .. 1000000 ])

… and that works because with the ApplicativeDo language extension you can use Haskell's do notation to create anything that implements Applicative (and the Fold type from the foldl package implements Applicative).

Anyway, if you like that you can find my foldl package here:

… and you might also be interested in the original announcement post for this package here:


  1. There are some caveats to ensure that this works correctly, but it definitely works if you use my foldl package.


You must log in to comment.