Writing CPython.Simple

The Setup

This is a story about how I came upon an idea for making a library much easier to use, entirely by accident. I write a lot of things in Haskell, but sometimes the right libraries just don’t exist yet for the kind of thing I want to do. One of those times involved some user task automation: simulating the keyboard and mouse to perform repetitive tasks by taking over control of the computer from the user. I’d used PyAutoGUI for this kind of thing before, but there’s no simple Haskell binding to the APIs it uses (I believe PyAutoGUI does different things on each of Windows, macOS, and Linux, so replicating all of that would be a lot of work). Instead, I landed on the idea for my hsautogui project: writing Haskell bindings to the existing PyAutoGUI code.

Talking to other languages from Haskell isn’t too bad, but you generally have to go through the C FFI to get things working. Luckily for me, someone had already written Haskell bindings for the CPython API, a huge chunk of work that I didn’t have to do and am grateful for. Unluckily, they didn’t seem to have been touched in five years, since Python 3.4, and I wanted to use 3.9. In addition, the bindings were very low-level (grabbing pointers to Python objects, losing any kind of type information because Python doesn’t really roll that way, and so on), so wrapping each PyAutoGUI function I wanted in Haskell took a lot of work, and most of that work was repetitive wrapping/unwrapping boilerplate.

So, I did what any sane person would do, and decided to write a library in between the library I was writing and the library I was using, so I could use that library instead of the previous more-complicated library to write my library.

Thinking Like a User

It was tempting to think like a library writer about the details of how things would work, but instead I started from the point of view of a user: me, as the writer of hsautogui. What would make my life the easiest? Who cares how much work the library writer has to do to support it? I’ll deal with that later.

What I wanted most was to write bare-minimum wrappers over Python functions, turning them into Haskell functions. Something like this:

myFunction :: Arg1 -> Arg2 -> Something
myFunction arg1 arg2 =
  call "function" arg1 arg2

However, there are obvious problems with that. How would call know how many arguments to take? I didn’t want to go too far into type-level magic, so I changed the arguments to a list. How does Python know which module the function lives in? We can add the module name as an argument as well. If users find themselves repeating it too much, keeping the module name as the first argument means currying makes it easy to write call' = call "pythonModuleName".

myFunction :: Arg1 -> Arg2 -> Something
myFunction arg1 arg2 =
  call "module" "function" [arg1, arg2]
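To make the currying point concrete, here is a minimal sketch that leaves Python out entirely. renderCall is a hypothetical stand-in for call that only builds the Python expression it would evaluate, with arguments assumed to be already rendered as source text; callRandom is likewise hypothetical. Because the module name comes first, a module-specific helper is one partial application away.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for call: instead of running Python, it renders
-- the expression that would be evaluated. Arguments are assumed to be
-- already-rendered Python source text.
renderCall
  :: Text   -- module name
  -> Text   -- function name
  -> [Text] -- positional arguments
  -> Text
renderCall modName funcName args =
  modName <> "." <> funcName <> "(" <> T.intercalate ", " args <> ")"

-- With the module name first, partial application gives a per-module helper.
callRandom :: Text -> [Text] -> Text
callRandom = renderCall "random"
```

For example, callRandom "randint" ["1", "10"] evaluates to "random.randint(1, 10)".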

Python also has a nice system for keyword arguments. The natural type for these is probably a Map Text Arg, but Maps turned out to be a bit too cumbersome to work with in such simple wrapper functions. So let’s go with the classic: a list of pairs.

myFunction :: Arg1 -> Arg2 -> KeywordArg -> Something
myFunction arg1 arg2 arg3 =
  call "module" "function" [arg1, arg2] [("arg3Name", arg3)]
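Here is a hedged sketch of what the list-of-pairs choice looks like at a call site. renderKwCall, moveToCall, and kwargsAsMap are hypothetical names, the code renders Python source rather than calling anything, and pyautogui.moveTo(x, y, duration=...) is only an illustrative target. The Map version is included for comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical renderer: positional arguments, then keyword arguments
-- as (name, already-rendered value) pairs, in the order given.
renderKwCall :: Text -> Text -> [Text] -> [(Text, Text)] -> Text
renderKwCall modName funcName args kwargs =
  modName <> "." <> funcName
    <> "(" <> T.intercalate ", " (args ++ map kw kwargs) <> ")"
  where
    kw (name, value) = name <> "=" <> value

-- With a list of pairs, a wrapper just writes the literal.
moveToCall :: Text
moveToCall = renderKwCall "pyautogui" "moveTo" ["100", "200"] [("duration", "0.5")]

-- The Map alternative needs Map.fromList, and a repeated key keeps
-- only its last value.
kwargsAsMap :: Map.Map Text Text
kwargsAsMap = Map.fromList [("duration", "0.5"), ("duration", "1.0")]
```

moveToCall evaluates to "pyautogui.moveTo(100, 200, duration=0.5)". Besides the extra Map.fromList ceremony, note that kwargsAsMap ends up with a single "duration" entry holding "1.0".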

This is now pretty much exactly the type signature call ended up with:

call
  :: FromPy a
  => Text          -- module name
  -> Text          -- function name
  -> [Arg]         -- arguments
  -> [(Text, Arg)] -- keyword arguments
  -> IO a

We’ll examine that Arg type in a bit.

Preserving Types Across the FFI Canyon

Another thing I wanted, as a user of my own upcoming library, was for it to be dead simple to shuttle data back and forth between Haskell and Python while preserving as much Haskell type information as I could. The simplicity was the key, though. This leads pretty quickly to a pair of To/From type classes: if Integer has FromPy and ToPy instances, then a value like 7 can be passed as an argument without caring how it gets converted. SomeObject is how haskell-cpython represents, well… some Python object. And that’s about all we know about these objects from looking at them.

class FromPy a where
  fromPy :: Py.SomeObject -> IO a

class ToPy a where
  toPy :: a -> IO Py.SomeObject
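The real instances need a running interpreter, so here is a self-contained sketch of the same pattern. MockObject is a hypothetical stand-in for Py.SomeObject and CastError a hypothetical exception; the class shapes match the ones above, with conversions running in IO and failing by throwing.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (Exception, throwIO)
import Data.Text (Text)

-- Hypothetical stand-in for Py.SomeObject: a tiny model of Python values.
data MockObject
  = MockInt Integer
  | MockStr Text
  | MockBool Bool
  deriving (Eq, Show)

-- Hypothetical exception for failed conversions.
newtype CastError = CastError String
  deriving Show

instance Exception CastError

-- Same shapes as above, with MockObject in place of Py.SomeObject.
class FromPy a where
  fromPy :: MockObject -> IO a

class ToPy a where
  toPy :: a -> IO MockObject

instance ToPy Integer where
  toPy = pure . MockInt

instance FromPy Integer where
  fromPy (MockInt n) = pure n
  fromPy other = throwIO (CastError ("expected int, got " ++ show other))

instance ToPy Text where
  toPy = pure . MockStr

instance FromPy Text where
  fromPy (MockStr s) = pure s
  fromPy other = throwIO (CastError ("expected str, got " ++ show other))
```

Round-tripping works as expected: toPy (7 :: Integer) >>= fromPy, used at type IO Integer, gives back 7, while asking for a Text from a MockInt throws a CastError.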

Even for super-simple types, the instances can get kind of out of hand. It’s nice that, as library users, we no longer have to write them by hand.

instance FromPy Bool where
  fromPy pyB = do
    isTrue <- Py.isTrue pyB
    isFalse <- Py.isFalse pyB
    case (isTrue, isFalse) of
      (True, False) -> pure True
      (False, True) -> pure False
      (False, False) -> throwIO . PyCastException . show $ typeRep (Proxy :: Proxy Bool)
      (True, True) -> throwIO . PyCastException $ (show $ typeRep (Proxy :: Proxy Bool)) ++
        ". Python object was True and False at the same time. Should be impossible."
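The four-way case split makes most sense if Py.isTrue and Py.isFalse check whether the object is the True or False singleton, rather than testing general truthiness. That is an assumption about haskell-cpython, but it matches the (False, False) branch: a non-bool such as an int is neither. The sketch below models that with a hypothetical MockObject, local isTrue/isFalse stand-ins, a local PyCastException, and a hypothetical tryBool helper.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.Proxy (Proxy (..))
import Data.Typeable (typeRep)

-- Hypothetical model: the two bool singletons plus one non-bool value.
data MockObject = MockTrue | MockFalse | MockInt Integer
  deriving (Eq, Show)

-- Local stand-in for the post's exception type.
newtype PyCastException = PyCastException String
  deriving Show

instance Exception PyCastException

-- Assumed semantics: identity checks against the singletons, not truthiness.
isTrue, isFalse :: MockObject -> IO Bool
isTrue obj = pure (obj == MockTrue)
isFalse obj = pure (obj == MockFalse)

-- Same case analysis as the FromPy Bool instance above.
fromPyBool :: MockObject -> IO Bool
fromPyBool obj = do
  t <- isTrue obj
  f <- isFalse obj
  case (t, f) of
    (True, False) -> pure True
    (False, True) -> pure False
    -- Neither singleton (e.g. an int): the cast fails.
    (False, False) -> throwIO . PyCastException . show $ typeRep (Proxy :: Proxy Bool)
    -- Both at once cannot happen in this model.
    (True, True) -> throwIO . PyCastException $
      show (typeRep (Proxy :: Proxy Bool)) ++ ". Object was True and False at once."

-- A caller that wants to handle failure can catch the exception.
tryBool :: MockObject -> IO (Either PyCastException Bool)
tryBool = try . fromPyBool
```

Converting MockInt 1 fails with the message "Bool", which is just the shown TypeRep, as in the instance above.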

These instances make the notion of “things that can be converted to/from Python” explicit. You may have seen this to/from instance pattern in other places, like Aeson’s ToJSON and FromJSON. It’s convenient to have typeclasses handle the marshalling of data behind the scenes, so we don’t have to think about it nearly as much. Just call toPy or fromPy.

Arg, I’m Feeling Existential

One thing we might notice is that Python essentially takes a list of arguments (myFunc(7, "hello", True)), but unlike Haskell lists, these arguments don’t have to have the same type. In fact, usually they won’t all have the same type.

A quick way to fix this is with existential types. Let’s create a type Arg that represents an argument to a Python function.

-- needs {-# LANGUAGE ExistentialQuantification #-}
data Arg = forall a. ToPy a => Arg a

Of course, we want Arg itself to be an instance of ToPy, so that converting an Arg to Python just converts whatever ToPy-able value is inside it.

instance ToPy Arg where
  toPy (Arg a) = toPy a

Because a could be any type, all we know about the elements of an [Arg] is that each one’s type has a ToPy instance. This means just about the only thing we can do with such a list’s elements is call toPy on them.

With this handy, it’s now possible to build heterogeneous lists of things that have ToPy instances. Nifty.

sampleArgs :: [Arg]
sampleArgs =
  [ Arg (7 :: Integer)
  , Arg ("hello" :: Text)
  , Arg (True :: Bool)
  ]
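The existential trick can be tried without Python at all. In this sketch, a hypothetical, simplified ToPy class renders values as Python source text instead of producing a Py.SomeObject; that is enough to see a heterogeneous [Arg] consumed uniformly through its ToPy constraint. renderPyCall is also hypothetical.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical simplification: "converting to Python" produces Python
-- source text rather than a live object.
class ToPy a where
  toPy :: a -> Text

instance ToPy Integer where
  toPy = T.pack . show

instance ToPy Text where
  -- Haskell's show quoting; close enough to Python's for plain ASCII.
  toPy s = T.pack (show s)

instance ToPy Bool where
  toPy True = "True"
  toPy False = "False"

-- Same shape as in the post: all we keep is the ToPy dictionary.
data Arg = forall a. ToPy a => Arg a

instance ToPy Arg where
  toPy (Arg a) = toPy a

sampleArgs :: [Arg]
sampleArgs = [Arg (7 :: Integer), Arg ("hello" :: Text), Arg True]

-- The only thing we can do with each element is call toPy on it.
renderPyCall :: Text -> [Arg] -> Text
renderPyCall fn args = fn <> "(" <> T.intercalate ", " (map toPy args) <> ")"
```

Here renderPyCall "myFunc" sampleArgs produces myFunc(7, "hello", True), the Python call from the example above.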

Attribution, Errors

Besides calling functions in a module, you might also want to set an attribute on it or read one from it.

setAttribute
  :: ToPy a
  => Text -- ^ module name
  -> Text -- ^ attribute name
  -> a    -- ^ value to set attribute to
  -> IO ()

getAttribute
  :: FromPy a
  => Text -- ^ module name
  -> Text -- ^ attribute name
  -> IO a
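One way to picture these two functions is to treat a module as a mutable namespace from names to objects. The sketch below is hypothetical throughout: an IORef holding a Map plays the module, MockObject stands in for Py.SomeObject, and unlike the real signatures, setAttribute and getAttribute here take the namespace itself rather than a module name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (Exception, throwIO)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-ins for Python objects and a module namespace.
data MockObject = MockInt Integer | MockStr Text
  deriving (Eq, Show)

type Module = IORef (Map.Map Text MockObject)

newtype PyError = PyError String
  deriving Show

instance Exception PyError

class ToPy a where
  toPy :: a -> IO MockObject

class FromPy a where
  fromPy :: MockObject -> IO a

instance ToPy Integer where
  toPy = pure . MockInt

instance FromPy Integer where
  fromPy (MockInt n) = pure n
  fromPy other = throwIO (PyError ("cast to Integer failed: " ++ show other))

instance ToPy Text where
  toPy = pure . MockStr

instance FromPy Text where
  fromPy (MockStr s) = pure s
  fromPy other = throwIO (PyError ("cast to Text failed: " ++ show other))

newModule :: IO Module
newModule = newIORef Map.empty

-- Convert the Haskell value, then store it under the attribute name.
setAttribute :: ToPy a => Module -> Text -> a -> IO ()
setAttribute m name value = do
  obj <- toPy value
  modifyIORef' m (Map.insert name obj)

-- Look the attribute up, then convert it to whatever type the caller wants.
getAttribute :: FromPy a => Module -> Text -> IO a
getAttribute m name = do
  attrs <- readIORef m
  case Map.lookup name attrs of
    Nothing -> throwIO (PyError ("no attribute " ++ T.unpack name))
    Just obj -> fromPy obj
```

A missing name throws PyError, and so does reading an attribute at the wrong type, which is exactly the situation the next question is about.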

What happens if, say, we use getAttribute to read a Text, but Python gives us back an int rather than some bit of unicode?

Well, here’s the magic:

easyFromPy
  :: (Py.Concrete p, Typeable h)
  => (p -> IO h)   -- ^ python from- conversion, e.g. Py.fromFloat
  -> Proxy h       -- ^ proxy for the type being converted to
  -> Py.SomeObject -- ^ python object to cast from
  -> IO h          -- ^ Haskell value
easyFromPy convert typename obj = do
  casted <- Py.cast obj
  case casted of
    Nothing -> throwIO $ PyCastException (show $ typeRep typename)
    Just x -> convert x

Now to explain. p and h are mnemonics for a Python object and a Haskell value, respectively. The Python object type has to be Concrete: basically, all the primitive types we eventually want to cast to (numbers, strings, etc.) are Concrete, but you can read the source for more details there. The Haskell value type has to be Typeable, which lets us get the name of the type using typeRep.

That type information is passed along via Proxy. For example, show (typeRep (Proxy :: Proxy Integer)) gives us "Integer". It’s a nice way to chuck the name of a type into an error message, but it does require the caller to hand us a Proxy carrying the right type. Usually that’s easy, since they’ll be calling easyFromPy while writing a FromPy instance, which already knows which type it’s for. For example:

instance FromPy Double where
  fromPy = easyFromPy Py.fromFloat Proxy

Here, the compiler can infer that our Proxy is carrying along a Double. Using typeRep, we can throw an exception with information about which type we failed to cast to.
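To see the Proxy and typeRep plumbing in isolation, here is a sketch with hypothetical stand-ins: MockObject for Py.SomeObject, PyFloat and PyInt for concrete Python types, and a Concrete class whose cast returns Nothing on a mismatch, modelled loosely on Py.Concrete and Py.cast. Only easyFromPy’s shape is meant to mirror the code above; castMessage is a small hypothetical helper for inspecting failures.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.Proxy (Proxy (..))
import Data.Typeable (Typeable, typeRep)

-- Hypothetical stand-in for Py.SomeObject.
data MockObject = MockFloat Double | MockInt Integer

-- Hypothetical "concrete" Python types.
newtype PyFloat = PyFloat Double
newtype PyInt = PyInt Integer

newtype PyCastException = PyCastException String
  deriving Show

instance Exception PyCastException

-- Loosely modelled on Py.Concrete / Py.cast: a cast that may fail.
class Concrete p where
  cast :: MockObject -> IO (Maybe p)

instance Concrete PyFloat where
  cast (MockFloat d) = pure (Just (PyFloat d))
  cast _ = pure Nothing

instance Concrete PyInt where
  cast (MockInt n) = pure (Just (PyInt n))
  cast _ = pure Nothing

easyFromPy
  :: (Concrete p, Typeable h)
  => (p -> IO h)   -- ^ conversion once the cast has succeeded
  -> Proxy h       -- ^ carries the target type for the error message
  -> MockObject    -- ^ object to cast from
  -> IO h
easyFromPy convert typename obj = do
  casted <- cast obj
  case casted of
    Nothing -> throwIO $ PyCastException (show $ typeRep typename)
    Just x -> convert x

class FromPy a where
  fromPy :: MockObject -> IO a

-- The instance head fixes the type each Proxy carries.
instance FromPy Double where
  fromPy = easyFromPy (\(PyFloat d) -> pure d) Proxy

instance FromPy Integer where
  fromPy = easyFromPy (\(PyInt n) -> pure n) Proxy

-- Hypothetical helper: run a conversion, returning the failure message if any.
castMessage :: IO a -> IO (Maybe String)
castMessage action = do
  result <- try action
  pure $ case result of
    Left (PyCastException msg) -> Just msg
    Right _ -> Nothing
```

Asking for an Integer from MockFloat 1.5 fails with the message "Integer", and nowhere did we have to spell out Proxy :: Proxy Integer ourselves.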

Type Inference

What’s the difference between randint and uniform here?

randint :: Integer -> Integer -> IO Integer
randint low high =
  call "random" "randint" [arg low, arg high] []

uniform :: Integer -> Integer -> IO Double
uniform low high =
  call "random" "uniform" [arg low, arg high] []

Besides calling different Python functions, they also return different types. Because FromPy is working behind the scenes, we don’t really need to think about this while writing a wrapper library: the type signatures are right there, so call knows which type to coax the SomeObject that Python returns into. (Here arg wraps any ToPy value as an Arg.) If needed, we can also use the TypeApplications language extension to tell call explicitly which type to marshal the result into.

call @Double "random" "uniform" [arg low, arg high] []
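Return-type polymorphism is what lets randint and uniform share a single call. In this sketch, fakeCall is a hypothetical stand-in that returns canned mock objects instead of running Python (and takes no arguments, to keep things short); the result type at each use site decides which FromPy instance runs. MockObject and isHalfOrMore are hypothetical too.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}

import Control.Exception (Exception, throwIO)
import Data.Text (Text)

-- Hypothetical stand-in for Py.SomeObject.
data MockObject = MockInt Integer | MockFloat Double
  deriving Show

newtype PyCastException = PyCastException String
  deriving Show

instance Exception PyCastException

class FromPy a where
  fromPy :: MockObject -> IO a

instance FromPy Integer where
  fromPy (MockInt n) = pure n
  fromPy other = throwIO (PyCastException ("Integer from " ++ show other))

instance FromPy Double where
  fromPy (MockFloat d) = pure d
  fromPy other = throwIO (PyCastException ("Double from " ++ show other))

-- Hypothetical stand-in for call: returns canned objects instead of running
-- Python, then converts with whichever FromPy instance the caller needs.
fakeCall :: FromPy a => Text -> Text -> IO a
fakeCall modName funcName = fromPy (canned modName funcName)
  where
    canned :: Text -> Text -> MockObject
    canned "random" "randint" = MockInt 4
    canned _ _ = MockFloat 2.5

-- The signatures alone select the instance.
randint :: IO Integer
randint = fakeCall "random" "randint"

uniform :: IO Double
uniform = fakeCall "random" "uniform"

-- Nothing here pins down x's type, so the type application does it.
-- Without @Double, x would be ambiguous: FromPy blocks type defaulting.
isHalfOrMore :: IO Bool
isHalfOrMore = do
  x <- fakeCall @Double "random" "uniform"
  pure (x >= 0.5)
```

In isHalfOrMore, @Double is what makes the code compile: FromPy is a user-defined class, so GHC’s defaulting rules will not choose a type for x in a compiled module.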

Pare Programming

There’s quite a lot of consideration that goes into making something so simple! The API surface we ended up with is just call for calling Python functions, plus getAttribute and setAttribute for getting and setting attributes. ToPy and FromPy instances handle type marshalling for us, and Arg lets us send heterogeneous argument lists from Haskell land over to Python.

I guess it’s time to get back to improving hsautogui, now that it’s running on CPython.Simple.