Querying Data Structures in Lisp

Note: After enjoying the process of thinking and coding this solution, I discovered let-alist that is already built into Emacs, and works like a charm.

The other day I was calling a web service interface, retrieving a JSON document, and querying the results to do more work — rinse and repeat—you know the drill. If the job is quick-n-dirty, but sharing with colleagues at work, I often use jq (or if the results are in YAML, then I use yq). At this particular time, I didn’t need to share, which means, I could bang stuff out faster in Lisp—usually. For extracting particular values in a complex JSON document, I was stumbling, and wished I had a jq-like thing for Lisp.

I found the solution to the problem of extracting a value or two from a JSON document, simple. While I’m sure someone has solved it better, since I haven’t discovered it, I thought I would describe my version, and use it as a springboard for a lively discussion about the best interface.

Example

First, we need an example. Github is pretty good, and let’s suppose we want a list of the authors of the last five commits—and store in a variable github-committers. We can use the built-in url-retrieve function:

(require 'url)

(defun parse-github-commits (events)
  (goto-char url-http-end-of-headers)
  (setq github-committers
        (thread-last
          (json-read)
          (seq-map (lambda (obj)
                     (let* ((commit (gethash "commit" obj))
                            (author (gethash "author" commit))
                            (name   (gethash "name"   author))
                            (email  (gethash "email"  author)))
                       (format "%s <%s>" name email))
                     )))))

(url-retrieve "https://api.github.com/repos/stedolan/jq/commits?per_page=5" 'parse-github-commits)

The call to url-retrieve is asynchronous, so when the function downloads the data into a buffer, it calls our callback function, parse-github-commits, ready to parse the JSON in the buffer. We skip over the headers (included in the buffer) by using the variable url-http-end-of-headers.

The call to json-read returns the JSON document as a complex Emacs Lisp data structure, but the top level is a seq (a vector is actually the default for JSON arrays), so we call seq-map to gather the details, passing in each object as the variable obj.

But here is where it gets obnoxious. We first need to get the commit object, then the committer sub-object, then the name and email from that object. I suppose most people would use a let* to assign each of these to a variable. Calling jq with the following query:

.commit.committer.name

That query is Javascript-y, which is fine if we were in the Land of Javascript, but this is Lisp. Let’s make a function query-tree that could take an object, and a series of elements, like:

(format "%s <%s>"
        (query-tree obj 'commit 'committer 'name)
        (query-tree obj 'commit 'committer 'email))

The query-tree Function

The query-tree is all about the interface, where we punt the complexity to other functions:

(defun query-tree (data-structure &rest query)
  "Return value from DATA-STRUCTURE based on elements of QUERY."
  (query-tree-parse query data-structure))

All this does is make a list of the query elements, but query-tree-parse is a recursive function that will trim the first element of our query, car query, use a helper function to extract that value … then calls itself with the next element on query list and the current value:

(defun query-tree-parse (query tree)
  "Return value of QUERY from TREE data structure."
  (if (null query)
      tree
    (query-tree-parse (cdr query)
                      (query-tree--parse-node (car query) tree))))

Again, we punted more complexity yet again, yet we hace a little logic in addressing the type of tree.

The Parse Node Function

Now we need to look at this query element (node) and the top-level element in the structured tree, and return the value. Do we have a hash-table of an object? Call gethash. Got a plist? Call plist-get. That sort of idea:

(defun query-tree--parse-node (node tree)
  "Return results of TREE data structure of symbol, NODE.

If TREE is a hash-table, then NODE would be the key, either a
string or a symbol. Return value.

If TREE is a sequence and NODE is number, Return element indexed
by NODE.

If TREE is a Property List, return value of NODE as a key. This
should work with association lists as well."
  (cond
   ((null tree)                        nil)
   ((hash-table-p tree)                 (or (gethash node tree)
                                            (gethash (symbol-name node) tree)))
   ((and (numberp node)  (seqp tree))       (seq-elt tree node))
   ((and (eq node :first) (seqp tree))      (car node))
   ((and (eq node :last) (seqp tree))       (last node))

   ((assoc node tree)                       (alist-get node tree))
   ((plistp tree)                           (plist-get tree node))
   (t                                  (error "query-tree: Looking for '%s' in %s" node tree))))

While the JSON reader, by default, makes hash tables with strings for the keys, I like the idea of walking this tree by symbols, hence this code where I can convert an object to its string equivalent:

(or (gethash node tree)
    (gethash (symbol-name node) tree))

Most objects implement the seq interface, so to distinguish between a vector and a plist, I check to see if the query element is a number (we can also take some keywords of :first and :last).

Note: You may feel that my function is overkill, as the json-read function should only return a collection of hash tables and vectors, but one could change this with code like:

(let ((json-object-type 'plist)
      (json-key-type    'symbol)
      (json-array-type  'list))
  (let ((data (json-read)))
    ;; ...

In which case, I should make it more versatile.

If we wanted to expand this function to take arbitrary nested, data structures (more than the typical results from a call to json-read), we could add more conditionals and access. For instance, returning the value of a slot in a record or structure:

((and (numberp node) (recordp tree))     (aref tree node))
((and (numberp node) (cl-struct-p tree)) (aref tree node))

Or, if we were willing to specify the type of a struct along with the name (as a list), we could do something like:

((and (listp node)   (cl-struct-p tree)) (cl-struct-slot-value (car tree) (cadr tree) node))

I might be over-doing it.

Let’s make some unit tests to verify this function:

(ert-deftest query-tree-walk-tree-test ()
  ;; PLIST:
  (let ((data '(:a 1 :b 2 :c 3)))
    (should (eq (query-tree--parse-node :b data) 2)))
  ;; ALIST:
  (let ((data '((pine . cones) (oak . acorns) (maple . seeds))))
    (should (eq (query-tree--parse-node 'oak data) 'acorns)))
  ;; LIST, aka a sequence
  (let ((data '(a b c d e f)))
    (should (eq (query-tree--parse-node 3 data) 'd)))
  ;; VECTOR:
  (let ((data [a b c d e]))
    (should (eq (query-tree--parse-node 2 data) 'c)))
  ;; HASH TABLE with strings:
  (let ((h (make-hash-table :test 'equal)))
    (puthash "a" 1 h)
    (puthash "b" 2 h)
    (puthash "c" 3 h)
    (should (eq (query-tree--parse-node "b" h) 2))
    (should (eq (query-tree--parse-node 'b h) 2)))
  ;; HASH TABLE with symbols:
  (let ((h (make-hash-table)))
    (puthash 'a 1 h)
    (puthash 'b 2 h)
    (puthash 'c 3 h)
    (should (eq (query-tree--parse-node 'b h) 2))))

Function Verification

Let’s write some unit tests for our higher-level functions:

(ert-deftest query-tree-parse-tree-test ()
  (should (null (query-tree-parse nil nil)))
  (let ((data :final))
    (should (eq (query-tree-parse nil data) :final)))
  (let ((data [ a b c d e f ]))
    (should (eq (query-tree-parse '(3) data) 'd)))
  (let ((data (json-parse-string
               "{ \"a\": 1,
                  \"b\": {
                     \"x\": 10,
                     \"y\": [100, 101, 102, 103],
                     \"z\": 30
                  },
                  \"c\": 3
                }")))
    (should (hash-table-p data))
    (should (= (query-tree-parse '("a") data) 1))
    (should (= (query-tree-parse '("b" "x") data) 10))
    (should (= (query-tree-parse '("b" "y" 2) data) 102))
    (should (= (query-tree-parse '(a) data) 1))
    (should (= (query-tree-parse '(b x) data) 10))
    (should (= (query-tree-parse '(b y 2) data) 102))))

And a verification for query-tree:

(ert-deftest query-tree-test ()
  (let ((data (json-parse-string
               "{ \"a\": 1,
                  \"b\": {
                     \"x\": 10,
                     \"y\": [100, 101, 102, 103],
                     \"z\": 30
                  },
                  \"c\": 3
                }")))
    (should (= (query-tree data 'b 'y 2) 102))))

Demonstration

Let’s see this in action, but refactor the format call into a helper function to showcase the query-tree function:

(require 'url)

(defvar github-committers nil
  "A list of recent committers to a Github repository.")

(defun github-committer (commit-obj)
  (format "%s <%s>"
          (query-tree commit-obj 'commit 'author 'name)
          (query-tree commit-obj 'commit 'author 'email)))

(defun parse-github-commits (events)
  (goto-char url-http-end-of-headers)
  (setq github-committers
        (seq-map #'github-committer (json-read))))

(url-retrieve
 "https://api.github.com/repos/emacs-lsp/lsp-mode/commits?per_page=5"
 'parse-github-commits)

And the results of the call stored in github-committers:

  • Julien Cretin <git@ia0.eu>
  • Ivan Yonchovski <yyoncho@users.noreply.github.com>
  • Yogesh Chobe <ylchobe@gmail.com>
  • Suvayu Ali <suvayu@users.noreply.github.com>
  • Jen-Chieh Shen <jcs090218@gmail.com>

Let’s do another example. My work project that initially triggered this exercise had a JSON document that looked sort of like this:

{
    "type": "application/vnd.com.xpc.global.project-v0+json",
    "common_id": "a1bd8a50-c166-4576-8128-348bfef92c94",
    "data": {
        "href": "https://e501.eng.pdx.com/api/projects/847acd57-f51a-4064-8e44-d1efc88884b9?deleted=false",
        "id": "cc605e3e-96a7-4804-8804-fb5d67ef22b7",
        "name": "dev-yzyy.d501.eng.pdx.com",
        "version": 2,
        "xpc_connection": {
            "openstack_api_key": "***REDACTED***",
            "openstack_username": "***REDACTED***",
            "openstack_tenant": "dev-yzyy.d501.eng.pdx.com"
        },
        "xpc_defaults": {
            "default_flavor": "XPC.2.4.30.0",
            "default_network": "yzyy"
        },
        "global_property_overrides": {
            "system": "/etc/global/secure/system_property_overrides",
            "shell": "/etc/global/secure/shell_property_overrides",
            "service": "/etc/global/secure/service_property_overrides"
        }
    },
    "status": "SUCCESS"
}

For reliably extracting a single value from this document, I could use this expression:

(let ((data (json-parse-string json-data)))
  (when (equal "SUCCESS" (query-tree data 'status))
    (query-tree data 'data 'xpc_connection 'openstack_tenant)))

Returns:

dev-yzyy.d501.eng.pdx.com

Calling jq (and even using the jq-mode for Emacs) might be shorter, but I see this approach more readable, and more Lispy.

Alternatives

Sacha Chua mentioned on Mastodon that I could use let-alist. To get it to work with json-read, we need to tell it to use alist for JSON objects. So, to re-write our last example, we would write:

(let ((data (json-parse-string json-data :object-type 'alist)))
  (let-alist data
    (when (equal "SUCCESS" .status)
      .data.xpc_connection.openstack_tenant)))
dev-yzyy.d501.eng.pdx.com

To return the string, dev-yzyy.d501.eng.pdx.com

So while it might not be as versatile as the couple functions we wrote, being built-in, this is probably the solution you should try first.

Summary

This essay describes converting code like:

(let* ((commit (gethash "commit" commit-obj))
       (author (gethash "author" commit))
       (name   (gethash "name"   author))
       (email  (gethash "email"  author)))
  (format "%s <%s>" name email))

To code like this:

(format "%s <%s>"
        (query-tree commit-obj 'commit 'author 'name)
        (query-tree commit-obj 'commit 'author 'email))

You may not mind the original code, but I do feel that the second code is more intentional.

I know what you are thinking. What about the loops and other processing features of jq. Well, I don’t think I need them, as jq is compensating for limitations of what you can do in shell scripts and on the command line, where in this case, we already have a general language to use.