Querying Data Structures in Lisp
Note: After enjoying the process of thinking and coding this solution, I discovered let-alist that is already built into Emacs, and works like a charm.
The other day I was calling a web service interface, retrieving a JSON document, and querying the results to do more work — rinse and repeat—you know the drill. If the job is quick-n-dirty, but sharing with colleagues at work, I often use jq (or if the results are in YAML, then I use yq). At this particular time, I didn’t need to share, which means, I could bang stuff out faster in Lisp—usually. For extracting particular values in a complex JSON document, I was stumbling, and wished I had a jq-like thing for Lisp.
I found the solution to the problem of extracting a value or two from a JSON document, simple. While I’m sure someone has solved it better, since I haven’t discovered it, I thought I would describe my version, and use it as a springboard for a lively discussion about the best interface.
Example
First, we need an example. Github is pretty good, and let’s suppose we want a list of the authors of the last five commits—and store in a variable github-committers
. We can use the built-in url-retrieve
function:
(require 'url) (defun parse-github-commits (events) (goto-char url-http-end-of-headers) (setq github-committers (thread-last (json-read) (seq-map (lambda (obj) (let* ((commit (gethash "commit" obj)) (author (gethash "author" commit)) (name (gethash "name" author)) (email (gethash "email" author))) (format "%s <%s>" name email)) ))))) (url-retrieve "https://api.github.com/repos/stedolan/jq/commits?per_page=5" 'parse-github-commits)
The call to url-retrieve
is asynchronous, so when the function downloads the data into a buffer, it calls our callback function, parse-github-commits
, ready to parse the JSON in the buffer. We skip over the headers (included in the buffer) by using the variable url-http-end-of-headers
.
The call to json-read
returns the JSON document as a complex Emacs Lisp data structure, but the top level is a seq
(a vector
is actually the default for JSON arrays), so we call seq-map
to gather the details, passing in each object as the variable obj
.
But here is where it gets obnoxious. We first need to get the commit
object, then the committer
sub-object, then the name
and email
from that object. I suppose most people would use a let*
to assign each of these to a variable. Calling jq
with the following query:
.commit.committer.name
That query is Javascript-y, which is fine if we were in the Land of Javascript, but this is Lisp. Let’s make a function query-tree
that could take an object, and a series of elements, like:
(format "%s <%s>" (query-tree obj 'commit 'committer 'name) (query-tree obj 'commit 'committer 'email))
The query-tree
Function
The query-tree
is all about the interface, where we punt the complexity to other functions:
(defun query-tree (data-structure &rest query) "Return value from DATA-STRUCTURE based on elements of QUERY." (query-tree-parse query data-structure))
All this does is make a list of the query
elements, but query-tree-parse
is a recursive function that will trim the first element of our query, car query
, use a helper function to extract that value … then calls itself with the next element on query list and the current value:
(defun query-tree-parse (query tree) "Return value of QUERY from TREE data structure." (if (null query) tree (query-tree-parse (cdr query) (query-tree--parse-node (car query) tree))))
Again, we punted more complexity yet again, yet we hace a little logic in addressing the type of tree.
The Parse Node Function
Now we need to look at this query element (node
) and the top-level element in the structured tree, and return the value. Do we have a hash-table
of an object? Call gethash
. Got a plist
? Call plist-get
. That sort of idea:
(defun query-tree--parse-node (node tree) "Return results of TREE data structure of symbol, NODE. If TREE is a hash-table, then NODE would be the key, either a string or a symbol. Return value. If TREE is a sequence and NODE is number, Return element indexed by NODE. If TREE is a Property List, return value of NODE as a key. This should work with association lists as well." (cond ((null tree) nil) ((hash-table-p tree) (or (gethash node tree) (gethash (symbol-name node) tree))) ((and (numberp node) (seqp tree)) (seq-elt tree node)) ((and (eq node :first) (seqp tree)) (car node)) ((and (eq node :last) (seqp tree)) (last node)) ((assoc node tree) (alist-get node tree)) ((plistp tree) (plist-get tree node)) (t (error "query-tree: Looking for '%s' in %s" node tree))))
While the JSON reader, by default, makes hash tables with strings for the keys, I like the idea of walking this tree by symbols, hence this code where I can convert an object to its string equivalent:
(or (gethash node tree) (gethash (symbol-name node) tree))
Most objects implement the seq
interface, so to distinguish between a vector
and a plist
, I check to see if the query element is a number (we can also take some keywords of :first
and :last
).
Note: You may feel that my function is overkill, as the json-read
function should only return a collection of hash tables and vectors, but one could change this with code like:
(let ((json-object-type 'plist) (json-key-type 'symbol) (json-array-type 'list)) (let ((data (json-read))) ;; ...
In which case, I should make it more versatile.
If we wanted to expand this function to take arbitrary nested, data structures (more than the typical results from a call to json-read
), we could add more conditionals and access. For instance, returning the value of a slot in a record or structure:
((and (numberp node) (recordp tree)) (aref tree node)) ((and (numberp node) (cl-struct-p tree)) (aref tree node))
Or, if we were willing to specify the type of a struct
along with the name (as a list), we could do something like:
((and (listp node) (cl-struct-p tree)) (cl-struct-slot-value (car tree) (cadr tree) node))
I might be over-doing it.
Let’s make some unit tests to verify this function:
(ert-deftest query-tree-walk-tree-test () ;; PLIST: (let ((data '(:a 1 :b 2 :c 3))) (should (eq (query-tree--parse-node :b data) 2))) ;; ALIST: (let ((data '((pine . cones) (oak . acorns) (maple . seeds)))) (should (eq (query-tree--parse-node 'oak data) 'acorns))) ;; LIST, aka a sequence (let ((data '(a b c d e f))) (should (eq (query-tree--parse-node 3 data) 'd))) ;; VECTOR: (let ((data [a b c d e])) (should (eq (query-tree--parse-node 2 data) 'c))) ;; HASH TABLE with strings: (let ((h (make-hash-table :test 'equal))) (puthash "a" 1 h) (puthash "b" 2 h) (puthash "c" 3 h) (should (eq (query-tree--parse-node "b" h) 2)) (should (eq (query-tree--parse-node 'b h) 2))) ;; HASH TABLE with symbols: (let ((h (make-hash-table))) (puthash 'a 1 h) (puthash 'b 2 h) (puthash 'c 3 h) (should (eq (query-tree--parse-node 'b h) 2))))
Function Verification
Let’s write some unit tests for our higher-level functions:
(ert-deftest query-tree-parse-tree-test () (should (null (query-tree-parse nil nil))) (let ((data :final)) (should (eq (query-tree-parse nil data) :final))) (let ((data [ a b c d e f ])) (should (eq (query-tree-parse '(3) data) 'd))) (let ((data (json-parse-string "{ \"a\": 1, \"b\": { \"x\": 10, \"y\": [100, 101, 102, 103], \"z\": 30 }, \"c\": 3 }"))) (should (hash-table-p data)) (should (= (query-tree-parse '("a") data) 1)) (should (= (query-tree-parse '("b" "x") data) 10)) (should (= (query-tree-parse '("b" "y" 2) data) 102)) (should (= (query-tree-parse '(a) data) 1)) (should (= (query-tree-parse '(b x) data) 10)) (should (= (query-tree-parse '(b y 2) data) 102))))
And a verification for query-tree
:
(ert-deftest query-tree-test () (let ((data (json-parse-string "{ \"a\": 1, \"b\": { \"x\": 10, \"y\": [100, 101, 102, 103], \"z\": 30 }, \"c\": 3 }"))) (should (= (query-tree data 'b 'y 2) 102))))
Demonstration
Let’s see this in action, but refactor the format
call into a helper function to showcase the query-tree
function:
(require 'url) (defvar github-committers nil "A list of recent committers to a Github repository.") (defun github-committer (commit-obj) (format "%s <%s>" (query-tree commit-obj 'commit 'author 'name) (query-tree commit-obj 'commit 'author 'email))) (defun parse-github-commits (events) (goto-char url-http-end-of-headers) (setq github-committers (seq-map #'github-committer (json-read)))) (url-retrieve "https://api.github.com/repos/emacs-lsp/lsp-mode/commits?per_page=5" 'parse-github-commits)
And the results of the call stored in github-committers
:
Julien Cretin <git@ia0.eu>
Ivan Yonchovski <yyoncho@users.noreply.github.com>
Yogesh Chobe <ylchobe@gmail.com>
Suvayu Ali <suvayu@users.noreply.github.com>
Jen-Chieh Shen <jcs090218@gmail.com>
Let’s do another example. My work project that initially triggered this exercise had a JSON document that looked sort of like this:
{ "type": "application/vnd.com.xpc.global.project-v0+json", "common_id": "a1bd8a50-c166-4576-8128-348bfef92c94", "data": { "href": "https://e501.eng.pdx.com/api/projects/847acd57-f51a-4064-8e44-d1efc88884b9?deleted=false", "id": "cc605e3e-96a7-4804-8804-fb5d67ef22b7", "name": "dev-yzyy.d501.eng.pdx.com", "version": 2, "xpc_connection": { "openstack_api_key": "***REDACTED***", "openstack_username": "***REDACTED***", "openstack_tenant": "dev-yzyy.d501.eng.pdx.com" }, "xpc_defaults": { "default_flavor": "XPC.2.4.30.0", "default_network": "yzyy" }, "global_property_overrides": { "system": "/etc/global/secure/system_property_overrides", "shell": "/etc/global/secure/shell_property_overrides", "service": "/etc/global/secure/service_property_overrides" } }, "status": "SUCCESS" }
For reliably extracting a single value from this document, I could use this expression:
(let ((data (json-parse-string json-data))) (when (equal "SUCCESS" (query-tree data 'status)) (query-tree data 'data 'xpc_connection 'openstack_tenant)))
Returns:
dev-yzyy.d501.eng.pdx.com
Calling jq
(and even using the jq-mode for Emacs) might be shorter, but I see this approach more readable, and more Lispy.
Alternatives
Sacha Chua mentioned on Mastodon that I could use let-alist. To get it to work with json-read
, we need to tell it to use alist
for JSON objects. So, to re-write our last example, we would write:
(let ((data (json-parse-string json-data :object-type 'alist))) (let-alist data (when (equal "SUCCESS" .status) .data.xpc_connection.openstack_tenant)))
dev-yzyy.d501.eng.pdx.com
To return the string, dev-yzyy.d501.eng.pdx.com
So while it might not be as versatile as the couple functions we wrote, being built-in, this is probably the solution you should try first.
Summary
This essay describes converting code like:
(let* ((commit (gethash "commit" commit-obj)) (author (gethash "author" commit)) (name (gethash "name" author)) (email (gethash "email" author))) (format "%s <%s>" name email))
To code like this:
(format "%s <%s>" (query-tree commit-obj 'commit 'author 'name) (query-tree commit-obj 'commit 'author 'email))
You may not mind the original code, but I do feel that the second code is more intentional.
I know what you are thinking. What about the loops and other processing features of jq
. Well, I don’t think I need them, as jq
is compensating for limitations of what you can do in shell scripts and on the command line, where in this case, we already have a general language to use.