Replace Shell Pipes in Emacs

An Itch

Don’t misunderstand. I may be an Emacsian, but like you, I’m also a polyglot and spend time every day in remote shells and REPLs. However, I have an itch that I’ve been scratching, and while I haven’t reached any endorphin rush, I thought I would share an idea to start a bit of a discussion.

How would you solve the following challenge:

Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.

Yeah, you may recognize the origin of this question, and you might quickly bang out this answer:

tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q

But one does not just cast a spell of that wizardry level on the first try. Instead we iterate over each command, pipe by pipe. Nothing wrong with this…unless the first step doesn’t come from a file, but from a command that takes a bit of time.

The other day, I was looking for an image on our cloud system, and I didn’t know its name. Since I had suspicions, I issued calls to openstack and started piping to grep. However, let me illustrate the problem:

$ time openstack image list | wc -l
2404

real    0m51.432s
user    0m2.895s
sys     0m0.156s

With a couple thousand images in one cluster, an openstack command may take almost a minute to complete. That makes iterating painful, since the shell won’t memoize the command’s output. The immediate solution is to write the output to a temporary file and begin grepping that.

I have another solution (obviously, using Emacs), but humor me with another story that prompted the solution.

The Prompt

Last week, I noticed that Planet Emacsen referenced an Irreal article that was particularly not Emacs-centric, Text Manipulation with Command Line Utilities:

If your computing experience has always involved GUI tools, you may be unfamiliar or rusty with the command line tools that Unix provides. Similarly, if, like me, you do almost all your text manipulation from within Emacs, you might also be rusty with the command line tools. Happily, Kade Killary has a solution.

Killary has a very useful post entitled Command Line Tricks for Data Scientists that discusses those tools and how to use them…

The article starts:

For many data scientists, data manipulation begins and ends with Pandas or the Tidyverse. In theory, there is nothing wrong with this notion. It is, after all, why these tools exist in the first place. Yet, these options can often be overkill for simple tasks like delimiter conversion. Aspiring to master the command line should be on every developer’s list, especially data scientists. Learning the ins and outs of your shell will undeniably make you more productive.

My personal problem is I remember the command line utilities and forget the often better and easier approach to just use Emacs.

Usually I don’t think: I start a terminal tab, kick off the openstack command with a pipe for further processing, and then curse after a minute of waiting for the results. I forget that shell-command (either M-! or, in Spacemacs, SPC !) puts the output of the command in a buffer (named *Shell Command Output*),[1] allowing me to repeatedly call the keep-lines and flush-lines commands until I have what I want.

Using Emacs is especially nice with these time-consuming commands, since automatically putting the output in a buffer is essentially memoizing the call.
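As a rough sketch of that memoizing idea, the output can be captured once and then whittled down repeatedly without re-running anything. The names my/capture-command and *images* are made up for illustration, printf merely stands in for a slow call like openstack, and call-process-shell-command is used instead of shell-command only so the example also runs without a display:

```elisp
;; Sketch: run a (possibly slow) command once, keep its output in a
;; named buffer, then filter that buffer as often as needed.
;; `my/capture-command' and "*images*" are hypothetical names.
(defun my/capture-command (command buffer-name)
  "Run COMMAND once and return a buffer named BUFFER-NAME holding its output."
  (let ((buf (get-buffer-create buffer-name)))
    (with-current-buffer buf
      (erase-buffer)
      ;; Insert the command's standard output into BUF.
      (call-process-shell-command command nil t))
    buf))

;; `printf' stands in for a slow call such as `openstack image list'.
(with-current-buffer (my/capture-command
                      "printf 'centos-7\\nubuntu-18\\ncentos-8\\n'"
                      "*images*")
  (goto-char (point-min))
  (keep-lines "centos")          ; like `grep centos'
  (goto-char (point-min))
  (flush-lines "8")              ; like `grep -v 8'
  (buffer-string))
```

Each filtering step works on the buffer's text, so a wrong guess costs an undo rather than another minute of waiting.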

Map of Unix Tool to Emacs Function

The article covers the following tools, and I’m appending a similar Emacs command. However, even without functional equivalents, I could simply shell out to the particular command:

Iconv: set-buffer-file-coding-system (C-x <Return> f)
Head: n/a, try narrow-to-region (C-x n n)
Tr: vr/replace / downcase-region
Wc: count-words (M-=)
Split: n/a as Emacs now handles this automatically.
Sort & Uniq: sort-lines / delete-duplicate-lines (or Spacemacs’ uniquify-lines).
Cut: this is useful, but I don’t know of anything similar in Emacsland.
Paste: combining two files (no, not appending, more like zipping) is a highly specific use case.
Join: like a text-oriented sql from text files that I have never used, mostly due to the lack of properly formatted files.
Grep: keep-lines / flush-lines
Sed: vr/replace
Awk: Emacs Lisp ;-)

After going through the list, I realize that a couple of features, short of shelling out to the command, would probably require recording a quick macro and running apply-macro-to-region-lines (C-x C-k r). However, cut is pretty useful for parsing CSVs or tabular output, and a version in Lisp would be fun and not too difficult to write.
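To give a feel for how small such a Lisp version of cut could be, here is a minimal sketch. The name my/cut-field is hypothetical (it is not an existing Emacs command), it handles only a single field, and it matches the delimiter literally:

```elisp
;; Sketch of a `cut -d DELIM -f N' work-alike for the current buffer.
;; `my/cut-field' is a hypothetical name, not an existing Emacs command.
(defun my/cut-field (delimiter field)
  "Replace each line of the buffer with its FIELD-th DELIMITER-separated column.
FIELD counts from 1, as with cut(1).  DELIMITER is matched literally.
Lines without DELIMITER are left unchanged, and a missing field
becomes an empty line."
  (interactive "sDelimiter: \nnField number: ")
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let* ((start (line-beginning-position))
             (end   (line-end-position))
             (line  (buffer-substring-no-properties start end)))
        (when (string-match-p (regexp-quote delimiter) line)
          (let ((columns (split-string line (regexp-quote delimiter))))
            (delete-region start end)
            (insert (or (nth (1- field) columns) ""))))
        (forward-line 1)))))

;; Usage: keep only what follows the `=' on each line.
(with-temp-buffer
  (insert "cedev13=192.168.146.51\ncedev13=192.168.146.52\n")
  (my/cut-field "=" 2)
  (buffer-string))
```

Field ranges and multiple fields, which the real cut supports, are left out to keep the sketch short.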

However, many of these commands actually don’t have a keybinding, since they aren’t used much, and I’m afraid I don’t always remember the function names. Perhaps it might be nice to have a minor-mode that could be hooked to the shell-command output buffer, with single keys bound to various functions, but perhaps I’m getting ahead of myself…

Pipe Replacement

The first command function that seems useful would be a replacement for the shell’s pipe. Essentially, we’d take the contents of a buffer, send it to a command, and replace the buffer’s contents with the output from the command. Let’s get highly creative, and call it pipe:

(defun pipe (command)
  "Replace the contents of the buffer with the output from the
COMMAND given."
  (interactive "sCommand: ")
  ;; Non-nil OUTPUT-BUFFER and REPLACE arguments put the output in
  ;; place of the text that was sent to the command.
  (shell-command-on-region (point-min) (point-max) command t t))

Passing t for both the OUTPUT-BUFFER and REPLACE arguments is the Lisp equivalent of pressing C-u before calling shell-command-on-region interactively: it replaces the contents of the buffer with the output from the command. (Binding current-prefix-arg would not do the job here, since that variable is only consulted when the command is invoked interactively.)

Let me try it by running shell-command and typing openstack server list.

Oh yeah, I need to set up my environment variables. Since I work with more than one cluster, I store those in files that I source from the shell. Might be fun to hack a Lisp function that reads a file into a temporary buffer, searches the buffer for KEY=VALUE pairs, and calls setenv on each:

(defun source-environment (file)
  "Add all environment variable settings from a script FILE into
the current Emacs environment, via the `setenv' function."
  (interactive "fSource file: ")
  (with-temp-buffer
    (insert-file-contents file)
    ;; This hairy regular expression matches KEY=VALUE shell expressions:
    (while (re-search-forward
            "\\([A-Za-z_][A-Za-z0-9_]*\\) *= *[\"']?\\(.*?\\)[\"']?$" nil t)
      (let* ((key (match-string 1))
             (env-value (match-string 2))
             ;; Since the value could contain references to other environment
             ;; variables, we'll try to substitute what we find (an unset
             ;; variable expands to the empty string):
             (value (replace-regexp-in-string
                     "\\${?\\([A-Za-z_][A-Za-z0-9_]*\\)}?"
                     (lambda (p) (or (getenv (match-string 1 p)) ""))
                     env-value t t)))
        (setenv key value)
        (message "Stored environment variable %s = %s" key value)))))

[ After jamming that function out, I noticed someone had already made a project to do just this. ]

Now calling the command brings up a buffer, *Shell Command Output* with the output from the command:

+---------------------------------+------------------------+--------+------------------------+---------------------+
| ID                              | Name                   | Status | Networks               | Image Name          |
+---------------------------------+------------------------+--------+------------------------+---------------------+
| b0f1916f-c6ee-460f-ae2a-b89dcf6 | wpc-tempest            | ACTIVE | cedev13=192.168.146.51 | c7_inf_base_0.22.15 |
| e2a4e57e-9c3f-4e67-84c3-ef0f2f1 | wpc-infrastructure-3   | ACTIVE | cedev13=192.168.146.52 | c7_inf_base_0.22.15 |
| e83cde9d-8c10-4877-81ba-bbee901 | wpc-infrastructure-1   | ACTIVE | cedev13=192.168.146.53 | c7_inf_base_0.22.15 |
| 419bf293-4d65-4bb7-88e4-5adcfb6 | network-controller-1   | ACTIVE | cedev13=192.168.146.54 | c7_inf_base_0.22.15 |
| 3fa945e9-6d62-496e-9779-7ab2841 | wpc-compute            | ACTIVE | cedev13=192.168.146.55 | c7_inf_base_0.22.15 |
| e47aaa36-380b-4cc0-9695-cf8b605 | wpc-controller-1       | ACTIVE | cedev13=192.168.146.56 | c7_inf_base_0.22.15 |
| 3e69fc6e-48fc-4f41-9f46-47f9668 | wpc-infrastructure-2   | ACTIVE | cedev13=192.168.146.57 | c7_inf_base_0.22.15 |
| 250f6a42-1a88-40c0-a075-2338f51 | wpc-network-analytics  | ACTIVE | cedev13=192.168.146.58 | c7_inf_base_0.22.15 |
| ad430c07-403e-417b-8ac4-d73adfb | wpc-load-balancer-1    | ACTIVE | cedev13=192.168.146.59 | c7_inf_base_0.22.15 |
| 157a06c0-f442-4647-b50b-34df9b3 | wpc4-chef-server       | ACTIVE | cedev13=192.168.146.60 | wpc4-chef-server    |
+---------------------------------+------------------------+--------+------------------------+---------------------+

If I wanted to get the IP address of each of my virtual machines, I could begin by calling flush-lines with a parameter of --- to remove all the table separator lines. Of course, that doesn’t get rid of the header row, but deleting a single line is easy enough (Actually, I should have just called keep-lines with a parameter of ACTIVE since that seems to be common text on each line).

Now let’s try my new function. I call pipe with a command parameter of cut -d= -f2 and then call it again with cut -d' ' -f1 and I’m left with a buffer of IP addresses. I’m intrigued with this workflow.
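For the curious, the same sequence of steps can be replayed non-interactively. This is only a sketch: my/pipe-buffer and my/extract-addresses are hypothetical names, the two sample rows are abbreviated from the table above, and it assumes a POSIX cut is on the PATH:

```elisp
;; Sketch: the interactive steps above, replayed on two sample rows.
;; Assumes a POSIX `cut' is on the PATH.
(defun my/pipe-buffer (command)
  "Replace the current buffer's contents with the output of COMMAND."
  (shell-command-on-region (point-min) (point-max) command t t))

(defun my/extract-addresses (table-text)
  "Return the IP addresses found in TABLE-TEXT, one per line."
  (with-temp-buffer
    (insert table-text)
    (goto-char (point-min))
    (keep-lines "ACTIVE")             ; drop separators and the header
    (my/pipe-buffer "cut -d= -f2")    ; text after the first `='
    (my/pipe-buffer "cut -d' ' -f1")  ; text before the next space
    (buffer-string)))

(my/extract-addresses
 (concat "+----+------+--------+-----------------------+\n"
         "| ID | Name | Status | Networks              |\n"
         "+----+------+--------+-----------------------+\n"
         "| b0 | vm-1 | ACTIVE | net=192.168.146.51    |\n"
         "| e2 | vm-2 | ACTIVE | net=192.168.146.52    |\n"))
```

The interactive version has the advantage that each intermediate result is visible, and undoable, before the next step is chosen.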

As a work in progress, check out these new functions here.

Piper Mode

With a couple of functions scoped out, let’s figure out a way to easily present and use them.

Suppose we had a minor-mode that gets instantiated with the shell output buffer, and it has a single key-binding, the vertical bar key, |, that shows either a magit-popup buffer or a hydra with these Emacs-equivalent command-functions. Since we might still want to have normal navigation and editing capability in the buffer, we should make few special bindings beyond what the Fundamental major mode provides.

(defvar piper-mode-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "<SPC>") 'scroll-up-command)
    (define-key map (kbd "<DEL>") 'scroll-down-command)
    (define-key map (kbd "C-|") 'piper-popup)
    map)
  "Keymap for `piper-mode'.")

(define-minor-mode piper-mode
  "Toggle Filtering mode."
  ;; The initial value.
  :init-value nil
  ;; The indicator for the mode line.
  :lighter " |"
  ;; The minor mode bindings.
  :keymap piper-mode-map
  :group 'piper)

(add-to-list 'emulation-mode-map-alists `((piper-mode . ,piper-mode-map)))

;; if using Evil-ness, simply do this:
(define-key evil-normal-state-map (kbd "|") 'piper-popup)

(magit-define-popup piper-popup
   "Show popup buffer featuring shell-like piper commands."
   'piper-commands
   :actions  '((?f "Flush"        flush-lines)
               (?k "Keep"         keep-lines)
               (?c "Cut Columns"  columns-lines)
               (?s "Sort"         sort-lines)
               (?u "Unique Lines" uniquify-lines)
               (?! "Next Pipe"    pipe))
   :default-action 'pipe)

Getting the details correct is left as an exercise to the Reader, as I am currently months into my experiment with Spacemacs.

With Spacemacs, this is much simpler, especially since many of these sorts of commands are already bound under the prefix SPC x l (for text lines). To sort the buffer, type SPC x l s; to reverse sort it, type SPC u SPC x l s. That may seem like a lot of typing, but SPC u is just Spacemacs’ way of giving the prefix argument (C-u). Let’s do more:

(spacemacs/set-leader-keys "x l !" 'pipe)
(spacemacs/set-leader-keys "x l f" 'flush-lines)
(spacemacs/set-leader-keys "x l k" 'keep-lines)

Grab my current bindings here.

Summary

In discussing this with Ken, he made the astute observation that instead of manipulating flat textual data, the first step should be to parse and render the data into a data structure within Emacs Lisp, and then transform and filter the data using normal programming techniques.

If the parsing is trivial, this makes perfect sense, as filtering with Dash’s -filter function is far preferable to grep. The openstack CLI has a little-known option, --format, for spitting out parseable data in JSON and other formats, but perhaps Ken is correct when he wrote to me:

inspired by the R statistical language, which possesses sophisticated facilities for transforming input data into R’s canonical data structure representation, then operates on that data structure until it produces a result. R deliberately distinguishes the ingestion, processing, and presentation phases, which is a useful mental model for what you hope to accomplish…

What approach would be less effort and greater reward in the long run? How much developer effort would be expended writing library code for each phase in order to parse it with a real language? It might be a significant challenge to build abstractions for handling the myriad text formats we encounter day-to-day, but would that be more effort than wiring together sequences of shell tool equivalents for each text format?

…I see a lot of value in a library that could intelligently parse common tabular data into an S-expression. OpenStack’s table format, for example, is an elaborate variation on CSV and it’s worth evaluating whether an extension to csv.el could close the gap.
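When a tool can already emit JSON, as the openstack --format option mentioned earlier does, the "ingest, then process" idea might look something like the following sketch. The sample JSON and its field names are invented for illustration rather than captured from a real cluster, my/active-server-names is a hypothetical helper, and the built-in seq-filter stands in for Dash’s -filter to avoid an extra dependency:

```elisp
(require 'json)
(require 'seq)

;; Hypothetical sample of JSON-formatted output; the field names are
;; assumptions for illustration, not captured from a real cluster.
(defvar my/sample-server-json
  "[{\"Name\": \"wpc-tempest\", \"Status\": \"ACTIVE\"},
    {\"Name\": \"wpc-compute\", \"Status\": \"SHUTOFF\"}]")

(defun my/active-server-names (json-text)
  "Return the names of the ACTIVE servers described by JSON-TEXT."
  (let* ((json-array-type 'list)        ; arrays become lists
         (json-object-type 'alist)      ; objects become alists
         (json-key-type 'symbol)        ; keys become symbols
         (servers (json-read-from-string json-text)))
    (mapcar (lambda (server) (alist-get 'Name server))
            (seq-filter (lambda (server)
                          (equal (alist-get 'Status server) "ACTIVE"))
                        servers))))

(my/active-server-names my/sample-server-json)
```

Once the data is a list of alists, filtering on any field is a one-line predicate instead of a carefully anchored regular expression.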

Converting shell output formatted as a table into a list-of-lists data structure is pretty trivial:

(defun table-to-lists (delimiter &optional trim)
  "Convert the contents of the buffer (or the active region) into a
list of lists, splitting each line on the regexp DELIMITER.  If
TRIM is non-nil, trim whitespace from each column element."
  (let ((start (if (region-active-p) (region-beginning) (point-min)))
        (end   (if (region-active-p) (region-end)       (point-max))))
    (mapcar (lambda (line)
              ;; `split-string' expects a regexp, not t, as its TRIM argument.
              (split-string line delimiter nil (when trim "[ \t]+")))
            ;; Omit the empty string that follows a trailing newline.
            (split-string (buffer-substring-no-properties start end) "\n" t))))
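As a hedged illustration of where this could go for OpenStack’s bordered tables specifically, here is a self-contained variant that works on a string. The name my/parse-ascii-table is hypothetical, and the sketch assumes border lines start with + and cells are separated by |:

```elisp
;; Sketch: parse an OpenStack-style ASCII table held in a string.
;; `my/parse-ascii-table' is a hypothetical helper.
(defun my/parse-ascii-table (text)
  "Convert the ASCII table in TEXT into a list of rows.
Border lines (those starting with `+') are dropped, and each cell
is trimmed of surrounding whitespace."
  (let (rows)
    (dolist (line (split-string text "\n" t))
      (unless (string-prefix-p "+" line)
        ;; OMIT-NULLS drops the empty strings outside the outer bars.
        (push (split-string line "|" t "[ \t]+") rows)))
    (nreverse rows)))

(my/parse-ascii-table
 (concat "+----+-------------+--------+\n"
         "| ID | Name        | Status |\n"
         "+----+-------------+--------+\n"
         "| b0 | wpc-tempest | ACTIVE |\n"
         "+----+-------------+--------+\n"))
```

One caveat of this shortcut: because empty strings are omitted, a genuinely empty cell would also vanish and shift the columns after it, so a sturdier version would strip the outer bars explicitly.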

For the command line tools we use daily, creating these sorts of converters could be a big advantage, and in that case the literate programming concepts I’ve described before could really be helpful.

However, this essay is about a workflow enhancement for general (almost single-use) commands. Clearly, this is also an experiment, and I’m not sure I will actually finish it. Still, I’m intrigued with this new workflow and would love to discuss these ideas.

Footnotes:

1

While calling shell-command always puts the output in this *Shell Command Output* buffer, it only displays this buffer automatically if the output is large. After running the command, simply call switch-to-buffer to visit it.

Note: If you ran the command in a subprocess asynchronously (either by appending an & character, or by calling async-shell-command), the output will be held in a different buffer, called *Async Shell Command*.