在Emacs中使用recoll搜索文件

1. Building Recoll
2. Configuring Recoll
3. Using Recoll from Emacs
4. Outro

我知道大多数Emacs hackers都很喜欢grep的简洁性和可用性,但是它也有不好用的时候. 比如在我的Org-mode目录下包含了许多的org文件和PDF文件. 这些文件太多了即使用grep也不够快,而且PDF的结构使之也不能用grep来搜索内容. 在这种情况下,我们需要另一种工具:desktop database.

我是在阅读了John Kitchin一篇关于swish-e 的博文后才了解这个领域的, 然而这个软件在我这一直不能正常运作. 好在,作为对他这篇博文的回应,我在Org-mode的mailing list中发现了另一个工具 - recoll. 在本文中,我将会一步步的指引你在Emacs中使用Recoll.

1 Building Recoll

我假设你使用的是GNU/Linux操作系统, 毕竟在我印象中,这是最适合用来构建软件的操作系统了. 而且我也只用过这个操作系统,要我说其他操作系统上的步骤会很困难.

如果你只想玩玩Recoll的图形端,你可以用下面语句来安装:

sudo apt-get install recoll

然而可惜的是,这个package并不包含 recollq 这个shell工具, 我们需要下载源代码来编译. 当前版本是1.20.6.

1.1 Extract the archive

下载压缩包后, 我在 dired 中打开 ~/Downloads 这儿目录然后按下 & (dired-do-async-shell-command). Emacs会根据tar.gz的后者推测要执行的命令应该是 tar zxvf. 直接按下回车后,压缩包就解压缩到当前目录中了. 事实上,我用 ~/Software/ 来存放那些从tar包中安装的东西,因为我不希望 ~/Downloads 目录下有太多的东西.

1.2 Open ansi-term

在 dired 中进入 recoll-1.20.6/ 目录,然后按下 ` 键,在当前目录下打开一个 *ansi-term* buffer.

下面是相关配置(摘自于我的完整配置):

(defun ora-terminal ()
  "Switch to terminal. Launch if nonexistent."
  (interactive)
  (if (get-buffer "*ansi-term*")
      (switch-to-buffer "*ansi-term*")
    (ansi-term "/bin/bash"))
  (get-buffer-process "*ansi-term*"))

(defun ora-dired-open-term ()
  "Open an `ansi-term' that corresponds to current directory."
  (interactive)
  (let ((current-dir (dired-current-directory)))
    (term-send-string
     (ora-terminal)
     (if (file-remote-p current-dir)
         (let ((v (tramp-dissect-file-name current-dir t)))
           (format "ssh %s@%s\n"
                   (aref v 1) (aref v 2)))
       (format "cd '%s'\n" current-dir)))
    (setq default-directory current-dir)))

(define-key dired-mode-map (kbd "`") 'ora-dired-open-term)

1.3 Configure and make

下面是典型的构建过程.

./configure && make
sudo make install
cd query && make
which recoll
sudo cp recollq /usr/local/bin/

我在5年前完全是个Linux新手,对这些shell命令完全无感. 仅仅运行前两个命令你应该就能构建并安装大部分的软件了,如果你想要学习一些软件的话,你一般需要先通过这两句命令来构建出这些软件来.

在执行第一条命令(事实上仅仅在执行./configure)时报了一个错误说我缺少一个库,我需要先安装好这个库然后重新执行一次 ./configure.

sudo apt-get install libqt5webkit5-dev

我想,下面这条命令应该也行:

sudo apt-get build-dep recoll

2 Configuring Recoll

我仅仅在选择索引目录时才运行图形界面. 默认情况下它使用的是home目录,我不想这样所以把它替换成了 ~/Dropbox/org/. 很明显,通过cron job,我们可以让recoll自动生成索引; 你甚至可以在图形界面中配置自动生成索引的时间.

3 Using Recoll from Emacs

Emacs有大量的选项用于处理shell命令的输出结果. 所以第一步,我需要搞清楚这个shell命令的数据结果是怎么样的. 我们试试用它来输出一系列包含"Haskell"的文件:

recollq -b 'haskell'

之后就可以将该命令的结果通过异步的 ivy-read 界面来展示:

(defun counsel-recoll-function (string &rest _unused)
  "Issue recallq for STRING."
  (if (< (length string) 3)
      (counsel-more-chars 3)
    (counsel--async-command
     (format "recollq -b '%s'" string))
    nil))

(defun counsel-recoll (&optional initial-input)
  "Search for a string in the recoll database.
You'll be given a list of files that match.
Selecting a file will launch `swiper' for that file.
INITIAL-INPUT can be given as the initial minibuffer input."
  (interactive)
  (ivy-read "recoll: " 'counsel-recoll-function
            :initial-input initial-input
            :dynamic-collection t
            :history 'counsel-git-grep-history
            :action (lambda (x)
                      (when (string-match "file://\\(.*\\)\\'" x)
                        (let ((file-name (match-string 1 x)))
                          (find-file file-name)
                          (unless (string-match "pdf$" x)
                            (swiper ivy-text)))))))

这段代码挺简单的:

输入至少3个字符后才开始搜索,这样能避免产生太多的结果.
:dynamic-collection t 意味着每次输入一个新字符都会调用一次 recollq.
在 :action 参数中,我指定一个函数打开选中的文件,然后在该文件中用当前的输入作为参数运行swiper.

4 Outro

有一些东西希望你能注意到:

cd ~/Dropbox/org && du -hs
# 567M .

我的目录下有近半个G的文件,所有这些文件都被索引了,而且在Emacs中每输入一个新字符都需要更新一下文件列表.

如果你还知道什么工具要搞过recoll的(我对它通过 -A 选项的输出内容不是很满意), 定一定分享出来. 而且, 我才发现已经有了一个helm-recoll package了,如果你喜欢Helm的话可以直接使用它.

EMACS-DOCUMENT