Reading-For-Programmers
目录
内容提要: 使用三步阅读法(verify(查证)-grasp(领会)-analyse(分解))轻松阅读 数百篇论文. 使用 org-mode (w/ helm-bibtex, org-ref and interleave) 来记笔记. Abstract: using the three-step reading method (verify (verification) - grasp (understand) - analyse (decomposition)) easily read [[https://twitter.com/peel/status/840604048629874688] [hundreds of paper]]. Use the org - mode (w/helm - bibtex, org - ref and interleave) to take notes.
Introduction
*介绍
随着计算资源迅速变得廉价以及分布式编程的兴起,软件商务与学术贴合的越来越紧密. 现在软件得以,而且确实严重依赖于学术的进步. 前沿研究应用于日常代码也开始变得很常见. 我们当然可以忽略代码背后的那些概念,但是也可以拥抱变化,努力适应这一变化. 你可以使用现成的库和开源方案并忽略其内部原理,也可以尝试去理解这些方案的理念,查看其内部实现. 要跟上科学进步的步伐需要有强大的意志力,甚至有时为了阅读一片科技论文需要进行10个阅读步骤. (Following the scientific advancements requires a significant willpower and either way it merely leads toward 10 Stages of Reading a Scientific Paper1.) (跟随着科学的进步需要强大的意志力,不管怎样,阅读一篇科学论文只需要10个阶段。) 不过你大可不必如此,只需要使用某个方法就可以跳过那些令人痛苦的步骤...
Three-pass reading
*三通式阅读
不止一次地,有人提出2, 3 阅读论文 (事实上是阅读任何需要充分理解高级概念的东西) 需要分多步来进行. More than once, someone put forward [[https://codearsonist.com/reading-for-programmers#fn.2], [2]], [[https://codearsonist.com/reading-for-programmers#fn.3], [3]] reading papers (in fact is reading anything need to fully understand the advanced concepts) need more steps. 其主要思想是,在每次阅读论文时,你的头脑中都需要有一个明确的目标. 第一步,你需要查证这篇文章及其内容; 第二步,你需要领会文章的大意; 第三步,你完全理解这些概念 and get your notes flowing. First, you need to verify the article and its contents. Second, you need to get the general idea of the passage; Step three, you fully understand the concepts and get your notes flowing.
First pass - Verify
**首次通过-验证
第一步需要你从高层视角来审视论文的内容(The first pass provides you with a high-level view of the contents.) The first pass provides you with a high-level view of The contents. 它使你不会因为糟糕的写作,不感兴趣的内容,错误的或者不足的知识背景而卡住. 为了充分利用这第一步,我常常从引用文献开始. 虽说根据统计, 80%的作者都不会通读其所引用的文字 4, 但是这些文献在很大程度上能告诉你能从这篇文章中收获到哪些东西. In order to make full use of this first step, I often start from references. According to statistics, 80% of the authors will not read through the reference of text [[https://codearsonist.com/reading-for-programmers#fn.4], [4]], but these documents can tell you can to a great extent to which benefited from this article. 我在开始阅读一篇又长又复杂的论文之间,我都会先标记一下哪些被引用的文章是我之前已经阅读过的,这样方便我查回之前所记录的笔记并且查证论文的背景知识. 当第一次阅读论文时,我会仔细阅读它的标题.摘要和引言. 他们决定了这篇文章是否值得阅读. 然后,我再扫读一遍这篇文章,主要关注它的章节标题,图形和数学公式. 我还会阅读论文的结论和讨论部分(I also read the results and discussion section to have an overview of where I will be lead with the article). I will also read the results and discussion section to have an overview of where I will be lead with the article. 最近风靡HackerNews的论文 "How to read papers" 2 的作者建议你在第一次阅读时应该回答5个以C开头的问题: Recently popular HackerNews paper "How to read the cca shut" [[https://codearsonist.com/reading-for-programmers#fn.2], [2]] in reading for the first time authors suggest that you should begin with C 5 questions:
- Category - 这是那种类型的论文?
Plus Category -- what kind of paper is this?
- Context - 它还与那些论文相关? 论文中使用了那些理论基础来分析问题?
- Context - it's also relevant to those papers, right? What theoretical basis is used in the paper to analyze the problem?
- Correctness - 论文所基于的假设条件是否真实?
Are the assumptions on which the paper is based true?
- Contributions - 这篇文章的主要贡献有哪些?
- Contributions - what are the main Contributions of this article?
- Clarity - 这篇文章写够不够好?
- Clarity - is this article good enough?
就我自己来说,第一次阅读除了让我了解论文的基本脉络之外,还决定了是否要进行第二次的阅读. 我会问我自己:
- 我能从中学到什么?
- 我是否有足够的背景知识来理解这篇文章?
阅读论文诚然可以极大地丰富你的知识,提供实践方案,学到做事的方法论,甚至提供实验设计的概念(experimental design concepts). 但它也要求你有相应的先验知识(背景知识).
Reading papers can greatly enrich your knowledge, provide practical solutions, learn methods of doing things, and even provide experimental design concepts. But it also requires prior knowledge (background knowledge).
当我在第一次阅读时发现有缺失的背景知识时,我会将这篇文章标记为 TODO
, 并增加一个 :advanced:
标签,然后把它放在一边直到我通过阅读它的引用文献或通过Internet补足了背景知识后才再次阅读.
When I find a lack of background on my first reading, I mark the article as TODO
, add a :advanced:
tag, and then put it aside until I've supplemented the background by reading its references or on the Internet.
Second pass - Grasp concepts
**第二遍-掌握概念
第二次阅读的目的在于理解概念及其支持证据. 我一般将论文放在的 Onyx Boox 大屏电子纸阅读器中而不会打印出来.
The purpose of the second reading is to understand the concepts and supporting evidence.
这使得我可以很方便地跟上思维的处理过程(follow thought process), 将注意力放在概念上而不是具体的细节上.
This makes it easy for me to follow the thought process and focus on concepts rather than details.
在阅读的过程中,我会标记,高亮重点并在纸张的边上写下笔记. 之后,我只要快速扫一下就能知道这篇文章的重点了.
我会尝试通过表格和图形展示来加深理解.
当遇到我不能理解的东西时, 要看它是否会影响到后面概念的理解,还是说我可以把它记录下来下次再来学习它.
若会影响到后面概念的理解,则在进行后续的阅读前必须先把它搞定. 所以… 在google上搜索它, 或者翻阅之前的笔记, 总之要把它学会.
If it will affect the understanding of the latter concepts, it must be settled before further reading. So... Google it, or flip through your notes, and learn it.
第二次阅读时我还会关注 内联引用(inline references)
. 有些引用是 instant hook, 我会把他们添加到待阅读的论文列表中.
On the second reading I will also focus on = inline references =. Some references are instant hooks and I will add them to the list of papers to read.
这样在第二次阅读之后,我就有了一个满是笔记的,标注过的,高亮过的论文了. 而且我还理解了它的概念并且能将这些理念分享给他人.
不过对于那些很重要的概念来说,这还不够,还需要再阅读第三遍.
Third pass - Critique
**第三关-评论
只有对那些真正感兴趣的论文才需要进行第三阶段的阅读. 第三阶段阅读是对第二阶段阅读的补充, I work my way through the parts I missed previously and try to see the results/discussion in a larger spectrum. 我通过我以前错过的部分努力工作,并试图在更大的范围内看到结果/讨论。 Ie. How does profunctor optics relate to extensive domains we've built? Ie。前函光学与我们已经建立的广泛领域有什么关系? 这第三阶段的阅读也是我有别于某些学者的地方. 俗话说,在评论一片论文之前,你先要能用自己的话复述出这篇论文来. 只有对我的工作至关重要的论文才值得我进行第三阶段的阅读. 我会尝试使用自己的方法来重新论证该论文. 这个过程确实很让人烦躁,但是最终你会发现这样做是值得的. The third pass usually ends up with a bit of code and notes. Or just the notes. But the notes are where it all belongs. 第三步通常以一些代码和注释结束。或者只是笔记。但笔记才是一切的归宿。 Third pass is for the active reader. Even if the article is not core to the things I do and I decide not to fiddle around in REPL I am an active reader. 第三步是主动阅读者。即使这篇文章不是我所做事情的核心,我也决定不在REPL中浪费时间,但我是一个积极的读者。 我标注了更多的内容,记录了更多的笔记. 我不会直接复制论文中的内容,而是使用自己的话来复述,以此保证我理解了它的意思. 我使用大纲来使笔记呈现出结构化的形式来. 我极度关注论文中不连贯的地方. 它的理由是否正确? 案例的数量是否足够?
The Notes
*笔记
多年来,我一直在改进记笔记的流程. 当具有巨量的笔记和论文时,找到相关的笔记就变得越来越困难了. 而且要将它们全都保存在纸质笔记本中也是不可能的. 作为一名Emacs热爱者,我使用无所不能的 org-mode 来记笔记. org-mode 是一个可扩展的 Emacs major mode, 它能用来处理任何与文本/数据相关的事情. As an Emacs enthusiast, I use the omnipotent org-mode to take notes. 借助 org-mode 简洁的语法以及它的树状布局, 很容易就能构建一个简单的,单文件的知识库. With the clean syntax of org-mode and its tree-like layout, it's easy to build a simple, single-file knowledge base.
The workflow
*工作流
这几年来,我在 papers.org
中积攒了大量的笔记, papers.bib
中也保存了大量的文献引用.
Over the years, I have accumulated a large number of notes in papers.org
and a large number of citations in papers.bib
.
这两个文件中包含了大量的标记为TODO的书籍,论文和文章. 通常为了防止在阅读文章过程中太过分心,我会用 org-capture5 模板快速创建一个TODO事项(可能是论文,文章,链接等等)
The two file contains a large number of marked as TODO books, papers and articles. Usually in order to prevent too much in the process of reading the article, I'll use org - the capture [[https://codearsonist.com/reading-for-programmers#fn.5] [5]] template to quickly create a TODO item (could be papers, articles, links, etc.)
我会不时的(通常在阅读另一片论文前)将整个文件中快速创建的TODO事项保存到文献引用中.
From time to time (usually before reading another paper) I will save the TODO items that are quickly created throughout the document to a reference.
Figure 2: Bibtex has been a de-facto standard reference management system for years now 图2:多年来Bibtex一直是事实上的标准参考管理系统
Bibtex has been a de-facto standard reference management system for years now. Hence it 多年来,Bibtex一直是事实上的标准参考资料管理系统。因此它 is perfectly possible to grab all the necessary document details from the Internet. Either by 完全有可能从互联网上获取所有必要的文档细节。通过 searching by name, title, tag or… a pdf file. I usually either drag and drop a downloaded pdf 搜索名字,标题,标签或…一个pdf文件。我通常拖放下载的pdf文件 onto Emacs window with references files so it fetches the data on it's own. Or… just use the 到Emacs窗口与引用文件,以便它获取自己的数据。或者用 beautiful helm-bibtex which allows me to quickly access all the major scientific search 美丽的helm-bibtex,这让我可以快速访问所有主要的科学搜索 engines from arxiv to google scholar. 从arxiv到谷歌scholar的引擎。
Figure 3: helm-bibtex allows quick access to references 图3:helm-bibtex允许快速访问引用
I also turn the capture TODO into a document TODO task in the papers.org itself. However to 在papers.org中,我还将捕获TODO转换为文档TODO任务。然而, keep thing optimised, it gets done using the reference - enter org-ref. A quick shortcut and 保持事情的优化,它完成使用的参考-进入org-ref。一个快速的捷径和 the reference and TODO are now linked. My usual workflow for taking notes starts with the 引用和TODO现在链接了。我通常做笔记的工作流程是从 third pass which I usually do in Emacs' pdf-tools anyway. Running a REPL or a worksheet 我通常在Emacs的pdf-tools中进行第三次传递。运行REPL或工作表 side-by-side with a paper is invaluable. Same goes for taking notes. And guess what, 与论文并列是无价的。记笔记也是一样。你猜怎么着, everything I have done so far enables me to use a single command to link notes to specific 到目前为止,我所做的一切都使我能够使用单个命令将注释链接到特定的 places in a pdf. Enabling interleave mode (M-x interleave, duh) on given subtree (with 在pdf中的位置。在给定的子树上启用交织模式(M-x交织,duh) :INTERLEAVE_PDF: property set) allows that by simply attaching pdf location. And voila: :INTERLEAVE_PDF:属性设置)允许简单地附加pdf位置。瞧:
Figure 4: iterleave allows linking notes to pdf parts 图4:iterleave允许将注释链接到pdf部分
With that at hand I'm able to effectively keep the notes neatly connected to source material. 有了它,我就能有效地把笔记整齐地与原始资料联系起来。 And between each other using org-mode subtree search and tags. 并在彼此之间使用组织模式的子树搜索和标签。
The setup
*的设置
配置十分简单,只需要安装一些包然后设置极少的一些配置项就行了. 我将我的dotfiles连同spacemancs的配置一同存储在github仓库中. Configuration is simple, just install a few packages and set up a few configuration items. I store my dotfiles with spacemancs configuration in the github repository. 请随意取用-dotfiles.codearsonist.com. [[https://dotfiles.codearsonist.com] [please help yourself - dotfiles.codearsonist.com]].
pdf-tools
* * pdf工具
你需要预选安装好这个包后才能让Emacs正确地现实pdf文件. 我直接使用的默认配置,没有做任何其他的更改. You will need to pre-install this package to get Emacs to properly live PDF files. I used the default configuration directly without making any other changes.
org-ref
* * org-ref
org-ref 也只要做很少的一些配置就行了,它这里的配置跟 helm-bibtex
配置要一致:
The org-ref also only has to do a little bit of configuration, which is the same as helm-bibtex
configuration:
# + BEGIN_SRC emacs lisp (setq org-ref-notes-directory "$SOME" (setq org-ref-notes-directory”一些“美元 org-ref-bibliography-notes "$SOME/index.org" org-ref-bibliography-notes“一些美元/ index.org” org-ref-default-bibliography '("$SOME/index.bib") org-ref-default-bibliography”(“一些美元/ index.bib”) org-ref-pdf-directory "$SOME/lib/") org-ref-pdf-directory一些/ lib /美元)
helm-bibtex
* * helm-bibtex
我觉得的应该从 org-ref
的配置中导出 helm-bibtex
的配置. 不过你也可以直接配置 helm-bibtex
:
I think you should export the configuration helm-bibtex
from the configuration org-ref
, but you can also configure the configuration helm-bibtex
:
# + BEGIN_SRC emacs lisp (setq helm-bibtex-bibliography "$SOME/index.bib" ;; where your references are stored (setq helm-bibtex-bibliography " $ /索引。围涎”;;您的引用存储在哪里 helm-bibtex-library-path "$SOME/lib/" ;; where your pdfs etc are stored helm-bibtex-library-path“一些/ lib /美元”;;你的pdf文件等存放在哪里 helm-bibtex-notes-path "$SOME/index.org" ;; where your notes are stored helm-bibtex-notes-path“一些美元/ index.org”;;你的笔记放在哪里 bibtex-completion-bibliography "$SOME/index.bib" ;; writing completion bibtex-completion-bibliography " $ /索引。围涎”;;编写完成 bibtex-completion-notes-path "$SOME/index.org" bibtex-completion-notes-path“一些美元/ index.org” )
interleave
* *交错
没什么好配置的. 在 papers.org
的子树中设置好 :INTERLEAVE_PDF:
属性就行了 🎉️
null
Picking the next paper
*选择下一张纸
As a side note. The Internet is full of papers. Hackernews, Twitter stream, Reddit produce 作为补充说明。互联网上到处都是文件。Hackernews, Twitter流,Reddit农产品 must read items quicker than we will ever be able to follow. From my personal experience 必须阅读项目比我们将能够跟上。就我个人经验而言 though the best source of papers are simply references from other papers. Each specialty 虽然最好的论文来源只是参考其他论文。每个专业 has its own paper 'canon'. Start with them and gradually work your way towards others 有自己的纸“佳能”。从他们开始,慢慢地向其他人靠近 either by following citations (CiteSeer, Google Scholar) or references directly. Keep in mind 可以通过引用(CiteSeer,谷歌Scholar)或直接引用。请记住 that citations number is a pretty good sanity check whenever a paper is getting 无论什么时候,引用数都是一个很好的完整性检查 recommended. 推荐。
Summary
*总结
Armed with a method of reading scientific material I have read numerous deeply technical 以阅读科学材料的方法为武装,我已经阅读了大量深入的技术性材料 papers. Often beyond my usual knowledge level. The approach allows me for improving my 论文。经常超出我通常的知识水平。这种方法使我能够提高我的 reading skills (also see: 6) with each paper I read. The more I read the better my 阅读技巧(参见:[[https://codearsonist.com/reading-for#fn.6[6]])。我读得越多越好 understanding is. I am able to share the knowledge by discussing it with other people. All 的理解是。我能够通过与他人讨论来分享知识。所有 the above is the basic workflow idea I have been working with and find it perfect for my 以上是我一直使用的基本工作流程思想,并发现它非常适合我的工作 needs. There is more to it including automated tag dependency graphing I have 的需求。还有更多,包括自动标签依赖绘图我有 implemented. But that is a separate (long) story… 实现的。但那是另一回事了……
Footnotes: 脚注:
1 Ruben, A. (2016). How to read a scientific paper. Accessed at 07/06/17 [https://codearsonist.com/reading-for#fnr.1][1]]鲁本,A.(2016)。如何阅读科学论文。访问07/06/17
2 Pain, E. (2016). How to (seriously) read a scientific paper. Accessed at 07/06/17 [https://codearsonist.com/reading-for#fnr.2][2]] Pain, E.(2016)。如何(严肃地)阅读一篇科学论文。访问07/06/17
3 Keshav, S. (2013). How to Read a Paper. Accessed at 07/06/17 [https://codearsonist.com/reading-for#fnr.3][3]] Keshav, S.(2013)。如何阅读报纸。访问07/06/17
4 Simkin, M.V. and Roychowdhury V.P. (2002). Read before you cite! Accessed at 07/06/17 [https://codearsonist.com/reading-for#fnr.4][4]] Simkin, M.V.和Roychowdhury V.P.(2002)。在你引用之前阅读!访问07/06/17
5 A quick-access scrapnote-taking utility 5一个快速访问的剪贴簿工具
6 Bayard, P. (2009). How to Talk About Books You Haven't Read. Bloomsbury USA [https://codearsonist.com/reading-for#fnr.6][6]] Bayard, P.(2009)。如何谈论你没读过的书?布卢姆茨伯里派美国