Browser Extension with Notion database Integration — Building ArXiv Enhanced (1)

ArXiv Enhanced is a browser extension for adding comments, tags, and other details to an arXiv paper and storing them in a Notion database. This article covers tips on building the Chrome plugin, building with webpack, and touches on the Notion API. In a later article I will discuss other features of the extension: adding notes to PDFs, highlighting, and OAuth.

Akash Raj
4 min read · Nov 27, 2021
Notion database showing the various columns

TL;DR
Implementing a Chrome plugin that connects with a Notion database

1. Problem

One of my current roles is deep learning researcher. Deep learning (DL) is a fast-paced research field, so to keep abreast of advances in DL/AI I start most of my days scanning arXiv for innovations (in sound and vision topics). To keep up with the glut of research, for a long time I maintained a Google Sheet with the headings [title, paper link, what’s novel, read again]. A few months ago, I started using Notion and instantly fell in love with the tool.

Tracking read papers, highlighting important bits, and adding notes became time-consuming. I explored whether I could automate parts of it and decided to write a Chrome plugin.

MVP Requirements

  1. Add comments, tags, and a checkbox (completed reading)
  2. Only submit the API call if the active tab is https://arxiv.org/pdf/*.pdf
  3. If there is already an entry, populate the extension popup with the previously filled details
  4. Scrape https://arxiv.org/abs/* for the paper title
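
Requirements 2 and 4 both depend on the URL of the active tab. A minimal sketch of that check follows; the helper names `isArxivPdfUrl` and `toAbstractUrl` are hypothetical and not taken from the extension's source.

```javascript
// Hypothetical helpers, for illustration only.

// True only for arXiv PDF links such as https://arxiv.org/pdf/1706.03762.pdf
function isArxivPdfUrl(url) {
  return /^https:\/\/arxiv\.org\/pdf\/.+\.pdf$/.test(url);
}

// Maps a PDF link to its abstract page, or returns null for any other URL.
function toAbstractUrl(pdfUrl) {
  if (!isArxivPdfUrl(pdfUrl)) return null;
  return pdfUrl
    .replace("https://arxiv.org/pdf/", "https://arxiv.org/abs/")
    .replace(/\.pdf$/, "");
}
```

The popup could call `isArxivPdfUrl` before enabling its submit button, so no request is sent from unrelated tabs.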

2. Integration with Notion

Notion is quite handy for managing projects, personal life, to-do lists, article writing, etc., and it supports collaboration. For this article, I will assume a private database (table); however, it can also be shared with other accounts or made public, which allows multiple people to track and use the database.

Step 1: Setting up and sharing the database
Notion allows app access through the Notion API. Here we want to set up a database (table) that records the arXiv paper details. (a) Create an integration named “ArXiv Enhanced Database” in my-integrations. (b) Next, create a full-page table with the necessary columns; this table is the database. The ID of the database is in the URL: https://notion.so/workspace/<database_id>. (c) Share the database (full-page table) with the integration using the Share button. Check out [1] for a more detailed explanation of creating an integration.
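
As a small sketch, the database ID can be pulled out of that URL programmatically. The helper name `extractDatabaseId` is hypothetical, and the code assumes the ID is the 32-character hexadecimal string at the end of the last path segment, with any `?v=...` view parameter ignored.

```javascript
// Hypothetical helper: extracts the database ID from a Notion database URL.
// Assumes the ID is a 32-character hex string ending the last path segment.
function extractDatabaseId(databaseUrl) {
  // pathname excludes the query string, so "?v=..." is dropped automatically.
  const { pathname } = new URL(databaseUrl);
  const lastSegment = pathname.split("/").filter(Boolean).pop() || "";
  const match = lastSegment.match(/[0-9a-f]{32}$/i);
  return match ? match[0] : null;
}
```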

Step 2: Notion JavaScript Client
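A rough sketch of what this step can look like with the official `@notionhq/client` package: adding a row comes down to calling `notion.pages.create` with the database ID and a `properties` object. The helper `buildPaperRow` and the column names (`Title`, `Link`, `Comments`, `Tags`, `Read`) are assumptions for illustration and must match the columns of your own table.

```javascript
// Builds the request body for notion.pages.create().
// Column names are assumptions; rename them to match your table.
function buildPaperRow(databaseId, paper) {
  return {
    parent: { database_id: databaseId },
    properties: {
      Title: { title: [{ text: { content: paper.title } }] },
      Link: { url: paper.link },
      Comments: { rich_text: [{ text: { content: paper.comments } }] },
      Tags: { multi_select: paper.tags.map((name) => ({ name })) },
      Read: { checkbox: paper.read },
    },
  };
}

// Usage with the official client (needs network access and a valid token;
// NOTION_SECRET and DATABASE_ID are placeholders):
//   const { Client } = require("@notionhq/client");
//   const notion = new Client({ auth: NOTION_SECRET });
//   await notion.pages.create(buildPaperRow(DATABASE_ID, paper));
```

Keeping the payload construction in a plain function means it can be checked without calling the API.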

3. Chrome Plugin Development

The main concepts of a browser extension are background script, content script, UI elements and message passing.

  • Background scripts respond to browser events and perform certain actions (for example fetching data).
  • “Content scripts are files that run in the context of web pages”. They can read details of the page, such as changes made to it.
  • UI elements include a browser popup page (shown when the extension is clicked) and an options page.
  • Since content scripts run within the context of a web page, they require a mode of communication with the rest of the plugin. This is message passing.

For the current plugin, we require a popup which has 2 input fields (comments and tags) and a checkbox to indicate whether the paper has been read. I use React and Chakra UI for this.

Scraping the title of the arXiv paper:
For a PDF URL on arXiv, there is a corresponding abs URL that contains the abstract and title. We can fetch the title as follows in the background script:

// ***** in the background script *****
import axios from "axios";

async function getPaperTitle(link) {
  // Strip the trailing ".pdf" and switch the path from /pdf/ to /abs/.
  let arxivAbstractLink = link.substring(0, link.length - 4);
  arxivAbstractLink = arxivAbstractLink.replace('pdf', 'abs');
  try {
    const response = await axios.get(arxivAbstractLink);
    const htmlString = response.data;
    const result = htmlString.match(/<title>(.+)<\/title>/);
    // result[1] is of the form "[xxxx.xxxxx] paper title"
    return result[1].split("] ")[1];
  } catch (error) {
    // Covers network failures and a missing <title> match.
    console.error(error);
    return "";
  }
}
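
Separating the parsing from the network call makes the title extraction easy to check in isolation. The sketch below is one way to do that; `parseArxivTitle` is a hypothetical helper, and the `[id] title` layout of the `<title>` tag is assumed from the comment in the snippet above.

```javascript
// Hypothetical helper: pulls the paper title out of an arXiv abstract page.
// Assumes the <title> tag looks like "[1706.03762] Attention Is All You Need".
function parseArxivTitle(htmlString) {
  // [\s\S] also matches newlines, in case the title wraps across lines.
  const match = htmlString.match(/<title>([\s\S]*?)<\/title>/i);
  if (!match) return "";
  // Drop the leading "[id] " prefix and collapse runs of whitespace.
  return match[1]
    .replace(/^\s*\[[^\]]*\]\s*/, "")
    .replace(/\s+/g, " ")
    .trim();
}
```

With this in place, the fetching function only needs to pass `response.data` to the parser and return the result.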

Message Passing between background script and popup:
We will use a long-lived connection to communicate between the background script and the popup. In the background script, we add a listener to the connection's onMessage event. On receiving a message from the popup (get-current-url), the background script gets the URL of the current tab, fetches the title of the current arXiv paper, and posts it back over the same connection.

// ***** in the popup script *****
let connection = chrome.runtime.connect({ name: "Connection" });
// Ask the background script for the title of the paper in the current tab.
connection.postMessage("get-current-url");
connection.onMessage.addListener(function (message) {
  let title = message;
});

// ***** in the background script *****
async function getCurrentTab() {
  let queryOptions = { active: true, currentWindow: true };
  let [tab] = await chrome.tabs.query(queryOptions);
  return tab;
}

chrome.runtime.onConnect.addListener(function (connection) {
  connection.onMessage.addListener(function (msg) {
    getCurrentTab()
      .then((tab) => getPaperTitle(tab.url))
      .then((title) => connection.postMessage(title))
      .catch((error) => console.error(error));
  });
});

4. Parcel / Webpack

We can build the previously discussed features using vanilla JavaScript, as the extension expects, but we would lose the convenience of using packages. Since we use quite a few libraries (and React), we need to bundle the code before loading the extension in Chrome or other browsers.

In this process I discovered Parcel, which I think is awesome. It also has a web-extension config, which currently works only for Manifest V2 (V3 support is under development: issue #6079). For Manifest V3, we can instead build the entries separately: popup.html, popup.js, and background.js.

An alternative is to use webpack. Looking around I found this super useful boilerplate repository: chrome-extension-boilerplate-react, which works like a charm.

5. Conclusion and Future Work

  • At the moment, I authenticate with Notion using the integration's secret key, which is stored in localStorage. However, this is not safe. In the next article, I will look at using Notion’s OAuth for authorization.
  • Text highlighting and adding notes to the PDF (and storing them in the database)
  • Creating arXiv links for the referenced papers

References

  1. https://developers.notion.com/docs/getting-started
  2. GitHub repo: https://github.com/akashrajkn/arxiv-enhanced
