Browser Extension with Notion database Integration — Building ArXiv Enhanced (1)
ArXiv Enhanced is a browser extension for adding comments, tags and other details to an arXiv paper and storing them in a Notion database. This article covers tips on building the Chrome plugin, bundling it with webpack, and touches upon the Notion API. In a later article I will discuss other features of the extension: adding notes to the pdf, highlighting, and OAuth.
TL;DR
Implementing a Chrome plugin that connects with a Notion database
1. Problem
One of my current roles is deep learning researcher. Deep learning (DL) is a fast-paced research field, so to keep abreast of advances in DL/AI I start most of my days scanning arXiv for innovations (in sound and vision topics). To keep up with the glut of research, for a long time I maintained a Google Sheet with the headings [title, paper link, what’s novel, read again]. A few months ago, I started using Notion and instantly fell in love with the tool.
Tracking the papers I had read, highlighting important bits, and adding notes became time consuming. I explored whether I could automate parts of it, and decided to write a Chrome plugin.
MVP Requirements
- Add comments, tags, and a checkbox (completed reading)
- Only submit the API call if the active tab is https://arxiv.org/pdf/*.pdf
- If there is already an entry, populate the extension popup with the previously filled details
- Scrape https://arxiv.org/abs/* for the paper title
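The URL requirements above can be sketched as two small helpers. This is only an illustration: the function names isArxivPdfUrl and pdfToAbsUrl are hypothetical, and the check assumes the pdf URL always ends in .pdf, as in the pattern listed above.

```javascript
// Hypothetical helper: true only for URLs of the form https://arxiv.org/pdf/*.pdf
function isArxivPdfUrl(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a parseable URL
  }
  return (
    parsed.protocol === "https:" &&
    parsed.hostname === "arxiv.org" &&
    /^\/pdf\/.+\.pdf$/.test(parsed.pathname)
  );
}

// Hypothetical helper: maps a pdf URL to the matching abs URL, or null.
function pdfToAbsUrl(url) {
  if (!isArxivPdfUrl(url)) return null;
  const { pathname } = new URL(url);
  // Drop the leading "/pdf/" and the trailing ".pdf" to get the paper id.
  const paperId = pathname.slice("/pdf/".length, -".pdf".length);
  return `https://arxiv.org/abs/${paperId}`;
}
```

The popup could call isArxivPdfUrl on the active tab's URL before making any API call, and use pdfToAbsUrl to find the page to scrape for the title.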
2. Integration with Notion
Notion is handy for managing projects, personal life, to-do lists, writing articles, etc., and it supports collaboration. For this article I will assume a private database (table); however, it can also be shared with other accounts or made public, which allows multiple people to track and use the database.
Step 1: Setting up and sharing the database
Notion allows app access through the Notion API. Here we want to set up a database (table) that records the arXiv paper details. (a) Create an integration named “ArXiv Enhanced Database” in my-integrations. (b) Next, create a full-page table with the necessary columns; this table is the database. The ID of the database is in the URL: https://notion.so/workspace/<database_id>. (c) Share the database (full-page table) with the integration using the Share button. Check out [1] for a more detailed explanation of creating an integration.
Step 2: Notion javascript Client
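As a rough sketch of what talking to the database could look like, the snippet below calls the Notion REST endpoints directly with fetch rather than going through a client library. The column names (Title, Link, Comments, Tags, Read) and all function names are assumptions made for this example; they must match the columns of your own table.

```javascript
const NOTION_API = "https://api.notion.com/v1";
const NOTION_VERSION = "2022-06-28"; // API version header value assumed here

// Builds the "properties" object for a new row. Column names are assumptions.
function buildPaperProperties({ title, link, comments, tags, read }) {
  return {
    Title: { title: [{ text: { content: title } }] },
    Link: { url: link },
    Comments: { rich_text: comments ? [{ text: { content: comments } }] : [] },
    Tags: { multi_select: (tags || []).map((name) => ({ name })) },
    Read: { checkbox: Boolean(read) },
  };
}

// Builds a query body that looks up an existing row by its Link column.
function buildLinkFilter(link) {
  return { filter: { property: "Link", url: { equals: link } } };
}

// Small helper for authenticated POST requests to the Notion API.
async function notionPost(secret, path, body) {
  const response = await fetch(`${NOTION_API}${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${secret}`,
      "Notion-Version": NOTION_VERSION,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Notion API error: ${response.status}`);
  }
  return response.json();
}

// Adds a paper as a new row of the database.
function createPaperEntry(secret, databaseId, paper) {
  return notionPost(secret, "/pages", {
    parent: { database_id: databaseId },
    properties: buildPaperProperties(paper),
  });
}

// Returns the rows whose Link column equals the given URL (possibly none).
async function findPaperEntries(secret, databaseId, link) {
  const data = await notionPost(
    secret,
    `/databases/${databaseId}/query`,
    buildLinkFilter(link)
  );
  return data.results;
}
```

Keeping the payload builders separate from the network calls makes it easy to check the request shape without touching the API. findPaperEntries covers the "already an entry" requirement: if it returns a row, the popup can be pre-filled from that row's properties.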
3. Chrome Plugin Development
The main concepts of a browser extension are background script, content script, UI elements and message passing.
- Background scripts respond to browser events and perform certain actions (for example fetching data).
- “Content scripts are files that run in the context of web pages”. They can read details of the page, such as its DOM, and make changes to it.
- UI elements include a browser popup page (shown when the extension icon is clicked) and an options page.
- Since content scripts run within the context of a web page, they need a way to communicate with the rest of the plugin. This is done through message passing.
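To make the message passing idea concrete, here is a minimal sketch of a one-time message from a content script to the background script. The message type "page-title" and the handleMessage function are hypothetical names for this illustration; the extension itself uses a long-lived connection between popup and background script, described later.

```javascript
// Hypothetical message type used only in this sketch.
const PAGE_TITLE = "page-title";

// Pure routing logic, kept separate from the chrome.* wiring so it can be
// exercised outside the browser.
function handleMessage(message) {
  if (message && message.type === PAGE_TITLE) {
    return { ok: true, title: message.title };
  }
  return null; // unknown message: no reply
}

// Background script side: register the listener when the extension API exists.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    const reply = handleMessage(message);
    if (reply) sendResponse(reply);
  });
}

// Content script side (runs inside the web page):
//   chrome.runtime.sendMessage({ type: "page-title", title: document.title });
```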
For the current plugin, we require a popup which has 2 input fields (comments and tags) and a checkbox to indicate whether the paper has been read. I use React and Chakra UI for this.
Scraping the title of the arXiv paper:
For every pdf URL on arXiv, there is a corresponding abs URL which contains the abstract and title. We can fetch the title in the background script as follows:
// ***** in the background script *****
import axios from "axios";

async function getPaperTitle(link) {
  // Strip the trailing ".pdf", then switch from the pdf page to the abs page
  let arxivAbstractLink = link.substring(0, link.length - 4);
  arxivAbstractLink = arxivAbstractLink.replace("/pdf/", "/abs/");
  try {
    const response = await axios.get(arxivAbstractLink);
    const htmlString = response.data;
    const result = htmlString.match(/<title>(.+)<\/title>/);
    // result[1] is of the form "[xxxx.xxxxx] paper title"
    return result[1].split("] ")[1];
  } catch (error) {
    // Covers network failures as well as a page without a <title> match
    console.error(error);
    return "";
  }
}
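One caveat with splitting on "] " is that a title which itself contains a closing bracket followed by a space gets cut short. A possible hardening, sketched below under the assumption that the page title always starts with the bracketed arXiv id, is to strip only that leading prefix. The function name extractPaperTitle is hypothetical.

```javascript
// Hypothetical helper: pulls the paper title out of the abs page HTML.
function extractPaperTitle(htmlString) {
  // Non-greedy match that also tolerates newlines inside <title>.
  const match = /<title>([\s\S]*?)<\/title>/i.exec(htmlString);
  if (!match) return "";
  // Collapse runs of whitespace, then drop the leading "[id] " prefix only.
  const text = match[1].replace(/\s+/g, " ").trim();
  return text.replace(/^\[[^\]]+\]\s*/, "");
}
```

With this helper, getPaperTitle would only need to fetch the page and pass response.data through extractPaperTitle.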
Message Passing between background script and popup:
We use a long-lived connection to communicate between the background script and the popup. In the background script, we add an onMessage listener on the connection. On receiving a message from the popup (get-current-url), the background script gets the URL of the current tab and fetches the title of the current arXiv paper.
// ***** in the popup script *****
let connection = chrome.runtime.connect({ name: "Connection" });
// Ask the background script for the title of the paper in the active tab
connection.postMessage("get-current-url");
connection.onMessage.addListener(function (message) {
  let title = message;
});

// ***** in the background script *****
async function getCurrentTab() {
  let queryOptions = { active: true, currentWindow: true };
  let [tab] = await chrome.tabs.query(queryOptions);
  return tab;
}

chrome.runtime.onConnect.addListener(function (connection) {
  connection.onMessage.addListener(function (msg) {
    if (msg !== "get-current-url") return;
    getCurrentTab()
      .then((tab) => getPaperTitle(tab.url))
      .then((title) => connection.postMessage(title));
  });
});
4. Parcel / Webpack
We could write the features discussed above in vanilla JavaScript, which is what the extension runtime expects, but we would lose the convenience of using packages. Since we use quite a few libraries (and React), the code needs to be bundled before the extension can be loaded in Chrome or other browsers.
In the process I discovered Parcel, which I think is awesome. It also has a web-extension config, which currently works only for Manifest V2 (V3 support is under development: issue #6079). For this case, we can build the entries popup.html, popup.js, and background.js separately.
An alternative is to use webpack. Looking around I found this super useful boilerplate repository: chrome-extension-boilerplate-react, which works like a charm.
5. Conclusion and Future Work
- At the moment, I authenticate with Notion using the secret_key method, storing the secret_key in localStorage. However, this is not safe. In the next article, I will look at using Notion’s OAuth for authorization
- Text highlighting and addition of notes to the pdf (and storing them in the database)
- Creating arXiv links for the Reference papers