In this article, I'll walk you through our process of creating a Chrome extension to scrape comments from LinkedIn posts. This project emerged from a need to analyze engagement on LinkedIn content more effectively, and while the journey had its challenges, we ultimately developed a functional solution that extracts meaningful data.
The Initial Challenge
LinkedIn's dynamic interface doesn't make it easy to extract comments at scale. Whether you're conducting social media analysis, gathering feedback on company announcements, or researching professional discourse, manually copying comments is impractical for posts with dozens or hundreds of responses.
Our goal was to build a browser extension that could:
- Load all comments on a LinkedIn post
- Extract the comment text along with author information
- Save the data in a structured format for analysis
The full code is available in the project's repository.
Setting Up the Chrome Extension
We started by creating a basic Chrome extension structure with these files:
- manifest.json - Configuration file for the extension
- popup.html - The user interface for our extension
- popup.js - The script that handles user interactions and communicates with the content script
Our manifest.json defined the necessary permissions:
{
  "manifest_version": 3,
  "name": "LinkedIn Comments Scraper",
  "version": "1.0",
  "description": "Extract comments from LinkedIn posts",
  "action": {
    "default_popup": "popup.html",
    "default_icon": {
      "16": "icons/icon16.png",
      "48": "icons/icon48.png",
      "128": "icons/icon128.png"
    }
  },
  "permissions": [
    "activeTab",
    "scripting",
    "downloads"
  ],
  "host_permissions": [
    "https://*.linkedin.com/*"
  ]
}
The popup interface was kept simple: a button to trigger the scraping and a checkbox to enable auto-loading of all comments:
<button id="scrapeButton">Scrape Comments</button>
<div class="option">
  <label>
    <input type="checkbox" id="autoLoadComments" checked>
    Auto-load all comments
  </label>
</div>

First Roadblock: Injecting the Content Script
Our first challenge came when we tried to execute the script on the LinkedIn page. We initially used a background script approach, but ran into issues with Manifest V3 limitations. After several attempts, we simplified our approach to directly inject the script from the popup:
const results = await chrome.scripting.executeScript({
  target: { tabId: tab.id },
  func: scrapeLinkedInPost,
  args: [autoLoadComments]
});

This approach worked better, though we still encountered syntax errors and had to ensure our script was well-formatted.
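For context, here's a minimal sketch of how that call sits in popup.js. The click handler and the chrome.tabs.query lookup are our assumptions about the surrounding code, not an excerpt from the project:

document.getElementById('scrapeButton').addEventListener('click', async () => {
  // Read the user's auto-load preference from the popup checkbox
  const autoLoadComments = document.getElementById('autoLoadComments').checked;

  // Find the active tab so we know where to inject the scraper
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });

  // Inject and run the scraping function directly in the LinkedIn page
  const results = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: scrapeLinkedInPost,
    args: [autoLoadComments]
  });

  console.log('Scrape result:', results[0]?.result);
});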
Building the Comment Scraper Logic
The heart of our extension was the scrapeLinkedInPost function. This function had several key components:
1. Auto-scrolling the Page
LinkedIn loads comments dynamically as you scroll, so we implemented an auto-scroll function:
async function autoScroll() {
  return new Promise((resolve) => {
    const maxScrolls = 20;
    let scrollCount = 0;
    let lastHeight = document.body.scrollHeight;

    const timer = setInterval(() => {
      window.scrollBy(0, 800);
      scrollCount++;

      // Check if we've reached the bottom
      setTimeout(() => {
        const newHeight = document.body.scrollHeight;
        if (newHeight === lastHeight && scrollCount > 3) {
          clearInterval(timer);
          resolve();
        }
        lastHeight = newHeight;
      }, 300);

      // Safety cap: stop after maxScrolls even if the page keeps growing.
      // resolve() may fire both here and in the timeout above; calling it
      // on an already-settled Promise is a harmless no-op.
      if (scrollCount >= maxScrolls) {
        clearInterval(timer);
        resolve();
      }
    }, 600);
  });
}
2. Finding "Load More Comments" Buttons
We needed to click "Load More Comments" buttons to expand the comment section fully:
// Helper function to find buttons by text content
function findButtonsByText(text) {
  const allButtons = document.querySelectorAll('button');
  return Array.from(allButtons).filter(button =>
    button.textContent &&
    button.textContent.toLowerCase().includes(text.toLowerCase())
  );
}
// Using the function to find comment loading buttons
const textButtons1 = findButtonsByText("Load more comments");
const textButtons2 = findButtonsByText("Show more comments");

Each matched button then had to be clicked, with a short pause so newly loaded comments could render before we searched again; a sketch of that loop appears in the Final Implementation section below.

3. Finding Comment Elements
LinkedIn's DOM structure is complex and can change, so we used multiple selectors to identify comments:
const commentSelectors = [
  '.comments-comment-item',
  '[data-test-id^="comments-comment-"]',
  '.scaffold-finite-scroll__content > div',
  '.comments-comments-list > div'
];
let allCommentElements = [];
for (const selector of commentSelectors) {
  try {
    const elements = document.querySelectorAll(selector);
    if (elements.length > 0) {
      allCommentElements = [...allCommentElements, ...Array.from(elements)];
    }
  } catch (e) {
    console.log('Error with comment selector:', selector, e);
  }
}
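Each collected element then had to be parsed into an author, profile URL, and comment text. The selectors in this sketch are illustrative assumptions rather than LinkedIn's actual, stable markup; the real script probed several fallbacks per field:

function extractComment(element) {
  // Placeholder selectors: LinkedIn's class names change frequently,
  // so each field should be probed with multiple fallbacks
  const authorEl = element.querySelector('.comments-post-meta__name-text, a[href*="/in/"]');
  const textEl = element.querySelector('.comments-comment-item__main-content');
  const profileLink = element.querySelector('a[href*="/in/"]');

  return {
    author: authorEl ? authorEl.textContent.trim() : 'Unknown',
    profileUrl: profileLink ? profileLink.href : null,
    text: textEl ? textEl.textContent.trim() : ''
  };
}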
Major Challenges and Solutions
Challenge 1: Invalid CSS Selectors
We initially used jQuery-style :contains() selectors, which aren't supported in standard DOM APIs:
// This doesn't work in standard JavaScript
'button:contains("Load more comments")'

Solution: We created a custom function to find elements by their text content:
function findButtonsByText(text) {
  const allButtons = document.querySelectorAll('button');
  return Array.from(allButtons).filter(button =>
    button.textContent &&
    button.textContent.toLowerCase().includes(text.toLowerCase())
  );
}
Challenge 2: Duplicate Comments
Our initial implementation picked up profile elements as comments and created duplicate entries.
Solution: We improved our comment processing logic:
- Added filtering to exclude profile sections
- Created a tracking system using Map to prevent duplicates
- Extracted profile titles into a separate field
// processedCommentKeys and processedComments are initialized once,
// before the loop over comment elements:
//   const processedCommentKeys = new Map();
//   const processedComments = [];

// Create a unique key for this comment to avoid duplicates
const commentStart = comment.text.substring(0, 30);
const commentKey = `${comment.author}:${commentStart}`;

// Check if we've seen this comment before
if (!processedCommentKeys.has(commentKey)) {
  processedCommentKeys.set(commentKey, true);
  processedComments.push(comment);
}
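The profile-section filtering from the first bullet isn't shown above. Here's a minimal sketch of the kind of heuristic involved (the class names are placeholders, not an excerpt from the project):

function looksLikeProfileSection(element) {
  // Skip anything inside the post author's header, or anything that
  // contains the reply editor instead of comment text (selectors assumed)
  if (element.closest('.feed-shared-actor')) return true;
  if (element.querySelector('.comments-comment-texteditor')) return true;
  return false;
}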
Challenge 3: Download Mechanism
We ran into issues with URL.createObjectURL in the extension context.
Solution: We used a data URI approach instead:
// jsonString is the serialized scrape result, e.g. JSON.stringify(result, null, 2)
chrome.downloads.download({
  url: 'data:application/json;charset=utf-8,' + encodeURIComponent(jsonString),
  filename: `linkedin_post_${Date.now()}.json`,
  saveAs: true
});
Final Implementation and Testing
After resolving these challenges, we had a working extension that successfully scraped LinkedIn post comments. Our process for each scrape was:
- User clicks the "Scrape Comments" button in the extension popup
- The script is injected into the current LinkedIn post page
- The page is scrolled and "Load more comments" buttons are clicked
- Comment elements are identified and processed
- A JSON file is generated and downloaded with the post data and comments
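Putting it all together, the process above maps onto the injected scrapeLinkedInPost function roughly as sketched below. This is a simplified reconstruction, not the exact source: the sleep helper, the loop bound, and collectAndProcessComments (a stand-in for the selector and deduplication logic shown earlier) are our assumptions. Note that because chrome.scripting.executeScript serializes the injected function, the helpers all have to be defined inside it rather than referenced from the popup's scope.

async function scrapeLinkedInPost(autoLoadComments) {
  // Helpers like autoScroll and findButtonsByText (shown earlier) are
  // defined here, inside the injected function, so they serialize with it.
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  if (autoLoadComments) {
    // Step 1: scroll so LinkedIn lazy-loads the comment section
    await autoScroll();

    // Step 2: keep clicking "load more" buttons until none remain
    for (let round = 0; round < 10; round++) {
      const buttons = [
        ...findButtonsByText('Load more comments'),
        ...findButtonsByText('Show more comments')
      ];
      if (buttons.length === 0) break;
      buttons.forEach((button) => button.click());
      await sleep(1500); // give new comments time to render
    }
  }

  // Step 3: collect, extract, and deduplicate comments
  // (collectAndProcessComments is a hypothetical wrapper around the
  // selector loop and Map-based dedup shown earlier)
  const comments = collectAndProcessComments();

  return { url: window.location.href, comments };
}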
The JSON included:
- Post author and content
- Post metadata (timestamp, URL, like count)
- Comments with author name, profile URL, text, and timestamp
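As an illustration, with made-up values and field names chosen for readability, a downloaded file was shaped roughly like this:

{
  "author": "Jane Example",
  "content": "Excited to announce ...",
  "timestamp": "2d",
  "url": "https://www.linkedin.com/posts/...",
  "likeCount": 128,
  "comments": [
    {
      "author": "John Sample",
      "profileUrl": "https://www.linkedin.com/in/john-sample/",
      "text": "Congratulations!",
      "timestamp": "1d"
    }
  ]
}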
Lessons Learned
Throughout this project, we learned several important lessons:
- Browser Extensions Have Limitations: Manifest V3 introduces constraints on how scripts can be executed and communicate.
- DOM Traversal Requires Robustness: LinkedIn's DOM structure can vary, so using multiple selector approaches provides resilience.
- Duplicate Detection is Critical: When scraping content, implementing proper deduplication logic is essential.
- Error Handling Matters: Building in extensive error handling and logging helped identify and fix issues quickly.
- Testing in Real Scenarios: What works in a development environment may fail in the real LinkedIn interface, making thorough testing crucial.
Conclusion
While our LinkedIn comment scraper isn't perfect, it successfully extracts valuable data that would be tedious to collect manually. With each iteration, we improved its reliability and accuracy. The extension now provides a solid foundation for analyzing LinkedIn engagement, though like any web scraping tool, it may require updates as LinkedIn's interface evolves.
This project demonstrates that with persistence and problem-solving, it's possible to build effective tools for extracting and analyzing social media data, even from complex platforms like LinkedIn.