I am writing a web scraper in Python. I have figured out how to write the output to a CSV file, but I'm not getting the number of links I expect. I need to make sure I'm translating relative URLs to absolute URLs and filtering out duplicates, but I'm not sure I have the code to do so. Here's my code below.

from bs4 import BeautifulSoup
import requests
import csv
import re

url = 'https://'  # the rest of the URL is missing from the post

#setting up the connection and grabbing the page
r = requests.get(url).content

#passing the HTML through a parser
soup = BeautifulSoup(r, 'lxml')

#extracting urls
data = []
for link in soup.find_all('a', href=True):
    print(link['href'])
    data.append(link['href'])
print(data)

#writing to a csv file
with open('assignment1.csv', 'w', newline='') as f:
    write = csv.writer(f, delimiter=' ')
    #writerow takes a single row, so the header is written as one field
    write.writerow(['Links'])
    #each link is wrapped in a list so it is written as one field, not one character per field
    write.writerows([link] for link in data)
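One possible way to handle the two missing steps (relative-to-absolute conversion and duplicate filtering) is sketched below. It uses only the standard library's `urllib.parse`. The helper name `normalize_links` and the sample data are hypothetical, and the sketch makes two assumptions that may not match the assignment: links that differ only by a `#fragment` are treated as the same page, and only `http`/`https` links are kept.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(hrefs, base_url):
    """Resolve each href against base_url and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for href in hrefs:
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        absolute = urljoin(base_url, href.strip())
        # Assumption: links that differ only by their #fragment are duplicates
        absolute, _fragment = urldefrag(absolute)
        # Assumption: only http(s) links are wanted (skips mailto:, javascript:, ...)
        if not absolute.startswith(("http://", "https://")):
            continue
        if absolute not in seen:
            seen.add(absolute)
            unique.append(absolute)
    return unique


# Hypothetical sample data, not taken from the page in the question
sample = [
    "/about",
    "about",
    "https://example.com/about",
    "#top",
    "mailto:me@example.com",
    "/about#team",
]
links = normalize_links(sample, "https://example.com/")
print(links)
```

In the code above, this could be applied as `data = normalize_links(data, url)` just before the CSV is written, so that each absolute URL is written once.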
