This project extracts text from a list of URLs and analyzes it with Python. Web scraping retrieves each article's content, and natural language processing (NLP) then computes sentiment and readability metrics for it. The computed variables include Positive Score, Negative Score, Polarity Score, Subjectivity Score, and FOG Index, among others, and together they describe the sentiment and reading complexity of each extracted article.
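As an illustration of how such metrics can be computed, here is a minimal sketch. It is not the project's actual script: the word lists are tiny hypothetical placeholders, the syllable counter is a rough vowel-group heuristic, and the Polarity and Subjectivity formulas are assumed definitions (score difference over score sum, and sentiment words over total words). The FOG Index follows the standard Gunning formula, 0.4 * (average sentence length + percentage of complex words), where a complex word is assumed to be one with three or more syllables.

```python
import re

# Hypothetical placeholder lexicons; a real run would load full word lists.
POSITIVE_WORDS = {"good", "great", "excellent", "gain"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "loss"}

EPSILON = 1e-6  # avoids division by zero when no sentiment words are found


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def analyze(text):
    """Return a dict of sentiment and readability metrics for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text contains no words")

    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)

    # Assumed definitions; the project description does not spell them out.
    polarity = (positive - negative) / (positive + negative + EPSILON)
    subjectivity = (positive + negative) / (len(words) + EPSILON)

    # Gunning fog: 0.4 * (words per sentence + percent of complex words).
    complex_words = sum(count_syllables(w) >= 3 for w in words)
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * complex_words / len(words)
    fog_index = 0.4 * (avg_sentence_length + percent_complex)

    return {
        "positive_score": positive,
        "negative_score": negative,
        "polarity_score": polarity,
        "subjectivity_score": subjectivity,
        "fog_index": fog_index,
    }


if __name__ == "__main__":
    sample = "The results were good. The loss was bad and terrible."
    for name, value in analyze(sample).items():
        print(f"{name}: {value:.3f}")
```

Polarity ranges from -1 (all negative) to +1 (all positive), and a higher FOG Index indicates text that is harder to read.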
Key Features:
Data Extraction: Utilized Python libraries like BeautifulSoup, Selenium, or Scrapy for efficient and accurate data extraction from web articles.
Text Analysis: Implemented NLP techniques to compute sentiment and readability metrics such as Positive and Negative Scores, Subjectivity Score, and FOG Index.
Automated Workflow: Developed a Python script that automates data extraction and analysis and writes the results in the required output structure.
Documentation: Included detailed instructions on running the script and replicating the analysis.
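The extraction and output steps listed above can be sketched as follows. This is a hedged, dependency-free illustration rather than the project's script: it uses the standard library's html.parser as a stand-in for BeautifulSoup, assumes the article text lives in h1 and p tags, and parses an HTML string instead of fetching a URL. The names ArticleTextParser, extract_article_text, and rows_to_csv, and the column names in the usage example, are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects text found inside <h1> and <p> tags and ignores the rest.

    Limitation: assumes <p> tags are explicitly closed; pages that rely on
    implicit closing would need a more forgiving parser.
    """

    CONTENT_TAGS = ("h1", "p")  # assumption: article text lives in these tags

    def __init__(self):
        super().__init__()
        self._depth = 0   # how many content tags are currently open
        self.chunks = []  # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.CONTENT_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html):
    """Return the title and paragraph text of an HTML page as one string."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    page = (
        "<html><body><nav>Menu</nav><h1>Sample Title</h1>"
        "<p>First <b>bold</b> paragraph.</p></body></html>"
    )
    text = extract_article_text(page)
    row = {"URL": "https://example.com/article", "WORD COUNT": len(text.split())}
    print(rows_to_csv([row], ["URL", "WORD COUNT"]))
```

In a full pipeline, the HTML for each URL would be downloaded first, and each row would carry the computed metrics alongside the URL.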
Technologies Used:
Python, BeautifulSoup, Selenium, Scrapy, NLP, pandas, Excel/CSV