NEWS DATA SUMMARIZER APP

Enter or paste the URL of a news article to extract a summary of its data:


Can't think of or find the URL of a news article?

* Please note that many major news sites cannot be scraped due to site protection, cookies, etc., so if a link does not work, please pick one of the examples below.

Here are a few example URLs you can copy and paste into the form above!

Please visit my portfolio site or my GitHub for the code of this app.

About this app

Previously, I built a simple News Scraper App for the web with Python, Beautiful Soup, and Flask to scrape the latest news from a specific news site.

This time, I built a slightly more advanced version of the app that scrapes data from an individual news article using the Python package newspaper3k. The app is built with Flask and deployed on Google App Engine.

First, when the URL form above captures the URL of a news article, the newspaper3k package downloads the article and parses its data. For form input handling and validation, I used the WTForms and requests libraries to grab the URL entered in the form. Then, from the extracted data, I pull the following fields to render in the first part of the result page:

Title
Published date
Author
Top image (source link)
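
The extraction step above can be sketched roughly as follows. This is a minimal sketch, not the app's actual code: it assumes newspaper3k's `Article` class (`download()`, `parse()`, and the `title`, `publish_date`, `authors`, and `top_image` attributes), and the helper names `is_probably_url`, `fetch_article`, and `extract_fields` are hypothetical. The stdlib URL check only stands in for the WTForms validation the app uses (WTForms ships its own `URL` validator, for example). Returning `None` when parsing fails is one way to handle the protected sites mentioned earlier.

```python
from urllib.parse import urlparse


def is_probably_url(text):
    """Rough check that the submitted text looks like an http(s) URL.

    Hypothetical stand-in for the WTForms validation used in the app.
    """
    parts = urlparse(text.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_article(url):
    """Download and parse an article with newspaper3k; return None on failure."""
    # Imported here so the other helpers can be used without newspaper3k installed.
    from newspaper import Article
    from newspaper.article import ArticleException

    article = Article(url)
    try:
        article.download()
        article.parse()  # raises ArticleException if the download failed
    except ArticleException:
        return None  # e.g. the site blocked the request
    return article


def extract_fields(article):
    """Pick out the fields shown in the first part of the result page."""
    published = article.publish_date  # a datetime, or empty if none was found
    return {
        "title": article.title,
        "published": published.strftime("%Y-%m-%d") if published else "Unknown",
        "authors": ", ".join(article.authors) or "Unknown",
        "top_image": article.top_image,  # URL (source link) of the lead image
    }
```

Keeping the network call (`fetch_article`) separate from the field selection (`extract_fields`) makes the formatting logic easy to check without hitting a live news site.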

At the same time, my app uses the full extracted text of the article to generate a WordCloud. The WordCloud on the result page displays the most frequent words in the extracted news text. The io library keeps the WordCloud image in memory, and base64 converts the resulting bytes to a base64 string so the image can be returned as part of the HTML response and rendered on the page.

* Please note that the WordCloud is currently disabled due to an image storage issue.
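
The word counting and the io/base64 step described above can be sketched as follows, under stated assumptions. `top_words` counts word frequencies with `collections.Counter`, which is roughly the information a word cloud visualizes; the wordcloud package's `WordCloud.generate()` does its own tokenizing and stopword filtering, so its counts may differ (counts like these could also be passed to `WordCloud.generate_from_frequencies()`). `image_to_data_uri` saves an image into an in-memory `io.BytesIO` buffer and base64-encodes the bytes so they can go straight into an `<img>` tag. It accepts any object with a Pillow-style `save(fp, format=...)` method, such as the image returned by `WordCloud.to_image()`. Both function names and the small stopword list are hypothetical.

```python
import base64
import io
import re
from collections import Counter

# A tiny illustrative stopword list (hypothetical; a real app would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def top_words(text, n=10):
    """Return the n most frequent non-stopword words in the article text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


def image_to_data_uri(image, fmt="PNG"):
    """Encode an image as a base64 data URI without writing it to disk."""
    buffer = io.BytesIO()           # in-memory file, so no image storage is needed
    image.save(buffer, format=fmt)  # Pillow-style save(fp, format=...)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/{fmt.lower()};base64,{encoded}"
```

In a Flask view, the returned string could be passed to the template and used directly as an image source, for example `<img src="{{ wordcloud_uri }}">` (the variable name is illustrative). Because the image never touches the filesystem, this approach sidesteps file storage entirely, at the cost of a larger HTML response.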

Lastly, newspaper3k can also run simple natural language processing to extract keywords from the article and produce a summary of its text.

Keywords (WordCloud image)
Summary

The keywords (WordCloud) image and the summary of the news text are displayed in the second part of the result page.
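
The keyword and summary step can be sketched like this. It is a minimal sketch assuming a newspaper3k `Article` that has already been downloaded and parsed, plus NLTK's "punkt" tokenizer data, which `Article.nlp()` relies on. The function name `keywords_and_summary` is hypothetical. Splitting on newlines assumes the summary sentences are newline-separated, which is how newspaper3k joins them; if that differs, the summary string can simply be shown as-is.

```python
def keywords_and_summary(article):
    """Run newspaper3k's built-in NLP on an already parsed Article.

    Assumes download() and parse() succeeded and that NLTK's "punkt"
    tokenizer data is installed, since Article.nlp() relies on it.
    """
    article.nlp()  # fills in article.keywords and article.summary
    keywords = sorted(article.keywords)  # sorted only for a stable display order
    # The summary is one string of sentences separated by newlines;
    # splitting it lets a template render one sentence per line.
    sentences = [s.strip() for s in article.summary.split("\n") if s.strip()]
    return keywords, sentences
```

Sorting the keywords is a display choice: newspaper3k builds its keyword list from a set, so the raw order is not meaningful.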
