Hi all,
For scientific purposes I need to split the downloaded news into a bunch of separate files. For each article in the newspaper I would like to get a file named like this:
[title]_[language]_[date]_[time].txt
I could then archive these files every day.
Each .txt file should contain only the title and the text of the article.
I don't want any images, table of contents, or anything else.
For the NY Times from today, for instance:
NyTimesSub_en_20170112_2045.txt
NyTimesSub_en_20170113_2159.txt
.
.
.
NyTimesSub_en_20170113_1041.txt
NyTimesSub_en_20170113_1045.txt
NyTimesSub_en_20170113_1053.txt
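To make the target more concrete, here is a rough Python sketch of the output step I have in mind. It assumes the articles have already been extracted somehow as title, language code, publication time and plain text; that extraction is exactly the part I don't know how to do. The names article_filename and write_article are just made up for illustration.

```python
import re
from datetime import datetime
from pathlib import Path


def article_filename(title: str, language: str, published: datetime) -> str:
    """Build '[title]_[language]_[date]_[time].txt' from one article's metadata."""
    # Keep only ASCII letters and digits so the name is safe on common filesystems.
    safe_title = re.sub(r"[^A-Za-z0-9]+", "", title) or "untitled"
    return f"{safe_title}_{language}_{published:%Y%m%d}_{published:%H%M}.txt"


def write_article(out_dir: Path, title: str, language: str,
                  published: datetime, text: str) -> Path:
    """Write only the title and the body text to a plain-text file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / article_filename(title, language, published)
    # Title on the first line, a blank line, then the article text; nothing else.
    path.write_text(f"{title}\n\n{text}\n", encoding="utf-8")
    return path
```

For example, article_filename("NyTimesSub", "en", datetime(2017, 1, 13, 10, 41)) gives NyTimesSub_en_20170113_1041.txt. Two articles with the same title and minute would overwrite each other in this sketch, so a real solution would probably need a counter or similar.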
What would be the easiest way to achieve this?
Thanks for any hint!