I'm currently trying to write an HTML conversion plugin for Sigil that runs on either Python 3.4 (external) or Sigil's bundled Python 3.5+ (internal). As part of the HTML sanitization process I currently use bs4 with Python 3.4, and this works fine.
But when I use sigil_bs4 or gumbo_bs4.parse from the bundled Python, I do not get the same results as with bs4: the code doesn't do the job, and it doesn't give any specific errors either. I'm on Windows 8.
Here is the code. When I use it with bs4 on my Python 3.4 it works fine:
Code:
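from bs4 import BeautifulSoup as bs

# 'file' is the path of the html file being sanitized
with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')
# calling soup() is bs4 shorthand for soup.find_all(), i.e. every tag
for tag in soup():
    # the comma after "name" was missing, which silently produced "namelink"
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]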
Code:
# same code, run with Sigil's bundled Python switched on
from sigil_bs4 import BeautifulSoup as bs
# or: import sigil_gumbo_bs4_adapter as gumbo_bs4

with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')
# or: soup = gumbo_bs4.parse(html)
for tag in soup():
    # the comma after "name" was missing, which silently produced "namelink"
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]
It seems that sigil_bs4 and gumbo_bs4.parse do not produce a callable BS object (one that can be called with no arguments, as in soup()), which is what I need for the above code to work. I have used sigil_bs4 quite successfully elsewhere in my plugin as a line-by-line parser for other formatting during the HTML sanitization process, but never as a callable object as above.
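If the callable form really is the problem, one variant I could try avoids calling the soup object altogether. This is only a minimal sketch: it assumes that whatever sigil_bs4 or gumbo_bs4.parse returns has bs4's usual find_all() method and that each tag has an .attrs dict, which I have not confirmed for the bundled versions. The name strip_attributes is made up for illustration.

```python
def strip_attributes(soup, names):
    """Remove the named attributes from every tag and return how many were removed.

    Assumption: `soup` provides find_all() and each tag exposes an .attrs dict,
    as in standard bs4.
    """
    removed = 0
    # find_all(True) matches every tag, the same set bs4 returns for soup()
    for tag in soup.find_all(True):
        for name in names:
            # only delete attributes that are actually present on this tag
            if name in tag.attrs:
                del tag.attrs[name]
                removed += 1
    return removed
```

Calling strip_attributes(soup, ["lang", "id", "dir", "name", "link"]) and printing the returned count would also show whether any tags are being visited at all, which might help given that there are no error messages to go on.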
Any further suggestions to make this code work for sigil_bs4 or gumbo_bs4 would be greatly appreciated.
This is my first Python plugin (or major Python app of any note).