Channel: MobileRead Forums - Reading and Management

Problems using sigil_bs4 and gumbo_bs4.parse

I'm currently trying to write an HTML conversion plugin for Sigil that runs on either Python 3.4 (external) or Sigil's bundled Python 3.5+ (internal). As part of the HTML sanitization process I currently use bs4 with Python 3.4, and this works fine.

But when I use sigil_bs4 or gumbo_bs4.parse from the bundled Python, I do not get the same results as with bs4: it simply doesn't work. Here is the code.

When I use this code with bs4 on my Python 3.4, it works fine:

Code:

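from bs4 import BeautifulSoup as bs

# 'file' is the path to the HTML file being sanitized
with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')

# soup() is bs4 shorthand for soup.find_all(): it returns every tag
for tag in soup():
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]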
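
A side note on the attribute list, in case it matters: Python joins adjacent string literals, so two entries written as "name" "link" with no comma between them become the single string "namelink", and neither name nor link would be removed. A quick check (the variable names here are only for illustration):

```python
# Adjacent string literals are concatenated by the Python compiler,
# so a missing comma silently merges two list entries into one.
without_comma = ["lang", "id", "dir", "name" "link"]
with_comma = ["lang", "id", "dir", "name", "link"]

print(len(without_comma), without_comma[-1])  # 4 namelink
print(len(with_comma), with_comma[-2:])       # 5 ['name', 'link']
```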

But when I write the same code using sigil_bs4 or gumbo_bs4.parse with the bundled Python switched on, it doesn't do the job and also doesn't give any specific errors.

Code:

from sigil_bs4 import BeautifulSoup as bs
# or: import sigil_gumbo_bs4_adapter as gumbo_bs4

with open(file, 'rt', encoding='utf-8') as f:
    html = f.read()
soup = bs(html, 'html.parser')
# or: soup = gumbo_bs4.parse(html)

for tag in soup():
    for attribute in ["lang", "id", "dir", "name", "link"]:
        del tag[attribute]

I'm on Windows 8.

It seems that sigil_bs4 and gumbo_bs4.parse do not produce a callable BS object (one that can be called with no arguments), which is what I need for the above code to work. I've also used sigil_bs4 quite successfully elsewhere in my plugin as a line-by-line parser for other formatting during the HTML sanitization process, but never as a callable object as above.
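
A possible workaround is to stop calling the soup object and ask for every tag explicitly. This assumes sigil_bs4 keeps the standard bs4 find_all method and the per-tag attrs dictionary, which is not confirmed here for the bundled version. A minimal sketch using plain bs4, where strip_attributes, UNWANTED and the sample markup are made-up names for illustration:

```python
from bs4 import BeautifulSoup

# Attributes to remove from every tag (hypothetical constant name).
UNWANTED = ["lang", "id", "dir", "name", "link"]


def strip_attributes(soup, names=UNWANTED):
    """Remove the listed attributes from every tag in the parsed tree."""
    # find_all(True) matches every tag, which is what calling soup()
    # with no arguments does in standard bs4.
    for tag in soup.find_all(True):
        for name in names:
            # attrs is a plain dict; pop() with a default never raises
            # if the attribute is absent.
            tag.attrs.pop(name, None)
    return soup


sample = '<p id="a" lang="en" class="x">Hi <b dir="ltr">there</b></p>'
cleaned = strip_attributes(BeautifulSoup(sample, "html.parser"))
print(cleaned)
```

With the bundled interpreter the import would presumably become sigil_bs4 instead of bs4. Whether the object returned by gumbo_bs4.parse offers the same methods is likewise an assumption.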

Any further suggestions to make this code work for sigil_bs4 or gumbo_bs4 would be greatly appreciated.

This is my first Python plugin (or major Python app of any note).
