We are developing an input plugin for OSIS, an XML-based format used for Bibles. When converting to EPUB, soft hyphens are being lost. These are literal soft hyphen characters (U+00AD, encoded as UTF-8) generated by the input plugin, not entities. The EPUB output plugin appears to be removing them: with the --debug-pipeline option they are present in the HTML files in the processed directory, but they are gone from the files unzipped from the EPUB output. We cannot rely on HTML5 hyphenation dictionaries, because many of the languages we work with (e.g. Uzbek) have no hyphenation dictionary. Is there a way to prevent these characters from being removed?
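A quick way to confirm exactly where the characters disappear is to count soft hyphens in the (X)HTML files inside the finished EPUB and compare with the same count on the --debug-pipeline output. The sketch below is a standalone helper, not part of calibre; the name `count_soft_hyphens_in_epub` is hypothetical. It also counts `&shy;`, `&#173;` and `&#xad;`, in case the output serializes the character as a reference rather than dropping it.

```python
import sys
import zipfile

SOFT_HYPHEN = "\u00ad"
# Character/entity references that also represent U+00AD (compared in lowercase).
SOFT_HYPHEN_REFS = ("&shy;", "&#173;", "&#xad;")


def count_soft_hyphens_in_epub(epub_path):
    """Return {member_name: count} for every (X)HTML member of the EPUB.

    The count includes literal U+00AD characters plus the references above.
    """
    counts = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                text = zf.read(name).decode("utf-8", errors="replace").lower()
                n = text.count(SOFT_HYPHEN)
                n += sum(text.count(ref) for ref in SOFT_HYPHEN_REFS)
                counts[name] = n
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_shy.py book.epub
    for member, n in sorted(count_soft_hyphens_in_epub(sys.argv[1]).items()):
        print(f"{n:6d}  {member}")
```

Running it on the generated EPUB shows per file whether the soft hyphens are gone entirely or merely re-encoded.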