# SickGear/lib/unidecode/__init__.py
# -*- coding: utf-8 -*-
# vi:tabstop=4:expandtab:sw=4
"""Transliterate Unicode text into plain 7-bit ASCII.

Example usage:
>>> from unidecode import unidecode
>>> unidecode(u"\u5317\u4EB0")
'Bei Jing '

The transliteration uses a straightforward map, and doesn't have alternatives
for the same character based on language, position, or anything else.

In Python 3, a standard string object will be returned. If you need bytes, use:
>>> unidecode("Κνωσός").encode("ascii")
b'Knosos'
"""
import warnings
from sys import version_info

Cache = {}


def _warn_if_not_unicode(string):
    if version_info[0] < 3 and not isinstance(string, unicode):
        warnings.warn("Argument %r is not a unicode object. "
                      "Passing an encoded string will likely have "
                      "unexpected results." % (type(string),),
                      RuntimeWarning, 2)


def unidecode_expect_ascii(string):
    """Transliterate a Unicode object into an ASCII string.

    >>> unidecode(u"\u5317\u4EB0")
    'Bei Jing '

    This function first tries to encode the string with the ASCII codec.
    If that fails (because the string contains non-ASCII characters), it
    falls back to transliteration using the character tables.

    This is approximately five times faster than unidecode_expect_nonascii()
    when the string contains only ASCII characters, but slightly slower when
    non-ASCII characters are present, because the failed encode attempt is
    wasted work.
    """

    _warn_if_not_unicode(string)
    try:
        bytestring = string.encode('ASCII')
    except UnicodeEncodeError:
        return _unidecode(string)
    if version_info[0] >= 3:
        return string
    else:
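        return bytestring


def unidecode_expect_nonascii(string):
    """Transliterate a Unicode object into an ASCII string.

    Unlike unidecode_expect_ascii(), this always walks the character tables
    without trying the ASCII fast path first, which makes it slightly faster
    when the input is expected to contain non-ASCII characters.

    >>> unidecode_expect_nonascii(u"\u5317\u4EB0")
    'Bei Jing '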
    """

    _warn_if_not_unicode(string)
    return _unidecode(string)


# Default entry point, optimized for input that is mostly ASCII.
unidecode = unidecode_expect_ascii


def _unidecode(string):
    retval = []

    for char in string:
        codepoint = ord(char)

        if codepoint < 0x80:  # Basic ASCII
            retval.append(str(char))
            continue

        if codepoint > 0xeffff:
            continue  # Supplementary Private Use Areas (U+F0000 and above) are ignored

        if 0xd800 <= codepoint <= 0xdfff:
            warnings.warn("Surrogate character %r will be ignored. "
                          "You might be using a narrow Python build." % (char,),
                          RuntimeWarning, 2)
            continue  # Skip it explicitly, as the warning promises

        section = codepoint >> 8   # Chop off the last two hex digits
        position = codepoint % 256 # Last two hex digits

        try:
            table = Cache[section]
        except KeyError:
            try:
                mod = __import__('unidecode.x%03x'%(section), globals(), locals(), ['data'])
            except ImportError:
                Cache[section] = None
                continue   # No match: ignore this character and carry on.

            Cache[section] = table = mod.data

        if table and len(table) > position:
            retval.append(table[position])

    return ''.join(retval)
# Commit history (from the repository blame view):
#   2014-03-10 05:18:05 +00:00  Welcome to our SickBeard-TVRage Edition ...
#   2016-01-12 02:01:42 +00:00  Update unidecode library 0.04.11 to 0.04.18 (fd57cbf).
#   2018-03-27 00:09:07 +00:00  Update unidecode library 0.04.21 (e99b0e3) → 1.0.22 (81f938d).
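`unidecode_expect_ascii` bets that most input is already ASCII: a single `encode('ASCII')` call either succeeds, in which case the input comes back unchanged (as `str` on Python 3, as the encoded byte string on Python 2), or raises `UnicodeEncodeError`, and only then does the table walk in `_unidecode` run. A minimal Python 3 sketch of that dispatch pattern follows. `expect_ascii` and `drop_non_ascii` are hypothetical names, and `drop_non_ascii` is a deliberately crude stand-in for `_unidecode` that discards non-ASCII characters instead of transliterating them.

```python
from typing import Callable, List


def expect_ascii(text: str, slow_path: Callable[[str], str]) -> str:
    """Return text unchanged if it is pure ASCII, otherwise call slow_path."""
    try:
        text.encode("ascii")  # raises UnicodeEncodeError on any non-ASCII char
    except UnicodeEncodeError:
        return slow_path(text)
    return text  # already ASCII: nothing to transliterate


calls: List[str] = []


def drop_non_ascii(text: str) -> str:
    # Hypothetical fallback: records the call, then drops non-ASCII chars.
    calls.append(text)
    return "".join(c for c in text if ord(c) < 0x80)


print(expect_ascii("plain text", drop_non_ascii))  # plain text
print(expect_ascii("caf\u00e9", drop_non_ascii))    # caf
print(len(calls))                                   # 1: only the second call fell back
```

On Python 3.7 and later, `str.isascii()` performs the same test without building a temporary bytes object; using `encode` instead is consistent with the library also having to run on Python 2, where that method does not exist.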