Typed dict throws KeyError when keys contain any UTF-8 character ending with \xb8\x80 #9542

Open
@M0gician

Description

Reporting a bug

A Numba typed dict whose key type is UnicodeCharSeq() of any length doesn't correctly handle UTF-8 characters ending with \xb8\x80. It seems that any such character is cast to an empty string when __getitem__ is called, resulting in a KeyError.
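For reference, here is a small stdlib-only check of what the affected characters have in common. It rebuilds the five characters named in this report from their UTF-8 bytes and prints each code point and its UTF-32 little-endian form. The helper name `describe` is made up for illustration. The UTF-32 view is included on the assumption that UnicodeCharSeq keys are stored like NumPy's fixed-width unicode (4 bytes per code point). This only describes the characters; it does not show where in Numba the key is lost.

```python
# UTF-8 byte sequences quoted in this report; every one ends with b"\xb8\x80".
utf8_sequences = [
    b"\xe4\xb8\x80",
    b"\xe3\xb8\x80",
    b"\xe6\xb8\x80",
    b"\xe7\xb8\x80",
    b"\xe5\xb8\x80",
]
chars = [seq.decode("utf-8") for seq in utf8_sequences]


def describe(ch):
    """Return (code point, UTF-8 bytes, UTF-32 little-endian bytes) for one character."""
    return ord(ch), ch.encode("utf-8"), ch.encode("utf-32-le")


for ch in chars:
    code_point, utf8, utf32 = describe(ch)
    # A trailing \xb8\x80 means the low 12 bits of the code point are 0xE00,
    # so the lowest byte of the code point is 0x00.
    print(f"{ch!r}: U+{code_point:04X}  utf-8={utf8.hex(' ')}  utf-32-le={utf32.hex(' ')}")
```

Every character in the list has a code point of the form U+xE00, so its lowest byte is zero. That may be a more useful way to describe the failing set than the UTF-8 suffix, though this has not been confirmed against the Numba source.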

Minimum Reproduction Demo

import numba
a = numba.typed.typeddict.Dict.empty(numba.types.UnicodeCharSeq(1), numba.int64)
a['一'] = 10    # \xe4\xb8\x80
print(a)       # raises KeyError: __str__ iterates the items and looks each key up again
  • The same failure reproduces with other UTF-8 characters such as 㸀 ( \xe3\xb8\x80 ) 渀 ( \xe6\xb8\x80 ) 縀 ( \xe7\xb8\x80 ) 帀 ( \xe5\xb8\x80 )
  • Error Message

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "...\.vscode\extensions\ms-python.python-2024.4.1\python_files\pythonrc.py", line 22, in my_displayhook
        self.original_displayhook(value)
      File "...\mamba\lib\site-packages\numba\typed\typeddict.py", line 217, in __repr__
        body = str(self)
      File "...\mamba\lib\site-packages\numba\typed\typeddict.py", line 212, in __str__
        for k, v in self.items():
      File "...\mamba\lib\_collections_abc.py", line 911, in __iter__
        yield (key, self._mapping[key])
      File "...\mamba\lib\site-packages\numba\typed\typeddict.py", line 180, in __getitem__
        return _getitem(self, key)
      File "...\mamba\lib\site-packages\numba\typed\dictobject.py", line 783, in impl
        raise KeyError()
    KeyError

    numba 0.59.1

  • [x] I have tried using the latest released version of Numba (the most recent is
    visible in the release notes:
    https://numba.readthedocs.io/en/stable/release-notes-overview.html ).
  • [x] I have included a self-contained code sample to reproduce the problem,
    i.e. it's possible to run as 'python bug.py'.