How do I base64-encode and decode a column in Python pandas?


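I read this post

Encoding a DataFrame

but I don't want to encrypt the DataFrame, just convert it to base64. I imported a newline-delimited list of words into a DataFrame.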

words = pd.read_table("sampleText.txt",names=['word'], header=None)
words.head()
0   difference
1   where
2   mc
3   is
4   the
words['words_encoded'] = map(lambda x: x.encode('base64','strict'), words['word'])
print (words)
                word                   words_encoded
0         difference  <map object at 0x7fad3e89e410>
1              where  <map object at 0x7fad3e89e410>
2                 mc  <map object at 0x7fad3e89e410>
3                 is  <map object at 0x7fad3e89e410>
4                the  <map object at 0x7fad3e89e410>
...              ...                             ...
999995  distribution  <map object at 0x7fad3e89e410>
999996            in  <map object at 0x7fad3e89e410>
999997      scenario  <map object at 0x7fad3e89e410>
999998          less  <map object at 0x7fad3e89e410>
999999          land  <map object at 0x7fad3e89e410>
[1000000 rows x 2 columns]

I don't understand why my encoded column refers to a map object rather than the actual data, so I tried this.

b64words = words.word.str.encode('base64')
print(b64words)

gives

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
999995   NaN
999996   NaN
999997   NaN
999998   NaN
999999   NaN
Name: word, Length: 1000000, dtype: float64

Well,

That confused me, so I read the answer linked above and tried this

import base64
def encode(text):
    return base64.b64encode(text)
words['Encoded_Column'] = [encode(x) for x in words]

but got

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-8cf5a6f1f3a9> in <module>
      2 def encode(text):
      3     return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in <listcomp>(.0)
      2 def encode(text):
      3     return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in encode(text)
      1 import base64
      2 def encode(text):
----> 3     return base64.b64encode(text)
      4 words['Encoded_Column'] = [encode(x) for x in words]
~/miniconda3/envs/p37cu10.2PyTo/lib/python3.7/base64.py in b64encode(s, altchars)
     56     application to e.g. generate url or filesystem safe Base64 strings.
     57     """
---> 58     encoded = binascii.b2a_base64(s, newline=False)
     59     if altchars is not None:
     60         assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'

So I tried converting it to a bytes-like object, like this.

import base64
def encode(text):
    btext = text.str.encode('utf-8')
    return base64.b64encode(btext)
words['Encoded_Column'] = [encode(x) for x in words]

but got

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-90-46db6d3688ba> in <module>
      3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in <listcomp>(.0)
      3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in encode(text)
      1 import base64
      2 def encode(text):
----> 3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
      5 words['Encoded_Column'] = [encode(x) for x in words]
AttributeError: 'str' object has no attribute 'str'

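In this C example, they also convert to a byte string first and then to base64, but I can't accomplish this simple task in Python. I'm falling down this rabbit hole, and every attempt takes me deeper. I would really appreciate any help from someone with a clear head.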

python
pandas
base64
aquagremlin
Posted 2020-02-17
2 answers
tdelaney
Posted 2020-02-17
Accepted
0 upvotes

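map returns an iterator, not a list, so pandas just assigns that one object to every slot in the newly created "words_encoded" column. Similarly, if you did words['all_ones'] = 1, pandas would assign 1 all the way down the column.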
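To illustrate the point about map, here is a minimal sketch. It assumes a small in-memory DataFrame in place of sampleText.txt and materialises the iterator with list() before assigning it; the sample words are taken from the question.

```python
import base64

import pandas as pd

# Stand-in for the file from the question (assumed sample data).
words = pd.DataFrame({"word": ["difference", "where", "mc", "is", "the"]})

# In Python 3, map() is lazy: this is a map object, not a list of results.
lazy = map(lambda w: base64.b64encode(w.encode("utf-8")), words["word"])
print(type(lazy).__name__)

# Materialise it so that each row receives its own encoded value.
words["words_encoded"] = list(lazy)
print(words)
```

Wrapping the call in list() is only one way to do it; a list comprehension or Series.apply reaches the same result.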

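Second, "base64" is not a codec for strings; it works on bytes. You have to pick a text encoding, encode the string with it, and then base64-encode the result. So: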

words['word_encoded'] = words.word.str.encode(
    'utf-8', 'strict').str.encode('base64')

Except that this codec appends a "\n" to the end of the base64 string, which I find odd. Instead, you could do one of the following

words['word_encoded'] = words.word.str.encode(
    'utf-8', 'strict').apply(
         base64.b64encode)
words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
    for x in words.word]

Personally I think the first approach is a bit more "pandas", since it builds the Series directly without an intermediate list.
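The question title also asks about decoding, which the snippets above do not cover. The following is a rough sketch of a round trip, assuming a small in-memory DataFrame and UTF-8 as the text encoding; the column name word_decoded is made up for illustration.

```python
import base64

import pandas as pd

# Assumed sample data standing in for sampleText.txt.
words = pd.DataFrame({"word": ["difference", "where", "mc", "is", "the"]})

# Encode: str -> UTF-8 bytes -> base64 bytes.
words["word_encoded"] = words["word"].str.encode("utf-8").apply(base64.b64encode)

# Decode: base64 bytes -> UTF-8 bytes -> str.
words["word_decoded"] = words["word_encoded"].apply(
    lambda b: base64.b64decode(b).decode("utf-8")
)

print(words)
```

If you would rather store the base64 values as text than as bytes, one option is to add .decode("ascii") after b64encode, since base64 output only contains ASCII characters.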

The solution in action

>>> import base64
>>> import pandas as pd
>>> words = pd.read_table("sampleText.txt",names=['word'], header=None)
__main__:1: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
>>> words['word_encoded'] = words.word.str.encode(
...     'utf-8', 'strict').str.encode('base64')
>>> words
         word           word_encoded
0  difference  b'ZGlmZmVyZW5jZQ==\n'
1       where          b'd2hlcmU=\n'
2          mc              b'bWM=\n'
3          is              b'aXM=\n'
4         the              b'dGhl\n'
>>> words['word_encoded'] = words.word.str.encode(
...     'utf-8', 'strict').apply(
...          base64.b64encode)
>>> words
         word         word_encoded
0  difference  b'ZGlmZmVyZW5jZQ=='
1       where          b'd2hlcmU='
2          mc              b'bWM='
3          is              b'aXM='
4         the              b'dGhl'
>>> words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
...     for x in words.word]
>>> words
         word         word_encoded
0  difference  b'ZGlmZmVyZW5jZQ=='
1       where          b'd2hlcmU='
2          mc              b'bWM='
3          is              b'aXM='
4         the              b'dGhl'
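The trailing newline seen in the first result can be reproduced without pandas. This small sketch compares the 'base64' codec (reached here through codecs.encode) with base64.b64encode on a single word; it is an illustration only, not part of the session above.

```python
import base64
import codecs

raw = "mc".encode("utf-8")

# The 'base64' codec is a bytes-to-bytes codec and appends a newline.
via_codec = codecs.encode(raw, "base64")

# base64.b64encode returns the encoded bytes without a trailing newline.
via_module = base64.b64encode(raw)

print(via_codec)   # b'bWM=\n'
print(via_module)  # b'bWM='
```

Stripping the newline from the codec output gives the same bytes as b64encode for short inputs like these. For longer inputs the codec also inserts line breaks inside the output, so b64encode is usually the simpler choice.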
    
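Thanks, but no luck. If you use the DataFrame I defined above, your first solution (with '.apply') gives 'TypeError: a bytes-like object is required, not 'float'' and the next one gives 'AttributeError: 'float' object has no attribute 'encode''
@Aquagremlin - It works for me (see the updated answer). I don't know where the float is coming from. I'm using python 3.7.3 and pandas 0.24.2.
I'm still getting this error in my notebook (Python 3.76, pandas 0.23.4).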
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-178-0f7040aa4d4e> in <module>
----> 1 words['word_encoded'] = words.word.str.encode('utf-8', 'strict').str.encode('base64')
~/miniconda3/envs/p37cu10.2PyTo/lib/python3.7/site-packages/pandas/core/strings.py in wrapper(self, *args, **kwargs)
   1949 f"inferred dtype '{self._inferred_dtype}'."
   1950 )
-> 1951 raise TypeError(msg)
   1952 return func(self, *args, **kwargs)
   1953
TypeError: Cannot use .str.encode with values of inferred dtype 'bytes'.
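One possible explanation for the 'float' errors, which is not confirmed anywhere in this thread, is that some words in the file (for example "null" or "NA") are parsed as missing values, and pandas stores those as float NaN. The sketch below uses made-up file content in an io.StringIO buffer and read_csv's keep_default_na option to show the effect and one way around it.

```python
import base64
import io

import pandas as pd

# Hypothetical file content; "null" and "NA" are ordinary words here.
sample = "difference\nnull\nNA\nthe\n"

# With the default settings these two words are read as NaN (a float).
default = pd.read_csv(io.StringIO(sample), names=["word"], header=None)
print(default["word"].isna().sum())

# keep_default_na=False keeps them as plain strings.
as_text = pd.read_csv(
    io.StringIO(sample), names=["word"], header=None, keep_default_na=False
)
as_text["word_encoded"] = [
    base64.b64encode(w.encode("utf-8")) for w in as_text["word"]
]
print(as_text)
```

Whether this applies to the original sampleText.txt is an assumption; running words['word'].isna().sum() on the real data would confirm or rule it out.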
Ae_Mc
Posted 2020-02-17
0 upvotes

Simply delete .str from the function body. Corrected code:

import base64
def encode(text):
    btext = text.encode('utf-8')
    return base64.b64encode(btext)
words = {'1': 1, '2': 2, '3': 3, 'asdasd': 4}
words['Encoded_Column'] = [encode(x) for x in words]
print(words)
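The snippet above uses a plain dict. With a DataFrame like the one in the question, iterating over the frame itself yields column labels rather than cell values, so a sketch of the same idea (assuming a small in-memory DataFrame) would loop over the 'word' column instead:

```python
import base64

import pandas as pd


def encode(text):
    # str -> UTF-8 bytes -> base64 bytes
    return base64.b64encode(text.encode("utf-8"))


# Assumed sample data standing in for sampleText.txt.
words = pd.DataFrame({"word": ["difference", "where", "mc"]})

# Iterating a DataFrame gives its column labels, not its rows.
column_labels = list(words)
print(column_labels)

# Iterate over the column to encode each word.
words["Encoded_Column"] = [encode(x) for x in words["word"]]
print(words)
```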