How do I base64-encode and decode a column in Python pandas?

Question 1

I read this post,

but I don't want to encrypt the DataFrame, just convert it to base64. I've imported a newline-delimited list of words into a DataFrame:

words = pd.read_table("sampleText.txt",names=['word'], header=None)
words.head()
0   difference
1   where
2   mc
3   is
4   the
words['words_encoded'] = map(lambda x: x.encode('base64','strict'), words['word'])
print (words)
                word                   words_encoded
0         difference  <map object at 0x7fad3e89e410>
1              where  <map object at 0x7fad3e89e410>
2                 mc  <map object at 0x7fad3e89e410>
3                 is  <map object at 0x7fad3e89e410>
4                the  <map object at 0x7fad3e89e410>
...              ...                             ...
999995  distribution  <map object at 0x7fad3e89e410>
999996            in  <map object at 0x7fad3e89e410>
999997      scenario  <map object at 0x7fad3e89e410>
999998          less  <map object at 0x7fad3e89e410>
999999          land  <map object at 0x7fad3e89e410>
[1000000 rows x 2 columns]
I don't understand why my encoded column refers to a map object instead of the actual data, so I tried this:
b64words = words.word.str.encode('base64')
print(b64words)
gives
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
999995   NaN
999996   NaN
999997   NaN
999998   NaN
999999   NaN
Name: word, Length: 1000000, dtype: float64
Well, that confused me, so I read the linked answer above and tried this:
import base64
def encode(text):
    return base64.b64encode(text)
words['Encoded_Column'] = [encode(x) for x in words]
but got
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-8cf5a6f1f3a9> in <module>
      2 def encode(text):
      3     return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in <listcomp>(.0)
      2 def encode(text):
      3     return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in encode(text)
      1 import base64
      2 def encode(text):
----> 3     return base64.b64encode(text)
      4 words['Encoded_Column'] = [encode(x) for x in words]
~/miniconda3/envs/p37cu10.2PyTo/lib/python3.7/base64.py in b64encode(s, altchars)
     56     application to e.g. generate url or filesystem safe Base64 strings.
     57     """
---> 58     encoded = binascii.b2a_base64(s, newline=False)
     59     if altchars is not None:
     60         assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
So I tried converting it to a bytes-like object first, like this:
import base64
def encode(text):
    btext = text.str.encode('utf-8')
    return base64.b64encode(btext)
words['Encoded_Column'] = [encode(x) for x in words]
but got
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-90-46db6d3688ba> in <module>
      3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in <listcomp>(.0)
      3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in encode(text)
      1 import base64
      2 def encode(text):
----> 3     btext = text.str.encode('utf-8')
      4     return base64.b64encode(btext)
      5 words['Encoded_Column'] = [encode(x) for x in words]
AttributeError: 'str' object has no attribute 'str'
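In this C example, they also convert to a byte string first and then to base64, but I can't get this simple task to work in Python. I'm going down a rabbit hole, and every attempt takes me deeper. I'd really appreciate help from anyone with a clear head.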

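Question 2

map returns an iterator, not a list, so pandas simply assigns that single iterator object to every slot in the newly created "words_encoded" column. Similarly, if you did words['all_ones'] = 1, pandas would assign 1 to every row of that column.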
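Here is a minimal sketch of the map point. It assumes a small in-memory DataFrame that stands in for sampleText.txt. It also encodes with base64.b64encode instead of the 'base64' codec. Wrapping the map in list() consumes the iterator, so pandas receives one value per row instead of one iterator object.

```python
import base64

import pandas as pd

# Small in-memory stand-in for the words read from sampleText.txt (assumption)
words = pd.DataFrame({'word': ['difference', 'where', 'mc']})

# map() is lazy: this is a single iterator object, not three encoded values
lazy = map(lambda w: base64.b64encode(w.encode('utf-8')), words['word'])
print(type(lazy).__name__)  # map

# list() consumes the iterator, so pandas receives one value per row
words['words_encoded'] = list(lazy)
print(words['words_encoded'].tolist())
# [b'ZGlmZmVyZW5jZQ==', b'd2hlcmU=', b'bWM=']
```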
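Second, "base64" is not a codec for strings; it works on bytes. You have to choose a text encoding, encode the strings to bytes with it, and then base64-encode those bytes. So:

words['word_encoded'] = words.word.str.encode(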
    'utf-8', 'strict').str.encode('base64')
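Except that this encoder appends a "\n" to the end of each base64 string, which I find odd. (The "base64" codec is built on base64.encodebytes, which adds a trailing newline.) Instead, you can use one of the following: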
words['word_encoded'] = words.word.str.encode(
    'utf-8', 'strict').apply(
         base64.b64encode)
words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
    for x in words.word]
Personally, I find the first approach more "pandas-like", because it produces the Series directly without an intermediate list.
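On the trailing newline: the small sketch below uses only the standard library to compare the 'base64' codec, base64.encodebytes, and base64.b64encode on the same bytes. The sample word is an arbitrary choice.

```python
import base64
import codecs

data = 'difference'.encode('utf-8')

# The 'base64' codec delegates to base64.encodebytes, which appends b'\n'
print(codecs.encode(data, 'base64'))  # b'ZGlmZmVyZW5jZQ==\n'
print(base64.encodebytes(data))       # b'ZGlmZmVyZW5jZQ==\n'

# base64.b64encode returns the bare encoding with no newline
print(base64.b64encode(data))         # b'ZGlmZmVyZW5jZQ=='
```

base64.encodebytes is meant for MIME-style output, which is why it adds newlines. For per-cell values, b64encode is usually the better fit.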
The solutions in action:
>>> import base64
>>> import pandas as pd
>>> words = pd.read_table("sampleText.txt",names=['word'], header=None)
__main__:1: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
>>> words['word_encoded'] = words.word.str.encode(
...     'utf-8', 'strict').str.encode('base64')
>>> words
         word           word_encoded
0  difference  b'ZGlmZmVyZW5jZQ==\n'
1       where          b'd2hlcmU=\n'
2          mc              b'bWM=\n'
3          is              b'aXM=\n'
4         the              b'dGhl\n'
>>> words['word_encoded'] = words.word.str.encode(
...     'utf-8', 'strict').apply(
...          base64.b64encode)
>>> words
         word         word_encoded
0  difference  b'ZGlmZmVyZW5jZQ=='
1       where          b'd2hlcmU='
2          mc              b'bWM='
3          is              b'aXM='
4         the              b'dGhl'
>>> words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
...     for x in words.word]
>>> words
         word         word_encoded
0  difference  b'ZGlmZmVyZW5jZQ=='
1       where          b'd2hlcmU='
2          mc              b'bWM='
3          is              b'aXM='
4         the              b'dGhl'
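The question asks about decoding as well, which the session above does not cover. Below is a minimal round-trip sketch. It assumes you want the encoded column stored as plain ASCII text (str) rather than bytes, and it uses a small in-memory DataFrame in place of sampleText.txt.

```python
import base64

import pandas as pd

# Small in-memory stand-in for the words read from sampleText.txt (assumption)
words = pd.DataFrame({'word': ['difference', 'where', 'mc', 'is', 'the']})

# Encode: str -> UTF-8 bytes -> base64 bytes -> ASCII str
words['word_encoded'] = words['word'].map(
    lambda w: base64.b64encode(w.encode('utf-8')).decode('ascii'))

# Decode: base64 str -> original UTF-8 bytes -> str
words['word_decoded'] = words['word_encoded'].map(
    lambda s: base64.b64decode(s).decode('utf-8'))

print(words)
```

Base64 output is pure ASCII, so converting it to str with .decode('ascii') loses nothing. It also keeps the b'...' prefix out of the values if the column is later written out as text. Keep the bytes instead if that is what downstream code expects.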

Question 3

Simply delete .str from the function body.
Corrected code:

import base64
def encode(text):
    btext = text.encode('utf-8')
    return base64.b64encode(btext)
# A dict stands in for the data here; iterating it yields its keys
words = {'1': 1, '2': 2, '3': 3, 'asdasd': 4}
words['Encoded_Column'] = [encode(x) for x in words]
print(words)
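The corrected encode function also works on the question's DataFrame, with one more change: the loop has to run over the column instead of the DataFrame, because iterating a DataFrame yields its column labels. The sketch below assumes a small in-memory stand-in for the words data.

```python
import base64

import pandas as pd


def encode(text):
    # str -> UTF-8 bytes -> base64 bytes
    btext = text.encode('utf-8')
    return base64.b64encode(btext)


# Small in-memory stand-in for the words read from sampleText.txt (assumption)
words = pd.DataFrame({'word': ['difference', 'where', 'mc']})

# Iterating the DataFrame itself yields column labels, not row values
print(list(words))  # ['word']

# Iterate over the column to get one encoded value per row
words['Encoded_Column'] = [encode(x) for x in words['word']]
print(words['Encoded_Column'].tolist())
# [b'ZGlmZmVyZW5jZQ==', b'd2hlcmU=', b'bWM=']
```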