When PySpark 2.4.x runs in a Python 3.8 environment, importing pyspark fails with TypeError: an integer is required (got type bytes).
This post explains why the error occurs and how to fix it.
Error message
The error typically looks like this:
Traceback (most recent call last):
  File "/xxx/xxx/xxx.py", line 2, in <module>
    from pyspark.sql import SparkSession
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
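Cause and solution
The error is raised because Spark 2.4.x does not support Python 3.8. Python 3.8 added a new positional parameter (posonlyargcount) to types.CodeType, so the cloudpickle.py bundled with PySpark 2.4.x passes its arguments in the wrong positions, and a bytes value lands where an integer is expected. To fix it, downgrade the Python environment that runs the code to 3.7 or lower.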
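As a rough illustration of the version constraint above, the following sketch checks the interpreter version before pyspark is imported, so that an unsupported environment produces a readable message rather than the traceback shown earlier. The function name python_ok_for_pyspark24 is hypothetical and not part of PySpark; the only rule it encodes is the one stated in this post (PySpark 2.4.x needs Python 3.7 or lower).

```python
import sys


def python_ok_for_pyspark24(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.7 or lower.

    Hypothetical helper, not a PySpark API. It only encodes the rule
    described in this post: PySpark 2.4.x does not work on Python 3.8+.
    """
    return tuple(version_info[:2]) < (3, 8)


if not python_ok_for_pyspark24():
    # Warn instead of letting "import pyspark" fail inside cloudpickle.py.
    print(
        "PySpark 2.4.x does not support Python %d.%d; "
        "use Python 3.7 or lower." % tuple(sys.version_info[:2]),
        file=sys.stderr,
    )
```

Placing a check like this at the top of the entry script, before the first pyspark import, makes the real cause visible immediately.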