The article below received a comment asking how to compute the weekly average step count, so I gave it a try.
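The idea can be sketched in plain Python (not Spark): give each day its ISO 8601 week number, then average the values within each week. This is a minimal sketch that assumes the data is a list of `(date, steps)` pairs; the function name `weekly_average` is hypothetical, and this is not necessarily the code behind the results shown later. In PySpark, the corresponding building blocks would presumably be `pyspark.sql.functions.weekofyear` and `groupBy(...).agg(avg(...))`.

```python
from collections import defaultdict
from datetime import date

# Daily step counts taken from the result table in this article
daily_steps = [
    (date(2020, 1, 1), 5179),
    (date(2020, 1, 2), 8387),
    (date(2020, 1, 3), 18740),
    (date(2020, 1, 4), 7037),
    (date(2020, 1, 5), 5392),
]


def weekly_average(records):
    """Average the values per ISO 8601 week (weeks start on Monday).

    Hypothetical helper for illustration; `records` is assumed to be
    an iterable of (datetime.date, int) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, steps in records:
        iso_year, iso_week, _ = day.isocalendar()
        # Key on the ISO year too, so week 1 of different years is not merged
        key = (iso_year, iso_week)
        totals[key] += steps
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


print(weekly_average(daily_steps))  # {(2020, 1): 8947.0}
```

Keying on `(iso_year, iso_week)` instead of the week number alone only matters when the data spans more than one year; for the five days above it gives the same single group as the `week_no` column.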
Test environment
$ docker run --rm -it --user 0 apache/spark-py bash
root@7c8a9e54ac50:/opt/spark/bin# ./pyspark
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/07/14 14:31:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
      /_/
Using Python version 3.10.6 (main, Mar 10 2023 10:55:28)
Spark context Web UI available at http://7c8a9e54ac50:4040
Spark context available as 'sc' (master = local[*], app id = local-1720967492989).
SparkSession available as 'spark'.
Results
The output matched what I expected.
+-------------------+-------+-----+
|           datetime|week_no|value|
+-------------------+-------+-----+
|2020-01-01 00:00:00|      1| 5179|
|2020-01-02 00:00:00|      1| 8387|
|2020-01-03 00:00:00|      1|18740|
|2020-01-04 00:00:00|      1| 7037|
|2020-01-05 00:00:00|      1| 5392|
+-------------------+-------+-----+
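The `week_no` column is consistent with ISO 8601 week numbering, where weeks start on Monday and week 1 is the first week with at least four days in the new year; Spark's `weekofyear` documents the same definition. Under that rule, 2020-01-01 (a Wednesday) through 2020-01-05 (a Sunday) all fall in week 1. The plain-Python sketch below uses `date.isocalendar()` as a stand-in for Spark (an assumption, not the article's code; the helper `iso_weeks` is hypothetical) to list the week numbers around New Year 2020.

```python
from datetime import date, timedelta


def iso_weeks(start, days):
    """Return (date, ISO year, ISO week) for `days` consecutive days from `start`."""
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        iso_year, iso_week, _ = day.isocalendar()
        rows.append((day, iso_year, iso_week))
    return rows


# Sunday 2019-12-29 through Monday 2020-01-06
for day, iso_year, iso_week in iso_weeks(date(2019, 12, 29), 9):
    print(day, day.strftime("%a"), iso_year, iso_week)
```

This shows 2019-12-29 still in ISO week 52 of 2019, Monday 2019-12-30 through Sunday 2020-01-05 in week 1 of 2020, and 2020-01-06 starting week 2, so the five rows above form exactly one week.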
-> Average: 8947
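The expected average can be checked by hand from the five daily values in the table:

$$
\frac{5179 + 8387 + 18740 + 7037 + 5392}{5} = \frac{44735}{5} = 8947
$$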
+-------+--------+
|week_no|avg_week|
+-------+--------+
|      1|  8947.0|
...