pyspark.sql.functions.array_repeat

pyspark.sql.functions.array_repeat(col, count)

Array function: creates an array containing the value of col repeated count times.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col : Column or str
    The name of the column or an expression that represents the element to be repeated.
count : Column or str or int
    The name of the column, an expression, or an integer that represents the number of times to repeat the element.

Returns

Column: A new column that contains an array of repeated elements.
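
As a rough mental model of the per-row result, the sketch below mimics the behavior in plain Python. It is not Spark code: array_repeat_model is a hypothetical helper introduced only for illustration, and it covers just the cases shown in the examples on this page (a literal non-negative count, with None standing in for a null element).

```python
from typing import Any, List


def array_repeat_model(element: Any, count: int) -> List[Any]:
    """Hypothetical helper: plain-Python picture of one output row."""
    # The element is repeated as-is; a list element is nested, not flattened.
    return [element] * count


print(array_repeat_model("ab", 3))                 # ['ab', 'ab', 'ab']
print(array_repeat_model(3, 2))                    # [3, 3]
print(array_repeat_model(["apple", "banana"], 2))  # [['apple', 'banana'], ['apple', 'banana']]
print(array_repeat_model(None, 3))                 # [None, None, None]
```

The third call shows why the array example returns an array of arrays: the whole input value is treated as a single element.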

Examples

Example 1: Usage with string

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('ab',)], ['data'])
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|         [ab, ab, ab]|
+---------------------+

Example 2: Usage with integer

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(3,)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show()
+---------------------+
|array_repeat(data, 2)|
+---------------------+
|               [3, 3]|
+---------------------+

Example 3: Usage with array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 'banana'],)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show(truncate=False)
+----------------------------------+
|array_repeat(data, 2)             |
+----------------------------------+
|[[apple, banana], [apple, banana]]|
+----------------------------------+

Example 4: Usage with null

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", IntegerType(), True)
... ])
>>> df = spark.createDataFrame([(None, )], schema=schema)
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|   [NULL, NULL, NULL]|
+---------------------+