pyspark.sql.functions.array_repeat

pyspark.sql.functions.array_repeat(col, count)

Array function: creates an array that contains the value of col repeated count times.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The name of the column or an expression that represents the element to be repeated.

count : Column or str or int

The name of the column, an expression, or an integer that represents the number of times to repeat the element.

Returns
Column

A new column that contains an array of repeated elements.

Examples

Example 1: Usage with string

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('ab',)], ['data'])
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|         [ab, ab, ab]|
+---------------------+

Example 2: Usage with integer

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(3,)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show()
+---------------------+
|array_repeat(data, 2)|
+---------------------+
|               [3, 3]|
+---------------------+

Example 3: Usage with array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 'banana'],)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show(truncate=False)
+----------------------------------+
|array_repeat(data, 2)             |
+----------------------------------+
|[[apple, banana], [apple, banana]]|
+----------------------------------+

Example 4: Usage with null

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", IntegerType(), True)
... ])
>>> df = spark.createDataFrame([(None, )], schema=schema)
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|   [NULL, NULL, NULL]|
+---------------------+