pyspark.sql.functions.array_repeat
- pyspark.sql.functions.array_repeat(col, count)
Array function: creates an array containing a column repeated count times.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
col : Column or str
    Column name or Column containing the element to be repeated.
count : Column or str or int
    Column name, Column, or int giving the number of times to repeat the first argument.
- Returns
Column
A new column that contains an array of repeated elements.
Examples
Example 1: Usage with string
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('ab',)], ['data'])
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|         [ab, ab, ab]|
+---------------------+
Example 2: Usage with integer
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(3,)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show()
+---------------------+
|array_repeat(data, 2)|
+---------------------+
|               [3, 3]|
+---------------------+
Example 3: Usage with array
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 'banana'],)], ['data'])
>>> df.select(sf.array_repeat(df.data, 2)).show(truncate=False)
+----------------------------------+
|array_repeat(data, 2)             |
+----------------------------------+
|[[apple, banana], [apple, banana]]|
+----------------------------------+
Example 4: Usage with null
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import IntegerType, StructType, StructField
>>> schema = StructType([
...     StructField("data", IntegerType(), True)
... ])
>>> df = spark.createDataFrame([(None, )], schema=schema)
>>> df.select(sf.array_repeat(df.data, 3)).show()
+---------------------+
|array_repeat(data, 3)|
+---------------------+
|   [NULL, NULL, NULL]|
+---------------------+
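Example 5: Usage with a per-row count column (supplementary sketch, not from the original page)
Because count accepts a Column as well as an int, the repetition count can differ per row. The snippet below assumes an active SparkSession named spark; the output shown is illustrative.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('x', 1), ('y', 3)], ['data', 'n'])
>>> df.select(sf.array_repeat(df.data, df.n)).show()
+---------------------+
|array_repeat(data, n)|
+---------------------+
|                  [x]|
|            [y, y, y]|
+---------------------+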