In distributed data processing, changing the number of partitions affects parallelism, shuffle cost, and skew. Spark exposes two APIs—repartition() and coalesce()—that both change partitioning but with different guarantees and costs.
Explain the difference between repartition() and coalesce() in Spark.
Address these points:
- Shuffle boundaries: which operation triggers a full shuffle, and which one avoids it by merging existing partitions in place.
- Narrow vs. wide dependencies: how each API shows up in the execution plan (coalesce as a narrow dependency when reducing partition count; repartition as a wide, shuffle-backed dependency).
- Typical performance implications: when coalesce is the cheaper choice, when paying for a repartition shuffle is worthwhile, and how each interacts with data skew.
Assume the interviewer expects practical engineering depth. You do not need to write Spark code, but you should be able to reason about execution plans and trade-offs.
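To make the contrast concrete, here is a toy pure-Python sketch of the two behaviors. This is not Spark code and not Spark's actual implementation; the `coalesce` and `repartition` functions below are illustrative stand-ins that model only the partition-movement semantics: coalesce glues whole existing partitions together (narrow dependency, no shuffle, so it cannot split a skewed partition), while repartition rehashes individual records across all output partitions (wide dependency, full shuffle, which evens out sizes).

```python
def coalesce(partitions, n):
    """Merge existing partitions into n buckets without splitting any of them.
    Mimics Spark's shuffle-free coalesce when reducing the partition count:
    whole partitions are combined, so a skewed partition stays skewed."""
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)  # entire partition moves as one unit
    return out

def repartition(partitions, n):
    """Redistribute individual records by hash across n partitions.
    Mimics the full shuffle behind repartition: every record can land in
    any output partition, producing roughly even sizes."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for rec in part:
            out[hash(rec) % n].append(rec)  # record-level redistribution
    return out

# One heavily skewed partition plus three tiny ones.
parts = [list(range(100)), [100], [101], [102]]

# coalesce keeps the skew: one output holds 101 records, the other 2.
print(sorted(len(p) for p in coalesce(parts, 2)))

# repartition balances: the 103 records split roughly evenly.
print(sorted(len(p) for p in repartition(parts, 2)))
```

The demo shows why the rule of thumb is "coalesce to shrink cheaply, repartition to rebalance": coalesce avoids the shuffle but inherits whatever skew the input had, whereas repartition pays the shuffle cost to get evenly sized partitions.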