
For context, see slides for Shazam Clone Week 2: Fourier Transforms and Spectrograms
STFT: Why use a window function? What's spectral leakage?
Key idea: The Fourier transform (DFT, FFT) implicitly assumes periodicity of the input signal, i.e. the signal repeats every $N$ samples, where $N$ is the number of samples passed to the transform.
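Recall when we split $e^{-2\pi i k n/N}$ into $\cos(2\pi k n/N)$ and $\sin(2\pi k n/N)$. Note that the sine and cosine repeat when $n$ increases by $N/k$ samples, so the period is $N/k$ samples, the frequency is $k/N$ cycles per sample (that is, $k f_s/N$ Hz at sampling rate $f_s$), and the angular frequency in radians per sample is $2\pi k/N$.
This means that $e^{-2\pi i k n/N}$ completes $k$ cycles in the interval of $N$ samples (period is $N/k$).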
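As a quick numerical check of the cycle count above, here is a minimal sketch using only the Python standard library. The values `N = 16` and `k = 4` and the helper name `basis` are illustrative assumptions, not taken from the slides.

```python
import cmath
import math

N = 16  # number of samples in the window (illustrative choice)
k = 4   # frequency bin index (illustrative choice)

def basis(k, n, N):
    """DFT basis function e^{-2*pi*i*k*n/N} evaluated at sample n."""
    return cmath.exp(-2j * math.pi * k * n / N)

period = N // k  # samples per cycle; an integer here because k divides N

# The basis function returns to the same value after one period.
repeats_after_period = all(
    cmath.isclose(basis(k, n, N), basis(k, n + period, N), abs_tol=1e-12)
    for n in range(N)
)

# Samples inside the window where a new cycle starts (the value is back at 1).
cycle_starts = [
    n for n in range(N)
    if cmath.isclose(basis(k, n, N), 1, abs_tol=1e-12)
]

print(period, repeats_after_period, cycle_starts)
```

The number of entries in `cycle_starts` equals `k`, which is the "completes k cycles in N samples" statement in code form.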
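We have the property that:
$$e^{-2\pi i k (n+N)/N} = e^{-2\pi i k n/N}\, e^{-2\pi i k} = e^{-2\pi i k n/N} \quad \text{for every integer } k.$$
Therefore the transform itself is considered as being "periodic with period $N$ in $n$", meaning that evaluating the inverse transform at $n + N$ gives the same value as at $n$: $x[n+N] = x[n]$. Every $e^{-2\pi i k n/N}$ repeats every $N$ samples, and the DFT/FFT operates under the assumption that $x[n]$ also repeats every $N$ samples.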
Key idea, restated: The DFT/FFT interprets the input signal as an infinite-length periodic signal with period $N$.
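One way to see this periodic interpretation concretely is to evaluate the inverse-DFT sum at sample indices outside the original window. The sketch below assumes the common convention with $e^{-2\pi i k n/N}$ in the forward transform and a $1/N$ factor in the inverse; the sample values and the helper names `dft` and `idft_at` are made up for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum over n of x[n] * e^{-2*pi*i*k*n/N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft_at(X, n):
    """Evaluate the inverse-DFT sum at any integer n, even outside 0..N-1."""
    N = len(X)
    return sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N

x = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0]  # arbitrary samples, not periodic in any obvious way
N = len(x)
X = dft(x)

# Reconstruct one window before, the window itself, and one window after.
indices = range(-N, 2 * N)
extended = [idft_at(X, n).real for n in indices]
wrapped = [x[n % N] for n in indices]  # the original samples repeated end to end

matches = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(extended, wrapped))
print(matches)
```

The coefficients `X` carry no information about what happened outside the window, so the only signal they can describe is the window repeated end to end.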
The assumption of repetition from the (related) Fourier series process, called the periodic extension, captures this idea:
$$\tilde{x}[n] = x[n \bmod N] \quad \text{for every integer } n.$$
Based on this definition of repeating $x[n]$ (the interval $0 \le n < N$) over and over, we can see that when the endpoints of the sampled waveform are not equal or have different slopes (the signal does not complete a whole number of cycles within the window), we get discontinuities when we repeat the input signal:

This turns an imperfect sample of a periodic wave into something that is interpreted as being much more complicated. The DFT/FFT represents this jump using other frequency bins besides the bin that corresponds to the frequency of interest. Therefore, the energy from the true frequency "spills" into many bins. This idea is spectral leakage; many frequencies are required to represent discontinuities.
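The spill into neighbouring bins can be observed with a small experiment: take the DFT of a sine that fits the window a whole number of times, then of one that is cut off mid-cycle. This is a sketch under assumed values (a 32-sample window, 4 versus 4.5 cycles, and a 1% magnitude threshold for calling a bin "significant"); none of these numbers come from the slides.

```python
import cmath
import math

N = 32  # window length in samples (illustrative)

def dft_magnitudes(x):
    """|X[k]| for bins 0..N/2 of a real signal, via a naive O(N^2) DFT."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size)))
            for k in range(size // 2 + 1)]

def sine(cycles):
    """A sine wave that completes `cycles` cycles in the N-sample window."""
    return [math.sin(2 * math.pi * cycles * n / N) for n in range(N)]

def significant_bins(x, threshold=0.01):
    """Bins whose magnitude exceeds `threshold` times the largest magnitude."""
    mags = dft_magnitudes(x)
    peak = max(mags)
    return [k for k, m in enumerate(mags) if m > threshold * peak]

aligned = significant_bins(sine(4))       # whole number of cycles in the window
misaligned = significant_bins(sine(4.5))  # window cuts the wave mid-cycle

print("4 cycles:  ", aligned)
print("4.5 cycles:", misaligned)
```

With 4 cycles only bin 4 is significant. With 4.5 cycles the periodic extension no longer joins up smoothly (here the slope flips at the seam), and the same single-frequency wave shows up across many bins.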
You may have seen Fourier series animations before, where an SVG image storing continuous paths is converted to a sequence of discrete points and then redrawn using many spinning vectors ("epicycles") found using the FFT. The more complex and winding the image, the more epicycles (sinusoids with different frequencies) are needed to reconstruct it. Spectral leakage is the same sort of idea, replacing "winding" with "containing a discontinuity".
Window functions mitigate this issue: by gradually reducing the amplitude of the signal to zero at the endpoints, they leave less of a sharp jump to represent, which concentrates the result more closely around the true frequency.
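To put a rough number on the effect, the sketch below compares how much power lands far from the true frequency with and without a window. The Hann window is used here as one common choice (the notes above do not name a specific window), in its periodic form $w[n] = 0.5 - 0.5\cos(2\pi n/N)$. The window length, the 4.5-cycle test tone, and the "more than 2.5 bins away" cutoff are all illustrative assumptions.

```python
import cmath
import math

N = 32        # window length in samples (illustrative)
CYCLES = 4.5  # true frequency sits halfway between bins 4 and 5

def dft_power(x):
    """Power |X[k]|^2 for bins 0..N/2 of a real signal (naive O(N^2) DFT)."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size))) ** 2
            for k in range(size // 2 + 1)]

def far_leakage(x, centre=CYCLES, keep=2.5):
    """Fraction of power in bins more than `keep` bins from the true frequency."""
    power = dft_power(x)
    far = sum(p for k, p in enumerate(power) if abs(k - centre) > keep)
    return far / sum(power)

signal = [math.sin(2 * math.pi * CYCLES * n / N) for n in range(N)]

# Periodic Hann window: tapers smoothly to zero at the start of the window.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
windowed = [s * w for s, w in zip(signal, hann)]

leak_rect = far_leakage(signal)    # no window (equivalently, a rectangular window)
leak_hann = far_leakage(windowed)  # Hann-windowed

print(f"no window: {leak_rect:.4f}  Hann window: {leak_hann:.6f}")
```

The windowed signal puts a much smaller fraction of its power in distant bins. The trade-off is that the peak itself becomes wider, so two nearby frequencies are harder to tell apart.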

image source: (https://www.ni.com/docs/en-US/bundle/ni-scope/page/spectral-leakage.html)