6.1.2 Entropy
Prompts
What is von Neumann entropy? How does it generalize Shannon entropy to quantum states?
Why is entropy zero for pure states but positive for mixed states? What does this tell you about uncertainty?
How does the maximum entropy principle connect to the thermal state and the Boltzmann distribution?
Why is the partition function \(Z = \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}})\) the key bridge between quantum mechanics and thermodynamics?
Why is the Helmholtz free energy \(F = -k_B T \ln Z\) the natural thermodynamic potential built from the partition function? How does \(F = \langle E \rangle - TS\) balance energy against entropy?
Lecture Notes
Overview
Von Neumann entropy \(S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho})\) quantifies how much we don’t know about a quantum state. It is zero for pure states (perfect knowledge) and positive for mixed states (ignorance). The maximum entropy principle shows that thermal states—those that maximize entropy subject to a constraint on average energy—are precisely the Boltzmann distributions. This simple principle bridges quantum mechanics to thermodynamics without assuming temperature a priori.
Von Neumann Entropy
Von Neumann Entropy
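For a density matrix \(\hat{\rho}\) on a \(d\)-dimensional Hilbert space, the von Neumann entropy is:
\[S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho})\]
Equivalently, using the spectral decomposition \(\hat{\rho} = \sum_i \lambda_i \vert \psi_i\rangle\langle\psi_i\vert\) (eigenvalues \(\lambda_i \geq 0\) with \(\sum_i \lambda_i = 1\)):
\[S(\hat{\rho}) = -\sum_i \lambda_i \ln \lambda_i\]
with the convention \(0 \ln 0 = 0\).
Bounds and limits:
\[0 \leq S(\hat{\rho}) \leq \ln d\]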
The minimum \(S = 0\) occurs for pure states (a single eigenvalue equals 1). The maximum \(S = \ln d\) occurs for the maximally mixed state \(\hat{\rho} = \hat{I}/d\) (all eigenvalues equal \(1/d\)).
Example: Pure and Mixed States
Pure state: \(\hat{\rho} = \vert \psi\rangle\langle\psi\vert\) has one eigenvalue \(\lambda_1 = 1\). Thus:
\[S = -1 \ln 1 = 0\]
Maximally mixed qubit: \(\hat{\rho} = \hat{I}/2\) (2-dimensional) has eigenvalues \(\lambda_1 = \lambda_2 = 1/2\). Thus:
\[S = -2 \cdot \tfrac{1}{2} \ln \tfrac{1}{2} = \ln 2\]
Thermal qubit at finite temperature: \(\hat{\rho} = \frac{\mathrm{e}^{-\beta E_1} \vert 1\rangle\langle 1\vert + \mathrm{e}^{-\beta E_2} \vert 2\rangle\langle 2\vert}{Z}\) where \(Z = \mathrm{e}^{-\beta E_1} + \mathrm{e}^{-\beta E_2}\). The entropy ranges between 0 (zero temperature, ground state only) and \(\ln 2\) (infinite temperature, maximally mixed).
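The three examples above can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical two-level spectrum \(E_1 = 0\), \(E_2 = 1\) in arbitrary energy units; the helper names `von_neumann_entropy` and `thermal_qubit` are illustrative and not part of the notes.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Return S = -Tr(rho ln rho) in nats, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)       # rho is Hermitian, so eigvalsh applies
    evals = evals[evals > 1e-12]          # implements the convention 0 ln 0 = 0
    return float(-np.sum(evals * np.log(evals)))

def thermal_qubit(beta, e1=0.0, e2=1.0):
    """Thermal state of a two-level system, diagonal in the energy eigenbasis.

    The default energies e1, e2 are assumed values chosen for illustration.
    """
    weights = np.exp(-beta * np.array([e1, e2]))
    return np.diag(weights / weights.sum())

pure = np.array([[1.0, 0.0], [0.0, 0.0]])    # |0><0|
mixed = np.eye(2) / 2                         # I/2

s_pure = von_neumann_entropy(pure)
s_mixed = von_neumann_entropy(mixed)
# beta = 0 is infinite temperature; large beta approaches zero temperature
s_thermal = [von_neumann_entropy(thermal_qubit(b)) for b in (0.0, 1.0, 50.0)]

print(s_pure, s_mixed, s_thermal)
```

Working from the eigenvalues avoids a matrix logarithm, which is ill-defined when \(\hat{\rho}\) has zero eigenvalues (as it does for a pure state).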
Discussion: entropy via heat capacity
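Entropy is not read off a thermometer directly; it is inferred from heat capacity measurements through \(S(T) = \int_0^T C(T')/T'\,\mathrm{d}T'\), taking \(S(0) = 0\). At high temperatures, a system has many accessible energy levels and entropy is large. At low temperatures, only the ground state is occupied and, for a non-degenerate ground state, entropy vanishes. Why is entropy the right quantity to measure this ignorance?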
Poll: Entropy and mixedness
The von Neumann entropy \(S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho})\) measures the lack of information. Which statement is correct?
(A) \(S = 0\) if and only if \(\hat{\rho}\) is a pure state; \(S\) is maximal for the maximally mixed state.
(B) \(S\) is always positive and increases with the number of eigenstates.
(C) \(S = 1\) for any two-level system, regardless of whether it is pure or mixed.
(D) \(S(\hat{\rho})\) depends on the basis in which \(\hat{\rho}\) is measured.
Properties of Entropy
Properties of Von Neumann Entropy
Non-negativity: \(S(\hat{\rho}) \geq 0\), with equality iff \(\hat{\rho}\) is a pure state. Zero entropy means perfect knowledge of the state.
Upper bound: \(S(\hat{\rho}) \leq \ln d\) for \(d\)-dimensional systems, with equality iff \(\hat{\rho} = \hat{I}/d\). Complete ignorance corresponds to the maximally mixed state.
Unitary invariance: \(S(\hat{U}\hat{\rho} \hat{U}^\dagger) = S(\hat{\rho})\) for any unitary \(\hat{U}\). Entropy is unchanged by reversible quantum operations: unitaries are information-preserving.
Concavity: For \(p \in [0,1]\),
\[S(p\hat{\rho}_1 + (1-p)\hat{\rho}_2) \geq p\,S(\hat{\rho}_1) + (1-p)\,S(\hat{\rho}_2)\]
Mixing states never decreases entropy: ignorance about which component was prepared adds to the average entropy of the components.
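Unitary invariance and concavity can be spot-checked on random inputs. This is a sketch under stated assumptions, not a proof: it uses NumPy, a fixed seed, dimension \(d = 3\), and mixing weight \(p = 0.3\), all chosen arbitrarily. The helpers `random_density_matrix` and `random_unitary` are hypothetical names, and the unitary drawn from a QR decomposition is not claimed to be Haar-distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(rho):
    """Von Neumann entropy in nats from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def random_density_matrix(d):
    """Random density matrix: A A^dagger is positive, then normalize the trace."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_unitary(d):
    """A unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(a)
    return q

d, p = 3, 0.3
rho1, rho2 = random_density_matrix(d), random_density_matrix(d)
u = random_unitary(d)

s_original = entropy(rho1)
s_rotated = entropy(u @ rho1 @ u.conj().T)             # should equal s_original
s_mix = entropy(p * rho1 + (1 - p) * rho2)             # entropy of the mixture
s_avg = p * entropy(rho1) + (1 - p) * entropy(rho2)    # average of the entropies

print(s_original, s_rotated, s_mix, s_avg)
```

If the properties hold, `s_rotated` matches `s_original` to rounding error and `s_mix` is at least `s_avg`.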
Connection to Information Theory
Von Neumann entropy is the quantum generalization of Shannon entropy. For a classical probability distribution \(P = \{p_i\}\):
\[H(P) = -\sum_i p_i \ln p_i\]
If we measure \(\hat{\rho}\) in an orthonormal basis \(\{\vert e_i\rangle\}\), outcome \(i\) occurs with probability \(p_i = \langle e_i\vert \hat{\rho}\vert e_i\rangle\), and the outcome distribution has Shannon entropy \(H(P)\). This value depends on the chosen basis and satisfies \(H(P) \geq S(\hat{\rho})\), with equality when the basis is an eigenbasis of \(\hat{\rho}\). The von Neumann entropy \(S(\hat{\rho})\) itself is basis-independent and captures the intrinsic quantum uncertainty.
Interpretation: \(S(\hat{\rho})\) measures how much we don’t know about the state \(\hat{\rho}\).
\(S = 0\): We know the state exactly (pure state).
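\(S > 0\): We have incomplete information; a measurement in any complete orthonormal basis has an uncertain outcome, whereas a pure state gives a certain outcome in a basis that contains it.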
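The difference between the basis-dependent Shannon entropy of measurement outcomes and the basis-independent von Neumann entropy can be seen in a small numerical experiment. This sketch assumes NumPy, an example qubit state \(\hat{\rho} = \operatorname{diag}(0.9, 0.1)\) chosen for illustration, and a one-parameter family of real measurement bases obtained by rotating the computational basis by an angle \(\theta\); the function names are hypothetical.

```python
import numpy as np

def shannon(p):
    """Shannon entropy -sum p ln p in nats, with 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

def rotated_basis(theta):
    """Real orthonormal qubit basis; the columns are the basis vectors."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def measurement_probs(rho, basis):
    """p_i = <e_i| rho |e_i> for basis vectors stored as columns."""
    return np.real(np.diag(basis.conj().T @ rho @ basis))

rho = np.diag([0.9, 0.1])                 # example mixed state (assumed)
s_vn = shannon(np.linalg.eigvalsh(rho))   # von Neumann entropy from eigenvalues

thetas = np.linspace(0.0, np.pi / 2, 7)   # theta = 0 is the eigenbasis of rho
h_values = [shannon(measurement_probs(rho, rotated_basis(t))) for t in thetas]

for t, h in zip(thetas, h_values):
    print(f"theta = {t:.3f}  H = {h:.4f}  S = {s_vn:.4f}")
```

In this family the Shannon entropy equals \(S(\hat{\rho})\) at \(\theta = 0\) and rises to \(\ln 2\) at \(\theta = \pi/4\), where both outcomes are equally likely.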
Maximum Entropy Principle
Among all density matrices with a fixed average energy \(\langle E \rangle = \operatorname{Tr}(\hat{\rho} \hat{H})\), which one maximizes entropy?
Derivation: Maximum Entropy and Thermal State
We maximize \(S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho})\) subject to:
\(\operatorname{Tr}(\hat{\rho}) = 1\) (normalization)
\(\operatorname{Tr}(\hat{\rho} \hat{H}) = E\) (fixed average energy)
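Using Lagrange multipliers, we form the functional:
\[\mathcal{L}[\hat{\rho}] = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho}) - \alpha\left[\operatorname{Tr}(\hat{\rho}) - 1\right] - \beta\left[\operatorname{Tr}(\hat{\rho} \hat{H}) - E\right]\]
where \(\alpha\) and \(\beta\) are Lagrange multipliers to be determined by the constraints.
The extremum condition is:
\[\delta \mathcal{L} = 0 \quad \text{for all variations } \delta\hat{\rho}\]
Varying \(S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho})\) with respect to \(\hat{\rho}\) yields:
\[\delta S = -\operatorname{Tr}\left[(\ln \hat{\rho} + \hat{I})\,\delta\hat{\rho}\right]\]
Thus the extremum condition becomes:
\[-\ln \hat{\rho} - \hat{I} - \alpha \hat{I} - \beta \hat{H} = 0\]
Rearranging: \(\ln \hat{\rho} = -(1 + \alpha) \hat{I} - \beta \hat{H}\).
Redefine \(\alpha' \equiv 1 + \alpha\) to absorb the constant term. Since \(\hat{I}\) commutes with \(\hat{H}\), exponentiating gives:
\[\hat{\rho} = \mathrm{e}^{-\alpha'} \mathrm{e}^{-\beta \hat{H}}\]
The factor \(\mathrm{e}^{-\alpha'}\) is a normalization constant. Applying the constraint \(\operatorname{Tr}(\hat{\rho}) = 1\):
\[\mathrm{e}^{-\alpha'} \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}}) = 1\]
Define the partition function \(Z(\beta) = \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}})\). Then \(\mathrm{e}^{-\alpha'} = 1/Z\), so:
\[\hat{\rho}_{\text{th}} = \frac{\mathrm{e}^{-\beta \hat{H}}}{Z}\]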
where \(\beta = 1/(k_B T)\) is the inverse temperature.
The Lagrange multiplier \(\beta\) is determined by the constraint \(\langle E \rangle = \operatorname{Tr}(\hat{\rho}_{\text{th}} \hat{H})\): it adjusts to match the prescribed average energy.
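The statement that \(\beta\) adjusts to match the prescribed average energy can be made concrete by solving the constraint numerically. The sketch below assumes a hypothetical three-level spectrum \(\{0, 1, 2\}\) in arbitrary units, a target energy between the ground-state energy and the infinite-temperature average (so that \(\beta \geq 0\)), and plain bisection as the root finder. It relies on \(\langle E \rangle\) decreasing monotonically with \(\beta\), since \(\mathrm{d}\langle E \rangle/\mathrm{d}\beta = -\operatorname{Var}(E) \leq 0\).

```python
import numpy as np

def mean_energy(beta, energies):
    """<E> = sum_n E_n exp(-beta E_n) / Z for a discrete spectrum."""
    shifted = energies - energies.min()   # shift avoids overflow; probabilities unchanged
    w = np.exp(-beta * shifted)
    return float(np.sum(energies * w) / np.sum(w))

def solve_beta(target, energies, lo=0.0, hi=100.0, tol=1e-12):
    """Bisection for beta in [lo, hi]; assumes <E>(lo) >= target >= <E>(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, energies) > target:
            lo = mid      # energy too high: the state is too hot, so raise beta
        else:
            hi = mid      # energy too low: the state is too cold, so lower beta
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

energies = np.array([0.0, 1.0, 2.0])   # hypothetical spectrum
target = 0.5                           # prescribed average energy (assumed)
beta = solve_beta(target, energies)

print(beta, mean_energy(beta, energies))
```

For this spectrum the constraint reduces to a quadratic in \(x = \mathrm{e}^{-\beta}\), namely \(3x^2 + x - 1 = 0\), which gives an independent check on the bisection result.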
Thermal State and Boltzmann Distribution
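The state maximizing entropy at fixed \(\langle E \rangle\) is the thermal state:
\[\hat{\rho}_{\text{th}} = \frac{\mathrm{e}^{-\beta \hat{H}}}{Z}\]
where \(Z = \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}})\) is the partition function and \(\beta = 1/(k_B T)\).
In the energy eigenbasis, diagonal elements are Boltzmann weights:
\[P_n = \langle n\vert \hat{\rho}_{\text{th}}\vert n\rangle = \frac{\mathrm{e}^{-\beta E_n}}{Z}\]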
where \(E_n\) are energy eigenvalues.
Key insight: The Boltzmann distribution is not an assumption—it follows from the principle of maximum entropy. When we know only the average energy, the state of maximum ignorance (maximum entropy) is thermal.
Discussion: why nature maximizes entropy
Why does nature prefer maximum entropy states? If a system is isolated and in thermal equilibrium, why should it be the state that maximizes entropy? Is this a fundamental principle or a consequence of statistical mechanics?
Partition Function and Thermodynamics
The partition function \(Z(\beta) = \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}})\) counts the effective number of thermally accessible energy levels — it is the bridge between microscopic quantum mechanics and macroscopic thermodynamics.
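Average Energy:
\[\langle E \rangle = \operatorname{Tr}(\hat{\rho}_{\text{th}} \hat{H}) = -\frac{\mathrm{d} \ln Z}{\mathrm{d} \beta}\]
Free Energy (Helmholtz):
\[F = -k_B T \ln Z\]
where \(T = 1/(k_B \beta)\).
Entropy in terms of \(Z\):
From \(\hat{\rho}_{\text{th}} = \mathrm{e}^{-\beta \hat{H}}/Z\) we have \(\ln \hat{\rho}_{\text{th}} = -\beta \hat{H} - \ln Z\), so the thermodynamic entropy (the von Neumann entropy multiplied by \(k_B\)) is:
\[S = -k_B \operatorname{Tr}(\hat{\rho}_{\text{th}} \ln \hat{\rho}_{\text{th}}) = k_B\left[\beta \langle E \rangle + \ln Z\right]\]
Equivalently, \(S = -(\partial F/\partial T)\vert_V\) (standard thermodynamic relation).
Thermodynamic Identities:
The first law connects all three:
\[\mathrm{d}\langle E \rangle = T\,\mathrm{d}S\]
(ignoring volume dependence for simplicity). From \(F = \langle E \rangle - TS\):
\[\mathrm{d}F = \mathrm{d}\langle E \rangle - T\,\mathrm{d}S - S\,\mathrm{d}T = -S\,\mathrm{d}T\]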
which is consistent with equation (216).
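The relations \(\langle E \rangle = -\mathrm{d}\ln Z/\mathrm{d}\beta\), \(S = k_B[\beta \langle E \rangle + \ln Z]\), \(F = \langle E \rangle - TS\), and \(S = -\partial F/\partial T\) can be cross-checked numerically. The sketch below assumes units with \(k_B = 1\), a hypothetical three-level spectrum, and central finite differences with step `h` for the derivatives; it illustrates consistency at one value of \(\beta\) and does not replace the derivation.

```python
import numpy as np

energies = np.array([0.0, 1.0, 3.0])   # hypothetical spectrum, arbitrary units

def log_z(beta):
    """ln Z(beta) for the discrete spectrum above."""
    return float(np.log(np.sum(np.exp(-beta * energies))))

def thermal_probs(beta):
    """Boltzmann weights P_n = exp(-beta E_n) / Z."""
    w = np.exp(-beta * energies)
    return w / w.sum()

def free_energy(temp):
    """F = -T ln Z with k_B = 1 (assumed units)."""
    return -temp * log_z(1.0 / temp)

beta, h = 0.7, 1e-5
temp = 1.0 / beta

p = thermal_probs(beta)
e_direct = float(np.sum(p * energies))                        # Tr(rho H)
e_from_z = -(log_z(beta + h) - log_z(beta - h)) / (2 * h)     # -d ln Z / d beta

s_direct = float(-np.sum(p * np.log(p)))                      # -Tr(rho ln rho)
s_from_z = beta * e_direct + log_z(beta)                      # beta <E> + ln Z
s_from_f = -(free_energy(temp + h) - free_energy(temp - h)) / (2 * h)  # -dF/dT

f = free_energy(temp)
f_from_legendre = e_direct - temp * s_direct                  # <E> - T S

print(e_direct, e_from_z)
print(s_direct, s_from_z, s_from_f)
print(f, f_from_legendre)
```

The finite-difference values agree with the direct ones only up to the truncation error of the central difference, so the comparison should be made with a tolerance rather than exact equality.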
Summary
Von Neumann entropy: \(S(\hat{\rho}) = -\operatorname{Tr}(\hat{\rho} \ln \hat{\rho}) = -\sum_i \lambda_i \ln \lambda_i\) over the eigenvalues \(\lambda_i\) of \(\hat{\rho}\) quantifies how much is unknown about a quantum state.
Bounds: \(0 \leq S(\hat{\rho}) \leq \ln d\) — the minimum \(S = 0\) is attained only by pure states, the maximum \(S = \ln d\) only by the maximally mixed state \(\hat{I}/d\).
Structural properties: entropy is non-negative, invariant under unitary evolution (reversible dynamics preserve information), and concave — mixing states can only increase entropy.
Link to information theory: \(S(\hat{\rho})\) is the quantum generalization of the Shannon entropy \(-\sum_i p_i \ln p_i\); measuring \(\hat{\rho}\) in a fixed basis returns the Shannon entropy of the outcome distribution, while \(S(\hat{\rho})\) itself is basis-independent and captures the intrinsic quantum uncertainty.
Maximum entropy principle: maximizing \(S\) at fixed average energy \(\langle E \rangle\) singles out the thermal state \(\hat{\rho}_{\text{th}} = \mathrm{e}^{-\beta \hat{H}}/Z\); the Boltzmann weights \(P_n = \mathrm{e}^{-\beta E_n}/Z\) are derived, not assumed, with \(\beta = 1/(k_B T)\) the Lagrange multiplier fixing the energy.
Partition function bridges to thermodynamics: \(Z = \operatorname{Tr}(\mathrm{e}^{-\beta \hat{H}})\) generates the macroscopic quantities — average energy \(\langle E \rangle = -\mathrm{d}\ln Z/\mathrm{d}\beta\), free energy \(F = -k_B T \ln Z\), and entropy \(S = k_B[\beta \langle E \rangle + \ln Z]\) — connecting microscopic quantum mechanics to thermodynamics.
See Also
6.1.1 Mixed States: Density matrices, convex combinations, and the state space on which entropy is defined.
6.1.3 Quantum Statistics: Bose-Einstein and Fermi-Dirac occupation numbers, obtained by applying the thermal state and partition function to single bosonic and fermionic modes.
6.2.2 Entanglement Entropy: Entropy of subsystems, a distinct but related use of the partial trace and \(-\operatorname{Tr}(\hat{\rho}\ln\hat{\rho})\).
Homework
1. Entropy and measurement outcomes. A density matrix \(\hat{\rho}\) is measured in an orthonormal basis \(\{\vert e_i\rangle\}\), producing outcome \(i\) with probability \(p_i = \langle e_i\vert \hat{\rho}\vert e_i\rangle\).
(a) Show that \(\{p_i\}\) is a valid probability distribution: \(p_i \geq 0\) and \(\sum_i p_i = 1\).
(b) For the maximally mixed qubit \(\hat{\rho} = \hat{I}/2\), compute the Shannon entropy \(H = -\sum_i p_i \ln p_i\) of the outcome distribution and verify that it equals the von Neumann entropy \(S(\hat{\rho})\) for any choice of measurement basis.
(c) For the pure state \(\hat{\rho} = \vert 0\rangle\langle 0\vert\), compute \(H\) when the measurement uses (i) the basis \(\{\vert 0\rangle, \vert 1\rangle\}\) and (ii) the basis \(\{\vert +\rangle, \vert -\rangle\}\). Explain why \(H\) depends on the basis while the von Neumann entropy \(S(\hat{\rho}) = 0\) does not.
2. Entropy of a diagonal mixture. A qubit is in state \(\hat{\rho} = p\vert 0\rangle\langle 0\vert + (1-p)\vert 1\rangle\langle 1\vert\) (diagonal mixture). Compute entropy \(S(\hat{\rho})\) as a function of \(p \in [0,1]\). Show that \(S\) is maximized when \(p = 1/2\). What is the physical interpretation?
3. Entropy concavity. Prove that entropy is concave: for any \(\hat{\rho}_1, \hat{\rho}_2\) and \(p \in [0,1]\),
\[S(p\hat{\rho}_1 + (1-p)\hat{\rho}_2) \geq p\,S(\hat{\rho}_1) + (1-p)\,S(\hat{\rho}_2)\]
4. Unitary invariance. Show that von Neumann entropy is invariant under unitary transformations: \(S(\hat{U}\hat{\rho} \hat{U}^\dagger) = S(\hat{\rho})\) for any unitary \(\hat{U}\). Why does this make physical sense?
5. Thermal state entropy. For a thermal state \(\hat{\rho} = \mathrm{e}^{-\beta \hat{H}}/Z\) with Hamiltonian \(\hat{H}\) having energy levels \(E_n\) with multiplicity \(g_n\) (degeneracy), write the partition function as \(Z = \sum_n g_n \mathrm{e}^{-\beta E_n}\). Express \(\langle E \rangle\) and \(S\) in terms of \(Z(\beta)\).
6. Free energy relations. Starting from \(F = -k_B T \ln Z\), show that \(S = -(\partial F/\partial T)\vert _V\) and \(\langle E \rangle = F + TS\). Verify that these satisfy the first law of thermodynamics.
7. Partition function and entropy. A system has partition function \(Z(\beta) = 1 + 2\mathrm{e}^{-\beta}\), with energies measured in units of the level spacing and \(k_B = 1\). Compute the free energy \(F(\beta)\), average energy \(\langle E \rangle(\beta)\), and entropy \(S(\beta)\). At what temperature do \(\langle E \rangle\) and \(S\) reach half their maximum values?
8. Oscillator partition function. For a harmonic oscillator with Hamiltonian \(\hat{H} = \hbar \omega (\hat{a}^\dagger \hat{a} + 1/2)\) (ground state energy \(\hbar\omega/2\)), compute the partition function \(Z(\beta)\) and show that the average occupation number is \(\langle n \rangle = 1/(\mathrm{e}^{\beta\hbar\omega} - 1)\) (Bose-Einstein distribution).
9. Two-level entropy. A two-state system has \(E_{1} = 0\) and \(E_{2} = \Delta\), both non-degenerate.
(a) Compute the canonical-ensemble entropy \(S(\beta) = -p_{1}\ln p_{1} - p_{2}\ln p_{2}\) as a function of \(\beta\Delta\) using \(p_{1,2} = \mathrm{e}^{-\beta E_{1,2}}/Z\).
(b) Show that \(S(\beta)\) is monotonically decreasing on \(\beta\in[0,\infty)\): at \(\beta = 0\) both populations are equal and \(S = \ln 2\) (maximally mixed), while as \(\beta\to\infty\) the system collapses into the ground state and \(S\to 0\).
(c) The maximum entropy \(\ln 2\) therefore occurs at \(\beta = 0\) (infinite temperature), not at any interior \(\beta > 0\). Explain physically why the only way to make a finite-level thermal system more mixed is to flatten the Boltzmann weights — i.e., raise \(T\).