The Origin Stories of DNA

DNA: The Double Helix — Structure, Base Pairing, and Replication

Inside every cell sits two metres of DNA coiled into a space six micrometres across. How the double helix works, why complementary base pairing matters, and how one molecule copies itself with extraordinary fidelity — with Mukherjee on the moment the mechanism of heredity became visible.

TL;DR

Four bases (A, T, G, C), Chargaff's rules, the Watson-Crick double helix, and semi-conservative replication via helicase and DNA polymerase. Error rate: ~1 in 10^9 bases. Plus Mukherjee's perspective on the 1953 discovery.

Your body contains roughly 37 trillion cells, and the nucleus of a typical one holds about two metres of DNA, coiled so tightly that it fits inside a space six micrometres across. That DNA (deoxyribonucleic acid) is the master archive of everything the cell needs to build, run, and replicate itself. Understanding DNA begins with understanding its physical shape, because the shape is the mechanism.

The four-letter alphabet

DNA is built from four chemical units called nucleotides, each consisting of a sugar (deoxyribose), a phosphate group, and one of four nitrogen-containing bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The bases are the letters of the genetic code. Their sequence along the DNA strand is the information.

[Figure: The four DNA bases and complementary base pairing. Adenine (A, purine) pairs with thymine (T, pyrimidine) through 2 hydrogen bonds; guanine (G, purine) pairs with cytosine (C, pyrimidine) through 3 hydrogen bonds. Chargaff's rules: in any DNA molecule, %A = %T and %G = %C; this symmetry told Watson and Crick that the two strands were complementary, so one determines the other. Human genome: ~3.2 billion base pairs per haploid cell; ~20,000 protein-coding genes encoding ~100,000 proteins.]
Complementary base pairing: A-T (2 hydrogen bonds) and G-C (3 hydrogen bonds). Erwin Chargaff noticed the ratio symmetry in 1950; Watson and Crick used it to deduce the double-helix structure in 1953.
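
The pairing rule and Chargaff's symmetry can be checked in a few lines of code. The sketch below is illustrative only: the example sequence and the helper names `complement` and `base_counts` are made up for this article, not taken from any library.

```python
from collections import Counter

# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def base_counts(strand: str) -> Counter:
    """Count bases across both strands of the duplex formed by a strand and its complement."""
    return Counter(strand + complement(strand))


strand = "ATGCGGATTC"  # hypothetical example sequence
duplex = base_counts(strand)

print(complement(strand))        # TACGCCTAAG
print(duplex["A"] == duplex["T"])  # True
print(duplex["G"] == duplex["C"])  # True
```

Because every A on one strand faces a T on the other (and every G faces a C), the counts across the whole duplex must match. That is the same symmetry Chargaff measured chemically.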

The double helix

James Watson and Francis Crick's 1953 paper in Nature — building on X-ray crystallography data produced by Rosalind Franklin and Maurice Wilkins — described DNA's three-dimensional structure: two strands wound around each other in a right-handed spiral, with the sugar-phosphate backbones on the outside and the bases pointing inward, held together by the hydrogen bonds between complementary pairs.

"Watson and Crick's double helix wasn't just some pretty shape. It was a mechanism. The moment you saw it, you understood how information could be copied."

— Siddhartha Mukherjee, from lectures on The Gene

The genius of the structure is its self-explaining elegance: each strand is the template for rebuilding the other. Before a cell divides, the helix unzips, and each single strand serves as the pattern from which a new complementary strand is synthesised. The result is two identical double helices from one.

[Figure: The DNA double helix, structure and replication logic. Two antiparallel strands (5' to 3' and 3' to 5') with sugar-phosphate backbones on the outside, joined by G-C pairs (3 bonds) and A-T pairs (2 bonds). Replication is semi-conservative: helicase unzips the original helix, DNA polymerase builds the new strands, and each daughter gets one original strand plus one newly synthesised strand. Error rate: ~1 in 10⁹ bases (proofreading by polymerase).]
The double helix: two antiparallel strands wound around a central axis. During replication the strands separate and each serves as a template — producing two identical copies, each with one original and one new strand (semi-conservative replication).
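
The template logic of semi-conservative replication can be sketched as a toy model. This is a deliberate simplification, not a description of the real enzymatic process: it ignores helicase, primers, and leading/lagging strands, and the function names and example sequence are hypothetical. Both strands are written 5' to 3', so the partner of a strand is its reverse complement.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Complement a strand and reverse it, so the result also reads 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(strand))


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Toy semi-conservative copy: each old strand templates one new partner."""
    top, bottom = duplex
    daughter_1 = (top, reverse_complement(top))        # old top + newly built bottom
    daughter_2 = (reverse_complement(bottom), bottom)  # newly built top + old bottom
    return daughter_1, daughter_2


top = "ATGCGGATTC"  # hypothetical example sequence, written 5' to 3'
parent = (top, reverse_complement(top))
daughter_1, daughter_2 = replicate(parent)

print(parent)
print(daughter_1 == parent, daughter_2 == parent)  # True True
```

Each daughter keeps exactly one strand from the parent and gains one rebuilt strand, yet both daughters carry the same sequence as the original helix.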

From molecule to information

The sequence of bases along a DNA strand is analogous to a text written in a four-letter alphabet. The order of letters encodes instructions: which amino acids to assemble into proteins, when to activate or silence a gene, and how to regulate the cell's entire metabolic programme. Three consecutive bases (a codon) specify one amino acid; the full set of three-letter words translates into the proteins that build and operate every living cell.
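
Reading codons can be illustrated with a short sketch. Two assumptions to flag: the table below is only a small excerpt of the standard genetic code (the full code has 64 codons), and the function reads the DNA coding strand directly, whereas a cell first transcribes DNA into messenger RNA. The example sequence and the name `translate` are made up for illustration.

```python
# Excerpt of the standard genetic code, written as DNA codons (partial on purpose).
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "AAA": "Lys",
    "TGG": "Trp", "GCT": "Ala", "GAA": "Glu",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(coding_strand: str) -> list[str]:
    """Read the strand three bases at a time and stop at the first stop codon."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        codon = coding_strand[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "???")  # "???" marks codons missing from the excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


gene = "ATGTTTGGCAAATGGTAA"  # hypothetical 18-base sequence: five codons plus a stop
print(translate(gene))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```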

The human genome contains approximately 3.2 billion base pairs — enough text, if printed, to fill several thousand books. Of that sequence, only about 1.5% encodes proteins directly. The rest was once dismissed as "junk DNA" but is now understood to include regulatory regions, structural elements, and sequences whose function is still being mapped. As Mukherjee observed, the discovery of the gene's physical structure was only the beginning of understanding what genes actually do.
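
The scale of these numbers is easier to grasp with some back-of-envelope arithmetic. The sketch below uses only the article's own figures (3.2 billion base pairs, 1.5% coding, and the ~1 in 10^9 error rate from the TL;DR); the 2-bits-per-base storage estimate is an added assumption that follows from having four possible letters.

```python
GENOME_BP = 3_200_000_000  # ~3.2 billion base pairs (from the text)
CODING_FRACTION = 0.015    # ~1.5% encodes proteins directly (from the text)
ERROR_RATE = 1e-9          # ~1 error per 10^9 bases copied (from the TL;DR)
BITS_PER_BASE = 2          # assumption: four possible bases means log2(4) = 2 bits each

coding_bp = GENOME_BP * CODING_FRACTION                    # protein-coding base pairs
raw_megabytes = GENOME_BP * BITS_PER_BASE / 8 / 1_000_000  # uncompressed size in MB
expected_errors = GENOME_BP * ERROR_RATE                   # errors per full copy

print(f"Protein-coding sequence: ~{coding_bp:,.0f} base pairs")
print(f"Raw information content: ~{raw_megabytes:,.0f} MB")
print(f"Expected copying errors per 3.2-billion-base copy: ~{expected_errors:.1f}")
```

On these figures, roughly 48 million base pairs code for protein, the whole sequence would fit in about 800 MB at two bits per letter, and a single pass over 3.2 billion bases would be expected to introduce about three errors.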

Why the structure matters

Watson and Crick's double helix was not merely a molecular discovery — it was the answer to the most fundamental biological question: how is hereditary information stored and copied with enough fidelity to transmit across generations, yet with enough variation to allow evolution? The answer is encoded in the molecule's shape. The complementary base-pairing rule ensures accurate copying; the four-base alphabet allows virtually unlimited informational complexity; and the antiparallel strand orientation provides the directionality that the replication machinery requires.

"Here was Watson, here's this is where Watson stood up and said let's do this. It was a visual tour, as it were, of history — the moment the mechanism of heredity became visible."

— Siddhartha Mukherjee