@lassebenni
Created February 21, 2026 22:53
Visual for: Parquet columnar storage vs CSV row storage
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Parquet Columnar Storage: Row vs Column Layout</title>
<style>
body {
margin: 0;
padding: 20px;
background: #1a1a2e;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
}
svg {
max-width: 700px;
width: 100%;
height: auto;
}
</style>
</head>
<body>
<svg viewBox="0 0 700 420" xmlns="http://www.w3.org/2000/svg">
<!-- Background -->
<rect width="700" height="420" rx="12" fill="#1a1a2e"/>
<!-- Title -->
<text x="350" y="32" text-anchor="middle" fill="#e0e0e0" font-size="18" font-weight="bold" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">Row Storage (CSV) vs Column Storage (Parquet)</text>
<!-- ===== LEFT SIDE: ROW STORAGE (CSV) ===== -->
<text x="170" y="65" text-anchor="middle" fill="#e94560" font-size="14" font-weight="bold" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">CSV: Row by Row</text>
<!-- Header row -->
<rect x="30" y="80" width="90" height="28" rx="4" fill="#e94560" opacity="0.6"/>
<rect x="125" y="80" width="60" height="28" rx="4" fill="#f0a500" opacity="0.6"/>
<rect x="190" y="80" width="70" height="28" rx="4" fill="#4ecca3" opacity="0.6"/>
<text x="75" y="99" text-anchor="middle" fill="#fff" font-size="11" font-weight="bold" font-family="monospace">name</text>
<text x="155" y="99" text-anchor="middle" fill="#fff" font-size="11" font-weight="bold" font-family="monospace">age</text>
<text x="225" y="99" text-anchor="middle" fill="#fff" font-size="11" font-weight="bold" font-family="monospace">city</text>
<!-- Row 1 -->
<rect x="30" y="114" width="90" height="28" rx="4" fill="#e94560" opacity="0.2"/>
<rect x="125" y="114" width="60" height="28" rx="4" fill="#f0a500" opacity="0.2"/>
<rect x="190" y="114" width="70" height="28" rx="4" fill="#4ecca3" opacity="0.2"/>
<text x="75" y="133" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">Alice</text>
<text x="155" y="133" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">25</text>
<text x="225" y="133" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">NYC</text>
<!-- Row 2 -->
<rect x="30" y="148" width="90" height="28" rx="4" fill="#e94560" opacity="0.2"/>
<rect x="125" y="148" width="60" height="28" rx="4" fill="#f0a500" opacity="0.2"/>
<rect x="190" y="148" width="70" height="28" rx="4" fill="#4ecca3" opacity="0.2"/>
<text x="75" y="167" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">Bob</text>
<text x="155" y="167" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">30</text>
<text x="225" y="167" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">LA</text>
<!-- Row 3 -->
<rect x="30" y="182" width="90" height="28" rx="4" fill="#e94560" opacity="0.2"/>
<rect x="125" y="182" width="60" height="28" rx="4" fill="#f0a500" opacity="0.2"/>
<rect x="190" y="182" width="70" height="28" rx="4" fill="#4ecca3" opacity="0.2"/>
<text x="75" y="201" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">Carol</text>
<text x="155" y="201" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">28</text>
<text x="225" y="201" text-anchor="middle" fill="#e0e0e0" font-size="11" font-family="monospace">CHI</text>
<!-- Row bracket: reads ALL data -->
<rect x="28" y="112" width="234" height="100" rx="6" fill="none" stroke="#e94560" stroke-width="2" stroke-dasharray="6,3"/>
<text x="170" y="230" text-anchor="middle" fill="#e94560" font-size="11" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">↑ Must read every row and every column</text>
<!-- Query label -->
<rect x="40" y="250" width="240" height="34" rx="6" fill="#16213e" stroke="#8a8a9a" stroke-width="1"/>
<text x="160" y="272" text-anchor="middle" fill="#8a8a9a" font-size="12" font-family="monospace">SELECT AVG(age) FROM data</text>
<!-- Result -->
<text x="160" y="305" text-anchor="middle" fill="#e94560" font-size="13" font-weight="bold" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">Reads 9 cells to get 3 values</text>
<!-- ===== DIVIDER ===== -->
<line x1="350" y1="55" x2="350" y2="320" stroke="#3a3a5c" stroke-width="1" stroke-dasharray="4,4"/>
<text x="350" y="340" text-anchor="middle" fill="#6c6c80" font-size="12" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">vs</text>
<!-- ===== RIGHT SIDE: COLUMN STORAGE (PARQUET) ===== -->
<text x="530" y="65" text-anchor="middle" fill="#4ecca3" font-size="14" font-weight="bold" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">Parquet: Column by Column</text>
<!-- Name column -->
<rect x="390" y="80" width="70" height="130" rx="4" fill="#e94560" opacity="0.15"/>
<text x="425" y="99" text-anchor="middle" fill="#e94560" font-size="11" font-weight="bold" font-family="monospace">name</text>
<text x="425" y="125" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">Alice</text>
<text x="425" y="150" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">Bob</text>
<text x="425" y="175" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">Carol</text>
<!-- Age column (highlighted) -->
<rect x="470" y="80" width="60" height="130" rx="4" fill="#f0a500" opacity="0.3"/>
<rect x="470" y="80" width="60" height="130" rx="4" fill="none" stroke="#4ecca3" stroke-width="2.5"/>
<text x="500" y="99" text-anchor="middle" fill="#f0a500" font-size="11" font-weight="bold" font-family="monospace">age</text>
<text x="500" y="125" text-anchor="middle" fill="#e0e0e0" font-size="11" font-weight="bold" font-family="monospace">25</text>
<text x="500" y="150" text-anchor="middle" fill="#e0e0e0" font-size="11" font-weight="bold" font-family="monospace">30</text>
<text x="500" y="175" text-anchor="middle" fill="#e0e0e0" font-size="11" font-weight="bold" font-family="monospace">28</text>
<!-- City column -->
<rect x="540" y="80" width="70" height="130" rx="4" fill="#4ecca3" opacity="0.15"/>
<text x="575" y="99" text-anchor="middle" fill="#4ecca3" font-size="11" font-weight="bold" font-family="monospace">city</text>
<text x="575" y="125" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">NYC</text>
<text x="575" y="150" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">LA</text>
<text x="575" y="175" text-anchor="middle" fill="#8a8a9a" font-size="11" font-family="monospace">CHI</text>
<!-- Arrow pointing to age column only -->
<text x="530" y="230" text-anchor="middle" fill="#4ecca3" font-size="11" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">↑ Reads only the age column</text>
<!-- Query label -->
<rect x="400" y="250" width="240" height="34" rx="6" fill="#16213e" stroke="#8a8a9a" stroke-width="1"/>
<text x="520" y="272" text-anchor="middle" fill="#8a8a9a" font-size="12" font-family="monospace">SELECT AVG(age) FROM data</text>
<!-- Result -->
<text x="520" y="305" text-anchor="middle" fill="#4ecca3" font-size="13" font-weight="bold" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">Reads 3 cells to get 3 values</text>
<!-- ===== BOTTOM: Key takeaway ===== -->
<rect x="100" y="360" width="500" height="44" rx="8" fill="#16213e" stroke="#4ecca3" stroke-width="1"/>
<text x="350" y="380" text-anchor="middle" fill="#e0e0e0" font-size="12" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">For this query on 50 columns and 1M rows, Parquet reads <tspan fill="#4ecca3" font-weight="bold">1 column</tspan>.</text>
<text x="350" y="396" text-anchor="middle" fill="#e0e0e0" font-size="12" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif">CSV reads <tspan fill="#e94560" font-weight="bold">all 50</tspan>: about 50x more I/O if columns are similar in size.</text>
</svg>
</body>
</html>
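
The cell counts in the visual can be reproduced with a small toy model. The sketch below is not how a Parquet reader is implemented; it only counts the cells each layout touches for `SELECT AVG(age) FROM data`, using the same three rows as the diagram. The function names `avg_row_layout` and `avg_column_layout` are hypothetical and exist only for this illustration. Real Parquet files store compressed column chunks inside row groups, so a cell count is only a rough proxy for bytes read.

```python
# Toy model of the diagram: count how many cells each layout touches
# to answer SELECT AVG(age) FROM data. Pure Python, no Parquet library.

ROWS = [
    ("Alice", 25, "NYC"),
    ("Bob", 30, "LA"),
    ("Carol", 28, "CHI"),
]
COLUMNS = ("name", "age", "city")


def avg_row_layout(rows, columns, target):
    """Row layout (CSV-like): each whole record is read to reach one field."""
    idx = columns.index(target)
    cells_read = 0
    total = 0
    for row in rows:
        cells_read += len(row)  # every cell of the record is scanned
        total += row[idx]
    return total / len(rows), cells_read


def avg_column_layout(rows, columns, target):
    """Column layout (Parquet-like): values are grouped per column."""
    # Build the columnar store once; a real file would already be laid out this way.
    store = {name: [row[i] for row in rows] for i, name in enumerate(columns)}
    values = store[target]  # only the requested column is touched
    return sum(values) / len(values), len(values)


row_avg, row_cells = avg_row_layout(ROWS, COLUMNS, "age")
col_avg, col_cells = avg_column_layout(ROWS, COLUMNS, "age")
print(f"row layout:    avg={row_avg:.2f}, cells read={row_cells}")
print(f"column layout: avg={col_avg:.2f}, cells read={col_cells}")
```

Both layouts return the same average; only the number of cells touched differs (9 versus 3 here). With 50 equally sized columns the same ratio grows to 50 to 1, which is where the figure in the takeaway box comes from.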