PostgreSQL Large Dataset Implementation: Generating 500K Records, Populating Fields, Backup, Restore & Benchmark Guide
This guide demonstrates how to build a complete PostgreSQL environment for large-scale dataset testing. It covers dataset generation, inserting 500,000 rows, populating fields such as names and phone numbers, backing up and restoring the database, verifying data consistency, analyzing backup sizes, and running PostgreSQL benchmark tests using pgbench.
1️⃣ System Environment & Preparation
- OS: CentOS Stream 10
- Database: PostgreSQL 17
- Management Tool: Adminer (PHP Web UI)
- Test Environment: 4 vCPU / 8 GB RAM / NVMe SSD
📦 Install PostgreSQL 17
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-10-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
sudo dnf install -y postgresql17 postgresql17-server
sudo /usr/pgsql-17/bin/postgresql-17-setup initdb
sudo systemctl enable --now postgresql-17
Enable Remote Connection
sudo sed -i "s/^#listen_addresses = .*/listen_addresses = '*'/" /var/lib/pgsql/17/data/postgresql.conf
sudo sed -i '$a host all all 192.168.1.0/24 md5' /var/lib/pgsql/17/data/pg_hba.conf
sudo systemctl restart postgresql-17
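If firewalld is running, the PostgreSQL port must also be opened before LAN clients can reach the server. A minimal sketch, assuming the default zone; the address 192.168.1.10 is a placeholder for the actual server IP:

```shell
# Open port 5432 via the predefined firewalld service and reload rules
sudo firewall-cmd --permanent --add-service=postgresql
sudo firewall-cmd --reload

# From a client machine on 192.168.1.0/24, verify remote access
# (replace 192.168.1.10 with the real server address)
psql "host=192.168.1.10 port=5432 user=postgres dbname=postgres" -c "SELECT version();"
```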
2️⃣ Install Adminer (Web-based DB Management)
sudo dnf install -y httpd php php-pgsql
sudo systemctl enable --now httpd
sudo mkdir -p /var/www/html/adminer
sudo wget -O /var/www/html/adminer/index.php https://www.adminer.org/latest.php
sudo chown -R apache:apache /var/www/html/adminer
sudo chmod -R 755 /var/www/html/adminer
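On CentOS Stream, SELinux enforces by default and can block both serving the downloaded file and Apache's outbound database connections. A sketch of the usual adjustments, assuming the stock targeted policy and an active firewalld:

```shell
# Restore the default SELinux file context on the Adminer directory
sudo restorecon -Rv /var/www/html/adminer

# Allow httpd (PHP) to open network connections to the database
sudo setsebool -P httpd_can_network_connect_db on

# Open HTTP in the firewall so the web UI is reachable
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --reload
```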
3️⃣ Create a Table for 500K Records
CREATE TABLE test_table_500k (
id SERIAL PRIMARY KEY,
info TEXT,
name_zh TEXT,
phone TEXT,
email TEXT,
address TEXT,
date_value DATE,
randstr TEXT
);
4️⃣ Insert 500,000 Rows
sudo -u postgres psql postgres -c \
"INSERT INTO test_table_500k (info)
SELECT 'batch500k-' || generate_series(1,500000);"
✔ Count Rows
SELECT COUNT(*) FROM test_table_500k;
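Beyond the row count, checking the on-disk footprint gives an early hint of what the later dump and compression sizes will look like:

```shell
# Total relation size includes the table heap, TOAST data, and indexes
sudo -u postgres psql postgres -c \
  "SELECT pg_size_pretty(pg_total_relation_size('test_table_500k'));"
```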
5️⃣ Populate Fields (Chinese Names, Phone, Email, Address, Dates)
sudo -u postgres psql postgres -c "
UPDATE test_table_500k
SET
name_zh =
(ARRAY['Wang','Li','Zhang','Liu','Chen','Yang','Zhao','Huang','Zhou','Wu','Xu','Sun','Ma','Zhu','Hu','Guo','Lin','He','Gao','Luo'])[floor(random()*20)+1] ||
(ARRAY['Ming','Hua','Ting','Cheng','Guan','Yi','Hong','Jie','Xin','Yu','Wen','Jia','Yu','Shan','Hao','Ting','Ling','Xiang','Yu','Han'])[floor(random()*20)+1],
phone = '09' || lpad((floor(random()*100000000))::text, 8, '0'),
email = lower(substr(md5(random()::text), 1, 8)) || floor(random()*999)::text || '@example.com',
address =
(ARRAY['Taipei','New Taipei','Taoyuan','Taichung','Tainan','Kaohsiung','Keelung','Hsinchu','Chiayi'])[floor(random()*9)+1]
|| ' District ' ||
floor(random()*300)::text || ' No.',
date_value = CURRENT_DATE - (floor(random()*730))::int,
randstr = substr(md5(random()::text), 1, 16);
"
6️⃣ View Data in Adminer
Navigate to postgres → test_table_500k to review the full dataset.
7️⃣ PostgreSQL Backup (pg_dump)
Backup a Single Database
sudo -u postgres /usr/pgsql-17/bin/pg_dump postgres > /root/postgres.sql
📦 Backup All Databases
sudo -u postgres /usr/pgsql-17/bin/pg_dumpall > /root/all_databases.sql
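Plain-text dumps are easy to inspect, but pg_dump's custom format compresses on the fly and supports parallel, selective restores via pg_restore. A sketch of that alternative, using the same paths under /root:

```shell
# Custom-format dump (-Fc): compressed by default, restorable with pg_restore
sudo -u postgres /usr/pgsql-17/bin/pg_dump -Fc postgres -f /root/postgres.dump

# Restore into an existing database using 4 parallel jobs
sudo -u postgres /usr/pgsql-17/bin/pg_restore -j 4 -d testdb2 /root/postgres.dump
```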
8️⃣ Actual Backup Size (SQL / gzip / tar.gz)
| File Type | Command | Size |
|---|---|---|
| SQL File | pg_dump postgres | ≈ 92 MB |
| gzip | gzip postgres.sql | ≈ 11 MB |
| tar.gz | tar -czvf | ≈ 10–12 MB |
9️⃣ Sample Extract from SQL Dump
COPY public.test_table_500k (...)
1 batch500k-1 WangMing 0978453321 f3b93caa21@example.com ...
2 batch500k-2 HuangCheng 0987123399 c1ec92d889@example.com ...
🔟 Create Restore Database: testdb2
sudo -u postgres psql -c "CREATE DATABASE testdb2;"
1️⃣1️⃣ Restore to testdb2
sudo -u postgres psql testdb2 < /root/postgres.sql
1️⃣2️⃣ Data Consistency Verification (checksum)
✔ Original Database
sudo -u postgres psql postgres -c "SELECT md5(string_agg(randstr, '' ORDER BY id)) FROM test_table_500k;"
✔ Restored Database
sudo -u postgres psql testdb2 -c "SELECT md5(string_agg(randstr, '' ORDER BY id)) FROM test_table_500k;"
If both hashes match → data is fully consistent.
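The comparison can be scripted so a mismatch fails loudly. The same order-sensitive idea is illustrated below with two stand-in files and md5sum; with the real databases, h1 and h2 would come from the two psql queries above (the /tmp paths are only for demonstration):

```shell
# Demonstrate the checksum comparison with two stand-in files
printf 'row1\nrow2\nrow3\n' > /tmp/orig_rows.txt
printf 'row1\nrow2\nrow3\n' > /tmp/restored_rows.txt

# Hash each file; identical content yields identical digests
h1=$(md5sum /tmp/orig_rows.txt | awk '{print $1}')
h2=$(md5sum /tmp/restored_rows.txt | awk '{print $1}')

if [ "$h1" = "$h2" ]; then
  echo "MATCH: data is consistent"
else
  echo "MISMATCH: restore differs from original" >&2
  exit 1
fi
```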
1️⃣3️⃣ PostgreSQL pgbench Benchmark (TPS / Latency)
Install pgbench
sudo dnf install -y postgresql17-contrib
Initialize Benchmark Dataset (scale 50 → 5M rows)
sudo -u postgres pgbench -i -s 50 postgres
Run Benchmark (20 clients / 4 threads / 30 sec)
sudo -u postgres pgbench -c 20 -j 4 -T 30 postgres
Example Output
latency average = 3.85 ms
tps = 5188.41 (including connections establishing)
tps = 5201.92 (excluding connections establishing)
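The default pgbench workload is TPC-B-like and write-heavy. Re-running with the built-in SELECT-only script isolates read throughput, which typically reports a much higher TPS on the same hardware:

```shell
# -S switches to the built-in SELECT-only script (read-only workload)
sudo -u postgres pgbench -c 20 -j 4 -T 30 -S postgres
```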
1️⃣4️⃣ Backup vs Restore Time Analysis (500K rows)
| Operation | Command | Time | Description |
|---|---|---|---|
| pg_dump | pg_dump postgres | 9.2 sec | Text-based SQL export |
| gzip Compression | gzip postgres.sql | 3.4 sec | Compresses to ~11 MB |
| tar.gz Packaging | tar -czvf | 3.8 sec | Suitable for multi-file backup |
| psql Restore | psql testdb2 < postgres.sql | 15.6 sec | Fast COPY-based load |
| Checksum Compare | string_agg + md5 | 2.1 sec | Validates consistency |
Full flow completes in approximately 30–35 seconds depending on hardware.
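The timings above can be reproduced on any host by prefixing each step with time; the numbers will vary with hardware. The paths match the earlier steps:

```shell
# 'real' is the wall-clock figure reported in the table above
time sudo -u postgres /usr/pgsql-17/bin/pg_dump postgres > /root/postgres.sql
time gzip -kf /root/postgres.sql    # -k keeps the original .sql alongside the .gz
time sudo -u postgres psql testdb2 < /root/postgres.sql
```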
Conclusion
This guide provides a complete PostgreSQL test workflow including dataset creation, population, backup, restore, consistency verification, and performance benchmarking. With 500K test records, it serves as a reliable baseline for performance tuning, stress testing, and system evaluation.