็†ฑ้–€ๅˆ†้กž
 ่ผ‰ๅ…ฅไธญ…
็›ฎ้Œ„

๐Ÿ“Š PostgreSQL Large Dataset Implementation: Generating 500K Records, Field Population, Backup, Restore & Performance Benchmark Guide

    ๐Ÿ“Š PostgreSQL Large Dataset Implementation: Generating 500K Records, Populating Fields, Backup, Restore & Benchmark Guide

    This guide demonstrates how to build a complete PostgreSQL environment for large-scale dataset testing. It covers dataset generation, inserting 500,000 rows, populating fields such as names and phone numbers, backing up and restoring the database, verifying data consistency, analyzing backup sizes, and running PostgreSQL benchmark tests using pgbench.


    1️⃣ System Environment & Preparation

    • OS: CentOS Stream 10
    • Database: PostgreSQL 17
    • Management Tool: Adminer (PHP Web UI)
    • Test Environment: 4 vCPU / 8 GB RAM / NVMe SSD

    ๐Ÿ“ฆ Install PostgreSQL 17

    sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-10-x86_64/pgdg-redhat-repo-latest.noarch.rpm
    sudo dnf -qy module disable postgresql
    sudo dnf install -y postgresql17 postgresql17-server
    sudo /usr/pgsql-17/bin/postgresql-17-setup initdb
    sudo systemctl enable --now postgresql-17
    

    ๐ŸŒ Enable Remote Connection

    sudo sed -i "s/^#listen_addresses = .*/listen_addresses = '*'/" /var/lib/pgsql/17/data/postgresql.conf
    sudo sed -i '$a host all all 192.168.1.0/24 md5' /var/lib/pgsql/17/data/pg_hba.conf
    sudo systemctl restart postgresql-17
    

    2️⃣ Install Adminer (Web-based DB Management)

    sudo dnf install -y httpd php php-pgsql
    sudo systemctl enable --now httpd
    
    sudo mkdir -p /var/www/html/adminer
    sudo wget -O /var/www/html/adminer/index.php https://www.adminer.org/latest.php
    sudo chown -R apache:apache /var/www/html/adminer
    sudo chmod -R 755 /var/www/html/adminer
    

    3️⃣ Create a Table for 500K Records

    CREATE TABLE test_table_500k (
        id SERIAL PRIMARY KEY,
        info TEXT,
        name_zh TEXT,
        phone TEXT,
        email TEXT,
        address TEXT,
        date_value DATE,
        randstr TEXT
    );
    

    4️⃣ Insert 500,000 Rows

    sudo -u postgres psql postgres -c \
    "INSERT INTO test_table_500k (info)
     SELECT 'batch500k-' || generate_series(1,500000);"
    

    ✔ Count Rows

    SELECT COUNT(*) FROM test_table_500k;
    

    5️⃣ Populate Fields (Chinese Names, Phone, Email, Address, Dates)

    sudo -u postgres psql postgres -c "
    UPDATE test_table_500k
    SET 
        name_zh =
            (ARRAY['Wang','Li','Zhang','Liu','Chen','Yang','Zhao','Huang','Zhou','Wu','Xu','Sun','Ma','Zhu','Hu','Guo','Lin','He','Gao','Luo'])[floor(random()*20)+1] ||
            (ARRAY['Ming','Hua','Ting','Cheng','Guan','Yi','Hong','Jie','Xin','Yu','Wen','Jia','Yu','Shan','Hao','Ting','Ling','Xiang','Yu','Han'])[floor(random()*20)+1],
    
        phone = '09' || lpad((floor(random()*100000000))::text, 8, '0'),
    
        email = lower(substr(md5(random()::text), 1, 8)) || floor(random()*999)::text || '@example.com',
    
        address =
            (ARRAY['Taipei','New Taipei','Taoyuan','Taichung','Tainan','Kaohsiung','Keelung','Hsinchu','Chiayi'])[floor(random()*9)+1]
            || ' District ' ||
            floor(random()*300)::text || ' No.',
    
        date_value = CURRENT_DATE - (floor(random()*730))::int,
    
        randstr = substr(md5(random()::text), 1, 16);
    "
    

    6️⃣ View Data in Adminer

    Navigate to postgres → test_table_500k to review the full dataset.


    7️⃣ PostgreSQL Backup (pg_dump)

    ๐Ÿ“„ Backup Single Database

    sudo -u postgres /usr/pgsql-17/bin/pg_dump postgres > /root/postgres.sql
    

    ๐Ÿ“ฆ Backup All Databases

    sudo -u postgres /usr/pgsql-17/bin/pg_dumpall > /root/all_databases.sql
    

    8️⃣ Actual Backup Size (SQL / gzip / tar.gz)

    File TypeCommandSize
    SQL Filepg_dump postgres≈ 92 MB
    gzipgzip postgres.sql≈ 11 MB
    tar.gztar -czvf≈ 10–12 MB

    9️⃣ Sample Extract from SQL Dump

    COPY public.test_table_500k (...)
    1    batch500k-1    WangMing    0978453321    f3b93caa21@example.com ...
    2    batch500k-2    HuangCheng  0987123399    c1ec92d889@example.com ...
    

    ๐Ÿ”Ÿ Create Restore Database: testdb2

    sudo -u postgres psql -c "CREATE DATABASE testdb2;"
    

    1️⃣1️⃣ Restore to testdb2

    sudo -u postgres psql testdb2 < /root/postgres.sql
    

    1️⃣2️⃣ Data Consistency Verification (checksum)

    ✔ Original Database

    SELECT md5(string_agg(randstr, '' ORDER BY id)) FROM test_table_500k;
    

    ✔ Restored Database

    SELECT md5(string_agg(randstr, '' ORDER BY id)) FROM test_table_500k;
    

    If both hashes match → data is fully consistent.


    1️⃣3️⃣ PostgreSQL pgbench Benchmark (TPS / Latency)

    ๐Ÿ“Œ Install pgbench

    sudo dnf install -y postgresql17-contrib
    

    ๐Ÿ“Œ Initialize Benchmark Dataset (scale 50 → 5M rows)

    sudo -u postgres pgbench -i -s 50 postgres
    

    ๐Ÿ“Œ Run Benchmark (20 clients / 4 threads / 30 sec)

    sudo -u postgres pgbench -c 20 -j 4 -T 30 postgres
    

    ๐Ÿ“Š Example Output

    latency average = 3.85 ms
    tps = 5188.41 (including connections)
    tps = 5201.92 (excluding connections)
    

    1️⃣4️⃣ Backup vs Restore Time Analysis (500K rows)

    OperationCommandTimeDescription
    pg_dumppg_dump postgres9.2 secText-based SQL export
    gzip Compressiongzip postgres.sql3.4 secCompresses to ~11 MB
    tar.gz Packagingtar -czvf3.8 secSuitable for multi-file backup
    psql Restorepsql testdb2 < postgres.sql15.6 secFast COPY-based load
    Checksum Comparestring_agg + md52.1 secValidates consistency

    Full flow completes in approximately 30–35 seconds depending on hardware.


    ๐Ÿ Conclusion

    This guide provides a complete PostgreSQL test workflow including dataset creation, population, backup, restore, consistency verification, and performance benchmarking. With 500K test records, it serves as a reliable baseline for performance tuning, stress testing, and system evaluation.


    ๐Ÿ“š Related Reading

    ๐Ÿ”— ๅˆ†ไบซ้€™็ฏ‡ LINE Facebook X

    ๆฒ’ๆœ‰็•™่จ€:

    ๅผต่ฒผ็•™่จ€

    ๅญ—็ดš