Fixing “PostgreSQL won’t start after reboot” on VCF Operations Fleet Management 9.0.1

After upgrading the VCF Operations Fleet Management appliance to 9.0.1, the next reboot can bring a surprise: vPostgres fails to start, and the fleet-management certificate is regenerated under timestamped filenames. The journal shows errors like:

pg_ctl: could not open PID file "/var/vmware/vpostgres/current/pgdata/postmaster.pid": Permission denied
systemd[1]: vpostgres.service: Control process exited, code=exited, status=1/FAILURE

Broadcom’s KB confirms the symptoms and provides the manual steps to fix permissions on the Postgres data dir and to normalize the regenerated cert/key filenames, followed by service restarts.
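
To confirm that an appliance actually shows this symptom before changing anything, the journal can be searched for the PID-file error quoted above. The following is a minimal sketch and not part of the KB: has_pidfile_symptom is a hypothetical helper name, and the pattern is derived only from the sample log line in this article, so the exact wording on your appliance may differ.

```shell
#!/usr/bin/env bash
# has_pidfile_symptom (hypothetical helper): reads log text on stdin and
# succeeds if it contains the postmaster.pid "Permission denied" error.
has_pidfile_symptom() {
  grep -Eq 'could not open PID file ".*postmaster\.pid": Permission denied'
}

# Usage on the appliance (as root), limited to the current boot:
#   journalctl -u vpostgres.service -b --no-pager | has_pidfile_symptom \
#     && echo "PID file permission error found"
```

The function only reads stdin, so it works equally well against a saved log file.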

Manual steps

Fix pgdata permissions

chmod 700 /var/vmware/vpostgres/current/pgdata/
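
A quick way to confirm the result is to read the mode back. This sketch assumes GNU stat (the -c option); check_mode is a hypothetical helper name, not something the KB provides.

```shell
#!/usr/bin/env bash
# check_mode DIR EXPECTED (hypothetical helper): compare the octal mode of
# DIR with EXPECTED. Returns 0 on match, 1 on mismatch, 2 if stat fails.
check_mode() {
  local dir="$1" expected="$2" actual
  actual="$(stat -c '%a' "$dir")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $dir mode is $actual"
  else
    echo "MISMATCH: $dir mode is $actual, expected $expected"
    return 1
  fi
}

# Usage on the appliance:
#   check_mode /var/vmware/vpostgres/current/pgdata 700
```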

Normalize the regenerated certificate and key in /opt/vmware/vlcm/cert: rename the timestamped server.crt.* and server.key.* files back to server.crt and server.key. Then restart the services (nginx and vrlcm-server.service) and verify their status and logs.
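
The manual rename can be sketched as follows. This is an illustration only: activate_pair is a hypothetical helper, and its suffix argument stands in for whatever timestamp your regenerated files actually carry (list the directory first to find it). Note that mv replaces any existing server.crt / server.key, so copy those aside first if you want to keep them.

```shell
#!/usr/bin/env bash
# activate_pair DIR SUFFIX (hypothetical helper): rename
# DIR/server.crt.SUFFIX and DIR/server.key.SUFFIX to DIR/server.crt and
# DIR/server.key. Refuses to act unless both timestamped files exist.
activate_pair() {
  local dir="$1" suffix="$2" name
  for name in server.crt server.key; do
    if [ ! -f "$dir/$name.$suffix" ]; then
      echo "missing $dir/$name.$suffix" >&2
      return 1
    fi
  done
  for name in server.crt server.key; do
    mv -f "$dir/$name.$suffix" "$dir/$name"
  done
}

# Usage on the appliance, as root (replace <suffix> with the real one):
#   ls -l /opt/vmware/vlcm/cert
#   activate_pair /opt/vmware/vlcm/cert <suffix>
#   systemctl restart nginx vrlcm-server.service
```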

One-shot automated remediation (idempotent)

Run this as root on the Fleet Management appliance (e.g., via SSH). It:

Fixes the Postgres data directory permissions
Backs up any existing certs
Picks the newest timestamped server.crt.* and server.key.* and makes them active
Ensures sane file permissions
Restarts nginx and vrlcm-server.service
Shows health checks + log tail hints

#!/usr/bin/env bash
# vcf-ops-fleet-901-postgres-cert-fix.sh
# Automates KB 412351 remediation on VCF Operations Fleet Management 9.0.1
# Source: https://knowledge.broadcom.com/external/article/412351

set -euo pipefail

LOG="/var/log/vrlcm/vmware_vrlcm.log"
PGDATA="/var/vmware/vpostgres/current/pgdata"
CERT_DIR="/opt/vmware/vlcm/cert"
BACKUP_DIR="${CERT_DIR}/backup-$(date +%Y%m%d-%H%M%S)"

echo "[*] Ensuring Postgres data dir permissions (700) …"
if [ -d "$PGDATA" ]; then
  chmod 700 "$PGDATA"
  echo "[+] chmod 700 ${PGDATA} done"
else
  echo "[!] ${PGDATA} not found; continuing…"
fi

echo "[*] Working in ${CERT_DIR} …"
cd "$CERT_DIR" || { echo "[!] ${CERT_DIR} not found"; exit 1; }

# Back up current active certs (if any)
mkdir -p "$BACKUP_DIR"
for f in server.crt server.key; do
  if [ -f "$f" ]; then
    cp -a "$f" "${BACKUP_DIR}/"
    echo "[+] Backed up $f to ${BACKUP_DIR}/"
  fi
done

# Find newest timestamped variants
NEWEST_CRT="$(ls -1t server.crt.* 2>/dev/null | head -n1 || true)"
NEWEST_KEY="$(ls -1t server.key.* 2>/dev/null | head -n1 || true)"

if [ -z "$NEWEST_CRT" ] || [ -z "$NEWEST_KEY" ]; then
  echo "[!] Could not find timestamped server.crt.* and/or server.key.* files."
  echo "    If the certificate was regenerated as the KB describes, these should exist. Aborting."
  exit 1
fi

echo "[*] Activating newest certificate and key:"
echo "    CRT: ${NEWEST_CRT}"
echo "    KEY: ${NEWEST_KEY}"

# Copy into place (not atomic); the timestamped originals stay for rollback
cp -a "$NEWEST_CRT" server.crt
cp -a "$NEWEST_KEY" server.key

# Tighten permissions
chmod 600 server.key || true
chmod 644 server.crt || true

# Optionally set ownership if required by your environment:
# chown root:root server.crt server.key

echo "[*] Restarting services (nginx, vrlcm-server.service) …"
systemctl restart nginx
systemctl restart vrlcm-server.service

echo "[*] Waiting 10s and checking status …"
sleep 10
systemctl --no-pager --full status vrlcm-server.service | sed -n '1,50p' || true

echo
echo "[i] Tail the Lifecycle Manager log for readiness:"
echo "    tail -f ${LOG}"
echo
echo "[✓] Remediation attempted. If issues persist, also verify vpostgres status:"
echo "    systemctl status vpostgres.service"

Save this as vcf-ops-fleet-901-postgres-cert-fix.sh, then:

sudo bash vcf-ops-fleet-901-postgres-cert-fix.sh

Why not rename with mv? Using cp -a keeps the timestamped copies intact (for quick rollback) while still placing active server.crt / server.key in the expected paths.
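
If the newly activated pair turns out to be wrong, the backup directory created by the script makes rollback a copy in the other direction. A minimal sketch, assuming the backup-<timestamp> layout the script above produces; rollback_certs is a hypothetical helper and is not part of the KB.

```shell
#!/usr/bin/env bash
# rollback_certs CERT_DIR BACKUP_DIR (hypothetical helper): restore
# server.crt and server.key from BACKUP_DIR into CERT_DIR.
# Refuses to act unless the backup contains both files.
rollback_certs() {
  local cert_dir="$1" backup_dir="$2" name
  for name in server.crt server.key; do
    if [ ! -f "$backup_dir/$name" ]; then
      echo "no $name in $backup_dir" >&2
      return 1
    fi
  done
  for name in server.crt server.key; do
    # -a preserves mode, ownership, and timestamps of the backed-up file
    cp -a "$backup_dir/$name" "$cert_dir/$name"
  done
}

# Usage on the appliance, as root (pick the backup directory you need):
#   rollback_certs /opt/vmware/vlcm/cert /opt/vmware/vlcm/cert/backup-<timestamp>
#   systemctl restart nginx vrlcm-server.service
```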

Verification checklist

systemctl status vrlcm-server.service shows active (running).
UI/API flows that depend on Lifecycle Manager respond normally.
Optional: confirm vpostgres.service is active (running) and that the PID file error is gone.
tail -f /var/log/vrlcm/vmware_vrlcm.log shows normal startup without repeated cert/permission complaints.
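
The service-state items in this checklist can be checked in one pass. The sketch below uses systemctl is-active --quiet, which exits 0 only when a unit is active; check_units is a hypothetical helper name, and the unit list in the usage comment is simply the set of services named in this article.

```shell
#!/usr/bin/env bash
# check_units UNIT... (hypothetical helper): print the state of each unit
# and return non-zero if any of them is not active.
check_units() {
  local unit rc=0
  for unit in "$@"; do
    if systemctl is-active --quiet "$unit"; then
      echo "active:     $unit"
    else
      echo "NOT active: $unit"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the appliance:
#   check_units vpostgres.service nginx vrlcm-server.service
```

A zero exit status only says the units are running; the log tail remains the way to spot repeated certificate or permission complaints.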

FAQ

Q: Does this script change ownership of certs?

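A: No. cp -a preserves the owner and group of the timestamped files, and the chown line in the script is commented out. The KB itself only calls for renaming the timestamped files and restarting services. The script additionally sets file modes (600 for the key, 644 for the cert), which is generally safe. Uncomment the chown line in environments with stricter ownership policies.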

Q: Can I safely re-run the script?

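A: Yes. Re-running it ends in the same state: pgdata has mode 700 and the newest timestamped cert/key are the active pair. Note that every run also creates a fresh backup-<timestamp> directory and restarts nginx and vrlcm-server.service, so the services are interrupted briefly each time.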

Q: Why fix Postgres permissions?

A: The KB explicitly calls for chmod 700 on the Postgres pgdata directory to resolve the PID file “Permission denied” during startup.

References

Broadcom KB 412351 — “PostgreSQL Service fails to start after rebooting Fleet Management appliance post VCF Operations 9.0.1 upgrade” (symptoms, manual steps, service restarts).