PostgreSQL GROUP BY and HAVING
Aggregate functions like COUNT and SUM produce one result for the entire table. GROUP BY changes this by dividing rows into groups and applying the aggregate function to each group separately. The HAVING clause then filters those groups — the same way WHERE filters individual rows.
Understanding GROUP BY
Imagine a table with 100 employees spread across 5 departments. Without GROUP BY, COUNT(*) returns the total of all 100 employees. With GROUP BY on the department column, COUNT returns the employee count for each department separately — five rows, one per department.
Basic Syntax
SELECT column, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column;Every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause. This is a firm rule in PostgreSQL.
GROUP BY with COUNT
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;This splits employees into groups by department and counts the rows in each group. The result will have one row per unique department value.
GROUP BY with SUM and AVG
SELECT department,
SUM(salary) AS total_salary,
ROUND(AVG(salary), 2) AS average_salary
FROM employees
GROUP BY department;For each department, this calculates the total salary spend and the average salary.
GROUP BY with Multiple Columns
Grouping can be done on more than one column. Each unique combination of grouped columns forms its own group:
SELECT department, job_title, COUNT(*) AS headcount
FROM employees
GROUP BY department, job_title
ORDER BY department, job_title;This produces one row for each unique department-and-job_title pair.
GROUP BY with WHERE
WHERE is applied before grouping, so it filters individual rows before the groups are formed:
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE status = 'active'
GROUP BY department;Only active employees are considered. Inactive employees are excluded before the groups are created.
The HAVING Clause
HAVING filters groups after aggregation, whereas WHERE filters rows before aggregation. A comparison: WHERE says "only include rows where salary > 50000", while HAVING says "only include groups where the average salary > 50000".
Basic Syntax
SELECT column, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column
HAVING AGGREGATE_FUNCTION(other_column) condition;Example: Departments with More Than 2 Employees
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;Groups (departments) with 2 or fewer employees are excluded from the result. Only departments with more than 2 employees are shown.
Example: Departments Where Average Salary Exceeds 60,000
SELECT department, ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000
ORDER BY avg_salary DESC;WHERE vs HAVING — Side by Side
| Feature | WHERE | HAVING |
|---|---|---|
| Filters | Individual rows | Groups of rows |
| Applied | Before grouping | After grouping |
| Can use aggregates | No | Yes |
| Required with GROUP BY | No | No |
Using Both WHERE and HAVING Together
SELECT department, COUNT(*) AS active_count
FROM employees
WHERE status = 'active'
GROUP BY department
HAVING COUNT(*) >= 3
ORDER BY active_count DESC;Step-by-step execution:
- WHERE filters to include only active employees.
- GROUP BY groups the remaining rows by department.
- COUNT counts employees in each department group.
- HAVING keeps only departments with 3 or more active employees.
- ORDER BY sorts results by count, highest first.
Example: Full GROUP BY and HAVING Workflow
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
salesperson VARCHAR(100),
region VARCHAR(50),
amount NUMERIC(10, 2),
sale_date DATE
);
INSERT INTO sales (salesperson, region, amount, sale_date)
VALUES
('Alice', 'North', 1500.00, '2024-01-10'),
('Bob', 'South', 2200.00, '2024-01-12'),
('Alice', 'North', 1800.00, '2024-01-15'),
('Carol', 'East', 900.00, '2024-01-18'),
('Bob', 'South', 3100.00, '2024-01-20'),
('Carol', 'East', 1200.00, '2024-01-22'),
('Alice', 'North', 2500.00, '2024-01-25'),
('Dan', 'West', 500.00, '2024-01-28');
-- Total sales per salesperson
SELECT salesperson, SUM(amount) AS total_sales
FROM sales
GROUP BY salesperson
ORDER BY total_sales DESC;
-- Regions with more than one sale
SELECT region, COUNT(*) AS number_of_sales
FROM sales
GROUP BY region
HAVING COUNT(*) > 1;
-- Salespersons with total sales above 4000
SELECT salesperson, SUM(amount) AS total
FROM sales
GROUP BY salesperson
HAVING SUM(amount) > 4000;
-- Average sale per region, only for regions with avg sale above 1000
SELECT region, ROUND(AVG(amount), 2) AS avg_sale
FROM sales
GROUP BY region
HAVING AVG(amount) > 1000
ORDER BY avg_sale DESC;Key Points
- GROUP BY divides rows into groups and applies aggregate functions to each group.
- Every non-aggregate column in SELECT must appear in GROUP BY.
- WHERE filters rows before grouping; HAVING filters groups after aggregation.
- Aggregate functions cannot be used in a WHERE clause — use HAVING instead.
- WHERE and HAVING can be used together in the same query.
- The execution order is: WHERE → GROUP BY → HAVING → ORDER BY.
