24 SQL – III in MySQL

Hiren Joshi

 

Objectives

  • Aggregate Function
  • Group by and having clause
  • Subquery

     Aggregate function:

  • The function which operates on single value and returns a single value is known as scalar function.
  • The function which operates on set of values and returns a single summary value is known as aggregate function. Aggregate function is also known as column functions.
  • A select statement having one or more aggregate function is called as a summary query.
  • The most common aggregate functions are:
  1. Avg
  2. Sum
  3. Min
  4. Max
  5. Count

The syntax of aggregate functions is shown in following table.

 

Function Syntax Description
avg ( [All | Distinct ] expression ) Returns the average of the non-null values in the expression
sum ( [All | Distinct ] expression ) Returns the total of the non-null values in the expression
min ( [All | Distinct ] expression ) Returns the minimum  non-null value in the expression
max ( [All | Distinct ] expression ) Returns the maximum non-null value in the expression
count ( [All | Distinct ] expression ) Returns the number of the non-null values in the expression
count ( * ) Returns the number of rows selected by the query
  • The expression specified for the avg and sum functions must result in a numeric value.
  • The expression specified for the min , max and count function can result in a numeric, string or date value.
  • By default, all values are included in the calculation regardless of whether they are duplicated.
  • To omit duplicate values in the calculation, code distinct keyword. The distinct keyword is generally used with the count function.
  • Count (*) included null values. While all other aggregate functions not include null values in  calculation.
  • For example, following query counts unpaid invoices and calculate the total due.

select count(*) as  “Number of Invoices”,

  sum(itotal – paymenttotal – credittotal) as “Total Due”

 from invoices   where (itotal – paymenttotal – credittotal ) >= 0

  • Count(*) is used to count all the selected rows. Count(<columnName>) can be coded .
  • To count only the rows with unique values in a specified column, code count (distinct <columnName>).

/* count(*) , AVG and sum function */

 

select ‘After 15/04/2011’ as selection_date,

count(*) as “Number of Invoices”,

round(avg(itotal),2) as “Average Invoice Amount”,

sum(itotal) as ‘Total Invoice Amount’

from invoices

where idate > ‘2011-04-15’; /* Min and Max example */

    select ‘After 13/04/2011’ as selection_date,

count(*) as ‘Number of invoices’, max(itotal) as ‘Highest Invoice Total’,

min(itotal) as ‘Lowest Invoice Total’ from invoices

where idate > ‘2011-04-13’;

 

/* Min and Max on non-numeric value */ select min(vname) as ‘First Vendor’, max(vname) as ‘Last Vendor’, count(vname) as ‘Number of Vendors’ from vendors;

 

/* a summary query which uses Distinct keyword */

 

select count(distinct vid) as ‘Number of Vendors’,

count(vid) as ‘number of invoices’,

round(avg(itotal),2) as ‘Avg Income Amt’,

sum(itotal) as ‘Total invoice Amount’

from invoices

where idate > ‘2011-04-01’;

 

The syntax of select statement with group by and having clause

 

Select <column-list separated by ,>

From <tableName>

[where searchCondition]

[group by <group by list]

[having <searchcondition>]

[Order by <order by list>]

[limit  <row limit>]

  • The group by clause is used to group the rows of a result set based on one or more columns or expressions. Comma (,) is used to include two or more columns or expressions.
  • Aggregate function is included in the select statement, then the aggregate is calculated for each group specified by the group by clause.
  •  If two or more columns or expression are included in the group by clause, they form a hierarchy where each column or expression is subordinate to the previous one.
  • The having clause specifies a search condition for a group or an aggregate. MySQL applies this condition after it groups the rows.
  • When a select statement includes a group by clause, the select clause can include the columns used for grouping, aggregate functions and expressions that result in a constant value.

 

/* A summary query that counts the number of invoices by vendor */

 

select vid, count(*) as invoice_qty

from invoices

group by vid;

 

/* A summary query that calculates the average invoice amount by vendor     */

 

select vid, round(avg(itotal),2) as “Average Invoice Amount”

from invoices

group by vid

having avg(itotal) > 100

order by “Average Invoice Amount” desc;

  • By default, group by clause sorts the columns in ascending order. To sort the same in descending order code desc keyword after the column name in the group by clause.

/* A summary query that counts the number of invoices by vendor */

 

select vid, count(*) as invoice_qty

from invoices

group by vid;

 

/* A summary query that calculates the number of invoices and the average invoice amount

 

for the vendors in each state and city */

 

select vstate, vcity, count(*) as invoice_qty,

round(avg(itotal),2) as invoice_avg

from invoices , vendors

where invoices.vid = vendors.vid

group by vstate, vcity;

 

/* A summary query that limits the groups to those

 

with two or more invoices */

 

select vstate, vcity, count(*) as invoice_qty,

round(avg(itotal),2) as invoice_avg

from invoices i, vendors v

where i.vid = v.vid

group by vstate,vcity

having count(*) >= 2;

 

Compare between where and having

  • Where clause is used in select statement that uses grouping and aggregates, MySQL apply search criteria before it groups the rows and calculates the aggregates.
  • Having clause is used in select statement that uses grouping and aggregates, MySQL apply search criteria after it groups the rows and calculates the aggregates.
  • A where clause can refer to any column in the base table.
  • A having clause can only refer to a column included in the select clause.
  • A where clause cannot contain aggregate functions.
  • A having clause can contain aggregate functions.

/* A summary query with search condition in the having clause */

 

select vname,count(*) as invoice_qty,

round(avg(itotal),2) as invoice_avg

from vendors v, invoices i

where v.vid = i.vid

group by vname

having avg(itotal) > 200

order by invoice_qty desc;

    /* A summary query with a search condition in the where clause */ select vname, count(*)as invoice_qty, round(avg(itotal),2) as invoice_avg

 

from vendors v, invoices i

where v.vid = i.vid

and itotal > 200

group by vname

order by invoice_qty desc;

 

/* A summary query with a search condition in the where clause */ select vname, count(*) as invoice_qty, round(avg(itotal),2) as invoice_avg

 

from vendors v,invoices i

where v.vid = i.vid and

itotal > 200

group by vname

order by invoice_qty desc;

 

/* A summary query with a search condition in the having clause */ select vname, count(*) as invoice_qty,

    round(avg(itotal),2) as invoice_avg

from vendors v, invoices i

where v.vid = i.vid

group by vname

having avg(itotal) > 200

order by invoice_qty desc;

  • Compound search condition can be coded in having clause using the AND and OR operators.
  • If a search condition includes an aggregate function, it must be coded in the having clause. Otherwise, it can be coded in either the having clause or the where clause.

   /* summary query with a compound condition in having clause */

 

select idate, count(*) as invoice_qty, sum(itotal) as invoice_sum

from invoices

group by idate

having idate between ‘2011-04-15’ AND ‘2011-05-31’

and count(*) > 1

and sum(itotal)> 200

order by idate desc;

 

/* summary query with a compound condition in having clause using where clause */

select idate, count(*) as invoice_qty, sum(itotal) as invoice_sum

from invoices

where idate between ‘2011-04-15’ AND ‘2011-05-31’

group by idate

having count(*) > 1

and sum(itotal) > 200

order by idate desc;

     Subquery

  • A subquery is a query (select statement) that is coded within another SQL statement.
  • Subquery must enclose in parentheses.
  • A subquery can return a single value, a result set that has a single column or a result set that has multiple columns.
  • A subquery can be coded anywhere a single value, a result set that has a single column or a result set that has multiple columns.
  • Subquery has the same syntax as the standard select statement.
  • Subquery cannot include an order by clause.
  • Subqueries can be nested within other subqueries.
  • Subquery can be coded in select statement in
  • A where clause as a search condition o A having clause as a search condition o The from clause as a table specification
  • The select clause as a column specification

/* Subquey in where clause */

select ino, idate, itotal

from invoices

where itotal >

(select avg(itotal)

from invoices)

order by itotal;

  • Subquery can be used to code queries which work with two or more tables.
  • Most subquery can be coded as joins and most joins can be coded as subqueries.
  • For example

/* A query that uses a join */

 

select ino, idate, itotal

from invoices i, vendors v

where i.vid = v.vid

and vstate = ‘GJ’

order by idate;

   /* The same query that uses a subquery */

 

select ino,idate,itotal

from invoices

where vid in

(select vid

from vendors

where vstate = ‘GJ’)

order by idate;

 

  • Subquery can be written with IN operator to provide the list of values which are tested again the expression.
  • Subquery written using not in can be recoded using outer join.

 

/* Get vendor without invoices */ select vid, vname, vcity

 

from vendors where vid not in (select vid from invoices);

  • A query which uses NOT IN operator with subquery can be recoded using an outer join.

select v.vid, vname, vstate

from vendors v left join invoices i on v.vid = i.vid

where i.vid is null order by v.vid;

 

 

Comparison operators in subquery

  • Comparison operator can be used in where clause to compare an expression with the result of a subquery.
  • The syntax for where clause that uses a comparison operator is: Where expression <comparison operator> [some|any|all] {subquery}

 

select ino, idate, itotal-paymenttotal-credittotal as balance_due from invoices

where itotal-paymenttotal-credittotal >= 0 and itotal-paymenttotal-credittotal <= (select avg(itotal-paymenttotal-credittotal) from invoices

 

where itotal-paymenttotal-credittotal >= 0) order by itotal desc;

  • A search condition without the some, any , all keywords , the subquery must return a single value.
  •  A search condition with the some, any , all keyword , the subquery can return a list of  values.
  • All keyword is used to test that comparison condition is true for all of the values returned by a subquery.
  • If no rows are returned by the subquery, a comparison that uses the All keyword is always true.
  •  If all rows are returned by the subquery contains a null value, a comparison that uses the All keyword is always false.
  • All keyword working

 

Condition

 

Equivalent

expression

Description

 

X > all (1,2)

 

 

X > 2

 

 

X must be greater than all the values returned by the subquery , which

means that it must be greater than the maximum value. If X is greater

than the maximum value returned by subquery, it is evaluated as true.

X < all (1,2)

 

 

 

X < 1

 

 

 

X must be less than all the values returned by the subquery, which means

that it must be less than the minimum value. If X is less than the

minimum value returned by subquery, it is evaluated as true.

 

X = all (1,2)

 

 

 

 

(X = 1)

AND

(X = 2)

 

 

This condition evaluates to TRUE only if the subquery returns a single

value which is equal to X or if the subquery returns multiple values that

are the same and these  values are all equal to X. Otherwise it evaluates

to FALSE.

 

X < > all(1,2)

 

 

X NOT IN

(1,2)

 

This condition evaluates to TRUE only if x is not one of the values

returned by the subquery. Otherwise it evaluates to FALSE.

 

 

For example, following query get invoices smaller than the largest invoices for vendorid 3

 

select vname, ino, itotal

from invoices i, vendors v

where i.vid = v.vid

and itotal < any

(

select itotal

from invoices iv

where vid = 3

)

order by vname;

  • Any keyword is used to test that comparison condition is true for one or more of the values returned by a subquery.
  • If no rows are returned by the subquery or it returns null values, a comparison that uses the Any keyword is always false.
  • The Some keyword works the same as the ANY keyword.
  • Any keyword working

 

Condition

 

Equivalent

expression

Description

 

X > any (1,2)

 

 

 

X > 1

 

 

 

X must be greater than at least one of the values returned by the

subquery list, which means that it must be greater than the minimum

value returned by the subquery.  If X is greater than the minimum value

returned by subquery, it is evaluated as true.

X < any (1,2)

 

 

 

X < 2

 

 

 

X must be less than at least one of the values returned by the subquery

list, which means that it must be less than the maximum value returned

by the subquery. If X is less than the maximum value returned by

subquery, it is evaluated as true.

X = any (1,2)

 

 

(X = 1) OR

(X = 2)

 

This condition can evaluate to TRUE only if X is equal to any of the

values returned by the subquery.  Otherwise it evaluates to FALSE.

 

X < > any(1,2)

 

 

( X < > 1 )

OR

(X < > 2)

This consdition is equivalent to X NOT IN(1, 2)

 

 

 

For example, following query get invoices smaller than the largest invoices for vendorid 7

 

select vname, ino, itotal

from invoices i, vendors v

where i.vid = v.vid

and itotal < any

(

 

select itotal

from invoices iv

where vid = 7

)

order by vname;

 

Correlated Subquery:

  • A correlated subquery is a subquery which is executed once for each row of the main query.
  • A correlated subquery refers to a value that is provided by a column in the main query. For each different value that is returned by main query for that column, the subquery returns a different result.
  • A subquery which is executed only once for the entire query is known an uncorrelated subquery.

   For example, following query get each invoice amount that is higher than the vendor’s average invoice amount.

 

select vid,ino,itotal

from invoices i

where itotal >

(

 

select avg(itotal)

from invoices i

where vid = i.vid

)

order by vid, itotal;

 

Exists Operator:

  • Exists operator is used to test that one or more rows are returned by the subquery.
  • Not exists operator is used to test that no rows are returned by the subquery.
  • When exists or not exists operator is used with a subquery, it does not matter what columns are specified in the select clause. As a result, generally an * (asterisk) is coded within subquery.
  • The syntax of a subquery which is used the exists operator
  • Where [not] exists (subquery)

select vid,vname,vstate

from vendors v where not exists

(

select *

from invoices i where i. vid = v.vid

)

 

Subquery used with other clauses

  • Subquery can be coded in select statement in o A where clause as a search condition o A having clause as a search condition o The from clause as a table specification
  • The select clause as a column specification
  • The use of subquery with where clause is shown by example earlier.
  • When a subquery is used in the select clause, it is generally a correlated subquery.
  • When a subquery is coded in the select clause, the subquery must return a single value.
  • A query which include subquery in its select clause can be coded using a join instead of the subquery. Join is generally faster and easier to read, subqueries are rarely coded in the select clause.

 

For example, to list the most recent invoice date for each vendor , the query is

 

select vname,

(

select max(idate)

from invoices

where vid = vendors.vid ) as latest_inv

from vendors

order by latest_inv desc;

 

The above query can be coded using join as below:

 

select vname, max(idate) as latest_inv

from vendors v left join invoices i

on v.vid = i.vid

group by v.vname

order by latest_inv desc;

  • A subquery which is coded in the from clause returns a result set which is also known as an inline view.
  • When a subquery is coded in the form clause, an alias must be assign to the subquery. The alias can be used as the same way any table name alias.
  • When a subquery is coded in the form clause, an alias should be used for any columns in the subquery that perform calculations. Then, the inline view can use these aliases as the column name of a table.

For example, to get the largest invoice total for the top vendor in each state

 

select vstate, max(sum_of_invoices) as ” max sum of invoices”

from

(

select vstate, vname, sum(itotal) as sum_of_invoices

from vendors v, invoices i

where v.vid = i.vid

group by vstate, vname

) t

group by vstate;

 

 

References:

 

1. Luke Welling, Laura Thomson: PHP and MySQL Web Development, Pearson,

2. W. Jason Gilmore: Beginning PHP and MySQL 5 From Novice to Professional, Apress

3. Elizabeth Naramore, Jason Gerner, Yann Le Scouarnec, Jeremy Stolz, Michael K. Glass:Beginning PHP5, Apache, and MySQL Web Development, Wrox,

4. Robin Nixon: Learning PHP, MySQL, and JavaScript, O’Reilly Media

5. Ed Lecky-Thompson, Heow Eide-Goodman, Steven D. Nowicki, Alec Cove: Professional PHP,Wrox

6. Tim Converse, Joyce Park, Clark Morgan: PHP5 and MySQL Bible

7. Joel Murach, Ray Harris: Murach’s PHP and MySQL, Shroff/Murach

8. Ivan Bayross, Web Enabled Commercial Application Development UsingHTML/Javascript/DHTML/PHP , BPB Publications

9. Joel Murach, “Murach’s MySQL”, Shroff/Murach

10. Julie C. Meloni, Sams Teach Yourself PHP, MySQL and Apache All in One, Sams

11. Larry Ullman, PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide, Pearson Education

12. http://www.php.net/

13. http://www.w3schools.com/

14. http://www.tutorialspoint.com/