7 Data Manipulation Transparencies
Dr R. Baskaran
The SELECT statement allows you to specify the data that you want to retrieve, what order to arrange the data, what calculations to perform on the retrieved data and many, many more operations. As it’s the only SQL verb that enables you to query the database and SQL is a query language, it is necessarily the most complex of all SQL commands. ANSI/ISO SQL allows up to six different clauses in the SELECT statement of which the first two are mandatory.
The simple SELECT statement, as the name implies, is the most elementary form of query which uses only the mandatory clauses of the full SELECT. It only requires you to supply two pieces of information. First, the columns that you wish to see, and second, the name of the table that the columns are in. For example, this query retrieves all the rows in the DEPARTMENTS table:
SELECT DEPT_NO, DEPT_NAME, HEAD, BUDGET, P_BUDGET
FROM DEPARTMENTS ;
DEPT_NO DEPT_NAME HEAD BUDGET P_BUDGET
——- ——————— ——— ———— ———–
1 Engineering 59 5780000 6200000
2 Arts & Humanities 23 753000 643000
3 Management Studies 3 2510000 1220000
4 Industrial Law 12 78000 210000
5 Physical Sciences 18 4680000 4250000
To find out what the simple SELECT does, let’s have a closer look at what the query we’ve just used is telling the DBMS; “SELECT the DEPT_NO, the DEPT_NAME, the HEAD, the BUDGET and the P_BUDGET columns FROM the DEPARTMENTS table”.
When you read it out like this, it is obvious what information this query is requesting from the DBMS. In most versions of interpreted SQL, the results are displayed as soon as the DBMS finishes executing the query. In most cases, the results appear on the screen as they are shown in this book. Column names are at the top with the columns shown in the order in which they were specified in the SELECT statement. If more columns are specified in the SELECT statement than can fit on the screen, on some systems they are split up on two or more lines. Other systems allow you to scroll up, down, left or right through the results by using the arrow keys. The second method is better because when results columns are split up on different lines, the formatting is lost and data appears to be displayed haphazardly. The query result rows are not listed in any particular order. The DBMS just lists the rows in the order in which it comes across them in the table.
Note that all SQL queries (and other statements too, for that matter) end with the semicolon character. Newline can be used to format the query into clauses so that it is easier to understand what the query is doing when you refer to it several weeks later say. Most SQL interpreters and programs treat the newline and the tab characters as equivalent to the space character. You can type all SQL statements on a long single line if you wanted. To tell SQL that you have finished entering the query, you must type the semicolon character at the end.
To retrieve all the columns froma table, SQL allows you to use the asterisk, *, character as a shortcut.
Thus the following query is exactly the same as the previous query where we retrieved all the columns from the DEPARTMENTS table:
SELECT * FROM DEPARTMENTS ;
DEPT_NO DEPT_NAME HEAD BUDGET P_BUDGET
——- ——————- ——— ———— ————
1 Engineering 59 5780000 6200000
2 Arts & Humanities 23 753000 643000
3 Management Studies 3 2510000 1220000
4 Industrial Law 12 78000 210000
5 Physical Sciences 18 4680000 4250000
6 Medicine 67 6895000 6932000
In place of the asterisk, you should read “all the fields” in the SELECT statement. Notice how the columns in the results appear in the order in which they were defined when the table was created.
So far, we have looked at SELECT statement s that retrieve all the columns from a table. In most cases, we are only interested in certain columns in a table. SQL allows us to specify these columns in the first clause of the SELECT. As an example, say we wanted to look at the pass mark for each subject in the SUBJECTS table, we are only interested in the SUB_NAME and the PASS columns in the SUBJECTS table:
SELECT PASS, SUB_NAME FROM SUBJECTS ;
PASS SUB_NAME
—– ——————–
65 Mathematics
60 English Lit
71 Engineering Drwg
67 Basic Accounts
52 Industrial Law
57 Organic Chemistry
78 Physiology
74 Anatomy
71 Electronics
56 Marketing
If a column list is used the columns in the results table appear in the order in which they are specified in the SELECT. You can use this fact to change the order in which the columns appear in the results. Leaving columns out of the SELECT statement only affects the results of the query. It does not affect the data in the named table in any way. As well as simple column names, the SELECT clause also lets you use scalar expressions and string constants. Scalar expressions are simple calculations performed on numeric type column values. The results of the calculation are displayed in the results table as columns. For example, we can use a scalar expression using the annual pay field to display the monthly pay for each lecturer:
SELECT SURNAME, PAY, (PAY / 12)
FROM LECTURERS ;
SURNAME PAY
—————- ——– ———–
Jones 24000 2000
Scrivens 31800 2650
Nizamuddin 86790 7232
Campbell 43570 3630
Ramanujan 40900 3408
Finley 34210 2850
The third column in the results table has been generated as a direct result of the PAY / 12 calculation that we specified. The data in this column is not actually stored in any table, but has been calculated by SQL. In most versions of SQL, expressions are only allowed to use the addition, subtraction, multiplication and division functions. The fields used in expressions must be numeric type. Notice that the heading of the generated column is the expression that we used in the SELECT clause. This feature depends on the particular version of SQL that you use. Some dialects of SQL have blank headings for calculated columns.
AVG(PASS) is an aggregate function which calculates the average value of the PASS column. This will be discussed further in a later section. ANSI/ISO SQL defines SELECT statements as part of the DML. ANSI/ISO SQL further defines DML commands as having the ability to change the data in the database. SELECT by itself, cannot alter data in the database and so it is not strictly a part of the DML. Database data is modified only when SELECT is used in conjunction with other DML commands such as INSERT and UPDATE. It is best to think of the SELECT as being in a category by itself. SELECT lets you use the DISTINCT keyword to eliminate duplicate rows from the query results. Consider the DEPT_NO column in the STUDENTS table. This gives the department number that each student belongs to. If we simply wanted to know which departments are represented in the STUDENTS table, we could use the DISTINCT argument to remove repeat values for this column from the results table:
SELECT DISTINCT DEPT_NO
FROM STUDENTS ;
DEPT_NO
———
1
2
3
4
5
DISTINCT is very useful in queries where you simply want to know if a value is present in a table and are not interested in how many times it occurs. DISTINCT itself can only be used once in a SELECT statement. However, you can specify more than one column after DISTINCT. In this case, SQL will eliminate those rows where the values are the same in all the columns. The opposite of DISTINCT is ALL. This is the default that SQL assumes if neither is specified. In practice, ALL is not used. It is understood that if DISTINCT is absent, then the default, ALL is in effect and all columns, including duplicates will be displayed in the results table.
The BETWEEN range test operator allows you to define a predicate in the from of arrange. If a column value for a row falls within this range, then the predicate is true and the row will be added to the results table. The BETWEEN range test consists of two keywords, BETWEEN and AND. It must be supplied with the upper and the lower range values. The first value must be the lower bound and the second value, the upper bound.
For example, in the LECTURERS table, if we wanted to look at the records of all those lecturers who earn between 31,800 and 40,900:
SELECT SURNAME, PAY
FROM LECTURERS
WHERE PAY BETWEEN 31800 AND 40900 ;
SURNAME PAY
—————- ———
Scrivens 31800
Ramanujan 40900
Finley 34210
This query retrieves three records. Notice that the upper and lower parameters are inclusive. This means that the rows where pay equals 31,800 (lower bound) and 40,900 (upper bound) are also retrieved in the results. SQL will not allow you to specify the upper bound first. Thus the following query does not return any records:
SELECT SURNAME, PAY
FROM LECTURERS
WHERE PAY BETWEEN 40900 AND 31800 ;
No matching records found.
You can use character values as upper and lower range bounds:
SELECT * FROM LECTURERS WHERE SURNAME BETWEEN ‘N’ AND ‘R’ ;
SURNAME INITL LECT_NO DEPT_NO SUB_NO GRADE PAY JOINED
———- —– ——- ——- —— —– —— ———-
Nizamuddin W M 3 3 4 A 86790 05-26-1969
The query only retrieves one row because Nizamuddin is between N and R, but Ramanujan is not. When comparing strings of unequal length, SQL pads out the smaller string with spaces before doing the comparison. As the space character has a lower value than letter characters in the ASCII scheme, the word Ramanujan falls outside the upper bound. BETWEEN does not actually add any new functionality to SQL. All queries that use BETWEEN can be rephrased to run using only the comparison test operators instead.
For example the last query can be expressed without using BETWEEN as:
SELECT * FROM LECTURERS
WHERE (SURNAME >= ‘N’) AND (SURNAME <= ‘R’) ;
SURNAME INITL LECT_NO DEPT_NO SUB_NO GRADE PAY JOINED
———– —– ——- ——- —— —– —— ———-
Nizamuddin W M 3 3 4 A 86790 05-26-1969
The AND keyword is a boolean operator that tells SQL that both expressions inside the parentheses must be true for the predicate to be true. Although this query is functionally the same as the previous query, the one using BETWEEN is more elegant and it is clearer to the reader what the query is trying to achieve.
We’ve seen that BETWEEN defines a range of values to check against for inclusion or exclusion from the results table. This is not always enough. What if you needed to check for certain values only? Values that do not always fit in to a neat range. To accommodate this, SQL allows the use of the IN operator. An example will illustrate the use of IN. In the SUBJECTS table, if we wanted to look at the rows of the Anatomy and the Physiology subjects, we could use a query with IN:
SELECT * FROM SUBJECTS WHERE SUB_NAME IN (‘Anatomy’, ‘Physiology’) ;
SUB_NO SUB_NAME DEPT_NO CREDITS PASS
—— ————- ——- ——- —-
7 Physiology 6 3 78
8 Anatomy 6 1 74
You must define the set values within parentheses, and must separate each value with a comma. In this example, we have used string values. IN also allows other valid data types to be used as set members for example to list the subjects rows given that their pass marks are 52, 56 and 57:
SELECT SUB_NAME, PASS
FROM SUBJECTS
WHERE PASS IN (52, 56, 57) ;
SUB_NAME PASS
——————– —-
Industrial Law 52
Organic Chemistry 57
Marketing 56
As with all the SQL query commands, the result records are not displayed in any order unless the ordering is explicit ly specified. In the above query for example, we specified pass marks of 52, 56 and 57 in the inclusion set. The results table displayed the rows in the 52, 57, 56 order. The reason for this is that this is the order in which the DBMS found the rows in the table.
As with BETWEEN, IN does not add to SQL’s functionality. What IN does can also be accomplished by using comparison and Boolean operators. For example, the previous query can also be expressed as:
SELECT SUB_NAME, PASS FROM SUBJECTS WHERE PASS = 52 OR PASS = 56 OR PASS = 57 ;
SUB_NAME PASS
——————— —-
Industrial Law 52
Organic Chemistry 57
Marketing 56
The LIKE operator is used to match string pattern values. LIKE uses wildcard characters to specify one or more string character values. ANSI/ISO SQL defines two wildcard characters, the underscore (_) and the percent (%). These are the characters that are almost universally used in commercial SQL systems for pattern matching. String pattern matching is useful in cases where you are not sure of the exact string value that you need to search for. For example if you cannot remember the spelling of a person’s name:
SELECT * FROM STUDENTS WHERE SURNAME LIKE ‘A_ton’;
SURNAME FIRST_NAME D_O_B STUDENT_NO DEPT_NO YEAR
——— ———– ———- ———- ——- —-
Ayton Phil J M A 07-13-1967 3 3 1
The underscore character is one of the wildcards, and is used to represent any valid character (one only). In this query, we are not sure if the student’s surname is spelt as Ayton or Aeton or even Aiton. The LIKE ‘A_ton’ predicate tells SQL that the first letter of the surname is ‘A’ and the last three letters are ‘ton’, but we are not sure of the second letter. If you are familiar with the MS-DOS or OS/2 or UNIX operating systems, then the _ character performs the same function in SQL as ? does in MS-DOS, and . does in UNIX.
The previous query told SQL to retrieve those rows where the second letter of the surname is any valid character. The rest of the pattern ie. the first and the last three letters must match exactly as specified. The second wildcard character you can use in LIKE is the percent (%) character. This is used to represent a sequence of zero or more characters. The percent wildcard in SQL corresponds to the * wildcard in MS-DOS and OS/2 and UNIX. Let’s use percent to look at the records of all those students whose surname ends in ‘ton’:
SELECT * FROM STUDENTS WHERE SURNAME LIKE ‘%ton’ ;
SURNAME FIRST_NAME D_O_B STUDENT_NO DEPT_NO YEAR
——— ———– ———- ———- ——- —-
Ayton Phil J M A 07-13-1967 3 3 1
You can also mix and match the % and the _ wildcard characters in a single query:
SELECT * FROM STUDENTS WHERE SURNAME LIKE ‘A_t%’ ;
SURNAME FIRST_NAME D_O_B STUDENT_NO DEPT_NO YEAR
——— ———– ———- ———- ——- —-
Ayton Phil J M A 07-13-1967 3 3 1
The % and _ characters are themselves legal ASCII characters. Using valid characters as wildcards can cause problems.
What if you wanted to use % or _ as part of the string and not as wildcards? SQL’s solution to this is to allow you to define and use the escape character. The escape character has a special meaning in the LIKE string in that the character immediately following it is treated as a regular character and not a wildcard. For example suppose we wanted to search for the string ‘_search%’ where % and _ are regular characters and not wildcards, then we could use the following query with the ESCAPE clause: SELECT * FROM SUBJECTS WHERE SUB_NAME LIKE ‘$_search$%’ ESCAPE ‘$’ ; No matching records found.
The ESCAPE clause at the end of the query defines the dollar ($) character as the escape character. In the string, ‘$_search$%’, % and _ are treated as characters and not as wildcards. Of course, this query comes up empty because we do not have a subject called ‘%search_’ in the SUBJECTS table.
As we know NULL values are used to indicate that no data has been defined yet. This is different from blank string values or zero numeric values. Blank and zero values are just that, values. NULL marks the column as not having any definite value. When you use NULLs in SQL expressions, the result will always be undefined. For example, if you wanted to look at the rows in the LECTURERS table where the value for the DEPT_NO field is NULL, the following query will not retrieve the results you want:
SELECT SURNAME, DEPT_NO FROM LECTURERS WHERE DEPT_NO = NULL ;
SURNAME DEPT_NO
————— ——-
Jones 1
Scrivens 3
Nizamuddin 3
Campbell 5
Ramanujan 4
Finley 4
The DBMS retrieved all the lecturers row in our system because the predicate “DEPT_NO = NULL” is unknown for all the rows. It is neither true nor false. Another DBMS could just as easily have not retrieved any rows depending upon how it treats unknown predicate results. SQL provides the IS NULL operator to search specifically for NULL values. The valid form of the previous query is thus: SELECT SURNAME, DEPT_NO FROM LECTURERS WHERE DEPT_NO IS NULL ;
No matching records found.
The NOT logical operator (discussed in the next section) can be used to reverse the meaning of IS NULL. To retrieve the rows of those lecturers where the DEPT_NO value is not NULL:
SELECT SURNAME, DEPT_NO FROM LECTURERS WHERE DEPT_NO IS NOT NULL ;
SURNAME DEPT_NO
————— ——-
Jones 1
Scrivens 3
Nizamuddin 3
Campbell 5
Ramanujan 4
Finley 4
NOT can also be used with the other operators, eg NOT BETWEEN and NOT LIKE to reverse their meaning.
SQL lets you group expressions by using parentheses. These have the same effect in SQL expressions as they do in mathematical expressions. The expressions inside the parenthesis are evaluated first, and are treated as a single expression. In the above query, AND applies to the expression inside the parenthesis as a whole, ie. GRADE > ‘C’ OR PAY <= 30000. When you are analyzing complex WHERE clauses, it is best to break the WHERE into it’s constituent predicates and reading them in plain english. Let’s apply this to the last query. The first search condition is “department number is equal to 4”. The AND links this to a parenthesized expression, “either the grade is lower than C or pay is 30000 or less”. You need to be careful here because grade D is lower than grade C but the character D is greater than C. The last predicate is slightly more tricky. In english, we would say ” lecturer number is not equal to 5″. SQL doesn’t let you construct this as LECT_NO NOT = 5. The NOT must precede the boolean expression that it operates on. If we now put these all together, the WHERE clause can be expressed as “Where department number is equal to 4 and either the grade is lower than C or pay is 30000 or less and also, the lecturer number is not equal to 5”.