Friday, June 17. 2005MySQL 5 Stored Procedures and INFORMATION_SCHEMAA Handy One-Two Punch for AdministratorsIn writing Pro MySQL, I've become extremely excited about the new features debuting in MySQL 5---features that have already started to get thoroughly tested by the development community and have leveled MySQL with Oracle, SQL Server, and PostgreSQL on functional equivalencies long sought-after by developers and administrators. I figured, partly in response to Arjen Lentz' call to action, I'd write about two of these functional areas in this article: stored procedures and the INFORMATION_SCHEMA virtual database. Both features are detailed thoroughly in Pro MySQL, Chapters 9 and 21, respectively, but I wanted to do a quick article combining these two features into a practical example of MySQL 5 functionality. When I worked for RadioShack, a Microsoft shop, we used stored procedures and MS SQL Server's INFORMATION_SCHEMA views all the time. Since making the move to MySQL, I've missed these two features sorely. Well, now the wait's over. Stored ProceduresAt their core, stored procedures in MySQL are simply routines (a collection of SQL statements and flow control blocks) that MySQL allows you to save and call---or execute---at a later time. MySQL stored procedures allow you to specify parameters for the stored routine, so that when you execute the routine, you may pass in variable information, which the routine can use. It's easiest to see the use of stored procedures through an example. To start with, attach yourself to the nearest MySQL 5 installation, and connect with the standard MySQL client. For these examples, I'll just use the test database that all MySQL installations come with and a simple table called States (here is the script to create the sample table: create_states.sql): The Stored Procedure ScriptFor an easy introduction, let's make a stored procedure that simply returns all the records in our States table. Using your favorite text editor, copy the following code into and create a file called GetAllStates.sql:
DELIMITER //
To explain, I'll walk through the details:
DELIMITER //
starts off the stored procedure script and tells MySQL to use the // character sequence as the execution delimiter for the SQL statements contained in the script. We must do this because within our CREATE PROCEDURE statement, we have "normal" SQL statements which end in the standard ; delimiter. This line enables us to put SQL statements inside our CREATE PROCEDURE statement and not have the server execute those lines when we create the procedure. The \g command on the next line (and subsequent lines) simply tells the mysql client to "Go", or execute everything so far and prepare for the next statement. The next line simply changes to the database that we wish to create our procedure in. This is not necessarily the database that we may query within our procedure, just the database in which we wish to store, or associate the procedure with.
USE test
The next line:
DROP PROCEDURE IF EXISTS GetAllStates
simply drops the procedure we're about to create if it already exists. Having this statement in your stored procedure creation scripts allows you to make changes to the stored procedure and re-run the script to alter it, without having to manually drop the procedure yourself (you'll see how this happens in a moment). Next, we get to the heart of our script:
CREATE PROCEDURE GetAllStates ()
The CREATE PROCEDURE statement is followed by the name of the stored procedure you're creating---in this case, GetAllStates. Following the name of the procedure is a set of parentheses which take optional parameters. We'll take a look at parameters in our next example; for now, just hold tight. On the following line, the BEGIN statement marks the beginning of our stored procedure, and MySQL will consider all following statements to be part of the stored procedure until it receives the END statement on its own line. In our procedure GetAllStates(), the only statement contained in the procedure is a simple SELECT from our States table. Finally, we use the // character sequence to tell MySQL to execute everything from the CREATE PROCEDURE to the // as a single statement. Creating and Executing the ProcedureNow that we have our script saved, let's create the procedure by running the script against our database using the mysql client. Here, we're assuming the the current directory contains our GetAllStates.sql script from above. [root@localhost ~]# mysql < GetAllStates.sql The prompt should return quickly with no output. Next, let's test out our new stored procedure. First, we log in to the test database with the mysql client, and run our procedure using the CALL statement, followed by the name of our procedure and a set of parentheses: [root@localhost ~]# mysql test The stored procedure returns the results of our SELECT * FROM States statement. Much easier, and much nicer than typing out all that SQL, right? One of the best advantages to using stored procedures is that you encapsulate your SQL code into a single location. Now that you have your create script, you can make a change to the script, re-run the create script, and all you have to do is CALL your procedure again. Nice and compact. Stored Procedure ParametersOK, now I'll provide an example which uses parameters in your stored procedure definition and calling execution. This next stored procedure we'll call GetStatesBeginning, and we want to be able to pass a letter to the stored procedure and have it return all the states which begin with that letter. Check out the next script, which I've called GetStatesBeginning.sql:
DELIMITER //
Here, the lines of interest are the CREATE PROCEDURE line, where you have the parameter list inside the parenthese following the procedure name.
CREATE PROCEDURE GetStatesBeginning (IN starts_with CHAR(1))
The IN keyword specifies the parameter is an INPUT (as opposed to an OUTPUT parameter, which I'll cover in a later article). With the parameter name (starts_with), you provide the datatype for the parameter. The next two lines establish a new variable (tmp, of type CHAR(2)) which simply stores a string composed of the parameter passed to the procedure and the percent character:
DECLARE tmp CHAR(2);
Finally, we issue a SELECT statement against the States table with a LIKE expression that uses our tmp variable:
SELECT Code, Name FROM test.States WHERE Name LIKE tmp;
The rest of the script is identical to GetAllStates.sql. Executing the New Parameterized ProcedureSo, let's create the procedure and attempt to run it like we did with GetAllStates(): [root@localhost ~]# mysql < GetStatesBeginning.sql [root@localhost ~]# mysql test MySQL dutifully tells us that we've created a procedure for which it expects a parameter to be supplied. To do so, we simply provide the parameter inside the parentheses, like so: CALL GetStatesBeginning('C');
+------+-------------+
| Code | Name |
+------+-------------+
| CA | California |
| CO | Colorado |
| CT | Connecticut |
+------+-------------+
3 rows in set (0.01 sec)
For now, that will wrap up this discussion of stored procedure essentials. MySQL, of course, has all the specifications on various stored procedure options. Experiment and have fun. Next, we'll write a procedure which queries the new virtual database INFORMATION_SCHEMA, available in MySQL 5.0.2+. Querying the INFORMATION_SCHEMA Virtual DatabaseThe new INFORMATION_SCHEMA support in MySQL 5.0.2+ changes the way that metadata (or data which describes the MySQL schemata and system) is retrieved from the MySQL server. Prior to INFORMATION_SCHEMA, MySQL provided (and still does provide) the MySQL proprietary DESCRIBE and SHOW commands which displayed various information about the databases, tables, columns, and variables in the system. There are two major problems with the DESCRIBE and SHOW commands, however. First, they don't all have standard result outputs, in a row/column format. Secondly, they aren't standard ANSI SQL, and so aren't portable. The INFORMATION_SCHEMA virtual database provides access to the same (and more) information that the SHOW and DESCRIBE commands give, but allow access to the system data using a standard SELECT statement. This ability provides administrators and application developers with a whole world of flexibility to retrieve the data they need in a consistent and maintainable format. The reason I say that the INFORMATION_SCHEMA is a virtual database is because MySQL does not actually store the data retrieved by a query on the INFORMATION_SCHEMA tables. Instead, the data is pulled from the actual mysql system database and various system variables. For this reason, I'll refer to the INFORMATION_SCHEMA tables by the more appropriate term: the INFORMATION_SCHEMA views. You can find the most up-to-date list of all the INFORMATION_SCHEMA views on the MySQL website. Also, Chapter 21 of Pro MySQL also covers them in detail. For this article, we'll use only a couple of the views available: INFORMATION_SCHEMA.STATISTICS, which houses information on the indexes on your tables, and INFORMATION_SCHEMA.TABLES, which contains basic information about the tables in the various schemata on the server. Using INFORMATION_SCHEMA to Gather Index SelectivityAn index's selectivity is a measure of the uniqueness of the values contained in an index, or in a part of an index (if it contains more than one column). It's measured as a number from zero to one, with the following equation: S(I) = d/n where I is the index, d is the number of distinct values for the column, and n is the total number of records in the table. Clearly, if an index is entirely unique, it will have a selectivity of 1.0, since d = n. Prior to MySQL 5, in order to find the selectivity of all your indexes, you'd have to create a script which pulled a list of all the indexes on a table (possibly, by looping through a SHOW TABLES result, then issuing a SHOW INDEX FROM table_name, parsing the results, then issuing statements against the database which compared the cardinality value of each column in the index to the number of elements in the table. This is, in short, a pain in the butt. The INFORMATION_SCHEMA.STATISTICS and INFORMATION_SCHEMA.TABLES views can be queried in a single SQL statement to provide you selectivity statistics for all the tables on your server. No more messy scripts. Just simple, pure SELECT statements.
The following script, which I've saved as ShowIndexSelectivity.sql issues the SELECT statement against both views, joined by a common key value. Take a look through it to see how the INFORMATION_SCHEMA views relate. And, if you're wondering about the SQL used here, I might go into subqueries and derived tables in a later article (or you could pick up a copy of my book, which covers everything related to those topics...
SELECT
The code queries the INFORMATION_SCHEMA.STATISTICS view for cardinality numbers for each column in an index for any table not in the mysql system database (or, any user created database). It joins to INFORMATION_SCHEMA.TABLES on the TABLE_SCHEMA and TABLE_NAME columns in order to retrieve the total number of rows in each table (contained in the INFORMATION_SCHEMA.TABLES.TABLE_ROWS column). The rest of the query simply does some formatting, some calculation of the selectivity numbers, some filters for the records returned, and a sort. Running the above on a local development server, I came up with:
*************************** 1. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: Job
INDEX_NAME: EmployerExpiresOn
COLUMN_NAME: Employer
SEQ_IN_INDEX: 1
COLS_IN_INDEX: 2
CARD: 31
ROWS: 56895
SEL %: 0.05
*************************** 2. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: Job
INDEX_NAME: EmployerExpiresOn
COLUMN_NAME: ExpiresOn
SEQ_IN_INDEX: 2
COLS_IN_INDEX: 2
CARD: 49
ROWS: 56895
SEL %: 0.09
*************************** 3. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: Job
INDEX_NAME: ExpiresOnLocation
COLUMN_NAME: Location
SEQ_IN_INDEX: 2
COLS_IN_INDEX: 2
CARD: 28447
ROWS: 56895
SEL %: 50.00
*************************** 4. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: Job
INDEX_NAME: ExpiresOnLocation
COLUMN_NAME: ExpiresOn
SEQ_IN_INDEX: 1
COLS_IN_INDEX: 2
CARD: 38
ROWS: 56895
SEL %: 0.07
... omitted
*************************** 13. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: EmployerRegion
INDEX_NAME: PRIMARY
COLUMN_NAME: Employer
SEQ_IN_INDEX: 1
COLS_IN_INDEX: 2
CARD: 9
ROWS: 495
SEL %: 1.82
*************************** 14. row ***************************
TABLE_SCHEMA: jobs
TABLE_NAME: JobSeekerJob
INDEX_NAME: PRIMARY
COLUMN_NAME: JobSeeker
SEQ_IN_INDEX: 1
COLS_IN_INDEX: 3
CARD: 7687
ROWS: 23062
SEL %: 33.33
14 rows in set (0.47 sec)
Pretty nifty to be able to get all this information in a single SELECT statement. Database designers and administrators now have an easy way to query system metadata and analyze statistics that MySQL keeps about your schema. To wrap up this article, I'll wrap the script above into a stored procedure which accepts a single input parameter specifying the database for which you want to see index selectivity stats for, and also fills an output parameter with the total number of index columns containing perfect (unique) selectivity. I hope this will come in handy as an exercise in the power and flexibility of stored procedures and the INFORMATION_SCHEMA virtual database in MySQL 5. Here's ShowIndexSelectivityForDb.sql:
DELIMITER //
The ShowIndexSelectivityForDb() stored procedure takes an input argument of the database name, along with a placeholder for the output parameter to be populated by the first SELECT statement within the stored procedure. Calling the stored procedure, we get a number of results (the output of the second SELECT of the procedure) and can use our output variable parameter as needed:
mysql> CALL ShowIndexSelectivityForDb('jobs', @NumUniques);
+--------------------------+-------------------+-----------+-----+--------+-------+-------+-------+
| TABLE | INDEX | COLUMN | SEQ | # COLS | CARD | ROWS | SEL % |
+--------------------------+-------------------+-----------+-----+--------+-------+-------+-------+
| EmployerRegion | PRIMARY | Employer | 1 | 2 | 9 | 495 | 1.82 |
| Job | EmployerExpiresOn | Employer | 1 | 2 | 31 | 56895 | 0.05 |
| Job | EmployerExpiresOn | ExpiresOn | 2 | 2 | 49 | 56895 | 0.09 |
| Job | ExpiresOnLocation | ExpiresOn | 1 | 2 | 38 | 56895 | 0.07 |
| Job | ExpiresOnLocation | Location | 2 | 2 | 28447 | 56895 | 50.00 |
| Job | PRIMARY | Location | 1 | 2 | 28447 | 56895 | 50.00 |
| JobSeekerJob | PRIMARY | JobSeeker | 1 | 3 | 7687 | 23062 | 33.33 |
| JobSeekerResumePosition | PRIMARY | JobSeeker | 1 | 2 | 16553 | 33106 | 50.00 |
| JobSeekerResumeReference | PRIMARY | JobSeeker | 1 | 2 | 1854 | 5563 | 33.33 |
| JobSeekerSearchRequest | PRIMARY | JobSeeker | 1 | 2 | 6625 | 53002 | 12.50 |
| Location | City | City | 1 | 1 | 6473 | 32365 | 20.00 |
| Location | State | State | 1 | 1 | 62 | 32365 | 0.19 |
| Location | Zip | Zip | 1 | 1 | 10788 | 32365 | 33.33 |
| Position | Employer | Employer | 1 | 1 | 37 | 74 | 50.00 |
| Scout | medals | medals | 1 | 1 | 1 | 37105 | 0.00 |
| StoreLocation | zip | zip | 1 | 1 | 4820 | 9640 | 50.00 |
+--------------------------+-------------------+-----------+-----+--------+-------+-------+-------+
16 rows in set (2.45 sec)
Trackbacks
Trackback specific URI for this entry
No Trackbacks
Comments
Display comments as
(Linear | Threaded)
I get an error when I try to create the stored procedure ShowIndexSelectivityForDb.
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT COUNT(*) INTO TotalUniqueIndexColumns FROM INFORMATION_SCHEMA.STATISTIC' at line 7 |
Calendar
QuicksearchArchivesCategoriesSyndicate This Blog |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
