| [Top] | [Contents] | [Index] | [ ? ] |
Copyright © 1998-2004 MySQL AB
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The purpose of this chapter is to establish a set of guidelines that will help you, the developer, to follow the established MySQL coding styles and standards when writing code. Consistent style is important as it makes it easier to maintain our code and allows you to find specific routines much more quickly.
The standards contained within this chapter apply to the mysql server, and do not necessarily apply to other projects such as JDBC, ODBC, Administrator, etc.
Indentation and Spacing
The following rules apply to indentation and spacing of code within a source file:
switch statement, all braces
should appear on their own line. The only exception to having braces on
separate lines is if there is nothing between the braces.
Examples:
if (is_valid(x))
{
x= 2;
}
switch (my_arg) {
for (...)
{}
|
if, for and
while.
return_value= my_function(arg1, arg2, arg3); |
x= .
When you make multiple assignments indent the right-hand values to make them easier to read:
int x= 27; int new_var= 18; |
x + y; if (x == y); |
int x= 11; int y= 12; int z= 0; z= x; y += x; |
int *var; |
Naming Conventions
my_var as opposed to myVar or MyVar (underscore
rather than capitalization is to be used for separating words in
identifiers).
typedef'ed to an all-upper-case identifier.
#define declarations should be in upper case.
#define MY_CONSTANT 15 |
enum_.
Commenting Code
In addition, do not use asterixes on left of the comment.
/* This is how a multi-line comment should look. */ |
// comments are allowed at end of lines. In other cases, use
/* */ comments. In C files or headers used by C files, one should not use
// comments.
Every function should have a description unless the function is very short and its purpose is obvious.
/* This is a function. Use it wisely. */ int my_function() |
General Development Guidelines
shell> bk clone bk://mysql.bkbits.net/mysql-5.0 mysql-5.0
my_* functions like my_read()/my_write()/
my_malloc() that you can find in the mysys library, instead
of the direct system calls; This will make your code easier to debug and
more portable.
libstring functions (in the `strings' directory)
instead of standard libc string functions whenever possible.
For example, use bfill() and bzero() instead of
memset().
NULL more than once.
The cases where you should use inline functions are when the function satisfies most of the following requirements:
malloc(), which is very slow. For memory allocations
that only need to live for the lifetime of one thread, use
sql_alloc() instead.
if(a() || b() || c()) { error("something went wrong"); }
|
goto is okay if not abused.
LINT_INIT() if the
compiler complains after making sure that there is really no way
the variable can be used uninitialized.
TRUE and FALSE instead of true and false
in C++ code. This makes the code more readable and makes it easier to
use it later in a C library, if needed.
bool exists only in C++. In C, you have to use
my_bool (which is char); it has different cast rules than
bool:
int c= 256*2; bool a= c; /* a gets 'true' */ my_bool b= c; /* b gets zero, i.e. 'false': BAD */ my_bool b= test(c); /* b gets 'true': GOOD */ |
In C++, use bool, unless the variable is used in C code
(for example the variable is passed to a C function).
&variable_name construct in C++.
Alway use a pointer instead!
The reason is that the above makes it much harder for the one reading the caller function code to know what is happening and what kind of code the compiler is generating for the call.
Return_value_type *Class_name::method_name(const char *arg1,
size_t arg2, Type *arg3)
|
return_value= function_name(argument1, argument2, long_argument3,
argument4,
function_name2(long_argument5,
long_argument6));
|
If function or argument names are too long:
return_value=
long_long_function_name(long_long_argument1, long_long_argument2,
long_long_long_argument3,
long_long_argument4,
long_function_name2(long_long_argument5,
long_long_argument6));
|
Long_long_return_value_type *
Long_long_class_name::
long_long_method_name(const char *long_long_arg1, size_t long_long_arg2,
Long_long_type *arg3)
|
(you can but don't have to split Class_name::method_name into two lines)
In such cases think also about renaming arguments.
Item::Item(int a_arg, int b_arg, int c_arg)
:a(a_arg), b(b_arg), c(c_arg)
{}
|
But keep lines short to make them more readable:
Item::Item(int longer_arg, int more_longer_arg)
:longer(longer_arg),
more_longer(more_longer_arg)
{}
|
If constructor can fit into one line:
Item::Item(int a_arg) :a(a_arg) {}
|
Type value; int var2; ulonglong var3; |
Always align assignments of one structure to another, like this:
foo->member= bar->member; foo->name= bar->name; foo->name_length= bar->name_length; |
When you write the assignments that way, you typically don't read the structures names more than once.
Suggested mode in emacs:
(require 'font-lock)
(require 'cc-mode)
(setq global-font-lock-mode t) ;;colors in all buffers that support it
(setq font-lock-maximum-decoration t) ;;maximum color
(c-add-style "MY"
'("K&R"
'("MY"
(c-basic-offset . 2)
(c-comment-only-line-offset . 0)
(c-offsets-alist . ((statement-block-intro . +)
(knr-argdecl-intro . 0)
(substatement-open . 0)
(label . -)
(statement-cont . +)
(arglist-intro . c-lineup-arglist-intro-after-paren)
(arglist-close . c-lineup-arglist)
))
)))
(setq c-mode-common-hook '(lambda ()
(c-set-style "MY")
(setq tab-width 8)
(setq indent-tabs-mode t)
(setq comment-column 48)))
(c-set-style "MY")
(setq c-default-style "MY")
|
Basic vim setup:
set tabstop=8 set shiftwidth=2 set backspace=2 set softtabstop set smartindent set cindent set cinoptions=g0:0t0c2C1(0f0l1 "set expandtab "uncomment if you don't want to use tabstops |
Another vim setup:
set tabstop=8
set shiftwidth=2
set bs=2
set et
set sts=2
set tw=78
set formatoptions=cqroa1
set cinoptions=g0:0t0c2C1(0f0l1
set cindent
function InsertShiftTabWrapper()
let num_spaces = 48 - virtcol('.')
let line = ' '
while (num_spaces > 0)
let line = line . ' '
let num_spaces = num_spaces - 1
endwhile
return line
endfunction
" jump to 48th column by Shift-Tab - to place a comment there
inoremap <S-tab> <c-r>=InsertShiftTabWrapper()<cr>
" highlight trailing spaces as errors
let c_space_errors=1
|
DBUG Tags
Here are some of the DBUG tags we now use:
enterArguments to the function.
exitResults from the function.
infoSomething that may be interesting.
warningWhen something doesn't go the usual route or may be wrong.
errorWhen something went wrong.
loopWrite in a loop, that is probably only useful when debugging the loop. These should normally be deleted when you are satisfied with the code and it has been in real use for a while.
Some tags specific to mysqld, because we want to watch these carefully:
transStarting/stopping transactions.
quitinfo when mysqld is preparing to die.
queryPrint query.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Definitions
This description uses a narrow definition: The OPTIMISER is the set of routines which decide what execution path the DBMS should take for queries.
MySQL changes these routines frequently, so you
should compare what is said here with what's in the
current source code. To make that easy, this description
includes notes referring to the relevant routine,
for example "See: `/sql/select_cc', optimize_cond()".
When one query is changed into another query which delivers the same result, that is a TRANSFORMATION. For example, the DBMS could change
SELECT ... WHERE 5 = a |
to
SELECT ...WHERE a = 5 |
Most transformations are less obvious. Some transformations result in faster execution.
The Code
Here is a diagram showing the code structure of
handle_select() in `/sql/sql_select.cc', the server
code that handles a query:
handle_select()
mysql_select()
JOIN::prepare()
setup_fields()
JOIN::optimize() /* optimizer is from here ... */
optimize_cond()
opt_sum_query()
make_join_statistics()
get_quick_record_count()
choose_plan()
/* Find the best way to access tables */
/* as specified by the user. */
optimize_straight_join()
best_access_path()
/* Find a (sub-)optimal plan among all or subset */
/* of all possible query plans where the user */
/* controlls the exhaustiveness of the search. */
greedy_search()
best_extension_by_limited_search()
best_access_path()
/* Perform an exhaustive search for an optimal plan */
find_best()
make_join_select() /* ... to here */
JOIN::exec()
|
The indentation in the diagram shows what calls what.
Thus you can see that handle_select() calls mysql_select()
which calls JOIN::prepare() which calls setup_fields(),
and so on. The first part of mysql_select() is JOIN::prepare()
which is for context analysis, metadata setup, and some
subquery transformations. The optimizer is JOIN::optimize()
and all its subordinate routines. When the optimizer finishes,
JOIN::exec() takes over and does the job that JOIN::optimize()
decides upon.
Although the word "JOIN" appears, these optimizer routines are for all query types.
The optimize_cond() and opt_sum_query() routines do
transformations. The make_join_statistics()
routine puts together all the information it can find about
indexes that might be useful for accessing the query's tables.
Constant Propagation
A transformation takes place for expressions like this:
WHERE column1 = column2 AND column2 = 'x' |
For such expressions, since it is known that "if A=B and B=C then A=C" (the Transitivity Law), the transformed condition becomes:
WHERE column1='x' AND column2='x' |
This transformation occurs for column1 <operator> column2
conditions if and only if <operator> is one of these operators:
=, <, >, <=, >=, <>, <=>, LIKE |
That is, transitive transformations don't apply for
BETWEEN. Probably they should not apply for LIKE either,
but that's a story for another day.
Constant propagation happens in a loop, so the output from one "propagation step" can be input for the next step.
See: `/sql/sql_select.cc', change_cond_ref_to_const().
Or see: `/sql/sql_select.cc', propagate_cond_constants().
Dead Code Elimination
A transformation takes place for always-true conditions:
WHERE 0=0 AND column1='y' |
The first condition is always true, so it is removed, leaving:
WHERE column1='y' |
See: `/sql/sql_select.cc', remove_eq_conds().
A transformation takes place for always-false conditions:
WHERE (0 = 1 AND s1 = 5) OR s1 = 7 |
The parenthesized part is always false, so it is removed, reducing the expression above to:
WHERE s1 = 7 |
Sometimes the optimizer might eliminate the whole WHERE
clause:
WHERE (0 = 1 AND s1 = 5) |
The EXPLAIN statement will show the words Impossible WHERE.
Informally, we at MySQL say: "The WHERE has been optimized
away."
If a column cannot be NULL, the optimizer removes any
non-relevant IS NULL conditions. Thus,
WHERE not_null_column IS NULL |
is an always-false situation, and
WHERE not_null_column IS NOT NULL |
is an always-true situation -- so such columns are also
eliminated from the conditional expression. This can be
tricky. For example, in an OUTER JOIN, a column which is
defined as NOT NULL might still contain a NULL. The optimizer
leaves IS NULL conditions alone in such exceptional
situations.
The optimizer will not detect all possible impossible
WHERE situations -- there are too many. For example:
CREATE TABLE Table1 (column1 CHAR(1)); ... SELECT * FROM Table1 WHERE column1 = 'Canada'; |
The optimizer will not eliminate the condition in the
query, even though the CREATE TABLE definition makes it
an impossible condition.
Constant Folding
A transformation takes place for this expression:
WHERE column1 = 1 + 2 |
which becomes:
WHERE column1 = 3 |
Before you say "but I never would write 1 + 2 in the first place" -- remember what was said earlier about constant propagation. It is quite easy for the optimizer to put such expressions together. This process simplifies the result.
Constants and Constant Tables
A MySQL "constant" is something more than a mere literal in the query. It can also be the contents of a "constant table," which is defined as follows:
WHERE condition, containing expressions of the form
column = constant, for all the columns of the table's
PRIMARY KEY, or for all the columns of any of the table's
UNIQUE keys (provided that the UNIQUE columns are also
defined as NOT NULL).
For example, if the table
definition for Table0 contains
... PRIMARY KEY (column1,column2) |
then this expression
FROM Table0 ... WHERE column1=5 AND column2=7 ... |
returns a constant table. More simply, if the
table definition for Table1 contains
... unique_not_null_column INT NOT NULL UNIQUE |
then this expression
FROM Table1 ... WHERE unique_not_null_column=5 |
returns a constant table.
These rules mean that a constant table has at most one row value. MySQL will evaluate a constant table in advance, to find out what that value is. Then MySQL will "plug" that value into the query. Here's an example:
SELECT Table1.unique_not_null_column, Table2.any_column
FROM Table1, Table2
WHERE Table1.unique_not_null_column = Table2.any_column
AND Table1.unique_not_null_column = 5;
|
When evaluating this query, MySQL first finds that table
Table1 -- after restriction with Table1.unique_not_null_column
-- is a constant table according to the second definition above.
So it retrieves that value.
If the retrieval fails (there is no row in the table with
unique_not_null_column = 5), then the constant table has zero
rows and you will see this message if you run EXPLAIN for
the statement:
Impossible WHERE noticed after reading const tables |
Alternatively, if the retrieval succeeds (there is exactly
one row in the table with unique_not_null_column = 5), then
the constant table has one row and MySQL transforms the
query to this:
SELECT 5, Table2.any_column
FROM Table1, Table2
WHERE 5 = Table2.any_column
AND 5 = 5;
|
Actually this is a grand-combination example. The optimizer does some of the transformation because of constant propagation, which we described earlier. By the way, we described constant propagation first because it happens happens before MySQL figures out what the constant tables are. The sequence of optimizer steps sometimes makes a difference.
Although many queries have no constant-table references, it should be kept in mind that whenever the word "constant" is mentioned hereafter, it refers either to a literal or to the contents of a constant table.
See: `/sql/sql_select.cc', make_join_statistics().
Join Type
When evaluating a conditional expression, MySQL decides what "join type" the expression has. (Again: despite the word "join," this applies for all conditional expressions, not just join expressions. A term like "access type" would be clearer.) These are the documented join types, in order from best to worst:
system ... a system table which is a constant table const ... a constant table eq_ref ... unique/primary index with '=' for joining ref ... index with '=' ref_or_null ... index with '=', possibly NULL range ... index with BETWEEN, IN, >=, LIKE, etc. index ... sequential scan of index ALL ... sequential scan of table |
See: `/sql/sql_select.h', enum join_type{}. Notice that there are
a few other (undocumented) join types too, for subqueries.
The optimizer can use the join type to pick a "driver expression." For example, consider this query:
SELECT * FROM Table1 WHERE indexed_column = 5 AND unindexed_column = 6 |
Since indexed_column has a better join type, it is more likely
to be the driver. You'll see various exceptions as this
description proceeds, but this is a simple first rule.
What is significant about a driver? Consider that there are two execution paths for the query:
(The Bad Execution Path) Read every row in the table. (This is
called a "sequential scan of Table1" or just "table scan.")
For each row, examine the values in indexed_column and in
unindexed_column, to see if they meet the conditions.
(The Good Execution Plan) Via the index, look up the rows which
have indexed_column = 5. (This is called an "indexed search.")
For each row, examine the value in unindexed_column to see if
it meets the condition.
An indexed search generally involves fewer accesses than a sequential scan, and far fewer accesses if the table is large but the index is UNIQUE. That is why it is better to access with "The Good Execution Plan," and that is why it is often good to choose indexed_column as the driver.
The 'range' Join Type
Some conditions can work with indexes, but over a (possibly
wide) range of keys. These are known as "range" conditions,
and are most often encountered with expressions involving
these operators: >, >=, <, <=, IN, LIKE, BETWEEN
To the optimizer, this expression:
column1 IN (1,2,3) |
is the same as this one:
column1 = 1 OR column1 = 2 OR column1 = 3 |
and MySQL treats them the same -- there is no need to change IN to OR for a query, or vice versa.
The optimizer will use an index (range search) for
column1 LIKE 'x%' |
but not for
column1 LIKE '%x' |
That is, there is no range search if the first character in the pattern is a wildcard.
To the optimizer,
column1 BETWEEN 5 AND 7 |
is the same as this expression
column1 >= 5 AND column1 <= 7 |
and again, MySQL treats both expressions the same.
The optimizer may change a Range to an ALL
join type if a condition would examine too many
index keys. Such a change is particularly likely
for < and > conditions and multiple-level
secondary indexes. See: (for MyISAM indexes)
`/myisam/mi_range.c', mi_records_in_range().
The index Join Type
Consider this query:
SELECT column1 FROM Table1; |
If column1 is indexed, then the optimizer may
choose to retrieve the values from the index
rather than from the table. An index which is
used this way is called a "covering index"
in most texts. MySQL just uses the word "index"
in EXPLAIN descriptions.
For this query:
SELECT column1, column2 FROM Table1; |
the optimizer will use "join type = index" only
if the index has this definition:
CREATE INDEX ... ON Table1 (column1, column2); |
In other words, all columns in the select list must be in the index. (The order of the columns in the index does not matter.) Thus it might make sense to define a multiple-column index strictly for use as a covering index, regardless of search considerations.
Transposition
MySQL supports transpositions (reversing the order of operands around a relational operator) for simple expressions only. In other words:
WHERE - 5 = column1 |
becomes:
WHERE column1 = -5 |
However, MySQL does not support transpositions where arithmetic exists. Thus:
WHERE 5 = -column1 |
is not treated the same as:
WHERE column1 = -5 |
Transpositions to expressions of the form <column>=<constant>
are ideal for index lookups. If an expression of this
form refers to an indexed column, then MySQL always uses
the index, regardless of the table size. (Exception:
if the table has only zero rows or only one row, it is
a constant table and receives special treatment. See the
earlier section "Constants and Constant Tables".)
AND
The ANDed search has the form <condition> AND <condition>, as in:
WHERE column1 = 'x' AND column2 = 'y' |
Here the optimizer's decision is:
Here's an example:
CREATE TABLE Table1 (s1 INT, s2 INT); CREATE INDEX Index1 ON Table1 (s2); CREATE INDEX Index2 ON Table1 (s1); ... SELECT * FROM Table1 WHERE s1 = 5 AND s2 = 5; |
When choosing a strategy to solve this query, the optimizer
picks s2 = 5 as the driver because the index for s2 was
created first. Regard this as an accidental
effect rather than a rule -- it could change at any moment.
OR
The ORed search has the form "<condition> OR <condition>" as in:
WHERE column1 = 'x' OR column2 = 'y' |
Here the optimizer's decision is:
Use a sequential scan. |
In theory there is another choice if both column1 and column2 are indexed:
index search for the first condition, index search for the second condition, merge the two result sets |
MySQL never does that. But MySQL will do that in future, the code for doing so already exists.
The above warning does not apply if the same column is used in both conditions. For example:
WHERE column1 = 'x' OR column1 = 'y' |
In such a case, the search is indexed because the expression
is a range search. This subject will be revisited during the
discussion of the IN predicate.
AND plus OR
Consider this search expression:
WHERE column1 = 5 AND (column2 = 5 OR column3 = 5) |
If column1 is indexed, then the optimizer will choose to use
the index. In other words, the "sequential scan if OR" rule
does not apply if the OR is subordinate to an AND.
UNION
All SELECT statements within a UNION are optimized separately.
Therefore, for this query:
SELECT * FROM Table1 WHERE column1 = 'x' UNION ALL SELECT * FROM TABLE1 WHERE column2 = 'y' |
if both column1 and column2 are indexed, then each SELECT
is done using an indexed search, and the result sets are
merged. Notice that this query might produce the same results
as the query used in the OR example, which uses a sequential
scan.
NOT, <>
It is a logical rule that
column1 <> 5 |
is the same as
column1 < 5 OR column1 > 5 |
However, MySQL does not transform in this circumstance. If you think that a range search would be better, then you should do your own transforming in such cases.
It is also a logical rule that
WHERE NOT (column1 != 5) |
is the same as
WHERE column1 = 5 |
However, MySQL does not transform in this circumstance either.
We expect to add optimizations soon for both the above cases.
ORDER BY
In general, the optimizer will skip the sort procedure
for the ORDER BY clause if it sees that the rows will
be in order anyway. But let's examine some exceptional
situations.
For the query:
SELECT column1 FROM Table1 ORDER BY 'x'; |
the optimizer will throw out the ORDER BY clause. This
is another example of dead code elimination.
For the query:
SELECT column1 FROM Table1 ORDER BY column1; |
the optimizer will use an index on column1, if it exists.
For the query:
SELECT column1 FROM Table1 ORDER BY column1+1; |
the optimizer will use an index on column1, if it exists.
But don't let that fool you! The index is only for finding
the values. (It's cheaper to do a sequential scan of the
index than a sequential scan of the table, that's why index
is a better join type than ALL -- see "The 'index' Join Type"
section, earlier.) There will still be a full sort of the
results.
For the query:
SELECT * FROM Table1 WHERE column1 > 'x' AND column2 > 'x' ORDER BY column2; |
if both column1 and column2 are indexed, the optimizer will
choose an index on ... column1. The fact that ordering takes
place by column2 values does not affect the choice of driver
in this case.
See: `/sql/sql_select.cc', test_if_order_by_key(),
and `/sql/sql_select.cc', test_if_skip_sort_order().
There is a description of the internal sort procedure in
the MySQL Reference Manual, in section 5.2.8 "How MySQL
optimizes ORDER BY." We will not repeat it here, but urge
you to read it because it includes a description of how
the buffering and the quicksort operate.
See: `/sql/sql_select.cc', create_sort_index().
GROUP BY
These are the main optimizations that take place for
GROUP BY and related items (HAVING, COUNT(), MAX(),
MIN(), SUM(), AVG(), DISTINCT()).
GROUP BY will use an index, if one exists.
GROUP BY will use sorting, if there is no index.
The optimizer may choose to use a hash table.
GROUP BY x ORDER BY x, the optimizer
will realize that the ORDER BY is unnecessary, because
the GROUP BY comes out in order by x.
HAVING conditions to the WHERE clause; however, this
code is not operative at time of writing.
See: `/sql/sql_select.cc', JOIN::optimize(), after
#ifdef HAVE_REF_TO_FIELDS.
SELECT COUNT(*) FROM Table1; |
gets the count without going through all the rows.
This is true for MyISAM tables, but not for InnoDB
tables. Note that the query
SELECT COUNT(column1) FROM Table1; |
is not subject to the same optimization, unless
column1 is defined as NOT NULL.
MAX() and MIN(). For example,
consider the query
SELECT MAX(column1) FROM Table1 WHERE column1 < 'a'; |
If column1 is indexed, then it's easy to find the
highest value by looking for 'a' in the index and
going back to the key before that.
SELECT DISTINCT column1 FROM Table1; |
to
SELECT column1 FROM Table1 GROUP BY column1; |
if and only if both of these conditions are true:
GROUP BY can be done with an index.
(This implies that there is only one table in the FROM
clause, and no WHERE clause.)
LIMIT clause.
Because DISTINCT is not always transformed to GROUP BY,
do not expect that queries with DISTINCT will always
cause ordered result sets. (You can, however, rely on
that rule with GROUP BY, unless the query includes
ORDER BY NULL.)
See: `/sql/sql_select.cc', opt_sum_query(),
and `/sql/sql_select.cc', remove_duplicates().
JOIN
Bad join choices can cause more damage than bad choices in single-table searches, so MySQL developers have spent proportionally more time making sure that the tables in a query are joined in an optimal order and that optimal access methods (often called "access paths") are chosen to retrieve table data. A combination of a fixed order in which tables are joined and the corresponding table access methods for each table is called "query execution plan" (QEP). The goal of the query optimizer is to find an optimal QEP among all possible such plans. There are several general ideas behind join optimization.
Each plan (or part of plan) is assigned a "cost." The cost of a plan reflects roughly the resources needed to compute a query according to the plan, where the main factor is the number of rows that will be accessed while computing a query. Once we have a way to assign costs to different QEPs we have a way to compare them. Thus, the goal of the optimizer is to find a QEP with minimal cost among all possible plans.
In MySQL, the search for an optimal QEP is performed in a bottom-up manner. The optimizer first considers all plans for one table, then all plans for two tables, and so on, until it builds a complete optimal QEP. Query plans that consist of only some of the tables (and predicates) in a query are called "partial plans." The optimizer relies on the fact that the more tables are added to a partial plan, the greater its cost. This allows the optimizer to expand with more tables only the partial plans with lower cost than the current best complete plan.
The key routine that performs the search for an optimal QEP is
`sql/sql_select.cc', find_best(). It performs an exhaustive
search of all possible plans and thus guarantees it will find an optimal
one.
Below we represent find_best() in an extremely free translation to
pseudocode. It is recursive, so some input variables are labeled "so far"
to indicate that they come from a previous iteration.
remaining_tables = {t1, ..., tn}; /* all tables referenced in a query */
procedure find_best(
partial_plan in, /* in, partial plan of tables-joined-so-far */
partial_plan_cost, /* in, cost of partial_plan */
remaining_tables, /* in, set of tables not referenced in partial_plan */
best_plan_so_far, /* in/out, best plan found so far */
best_plan_so_far_cost)/* in/out, cost of best_plan_so_far */
{
for each table T from remaining_tables
{
/* Calculate the cost of using table T. Factors that the
optimizer takes into account may include:
Many rows in table (bad)
Many key parts in common with tables so far (very good)
Restriction mentioned in the WHERE clause (good)
Long key (good)
Unique or primary key (good)
Full-text key (bad)
Other factors that may at some time be worth considering:
Many columns in key
Short average/maximum key length
Small table file
Few levels in index
All ORDER BY / GROUP columns come from this table */
cost = complex-series-of-calculations;
/* Add the cost to the cost so far. */
partial_plan_cost+= cost;
if (partial_plan_cost >= best_plan_so_far_cost)
/* partial_plan_cost already too great, stop search */
continue;
partial_plan= expand partial_plan by best_access_method;
remaining_tables= remaining_tables - table T;
if (remaining_tables is not an empty set)
{
find_best(partial_plan, partial_plan_cost,
remaining_tables,
best_plan_so_far, best_plan_so_far_cost);
}
else
{
best_plan_so_far_cost= partial_plan_cost;
best_plan_so_far= partial_plan;
}
}
}
|
Here the optimizer applies a "depth-first search algorithm." It tries
estimates for every table in the FROM clause. It will stop a search
early if the estimate becomes worse than the best estimate so far. The order
of scanning will depend on the order that the tables appear in the
FROM clause.
See: `/sql/table.h', struct st_table.
ANALYZE TABLE may affect some of the factors that the
optimizer considers.
See also: `/sql/sql_sqlect.cc', make_join_statistics().
The straightforward use of find_best() and greedy_search()
will not apply for LEFT JOIN or RIGHT JOIN. For example,
starting with MySQL 4.0.14, the optimizer may change a left join to a
straight join and swap the table order in some cases. See also "5.2.7
How MySQL Optimises LEFT JOIN and RIGHT JOIN" in the MySQL
Reference Manual.
2.1 The Index Merge Join Type |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Index Merge Join Type | 2.1.1 Overview | ||
| 2.1.2 Index Merge Optimizer | ||
| 2.1.3 Row Retrieval Algorithm |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Index Merge is used when table condition can be converted to form:
cond_1 OR cond_2 ... OR cond_N |
The conditions for conversion are that
each cond_i can be used for a range scan,
and no pair (cond_i, cond_j) uses the same index.
(If cond_i and cond_j use the same index, then
cond_i OR cond_j can be combined into a single range scan
and no merging is necessary.)
For example, Index Merge can be used for the following queries:
SELECT * FROM t WHERE key1=c1 OR key2<c2 OR key3 IN (c3,c4); SELECT * FROM t WHERE (key1=c1 OR key2<c2) AND nonkey=c3; |
Index Merge is implemented as a "container" for range key scans
constructed from cond_i conditions. When doing Index Merge,
MySQL retrieves rows for each of the keyscans and then runs them through
a duplicate elimination procedure.
Currently the Unique class is used for duplicate elimination.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
| 2.1.2.1 Range Optimizer | ||
| 2.1.2.2 Index Merge Optimizer |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
For Range-type queries, the MySQL optimizer builds a SEL_TREE
object which represents a condition in this form:
range_cond = (cond_key_1 AND cond_key_2 AND ... AND cond_key_N) |
Each of cond_key_i is a condition that refers to components of one
key. MySQL creates a cond_key_i condition for each of the usable keys.
Then the cheapest condition cond_key_i is used for doing range scan.
A single cond_key_i condition is represented by a pointer-linked
network of SEL_ARG objects. Each SEL_ARG object refers to
particular part of the key and represents the following condition:
sel_arg_cond= (inf_val < key_part_n AND key_part_n < sup_val) (1)
AND next_key_part_sel_arg_cond (2)
OR left_sel_arg_cond (3)
OR right_sel_arg_cond (4)
|
SEL_ARG object with condition on next key component.
SEL_ARG object with interval on the same field as this
SEL_ARG object. Intervals of current and "left" object are
disjoint and left_sel_arg_cond.sup_val <= inf_val.
SEL_ARG object with interval on the same field as this
SEL_ARG object. Intervals of current and "right" object are
disjoint and left_sel_arg_cond.min_val >= max_val.
MySQL is able to convert arbitrary-depth nested AND-OR conditions to the above conjunctive form.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A single SEL_TREE object cannot be constructed for conditions that
have different members of keys in the OR clause, like in condition:
key1 < c1 OR key2 < c2 |
Beginning with MySQL 5.0,
these conditions are handled with the Index Merge method, and its
range optimizer structure, class SEL_IMERGE. SEL_IMERGE
represents a disjunction of several SEL_TREE objects, which can be
expressed as:
sel_imerge_cond = (t_1 OR t_1 OR ... OR t_n) |
where each of t_i stands for a SEL_TREE object, and no pair
(t_i, t_j) of distinct SEL_TREE objects can be combined
into single SEL_TREE object.
The current implementation builds SEL_IMERGE only if no single
SEL_TREE object can be built for the part of the query condition
it has analyzed, and discards SEL_TREE immediately if it discovers
that a single SEL_TREE object can be constructed. This is actually
a limitation, and can cause worse row retrieval strategy to be used.
E.g. for query:
SELECT * FROM t WHERE (goodkey1=c1 OR goodkey1=c2) AND badkey=c3 |
scan on badkey will be chosen even if Index Merge on
(goodkey1, goodkey) would be faster.
The Index Merge optimizer collects a list of possible ways to access
rows with Index Merge. This list of SEL_IMERGE structures
represents the following condition:
(t_11 OR t_12 OR ... OR t_1k) AND (t_21 OR t_22 OR ... OR t_2l) AND ... AND (t_M1 OR t_M2 OR ... OR t_mp) |
where t_ij is one SEL_TREE and one line is for one
SEL_IMERGE object.
The SEL_IMERGE object with minimal cost is used for row
retrieval.
In `sql/opt_range.cc', see imerge_list_and_list(),
imerge_list_or_list(), and
SEL_IMERGE class member functions for more details of
Index Merge construction.
See the get_index_merge_params function in the same file for
Index Merge cost calculation algorithm.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Index Merge works in two steps:
Preparation step:
activate 'index only';
foreach key_i in (key_scans \ clustered_pk_scan)
{
while (retrieve next (key, rowid) pair from key_i)
{
if (no clustered PK scan ||
row doesn't match clustered PK scan condition)
put rowid into Unique;
}
}
deactivate 'index only';
|
Row retrieval step:
for each rowid in Unique
{
retrieve row and pass it to output;
}
if (clustered_pk_scan)
{
while (retrieve next row for clustered_pk_scan)
pass row to output;
}
|
See: `sql/opt_range.cc', QUICK_INDEX_MERGE_SELECT class members
for Index Merge row retrieval code.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
MySQL uses many different algorithms and structures. This chapter tries to describe some of them.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
filesort) In those cases where MySQL must sort the result, it uses the following
filesort algorithm before MySQL 4.1:
WHERE clause are skipped.
sort_buffer_size
system variable.
MERGEBUFF (7) regions to one block in
another temporary file. Repeat until all blocks from the first file
are in the second file.
MERGEBUFF2 (15)
blocks left.
read_rnd_buffer_size system variable.
The code for this step is in the `sql/records.cc' source file.
One problem with this approach is that it reads rows twice: One time when
evaluating the WHERE clause, and again after sorting the pair values.
And even if the rows were accessed successively the first time (for example,
if a table scan is done), the second time they are accessed randomly. (The
sort keys are ordered, but the row positions are not.)
In MySQL 4.1 and up, a filesort optimization is used that records not
only the sort key value and row position, but also the columns required for
the query. This avoids reading the rows twice. The modified filesort
algorithm works like this:
WHERE clause, as before.
Using the modified filesort algorithm, the tuples are longer than the
pairs used in the original method, and fewer of them fit in the sort buffer
(the size of which is given by sort_buffer_size). As a result, it is
possible for the extra I/O to make the modified approach slower, not faster.
To avoid a slowdown, the optimization is used only if the total size of the
extra columns in the sort tuple does not exceed the value of the
max_length_for_sort_data system variable. (A symptom of setting the
value of this variable too high is that you will see high disk activity and
low CPU activity.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The logic behind bulk insert optimization is simple.
Instead of writing each key value to B-tree (that is, to the key cache, although the bulk insert code doesn't know about the key cache), we store keys in a balanced binary (red-black) tree, in memory. When this tree reaches its memory limit, we write all keys to disk (to key cache, that is). But since the key stream coming from the binary tree is already sorted, inserting goes much faster, all the necessary pages are already in cache, disk access is minimized, and so forth.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
MySQL has the following caches. (Note that the some of the filenames contain an incorrect spelling of the word "cache.")
A shared cache for all B-tree index blocks in the different NISAM files. Uses hashing and reverse linked lists for quick caching of the most recently used blocks and quick flushing of changed entries for a specific table. (`mysys/mf_keycash.c')
This is used for quick scanning of all records in a table. (`mysys/mf_iocash.c' and `isam/_cash.c')
This holds the most recently used tables. (`sql/sql_base.cc')
For quick lookup (with reverse name resolving). This is a must when you have a slow DNS. (`sql/hostname.cc')
To allow quick change between databases, the last used privileges are cached for each user/database combination. (`sql/sql_acl.cc')
Many uses of GROUP BY or DISTINCT cache all found rows in
a HEAP table. (This is a very quick in-memory table with hash index.)
For every "full join" in a SELECT statement the rows found are
cached in a join cache. (A "full join" here means there were no keys that
could be used to find rows for the next table in the list.) In the worst
case, one SELECT query can use many join caches.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Basic information about the join buffer cache:
join_buffer_size system variable.
ALL or
index (in other words, when no possible keys can be used).
ALL or index.
ALL/index
are stored in the cache and are used to compare against each read
row in the ALL table.
Assume you have the following join:
Table name Type
t1 range
t2 ref
t3 |
The join is then done as follows:
- While rows in t1 matching range
- Read through all rows in t2 according to reference key
- Store used fields from t1, t2 in cache
- If cache is full
- Read through all rows in t3
- Compare t3 row against all t1, t2 combinations in cache
- If row satisfies join condition, send it to client
- Empty cache
- Read through all rows in t3
- Compare t3 row against all stored t1, t2 combinations in cache
- If row satisfies join condition, send it to client
|
The preceding description means that the number of times table t3 is
scanned is determined as follows:
S = size-of-stored-row(t1,t2) C = accepted-row-combinations(t1,t2) scans = (S * C)/join_buffer_size + 1 |
Some conclusions:
join_buffer_size, the fewer the scans of
t3. If join_buffer_size is already large enough to hold all
previous row combinations, there is no speed to be gained by making it
larger.
ALL or index, then we
allocate one buffer of size join_buffer_size for each of them and use
the same algorithm described above to handle it. (In other words, we store
the same row combination several times into different buffers.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
FLUSH TABLES FLUSH TABLES is handled in `sql/sql_base.cc::close_cached_tables()'.
FLUSH TABLES is to force all tables to be closed. This is
mainly to ensure that if someone adds a new table outside of MySQL (for
example, by copying files into a database directory with cp), all
threads will start using the new table. This will also ensure that all table
changes are flushed to disk (but of course not as optimally as simply
calling a sync for all tables)!
FLUSH TABLES, the variable refresh_version
is incremented. Every time a thread releases a table, it checks if
the refresh version of the table (updated at open) is the same as
the current refresh_version. If not, it will close it and broadcast
a signal on COND_refresh (to await any thread that is waiting for
all instances of a table to be closed).
refresh_version is also compared to the open
refresh_version after a thread gets a lock on a table. If the
refresh version is different, the thread will free all locks, reopen the
table and try to get the locks again. This is just to quickly get all
tables to use the newest version. This is handled by
`sql/lock.cc::mysql_lock_tables()' and
`sql/sql_base.cc::wait_for_tables()'.
FLUSH TABLES returns an okay
to the client.
FLUSH TABLES has a lock on some tables,
it will first close the locked tables, then wait until all other threads
have also closed them, and then reopen them and get the locks.
After this it will give other threads a chance to open the same tables.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Hopefully, sometime there will be complete description of full-text search algorithms. For now, it's just unsorted notes.
Weighting in boolean mode
The basic idea is as follows: In an expression of the form
A or B or (C and D and E), either A or B alone
is enough to match the whole expression, whereas C,
D, and E should all match. So it's
reasonable to assign weight 1 to each of A, B, and
(C and D and E). Furthermore, C, D, and E
each should get a weight of 1/3.
Things become more complicated when considering boolean
operators, as used in MySQL full-text boolean searching. Obviously, +A +B
should be treated as A and B, and A B -
as A or B. The problem is that +A B can not
be rewritten in and/or terms (that's the reason why this--extended--set
of operators was chosen). Still, aproximations can be used.
+A B C can be approximated as A or (A and (B or C))
or as A or (A and B) or (A and C) or (A and B and C).
Applying the above logic (and omitting mathematical
transformations and normalization) one gets that for
+A_1 +A_2 ... +A_N B_1 B_2 ... B_M the weights
should be: A_i = 1/N, B_j=1 if N==0, and,
otherwise, in the first rewriting approach B_j = 1/3,
and in the second one - B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1)).
The second expression gives a somewhat steeper increase in total weight as
number of matched B_j values increases, because it assigns higher
weights to individual B_j values. Also, the first expression is much
simpler, so it is the first one that is implemented in MySQL.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
FLOAT and DOUBLE data types and their representation. A quotation from the MySQL Reference Manual regarding numeric types:
"12.2 Numeric Types ... For floating-point column types, MySQL uses four bytes for single-precision values and eight bytes for double-precision values.
The FLOAT type is used to represent approximate numeric data
types. The SQL standard allows an optional specification of the precision
(but not the range of the exponent) in bits following the keyword
FLOAT in parentheses. The MySQL implementation also supports this
optional precision specification, but the precision value is used only
to determine storage size. A precision from 0 to 23 results in four-byte
single-precision FLOAT column. A precision from 24 to 53 results
in eight-byte double-precision DOUBLE column.
When the keyword FLOAT is used for a column type without a precision
specification, MySQL uses four bytes to store the values. MySQL also
supports variant syntax with two numbers given in parentheses following
the FLOAT keyword. The first number represents the display width
and the second number specifies the number of digits to be stored and
displayed following the decimal point (as with DECIMAL and NUMERIC). When
MySQL is asked to store a number for such a column with more decimal digits
following the decimal point than specified for the column, the value is
rounded to eliminate the extra digits when the value is stored.
In standard SQL, the REAL and DOUBLE PRECISION types do not
accept precision specifications. MySQL supports a variant syntax with
two numbers given in parentheses following the type name. The first
number represents the display width and the second number specifies
the number of digits to be stored and displayed following the decimal
point. As an extension to the SQL standard, MySQL recognizes DOUBLE
as a synonym for the DOUBLE PRECISION type. In contrast with the
standard's requirement that the precision for REAL be smaller than
that used for DOUBLE PRECISION, MySQL implements both as eight-byte
double-precision floating-point values (unless the server SQL mode includes
the REAL_AS_FLOAT option).
For maximum portability, code requiring storage of approximate numeric
data values should use FLOAT or DOUBLE PRECISION with no
specification of precision or number of decimal points."
The following discussion concentrates on the case where no display width
and decimals are given. This means that FLOAT is stored as whatever
the C type float is and REAL or DOUBLE [PRECISION]
is stored as whatever the C type double is. The field length is
selected by the MySQL code.
This document was created when Bug #4457 (Different results in SQL-Statements for the same record) was fixed at the end of August 2004. Until then there was some confusion in the double-to-string conversion at different places in the code.
The bugfix for Bug #4937 (INSERT + SELECT + UNION ALL + DATE to
VARCHAR(8) conversion problem) produced a conversion function which was a
promising approach to the conversion problems. Unfortunately it was only
used for direct field conversions and not for function results etc. It
did not take small numbers (absolute value less than 1) and negative
numbers into account. It did not take the limited precision of float and
double data types into account. The bugfix was developed in two steps:
The first attempt looked like this (in principle):
length= sprintf(buff, "%.*g", field_length, nr); if (length > field_length) length= sprintf(buff, "%.*g", field_length-5, nr); |
If the libc conversion brings too much characters, the precision is
reduced by the space required for the scientific notation (1.234e+05). Thus
the printf() conversion is forced to switch to the scientific
notation, since the value would not fit otherwise. Or, if it was scientific
already, the precision was reduced and also uses less space. I left out some
important stuff around limit checking just to show the idea. This simple
algorithm should work quite well in most cases, but has been discarded
for the sake of performance. The double call to the slow printf()
conversion %g didn't seem reasonable, though it would only be
used for extreme values and small fields. During my explorations of the
code I didn't find places where float or double were to
be converted into small fields. Remeber that I talk only of conversions
where field length and precision are not given. In this case a sufficient
field length is selected at several places, except of a bug where it was
selected wrongly. If a field length is given, a different conversion is
used anyway. But since the code is quite complex, I don't claim to have
the full insigth and as such may be in error. Hence, let us look further:
The second attempt in the bugfix was like this:
bool use_scientific_notation=TRUE;
if (field_length < 32 && nr > 1)
{
double e[]={1, 1e1, 1e2, 1e4, 1e8, 1e16 }, p=1;
for (int i=sizeof(e), j=1<<i-- ; j; i--, j>>=1 )
{
if (field_length & j)
p*=e[i];
}
use_scientific_notation=(p < nr);
}
length= sprintf(buff, "%.*g", use_scientific_notation ?
field_length-5 : field_length, nr);
|
Here we evaluate if the string representation of a given number fits into field_length characters. If not, we reduce the precision to make it fit. Again, I left out important details. For example, the evaluation is done only once per field for the sake of performance. The downside here is the unconditional reduction of precision for field length > 31 (which doesn't really matter), for negative numbers and for small numbers (absolute value less than 1).
Both algorithms do not take the limited precision of float
and double values into account. This could lead to conversions
with ridiculous bogus precision output. For example a value of 0.7
converted with %.30g will give a lot of digits, which pretend
to tell about deviations from the value 0.7 and are completely absurd:
0.699999988079071044921875. To understand more about the %g
conversion, I quote from a comment introduced in the source at the beginning
of bugfixing #4937 (this has been removed later, since it mainly describes,
how the printf() conversion works. I think it's valuable enough to
include it here):
/*
Let's try to pretty print a floating point number. Here we use
'%-*.*g' conversion string:
'-' stands for right-padding with spaces, if such padding will take
place
'*' is a placeholder for the first argument, field_length, and
signifies minimal width of result string. If result is less than
field length it will be space-padded. Note, however, that we'll not
pass spaces to Field_string::store(const char *, ...), due to
strcend in the next line.
'.*' is a placeholder for DBL_DIG and defines maximum number of
significant digits in the result string. DBL_DIG is a hardware
specific C define for maximum number of decimal digits of a floating
point number, such that rounding to hardware floating point
representation and back to decimal will not lead to loss of
precision. I.e if DBL_DIG is 15, number 123456789111315 can be
represented as double without precision loss. As one can judge from
this description, chosing DBL_DIG here is questionable, especially
because it introduces a system dependency.
'g' means that conversion will use [-]ddd.ddd (conventional) style,
and fall back to [-]d.ddde[+|i]ddd (scientific) style if there is no
enough space for all digits.
Maximum length of result string (not counting spaces) is (I guess)
DBL_DIG + 8, where 8 is 1 for sign, 1 for decimal point, 1 for
exponent sign, 1 for exponent, and 4 for exponent value.
XXX: why do we use space-padding and trim spaces in the next line?
*/
sprintf(to,"%-*.*g",(int) field_length,DBL_DIG,nr);
to=strcend(to,' ');
|
There is one small misapprehension in the comment. %g does not
switch to scientific notation when there is 'not enough space for all
digits'. As the commentator says, the field length gives the minimal
output length. printf() happily outputs more characters if required
to produce a result with 'precision' digits. In fact it switches to
scientific when the value can no longer be represented by 'precision'
digits in conventional notation. The man page says "Style e is used if the
exponent from its conversion is less than -4 or greater than or equal to
the precision." In explanation, a precision of 3 digits can print a value
of 345 in conventional notation, but 3456 needs scientific notation, as it
would require 4 digits (a precision of 4) in conventional notation. Thus,
it is printed as 3.46e+03 (rounded).
Since we don't want spaces in the output, we should not give a field
length, but always use "%.*g". However, the precision matters,
as seen above. It is worth its own paragraph.
Since MySQL uses the machine-dependent binary representation of
float and double to store values in the database, we have
to care about these. Today, most systems use the IEEE standard 754 for
binary floating-point arithmetic. It describes a representation for
single precision numbers as 1 bit for sign, 8 bits for biased exponent
and 23 bits for fraction and for double precision numbers as 1-bit sign,
11-bit biased exponent and 52-bit fraction. However, we can not rely in
the fact that every system uses this representation. Luckily, the ISO
C standard requires the standard C library to have a header `float.h'
that describes some details of the floating point representation on a
machine. The comment above describes the value DBL_DIG. There is
an equivalent value FLT_DIG for the C data type float.
So, whenever we print a floating-point value, we must not specify a
precision above DBL_DIG or FLT_DIG respectively. Otherwise
we produce a bogus precision, which is wrong. For the honor of the writer
of the first attempt above, I must say that his complete algorithm took
DBL_DIG into account, if however only for the second call to
sprintf(). But FLT_DIG has never been accounted for. At the
place of the conversions it was not even known, if the value has come from
a float or double field.
My attempt to solve the problems tries to take all this into account. I tried
to concentrate all float/double-to-string conversions in one
function, and to bring the knowledge about float versus double
to this function wherever it is called. This solution managed to keep
the test suite happy while solving the new problem of Bug #4457. Luckily,
the first was not a big problem, as the test cases have been very carefully
selected, so that they succeed as long as the machine uses IEEE 754 ;).
Nevertheless, the function is still not perfect. It is not possible to guess how many sigificant digits a number has, and as such it is not simple to tell how long the resulting string would be. This applies to numbers with an absolute value smaller then 1. There are probably ways to figure this out, but I doubt that we would win in terms of performance over the simple solution of the first attempt, let alone the chance for new bugs. The compromise taken here is to accept that the resulting string may exceed the destined field length by five characters in the worst case.
if (nr < 0.0)
{
abs_nr= -nr;
extra_space= 1;
}
else
{
abs_nr= nr;
extra_space= 0;
}
precision= is_float ? FLT_DIG : DBL_DIG;
if (precision > field_length)
precision= field_length;
if (! initialized)
{
/* Better switch to scientific too early than too late. */
double mult;
mult= 1e0;
for (length= 0; length < DBL_DIG; length++)
mult/= 1e1;
mult= 1e1 - mult;
double val;
val= 1.0;
for (int idx= 0; idx < DBL_DIG+1; idx++)
{
DBUG_PRINT("info",("double_to_string_conv: big[%d] %.*g",
idx, DBL_DIG+3, val));
big_number[idx]= val;
val*= mult;
}
small_number[0]= 1e0;
small_number[1]= 1e0;
small_number[2]= 1e0;
small_number[3]= 1e-1;
small_number[4]= 1e-2;
small_number[5]= 1e-3;
small_number[6]= 1e-4;
/* %g switches to scientific when exponent < -4. */
for (int idx= 7; idx < DBL_DIG+1; idx++)
small_number[idx]= 1e-4;
initialized= TRUE;
}
use_scientific_notation= (abs_nr != 0.0) &&
((abs_nr > big_number[precision]) ||
(abs_nr < small_number[precision]));
if (use_scientific_notation)
{
if (((nr >= 0.0) && ((nr >= 1e+100) || (nr <= 1e-100))) ||
((nr < 0.0) && ((nr <= -1e+100) || (nr >= -1e-100))))
extra_space+= 6; /* .e+100 or .e-100 */
else
extra_space+= 5; /* .e+99 or .e-99 */
}
if (field_length < extra_space)
precision= 0;
else if (precision > (field_length - extra_space))
precision= field_length - extra_space;
length= sprintf(buff, "%.*g", precision, nr);
|
This solution takes performance into account by initializing the limiting
numbers arrays only once into static space. It copes with negative
numbers and tries to decide even over small numbers. The latter has only
small implications, as the prefix 0.000 is exactly the same size as the
postfix e-100. But to know if scientific notation will be selected by
sprintf() allows for saving one digit when the exponent is larger
than -100.
The calculations for the big number array are less precise than in the second
attempt, but faster. The precision is sufficient for the guess whether
sprintf() uses scientific notation. There may be number to field length
combinations which exploit the gap, but these won't emerge anyway as I found
no situation where this function is called with small field lengths. Remember
again that it is not called with user supplied field lengths.
However in the current stable releases (including gamma) we have some
places where the field length is too small by one character. Thus, the
precision is sometimes by one digit smaller than DBL_DIG would
allow for. Consequently, we cannot use the simple algorithm in the stable
releases. There is a chance for doing it in a development release, though.
Addendum:
There turned out to be a new solution to the "big number array" problem. We
have a statically initialized array log_10, which holds the necessary
values. But I did not check, if these values are safe. Even if computed by
the compiler, they could carry values slightly above the decimal powers,
which would be bad. In this case we needed to initialize by 9.99999999e+xxx,
where the number of nines is equal to DBL_DIG. This must be protected
by #if DBL_DIG == yy, so that a new DBL_DIG on a new platform
is detected. And the array is of limited length. We must at least protect
it by a DBUG_ASSERT(sizeof(log_10)/sizeof(log_10[0]) > DBL_DIG).
But all of this is probably completely unneccessary, since we are only
speaking of ceses where no user-supplied field length is given. So MySQL
selects the field length on its own. So it is totally possible, yes highly
desired that MySQL selects a field length, which allows for a maximum
of precision for all possible values. And these are DBL_DIG+7 or
FLT_DIG+6 respectively as far as IEEE 754 is used. In this case
we can have values of about +/-1e-307 to +/-1e+308 for double and
+/-1e-37 to +/-1e+38 for float. That is, for example -1.<DBL_DIG-1
digits>e+100. For cases where a precision above IEEE 754 is possible, we
may need +8 instead. We can detect this with #if DBL_MAX_10_EXP >=
1000. So using a field length of DBL_DIG+8 in all cases should
be sufficient for a simple sprintf(buff, "%.*g", DBL_DIG, nr)
or sprintf(buff, "%.*g", FLT_DIG, nr), respectively. To be safe,
we should not use the machine dependend constants everywhere, but instead
concentrate them into definitions like these:
#if (DBL_MAX_10_EXP > 9999) || (DBL_MIN_10_EXP < -9999) # error "Need new definition for UNSPECIFIED_DOUBLE_FIELD_LENGTH" #elif (DBL_MAX_10_EXP > 999) || (DBL_MIN_10_EXP < -999) # define UNSPECIFIED_DOUBLE_FIELD_LENGTH (DBL_DIG+8) #else # define UNSPECIFIED_DOUBLE_FIELD_LENGTH (DBL_DIG+7) #endif #if (FLT_MAX_10_EXP > 999) || (FLT_MIN_10_EXP < -999) #error "Need new definition for UNSPECIFIED_FLOAT_FIELD_LENGTH" #elif (FLT_MAX_10_EXP > 99) || (FLT_MIN_10_EXP < -99) # define UNSPECIFIED_FLOAT_FIELD_LENGTH (FLT_DIG+7) #else # define UNSPECIFIED_FLOAT_FIELD_LENGTH (FLT_DIG+6) #endif |
These definitions should be used wherever an item or field of type
float or double without an explicit field length specification
is encountered. We have to propagate these lengths though all deviated
items and fields and to select the maximum of all field lengths wherever
in an expression or function two or more of them are used.
We need to treat the precision (DBL_DIG/FLT_DIG) similarly,
but have to select the smallest in expressions or functions.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
mysys Library Functions in mysys: (For flags see `my_sys.h')
int my_copy _A((const char *from, const char *to, myf MyFlags));Copy file from from to to.
int my_rename _A((const char *from, const char *to, myf MyFlags));Rename file from from to to.
int my_delete _A((const char *name, myf MyFlags));Delete file name.
int my_redel _A((const char *from, const char *to, int MyFlags));Delete from before rename of to to from. Copies state
from old file to new file. If MY_COPY_TIME is set, sets old time.
int my_getwd _A((string buf, uint size, myf MyFlags));int my_setwd _A((const char *dir, myf MyFlags));Get and set working directory.
string my_tempnam _A((const char *dir, const char *pfx, myf MyFlags));Make a unique temporary file name by using dir and adding something after
pfx to make the name unique. The file name is made by adding a unique
six character string and TMP_EXT after pfx.
Returns pointer to malloc()'ed area for filename. Should be freed by
free().
File my_open _A((const char *FileName,int Flags,myf MyFlags));File my_create _A((const char *FileName, int CreateFlags, int AccsesFlags, myf MyFlags));int my_close _A((File Filedes, myf MyFlags));uint my_read _A((File Filedes, byte *Buffer, uint Count, myf MyFlags));uint my_write _A((File Filedes, const byte *Buffer, uint Count, myf MyFlags));ulong my_seek _A((File fd,ulong pos,int whence,myf MyFlags));ulong my_tell _A((File fd,myf MyFlags));Use instead of open, open-with-create-flag, close, read, and write
to get automatic error messages (flag MYF_WME) and only have
to test for != 0 if error (flag MY_NABP).
FILE *my_fopen _A((const char *FileName,int Flags,myf MyFlags));FILE *my_fdopen _A((File Filedes,int Flags,myf MyFlags));int my_fclose _A((FILE *fd,myf MyFlags));uint my_fread _A((FILE *stream,byte *Buffer,uint Count,myf MyFlags));uint my_fwrite _A((FILE *stream,const byte *Buffer,uint Count, myf MyFlags));ulong my_fseek _A((FILE *stream,ulong pos,int whence,myf MyFlags));ulong my_ftell _A((FILE *stream,myf MyFlags));Same read-interface for streams as for files.
gptr _mymalloc _A((uint uSize,const char *sFile,uint uLine, myf MyFlag));gptr _myrealloc _A((string pPtr,uint uSize,const char *sFile,uint uLine, myf MyFlag));void _myfree _A((gptr pPtr,const char *sFile,uint uLine));int _sanity _A((const char *sFile,unsigned int uLine));gptr _myget_copy_of_memory _A((const byte *from,uint length,const char *sFile, uint uLine,myf MyFlag));malloc(size,myflag) is mapped to these functions if not compiled
with -DSAFEMALLOC.
void TERMINATE _A((void));Writes malloc() info on stdout if compiled with
-DSAFEMALLOC.
int my_chsize _A((File fd, ulong newlength, myf MyFlags));Change size of file fd to newlength.
void my_error _D((int nr, myf MyFlags, ...));Writes message using error number (see `mysys/errors.h') on stdout,
or using curses, if MYSYS_PROGRAM_USES_CURSES() has been called.
void my_message _A((const char *str, myf MyFlags));Writes str on stdout, or using curses, if
MYSYS_PROGRAM_USES_CURSES() has been called.
void my_init _A((void ));Start each program (in main()) with this.
void my_end _A((int infoflag));Gives info about program.
If infoflag & MY_CHECK_ERROR, prints if some files are left open.
If infoflag & MY_GIVE_INFO, prints timing info and malloc() info
about program.
int my_copystat _A((const char *from, const char *to, int MyFlags));Copy state from old file to new file. If MY_COPY_TIME is set,
sets old time.
string my_filename _A((File fd));Returns filename of open file.
int dirname _A((string to, const char *name));Copy name of directory from filename.
int test_if_hard_path _A((const char *dir_name));Test if dir_name is a hard path (starts from root).
void convert_dirname _A((string name));Convert dirname according to system. On Windows, changes all characters to capitals and changes `/' to `\'.
string fn_ext _A((const char *name));Returns pointer to extension in filename.
string fn_format _A((string to,const char *name,const char *dsk,const char *form,int flag));Format a filename with replacement of library and extension and
convert between different systems.
The to and name parameters may be identical.
Function doesn't change name if name != to.
flag may be:
1 | Force replace filnames library with 'dsk' |
2 | Force replace extension with 'form' */ |
4 | Force unpack filename (replace |
8 | Pack filename as short as possible for output to user |
All open requests should always use at least
open(fn_format(temp_buffer, name, "", "", 4), ...) to unpack home and
convert filename to system-form.
string fn_same _A((string toname, const char *name, int flag));Copies directory and extension from name to toname if needed.
Copying can be forced by same flags used in fn_format().
int wild_compare _A((const char *str, const char *wildstr));Compare if str matches wildstr. wildstr can contain
`*' and `?' as wildcard characters.
Returns 0 if str and wildstr match.
void get_date _A((string to, int timeflag));Get current date in a form ready for printing.
void soundex _A((string out_pntr, string in_pntr))Makes in_pntr to a 5 char long string. All words that sound
alike have the same string.
int init_key_cache _A((ulong use_mem, ulong leave_this_much_mem));Use caching of keys in MISAM, PISAM, and ISAM.
KEY_CACHE_SIZE is a good size.
Remember to lock databases for optimal caching.
void end_key_cache _A((void));End key caching.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
4.1 CHARSET_INFO Structure |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
CHARSET_INFO Structure | [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Every select is performed in these base steps:
JOIN::prepare
JOIN structure to st_select_lex.
fix_fields() for all items (after fix_fields(), we know
everything about item).
HAVING to WHERE if possible.
JOIN::optimize
JOIN::exec
JOIN::cleanup
JOIN::reinit
SELECT (with
JOIN::exec).
| [ < ] | [ > ] | [ |