Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foreign_key_info is un-optimized for large sets of databases #326

Open
SineSwiper opened this issue Jan 21, 2021 · 2 comments
Open

foreign_key_info is un-optimized for large sets of databases #326

SineSwiper opened this issue Jan 21, 2021 · 2 comments

Comments

@SineSwiper
Copy link

Due to the way MySQL 5 supports information_schema, the SQL statement used for foreign_key_info requires a full scan of all databases. This can break servers in environments with a large amount of databases (literally die from an OOM), or take a long time to complete in the best case. Even with filters for key_column_usage.constraint_schema, the table_constraints scan isn't filtered properly and results in an EXPLAIN plan like this:

           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_full_table; Scanned 1 database
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_full_table; Scanned all databases; Using join buffer (Block Nested Loop)
2 rows in set (0.00 sec)

So, for table B, the optimizer Scanned all databases, which is bad. The table_constraints inclusion is unnecessary, anyway, since all foreign/unique/PK data is available in key_column_usage. Better to just take out the problematic join, and fix the optimization problem at the same time.

@SineSwiper
Copy link
Author

SineSwiper commented Jan 22, 2021

The code could include JOINs to fill in the NULL gaps in the SELECT clause, but those JOIN statements would need to apply the filters directly in order to get the EXPLAIN plan benefit. This is because MySQL 5's implementation of filtering is rather kludgy and doesn't take advantage of the optimizer. For example:

JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS R ON (
    R.CONSTRAINT_SCHEMA = $database AND
    R.CONSTRAINT_NAME = A.CONSTRAINT_NAME
)

Not R.CONSTRAINT_SCHEMA = A.CONSTRAINT_SCHEMA.

Anyway, I can provide a PR if you like.

@dveeden
Copy link
Collaborator

dveeden commented Nov 11, 2024

The code for this can be found here:

DBD-mysql/lib/DBD/mysql.pm

Lines 624 to 705 in 5a669fb

sub foreign_key_info {
my ($dbh,
$pk_catalog, $pk_schema, $pk_table,
$fk_catalog, $fk_schema, $fk_table,
) = @_;
return unless $dbh->func('_async_check');
# INFORMATION_SCHEMA.KEY_COLUMN_USAGE was added in 5.0.6
# no one is going to be running 5.0.6, taking out the check for $point > .6
my ($maj, $min, $point) = _version($dbh);
return if $maj < 5 ;
my $sql = <<'EOF';
SELECT NULL AS PKTABLE_CAT,
A.REFERENCED_TABLE_SCHEMA AS PKTABLE_SCHEM,
A.REFERENCED_TABLE_NAME AS PKTABLE_NAME,
A.REFERENCED_COLUMN_NAME AS PKCOLUMN_NAME,
A.TABLE_CATALOG AS FKTABLE_CAT,
A.TABLE_SCHEMA AS FKTABLE_SCHEM,
A.TABLE_NAME AS FKTABLE_NAME,
A.COLUMN_NAME AS FKCOLUMN_NAME,
A.ORDINAL_POSITION AS KEY_SEQ,
NULL AS UPDATE_RULE,
NULL AS DELETE_RULE,
A.CONSTRAINT_NAME AS FK_NAME,
NULL AS PK_NAME,
NULL AS DEFERABILITY,
NULL AS UNIQUE_OR_PRIMARY
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A,
INFORMATION_SCHEMA.TABLE_CONSTRAINTS B
WHERE A.TABLE_SCHEMA = B.TABLE_SCHEMA AND A.TABLE_NAME = B.TABLE_NAME
AND A.CONSTRAINT_NAME = B.CONSTRAINT_NAME AND B.CONSTRAINT_TYPE IS NOT NULL
EOF
my @where;
my @bind;
# catalogs are not yet supported by MySQL
# if (defined $pk_catalog) {
# push @where, 'A.REFERENCED_TABLE_CATALOG = ?';
# push @bind, $pk_catalog;
# }
if (defined $pk_schema) {
push @where, 'A.REFERENCED_TABLE_SCHEMA = ?';
push @bind, $pk_schema;
}
if (defined $pk_table) {
push @where, 'A.REFERENCED_TABLE_NAME = ?';
push @bind, $pk_table;
}
# if (defined $fk_catalog) {
# push @where, 'A.TABLE_CATALOG = ?';
# push @bind, $fk_schema;
# }
if (defined $fk_schema) {
push @where, 'A.TABLE_SCHEMA = ?';
push @bind, $fk_schema;
}
if (defined $fk_table) {
push @where, 'A.TABLE_NAME = ?';
push @bind, $fk_table;
}
if (@where) {
$sql .= ' AND ';
$sql .= join ' AND ', @where;
}
$sql .= " ORDER BY A.TABLE_SCHEMA, A.TABLE_NAME, A.ORDINAL_POSITION";
local $dbh->{FetchHashKeyName} = 'NAME_uc';
my $sth = $dbh->prepare($sql);
$sth->execute(@bind);
return $sth;
}

@SineSwiper would you be interested in submitting a PR to improve this?

Also MySQL 8.0 switched to an InnoDB based data dictionary, which improves performance for at least some information_schema queries (see also https://dev.mysql.com/blog-archive/mysql-8-0-scaling-and-performance-of-information_schema/ ). This may or may not improve the performance of the queries mentioned here. Maybe someone could run some tests to benchmark the performance of the original query vs a modified one and with various MySQL major versions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants