Join (SQL)

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

Example tables

To explain join types, the rest of this article uses the following tables:

Employee table
LastName	DepartmentID
Rafferty	31
Jones	33
Heisenberg	33
Robinson	34
Smith	34
Williams	`NULL`

Department table
DepartmentID	DepartmentName
31	Sales
33	Engineering
34	Clerical
35	Marketing

Department.DepartmentID is the primary key of the Department table, whereas Employee.DepartmentID is a foreign key.

Note that in Employee, "Williams" has not yet been assigned to a department. Also, no employees have been assigned to the "Marketing" department.

These are the SQL statements to create the above tables:

CREATE TABLE department(
    DepartmentID INT PRIMARY KEY NOT NULL,
    DepartmentName VARCHAR(20)
);

CREATE TABLE employee (
    LastName VARCHAR(20),
    DepartmentID INT REFERENCES department(DepartmentID)
);

INSERT INTO department
VALUES (31, 'Sales'),
       (33, 'Engineering'),
       (34, 'Clerical'),
       (35, 'Marketing');

INSERT INTO employee
VALUES ('Rafferty', 31),
       ('Jones', 33),
       ('Heisenberg', 33),
       ('Robinson', 34),
       ('Smith', 34),
       ('Williams', NULL);

Cross join

CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will produce rows which combine each row from the first table with each row from the second table.^[1]

Employee.LastName	Employee.DepartmentID	Department.DepartmentName	Department.DepartmentID
Rafferty	31	Sales	31
Jones	33	Sales	31
Heisenberg	33	Sales	31
Smith	34	Sales	31
Robinson	34	Sales	31
Williams	`NULL`	Sales	31
Rafferty	31	Engineering	33
Jones	33	Engineering	33
Heisenberg	33	Engineering	33
Smith	34	Engineering	33
Robinson	34	Engineering	33
Williams	`NULL`	Engineering	33
Rafferty	31	Clerical	34
Jones	33	Clerical	34
Heisenberg	33	Clerical	34
Smith	34	Clerical	34
Robinson	34	Clerical	34
Williams	`NULL`	Clerical	34
Rafferty	31	Marketing	35
Jones	33	Marketing	35
Heisenberg	33	Marketing	35
Smith	34	Marketing	35
Robinson	34	Marketing	35
Williams	`NULL`	Marketing	35

Example of an explicit cross join:

SELECT *
FROM employee CROSS JOIN department;

Example of an implicit cross join:

SELECT *
FROM employee, department;

The cross join can be replaced with an inner join with an always-true condition:

SELECT *
FROM employee INNER JOIN department ON 1=1;

CROSS JOIN does not itself apply any predicate to filter rows from the joined table. The results of a CROSS JOIN can be filtered using a WHERE clause, which may then produce the equivalent of an inner join.

In the SQL:2011 standard, cross joins are part of the optional F401, "Extended joined table", package.

Normal uses are for checking the server's performance.^[why?]

Inner join

An inner join (or join) requires each row in the two joined tables to have matching column values, and is a commonly used join operation in applications but should not be assumed to be the best choice in all situations. Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows that satisfy the join-predicate. When the join-predicate is satisfied by matching non-NULL values, column values for each matched pair of rows of A and B are combined into a result row.

The result of the join can be defined as the outcome of first taking the cartesian product (or cross join) of all rows in the tables (combining every row in table A with every row in table B) and then returning all rows that satisfy the join predicate. Actual SQL implementations normally use other approaches, such as hash joins or sort-merge joins, since computing the Cartesian product is slower and would often require a prohibitively large amount of memory to store.

SQL specifies two different syntactical ways to express joins: the "explicit join notation" and the "implicit join notation". The "implicit join notation" is no longer considered a best practice^{[by whom?]}, although database systems still support it.

The "explicit join notation" uses the JOIN keyword, optionally preceded by the INNER keyword, to specify the table to join, and the ON keyword to specify the predicates for the join, as in the following example:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName 
FROM employee 
INNER JOIN department ON
employee.DepartmentID = department.DepartmentID;

Employee.LastName	Employee.DepartmentID	Department.DepartmentName
Robinson	34	Clerical
Jones	33	Engineering
Smith	34	Clerical
Heisenberg	33	Engineering
Rafferty	31	Sales

The "implicit join notation" simply lists the tables for joining, in the FROM clause of the SELECT statement, using commas to separate them. Thus it specifies a cross join, and the WHERE clause may apply additional filter-predicates (which function comparably to the join-predicates in the explicit notation).

The following example is equivalent to the previous one, but this time using implicit join notation:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName 
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

The queries given in the examples above will join the Employee and department tables using the DepartmentID column of both tables. Where the DepartmentID of these tables match (i.e. the join-predicate is satisfied), the query will combine the LastName, DepartmentID and DepartmentName columns from the two tables into a result row. Where the DepartmentID does not match, no result row is generated.

Thus the result of the execution of the query above will be:

Employee.LastName	Employee.DepartmentID	Department.DepartmentName
Robinson	34	Clerical
Jones	33	Engineering
Smith	34	Clerical
Heisenberg	33	Engineering
Rafferty	31	Sales

The employee "Williams" and the department "Marketing" do not appear in the query execution results. Neither of these has any matching rows in the other respective table: "Williams" has no associated department, and no employee has the department ID 35 ("Marketing"). Depending on the desired results, this behavior may be a subtle bug, which can be avoided by replacing the inner join with an outer join.

Inner join and NULL values

Programmers should take special care when joining tables on columns that can contain NULL values, since NULL will never match any other value (not even NULL itself), unless the join condition explicitly uses a combination predicate that first checks that the joins columns are NOT NULL before applying the remaining predicate condition(s). The Inner Join can only be safely used in a database that enforces referential integrity or where the join columns are guaranteed not to be NULL. Many transaction processing relational databases rely on atomicity, consistency, isolation, durability (ACID) data update standards to ensure data integrity, making inner joins an appropriate choice. However, transaction databases usually also have desirable join columns that are allowed to be NULL. Many reporting relational database and data warehouses use high volume extract, transform, load (ETL) batch updates which make referential integrity difficult or impossible to enforce, resulting in potentially NULL join columns that an SQL query author cannot modify and which cause inner joins to omit data with no indication of an error. The choice to use an inner join depends on the database design and data characteristics. A left outer join can usually be substituted for an inner join when the join columns in one table may contain NULL values.

Any data column that may be NULL (empty) should never be used as a link in an inner join, unless the intended result is to eliminate the rows with the NULL value. If NULL join columns are to be deliberately removed from the result set, an inner join can be faster than an outer join because the table join and filtering is done in a single step. Conversely, an inner join can result in disastrously slow performance or even a server crash when used in a large volume query in combination with database functions in an SQL Where clause.^[2]^[3]^[4] A function in an SQL Where clause can result in the database ignoring relatively compact table indexes. The database may read and inner join the selected columns from both tables before reducing the number of rows using the filter that depends on a calculated value, resulting in a relatively enormous amount of inefficient processing.

When a result set is produced by joining several tables, including master tables used to look up full-text descriptions of numeric identifier codes (a Lookup table), a NULL value in any one of the foreign keys can result in the entire row being eliminated from the result set, with no indication of error. A complex SQL query that includes one or more inner joins and several outer joins has the same risk for NULL values in the inner join link columns.

A commitment to SQL code containing inner joins assumes NULL join columns will not be introduced by future changes, including vendor updates, design changes and bulk processing outside of the application's data validation rules such as data conversions, migrations, bulk imports and merges.

One can further classify inner joins as equi-joins, as natural joins, or as cross-joins.

Equi-join

An equi-join is a specific type of comparator-based join, that uses only equality comparisons in the join-predicate. Using other comparison operators (such as <) disqualifies a join as an equi-join. The query shown above has already provided an example of an equi-join:

SELECT *
FROM employee JOIN department
  ON employee.DepartmentID = department.DepartmentID;

We can write equi-join as below,

SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

If columns in an equi-join have the same name, SQL-92 provides an optional shorthand notation for expressing equi-joins, by way of the USING construct:^[5]

SELECT *
FROM employee INNER JOIN department USING (DepartmentID);

The USING construct is more than mere syntactic sugar, however, since the result set differs from the result set of the version with the explicit predicate. Specifically, any columns mentioned in the USING list will appear only once, with an unqualified name, rather than once for each table in the join. In the case above, there will be a single DepartmentID column and no employee.DepartmentID or department.DepartmentID.

The USING clause is not supported by MS SQL Server and Sybase.

Natural join

The natural join is a special case of equi-join. Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are relations.^[6] The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:

*Employee*
Name	EmpId	DeptName
Harry	3415	Finance
Sally	2241	Sales
George	3401	Finance
Harriet	2202	Sales

*Dept*
DeptName	Manager
Finance	George
Sales	Harriet
Production	Charles

*Employee* $\bowtie$ *Dept*
Name	EmpId	DeptName	Manager
Harry	3415	Finance	George
Sally	2241	Sales	Harriet
George	3401	Finance	George
Harriet	2202	Sales	Harriet

This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join as shown above, projected on all but the common attribute DeptName. In category theory, the join is precisely the fiber product.

The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value. In particular, the natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. This works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from Dept.manager to Employee.Name then these columns have to be renamed before the natural join is taken. Such a join is sometimes also referred to as an equi-join.

More formally the semantics of the natural join are defined as follows:

R\bowtie S=\left\{t\cup s\mid t\in R\ \land \ s\in S\ \land \ {\mathit {Fun}}(t\cup s)\right\}

,

where Fun is a predicate that is true for a relation r if and only if r is a function. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted, and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.

The natural join can be simulated with Codd's primitives as follows. Let c₁, ..., c_m be the attribute names common to R and S, r₁, ..., r_n be the attribute names unique to R and let s₁, ..., s_k be the attributes unique to S. Furthermore, assume that the attribute names x₁, ..., x_m are neither in R nor in S. In a first step the common attribute names in S can now be renamed:

T=\rho _{x_{1}/c_{1},\ldots ,x_{m}/c_{m}}(S)=\rho _{x_{1}/c_{1}}(\rho _{x_{2}/c_{2}}(\ldots \rho _{x_{m}/c_{m}}(S)\ldots ))

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.

[1]

[2]

[3]

[4]

[5]

[6]