SQL Basics Every Data Analyst Must Know
SQL (Structured Query Language) is one of the most essential skills for any data analyst. It allows you to retrieve, clean, analyze, and transform data stored in relational databases. Below are the core SQL concepts and commands every data analyst should understand.
1. Understanding Databases and Tables
A database is a collection of organized data.
A table stores data in rows and columns.
Rows = records
Columns = fields (attributes)
Example:
customers
--------------------------------
id | name | email | country
2. SELECT: Retrieving Data
The SELECT statement is the foundation of SQL.
SELECT * FROM customers;
Select specific columns:
SELECT name, country FROM customers;
3. WHERE: Filtering Data
Use WHERE to filter rows based on conditions.
SELECT * FROM customers
WHERE country = 'USA';
Common operators:
= equal to
!= or <> not equal to
> < >= <=
BETWEEN
IN
LIKE
Example:
SELECT * FROM customers
WHERE email LIKE '%gmail.com';
4. ORDER BY: Sorting Results
Sort data in ascending (ASC) or descending (DESC) order.
SELECT * FROM customers
ORDER BY name ASC;
SELECT * FROM customers
ORDER BY id DESC;
5. LIMIT: Controlling Output Size
Limit the number of rows returned.
SELECT * FROM customers
LIMIT 10;
Very useful for previews and large datasets.
6. Aggregate Functions
Used for summary statistics.
Common functions:
COUNT() – number of rows
SUM() – total value
AVG() – average
MIN() / MAX()
Examples:
SELECT COUNT(*) FROM customers;
SELECT AVG(sales) FROM orders;
7. GROUP BY: Data Aggregation
Group rows to perform calculations per category.
SELECT country, COUNT(*) AS total_customers
FROM customers
GROUP BY country;
Rule to remember:
Any column in SELECT that is not aggregated must be in GROUP BY.
8. HAVING: Filtering Aggregated Data
HAVING is used with GROUP BY.
SELECT country, COUNT(*) AS total_customers
FROM customers
GROUP BY country
HAVING COUNT(*) > 100;
WHERE filters rows before grouping
HAVING filters after grouping
9. JOINs: Combining Tables
Data analysts often work with multiple tables.
INNER JOIN
Returns matching rows from both tables.
SELECT o.order_id, c.name
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.id;
LEFT JOIN
Returns all rows from the left table.
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id;
Other joins to know:
RIGHT JOIN
FULL JOIN
10. Aliases (AS)
Aliases make queries more readable.
SELECT name AS customer_name
FROM customers;
SELECT COUNT(*) AS total_orders
FROM orders;
11. NULL Values
NULL means missing or unknown data.
Check for NULL:
SELECT * FROM customers
WHERE email IS NULL;
Not NULL:
WHERE email IS NOT NULL;
12. CASE: Conditional Logic
Similar to IF/ELSE logic.
SELECT name,
CASE
WHEN country = 'USA' THEN 'Domestic'
ELSE 'International'
END AS customer_type
FROM customers;
13. DISTINCT: Removing Duplicates
SELECT DISTINCT country
FROM customers;
14. Basic Data Cleaning in SQL
Removing duplicates
Handling NULLs
Standardizing text (UPPER, LOWER, TRIM)
Examples:
SELECT TRIM(name) FROM customers;
SELECT LOWER(email) FROM customers;
15. Best Practices for Data Analysts
Always start with SELECT * to explore data
Use aliases for clarity
Write readable queries with proper formatting
Test filters with LIMIT
Comment your SQL code
-- Count customers by country
SELECT country, COUNT(*)
FROM customers
GROUP BY country;
Final Thoughts
Mastering these SQL basics will cover 80–90% of daily tasks for most data analysts. Once you’re comfortable with them, you can move on to:
Subqueries
Window functions
CTEs
Performance optimization
Learn Data Analytics Course in Hyderabad
Read More
Python for Data Analytics: Where to Begin
Best Programming Languages for Data Analytics
Skills & Tools in Data Analytics
Common Misconceptions About Data Analytics
Visit Our Quality Thought Training Institute in Hyderabad
Subscribe by Email
Follow Updates Articles from This Blog via Email
No Comments