Stuck with High or Full Disk Usage with Amazon Redshift? We can help you.
Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.
Today, let us see the factors that lead to high disk usage errors and how to troubleshoot them.
High or Full Disk Usage with Amazon Redshift
Generally, high disk usage errors depend on several factors. Moving ahead, let us discuss each of these factors in detail.
-
Distribution and sort key
Here, we review the table’s distribution style, distribution key, and sort key selection.
Tables with a distribution skew, where more data is placed on one node than on the others, can fill up the disk on that node.
In this case, we need to change the distribution style to a more uniform distribution.
To determine the cardinality of the distribution key, we run:
SELECT <distkey column>, COUNT(*)
FROM <schema name>.<table with distribution skew>
GROUP BY <distkey column>
HAVING COUNT(*) > 1
ORDER BY 2 DESC;
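If the key turns out to be heavily skewed, one common fix is to move the table to a more uniform distribution. The statements below are only a sketch; sales and customer_id are hypothetical table and column names.
-- Spread rows evenly (round robin) across the slices
ALTER TABLE public.sales ALTER DISTSTYLE EVEN;
-- Or keep key distribution but switch to a higher-cardinality column as the key
ALTER TABLE public.sales ALTER DISTSTYLE KEY DISTKEY customer_id;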
-
Query processing
While a query is processing, its intermediate results are stored in temporary blocks. Without enough free memory, these temporary tables spill to disk. Because the intermediate results are not compressed, the spill eats into the available disk space.
By default, Amazon Redshift creates temporary tables with even distribution and no column encoding. However, if we use the SELECT…INTO syntax, we need to use a CREATE statement instead, so that we can define the distribution style and column encoding ourselves.
In case of insufficient memory, we may see a step in SVL_QUERY_SUMMARY where is_diskbased shows the value “true”.
To resolve this, we increase the number of query slots to allocate more memory.
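As a rough sketch, assuming a hypothetical query ID of 12345, the check and the temporary fix look like this (wlm_query_slot_count only applies to the current session):
-- Check whether any step of the query ran disk-based
select query, step, label, is_diskbased
from svl_query_summary
where query = 12345 and is_diskbased = 't';
-- Give the next query more memory by letting it use extra WLM slots
set wlm_query_slot_count to 3;
-- ...run the memory-hungry query here, then restore the default
set wlm_query_slot_count to 1;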
-
Tables with VARCHAR(MAX) columns
We check the VARCHAR or CHARACTER VARYING columns for trailing blanks that might be omitted when data is stored on disk. During query processing, those trailing blanks can occupy the full column length in memory. So, our Support Techs recommend using the smallest possible column size.
We can generate a list of tables with maximum column widths using:
SELECT database, schema || '.' || "table" AS "table", max_varchar FROM svv_table_info WHERE max_varchar > 150 ORDER BY 2;
Then to identify and display the true widths of the wide VARCHAR table columns, we run:
SELECT max(octet_length(rtrim(column_name))) FROM table_name;
-
High column compression
For optimal column encoding, we encode every column except the sort key. To do so, we use ANALYZE COMPRESSION or the Amazon Redshift column encoding utility.
Amazon Redshift provides column encoding to increase read performance while reducing overall storage consumption.
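For example, with a hypothetical table name, ANALYZE COMPRESSION reports the recommended encoding for each column without changing the table:
-- Suggest column encodings; note that this acquires an exclusive table lock
ANALYZE COMPRESSION public.sales;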
-
Maintenance operations
We need to ensure regular analysis and vacuum of the database tables.
The best method to maintain the health of the database is to identify any missing or outdated stats.
This prevents unnecessary scans of table rows and also helps to optimize query processing.
For example, here we can identify outdated stats in Amazon Redshift:
SELECT * FROM svv_table_info WHERE stats_off > 10 ORDER BY size DESC;
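Once a stale or unsorted table is found, its statistics and sort order can be refreshed as sketched below, again with a hypothetical table name:
-- Refresh planner statistics for the table
ANALYZE public.sales;
-- Re-sort rows and reclaim space left behind by deletes and updates
VACUUM FULL public.sales;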
-
Cartesian products with cross-joins
Cartesian products can result in higher memory utilization and more table spills. To look for queries with Cartesian products we can use the EXPLAIN plan of the query.
Suppose the tables in a query do not share a JOIN condition. Then the join produces a Cartesian product of the two tables, where every row of one table joins to every row of the other table.
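As a minimal sketch with hypothetical table names, a nested loop step in the EXPLAIN output is the usual sign of such a Cartesian product:
-- A join written without a join condition produces a Cartesian product;
-- the plan will contain a "Nested Loop" step for it
EXPLAIN
SELECT *
FROM store_sales, date_dim;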
-
Minimum table size
It is possible for the same table to have different sizes in different clusters.
The minimum table size is determined by the number of columns, whether the table has a SORTKEY, and the number of slices populated. A change in the number of slices, for example after a cluster resize, can therefore change the overall disk storage the table consumes.
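To see how much space each table currently consumes, a query like the following against SVV_TABLE_INFO helps (size is reported in 1 MB blocks):
-- Largest tables first; "size" is the number of 1 MB blocks in use
select "table", diststyle, size, tbl_rows
from svv_table_info
order by size desc;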
-
Tombstone blocks
Generally, tombstone blocks occur when a WRITE transaction to an Amazon Redshift table is in progress while there is a concurrent read.
Suppose long-running table transactions occur regularly and across several loads. Then enough tombstones can accumulate to result in a Disk Full error.
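To spot such long-running open transactions before they pile up tombstones, we can query the SVV_TRANSACTIONS system view, for example:
-- Open transactions, oldest first; long-lived entries are the usual suspects
select xid, pid, txn_owner, txn_start
from svv_transactions
order by txn_start;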
With Amazon Redshift, tombstone blocks can usually be cleared by running a commit command. If there are active long-running queries holding on to the blocks, we need to terminate them first and then commit, as in this example:
begin;
create table a (id int);
insert into a values(1);
commit;
drop table a;
To confirm tombstone blocks, we run:
select trim(name) as tablename,
       count(case when tombstone > 0 then 1 else null end) as tombstones
from svv_diskusage
group by 1
having count(case when tombstone > 0 then 1 else null end) > 0
order by 2 desc;
-
Copy a large file
During a COPY operation, we might receive a Disk Full error even if there is enough storage available.
This happens if the sorting operation spills to the disk, creating temporary blocks.
In this case, we check the STL_DISK_FULL_DIAG table:
select '2000-01-01'::timestamp + (currenttime/1000000.0) * interval '1 second' as currenttime,
       node_num,
       query_id,
       temp_blocks
from pg_catalog.stl_disk_full_diag;
Additional troubleshooting
Initially, we check the percentage of disk space used under the Performance tab of the Amazon Redshift console.
In case of a sudden spike in utilization, we use the STL_QUERY to identify the activities and jobs that are running:
select * from stl_query where starttime between '2018-01-01 00:30:00' and '2018-01-01 00:40:00';
We can identify the top 20 disk spill queries using:
select A.userid,
       A.query,
       blocks_to_disk,
       trim(B.querytxt) as text
from stl_query_metrics A, stl_query B
where A.query = B.query
  and segment = -1
  and step = -1
  and max_blocks_to_disk > 0
order by 3 desc
limit 20;
To identify the queries whose intermediate steps are written to disk, we run:
SELECT q.query, trim(q.cat_text)
FROM (
SELECT query,
replace( listagg(text,' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text
FROM stl_querytext
WHERE userid>1
GROUP BY query) q
JOIN (
SELECT distinct query
FROM svl_query_summary
WHERE is_diskbased='t' AND (LABEL ILIKE 'hash%' OR LABEL ILIKE 'sort%' OR LABEL ILIKE 'aggr%' OR LABEL ILIKE 'save%' OR LABEL ILIKE 'window%' OR LABEL ILIKE 'unique%')
AND userid > 1) qs
ON qs.query = q.query;
[Stuck in between? We are here for you]
Conclusion
In short, we saw how our Support Techs troubleshoot high or full disk usage with Amazon Redshift.