Planet MySQL
Planet MySQL - https://planet.mysql.com

  • On MySQL Replication Bugs
    While writing about problematic Oracle MySQL features previously, I concentrated mostly on InnoDB problems that I have to fight with really often, and deliberately skipped replication from even the preliminary list of features to study in detail for that blog post. First of all, I mostly work with MariaDB users now, and the implementation of many replication features in MariaDB is already notably different (and has its own list of known problems). But it also happened because (asynchronous) replication plays a key role in most MySQL environments and deserves a detailed study in a separate post. It took me some time to prepare lists of recent bugs reported for MySQL replication (row-based and statement-based, GTIDs, multi-threaded and crash safe, hopefully) and then try to summarize my findings from them. Looks like I finally have something related to share today.

Gondolas in Venice often look very similar to each other. They are all probably replicas of a few different original designs. But small differences exist. This should not happen to your data in a replication setup...

Let me start with a summary of the problems I found and then proceed with the lists of bug reports I based my conclusions on:

Minor problems with mysqlbinlog outputs, small regressions, missing useful features and real bugs (that may lead to data differences on slaves or break replication), hangs (including MTS deadlocks), crashes and performance problems with replication that can affect production setups - none of these seem to get high priority over the last 2 years, even those reported in the public bugs database by famous MySQL Community members or by current and former Oracle MySQL engineers.

GTIDs and replication crash safety features were supposed to make it easier to use replication in production. In reality there are some GTID-specific ways to break replication, and it is still not as crash safe as expected in many use cases.

Multi-threaded slave (a.k.a. MTS or parallel replication) bugs, even serious and verified ones, are ignored for way more than the last 2 years! The feature is not complete and may not help as much as one may expect.

The manual is still wrong or missing details when describing some parts of the current replication implementation.

Nobody cares much about debug assertions found by Percona QA. We may have some bombs in the replication code that may explode in a weird way at any time.

It seems group replication bugs often get proper attention fast, but nobody in Oracle has enough time to deal with asynchronous replication bugs properly. Some of them hang around without even proper verification for months...

Now let's get into details. Here is the list of some recent row-based replication specific bugs and feature requests:

Bug #91822 - "incorrect datatype in RBR event when column is NULL and not explicit in query". The column missing in the INSERT statement is given the datatype of the column immediately preceding it. This bug was reported by Ernie Souhrada back in July.

Bug #89339 - "RBR slave index choosing logic is too simple". As Fungo Wang pointed out based on a source code review, the choice is based on rules and does not consider data distribution at all. The slave uses the primary key if it exists, otherwise tries a unique key that does not allow NULLs (if any), then tries other keys, and finally falls back to a full table scan if there are no indexes. When there is no primary or unique key on columns not allowing NULL values, no statistics are taken into account when choosing between indexes and a full table scan, and this easily leads to bad plans. Surely, one has to have primary keys in all tables, especially those involved in replication, but it would be great if the slave code were a bit smarter in corner cases (as MariaDB's is). By the way, setting slave_rows_search_algorithms to HASH_SCAN may not help, as it may be too slow in practice. See Bug #84887 by Dennis Gao for more details and an example of perf and gdb usage for the analysis.

Bug #89297 - "please allow number to varchar slave_type_conversions if target is large enough". Nice feature request by Shane Bester.

Bug #88791 - "Binary log for generated column contains new value as WHERE clause, not old value". This bug was reported by MC Brown. Surely, mysqlbinlog output for row-based events is not supposed to be 100% valid SQL for copy-paste, but it should not be misleading.

Bug #88595 - "Row-based master-master replication broken by add column or drop column". This is more a documentation issue. It's strange to see this limitation not documented here.

Bug #88223 - "Replication with no tmpdir space and InnoDB as tmp_storage_engine can break". It's really weird to see this bug reported by Sveta Smirnova hanging in "Analyzing" status for almost a year!

Bug #88057 - "Intermediary slave does not log master changes with binlog_rows_query_log_events". Yet another bug report from Sveta, this time "Verified". Row query log events for UPDATEs and DELETEs are NOT recorded in the binary log by the intermediate masters...

Bug #85292 - "unreachable statement in function process_event with event type QUERY_EVENT". Shane Bester found a part of the code that probably should be rewritten. No further public attention to this report for 1.5 years...

Bug #84287 - "row inserts, statement updates, persistent stats lead to table scans+ lag slaves". Yet another great finding by Shane Bester. As often happens, another problematic feature I truly hate, InnoDB's persistent statistics, is also involved in this case, leading to bad performance on the slave.

Bug #83431 - "RBR slave's "BEGIN" binlog query event doesn't carry a session database name". It was reported almost 2 years ago by Teng Li.

Bug #83418 - "When binlog_row_image is MINIMAL, some updates queries logs redundant columns". This was reported by Abhinav Sharma, also almost two years ago.

Bug #83090 - "AUTO_INCREMENT not updated when using INSERT ON DUPLICATE KEY UPDATE". This bug was reported by Monty Solomon two years ago. See also his related Bug #83030 - "AUTO_INCREMENT not updated when using REPLACE INTO".

Bug #80821 - "Replication breaks if multi-table DELETE is used in conjunction with Foreign Key". I do not understand why this perfect (regression?) bug report by Philip Stoev, with a clear MTR test case, was immediately verified and then has just hung around for 2.5 years...

Bug #80770 - "Compression of row-based event". Great feature contributed by Davi Arnaut.

Bug #79831 - "Unexpected error message on crash-safe slave with max_relay_log_size set". Let me just conclude this list with a (probably) 5.6-specific bug report by yours truly...

I have to stop at this point, as other RBR-related bug reports have been known for way more time and are probably just forgotten for good already by MySQL development. They hardly have a chance for a fix any time soon.

Let's continue with replication bugs that are not closely related to the row-based binlog format:

Bug #92398 - "point in time recovery using mysqlbinlog broken with temporary table -> errors". Shane Bester found yet another case when the statement-based binlog format may lead to problems for point in time recovery.

Bug #92252 - "Mysql generates gtid gaps because of option slave-skip-errors". This bug was reported by Pin Lin. We get gaps in GTID values on the slave when errors are skipped. If the binlog files on the master that include transactions which were applied on the slave and hit errors listed in slave-skip-errors are purged, the slave can't reconnect to the master and reports error 1236. Features that were supposed to make replication more robust lead to more breaks in replication.

Bug #92132 - "secure-file-priv breaks LOAD DATA INFILE replication in statement mode on 5.7.23". This regression bug was reported by Nicolai Plum. Recent MySQL 5.7 and 8.0 versions are affected.

Bug #92109 - "Please make replication crash safe with GTID and less durable setting (bis)." Jean-François Gagné keeps looking for options to make replication really crash safe in all practically important cases.

Bug #92073 - "int change_master(): Assertion `mi->rli->inited' failed." Roel Van de Paar continues his QA efforts in the hope of making MySQL better one day and reports all debug assertions in public. In this case we seem to have a kind of regression in 8.0 vs 5.7. Check also his bug reports Bug #91980 - "Assertion `head->variables.gtid_next.type == ANONYMOUS_GTID' failed." and Bug #90946 - "Assertion `get_state() >= 0' failed on RESET MASTER while tx_read_only". Debug assertions are usually in the code for a reason, so in the latter bug report Roel recently asked for an update...

Bug #91766 - "Log of replication position when running CHANGE MASTER TO is wrong." Jean-François Gagné found yet another annoying detail - one would expect the error log to have data about the SQL thread position, but it has data about the IO thread position.

Bug #91744 - "START SLAVE UNTIL going further than it should." Now this bug reported by Jean-François Gagné is really serious. Still, it has been hanging around in "Open" status since July 20...

Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". XA is yet another MySQL feature that seems doomed to stay half-baked... Expect more on this in a separate blog post one day. The bug that causes the replication break was reported by Lukas Sydorowski.

The last but not the least bug in this list (a list that is actually long if we try to check all active bugs reported over the last 2 years) that I want to include is Bug #90448 - "INSERT IGNORE not ignored on SQL thread", by Simon Mudd. Restarting the slave helps, and it seems MySQL 8.0.x is not affected. See also his Bug #89876 - "mysqladmin flush-hosts is not safe in GTID mode". I have to stop at some point.

In conclusion, let me review a list of (less) recent bugs in the multi-threaded slave (MTS) implementation specifically:

Bug #91477 - "Slave writes less when connected intermediary master with blackhole engine". Just 2 threads are used in this case, as found by Sveta Smirnova.

Bug #90342 - "MySQL parallel replication automatically restarts". Nobody has cared about this crashing bug for 5+ months. It's still "Open".

Bug #89247 - "Deadlock with MTS when slave_preserve_commit_order = ON." Jean-François Gagné reported this 9 months ago, the bug was verified, and based on the last comments and pt-pmp outputs the hang really happens for many users. Still, nobody cares in any obvious way.

Bug #89229 - "FLUSH PRIVILEGES may cause MTS deadlock". This bug was reported by Libing Song 9 months ago.

Bug #87796 - "Commit_order_manager can't terminate MTS worker properly when deadlock happens". This bug was reported by Seunguck Lee a year ago!

Bug #85747 - "Documentation on MTS statistics". Yet another case of missing manual reported by Juan Arruti: "I didn't find information in MySQL reference manual about MTS statistics, I had to look into source code in order to get more information about them."

Bug #85142 - "reducing MTS checkpointing causes high IO load". The bug reported by Trey Raymond has been "Verified" for 1.5 years. Check also his Bug #81386 - "MTS hangs on file related *_EVENT forced checkpoint" and Bug #81385 - "relay log info not updated after using sql_after_mts_gaps", which have been "Open" since May 2016!

Bug #73066 - "Replication stall with multi-threaded replication". Ovais Tariq reported this bug more than 4 years ago, when both of us still worked at Percona. Nobody cares to fix it.

Bug #71495 - "Logging of MTS breakage is insufficient." Simon Mudd, one of the Oracle customers who use MTS in production, reported this practical problem 4.5+ years ago. No fix, and no comments from Oracle engineers for more than 3 years.

I have to stop at this stage, as checking more/older replication-related bugs makes me even more sad than I planned or want to be. We see fixes in every release, but the backlog seems huge. Dear Oracle MySQL engineers, please review the public MySQL replication bug reports collected over the last 4-5 years and spend some more time and resources on processing and fixing them! Otherwise this Acqua alta of bugs will make this great legacy feature obsolete one sad day.

I had not planned to study semi-sync replication bugs in this post (maybe some day). If you are interested in more details about this feature, crash safety of replication, parallel replication and replication in general, I suggest you check J-F Gagné's MySQL Blog.

  • Using a trigger to parse non-conforming data in MySQL
    In another post, I explained how to use the MySQL Query Re-write Plugin to manipulate data that didn't exactly match the SQL standards of MySQL. In this post, I am going to give you another example of how to use a trigger to parse non-conforming data so the data can be written to a MySQL database.

A customer came to me with a problem. They were using third-party software which produced multiple rows of comma-separated data (like a .csv file), but all of the data wasn't on a single row. Instead, the data consisted of multiple rows of data, separated by commas and with line feeds after each row of data. Here is a shortened example of the data (the original data was 32 lines long):

Header information which may, or may not, contain one or more commas
Item Name,Type,Description,Value
Potato Chips - Sealed Bag,,,
Manifest Number,Number,MANIFEST_NUMBER,8480014
Manufacturer Code,Number,BRAND_CODE,80375993
Information 1,Number,INFO_ONE,15869563
Information 2,Number,INFO_TWO,6569569665
Information 3,Number,INFO_THREE,562
Planned Start Date,String,PROD_START_DATE,9/13/2018
Planned Start Time,String,PROD_START_TIME,081234

For some reason, the output couldn't be separated, but the software could use an ODBC connection. They needed to be able to insert this data into a MySQL database, but they only needed certain values – they didn't need a lot of the descriptive information. Of the example above, they only wanted to insert the actual data values (highlighted in yellow in the original post): the item name "Potato Chips - Sealed Bag" and the last value of each of the following rows (8480014, 80375993, 15869563, 6569569665, 562, 9/13/2018 and 081234).

At first, I suggested using a script (Python, Perl, PHP, etc.) to parse the data before inserting it into the database. But the software which produced the data had the ability to insert the data directly into a MySQL database – except that all of the data could only be inserted into one field. They didn't have an option to add an extra step outside of the database connection – and they were new to MySQL, so they needed some help. I decided to write a trigger for the table, in order to parse the data as it came into the database, but before it was inserted into a row. I also wanted to keep the original data in a column as well.

The first step was to create a database and a table. I mapped the column names to the first value of each row, skipping the header information which wasn't needed.
The column names (shown in blue in the original post) come from the first value of each row, and the data (shown in yellow) comes from the last value of each row. I can then match the columns I want to the data values:

Item Name - Potato Chips - Sealed Bag
Manifest Number - 8480014
Manufacturer Code - 80375993
Information 1 - 15869563
Information 2 - 6569569665
Information 3 - 562
Planned Start Date - 9/13/2018
Planned Start Time - 081234

Now, I can create the database and the table. I substituted the spaces in the field names with underscores (_). I also added a primary key column (id_MANIFEST_DATA) and a proper DATETIME column – as I want to combine the two columns PROD_START_DATE and PROD_START_TIME into one column to match MySQL's DATETIME format. This way, they can perform a search on this column later (if necessary).

mysql> create database MANIFEST_DATA_DB;
Query OK, 1 row affected (0.00 sec)

mysql> use MANIFEST_DATA_DB;
Database changed

mysql> CREATE TABLE `MANIFEST_DATA_DB`.`MANIFEST_DATA_TABLE` (
    -> `id_MANIFEST_DATA` INT NOT NULL AUTO_INCREMENT,
    -> `MANIFEST_DATA` VARCHAR(4096) NULL,
    -> `ITEM_NAME` VARCHAR(1024) NULL,
    -> `MANIFEST_NUMBER` INT NULL,
    -> `MANUFACTURER_CODE` VARCHAR(1024) NULL,
    -> `INFO_ONE` CHAR(32) NULL,
    -> `INFO_TWO` CHAR(32) NULL,
    -> `INFO_THREE` CHAR(32) NULL,
    -> `PROD_START_DATE` CHAR(10) NULL,
    -> `PROD_START_TIME` CHAR(6) NULL,
    -> `PROD_TIMESTAMP` DATETIME NULL,
    -> PRIMARY KEY (`id_MANIFEST_DATA`))
    -> AUTO_INCREMENT = 1000000;
Query OK, 0 rows affected (0.00 sec)

The initial "non-conforming data" will be inserted into the MANIFEST_DATA field, so all I have to do is create a trigger to parse this field before the data is inserted. Even though the data is on separate lines, parsing this data will be relatively easy, since the data is comma-delimited. I can use the SUBSTRING_INDEX function to create an array to store all of the data. But, since the first line of the data may or may not contain a comma, instead of counting commas from the beginning of the data, I will start at the end. Also, in this example, they don't need the first line of data, as it is header information.

Let's take a look at why I want to count backwards. Here are three rows of data – where the first column may or may not contain a comma, or it might contain two or more commas. I really only want to capture the last two columns of data.

"Hello, it's me",12345,ABCDE
"Hello it's me",67890,FGHIJ
"Hello, yes, it's me",112233,A1B2C3

If I parse the data based upon commas and count from the beginning, I will get different results whenever the first column contains one or more commas: the two values I want (shown in green in the original post) end up at positions 2 and 3 for the second row, but at positions 3 and 4 (or 4 and 5) for the rows whose first column contains commas. But if I count backwards, I will get the same result set regardless of how many commas are in the first column: the values I want are always at positions -2 and -1. In the actual data I want to use, I don't want to store the first row of data anyway – so it can be ignored.
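Here is a minimal check you can run in the mysql client to see this, using the sample rows above (how to isolate a single value with a second, nested SUBSTRING_INDEX is shown a bit further down). With a negative count, SUBSTRING_INDEX is anchored at the end of the string, so the last two values come back the same way for every row, no matter how many commas the first column contains:

mysql> SELECT SUBSTRING_INDEX('"Hello, it''s me",12345,ABCDE', ',', -2) AS last_two;       -- returns: 12345,ABCDE
mysql> SELECT SUBSTRING_INDEX('"Hello it''s me",67890,FGHIJ', ',', -2) AS last_two;        -- returns: 67890,FGHIJ
mysql> SELECT SUBSTRING_INDEX('"Hello, yes, it''s me",112233,A1B2C3', ',', -2) AS last_two; -- returns: 112233,A1B2C3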
If I did need the first line, I would have to search for the quotes and parse that column separately. Since the initial data will contain a line feed, I will want to replace the line feeds with a comma, so I can have all of the data on a single line and be able to use the SUBSTRING_INDEX function. Here is the original data again, but this time I have marked where the line feeds are. Also, I am going to count backwards from the last data value. The position numbers are shown in parentheses:

(-37) "Header information which may, or may not, contain one or more commas." (-36) Item Name (-35) Type (-34) Description (-33) Value -line feed-
(-32) Potato Chips – Sealed Bag (-31) (-30) (-29) -line feed-
(-28) Manifest Number (-27) Number (-26) MANIFEST_NUMBER (-25) 8480014 -line feed-
(-24) Manufacturer Code (-23) Number (-22) BRAND_CODE (-21) 80375993 -line feed-
(-20) Information 1 (-19) Number (-18) INFO_ONE (-17) 15869563 -line feed-
(-16) Information 2 (-15) Number (-14) INFO_TWO (-13) 6569569665 -line feed-
(-12) Information 3 (-11) Number (-10) INFO_THREE (-9) 562 -line feed-
(-8) Planned Start Date (-7) String (-6) PROD_START_DATE (-5) 9/13/2018 -line feed-
(-4) Planned Start Time (-3) String (-2) PROD_START_TIME (-1) 081234

Of course, you don't have to map out the fields like I did – you can just count backwards. The SUBSTRING_INDEX function works similarly to grabbing a value out of an array – except the count value is where you want to stop grabbing data. You specify the string you want to parse, the delimiter you want to use, and the count of the value from the string where you want to stop grabbing data. The format is: SUBSTRING_INDEX(string,delimiter,count)

Note: When the count value is a positive number, the count value is the ending position within the string. It doesn't select the element itself. Here is an example using the phrase "Hello, I must be going, see you later, bye.", which contains four values separated by commas. I want to capture the third value, "see you later":

1: Hello   2: I must be going   3: see you later   4: bye.

To select "see you later", I choose the third column as the place where I will stop grabbing data – but, since this is the stopping point, I will get the rest of the phrase up until the third column.

mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",3) as PARSE;
+---------------------------------------+
| PARSE                                 |
+---------------------------------------+
| Hello, I must be going, see you later |
+---------------------------------------+
1 row in set (0.00 sec)

I don't want all of the phrase, I only want "see you later". With a positive count value of three, I am getting the ending point – it stops at the third delimited value. If I use a negative count value, then I am getting the starting point of the string. If I count backwards, which would give me a negative count value (and I want to do this anyway because the first line of data in the main example may have multiple commas), I can use the value of negative two (-2) and get this:

mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2) as PARSE;
+----------------------+
| PARSE                |
+----------------------+
|  see you later, bye. |
+----------------------+
1 row in set (0.00 sec)

That gives me a starting point for grabbing "see you later", but I don't want any data (the single word "bye") after this. So, I can wrap an additional SUBSTRING_INDEX around the first one, and then only choose the first delimited set of data from the inner SUBSTRING_INDEX.
It looks like this. The statement SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2) grabs both of these columns (shown in blue in the original post):

3: see you later   4: bye.

But then I can use the second SUBSTRING_INDEX command to only select the first column of the remaining two columns: SUBSTRING_INDEX(SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2), ",", 1)

1: see you later   2: bye.

I can test it:

mysql> select SUBSTRING_INDEX(SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2), ",", 1) as PARSE;
+----------------+
| PARSE          |
+----------------+
|  see you later |
+----------------+
1 row in set (0.01 sec)

This is essentially how I am going to parse the long line of data (after I convert the line feeds to commas). I will go backwards to the starting point, and then only select the first column beginning at that starting point. I can then use the table I created to map out the data values I want to extract:

Column Name - Backwards Value
ITEM_NAME - -32
MANIFEST_NUMBER - -25
MANUFACTURER_CODE - -21
INFO_ONE - -17
INFO_TWO - -13
INFO_THREE - -9
PROD_START_DATE - -5
PROD_START_TIME - -1

Example: To extract the PROD_START_TIME, which is at position -1, I will use this in the trigger:

SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

I can now create my trigger to parse all of the data being inserted into the MANIFEST_DATA column. In the trigger, I will use some temporary variables to parse the MANIFEST_DATA, and I will also use the SUBSTRING function to parse the PROD_START_DATE and PROD_START_TIME fields to rearrange them into the correct MySQL DATETIME format. I will use the CONCAT function to combine them into a new DATETIME field. I have some notes in the stored procedure to help you figure out what I am doing.
DELIMITER $$ CREATE TRIGGER _convert_MANIFEST_DATA BEFORE INSERT ON MANIFEST_DATA_DB.MANIFEST_DATA_TABLE FOR EACH ROW BEGIN # Declare temporary variables # This is the temporary field which contains all of the data DECLARE _parse_MANIFEST_DATA varchar(4096); # The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP DECLARE _parse_TIME_DATE_DATE char(10); DECLARE _parse_TIME_DATE_TIME char(10); DECLARE _parse_TIME_DATE_date_final char(10); DECLARE _parse_TIME_DATE_hour char(2); DECLARE _parse_TIME_DATE_minutes char(2); DECLARE _parse_TIME_DATE_sec char(2); # Take the incoming large dataset which has line feeds and # Replace the line feeds with a comma set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, "\n", ","); # Parse each new column from the temporary field SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1); SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1); SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1); SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1); SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1); SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1); SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Get the values from these two fields in order to combine them into a DATETIME field SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Convert the date from MM/DD/YYYY to YYYY-MM-DD SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y'); # Parse the time so we can add colons between the hour, minutes and seconds SET _parse_TIME_DATE_hour = SUBSTRING(_parse_TIME_DATE_TIME, 1, 2); SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2); SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2); # Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ; END$$ DELIMITER ; Now I can insert the sample data into the database – where all of the data will go into the MANIFEST_DATA column, and the trigger will populate the rest of the columns by parsing the data. mysql> INSERT INTO MANIFEST_DATA_TABLE (MANIFEST_DATA) VALUES ('Header information which may, or may not, contain one or more commas '> Item Name,Type,Description,Value '> Potato Chips - Sealed Bag,,, '> Manifest Number,Number,MANIFEST_NUMBER,8480014 '> Manufacturer Code,Number,BRAND_CODE,80375993 '> Information 1,Number,INFO_ONE,15869563 '> Information 2,Number,INFO_TWO,6569569665 '> Information 3,Number,INFO_THREE,562 '> Planned Start Date,String,PROD_START_DATE,9/13/2018 '> Planned Start Time,String,PROD_START_TIME,081234'); Query OK, 1 row affected (0.07 sec) When I look at the contents of the table, I can see the trigger executed successfully. 
All of the “non-conforming” data is stored in the MANIFEST_DATA field, but the other fields were populated: mysql> select * from MANIFEST_DATA_TABLE\G *************************** 1. row *************************** id_MANIFEST_DATA: 1000000 MANIFEST_DATA: Header information which may, or may not, contain one or more commas Item Name,Type,Description,Value Potato Chips - Sealed Bag,,, Manifest Number,Number,MANIFEST_NUMBER,8480014 Manufacturer Code,Number,BRAND_CODE,80375993 Information 1,Number,INFO_ONE,15869563 Information 2,Number,INFO_TWO,6569569665 Information 3,Number,INFO_THREE,562 Planned Start Date,String,PROD_START_DATE,9/13/2018 Planned Start Time,String,PROD_START_TIME,081234 ITEM_NAME: Type MANIFEST_NUMBER: 8480014 MANUFACTURER_CODE: 80375993 INFO_ONE: 15869563 INFO_TWO: 6569569665 INFO_THREE: 562 PROD_START_DATE: 9/13/2018 PROD_START_TIME: 081234 PROD_TIMESTAMP: 2018-09-13 08:12:34 1 row in set (0.00 sec) Of course, this is only a short example of all of the parsing you can do with triggers in MySQL. And, if you want to look at your trigger, you can query the INFORMATION_SCHEMA.TRIGGERS table: mysql> SELECT * FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_NAME='_convert_MANIFEST_DATA'\G *************************** 1. row *************************** TRIGGER_CATALOG: def TRIGGER_SCHEMA: MANIFEST_DATA_DB TRIGGER_NAME: _convert_MANIFEST_DATA EVENT_MANIPULATION: INSERT EVENT_OBJECT_CATALOG: def EVENT_OBJECT_SCHEMA: MANIFEST_DATA_DB EVENT_OBJECT_TABLE: MANIFEST_DATA_TABLE ACTION_ORDER: 1 ACTION_CONDITION: NULL ACTION_STATEMENT: BEGIN # Declare temporary variables # This is the temporary field which contains all of the data DECLARE _parse_MANIFEST_DATA varchar(4096); # The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP DECLARE _parse_TIME_DATE_DATE char(10); DECLARE _parse_TIME_DATE_TIME char(10); DECLARE _parse_TIME_DATE_date_final char(10); DECLARE _parse_TIME_DATE_hour char(2); DECLARE _parse_TIME_DATE_minutes char(2); DECLARE _parse_TIME_DATE_sec char(2); # Take the incoming large dataset which has line feeds and # Replace the line feeds with a comma set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, " ", ","); # Parse each new column from the temporary field SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1); SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1); SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1); SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1); SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1); SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1); SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Get the values from these two fields in order to combine them into a DATETIME field SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Convert the date from MM/DD/YYYY to YYYY-MM-DD SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y'); # Parse the time so we can add colons between the hour, minutes and seconds SET _parse_TIME_DATE_hour = 
SUBSTRING(_parse_TIME_DATE_TIME, 1, 2); SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2); SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2); # Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ; END ACTION_ORIENTATION: ROW ACTION_TIMING: BEFORE ACTION_REFERENCE_OLD_TABLE: NULL ACTION_REFERENCE_NEW_TABLE: NULL ACTION_REFERENCE_OLD_ROW: OLD ACTION_REFERENCE_NEW_ROW: NEW CREATED: 2018-09-20 22:13:28.54 SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION DEFINER: root@localhost CHARACTER_SET_CLIENT: utf8mb4 COLLATION_CONNECTION: utf8mb4_0900_ai_ci DATABASE_COLLATION: utf8mb4_0900_ai_ci 1 row in set (0.02 sec)   Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn. Tony is the author of Twenty Forty-Four: The League of Patriots Visit http://2044thebook.com for more information. Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition Visit https://amzn.to/2oPFLI0 for more information.

  • How to check and fix MySQL replication inconsistencies ?
    There are several ways to end up with inconsistent MySQL replication; this could be accidental or intentional. In this blog I would like to discuss how to identify slaves that are inconsistent with the master and how to fix them. I used pt-table-checksum (to find the differences between master and slave) and pt-table-sync (to sync the slave with the master) from Percona Toolkit; the detailed documentation of Percona Toolkit is available here for your reference. I expect / recommend you to be careful (as I mentioned above, sometimes records are inserted / deleted on a MySQL slave intentionally) before using these tools to sync the slave with the master, because rolling back this task is even more expensive. The objective of this blog is to show you how to find differences between master and slave in a MySQL replication infrastructure; if you have decided to sync the slave with the master, then please proceed with the pt-table-sync tool. Both pt-table-checksum and pt-table-sync are highly customizable tools; I have used a very simple form of them in the examples below:

Master – 192.168.56.3
Slave – 192.168.56.4
Percona Toolkit Version – 3.0.12
MySQL Version – MySQL GA 5.7.23

I have created a user on the master to check and repair:

GRANT ALL ON *.* to chksum@'%' identified by 'Password@2018';

In case you have non-default ports (other than 3306) for MySQL, please set the following variables on the slaves:

# my.cnf
[mysqld]
report_host = slave
report_port = 3307

Confirm data inconsistencies in Master (192.168.56.3) and Slave (192.168.56.4):

Master
mysql> select count(1) from titles;
+----------+
| count(1) |
+----------+
|   443308 |
+----------+
1 row in set (0.09 sec)

Slave
mysql> select count(1) from titles;
+----------+
| count(1) |
+----------+
|   443311 |
+----------+
1 row in set (0.09 sec)

Check data inconsistencies using pt-table-checksum:

Check for data inconsistencies by executing the following command** on the Master:

** command I have executed below :- pt-table-checksum h=192.168.56.3,u=chksum,p=Password@2018,P=3306 --set-vars innodb_lock_wait_timeout=30 --no-check-binlog-format --databases=employees --tables=titles

[root@localhost ~]# pt-table-checksum h=192.168.56.3,u=chksum,p=Password@2018,P=3306 --set-vars innodb_lock_wait_timeout=30 --no-check-binlog-format --databases=employees --tables=titles
Checking if all tables can be checksummed ...
Starting checksum ...
TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE
09-20T22:53:02 0 2 443308 5 6 0 1.319 employees.titles

Fixing data inconsistencies using pt-table-sync

We check the data inconsistencies between the Master (192.168.56.3) and the Slave (192.168.56.4) by executing the following command** on the Master:

** command I have executed below :- pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --print

[root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --print
REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Senior Staff', '1992-10-14', '1993-08-10') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Staff', '1985-10-14', '1992-10-14') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
DELETE FROM `employees`.`titles` WHERE `emp_no`='87000' AND `title`='Staff Engineer' AND `from_date`='1990-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
DELETE FROM `employees`.`titles` WHERE `emp_no`='97000' AND `title`='Database Engineer' AND `from_date`='1991-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
DELETE FROM `employees`.`titles` WHERE `emp_no`='97500' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
DELETE FROM `employees`.`titles` WHERE `emp_no`='97501' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
DELETE FROM `employees`.`titles` WHERE `emp_no`='97502' AND `title`='Project Engineer' AND `from_date`='1993-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/;
[root@localhost ~]#

To fix the inconsistencies on the Slave (so that it matches the Master), execute the following command on the Master:

[root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --execute

Confirm the data inconsistencies in Master (192.168.56.3) and Slave (192.168.56.4) are fixed:

Master
mysql> select count(1) from titles;
+----------+
| count(1) |
+----------+
|   443308 |
+----------+
1 row in set (0.09 sec)

Slave
mysql> select count(1) from titles;
+----------+
| count(1) |
+----------+
|   443308 |
+----------+
1 row in set (0.09 sec)

Conclusion

I recommend / encourage my customers to perform the pt-table-checksum exercise regularly in their MySQL replication infrastructure to avoid unpleasant experiences due to data consistency issues.

The post How to check and fix MySQL replication inconsistencies ? appeared first on MySQL Consulting, Support and Remote DBA Services.
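If you run pt-table-checksum regularly, you can also query its results directly instead of re-reading the tool's output. This is a minimal sketch that assumes the tool's default --replicate table, percona.checksums (the table and column names below are the defaults; adjust them if you changed --replicate). Run it on each slave to list the tables whose checksums differ from the master's:

mysql> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
    ->   FROM percona.checksums
    ->  WHERE master_cnt <> this_cnt
    ->     OR master_crc <> this_crc
    ->     OR ISNULL(master_crc) <> ISNULL(this_crc)
    ->  GROUP BY db, tbl;

Any table reported here is a candidate for the pt-table-sync step shown above.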

  • This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update
    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community. This week is clearly what I’d call a “MariaDB week” — plenty of announcements coming from MariaDB Corporation and MariaDB Foundation. It started with Alibaba Cloud and MariaDB Announce the Launch of ApsaraDB RDS for MariaDB TX, which makes Alibaba Cloud the first public cloud to offer the enterprise offering of MariaDB, MariaDB TX 3.0. It is not available yet as of this announcement for rolling out from the interface, but I expect it will be soon. Exciting, as you can already get MariaDB Server on Amazon RDS for MariaDB, and you can join the waitlist preview for Azure. MariaDB Corporation has received more funding from ServiceNow Ventures in the Series C round, and has gained a new board member in Pat Casey. ServiceNow is a user of MariaDB, and “ServiceNow’s platform runs on up to 85,000 MariaDB databases that serve more than 25 billion queries per hour.” There was an excellent keynote session at M|18 about how ServiceNow uses MariaDB. The Register refers to this as “protecting ServiceNow’s toolchain”. For good measure, MariaDB acquired Clustrix as well. This is the second acquisition after MammothDB earlier in the year. It is worth reading the TechCrunch take on this. Clustrix, a Y Combinator company, has been around since 2006 and raised $72 million. The price of the acquisition was not announced. For a bit of behind the scenes chatter from ex-employee shareholders, Hacker News delivers. From a MariaDB Foundation standpoint, we see Otto Kekäläinen, the MariaDB Foundation CEO stepping down. Thanks for all your hard work Otto! And maybe you missed it, but not long ago, Percona Became a Bronze Sponsor of MariaDB Foundation. Speaking of conferences, the tutorial schedule and a sneak peek of sessions for Percona Live Europe Frankfurt have been announced. In addition, the Call for Papers – 2019 Annual MariaDB User Conference closes October 31, 2018. Releases SQLite Release 3.25.1 now comes with enhancements to ALTER TABLE and window functions. Percona XtraDB Cluster 5.6.41-28.28 Percona Server for MongoDB 3.2.21-3.12 and Percona Server for MongoDB 3.6.7-1.5 Link List MongoDB 4.1.3 is ready for testing and MMAPv1 has been removed. A very apt goodbye, “sometimes a piece of software has served its purpose and it’s time to move on. The original MongoDB storage engine, MMAPv1, was deprecated in 4.0 and will no longer be an option in 4.2.” Postgres 11 – a First Look — don’t forget to check out PostgreSQL 11 Beta 4. Mark Callaghan on durability debt, defined as “the amount of work that can be done to persist changes that have been applied to a database.“ Upcoming Appearances Open Source Summit Europe 2018 – 22-24 October 2018 Feedback I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.   The post This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update appeared first on Percona Database Performance Blog.

  • SQL Order of Operations – SQL Query Order of Execution
    Knowing the bits and bytes of an SQL query's order of operations can be very valuable, as it can ease the process of writing new queries, while also being very beneficial when trying to optimize an SQL query. If you're looking for the short version, this is the logical order of operations, also known as the order of execution, for an SQL query:

FROM, including JOINs
WHERE
GROUP BY
HAVING
WINDOW functions
SELECT
DISTINCT
UNION
ORDER BY
LIMIT and OFFSET

But the reality isn't that easy or straightforward. As we said, the SQL standard defines the order of execution for the different SQL query clauses. That said, modern databases are already challenging that default order by applying some optimization tricks which might change the actual order of execution, though they must end up returning the same result as if they were running the query in the default execution order. Why would they do that? Well, it would be silly for the database to first fetch all the data mentioned in the FROM clause (including the JOINs) before looking into the WHERE clause and its indexes. Those tables can hold lots of data, so you can imagine what would happen if the database's optimizer stuck to the traditional order of operations of an SQL query. Let's look into each of the SQL query parts according to their execution order.

FROM and JOINs

The tables specified in the FROM clause (including JOINs) will be evaluated first, to determine the entire working set which is relevant for the query. The database will merge the data from all tables, according to the JOINs' ON clauses, while also fetching data from the subqueries, and might even create some temporary tables to hold the data returned from the subqueries in this clause. In many cases though, the database's optimizer will choose to evaluate the WHERE part first, to see which part of the working set can be left out (preferably using indexes), so it won't inflate the data set too much if it doesn't really have to.

WHERE clause

The WHERE clause will be the second to be evaluated, after the FROM clause. We have the working data set in place, and now we can filter the data according to the conditions in the WHERE clause. These conditions can include references to the data and tables from the FROM clause, but cannot include any references to aliases defined in the SELECT clause, as that data and those aliases may not yet 'exist' in that context, as that clause wasn't yet evaluated by the database. Also, a common pitfall for the WHERE clause would be to try and filter out aggregated values in the WHERE clause, for example with this clause: "WHERE sum(available_stock) > 0". This statement will fail the query execution, because aggregations will be evaluated later in the process (see the GROUP BY section below). To apply a filtering condition to aggregated data, you should use the HAVING clause and not the WHERE clause.

GROUP BY clause

Now that we have filtered the data set using the WHERE clause, we can aggregate the data according to one or more columns appearing in the GROUP BY clause. Grouping the data actually means splitting it into different chunks or buckets, where each bucket has one key and a list of rows that match that key. Not having a GROUP BY clause is like putting all rows in one huge bucket. Once you aggregate the data, you can use aggregation functions to return a per-group value for each of the buckets. Such aggregation functions include COUNT, MIN, MAX, SUM and others.
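As a small illustration of the steps covered so far (the table and column names here are made up for the example), the query below is first restricted by WHERE to a subset of rows and only then split into one bucket per category, with COUNT and SUM computed per bucket:

SELECT category,
       COUNT(*)             AS products,
       SUM(available_stock) AS total_stock
FROM   products
WHERE  discontinued = 0          -- row filter: evaluated before any grouping happens
GROUP  BY category;              -- one bucket (and one result row) per category

Note that moving a condition such as sum(available_stock) > 0 into the WHERE clause of this query would fail for the reason described above; the HAVING clause covered next is the right place for it.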
HAVING clause

Now that we have grouped the data using the GROUP BY clause, we can use the HAVING clause to filter out some buckets. The conditions in the HAVING clause can refer to the aggregation functions, so the example which didn't work in the WHERE clause above will work just fine in the HAVING clause: "HAVING sum(available_stock) > 0". As we've already grouped the data, we can no longer access the original rows at this point, so we can only apply conditions that filter entire buckets, not single rows in a bucket. Also, as we mentioned in previous sections, aliases defined in the SELECT clause cannot be accessed in this section either, as they weren't yet evaluated by the database (this is true in most databases).

Window functions

If you are using Window functions, this is the point where they'll be executed. Just like the grouping mechanism, Window functions also perform a calculation on a set of rows. The main difference is that when using Window functions, each row will keep its own identity and won't be grouped into a bucket of other similar rows. Window functions can only be used in either the SELECT or the ORDER BY clause. You can use aggregation functions inside the Window functions, for example: SUM(COUNT(*)) OVER ()

SELECT clause

Now that we are done with discarding rows from the data set and grouping the data, we can select the data we want the query to return to the client side. You can use column names, aggregations and subqueries inside the SELECT clause. Keep in mind that if you're using a reference to an aggregation function, such as COUNT(*), in the SELECT clause, it's merely a reference to an aggregation which already occurred when the grouping took place; the aggregation itself doesn't happen in the SELECT clause.

DISTINCT keyword

The syntax of the DISTINCT keyword is a bit confusing, because the keyword takes its place before the column names in the SELECT clause. But the actual DISTINCT operation takes place after the SELECT. When using the DISTINCT keyword, the database will discard rows with duplicate values from the rows left after the filtering and aggregations took place.

UNION keyword

The UNION keyword combines the result sets of two queries into one result set. Most databases will allow you to choose between UNION DISTINCT (which will discard duplicate rows from the combined result set) and UNION ALL (which just combines the result sets without applying any duplication check). You can apply sorting (ORDER BY) and limiting (LIMIT) to the UNION's result set, the same way you can apply them to a regular query.

ORDER BY clause

Sorting takes place once the database has the entire result set ready (after filtering, grouping and duplicate removal). Once we have that, the database can sort the result set using columns, selected aliases, or aggregation functions, even if they aren't part of the selected data. The only exception is when using the DISTINCT keyword, which prevents sorting by a non-selected column, as in that case the result set's order would be undefined. You can choose to sort the data in descending (DESC) or ascending (ASC) order, and the direction can differ for each of the order parts, so the following is valid: ORDER BY firstname ASC, age DESC

LIMIT and OFFSET

In most use cases (excluding a few like reporting), we would want to discard all rows but the first X rows of the query's result.
The LIMIT clause, which is executed after sorting, allows us to do just that. In addition, using a combination of the LIMIT and OFFSET keywords, you can choose how many rows to skip (OFFSET) and how many rows to return (LIMIT). The following example skips the first 100 rows and fetches the next 50: LIMIT 50 OFFSET 100
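To tie the whole list together, here is a single query over a hypothetical orders schema (all table and column names are invented for the illustration), annotated with the logical step at which each clause is evaluated:

SELECT   o.customer_id,                       -- 5. SELECT: project columns / expressions
         SUM(o.amount) AS total_amount
FROM     orders AS o                          -- 1. FROM (and JOINs): build the working set
JOIN     customers AS c ON c.id = o.customer_id
WHERE    o.status = 'shipped'                 -- 2. WHERE: filter individual rows
GROUP BY o.customer_id                        -- 3. GROUP BY: split the rows into buckets
HAVING   SUM(o.amount) > 100                  -- 4. HAVING: filter whole buckets
ORDER BY total_amount DESC                    -- 6. ORDER BY: sort the result (SELECT aliases are visible here)
LIMIT    10 OFFSET 0;                         -- 7. LIMIT/OFFSET: keep only the first rows

Remember that this is the logical order only; as discussed above, the optimizer is free to execute things differently (for example, pushing the WHERE filter into an index lookup) as long as the final result is the same.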
