Home

Advertisement

friends [entries|archive|friends|userinfo]
Wax Pancake

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

A handy regular expression for 'tokenizing' a SQL statement [Dec. 28th, 2009|11:55 am]

swanhart
[Tags|, , , , ]
[Current Location |United States, San Francisco [Gazillion]]

Often times I find myself having to handle a particular portion of a SQL statement via a script. I've written a lot of specialized regular expressions over time to handle these tasks, but the one that I've had to write the most is a basic 'tokenizer' which understands the quoting semantics of a MySQL statement. That is, it understands `BackTick`, 'Single Quote' and "Double Quoted" strings.

#PHP Heredoc syntax
$regex = <<< END_OF_REGEX
/
  [^ \"'`(),]*\([^)]*\)    #match functions like concat(x,"y",`a`.`z`) or sum(`xyz`);
  |\([^)]*?\)              #match grouped items
  |"[^"]*?"                #match double quoted items
  |'[^']*?'                #match single quoted items
  |`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+`  #match backtick mysql names
  |[^ ,]+                  #match keywords, operators and aliases
  |,
/xi
END_OF_REGEX;


When used with the preg_match_all() function, an array is produced which represents the tokenized string. Functions calls, quoted strings and grouped expressions are returned together, and may need additional processing, depending on your needs.

For example:
-- select sum(a * b) sweety,'abc' as a1, 
--        "abc",concat(`def`.`abc`,'hello', x.y, z) as `blah`,
--        null + 1 
--   from `abc`.`def` as def1 
--   join xyz.zzz z1 
--     on (z1.a = def1.a) 

Array
(
    [0] => select
    [1] => sum(a * b)
    [2] => sweety
    [3] => ,
    [4] => 'abc'
    [5] => as
    [6] => a1
    [7] => ,
    [8] => "abc"
    [9] => ,
    [10] => concat(`def`.`abc`,'hello', x.y, z)
    [11] => as
    [12] => `blah`
    [13] => ,
    [14] => null
    [15] => +
    [16] => 1
    [17] => from
    [18] => `abc`.`def`
    [19] => as
    [20] => def1
    [21] => join
    [22] => xyz.zzz
    [23] => z1
    [24] => on
    [25] => (z1.a = def1.a)
)


I'm using this to process SQL statements and automatically create the Flexviews calls for the query.
The following output is generated by my program for the above SQL statement (only the SELECT clause is currently represented):

call flexviews.add_expr(@mvid, 'SUM', 'a * b', 'sweety');
call flexviews.add_expr(@mvid, 'GROUP', 'abc', '1');
call flexviews.add_expr(@mvid, 'GROUP', "abc", '_abc_');
call flexviews.add_expr(@mvid, 'GROUP', 'concat(`def`.`abc`,\'hello\', x.y, z)', 'blah');
call flexviews.add_expr(@mvid, 'GROUP', 'null + 1', 'null_+_1');

link3 comments|post comment

When is MySQL going to get a global transaction ID in the binary log? [Dec. 18th, 2009|05:24 pm]

swanhart
MySQL 5.5 includes a number of features first implemented by the "Google patch", but in a different way. One feature in the patch that is still missing from 5.5 appears to be a monotonically increasing transaction identifier in the binary log.

I'm wondering where this feature is on the roadmap, and if we are likely to see it in one of the upcoming milestones?
link5 comments|post comment

navigation
[ viewing | most recent entries ]

Advertisement