This post concludes the series of three posts that I started about a month ago; you can find part 1 and part 2.
(Note: All code in this post was originally written by me for Leaguepedia and is licensed under CC BY-SA 3.0.)
Original query
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
function h.makeAndRunQuery(player)
local query = {
tables = {
'MatchSchedule=MS',
'MatchSchedule__InterviewWith=MSInt',
'PlayerRedirects=PRMS',
'MatchScheduleGame=MSG',
'MatchScheduleGame__InterviewWith=MSGInt',
'PlayerRedirects=PRMSG',
'Tournaments=T',
},
join = {
'MS._ID=MSInt._rowID',
'MSInt._value=PRMS.AllName',
'MS.UniqueMatch=MSG.UniqueMatch',
'MSG._ID=MSGInt._rowID',
'MSGInt._value=PRMSG.AllName',
'MS.OverviewPage=T.OverviewPage',
},
fields = {
'T.Name=Tournament',
'MS.VodInterview=MSInterview',
'MSG.VodInterview=MSGInterview',
'MS.Winner=MSWinner[number]',
'MS.DateTime_UTC=Date',
'MS.Team1 [team]',
'MS.Team2 [team]',
'MSG.Blue [team]',
'MSG.Red [team]',
'MSG.Winner=MSGWinner',
},
where = {
('PRMS.OverviewPage="%s" OR PRMSG.OverviewPage="%s"'):format(player, player)
},
orderBy = 'MS.DateTime_UTC DESC',
groupBy = 'MSInterview,MSGInterview',
}
return util_cargo.queryAndCast(query)
end
|
What’s going on? Once again, we have an OR
in the where
condition, and two copies of PlayerRedirects
, one from MatchSchedule
and MatchScheduleGame
.
This setup is similar to the first query we looked at, but there is one big confounding factor: the groupBy
. We still don’t ever have to splice data together - that is, there will never be a case when some of the columns in a row are coming from union query 1 and some of the columns in that same row are coming from union query 2 - but we WILL have to handle “merge conflicts”, where one row shows up in both of our data sets and we want to see it only a single time.
New Cargo wrapper application logic
Before, the end of my queryAndCast
function looked like this:
1
2
3
4
5
|
result = util_table.mergeArrays(unpack(result))
if query.sortKey then
util_sort.tablesByKeys(result, query.sortKey, query.sortOrder)
end
return result
|
Now, to accommodate checking for duplicates at the same time as merging, I added a new function to Module:TableUtil
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
function p.uniqueKeyMergeArrays(key, tbl1, ...)
-- tbl1 is modified to include the elements of tbl2 appended to the end. Order is preserved.
-- skips rows in later tables that have repeat values of a key.
-- useful when unioning Cargo tables with a net groupBy.
if not tbl1 then tbl1 = {} end
local newTables = {...}
local seenKeys = {}
for _, tbl2 in ipairs(newTables) do
for _, row in ipairs(tbl2) do
if not seenKeys[row[key]] then
seenKeys[row[key]] = true
tbl1[#tbl1+1] = row
end
end
end
return tbl1
end
|
And here is my new queryAndCast
function, which also updates the syntax to require only one layer of nesting as opposed to two, like I mentioned I wanted to change last time:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
function p.queryAndCast(query)
if not query.union then
return h.queryAndCastOne(query)
end
-- h.queryAndCastOne will clone the table so this can be in place
local result = util_map.arrayInPlace(query, h.queryAndCastOne)
if query.uniqueKey then
-- currently query.uniqueKey MUST be a single constant key
-- use CONCAT in the query as needed
-- remember you STILL must groupBy your first result!!!
result = util_table.uniqueKeyMergeArrays(query.uniqueKey, unpack(result))
else
result = util_table.mergeArrays(unpack(result))
end
if query.sortKey then
util_sort.tablesByKeys(result, query.sortKey, query.sortOrder)
end
return result
end
|
The reason for that warning about using a groupBy
on the first result is that I very deliberately do NOT check for uniqueness of keys within the first table in uniqueKeyMergeArrays
, and instead only between members of later tables and the first table. The reason for this omission is that I’d rather let SQL handle getting rid of duplicates and process less data in Lua.
One additional wrapper note
As it turns out, this wrapper is missing one last clause:
1
2
3
4
5
|
-- we normally already added an index, but it is completely untrustworthy now
-- so just overwrite it entirely
for i, row in ipairs(result) do
row.index = i
end
|
When I cast values in my Cargo wrapper, I provide a parameter row.index
; however, when taking the union of distinct queries, this index becomes useless, and so I need to go back over the entire thing and re-index from scratch.
New query
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
|
local QB = LCS.class()
function QB:init(interviewTableShort, interviewTableLong)
self.interviewTableShort = interviewTableShort
self.interviewTableLong = interviewTableLong
end
function QB:getTables()
local tables = {
'MatchSchedule=MS',
'MatchScheduleGame=MSG',
('%s__InterviewWith=InterviewWith'):format(self.interviewTableLong),
'PlayerRedirects=PR',
'Tournaments=T',
}
return tables
end
function QB:getJoin()
local join = {
'MS.UniqueMatch=MSG.UniqueMatch',
('%s._ID=InterviewWith._rowID'):format(self.interviewTableShort),
'InterviewWith._value=PR.AllName',
'MS.OverviewPage=T.OverviewPage',
}
return join
end
function QB:getFields()
local fields = {
'T.Name=Tournament',
'MS.VodInterview=MSInterview',
'MSG.VodInterview=MSGInterview',
'MS.Winner=MSWinner[number]',
'MS.DateTime_UTC=Date',
'MS.Team1 [team]',
'MS.Team2 [team]',
'MSG.Blue [team]',
'MSG.Red [team]',
'MSG.Winner=MSGWinner',
'CONCAT(COALESCE(MS.VodInterview,"EmptyMSInterview"),"____",COALESCE(MSG.VodInterview,"EmptyMSGInterview"))=UniqueKey'
}
return fields
end
function QB:getWhere(player)
return ('PR.OverviewPage="%s"'):format(player)
end
function QB:getGroupBy()
return 'MS.VodInterview, MSG.VodInterview'
end
function h.makeAndRunQuery(player)
local builderMS = QB('MS', 'MatchSchedule')
local builderMSG = QB('MSG', 'MatchScheduleGame')
local query = {
union = true,
{
tables = builderMS:getTables(),
join = builderMS:getJoin(),
where = builderMS:getWhere(player),
fields = builderMS:getFields(),
groupBy = builderMSG:getGroupBy(),
},
{
tables = builderMSG:getTables(),
join = builderMSG:getJoin(),
where = builderMSG:getWhere(player),
fields = builderMSG:getFields(),
groupBy = builderMSG:getGroupBy(),
},
sortKey = { 'Date' },
sortOrder = { false },
uniqueKey = 'UniqueKey',
}
return util_cargo.queryAndCast(query)
end
|
In this case, I was able to choose an ordering to the tables & join that works regardless of which table - MatchSchedule
or MatchScheduleGame
- I’m getting the interview from. That possibility makes the setup about 10 lines of code shorter than it would otherwise be. It also pretty much removes the need to use LuaClassSystem
at all, but that’s ok; I hadn’t realized I’d be able to do this until I’d already set up this much structure, and at that point it wasn’t worth undoing it.
You can also see how much neater the syntax looks with only one level of nesting needed: taking advantage of Lua’s equivalence between dicts and lists, I’m now just mapping over ipairs(query)
instead of ipairs(query.union)
after checking that union
is true
.
And in keeping with my decision to use sortKey
& sortOrder
as a distinct term from orderBy
before, I’m using uniqueKey
instead of groupBy
as the outer param for what is essentially my application-logic-level groupBy
param for the union query.
Conclusion
This post concludes the short series! We saw how a feature evolved slowly over time:
- Union support was originally added with a verbose syntax, and no
uniqueKey
for merging the queries.
- Grouping support was added later.
- I then improved the syntax (this was ok because it was deployed in only 2 places at the time, and on only one wiki).
- Finally, I added the
index
fix.
The last three steps were condensed into this one post, but in reality they were done over the span of about a month.
We also saw a case of an edit where I didn’t have to use this module at all - just because you have a tool doesn’t mean you should rule out making easier fixes when approaching a problem (not everything is a nail!). And I’ve mentioned a couple times the possibility of “splicing” results from separate queries together into a single row - but this is computational complexity that I want to avoid forever and never support if possible.
My Cargo wrapper has been an evolving product over the course of years, and I’m sure its union
support will continue to be improved in the future as well. The important thing is to continue to keep the interface clean, understandable, and backwards-compatible.