This page looks best with JavaScript enabled

Optimizing Cargo - query 3

 ·  ☕ 7 min read

This post concludes the series of three posts that I started about a month ago; you can find part 1 and part 2.

(Note: All code in this post was originally written by me for Leaguepedia and is licensed under CC BY-SA 3.0.)

Original query

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
function h.makeAndRunQuery(player)
	local query = {
		tables = {
			'MatchSchedule=MS',
			'MatchSchedule__InterviewWith=MSInt',
			'PlayerRedirects=PRMS',
			'MatchScheduleGame=MSG',
			'MatchScheduleGame__InterviewWith=MSGInt',
			'PlayerRedirects=PRMSG',
			'Tournaments=T',
		},
		join = {
			'MS._ID=MSInt._rowID',
			'MSInt._value=PRMS.AllName',
			'MS.UniqueMatch=MSG.UniqueMatch',
			'MSG._ID=MSGInt._rowID',
			'MSGInt._value=PRMSG.AllName',
			'MS.OverviewPage=T.OverviewPage',
		},
		fields = {
			'T.Name=Tournament',
			'MS.VodInterview=MSInterview',
			'MSG.VodInterview=MSGInterview',
			'MS.Winner=MSWinner[number]',
			'MS.DateTime_UTC=Date',
			'MS.Team1 [team]',
			'MS.Team2 [team]',
			'MSG.Blue [team]',
			'MSG.Red [team]',
			'MSG.Winner=MSGWinner',
		},
		where = {
			('PRMS.OverviewPage="%s" OR PRMSG.OverviewPage="%s"'):format(player, player)
		},
		orderBy = 'MS.DateTime_UTC DESC',
		groupBy = 'MSInterview,MSGInterview',
	}
	return util_cargo.queryAndCast(query)
end

What’s going on? Once again, we have an OR in the where condition, and two copies of PlayerRedirects, one from MatchSchedule and MatchScheduleGame.

This setup is similar to the first query we looked at, but there is one big confounding factor: the groupBy. We still don’t ever have to splice data together - that is, there will never be a case when some of the columns in a row are coming from union query 1 and some of the columns in that same row are coming from union query 2 - but we WILL have to handle “merge conflicts”, where one row shows up in both of our data sets and we want to see it only a single time.

New Cargo wrapper application logic

Before, the end of my queryAndCast function looked like this:

1
2
3
4
5
	result = util_table.mergeArrays(unpack(result))
	if query.sortKey then
		util_sort.tablesByKeys(result, query.sortKey, query.sortOrder)
	end
	return result

Now, to accommodate checking for duplicates at the same time as merging, I added a new function to Module:TableUtil:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function p.uniqueKeyMergeArrays(key, tbl1, ...)
	-- tbl1 is modified to include the elements of tbl2 appended to the end. Order is preserved.
	-- skips rows in later tables that have repeat values of a key.
	-- useful when unioning Cargo tables with a net groupBy.
	if not tbl1 then tbl1 = {} end
	local newTables = {...}
	local seenKeys = {}
	for _, tbl2 in ipairs(newTables) do
		for _, row in ipairs(tbl2) do
			if not seenKeys[row[key]] then
				seenKeys[row[key]] = true
				tbl1[#tbl1+1] = row
			end
		end
	end
	return tbl1
end

And here is my new queryAndCast function, which also updates the syntax to require only one layer of nesting as opposed to two, like I mentioned I wanted to change last time:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
function p.queryAndCast(query)
	if not query.union then
		return h.queryAndCastOne(query)
	end
	-- h.queryAndCastOne will clone the table so this can be in place
	local result = util_map.arrayInPlace(query, h.queryAndCastOne)
	if query.uniqueKey then
		-- currently query.uniqueKey MUST be a single constant key
		-- use CONCAT in the query as needed
		-- remember you STILL must groupBy your first result!!!
		result = util_table.uniqueKeyMergeArrays(query.uniqueKey, unpack(result))
	else
		result = util_table.mergeArrays(unpack(result))
	end
	if query.sortKey then
		util_sort.tablesByKeys(result, query.sortKey, query.sortOrder)
	end

	return result
end

The reason for that warning about using a groupBy on the first result is that I very deliberately do NOT check for uniqueness of keys within the first table in uniqueKeyMergeArrays, and instead only between members of later tables and the first table. The reason for this omission is that I’d rather let SQL handle getting rid of duplicates and process less data in Lua.

One additional wrapper note

As it turns out, this wrapper is missing one last clause:

1
2
3
4
5
	-- we normally already added an index, but it is completely untrustworthy now
	-- so just overwrite it entirely
	for i, row in ipairs(result) do
		row.index = i
	end

When I cast values in my Cargo wrapper, I provide a parameter row.index; however, when taking the union of distinct queries, this index becomes useless, and so I need to go back over the entire thing and re-index from scratch.

New query

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
local QB = LCS.class()

function QB:init(interviewTableShort, interviewTableLong)
	self.interviewTableShort = interviewTableShort
	self.interviewTableLong = interviewTableLong
end

function QB:getTables()
	local tables = {
		'MatchSchedule=MS',
		'MatchScheduleGame=MSG',
		('%s__InterviewWith=InterviewWith'):format(self.interviewTableLong),
		'PlayerRedirects=PR',
		'Tournaments=T',
	}
	return tables
end

function QB:getJoin()
	local join = {
		'MS.UniqueMatch=MSG.UniqueMatch',
		('%s._ID=InterviewWith._rowID'):format(self.interviewTableShort),
		'InterviewWith._value=PR.AllName',
		'MS.OverviewPage=T.OverviewPage',
	}
	return join
end

function QB:getFields()
	local fields = {
		'T.Name=Tournament',
		'MS.VodInterview=MSInterview',
		'MSG.VodInterview=MSGInterview',
		'MS.Winner=MSWinner[number]',
		'MS.DateTime_UTC=Date',
		'MS.Team1 [team]',
		'MS.Team2 [team]',
		'MSG.Blue [team]',
		'MSG.Red [team]',
		'MSG.Winner=MSGWinner',
		'CONCAT(COALESCE(MS.VodInterview,"EmptyMSInterview"),"____",COALESCE(MSG.VodInterview,"EmptyMSGInterview"))=UniqueKey'
	}
	return fields
end

function QB:getWhere(player)
	return ('PR.OverviewPage="%s"'):format(player)
end

function QB:getGroupBy()
	return 'MS.VodInterview, MSG.VodInterview'
end

function h.makeAndRunQuery(player)
	local builderMS = QB('MS', 'MatchSchedule')
	local builderMSG = QB('MSG', 'MatchScheduleGame')
	local query = {
		union = true,
		{
			tables = builderMS:getTables(),
			join = builderMS:getJoin(),
			where = builderMS:getWhere(player),
			fields = builderMS:getFields(),
			groupBy = builderMSG:getGroupBy(),
		},
		{
			tables = builderMSG:getTables(),
			join = builderMSG:getJoin(),
			where = builderMSG:getWhere(player),
			fields = builderMSG:getFields(),
			groupBy = builderMSG:getGroupBy(),
		},
		sortKey = { 'Date' },
		sortOrder = { false },
		uniqueKey = 'UniqueKey',
	}
	return util_cargo.queryAndCast(query)
end

In this case, I was able to choose an ordering to the tables & join that works regardless of which table - MatchSchedule or MatchScheduleGame - I’m getting the interview from. That possibility makes the setup about 10 lines of code shorter than it would otherwise be. It also pretty much removes the need to use LuaClassSystem at all, but that’s ok; I hadn’t realized I’d be able to do this until I’d already set up this much structure, and at that point it wasn’t worth undoing it.

You can also see how much neater the syntax looks with only one level of nesting needed: taking advantage of Lua’s equivalence between dicts and lists, I’m now just mapping over ipairs(query) instead of ipairs(query.union) after checking that union is true.

And in keeping with my decision to use sortKey & sortOrder as a distinct term from orderBy before, I’m using uniqueKey instead of groupBy as the outer param for what is essentially my application-logic-level groupBy param for the union query.

Conclusion

This post concludes the short series! We saw how a feature evolved slowly over time:

  • Union support was originally added with a verbose syntax, and no uniqueKey for merging the queries.
  • Grouping support was added later.
  • I then improved the syntax (this was ok because it was deployed in only 2 places at the time, and on only one wiki).
  • Finally, I added the index fix.

The last three steps were condensed into this one post, but in reality they were done over the span of about a month.

We also saw a case of an edit where I didn’t have to use this module at all - just because you have a tool doesn’t mean you should rule out making easier fixes when approaching a problem (not everything is a nail!). And I’ve mentioned a couple times the possibility of “splicing” results from separate queries together into a single row - but this is computational complexity that I want to avoid forever and never support if possible.

My Cargo wrapper has been an evolving product over the course of years, and I’m sure its union support will continue to be improved in the future as well. The important thing is to continue to keep the interface clean, understandable, and backwards-compatible.

Share on

river
WRITTEN BY
River
River (RheingoldRiver) is a MediaWiki developer and the manager of Leaguepedia. She likes cats.


What's on this Page