How to get a nested tree of data via SQL

How to get a nested tree of data via SQL

by Matt Gibson -
Number of replies: 8

Not sure I've used the right terminology, but what I want to do is display a list of submissions for module X (which I'm writing), where each submission may have an arbitrary number of files associated with it.

I can write SQL to get each row from the submissions table, but will then have to loop through the results with a separate SQL query, getting file records from the submission_files table where submissionid = whatever. Looped SQL kills kittens, so is there a better way?

I'm assuming that I should get all the files in a big list via SQL that are linked to a submission that's linked to the relevant module instance. I could then sort them out in PHP, adding each one to the appropriate submission object, however this doesn't seem very elegant.

Is there an accepted best practice way of dealing with such problems?

Average of ratings: -
In reply to Matt Gibson

Re: How to get a nested tree of data via SQL

by Wes Matchett -

It sounds what you are looking for is a table join...

SELECT submissions.field1, submissions.field2, submission_files.field3, submission_files.field4 FROM submissions JOIN submission_files ON submissions.submissionid = submission_files.whatever;

This will return records that have the fields selected from both tables. There are various types of joins which will alter the set of records returned; any good SQL resource will tell you the differences.

In reply to Matt Gibson

Re: How to get a nested tree of data via SQL

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Hi,

The way you describe is probably the best way to do it in general.

I'll give an example - there is table A, which has fields id and frog, and table B, which has fields id, aid (foreign key to the id field in A), and zombie. You can have between 0 - N B entries for each A.

1. Simple way (1 query)

SELECT
A.id, A.frog, B.zombie
FROM
A
LEFT JOIN B ON B.aid = A.id

Then loop through results in PHP - each time A.id changes, start a new object, otherwise add the 'zombie' field to an array in the existing object.

2. Complex way (2 queries)

SELECT A.id, A.frog FROM A;
SELECT B.aid, B.zombie FROM B;

Then loop through results of the second query in php - for each one, add the resulting zombie to an array in the object you already have for a.

 

The complex approach is slightly more elegant and is better if there is a significant quantity of data in the fields of A that you are retrieving (i.e. if it that 'frog' field in my example was large) since you will retrieve those multiple times, unnecessarily.

The simple approach is uglier but can perform better in certain cases because there is only a single query, particularly if both queries have to do a complicated set of conditions and more joins in order to find out which rows of A and B to include.

Looping through all the results of A and calling a separate query each time to get the corresponding Bs would be a third option, but as you already noted that one is unacceptable in performance terms (especially when the alternatives are not that complicated) unless you are certain that the number of entries returned from your query for A is bounded (ie can never be greater than, say, 3 or something).

--sam

 

 

 

 

Average of ratings: Useful (1)
In reply to sam marshall

Re: How to get a nested tree of data via SQL

by Matt Gibson -

Thanks Sam, that's exactly what I needed to know.

Having said that, I've been considering trying to follow OO principles by having a submission class to represent each submission record, where the constructor takes an id from the module's submissions table. Having ->display(), ->save() methods, etc seems a lot cleaner, but if I want to instantiate loads of them e.g. to display all student submissions, I can't encapsulate SQL for retrieving the files for each one without using the inefficient method I've been trying to avoid.

The solution that comes to mind is to have a module_filemanager singleton class, which will load and cache all the files for that module instance in one go via one of the queries you mentioned, so that the submission objects can request their files one at a time from it as needed.

Alternatively, I could avoid the OO stuff and just loop through the submissions result set building stuff for the display as I go, but this seems like bad practice.

In reply to Matt Gibson

Re: How to get a nested tree of data via SQL

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Yes this is a difficult problem.

In some places I've done functions like:

$object->get_data($wholecourse=true)

where even though you are technically requesting data for a single object, by default it retrieves the whole course data (in your case that could be modinstance not course) into a static cache of some kind. Obviously if you call the same function for the other objects in a loop, the future calls will just use this cache.

In the case where you really only want data for one object, you can set the option to false.

So the behaviour about getting lots of data is encapsulated in the object.

This still seems a bit hacky to me so I'm not necessarily recommending it, just saying.

--sam

Average of ratings: Useful (1)
In reply to Matt Gibson

Re: How to get a nested tree of data via SQL

by Hubert Chathi -

We have something similar, except that our data class can take a record ID, or it can take a full record object.  This way, we can fetch a bunch of records using fast SQL, but still use OO methods.  If you need, you could modify your class to take the submission record, plus an array of files, or something like that.

Average of ratings: Useful (1)