One way to put it is this:
LSI - allows you to perform a query on a single Hash-Key while using one of several alternate attributes (instead of the Range-Key) to "filter" or restrict the query.
GSI - allows you to perform queries on a completely different Hash-Key (and optional Range-Key) than the table's own, but the index consumes its own read/write throughput, so it costs extra.
Below is a more extensive breakdown of the key and index types and how they work:
Hash Only
As you probably already know, a Hash-Key by itself must be unique: writing an item with a Hash-Key that already exists overwrites the existing item (unless you guard the write with a condition expression).
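A toy, in-memory sketch of the hash-only rule, for illustration only (this is not how DynamoDB is implemented; HashOnlyTable, put and fail_if_exists are hypothetical names). A plain dict keyed by the Hash-Key reproduces the overwrite-on-write behaviour, and the fail_if_exists flag loosely mirrors what a condition expression such as attribute_not_exists(fileName) does on a real PutItem.

```python
class HashOnlyTable:
    """Toy model (hypothetical): at most one item per Hash-Key value."""

    def __init__(self, hash_key):
        self.hash_key = hash_key
        self.items = {}  # Hash-Key value -> item

    def put(self, item, fail_if_exists=False):
        key = item[self.hash_key]
        if fail_if_exists and key in self.items:
            # Roughly what an attribute_not_exists(...) condition guards against.
            raise KeyError(f"item with {self.hash_key}={key!r} already exists")
        # Without a guard, writing an existing Hash-Key replaces the whole item.
        self.items[key] = dict(item)


table = HashOnlyTable("fileName")
table.put({"fileName": "PROJECT101", "fileFormat": "txt"})
table.put({"fileName": "PROJECT101", "fileFormat": "pdf"})  # overwrites the txt item
print(table.items)  # {'PROJECT101': {'fileName': 'PROJECT101', 'fileFormat': 'pdf'}}
```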
Hash+Range
A Hash-Key + Range-Key allows you to have multiple items with the same Hash-Key, as long as each has a different Range-Key. If you write to a Hash-Key that already exists, using a Range-Key that Hash-Key doesn't use yet, a new item is created; if an item with the same Hash+Range combination already exists, that item is overwritten.
Another way to think of this is a file name and its format (extension). You can have a file with the same name (hash) as another in the same folder (table), as long as their formats (range) differ. Likewise, you can have multiple files of the same format as long as their names are different.
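The same idea as a toy model, again only a sketch (HashRangeTable, put and query are hypothetical names, not DynamoDB's API): keying a dict on the (Hash-Key, Range-Key) pair gives exactly the "same pair overwrites, new Range-Key creates" behaviour, and a query on one Hash-Key returns its items ordered by Range-Key.

```python
class HashRangeTable:
    """Toy model (hypothetical): items are unique on the (Hash-Key, Range-Key) pair."""

    def __init__(self, hash_key, range_key):
        self.hash_key = hash_key
        self.range_key = range_key
        self.items = {}  # (hash value, range value) -> item

    def put(self, item):
        # Same pair: overwrite. New Range-Key under an existing Hash-Key: new item.
        self.items[(item[self.hash_key], item[self.range_key])] = dict(item)

    def query(self, hash_value):
        # All items sharing the Hash-Key, ordered by Range-Key.
        return [self.items[k] for k in sorted(self.items) if k[0] == hash_value]


files = HashRangeTable("fileName", "fileFormat")
files.put({"fileName": "report", "fileFormat": "pdf", "size": 10})
files.put({"fileName": "report", "fileFormat": "docx", "size": 20})  # new item
files.put({"fileName": "report", "fileFormat": "pdf", "size": 30})   # overwrites report.pdf
print([i["fileFormat"] for i in files.query("report")])  # ['docx', 'pdf']
```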
LSI
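An LSI is basically the same as a Hash-Key + Range-Key: it shares the table's Hash-Key and adds an alternate Range-Key, and items still follow the base table's rules when they are created. If an item includes a value for an LSI key attribute, that value must be of the type declared for the index and cannot be empty; if the item omits the attribute, it is simply not written to that LSI.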
To say an LSI is "Range-Key 2" is not entirely correct, because (using my file and format analogy from earlier) you cannot have a file named file.format.lsi and another named file.format.lsi2. You can, however, have file.format.lsi and file.format2.lsi, or file.format.lsi and file2.format.lsi.
Basically, an LSI is just a "Filter-key", not an actual Range-Key: your base Hash and Range value combination must still be unique, while the LSI values do not have to be unique at all. An easier way to look at it may be to think of the LSI as data within the files. You could write code that finds all the files named "PROJECT101", regardless of their fileFormat, then reads the data inside each one to decide what should be included in the query and what should be omitted. This is basically how an LSI works, just without the extra overhead of opening each file to read its contents.
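A rough sketch of that "filter-key" idea, assuming a hypothetical numeric attribute fileSize is the LSI's alternate Range-Key (query_lsi is also a made-up name). The query stays within one Hash-Key but applies its condition and ordering to fileSize instead of fileFormat, and two items may share the same fileSize. In DynamoDB itself this would be a Query that names the LSI in IndexName with a key condition along the lines of fileName = :n AND fileSize >= :s.

```python
from operator import itemgetter

# Base table: Hash-Key fileName, Range-Key fileFormat (the pair is unique).
# fileSize is a hypothetical attribute used as the LSI's alternate Range-Key.
items = [
    {"fileName": "PROJECT101", "fileFormat": "pdf", "fileSize": 900},
    {"fileName": "PROJECT101", "fileFormat": "docx", "fileSize": 40},
    {"fileName": "PROJECT101", "fileFormat": "txt", "fileSize": 900},  # duplicate LSI value is fine
    {"fileName": "PROJECT202", "fileFormat": "pdf", "fileSize": 5000},
]


def query_lsi(items, file_name, min_size):
    """Toy LSI query: same Hash-Key as the table, condition on fileSize instead of fileFormat."""
    hits = [i for i in items if i["fileName"] == file_name and i["fileSize"] >= min_size]
    # Results are ordered by the LSI key; this toy keeps ties in insertion order.
    return sorted(hits, key=itemgetter("fileSize"))


print([i["fileFormat"] for i in query_lsi(items, "PROJECT101", 100)])  # ['pdf', 'txt']
```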
GSI
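For a GSI, you're essentially creating another table for each index, but without the hassle of maintaining multiple separate tables that mirror data between them. Each GSI stores its own copy of the projected attributes and has its own read/write capacity, which is why GSIs cost extra throughput.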
So for a GSI, you could specify fileName as your base Hash-Key and fileFormat as your base Range-Key. You can then specify a GSI that has a Hash-Key of fileName2 and a Range-Key of fileFormat2. You can then query on either fileName or fileName2 if you like, unlike an LSI, where you can only query on fileName.
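Without a GSI, finding items by fileName2 would mean scanning the whole table. A toy sketch of what the index buys you (build_gsi is a hypothetical helper, and the sample values are made up): the same items regrouped under fileName2 and sorted by fileFormat2, so a lookup on the alternate Hash-Key goes straight to the matching items. In DynamoDB you would instead send a Query with IndexName set to the GSI's name.

```python
from collections import defaultdict

# Base table keyed on (fileName, fileFormat); the GSI is keyed on (fileName2, fileFormat2).
items = [
    {"fileName": "a", "fileFormat": "pdf", "fileName2": "shared", "fileFormat2": "v1"},
    {"fileName": "b", "fileFormat": "txt", "fileName2": "shared", "fileFormat2": "v2"},
    {"fileName": "c", "fileFormat": "pdf", "fileName2": "other", "fileFormat2": "v1"},
]


def build_gsi(items):
    """Toy GSI (hypothetical): a second view of the same items, partitioned by fileName2."""
    index = defaultdict(list)
    for item in items:
        index[item["fileName2"]].append(item)
    for partition in index.values():
        partition.sort(key=lambda i: i["fileFormat2"])  # ordered by the GSI Range-Key
    return index


gsi = build_gsi(items)
print([i["fileName"] for i in gsi["shared"]])  # ['a', 'b']
```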
The main advantages are that you only have to maintain one table instead of two, and whenever you write an item to the table, DynamoDB automatically propagates the change to its GSI(s), so you can't "forget" to update the other table(s) like you can with a multi-table setup. There's also no chance of a lost connection after updating one table and before updating the other, like there is with the multi-table setup. Note that GSIs are updated asynchronously, so reads from a GSI are eventually consistent and may briefly lag behind the base table.
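A toy model of that single write path, for illustration only (TableWithGsi, put and query_gsi are hypothetical names). Every write goes through one put(), which also removes the item's stale index entry when its fileName2 changes, so the two views cannot drift apart. This toy updates the index synchronously inside put(), which is a simplification.

```python
class TableWithGsi:
    """Toy model (hypothetical): one write path keeps the base table and a GSI in sync."""

    def __init__(self):
        self.base = {}  # (fileName, fileFormat) -> item
        self.gsi = {}   # fileName2 -> {(fileName, fileFormat): item}

    def put(self, item):
        key = (item["fileName"], item["fileFormat"])
        old = self.base.get(key)
        if old is not None:
            # Drop the stale index entry in case fileName2 changed.
            self.gsi[old["fileName2"]].pop(key, None)
        self.base[key] = dict(item)
        self.gsi.setdefault(item["fileName2"], {})[key] = self.base[key]

    def query_gsi(self, file_name2):
        return list(self.gsi.get(file_name2, {}).values())


t = TableWithGsi()
t.put({"fileName": "a", "fileFormat": "pdf", "fileName2": "draft"})
t.put({"fileName": "a", "fileFormat": "pdf", "fileName2": "final"})  # same base key, new GSI key
print(len(t.query_gsi("draft")), len(t.query_gsi("final")))  # 0 1
```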
Additionally, a GSI can "overlap" the base Hash/Range combination. So if you wanted to make a table with fileName and fileFormat as your base Hash/Range, and filePriority and fileName as your GSI Hash/Range, you can.
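For reference, a sketch of what that overlapping layout might look like as a DynamoDB CreateTable request, written as a plain Python dict in the shape the low-level API expects. The table name "Files", the index name, the numeric type for filePriority and the on-demand billing mode are all assumptions for illustration. Note that the GSI reuses fileName, which is already a base key attribute.

```python
# Hypothetical CreateTable request for the overlapping layout. With boto3 it could be
# sent as boto3.client("dynamodb").create_table(**create_table_request).
create_table_request = {
    "TableName": "Files",  # hypothetical table name
    "AttributeDefinitions": [
        # Only attributes used in the table's or an index's key schema are declared.
        {"AttributeName": "fileName", "AttributeType": "S"},
        {"AttributeName": "fileFormat", "AttributeType": "S"},
        {"AttributeName": "filePriority", "AttributeType": "N"},  # assumed numeric
    ],
    "KeySchema": [
        {"AttributeName": "fileName", "KeyType": "HASH"},
        {"AttributeName": "fileFormat", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "filePriority-fileName-index",  # hypothetical index name
            "KeySchema": [
                {"AttributeName": "filePriority", "KeyType": "HASH"},
                {"AttributeName": "fileName", "KeyType": "RANGE"},  # overlaps the base Hash-Key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # On-demand billing, so no ProvisionedThroughput blocks are needed.
    "BillingMode": "PAY_PER_REQUEST",
}
```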
Lastly, a GSI Hash+Range combination does not have to be unique, while the base Hash+Range combination does. This is not possible with a dual/multi-table setup, where each table's own key must be unique, but it is with a GSI. When writing an item, you must provide values for the base Hash+Range. GSI key attributes are optional: an item that omits them simply does not appear in that GSI (a "sparse" index), and if you do provide them, they must match the type declared for the index and cannot be empty.
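A small sketch of why the uniqueness difference matters, using made-up sample items. A hand-maintained second table keyed on (filePriority, fileName) must have unique keys, so the second write silently replaces the first; a toy GSI on the same pair (modelled here as a dict of lists) keeps both items.

```python
items = [
    {"fileName": "report", "fileFormat": "pdf", "filePriority": 1},
    {"fileName": "report", "fileFormat": "docx", "filePriority": 1},
]

# Hand-maintained second table keyed on (filePriority, fileName): a primary key
# must be unique, so the second write overwrites the first.
second_table = {}
for item in items:
    second_table[(item["filePriority"], item["fileName"])] = item

# Toy GSI on the same pair: duplicate keys are allowed, so each key maps to a list.
gsi = {}
for item in items:
    gsi.setdefault((item["filePriority"], item["fileName"]), []).append(item)

print(second_table[(1, "report")]["fileFormat"])  # docx
print([i["fileFormat"] for i in gsi[(1, "report")]])  # ['pdf', 'docx']
```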