If the table is actually an array of 8-byte records, I would store it as eight tables of 1-byte entries, each of which can be indexed efficiently, i.e. a struct of arrays rather than an array of structs.
This has the advantage of letting you index 256 records with just a register offset, and it also lets you use different-sized records without changing the address calculation (so if you want 5-byte records, you just use fewer arrays, rather than changing the calculations).
    LDA ELEMENT1,Y  ; ELEMENT1 is the array of the first bytes of every record
    STA $1000
    LDA ELEMENT2,Y  ; ELEMENT2 is the array of the second bytes
    STA $1001
    ; ...etc...
    INY             ; just increment Y to access the next record
As INX/INY are 2-cycle instructions, you're not going to beat that for updating an index.
This is my primary recommendation as it's fast and simple, though it does mean reformatting your data.
If this isn't an option you could either reformat the data dynamically such that it is (assuming you're going to then access it enough that it's worth the cost, and you have space to do so), or else you could use a variety of techniques, depending on how you actually want to access the data, and how important performance is.
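A dynamic reformat could look something like the sketch below. To keep it short I'm assuming a 4-byte record (the 8-byte case is the same with more load/store pairs), a source table RECORDS of at most 64 records so that it fits in one page, and destination tables ELEMENT1 to ELEMENT4. RECORDS, RECSIZE, NUMRECS and SPLIT are names I've made up, and the label/constant syntax is generic, so adjust it for your assembler.

```asm
RECSIZE = 4                 ; assumed record size in bytes
NUMRECS = 64                ; assumed record count (NUMRECS * RECSIZE <= 256)

        LDX #0              ; X = byte offset into the interleaved source
        LDY #0              ; Y = record number
SPLIT:  LDA RECORDS,X       ; first byte of record Y
        STA ELEMENT1,Y
        LDA RECORDS+1,X     ; second byte
        STA ELEMENT2,Y
        LDA RECORDS+2,X     ; third byte
        STA ELEMENT3,Y
        LDA RECORDS+3,X     ; fourth byte
        STA ELEMENT4,Y
        TXA                 ; step X on to the next source record
        CLC                 ; CPY below may leave carry set, so clear it
        ADC #RECSIZE
        TAX
        INY                 ; next destination slot
        CPY #NUMRECS
        BNE SPLIT           ; repeat until every record has been split
```

That works out at about 50 cycles per record, so it only pays for itself if you go on to read the data many times.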
If you really need to add an arbitrary amount to an index, rather than just increment, then the naive way to do it would be:
    TYA         ; 2 cycles
    CLC         ; 2 cycles
    ADC #amt    ; 2 cycles
    TAY         ; 2 cycles
However, that naive version takes 8 cycles. You can do it in 6 cycles if you omit the CLC, which you can do whenever you know that the carry is already clear (for example, straight after some other loop calculation where being in the loop implies that a carry has not been generated). So it can be worth juggling the code around to avoid setting flags. Anything less than incrementing by 4 could be done simply by repeating INY, at 2 cycles each.
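As one illustration of dropping the CLC: a compare followed by BCS leaves the carry clear on the fall-through path, so the add that follows can rely on it. LIMIT, AMT and DONE are placeholder names of mine, and this assumes you wanted that bounds check anyway.

```asm
        CPY #LIMIT      ; carry is set if Y >= LIMIT, clear otherwise
        BCS DONE        ; leave once the index has reached the limit
        TYA             ; carry is known to be clear on this path
        ADC #AMT        ; so this adds exactly AMT, with no CLC needed
        TAY             ; 6 cycles for the TYA/ADC/TAY update
```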
If you cannot guarantee that carry is cleared, but you can spare a page in memory for a lookup table, then you could store the values 0 to 255 in the consecutive bytes of that page and do:
    LDA table+amt,Y  ; 4 cycles
    TAY              ; 2 cycles
The carry flag isn't affected by that, but the zero flag is, so if the index wraps around to exactly zero, you can check for that with BEQ.
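Here is a rough sketch of building such a table at run time. Note that the read is from table + amt + Y, so once Y + amt reaches 256 it runs off the end of the page; if you want the index to wrap, the sequence has to continue for a few bytes past the page, and a read that crosses the page boundary costs one extra cycle. TABLE, MAXAMT, AMT and WRAPPED are hypothetical names, with MAXAMT (between 1 and 255) being the largest amount you ever add.

```asm
        LDX #0
FILL:   TXA
        STA TABLE,X         ; TABLE[X] = X for X = 0..255
        INX
        BNE FILL            ; X wraps to 0 after 256 stores
EXTRA:  TXA                 ; X is 0 again here
        STA TABLE+256,X     ; continue past the page with 0, 1, 2, ...
        INX
        CPX #MAXAMT
        BNE EXTRA           ; MAXAMT extra bytes cover every wrapped read

        ; Using it: add AMT (AMT <= MAXAMT) to Y without touching carry
        LDA TABLE+AMT,Y     ; A = (Y + AMT) mod 256
        TAY
        BEQ WRAPPED         ; taken only when the new index is exactly 0
```

The zero test only catches the index landing exactly on 0, not wrapping past it, so it is most useful when the step divides 256 evenly.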
If you have to index off an address, then you could do:
    LDA (zeropagevector),Y
and increment the high byte of zeropagevector to move on a page at a time. However, that takes 5 cycles for the read. If you are only reading from your record in a single instruction, you could simply use normal absolute addressing and modify the address in the instruction itself, saving a cycle.
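A sketch of the self-modifying version, assuming the code runs from RAM (it can't work from ROM). DATA, NPAGES and READ are placeholder names of mine, NPAGES must be at least 1, and the STA is just a stand-in for whatever you do with the byte.

```asm
        LDX #NPAGES     ; X = number of 256-byte pages to read
        LDY #0
READ:   LDA DATA,Y      ; absolute,Y read; its operand is patched below
        STA $1000       ; stand-in for whatever consumes the byte
        INY
        BNE READ        ; finish the current page
        INC READ+2      ; READ+2 is the operand's high byte: next page
        DEX
        BNE READ        ; Y is 0 again here, ready for the next page
```

The patched operand stays modified afterwards, so store the starting high byte back into READ+2 before running the loop again. With the indirect form, the equivalent step is INC zeropagevector+1.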
Basically, there are loads of ways to optimise this kind of thing in 6502, but it really depends on exactly what your data is and how you want to access it.