Normalization: What does "repeating groups" mean?

Question 1

The term "repeating group" originally meant the concept in CODASYL and COBOL based languages where a single field could contain an array of repeating values. When E.F.Codd described his First Normal Form that was what he meant by a repeating group. The concept does not exist in any modern relational or SQL-based DBMS.

The term "repeating group" has also come to be used informally and imprecisely by database designers to mean a repeating set of columns, meaning a collection of columns containing similar kinds of values in a table. This is different to its original meaning in relation to 1NF. For instance in the case of a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, ... etc the collection of Child N columns is sometimes referred to as a repeating group and assumed to be in violation of 1NF even though it is not a repeating group in the sense that Codd intended.

This latter sense of a so-called repeating group is not technically a violation of 1NF if each attribute is only single-valued. The attributes themselves do not contain repeating values and therefore there is no violation of 1NF for that reason. Such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values (maximum N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design. Because it is generally considered poor design it suits database designers and sometimes even teachers to refer to repeated columns of this kind as a "repeating group" and a violation of the spirit of the First Normal Form.

This informal usage of terminology is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and also because it is a distraction from a more fundamental issue, namely the Null problem. All of the Normal Forms are concerned with relations that don't permit the possibility of nulls. If a table permits a null in any column then it doesn't meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families who have fewer than N children) then the Families table doesn't satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises but the avoidance of unnecessary nullable columns is one very good reason for avoiding repeating sets of columns, whether or not you call them "repeating groups".

See also this article.

Question 2

the value English is repeating again and again. Is this a repeating group?

No. The multiple appearances of English in SUBJECT_MODULE are not a repeating group or even either of the two things that people mistakenly mean by a repeating group. They are also not evidence of redundancy or lack of normalization. Such multiple appearances might be connected to redundancy or normalization, but they appear all the time when there is no redundancy and various levels of normalization.

If SUBJECT_MODULE is rows where "[SUBJECT_NAME] has [MODULE_NAME] identified by [MODULE_ID]" and a subject might have more than one module then somewhere you must have multiple mentions of that subject (perhaps via its name) with mentions of different modules (perhaps by name or id). That would not involve redundancy.

Student Age Subject

Adam    15  Biology
Adam    15  Maths
Alex    14  Maths
Stuart  17  Maths

The redundancy in this example from your question's second "this" link is not that Adam appears in two rows or that Adam appears with 15 in two rows. It is that if the table is rows where "[Student] is [Age] years old and takes [Subject]" then Student (eg Adam) can appear in multiple rows but always appears with the same Age (eg 15). But if the table were rows where "[Student] has a friend [Age] years old in [Subject]" then the table could be fully normalized already.

Sure it gets rid of the repeating value, but I am not sure if this is the right thing.

It does for your example data, but it might not for other example data. You haven't told us enough. (Anyway as I said above the multiple appearances might not even need normalizing.)

Whether there are any normalization-relevant redundancies in SUBJECT_MODULE or even whether there are any valid decompositions including the one you gave depends on the usual information necessary to normalize to above 1NF. Namely whether some of its columns are functions of others (functional dependencies) and whether its rows are also those where "..." AND "..." (join dependencies).

By giving a possible decomposition you have said that it is also rows where "...[Subject_Name]...[Module_ID]..." AND "...[Module_Name]...[Module_ID]..." And you have given some example decomposition data. But we only know that it could be so decomposed because you added the decomposition. And the decomposition plus data still isn't enough for us to know whether it should be so decomposed.

I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form.

"Repeating groups" are something from pre-relational databases and cannot possibly appear in a relational table (relation). They are like a named set of values that is like a field of a record but is not quite. A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").

A relational table having multiple similar columns or having a column type with multiple similar parts are each just reminiscent of having a repeating group in a non-relational database. And the multiple columns and parts should become multiple rows in a separate table, just like the multiple members of a repeating group. But these problems have to do with relational quality of design, not repeating groups or normalization (in either sense) or being relational (ie being in 1NF).

Note that a non-relational database might itself have similar problems with multiple similar fields and/or named sets or with multiple similar parts of values of fields. Normalization to tables does not get rid of these when it gets rid of repeating groups.

Regardless of how they got into a relational design, removing them gives a "better" design. It is just because these design problems are reminiscent of repeating groups that people get confused and imagine that somehow a table could contain a repeating group. So the multiple similar columns and values with multiple similar parts (or the parts) get incorrectly called "repeating groups".

See this answer re "atomicity".