Question

It's well known that science has given us a large amount of freely accessible data, such as http://www.1000genomes.org and http://www.ncbi.nlm.nih.gov/genbank. How can we explore this data and apply data science/machine learning to it? What are some ideas?

My own ideas:

  • Biological data visualisation
  • Gene prediction using a hidden Markov model

Any more?


Solution

  • Determine the function of genes and the elements that regulate genes throughout the genome.
  • Find variations in the DNA sequence among people and determine their significance. The most common type of genetic variation is known as a single nucleotide polymorphism or SNP (pronounced “snip”). These small differences may help predict a person’s risk of particular diseases and response to certain medications.
  • Discover the 3-dimensional structures of proteins and identify their functions.
  • Explore how DNA and proteins interact with one another and with the environment to create complex living systems.
  • Develop and apply genome-based strategies for the early detection, diagnosis, and treatment of disease.
  • Sequence the genomes of other organisms, such as the rat, cow, and chimpanzee, in order to compare similar genes between species.
  • Develop new technologies to study genes and DNA on a large scale and store genomic data efficiently.
  • Continue to explore the ethical, legal, and social issues raised by genomic research.
  • Source

Other tips

You could build models to classify genomes by population, run unsupervised learning (clustering) to see whether the known populations are recovered, or build models to infer (impute) missing genotypes.
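As a rough illustration of the clustering idea, here is a minimal sketch in plain Python. It does not read 1000 Genomes data: the genotypes are simulated, and the allele frequencies (0.1 vs 0.9), the sample sizes, and all function names are assumptions made up for the example. It uses a simple k-means with farthest-point initialisation, then checks how well the clusters match the simulated population labels.

```python
import random


def simulate_genotypes(n_per_pop, freqs_by_pop, rng):
    """Simulate diploid genotypes (0, 1 or 2 alternate alleles per site)."""
    genotypes, labels = [], []
    for pop, freqs in enumerate(freqs_by_pop):
        for _ in range(n_per_pop):
            # Two independent allele draws per site, one per chromosome copy.
            g = [(rng.random() < p) + (rng.random() < p) for p in freqs]
            genotypes.append(g)
            labels.append(pop)
    return genotypes, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain k-means; centroids start from farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_purity(assignment, labels, k):
    """Fraction of samples that carry the majority label of their cluster."""
    correct = 0
    for c in range(k):
        member_labels = [l for a, l in zip(assignment, labels) if a == c]
        if member_labels:
            correct += max(member_labels.count(l) for l in set(member_labels))
    return correct / len(labels)


rng = random.Random(0)
n_sites = 200
# Hypothetical, strongly differentiated allele frequencies for two populations.
freqs = [[0.1] * n_sites, [0.9] * n_sites]
genotypes, labels = simulate_genotypes(30, freqs, rng)
assignment = kmeans(genotypes, k=2)
purity = cluster_purity(assignment, labels, k=2)
print(f"cluster purity: {purity:.2f}")
```

Real populations are far less cleanly separated than this toy data, so on actual genotype calls you would probably reduce dimensionality first (for example with PCA) and expect lower purity.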

For scalable DNA analysis, you may want to look at ADAM, a genomics toolkit built on Apache Spark.

Licensed under: CC-BY-SA with attribution