Scatter plot of 10k records extracted from a database

https://stackoverflow.com/questions/8732127

14-04-2021

Question

I am trying to make a scatter plot in Python. I assumed it would be fairly simple, but I got stuck on what the x and y values of a scatter plot should be.

==My mission==

I have a database with more than 10k records (all floats) so far, and it will grow daily.
The values range from 200 to 2000 (as decimal floats).
So I want to see the most populated region in my dataset.

==What I did?==

import numpy as np
import pylab as pl
import MySQLdb
import sys
import math

conn = MySQLdb.connect(
    host="localhost",
    user="root",
    passwd="root",
    db="myproject")

with conn:
    cur = conn.cursor()

    # fetch the monoiso field of every record
    cur.execute("SELECT monoiso FROM pmass_selectedion")
    rows = cur.fetchall()

    for row in rows:

        #xvalue for monoiso variable and yvalue for range 
        xvalue = row
        yvalue = [600]

        # tried this way too but got x and y dimension error
        #yvalue = [400,800,1200,1600]

        pl.plot(xvalue,yvalue,'ro')
pl.show()

Scatterplot Understanding (link)


OK, this plot doesn't make any sense.

==Question==

How do I make a scatter plot that shows the most populated region?
How can I assign the y variable so that it has the same dimension as the x variable (the total number of fetched records)?

I am new to plotting and statistics, so please help me out.

Solution

Perhaps you are looking for a matplotlib histogram:

import numpy as np
import MySQLdb
import matplotlib.pyplot as plt  # This is meant for scripts
# import pylab as pl  # This is meant for interactive sessions

conn = MySQLdb.connect(
    host="localhost",
    user="root",
    passwd="root",
    db="myproject")

with conn:
    cur = conn.cursor()

    # fetch the monoiso field of every record
    cur.execute("SELECT monoiso FROM pmass_selectedion")
    rows = cur.fetchall()

monoisos = [row[0] for row in rows]

# Make a histogram of `monoisos` with 50 bins.
n, bins, histpatches = plt.hist(monoisos, 50, facecolor = 'green')
plt.show()

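
The `row[0]` step above is needed because DB-API cursors such as MySQLdb's return each row as a tuple, even when only one column is selected. A minimal sketch, using made-up sample values in place of a real query result:

```python
# Hypothetical stand-in for cur.fetchall(): one 1-tuple per row.
rows = ((201.5,), (350.25,), (1999.0,))

# Unpack the single column into a flat list of floats.
monoisos = [row[0] for row in rows]

print(monoisos)
```

Passing the flat list, rather than the tuples, is what lets `plt.hist` treat the values as a single data set.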

You can also make a histogram/dot-plot by using numpy.histogram:

monoisos = [row[0] for row in rows]
hist, bin_edges = np.histogram(monoisos, bins=50)
# Midpoint of each bin, used as the x position of its dot
mid = (bin_edges[1:] + bin_edges[:-1])/2
plt.plot(mid, hist, 'o')
plt.show()

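
If the goal is to name the most populated region rather than just see it, the counts from `numpy.histogram` can be searched directly. The sketch below is only an illustration: the data is synthetic (an assumed cluster around 800, clipped to the 200-2000 range mentioned in the question), and the choice of 50 bins is carried over from above.

```python
import numpy as np

# Assumption: synthetic data standing in for the monoiso column,
# clustered around 800 and clipped to the 200-2000 range from the question.
rng = np.random.default_rng(0)
monoisos = np.clip(rng.normal(loc=800, scale=150, size=10_000), 200, 2000)

hist, bin_edges = np.histogram(monoisos, bins=50)

# Index of the bin holding the most records.
peak = np.argmax(hist)
low, high = bin_edges[peak], bin_edges[peak + 1]

print(f"Most populated bin: {low:.1f} to {high:.1f} ({hist[peak]} records)")
```

The reported interval depends on the bin count, so it may be worth trying a few values of `bins` before drawing conclusions.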

Regarding the use of pylab: The docstring for pyplot says

matplotlib.pylab combines pyplot with numpy into a single namespace. This is convenient for interactive work, but for programming it is recommended that the namespaces be kept separate.

OTHER TIPS

For a scatter plot, you need an equal number of x and y values. Usually in a scatter plot, one of the variables is a function of the other one, or at least both have numerical values. For example you could have x values [1, 2, 3] and y values [4, 5, 6], so then on a 2-dimensional plot, the (x, y) values of (1, 4), (2, 5) and (3, 6) will be plotted.
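
As a small illustration of that pairing, using the same example values, the x and y arrays can be checked for equal length and stacked into the points that would be drawn:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# A scatter plot needs one y for every x.
assert x.shape == y.shape

# Each row is one (x, y) point that would be drawn.
points = np.column_stack((x, y))
print(points.tolist())  # [[1, 4], [2, 5], [3, 6]]
```

Arrays like these can then be passed to `plt.scatter(x, y)`. If the lengths differ, matplotlib raises an error about mismatched dimensions, which is likely the error seen with the four-element `yvalue` in the question.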

In your case, it seems there are no y values, only x values, and you are keeping y fixed. A scatter plot cannot really be generated this way: each x value needs a corresponding y value. You could try using serial numbers as y, but the resulting plot might not make much sense.
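
For what it is worth, the serial-number idea could look roughly like the sketch below. The values are made up, and the `Agg` backend is chosen only so the example runs without a display; in an interactive session you would drop that line and call `plt.show()`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Assumption: a few made-up values standing in for the fetched records.
monoisos = np.array([250.5, 810.2, 790.9, 1500.0, 805.3])

# Use each record's position (0, 1, 2, ...) as its y value,
# so x and y have the same length.
serial = np.arange(len(monoisos))

fig, ax = plt.subplots()
lines = ax.plot(monoisos, serial, 'ro')
ax.set_xlabel("monoiso")
ax.set_ylabel("record number")
```

Dense regions show up as vertical bands of dots, but with 10k records the dots overlap heavily, which is why a histogram tends to be easier to read.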

Licensed under: CC-BY-SA with attribution
