Test data for statistical t-test in Python

https://datascience.stackexchange.com/questions/73587

11-12-2020

Question

First of all, sorry if this is not the proper place to ask. I have been trying to create some dummy variables in order to run a Student's t-test and a Welch's t-test, and then run a Monte Carlo simulation. The problem is that I am only given the sample size and standard deviation of the two populations. How can I create some sort of representation of this data so that I can run these tests? I would like to run them in either Python or R. Thanks in advance.

EDIT: Both populations come from a normal distribution.

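Solution

In Python, to generate random numbers from a given distribution, pick the corresponding function from np.random (see the NumPy documentation) and pass it the distribution's parameters. Since only the sample sizes and standard deviations are given, the sample size and means used here are just example values. To draw from a normal distribution you would do: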

import numpy as np

# for reproducible results, seed the number generator
np.random.seed(42)

n = 100
mu_1, std_1 = 0, 1
mu_2, std_2 = 0.2, 1.5

dataset1 = np.random.normal(loc=mu_1, scale=std_1, size=n)
dataset2 = np.random.normal(loc=mu_2, scale=std_2, size=n)

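To check the result, print the mean, standard deviation and shape of each dataset (ndarray.std uses ddof=0 by default; pass ddof=1 for the sample standard deviation):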

print('dataset 1:')
print(f'mean: {dataset1.mean():.2f}')
print(f'std: {dataset1.std():.2f}')
print(f'shape: {dataset1.shape}')
print('--------------')
print('dataset 2:')
print(f'mean: {dataset2.mean():.2f}')
print(f'std: {dataset2.std():.2f}')
print(f'shape: {dataset2.shape}')

This prints:

dataset 1:
mean: -0.10
std: 0.90
shape: (100,)
--------------
dataset 2:
mean: 0.23
std: 1.42
shape: (100,)

PS: You don't have to use np.random.seed; it is only there so that the random number generator produces the same output every time the code is run.
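As a side note, newer NumPy code often uses np.random.default_rng instead of the global np.random.seed. The sketch below only illustrates what seeding does: two generators built from the same seed produce identical draws. The seed value and the sample size of 5 are arbitrary choices for the illustration, and the numbers drawn this way will not match those obtained with np.random.seed(42) above, because the underlying generator is different.

```python
import numpy as np

# Two independent generators created from the same (arbitrary) seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(loc=0, scale=1, size=5)
sample_b = rng_b.normal(loc=0, scale=1, size=5)

# Same seed, same sequence of draws
print(np.array_equal(sample_a, sample_b))  # True

# Without a seed, the draws differ from run to run
unseeded = np.random.default_rng().normal(loc=0, scale=1, size=5)
print(unseeded)
```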

EDIT: Also, if you want to run a t-test in Python, you can use scipy.stats. For the t-test on the means of two independent samples, use scipy.stats.ttest_ind (it performs Student's t-test by default; pass equal_var=False for Welch's t-test). For the t-test on two related samples, use scipy.stats.ttest_rel.
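Putting the pieces together, here is a minimal sketch of both tests followed by one possible reading of the Monte Carlo step: repeat the experiment many times with equal means and count how often each test rejects at the 5% level. The sample sizes, means, number of simulations (n_sims) and significance level (alpha) are all assumed values chosen for illustration; they are not given in the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Assumed example values: only the sizes and standard deviations are "given"
n_1, n_2 = 100, 100
mu_1, std_1 = 0, 1
mu_2, std_2 = 0.2, 1.5

dataset1 = rng.normal(loc=mu_1, scale=std_1, size=n_1)
dataset2 = rng.normal(loc=mu_2, scale=std_2, size=n_2)

# Student's t-test: assumes both populations share the same variance
student = stats.ttest_ind(dataset1, dataset2, equal_var=True)
# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(dataset1, dataset2, equal_var=False)

print(f'Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}')
print(f'Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}')

# Monte Carlo: simulate many experiments in which the means are equal
# (null hypothesis true) and record how often each test rejects.
n_sims, alpha = 2000, 0.05
sims1 = rng.normal(loc=0, scale=std_1, size=(n_sims, n_1))
sims2 = rng.normal(loc=0, scale=std_2, size=(n_sims, n_2))

# axis=1 runs one test per row, i.e. one test per simulated experiment
p_student = stats.ttest_ind(sims1, sims2, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(sims1, sims2, axis=1, equal_var=False).pvalue

student_rate = np.mean(p_student < alpha)
welch_rate = np.mean(p_welch < alpha)
print(f'Student rejection rate: {student_rate:.3f}')
print(f'Welch rejection rate:   {welch_rate:.3f}')
```

With equal sample sizes the two tests produce the same t statistic and differ only in their degrees of freedom, so both rejection rates should land near alpha. Changing n_1 and n_2 to unequal values is a simple way to see where the two tests start to disagree.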

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange