How to do it:
I don't know whether this can be done 'purely' in assembly, but it works through C. If you write a small "proxy" module in C (let's call it that) that calls the assembly function, and you write the assembly routine so that it follows the C calling convention, then, by simply assembling the code with:
nasm -felf64 -g -F dwarf assembly_function.asm
and then, using a setup.py file containing:
from distutils.core import setup, Extension
ext = Extension('assembly_include_name', ['c_assembly_proxy.c'], extra_objects=['assembly_function.o'])
setup(name='assembly_include_name', version='1.0', ext_modules=[ext])
you will be able to call the assembly function from Python. Note that you have to pass the "extra_objects" parameter to the Extension constructor so that the assembled object file is linked into the module; otherwise it fails with an error saying that it can't find the function's name.
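The two build steps above can be chained from a small Python helper. This is only a sketch: the function names are made up for illustration, and running the setup script with "build_ext --inplace" is an assumption, since the answer does not say how setup.py is invoked.

```python
import subprocess
import sys


def nasm_command(asm_source="assembly_function.asm"):
    """Return the nasm command line quoted above for one source file.

    With -felf64 and no -o option, nasm writes the object file next to
    the source with a .o extension (assembly_function.o here).
    """
    return ["nasm", "-felf64", "-g", "-F", "dwarf", asm_source]


def build_ext_command(setup_script="setup.py"):
    """Return the command that compiles and links the extension.

    Assumption: "build_ext --inplace" is used so the module lands in
    the current directory; "install" would work as well.
    """
    return [sys.executable, setup_script, "build_ext", "--inplace"]


def build_all():
    """Assemble first, then build the extension; stop at the first failure."""
    for command in (nasm_command(), build_ext_command()):
        subprocess.run(command, check=True)
```

Calling build_all() from the directory that holds the .asm file, the C proxy and setup.py should leave an importable assembly_include_name module there, assuming nasm and a C compiler are installed. The order matters: the object file has to exist before the extension is linked.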
Why would you do it:
If you want to use SSE instructions (SSE2, SSE3, ...) directly, regardless of the optimizations the compiler might make.
Extension API: https://docs.python.org/2/extending/extending.html
distutils.core reference: https://docs.python.org/2/distutils/apiref.html?highlight=distutils.core#module-distutils.core