Safely punning char* to double in C

https://stackoverflow.com/questions/222266

03-07-2019

Question

In an open source program I wrote, I'm reading binary data (written by another program) from a file and outputting ints, doubles, and other assorted data types. One of the challenges is that it needs to run on 32-bit and 64-bit machines of both endiannesses, which means that I end up doing quite a bit of low-level bit-twiddling. I know a (very) little bit about type punning and strict aliasing and want to make sure I'm doing things the right way.

Basically, it's easy to convert from a char* to an int of various sizes:

int64_t snativeint64_t(const char *buf) 
{
    /* Interpret the first 8 bytes of buf as a 64-bit int */
    return *(int64_t *) buf;
}

And I have a set of support functions to swap byte orders as needed, such as:

int64_t swappedint64_t(const int64_t wrongend)
{
    /* Change the endianness of a 64-bit integer. The last term is done
       in unsigned arithmetic: shifting 0xff left by 56 in a signed
       64-bit type overflows, which is undefined behaviour. */
    return (((wrongend & 0xff00000000000000LL) >> 56) |
            ((wrongend & 0x00ff000000000000LL) >> 40) |
            ((wrongend & 0x0000ff0000000000LL) >> 24) |
            ((wrongend & 0x000000ff00000000LL) >> 8)  |
            ((wrongend & 0x00000000ff000000LL) << 8)  |
            ((wrongend & 0x0000000000ff0000LL) << 24) |
            ((wrongend & 0x000000000000ff00LL) << 40) |
            ((uint64_t) (wrongend & 0x00000000000000ffLL) << 56));
}

At runtime, the program detects the endianness of the machine and assigns one of the above to a function pointer:

int64_t (*slittleint64_t)(const char *);
if(littleendian) {
    slittleint64_t = snativeint64_t;
} else {
    slittleint64_t = sswappedint64_t;
}
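The question does not show how `littleendian` is set or what `sswappedint64_t` looks like (the function-pointer type requires it to take a `const char *`, unlike `swappedint64_t`). A minimal sketch of both follows; `is_little_endian` and this body of `sswappedint64_t` are hypothetical reconstructions, not the asker's code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: detect host byte order by inspecting the
   lowest-addressed byte of a known 16-bit value. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* copy the first byte in memory */
    return first == 1;
}

/* Hypothetical sswappedint64_t: read 8 bytes in reverse order into a
   temporary buffer, then copy them into the result. Reversing bytes
   directly avoids shifting signed values. */
static int64_t sswappedint64_t(const char *buf)
{
    unsigned char tmp[8];
    int64_t out;
    for (int k = 0; k < 8; k++)
        tmp[k] = (unsigned char) buf[7 - k];
    memcpy(&out, tmp, sizeof out);
    return out;
}
```

Whether the byte-reversal loop compiles as tightly as the mask-and-shift version is worth checking on your own compiler.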

Now, the tricky part comes when I'm trying to cast a char* to a double. I'd like to reuse the endian-swapping code like so:

union 
{
    double  d;
    int64_t i;
} int64todouble;

int64todouble.i = slittleint64_t(bufoffset);
printf("%lf", int64todouble.d);
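For reference, the union approach as a self-contained function. This is a sketch assuming `double` is a 64-bit IEEE 754 type; the name `int64_bits_to_double` is hypothetical. C (C99 as amended by TC3, and C11) describes reading a different member from the one last written as reinterpreting the stored bytes, whereas C++ does not permit it, so the same code is not portable to C++.

```c
#include <stdint.h>

/* Reinterpret the bits of a 64-bit integer as a double via a union.
   Assumes sizeof(double) == sizeof(int64_t). */
static double int64_bits_to_double(int64_t bits)
{
    union {
        double  d;
        int64_t i;
    } pun;
    pun.i = bits;   /* write one member... */
    return pun.d;   /* ...and read the other */
}
```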

I ended up using Steve Jessop's answer and rewrote the conversion functions to use memcpy, like so:

int64_t snativeint64_t(const char *buf) 
{
    /* Interpret the first 8 bytes of buf as a 64-bit int */
    int64_t output;
    memcpy(&output, buf, 8);
    return output;
}

It compiled to exactly the same assembler as my original code:

snativeint64_t:
        movq    (%rdi), %rax
        ret

Of the two, the memcpy version expresses more explicitly what I'm trying to do and should work even on the most naive compilers.

Adam, your answer was also great and I learned a lot from it. Thanks for posting!

Solution

Since you know enough about your implementation to be sure that int64_t and double are the same size and have suitable storage representations, you might hazard a memcpy. Then you don't even have to think about aliasing.

Since you're using a function pointer to a function that could easily be inlined if you were willing to release multiple binaries, performance can't be a huge issue anyway. Still, you might like to know that some compilers can be quite fiendish at optimising memcpy: for small integer sizes a set of loads and stores can be inlined, and you might even find that the variables are optimised away entirely and the compiler does the "copy" simply by reassigning the stack slots it uses for the variables, just like a union.

int64_t i = slittleint64_t(bufoffset);
double d;
memcpy(&d, &i, sizeof d); /* might emit no code if you're lucky */
printf("%lf", d);
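A self-contained sketch combining the memcpy idea with a byte-order-independent read. Note that it swaps in a different technique from the question's function pointer: the 64-bit integer is assembled from individual bytes, so the same code works on either host byte order. The name `read_le_double` is hypothetical, and the code assumes `double` and `uint64_t` share size and byte order, which holds on mainstream platforms but is not guaranteed by the standard.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double is assumed to be 64 bits wide");

/* Decode a little-endian IEEE 754 double from a byte buffer. */
static double read_le_double(const unsigned char *buf)
{
    uint64_t bits = 0;
    for (int k = 7; k >= 0; k--)
        bits = (bits << 8) | buf[k];   /* buf[0] ends up least significant */
    double d;
    memcpy(&d, &bits, sizeof d);       /* copy the bits; no aliasing issue */
    return d;
}
```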

Check the generated code, or just profile it. Chances are that even in the worst case it won't be slow.

In general, though, doing anything too clever with byteswapping leads to portability problems. There are ABIs with middle-endian doubles, where each word is little-endian but the big word comes first.
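A sketch for the middle-endian case, assuming exactly the layout just described: two 32-bit words, each stored little-endian, with the high-order word first. Swapping the two 4-byte halves then yields an ordinary little-endian double. The function name is hypothetical.

```c
#include <string.h>

/* Convert a word-swapped ("middle-endian") double to plain
   little-endian byte order by exchanging its two 32-bit halves. */
static void mixed_to_little_endian(const unsigned char in[8], unsigned char out[8])
{
    memcpy(out, in + 4, 4);   /* low-order word moves to the front */
    memcpy(out + 4, in, 4);   /* high-order word moves to the back */
}
```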

Normally you could consider storing your doubles with sprintf and sscanf, but for your project the file formats aren't under your control. However, if your application is just shovelling IEEE doubles from an input file in one format to an output file in another (I'm not sure it is, since I don't know the database formats in question, but if so), then perhaps you can forget that it's a double at all, because you aren't using it for arithmetic anyway. Just treat it as an opaque char[8], which needs byteswapping only if the file formats differ.
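Treating the value as an opaque 8-byte block could look like this minimal sketch. `transfer_double_bytes` and its `formats_differ` flag are hypothetical; the flag stands for whatever the program knows about the two file formats' byte orders.

```c
#include <stdbool.h>
#include <string.h>

/* Move an 8-byte IEEE double between two file formats without ever
   interpreting it as a double. */
static void transfer_double_bytes(const unsigned char in[8], unsigned char out[8],
                                  bool formats_differ)
{
    if (!formats_differ) {
        memcpy(out, in, 8);        /* same byte order: plain copy */
        return;
    }
    for (int k = 0; k < 8; k++)
        out[k] = in[7 - k];        /* opposite byte order: reverse */
}
```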

Other tips

I highly suggest you read Understanding Strict Aliasing. Specifically, see the sections labeled "Casting through a union". It has a number of very good examples. While the article is on a website about the Cell processor and uses PPC assembly examples, almost all of it is equally applicable to other architectures, including x86.

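The standard says that writing to one field of a union and then reading from a different field is undefined behaviour. So if you go by the rule book, the union-based method won't work. (Strictly, that is the C++ rule; C99 as amended by TC3 and C11 describe such a read as reinterpreting the stored bytes, although the result may still be a trap representation.)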

Macros are usually a bad idea, but this might be an exception to the rule. It should be possible to get template-like behaviour in C using a set of macros using the input and output types as parameters.
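One way the macro idea might look: a sketch in which a single macro takes the type as a parameter and stamps out a native-order reader and a byte-swapped reader for it. `DEFINE_READERS` and the generated `read_native_<type>` / `read_swapped_<type>` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generate read_native_<type> and read_swapped_<type> for one type. */
#define DEFINE_READERS(type)                                     \
    static type read_native_##type(const char *buf)              \
    {                                                            \
        type out;                                                \
        memcpy(&out, buf, sizeof out);   /* host byte order */   \
        return out;                                              \
    }                                                            \
    static type read_swapped_##type(const char *buf)             \
    {                                                            \
        unsigned char tmp[sizeof(type)];                         \
        type out;                                                \
        for (size_t k = 0; k < sizeof tmp; k++)                  \
            tmp[k] = (unsigned char) buf[sizeof tmp - 1 - k];    \
        memcpy(&out, tmp, sizeof out);   /* reversed order */    \
        return out;                                              \
    }

DEFINE_READERS(int32_t)
DEFINE_READERS(int64_t)
DEFINE_READERS(double)
```

Because the type name is pasted into the function name with `##`, this only works for single-token type names such as `int64_t` or `double`; a `typedef` covers multi-word types like `unsigned long`.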

As a very small sub-suggestion, I suggest you investigate if you can swap the masking and the shifting, in the 64-bit case. Since the operation is swapping bytes, you should be able to always get away with a mask of just 0xff. This should lead to faster, more compact code, unless the compiler is smart enough to figure that one out itself.

In brief, changing this:

(((wrongend & 0xff00000000000000LL) >> 56)

into this:

((wrongend >> 56) & 0xff)

should generate the same result.
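Applied to all eight bytes, the shift-then-mask form could look like this sketch. It operates on `uint64_t` (an assumption beyond the question's `int64_t` signature) because left-shifting a set bit into the sign position of a signed 64-bit value is undefined behaviour; the name `swap64_shift_then_mask` is hypothetical.

```c
#include <stdint.h>

/* Byte-swap a 64-bit value: shift each byte down to the bottom,
   mask with 0xff, then shift it into its mirrored position. */
static uint64_t swap64_shift_then_mask(uint64_t x)
{
    return  ((x >> 56) & 0xff)
         | (((x >> 48) & 0xff) << 8)
         | (((x >> 40) & 0xff) << 16)
         | (((x >> 32) & 0xff) << 24)
         | (((x >> 24) & 0xff) << 32)
         | (((x >> 16) & 0xff) << 40)
         | (((x >>  8) & 0xff) << 48)
         | ((x & 0xff) << 56);
}
```

Many compilers recognise this pattern and emit a single byte-swap instruction, but that is worth confirming in the generated assembly rather than assuming.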

Edit:
Removed comments about how to store data always as big-endian and swap it to machine endianness, as the questioner hasn't mentioned another program writing his data (which is important information).

Still, if the data needs conversion from any endianness to big-endian and from big-endian to host endianness, ntohs/ntohl/htons/htonl are the best methods: the most elegant and unbeatable in speed, as they will perform the task in hardware if the CPU supports it (you can't beat that). Note that they only cover 16-bit (ntohs/htons) and 32-bit (ntohl/htonl) values.
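Since `ntohl` only handles 32 bits, a 64-bit big-endian field can be decoded as two halves. A minimal sketch, assuming a POSIX system that provides `<arpa/inet.h>`; `read_be_uint64` is a hypothetical name:

```c
#include <arpa/inet.h>   /* POSIX: ntohl */
#include <stdint.h>
#include <string.h>

/* Decode a big-endian 64-bit unsigned integer from a byte buffer. */
static uint64_t read_be_uint64(const unsigned char *buf)
{
    uint32_t hi, lo;
    memcpy(&hi, buf, 4);        /* first 4 bytes: high-order half */
    memcpy(&lo, buf + 4, 4);    /* last 4 bytes: low-order half */
    return ((uint64_t) ntohl(hi) << 32) | ntohl(lo);
}
```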

Regarding double/float, just store them to ints by memory casting:

double d = 3.1234;
printf("Double %f\n", d);
int64_t i = *(int64_t *)&d;
// Now i contains the double value as int
double d2 = *(double *)&i;
printf("Double2 %f\n", d2);

Wrap it in a function:

int64_t doubleToInt64(double d)
{
    return *(int64_t *)&d;
}

double int64ToDouble(int64_t i)
{
    return *(double *)&i;
}

Questioner provided this link:

http://cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html

as proof that casting is bad... Unfortunately, I strongly disagree with most of this page. Quotes and comments:

As common as casting through a pointer is, it is actually bad practice and potentially risky code. Casting through a pointer has the potential to create bugs because of type punning.

It is not risky at all, and it is also not bad practice. It only has the potential to cause bugs if you do it incorrectly, but the same is true of programming in C, and of any programming in any language. By that argument you would have to stop programming altogether.

Type punning
A form of pointer aliasing where two pointers refer to the same location in memory but represent that location as different types. The compiler will treat both "puns" as unrelated pointers. Type punning has the potential to cause dependency problems for any data accessed through both pointers.

This is true, but unfortunately totally unrelated to my code.

What he refers to is code like this:

int64_t * intPointer;
/* ... initialise intPointer somehow ... */
double * doublePointer = (double *)intPointer;

Now doublePointer and intPointer both point to the same memory location, but treat it as different types. This is the situation you should indeed solve with a union; anything else is pretty bad. But that is not what my code does!

My code copies by value, not by reference. I cast the address of a double to an int64 pointer (or the other way round) and immediately dereference it. Once the functions return, no pointer to anything is held. There is an int64 and a double, and these are totally unrelated to the functions' input parameters. I never copy any pointer into a pointer of a different type (if you saw this in my code sample, you have badly misread the C code I wrote); I just transfer the value to a variable of a different type (in its own memory location). So the definition of type punning does not apply at all, as it says "refer to the same location in memory", and nothing here refers to the same memory location.

int64_t intValue = 12345;
double doubleValue = int64ToDouble(intValue);
// The statement below will not change the value of doubleValue!
// The two do not share a memory location: each has its own
// storage space on the stack and they are totally unrelated.
intValue = 5678;

My code is nothing more than a memory copy, just written in C without an external function.

int64_t doubleToInt64(double d)
{
    return *(int64_t *)&d;
}

Could be written as

int64_t doubleToInt64(double d)
{
    int64_t result;
    memcpy(&result, &d, sizeof(d));
    return result;
}

It's nothing more than that, so there is no type punning anywhere in sight, and the operation is as safe as an operation can be in C, provided double and int64_t have the same size. The C standard does not actually fix the width of double, but on virtually every current platform it is a 64-bit IEEE 754 value, so it fits exactly into an int64_t-sized variable.
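Since the C standard leaves the width of `double` to the implementation, the same-size assumption can be checked at compile time instead of relied on silently. A sketch of memcpy-based versions of both helpers, guarded by C11 `_Static_assert`; treating `DBL_MANT_DIG == 53` as a sign of IEEE 754 binary64 is a heuristic, not a guarantee:

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Fail the build instead of silently truncating on exotic platforms. */
_Static_assert(sizeof(double) == sizeof(int64_t),
               "double is expected to be 64 bits wide");
_Static_assert(DBL_MANT_DIG == 53,
               "double is expected to have a 53-bit mantissa (binary64)");

static int64_t doubleToInt64(double d)
{
    int64_t result;
    memcpy(&result, &d, sizeof result);   /* copy the object representation */
    return result;
}

static double int64ToDouble(int64_t i)
{
    double result;
    memcpy(&result, &i, sizeof result);
    return result;
}
```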

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow