They have L3 cache, and the detection code incorrectly assumed this is a Xeon
Irwindale variant due to an old and no longer valid classification check.
Correctly handle the XEON_IRWIN subcode and add an entry in the matchtable
to fix Sandy Bridge-E Xeon.
- move the INLINE_ASM_SUPPORTED guards outside the body of exec_cpuid, as
suggested by Genoil;
- copy the asm code of busy_sse_loop to masm-x64.asm. Some fixup was
required, because the Microsoft x64 calling convention treats xmm6 and
xmm7 as callee-saved, so functions must not clobber them.
Confirmed that --clock-ic from cpuid_tool works with the resulting library.
Build Debian packages with dpkg-buildpackage
new file: debian/README.Debian
new file: debian/README.source
new file: debian/changelog
new file: debian/compat
new file: debian/control
new file: debian/copyright
new file: debian/docs
new file: debian/libcpuid-dev.install
new file: debian/libcpuid-doc.docs
new file: debian/libcpuid.install
new file: debian/rules
new file: debian/source/format
Signed-off-by: Zhang, Guodong <gdzhang@linx-info.com>
Previously all fields in the matchtable were treated as equal in importance.
With this change, the cache size is taken with half the weight in the decision.
Also add detection entries for some more recent Haswells, and the respective
tests. These are an i5 Haswell from a MacBook Pro, and an i7 Haswell from a
ThinkPad T540.
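
A minimal sketch of the half-weight idea, not the actual matchtable code:
the struct, its fields, and the function names are hypothetical. Every
matching field is worth 2 points, except the cache size, which is worth 1.

```c
#include <stddef.h>

/* Hypothetical, simplified match-table entry (not libcpuid's real layout). */
struct match_entry {
    int family, model, stepping, ncores, l2_cache_kb;
    const char *codename;
};

/* Score one table entry against the detected CPU. Each matching field
 * adds 2 points; a matching cache size adds only 1 (half the weight). */
static int score_entry(const struct match_entry *e,
                       const struct match_entry *cpu)
{
    int score = 0;
    if (e->family == cpu->family)           score += 2;
    if (e->model == cpu->model)             score += 2;
    if (e->stepping == cpu->stepping)       score += 2;
    if (e->ncores == cpu->ncores)           score += 2;
    if (e->l2_cache_kb == cpu->l2_cache_kb) score += 1;
    return score;
}

/* Return the highest-scoring entry; the earliest entry wins a tie. */
static const struct match_entry *best_match(const struct match_entry *table,
                                            size_t count,
                                            const struct match_entry *cpu)
{
    const struct match_entry *best = NULL;
    int best_score = -1;
    size_t i;
    for (i = 0; i < count; i++) {
        int s = score_entry(&table[i], cpu);
        if (s > best_score) {
            best_score = s;
            best = &table[i];
        }
    }
    return best;
}
```

With equal weights, an entry that matches only on cache size would tie
with one that matches only on core count; with the half weight, the
core-count match wins.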
- Detect AVX and AVX2 on both Intel and AMD CPUs
- Detect BMI1 and BMI2 instruction sets (BMI2 is only on Haswell, BMI1 is
also present on Bulldozers).
- Fix tests to reflect changes.
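
For reference, a sketch of how these flags can be decoded from raw CPUID
register values. The bit positions are the documented ones (AVX in leaf 1
ECX, the others in leaf 7 subleaf 0 EBX); the struct and function names
are made up for illustration and are not the library's API. Whether the
OS has enabled AVX state (XGETBV) is not checked here.

```c
#include <stdint.h>

/* Documented CPUID feature bits. */
#define CPUID1_ECX_AVX  (1u << 28) /* leaf 1, ECX */
#define CPUID7_EBX_BMI1 (1u << 3)  /* leaf 7, subleaf 0, EBX */
#define CPUID7_EBX_AVX2 (1u << 5)  /* leaf 7, subleaf 0, EBX */
#define CPUID7_EBX_BMI2 (1u << 8)  /* leaf 7, subleaf 0, EBX */

/* Hypothetical result type for this sketch. */
struct ext_features {
    int avx, avx2, bmi1, bmi2;
};

/* Decode the flags from already-captured register values, so the same
 * code works on live CPUID output and on raw dumps from test files. */
static struct ext_features decode_ext_features(uint32_t leaf1_ecx,
                                               uint32_t leaf7_ebx)
{
    struct ext_features f;
    f.avx  = (leaf1_ecx & CPUID1_ECX_AVX)  != 0;
    f.avx2 = (leaf7_ebx & CPUID7_EBX_AVX2) != 0;
    f.bmi1 = (leaf7_ebx & CPUID7_EBX_BMI1) != 0;
    f.bmi2 = (leaf7_ebx & CPUID7_EBX_BMI2) != 0;
    return f;
}
```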
The last change to flags detection caused a bunch of tests to fail.
The reason is that the tests are bogus: none of the recent Intel chips have
RDTSCP indicated in the test files, whereas they have it in reality.
I figured it would be easier to add a "--fix" option to run_tests.py,
rather than fixing each testfile by hand.
This is also extended in the Makefile:
"make test" runs the tests and reports discrepancies.
"make fix-tests" fixes any offending tests. This blindly assumes that
libcpuid is sane.
Previously the detection only tested this on AMD CPUs, and the table check
was only present in recog_amd.c.
Thanks to Andrew Roberts for reporting this issue!
Reorganize the detection for Intel Atom CPUs
- no longer make the distinction between single- and dualcore CPUs.
- correctly handle all Pineview and Cedarview CPUs.
- Atom Dual-core (Diamondville) is renamed to just Atom (Diamondville)
- The test with Atom D425 is named "Pineview", while the one with
Atom D525 was incorrectly named "Cedarview". Moving the latter to
atom-pineview-2.test and fixing its codename.
Namely, printing a uint64_t with printf(...%llu...) is considered bad
practice; you need to use the format specifier PRIu64, defined in
inttypes.h. Apparently, it's not safe to assume that
uint64_t == unsigned long long.
However, I don't like this kind of formatting ugliness.
The td variable on my machine is ~30k - and conceivably can't go
above 1M. Moreover, this printf is only for detailed debug purposes.
So it's safe to cast the var to int and print it with %d.
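
The two options side by side, as a sketch only: the function names and
the "td=" message are invented for illustration. The cast variant is
correct only while the value fits in an int, which is the assumption
made above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Portable variant: PRIu64 expands to the correct conversion
 * specifier for uint64_t on the current platform. */
static int format_td_portable(char *buf, size_t len, uint64_t td)
{
    return snprintf(buf, len, "td=%" PRIu64, td);
}

/* Shortcut variant: cast to int and print with %d. Only valid as
 * long as td is known to stay well below INT_MAX. */
static int format_td_cast(char *buf, size_t len, uint64_t td)
{
    return snprintf(buf, len, "td=%d", (int) td);
}
```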
Automake >= 1.12 seems to require AM_PROG_AR to be happy with our
configure.ac. However, this macro is not defined on < 1.12, thus
the ifdef.
Confirmed that the project bootstraps without warnings on both
Fedora 14 and Ubuntu 13.10.
Remove the line with the build date of the library from the
raw serialized file format. It doesn't help anything and it
bothers the openSUSE packaging guys. Having the current
date in a binary triggers a warning:
"Current date containing causes unnecessary package
republishing."
Confirmed that "strings libcpuid.a | grep 2014" no longer
contains the current date.
Also, the sse-width guesswork seems to handle this (wrong) Griffin ext_family
explicitly, so fix it there as well.
Seems that members of ext_family 20 (AMD Fusion based APUs) also are 64-bit,
but they have the authoritative sse width detection bit, so we don't need
to handle them explicitly here.
It seems that our SSE-based speed test is 1 IPC (instructions per clock)
on all current CPUs, and 1.4 IPC on the Bulldozer, which leads to its
result being 40% too high. Correct that in the function.
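
A sketch of the arithmetic only, assuming the raw estimate is computed as
if the loop retired exactly 1 instruction per clock. The function name
and the integer scaling are hypothetical, not the actual code in the
library.

```c
#include <stdint.h>

/* raw_mhz is the clock estimate obtained under the 1 IPC assumption.
 * On Bulldozer the loop runs at about 1.4 IPC, so the raw figure is
 * 40% too high and has to be divided by 1.4. The IPC is kept as a
 * value scaled by 10 to stay in integer arithmetic. */
static uint64_t correct_for_ipc(uint64_t raw_mhz, int is_bulldozer)
{
    unsigned ipc_x10 = is_bulldozer ? 14 : 10;
    return raw_mhz * 10 / ipc_x10;
}
```

For example, a raw reading of 4200 MHz on a Bulldozer corresponds to an
actual clock of 3000 MHz.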
Instead of one big pile of tests in tests_stash.txt, keep each CPU
example raw data/parsed data in a file, ordered in a tree by
manufacturer and microarchitecture. The 64 .test files have been
extracted from tests_stash.txt. The add_test script is changed to
create_test, and it no longer appends to tests_stash.txt; instead it
prints data to be saved in a .test file.
run_tests.py is not refactored yet, to be done in a subsequent commit.