Closed Bug 669384 Opened 13 years ago Closed 10 years ago

Windows 64-bit leak builds fail to buildsymbols

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Windows Server 2008
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

References

Details

(Whiteboard: [debug win64] It seems to be a Microsoft bug. Reach them with a smaller test case)

Attachments

(2 files)

After compilation the buildsymbols step runs, but it times out.
make buildsymbols stops producing any output once it reaches xul.pdb [1].

If I manually run $srcdir/toolkit/crashreporter/tools/win32/dump_syms_vc1500.exe $objdir/toolkit/library/xul.pdb, I get a lot of output, and then at some point it just stops [2].

dump_syms_vc1500.exe climbs to ~294,000 K of memory usage and then stalls without finishing, while its CPU usage goes from 3-4% to a constant 25%.

I have no idea of what is going on but for now I will have to disable the step for Windows 64-bit leak builds.

[1]
Processing file: .\toolkit\library\xul.pdb

[2]
FUNC 2248b00 15 0 `getReallocHooker'::`2'::`dynamic atexit destructor for 'gReallocHooker''
FUNC 2248b20 15 0 `getNewHooker'::`2'::`dynamic atexit destructor for 'gNewHooker''
FUNC 2248b40 15 0 `getDeleteHooker'::`2'::`dynamic atexit destructor for 'gDeleteHooker''
FUNC 2248b60 15 0 `getVecNewHooker'::`2'::`dynamic atexit destructor for 'gVecNewHooker''
FUNC 2248b80 15 0 `getVecDeleteHooker'::`2'::`dynamic atexit destructor for 'gVecDeleteHooker''
FUNC 2248ba0 15 0 `dynamic atexit destructor for 'tracked_objects::ThreadData::list_lock_''
FUNC 2248bc0 15 0 `tracked_objects::Comparator::ParseKeyphrase'::`2'::`dynamic atexit destructor for 'key_map''
FUNC 2248be0 15 0 `anonymous namespace'::`dynamic atexit destructor for 'gProcessLog''
FUNC 2248c00 15 0 mozilla::gl::`dynamic atexit destructor for 'gGlobalContext''
FUNC 2248c20 15 0 mozilla::gl::`dynamic atexit destructor for 'gGlobalContext''
FUNC 2248c40 15 0 `dynamic atexit destructor for 'js::JSScriptedProxyHandler::singleton''
FUNC 2248c60 15 0 `dynamic atexit destructor for 'JSWrapper::singleton''
FUNC 2248c80 15 0 `dynamic atexit destructor for 'JSCrossCompartmentWrapper::singleton''
FUNC 2248ca0 15 0 js::`dynamic atexit destructor for 'LogController''
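For anyone trying to reproduce the hang without tying up a console, here is a minimal Python sketch that runs a symbol dumper under a time limit and reports whether it finished. The helper name run_with_timeout and the time limit are hypothetical and not part of the Mozilla build system; the sketch only assumes the tool writes its records to stdout, as the output above suggests.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, stdout_bytes).

    finished is False if the process was still running after timeout_s
    seconds. subprocess.run kills the direct child on timeout; any helper
    processes the child spawned are not touched.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # exc.stdout holds whatever was captured before the kill (may be None).
        return False, exc.stdout or b""
    return True, proc.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real
    # dump_syms invocation when reproducing the hang.
    finished, out = run_with_timeout([sys.executable, "-c", "print('ok')"], 10)
    print(finished, len(out))
```

To reproduce the report, the command list would be the dump_syms_vc1500.exe path followed by the xul.pdb path from the description, with a limit well above the usual run time. The last captured line then shows how far the dump got before it stalled.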
Here is the zipped up xul.pdb file.
http://people.mozilla.com/~armenzg/win64/xul.zip

ted was able to reproduce this locally:
> ted: FWIW, I'm testing with dump_syms.exe built with VC2010 and I see the
> same thing, but it might be a property of the .PDB file being built by VC2008
I ran this in a debugger, the hang appears to be entirely within the MS DIA DLLs, so this might just be a bug in Microsoft's tools. :-/
msdia90!CDiaEnumSymbolsByAddr::Next doesn't return. Also, it still hangs even when I use the win64 version of dump_syms.exe.
- If we use SHARED_JS=1 (build mozjs.dll), this doesn't occur. The current Win64 build has no mozjs.dll (it is integrated into xul.dll).
- This works with --enable-options, even if --enable-debug is set.
It seems like this is either a bug in msdia, or a bug in the PDB files that the compiler produces. Armen said he was going to try building a debug build with vc2010 to see if the issue happens there as well.
I have a mobile release that is not going well, so I will not have time to work on this today.
Very interesting. I just noticed that the optimized build symbols step takes 45 mins rather than the 5 mins that it takes on WINNT 5.2.

Here is a log in case it has some interesting information:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1310032964.1310048195.21709.gz&fulltext=1
No longer blocks: 575912
I am now building a leak build with VS2010.
I have the same problem but with dump_syms_vc1600.exe; do we have a 64-bit version of that file?
It freezes at ~320,000 K of memory usage and 25% CPU usage.

If I kill the process through the task manager and then I kill makecab the buildsymbols output continues.

85929934 firefox-8.0a1.en-US.win64-x86_64.crashreporter-symbols-full.zip
17928151 firefox-8.0a1.en-US.win64-x86_64.crashreporter-symbols.zip


Any ideas on what I could try next?
(In reply to comment #9)
> I have the same problem but with dump_syms_vc1600.exe; do we have a 64-bit
> version of that file?

It still occurs on 64-bit version.
I created this patch to skip creating symbols, but I found that make package fails as well (bug 670915).
Nevertheless, I wanted to attach the patch in case we decide to take it if "make package" gets fixed before this bug.
FWIW, I don't know if there's anything we can fix, this seems to be entirely a bug in Microsoft's library triggered by the PDB files that get produced in the debug build. The only thing I can suggest trying is a debug build with Visual C++ 2010 to see if the PDB files that version generates still trigger the problem.
Oh, sorry, I missed comment 9. I don't really have any other suggestions. :-/
Anyone that I can approach to reach Microsoft?
Would you suggest we disable buildsymbols on leak builds for Windows 64-bit so that we at least get coverage until we figure out this issue? (pending bug 670915)
Depends on: 670915
That sounds like a reasonable approach for now. I don't know who we'd contact at Microsoft, we might have to create a smaller testcase to file a bug with them.
Whiteboard: It seems to be a Microsoft bug. Reach them with a smaller test case
Whiteboard: It seems to be a Microsoft bug. Reach them with a smaller test case → [debug win64] It seems to be a Microsoft bug. Reach them with a smaller test case
Blocks: 685887
I have filed bug 685887 to disable symbols for debug builds for now.

Meanwhile we will need to create a smaller testcase to bug Microsoft with.
Is there anyone on your side who could help create the testcase?
Filed at connect.microsoft.com:
https://connect.microsoft.com/VisualStudio/feedback/details/722366/idiaenumsymbolsbyaddr-next-doesnt-return-huge-pdb

Also, when I tested VS2010 with PGO on the try server yesterday, this problem occurred even in an optimized build.
The connect.microsoft.com bug has been closed as fixed. I guess that means the problem is no longer present in VS 2012.
Blocks: 893139
A few months ago someone on the Breakpad mailing list posted some code he had written using various open source bits to dump PDB files:
https://groups.google.com/forum/#!topic/google-breakpad-discuss/F0jMWxmWk0M

This might be worth looking into if this isn't fixed in a toolchain we can use for our Win64 builds. I just replied to his list post; the only sticking point for us is probably the licensing of the code he used to build it.
Per Makoto in bug 893139 comment 15, this is fixed in VS2012.
Blocks: 886640
Should we be adding VS2012 to our Windows build slaves?
Flags: needinfo?(catlee)
Depends on: 946859
I spoke with catlee today and he said we should look into getting VS2012 on our build machines.  markco of RelOps says it should be easy to do, so I've requested a staging install in bug 946859.
Flags: needinfo?(catlee)
Blocks: 784681
Depends on: 960561
ted: do you see any issue with going straight to VS2013?  See bug 914523 for background.
Flags: needinfo?(ted)
No, I think that's the right move. If the issue was fixed in 2012 then 2013 should be fine. Ideally we'll be moving our x86 builds to 2013 in the near future as well, so it makes sense to try to get to the same version for both platforms.
Flags: needinfo?(ted)
We are going to solve this with VS2013 instead.  Some context from #releng:


[2:34pm] jhopkins: markco: remember the work we did to get VS2012 installed via GPO?  we are instead going to skip to VS2013 (see bug 914523).  could you look into installing VS2013 alongside VS2010 via GPO?
[2:36pm] markco: jhopkins: yes, what is the time frame on it?
[2:36pm] jhopkins: dmajor: ^
[2:36pm] ted: "now"
[2:39pm] dmajor: markco: the compiler update that we want is still in preview status, so it's not a "we need it yesterday" thing, but the sooner we can start testing, the better shape we'll be in when it's released
[2:40pm] jhopkins: dmajor: what is the expected release date?
[2:40pm] dmajor: jhopkins: AFAIK no official word but there are hints that it may be a month or two
[2:41pm] jhopkins: dmajor: ok.  i assume that will be an acceptable delay for win64 debug symbols in bug 669384.  cc: vlad
[2:42pm] markco: dmajor: rgr could you open up a relops bug for it please?
[2:42pm] dmajor: markco: bug 914523 or something separate?
[2:43pm] markco: dmajor, that'll work. I will open up a blocking bug on that.
[2:43pm] dmajor: thanks!
[2:43pm] ted: jhopkins: yeah, the long-term benefits make sense here
[2:43pm] ted: standardizing on the same toolchain version
[2:43pm] jhopkins: ok, great, sounds like a plan.  thanks all!
Depends on: 914523
So, upstream breakpad landed a patch that might work around this problem:
https://code.google.com/p/google-breakpad/source/detail?r=1316

We could update our in-tree copy of dump_syms and see if it helps.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #29)
> So, upstream breakpad landed a patch that might work around this problem:
> https://code.google.com/p/google-breakpad/source/detail?r=1316
> 
> We could update our in-tree copy of dump_syms and see if it helps.

Am I understanding correctly that the STR are to build dump_syms with VS2010 and run it on a 64-bit xul.pdb?

On a stale copy of the breakpad tree (maybe a few months old) I gave up after 10 minutes. With the revision above, it finished in 9 seconds. It did change the output somewhat; I can't say whether that would cause problems.
(In reply to David Major [:dmajor] (UTC+12) from comment #30)
> Am I understanding correctly that the STR are to build dump_syms with VS2010
> and run it on a 64-bit xul.pdb?

Yes.

> On a stale copy of the breakpad tree (maybe a few months old) I gave up
> after 10 minutes. With the revision above, it finished in 9 seconds. It did
> change the output somewhat; I can't say whether that would cause problems.

Sounds like the build speed issue is fixed then. I think what we need to do is compare the output on a 32-bit PGO build, since I know Chrome doesn't build PGO, so Google wouldn't have tested that thoroughly.
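As a rough way to do that comparison, the following Python sketch diffs the FUNC and PUBLIC records of two dumps. It assumes the usual Breakpad text layout in which each record starts with its keyword, as in the FUNC lines quoted earlier in this bug. The function names are made up for illustration, and line records inside functions are deliberately ignored.

```python
KINDS = ("FUNC", "PUBLIC")  # record types compared; everything else is skipped


def records_by_type(sym_lines):
    """Collect the FUNC and PUBLIC records of a dump into sets keyed by type."""
    groups = {kind: set() for kind in KINDS}
    for line in sym_lines:
        line = line.strip()
        kind = line.split(" ", 1)[0]
        if kind in groups:
            groups[kind].add(line)
    return groups


def diff_dumps(old_lines, new_lines):
    """Return, per record type, the records added and removed in the new dump."""
    old, new = records_by_type(old_lines), records_by_type(new_lines)
    return {
        kind: {
            "added": sorted(new[kind] - old[kind]),
            "removed": sorted(old[kind] - new[kind]),
        }
        for kind in KINDS
    }


if __name__ == "__main__":
    old = ["FUNC 1000 15 0 foo", "PUBLIC 2000 0 bar"]
    new = ["FUNC 1000 15 0 foo", "PUBLIC 2000 0 bar", "PUBLIC 3000 0 baz"]
    for kind, change in diff_dumps(old, new).items():
        print(kind, len(change["added"]), "added,", len(change["removed"]), "removed")
```

Because records are compared as whole lines, a function that keeps its address but gets a different name shows up as one removed and one added FUNC record.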
I tried on the 29.0 release, which I hope has PGO. The diff has some of the expected things: some new PUBLIC entries, some FUNC entries picked a different name for comdat-folded functions. 

Interestingly, a large region of duplicates has been removed. There used to be a bunch of FUNC entries that appeared twice in the old dump, with identical contents. The new dump only has them once each.
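A quick way to check a dump for that kind of duplication is to count identical FUNC header lines. This is only a sketch: duplicate_funcs is a hypothetical helper, and it compares the FUNC lines alone rather than the line records that follow them.

```python
from collections import Counter


def duplicate_funcs(sym_lines):
    """Return {FUNC record: count} for FUNC header lines that appear more than once."""
    counts = Counter(
        line.strip() for line in sym_lines if line.startswith("FUNC ")
    )
    return {record: n for record, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [
        "FUNC 2248c00 15 0 a",
        "2248c00 5 10 1",          # line record, ignored by the check
        "FUNC 2248c00 15 0 a",     # repeated header
        "FUNC 2248c20 15 0 b",
    ]
    for record, n in duplicate_funcs(sample).items():
        print(n, record)
```

Running it over the old and new dumps should show the repeated entries only in the old one, if the observation above holds for the whole file.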
Cool, sounds like it's just fiddly stuff then. I filed bug 1003085 to update the in-tree dump_syms.
Ehsan: can you merge m-c to date to pick up bug 1003085? That should fix this issue.
Flags: needinfo?(ehsan)
I can, but I don't own the date branch any more.  :-)  Can you or someone else who owns it take over please (not sure who that person would be)?
Flags: needinfo?(ehsan)
Sorry, I was going by:
https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE

Is there anyone driving Win64 build testing anymore?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #36)
> Sorry, I was going by:
> https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE

Uh-oh, that page says Vlad now.  ;-)

> Is there anyone driving Win64 build testing anymore?

Not that I know of.  Vlad?
Flags: needinfo?(vladimir)
johnath & the firefox team/org owns it now and is actively driving it (or should be)
Flags: needinfo?(vladimir)
Rob Strong says we don't have anyone actively working on Win64 support, FWIW. That's okay, I was just under the impression that we did (we did at one point, certainly) and wanted to poke the right person here.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #39)
> Rob Strong says we don't have anyone actively working on Win64 support, FWIW.

Well, I hear that the Firefox team/org (as vlad mentioned) is actively investigating picking up Win64 again. That said, as far as I know, right now we create Nightly builds so people can test them but do not actively develop Win64 builds.
Depends on: 1004970
Fixed by bug 1003085.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED