Pavlo Beylin / MaD Patch Yolov5
Commit 19d03a95 (Unverified)
Authored Aug 15, 2021 by Glenn Jocher; committed by GitHub on Aug 15, 2021

Remove DDP process group timeout (#4422)

Parent: 4e65052f
Changed files: 2
train.py
@@ -493,7 +493,7 @@ def main(opt):
         assert not opt.sync_bn, '--sync-bn known training issue, see https://github.com/ultralytics/yolov5/issues/3998'
         torch.cuda.set_device(LOCAL_RANK)
         device = torch.device('cuda', LOCAL_RANK)
-        dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo", timeout=timedelta(seconds=60))
+        dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo")

     # Train
     if not opt.evolve:
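To make the effect of the train.py change concrete, here is a minimal sketch; it is not code from the commit. The helper `process_group_kwargs` and its parameters `nccl_available` and `timeout_s` are hypothetical names introduced for illustration. It builds the keyword arguments that would be passed to `dist.init_process_group`, on the assumption that leaving out `timeout` lets PyTorch apply its own default.

```python
from datetime import timedelta
from typing import Any, Dict, Optional


def process_group_kwargs(nccl_available: bool,
                         timeout_s: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for dist.init_process_group (hypothetical helper).

    nccl_available: result of dist.is_nccl_available() in a real run.
    timeout_s: explicit timeout in seconds, or None to omit the argument.
    """
    # Same backend choice as in the diff: prefer NCCL, fall back to Gloo.
    kwargs: Dict[str, Any] = {"backend": "nccl" if nccl_available else "gloo"}
    if timeout_s is not None:
        # Before the commit: an explicit 60-second timeout was passed.
        kwargs["timeout"] = timedelta(seconds=timeout_s)
    # After the commit: no "timeout" key, so the library default applies.
    return kwargs


before = process_group_kwargs(nccl_available=True, timeout_s=60)
after = process_group_kwargs(nccl_available=True)
print(before)
print(after)
```

In a real DDP run the result would be unpacked into the call, for example `dist.init_process_group(**process_group_kwargs(dist.is_nccl_available()))`.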
utils/torch_utils.py
@@ -35,10 +35,10 @@ def torch_distributed_zero_first(local_rank: int):
     Decorator to make all processes in distributed training wait for each local_master to do something.
     """
     if local_rank not in [-1, 0]:
-        dist.barrier()
+        dist.barrier(device_ids=[local_rank])
     yield
     if local_rank == 0:
-        dist.barrier()
+        dist.barrier(device_ids=[0])


 def init_torch_seeds(seed=0):
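The barrier placement in `torch_distributed_zero_first` can be hard to read from a diff alone, so the following is a rough single-machine simulation of the same pattern. It rests on stated assumptions: threads stand in for distributed ranks, `threading.Barrier` stands in for `dist.barrier`, and the names `zero_first`, `worker`, `run_simulation`, and `WORLD_SIZE` are hypothetical. The `device_ids` argument has no analogue here and is left out.

```python
import threading
from contextlib import contextmanager
from typing import List

WORLD_SIZE = 3  # hypothetical number of ranks, simulated with threads
barrier = threading.Barrier(WORLD_SIZE)  # stand-in for dist.barrier()
events: List[int] = []  # order in which ranks execute the guarded body
lock = threading.Lock()


@contextmanager
def zero_first(rank: int):
    """Let rank 0 run the guarded body first; the other ranks wait for it."""
    if rank != 0:
        barrier.wait()  # non-zero ranks block until rank 0 reaches the barrier
    yield
    if rank == 0:
        barrier.wait()  # rank 0 arrives after its body, releasing the others


def worker(rank: int) -> None:
    with zero_first(rank):
        with lock:
            events.append(rank)  # e.g. download or cache a dataset once


def run_simulation() -> List[int]:
    events.clear()
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)


order = run_simulation()
print(order[0])  # rank 0 always runs the body first
```

The order of the remaining ranks is not deterministic; only rank 0 going first is guaranteed by the pattern.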