PIN-SLAM 코드 읽기 - 2. Neural Point Map과 Decoder | Hyungtae Lim Robotics & AI Research

들어가며

이 글에서는 PIN-SLAM의 map representation을 읽는다. 핵심 파일은 두 개이다.

model/neural_points.py
model/decoder.py

PIN-SLAM에서 map은 단순히 point cloud가 아니다. 그렇다고 하나의 거대한 MLP가 모든 공간을 외우는 것도 아니다. PIN-SLAM의 map은 다음 두 개의 조합이다.

NeuralPoints  +  Decoder  =  query 가능한 implicit SDF map

NeuralPoints는 sparse하게 놓인 support point들의 data structure이다. Decoder는 각 neural point 주변에서 SDF를 예측하는 작은 MLP이다.

이 둘을 분리해서 이해해야 코드가 보인다.

입력/출력 파이프라인

먼저 PIN-SLAM의 NeuralPoints와 Decoder가 observation point와 query point를 입력으로 받아 neural point map, decoder input feature, SDF prediction을 만드는 흐름을 살펴보자. 처음 읽을 때 헷갈리는 질문은 “neural point들이 decoder의 입력으로 들어가는가?”이다.

neural point들이 decoder의 입력으로 그대로 들어가는가?

정답부터 말하면 아니다. Decoder에 neural_points의 3D position tensor가 그대로 들어가는 것이 아니다. Decoder에 들어가는 것은 NeuralPoints class의 query_feature()가 만들어낸 feature vector이다.

이 feature vector는 대략 다음 두 가지를 붙인 것이다.

[주변 neural point의 latent feature, query point와 neural point 사이의 local coordinate]

즉 Decoder의 입력은 “map point 자체”가 아니라, 어떤 query point를 주변 neural point 기준으로 해석한 local descriptor라고 보는 게 더 정확하다.

모듈별 입력/출력은 다음과 같다.

모듈	입력	출력	역할
`NeuralPoints.update()`	world frame의 새 observation point들	새 neural point 추가, local map reset	map structure를 확장
`NeuralPoints.query_feature()`	SDF를 알고 싶은 query point들	decoder 입력 feature, neighbor weight, valid neighbor count	neural map을 query 가능한 feature로 변환
`Decoder.sdf()`	`NeuralPoints.query_feature()`가 만든 feature vector	SDF prediction	local implicit field decode
`Mapper.mapping()`	LiDAR ray에서 만든 SDF sample과 label	neural point feature / decoder parameter update	map을 학습
`Tracker.tracking()`	current scan point와 pose guess	refined pose	SDF residual로 scan-to-map registration

따라서 전체 구조는 이렇게 봐야 한다.

current scan point N개
  ├─ Tracking branch
  │    -> pose guess로 scan point를 world/map frame에 올림
  │    -> NeuralPoints.query_feature(): 주변 neural point 탐색
  │    -> Decoder.sdf(): SDF residual 계산
  │    -> pose update
  │
  └─ Mapping branch
       -> DataSampler: sensor frame ray 주변 SDF sample 생성
       -> sample/update point를 pose로 world/map frame에 올림
       -> NeuralPoints.update(): 필요한 위치에 neural point 추가
       -> NeuralPoints.query_feature(): sample 주변 neural point 탐색
       -> Decoder.sdf(): SDF prediction 계산
       -> SDF loss로 map update

여기서 current scan point N개가 항상 곧바로 decoder에 들어가는 것은 아니다. tracking에서는 scan point들이 query point가 되어 SDF를 물어본다. mapping에서는 scan point에서 파생된 near-surface/free-space sample들이 query point가 되어 SDF를 물어본다. 두 경우 모두 decoder 앞에는 반드시 NeuralPoints.query_feature()가 있다.

N개의 current scan이 들어왔을 때 pipeline

조금 더 직접적으로 보자. 현재 LiDAR scan이 processing된 뒤 N개의 point가 있다고 하자.

P_sensor = {p_1, p_2, ..., p_N}

이 point들은 처음에는 sensor coordinate에 있다. 현재 pose estimate가 \(T_{w \leftarrow s}\)라면 world coordinate point는 다음과 같다.

P_world = T_{w <- s} P_sensor

PIN-SLAM은 이 point들을 두 가지 용도로 쓴다.

1. Map structure를 늘리는 용도

Mapper.process_frame() 안에서 current scan point 또는 surface 근처 sample을 world frame으로 올린 뒤, NeuralPoints.update()에 넣는다.

update_points = transform_torch(frame_point_torch, cur_pose_torch)
neural_points.update(update_points, frame_origin_torch, frame_orientation_torch, frame_id)

여기서 하는 일은 SDF를 예측하는 것이 아니다. 주변에 neural point가 없으면 새 neural point를 추가하는 것이다.

2. SDF를 query하는 용도

tracking이나 mapping에서는 어떤 3D point의 SDF가 필요하다. 이때 바로 decoder를 부르지 않고, 먼저 NeuralPoints.query_feature()를 부른다.

geo_feature, _, weight_knn, nn_count, _ = neural_points.query_feature(query_points)
sdf_pred = sdf_mlp.sdf(geo_feature)

NeuralPoints class의 query_feature()는 각 query point 주변의 neural point K개를 찾고, 각 neighbor의 latent feature와 relative coordinate를 묶어서 decoder가 먹을 수 있는 tensor를 만든다.

따라서 한 query point \(p\)에 대해 내부적으로 일어나는 일은 다음과 같다. 여기서 index \(i\)는 scan point index가 아니라, \(p\) 주변에서 선택된 \(K\)개의 neighbor neural point 중 i번째를 뜻한다.

p
  -> 주변 neural point x_i 탐색
  -> x_i가 들고 있는 geometric latent feature f_i 가져오기
  -> query point p를 x_i 기준으로 본 local offset d_i = p - x_i 계산
  -> decoder input [f_i, d_i] 생성
  -> Decoder([f_i, d_i])로 neighbor별 SDF S_i(p) 예측
  -> 여러 neighbor 결과를 weight로 조합

조금 더 풀면 \(f_i\)는 i번째 neural point의 learnable feature vector이다. default config에서는 보통 8차원 geometric feature로 쓰이고, map geometry를 어떻게 decode할지에 대한 local memory처럼 보면 된다. 반면 \(d_i\)는 query point \(p\)의 위치 정보이다. 다만 world coordinate \(p\)를 그대로 넣는 것이 아니라, neighbor neural point \(x_i\)를 원점으로 봤을 때 \(p\)가 어디에 있는지를 나타내는 3D relative coordinate이다.

이 관점으로 보면 NeuralPoints와 Decoder의 역할이 분명해진다. NeuralPoints는 “주변에 어떤 latent code들이 있는지” 찾아주는 map이고, Decoder는 그 latent code와 local coordinate를 실제 SDF 값으로 바꿔주는 함수이다.

Decoder 구조: Feature Vector에서 SDF로

model/decoder.py의 Decoder는 생각보다 단순하다. MLP를 만들고, NeuralPoints.query_feature()가 만들어준 feature vector를 받아 SDF/semantic/color를 예측한다.

feature_dim = config.feature_dim
input_dim = feature_dim + position_dim

default config 기준으로 feature_dim = 8이고, positional encoding을 쓰지 않으면 local coordinate 3차원이 붙는다. 즉 geometric SDF decoder의 입력은 대략 다음과 같다.

[neural point latent feature 8D, query point의 local coordinate 3D]

따라서 이 경우 decoder에 들어가는 feature vector는 neighbor neural point 하나당 11차원이다. 즉 한 query point \(p\)와 그 주변 neighbor \(x_i\) 하나를 기준으로 보면,

f_i: i번째 neighbor neural point의 geometric latent feature, 8D
d_i: query point p의 i번째 neighbor 기준 local coordinate, 3D
[f_i, d_i]: decoder input, 11D

여기서 local coordinate란 world coordinate의 절대 위치가 아니다. query point \(p\)가 neighbor neural point \(x_i\)를 원점으로 봤을 때 어디에 있는지를 나타내는 relative coordinate이다. 가장 단순하게 쓰면 다음과 같다.

\[d_i = p - x_i\]

즉 \(d_i\)는 “이 query point가 이 neural point로부터 x/y/z 방향으로 얼마나 떨어져 있는가”를 나타낸다. PGO 이후에는 neural point orientation을 반영해 이 relative vector를 neural point의 local frame 쪽으로 rotate해서 쓴다.

여기서 다시 강조하자면, Decoder에 neural point의 position array 전체가 들어가는 것이 아니다. 한 query point 주변에서 고른 neighbor neural point의 latent feature와, 그 query point가 neighbor 기준으로 어디에 있는지를 나타내는 local coordinate가 들어간다.

그리고 출력은 SDF scalar 하나이다.

def sdf(self, features):
    out = self.mlp(features).squeeze(1) * self.sdf_scale
    return out

optional하게 semantic label probability나 color도 같은 구조로 예측할 수 있다. 하지만 PIN-SLAM을 처음 읽을 때는 SDF만 보면 된다.

초보자용으로 말하면 Decoder는 이런 함수이다.

이 neural point 근처에서,
이 local coordinate가 surface에서 얼마나 떨어졌는가?

NeuralPoints는 무엇을 저장하나

NeuralPoints.__init__()를 보면 map이 어떤 tensor들로 구성되는지 드러난다. 중요한 것은 다음이다.

변수	의미
`neural_points`	neural point 위치, shape `(N, 3)`
`point_orientations`	각 neural point의 orientation quaternion, shape `(N, 4)`
`geo_features`	geometric latent feature, shape `(N+1, feature_dim)`
`point_ts_create`	neural point가 생성된 frame id
`point_ts_update`	마지막으로 update된 frame id
`point_certainties`	query/training 과정에서 누적되는 안정도
`buffer_pt_index`	voxel hash table
`local_neural_points`	현재 frame 주변 local map
`local_geo_features`	local map의 trainable feature view
`global2local`	global neural point index를 local index로 바꾸는 lookup

여기서 geo_features가 (N, feature_dim)이 아니라 (N+1, feature_dim)인 점이 처음에는 이상해 보인다. 코드를 보면 마지막에 padding feature를 하나 둔다. 이건 neighbor search에서 valid neural point를 못 찾은 자리를 처리하기 위한 dummy feature row이다.

상황을 조금 더 구체적으로 보자. query_feature()는 query point마다 주변 neural point를 최대 \(K\)개 찾으려고 한다. 하지만 모든 query point가 항상 \(K\)개의 valid neighbor를 갖는 것은 아니다. 예를 들어 아직 map이 없는 빈 공간을 query하거나, local map boundary 바깥쪽을 query하면 일부 neighbor slot이 비게 된다.

이때 선택지는 두 가지이다.

방법 A:
  invalid neighbor를 -1 같은 특수 index로 표시하고,
  feature gather 전에 매번 mask로 따로 처리한다.

방법 B:
  feature table 마지막에 dummy row를 하나 추가해두고,
  invalid neighbor가 나오면 그 dummy row index를 가리키게 한다.

PIN-SLAM은 B에 가까운 방식을 쓴다. 그래서 실제 neural point가 \(N\)개라면 feature table은 (N, feature_dim)이 아니라 (N+1, feature_dim)이 된다. 앞의 0 ... N-1 row는 실제 neural point feature이고, 마지막 N번째 row는 invalid/padding neighbor가 참조할 수 있는 안전한 feature slot이다.

이렇게 해두면 geo_features[idx]처럼 tensor indexing을 할 때, invalid neighbor 때문에 index error가 나거나 shape이 깨지는 일을 줄일 수 있다. 나중에 weight_vector나 nn_counts로 valid neighbor만 실제 SDF 조합에 반영하면 된다. 즉 padding feature는 “없는 neighbor도 tensor 연산 안에서는 자리를 차지하게 해주는 placeholder”라고 보면 된다.

중요한 해석은 이것이다.

neural point는 위치만 가진 점이 아니라, local SDF shape을 표현하는 latent code를 함께 가진 map element이다.

Voxel Hash가 필요한 이유

LiDAR SLAM에서 map point가 많아지면 nearest neighbor search가 병목이 된다. PIN-SLAM은 neural point를 voxel grid 위에 sparse하게 두고, hash table로 빠르게 찾는다.

초기화에서 큰 prime number를 만든다.

self.primes = torch.tensor([73856093, 19349669, 83492791])
self.buffer_pt_index = torch.full((self.buffer_size,), -1)

point coordinate를 voxel index로 바꾼 뒤,

grid_coords = (sample_points / resolution).floor()
hash = (grid_coords * primes).sum(-1) % buffer_size

이 hash slot에 neural point index를 저장한다. 정확한 KD-tree가 아니라 voxel hash를 쓰는 이유는 online SLAM에서 빠른 insert/query가 중요하기 때문이다.

update(): 새 observation으로 neural point 추가하기

NeuralPoints.update()는 현재 scan에서 새 neural point를 추가하는 함수이다. 입력은 world frame에 올라간 point들이다.

def update(points, sensor_position, sensor_orientation, cur_ts):

흐름은 다음과 같다.

입력 point를 voxel downsample한다.
각 sample point의 voxel hash를 계산한다.
이미 해당 voxel에 neural point가 있는지 본다.
비어 있거나, hash collision이거나, 충분히 오래된 위치라면 새 neural point를 만든다.
새 point에 identity orientation, random feature, timestamp, certainty를 붙인다.
현재 sensor 주변 local map을 다시 만든다.

코드상 핵심은 update_mask이다.

update_mask = (hash_idx == -1) | (dist2 > 3 * resolution**2)

여기에 temporal condition도 더한다. 이미 비슷한 위치에 neural point가 있어도, travel distance 기준으로 너무 오래된 point라면 새 point를 추가할 수 있다. 이렇게 하면 loop closure 전에는 같은 장소가 중복으로 들어갈 수 있다. 하지만 이 중복이 꼭 나쁜 것은 아니다. PGO 전까지는 odometry drift가 있으므로, 과거 map과 현재 observation을 억지로 하나로 merge하면 오히려 map이 망가질 수 있다.

이 부분이 PIN-SLAM스럽다.

local consistency를 유지하면서 일단 neural point를 추가하고, loop closure 이후 global correction으로 map을 맞춘다.

reset_local_map(): 지금 학습할 map만 잘라내기

전체 neural point map은 계속 커진다. 하지만 매 frame마다 전체 map을 다 query하고 학습하면 너무 느리다. 그래서 PIN-SLAM은 현재 sensor 주변의 local map만 만든다.

reset_local_map()은 두 가지 기준으로 neural point를 필터링한다.

시간/이동 거리 기준 현재 frame과 travel distance가 가까운 neural point만 남긴다.
공간 거리 기준 현재 sensor position에서 local_map_radius 안에 있는 point만 남긴다.

그 결과가 다음 tensor들이다.

self.local_neural_points
self.local_point_orientations
self.local_point_certainties
self.local_geo_features
self.local_mask
self.global2local

여기서 특히 중요한 부분은 local_geo_features이다.

self.local_geo_features = nn.Parameter(self.geo_features[local_mask])

즉 mapping optimization은 global feature tensor를 직접 매번 최적화하지 않는다. 현재 local map feature를 nn.Parameter로 잡고 학습한 다음, 나중에 assign_local_to_global()로 global map에 다시 써 넣는다.

이 구조 덕분에 매 frame의 optimization 대상이 local map으로 제한된다.

NeuralPoints.query_feature(): Map Query 핵심 함수

NeuralPoints class의 query_feature()는 PIN-SLAM에서 가장 중요한 함수 중 하나이다. Tracker, Mapper, Mesher가 모두 결국 이 함수를 통해 neural map을 query한다.

입력은 query point들이다.

query_feature(query_points, query_ts=None, training_mode=True, query_locally=True)

출력은 다음이다.

geo_features_vector
color_features_vector
weight_vector
nn_counts
queried_certainty

이 함수의 역할을 한 줄로 말하면 다음과 같다.

query point 주변의 K개 neural point를 찾고, 각 neural point의 latent feature와 local coordinate를 합쳐 decoder에 넣을 feature vector를 만든다.

조금 더 자세히 보면 다음 순서이다.

1. 주변 voxel에서 candidate neural point 찾기

dists2, idx = self.radius_neighborhood_search(query_points, ...)

voxel hash table에서 주변 cell들을 보고 candidate neural point index를 가져온다.

2. local map index로 변환

local query를 할 때는 global index를 local index로 바꿔야 한다.

idx = self.global2local[idx]

이 부분 때문에 reset_local_map()에서 global2local을 만들어둔 것이다.

3. 거리 기준으로 K개만 고르기

candidate가 많을 수 있으므로 distance 기준으로 sort하고 query_nn_k개만 남긴다.

dists2 = sorted_dist2[:, :nn_k]
idx = sorted_idx[:, :nn_k]

4. feature와 local coordinate 만들기

각 query point \(p\)와 neighbor neural point \(x_i\) 사이의 vector를 만든다. 이 vector가 위에서 말한 \(d_i\)이다.

neighb_vector = query_points.view(-1, 1, 3) - local_neural_points[idx]

즉 코드의 neighb_vector는 다음 수식에 해당한다.

\[d_i = p - x_i\]

여기서 \(x_i\)는 i번째 neighbor neural point의 position이고, \(d_i\)는 query point가 그 neural point 기준으로 어느 방향, 어느 거리만큼 떨어져 있는지를 나타낸다.

PGO 이후에는 neural point orientation도 반영한다.

if self.after_pgo:
    neighb_vector = apply_quaternion_rotation(quat, neighb_vector)

그 다음 latent feature \(f_i\)와 local coordinate \(d_i\)를 concatenate한다.

geo_features_vector = torch.cat((geo_features, neighb_vector), dim=2)

이게 decoder 입력이다. 즉 geo_features_vector의 한 원소는 대략 다음처럼 읽으면 된다.

[f_i, d_i]
  = [i번째 neighbor neural point의 latent feature,
     query point p의 i번째 neighbor 기준 local coordinate]

5. inverse distance weighting

여러 neural point가 하나의 query point에 영향을 주므로, 거리 기반 weight를 만든다.

weight_vector = 1.0 / (dists2 + eps)
weight_vector = weight_vector / weight_row_sums

즉 가까운 neural point의 영향을 더 크게 준다.

6. certainty update

training mode에서는 query된 neural point의 certainty를 누적한다.

self.local_point_certainties.scatter_add_(...)

나중에 prune할 때 certainty가 낮은 point를 제거할 수 있다.

weighted_first 옵션

PIN-SLAM 코드를 읽다 보면 weighted_first가 자주 나온다. 이 옵션은 주변 K개 neural point를 어떻게 합칠지 결정한다.

weighted_first = True

먼저 feature vector를 weighted average한 뒤, decoder를 한 번 호출한다.

weighted feature = Σ w_i [f_i, local_i]
S(p) = MLP(weighted feature)

장점은 빠르다.

weighted_first = False

각 neighbor마다 decoder를 호출한 뒤, SDF 값을 weighted average한다.

S_i(p) = MLP([f_i, local_i])
S(p) = Σ w_i S_i(p)

이쪽이 개념적으로 더 자연스럽지만 계산량이 더 크다. default config에서는 weighted_first=True이다.

adjust_map(): PGO 이후 neural point map 변형

PIN-SLAM에서 map이 “elastic”하다고 말할 수 있는 이유는 adjust_map() 때문이다.

PGO 전후 pose 차이가 frame마다 주어지면, 각 neural point의 timestamp를 기준으로 어떤 pose correction을 받을지 정한다.

used_ts = self.point_ts_create
self.neural_points = transform_batch_torch(
    self.neural_points, pose_diff_torch[used_ts]
)

orientation도 같이 바꾼다.

diff_quat_torch = rotmat_to_quat(pose_diff_torch[:, :3, :3])
self.point_orientations = quat_multiply(
    diff_quat_torch[used_ts], self.point_orientations
)

이게 중요한 이유는, loop closure 이후 trajectory가 바뀌었을 때 map point들도 같이 이동해야 하기 때문이다. 그렇지 않으면 pose graph는 좋아졌는데 map은 여전히 과거 odometry drift를 품고 있는 이상한 상태가 된다.

recreate_hash(): Map 변경 이후 Hash 재구성

PGO로 neural point 위치가 바뀌거나, pruning/merging을 하면 hash table도 더 이상 맞지 않는다. 그래서 recreate_hash()가 필요하다.

이 함수는 현재 neural point 위치를 기준으로 voxel hash를 다시 만든다. 옵션에 따라 duplicate neural point를 유지할 수도 있고, voxel 단위로 merge할 수도 있다.

neural_points.recreate_hash(...)

main loop에서 PGO 이후 바로 이 함수를 부르는 이유도 여기에 있다. map point 위치가 바뀌었으니, 다음 tracking/mapping query가 맞게 동작하도록 hash를 새로 만들어야 한다.

이 글의 핵심 정리

PIN-SLAM의 map representation은 다음처럼 이해하면 된다.

Neural point i:
  position x_i
  orientation q_i
  latent feature f_i
  timestamp / certainty

Query point p:
  주변 neural point K개 탐색
  local vector p - x_i 계산
  [f_i, p - x_i]를 Decoder에 입력
  SDF prediction들을 거리 기반으로 조합

초보자 입장에서 가장 중요한 점은 두 가지이다.

NeuralPoints는 sparse map data structure이다.
Decoder는 그 map을 SDF field로 읽어내는 작은 neural function이다.

즉 PIN-SLAM의 implicit map은 “MLP 하나”도 아니고 “point cloud 하나”도 아니다. latent feature를 가진 sparse neural points와 shared decoder의 조합이다.

다음 글에서는 이 neural map이 실제 SLAM loop에서 어떻게 쓰이는지 본다. 특히 Tracker가 SDF gradient로 pose를 어떻게 고치고, Mapper가 LiDAR ray sample로 SDF map을 어떻게 학습하는지 살펴볼 것이다.

PIN-SLAM 코드 읽기 시리즈입니다.

PIN-SLAM 코드 읽기 - 2. Neural Point Map과 Decoder

SDF map은 어디에 저장되고 어떻게 query되는가