Sun Haozhe's Blog
Entropy:
\[H(X) = - \int_\mathcal{X} P(x) \log P(x) dx\]
Cross entropy:
In information theory, the cross entropy between two probability distributions $P$ and $Q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution $Q$, rather than the true $P$.
\[H(P, Q) = - \int_\mathcal{X} P(x) \log Q(x) dx\]
The definition may be formulated using the Kullback–Leibler divergence $D_{\text{KL}}(P || Q)$ of $Q$ from $P$ (also known as the relative entropy of $P$ with respect to $Q$):
\[H(P, Q) = H(P) + D_{\text{KL}}(P || Q)\]
Kullback–Leibler divergence (also known as the relative entropy of $P$ with respect to $Q$):
\[D_{\text{KL}}(P || Q) = \int_{\mathcal{X}} P(x) \log \frac{P(x)}{Q(x)} dx\]
$D_{\text{KL}}$ achieves its minimum value of zero when $P(x) = Q(x)$ everywhere.
It is clear from the formula that the KL divergence is asymmetric. In cases where $P(x)$ is close to zero but $Q(x)$ is significantly non-zero, $Q$'s effect is disregarded. This can give misleading results when we just want to measure the similarity between two equally important distributions.
However, if we are trying to approximate a complex (intractable) distribution $Q(x)$ by a (tractable) distribution $P(x)$, we want to be absolutely sure that any $x$ that would be very improbable under $Q(x)$ is also very improbable under $P(x)$. When $P(x)$ is small but $Q(x)$ is not, that is fine. But when $Q(x)$ is small, the integrand $P(x) \log \frac{P(x)}{Q(x)}$ grows very rapidly unless $P(x)$ is also small. So if we look for $P(x)$ by minimizing the KL divergence $D_{\text{KL}}(P || Q)$, it is very unlikely that $P(x)$ will assign a lot of mass to regions where $Q(x)$ is near zero.
The Jensen–Shannon divergence is another measure of similarity between two probability distributions. The JS divergence is symmetric and smoother. It is defined by:
\[\text{JSD}(P || Q) = \frac{1}{2}D_{\text{KL}}(P || \frac{P+Q}{2}) + \frac{1}{2}D_{\text{KL}} (Q || \frac{P+Q}{2})\]
Some believe (Huszar, 2015) that one reason behind GANs' big success is switching the loss function from the asymmetric KL divergence in the traditional maximum-likelihood approach to the symmetric JS divergence.
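As a small illustration of the asymmetry of KL and the symmetry of JS on discrete distributions, here is a minimal NumPy sketch (the distributions `p` and `q` are arbitrary examples, not from the original post):

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.1, 0.2, 0.7])
q = np.array([0.3, 0.3, 0.4])

print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric: the two values differ
print(js_divergence(p, q), js_divergence(q, p))  # symmetric: the two values are equal
```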
Arithmetic between a PyTorch tensor and a NumPy array (without explicit casting) is not allowed.
For the following experiments:
PyTorch version: 1.2.0.dev20190611
Numpy version: 1.16.4
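As a minimal sketch (not the original experiment code), explicit casting between the two types can be done in either direction; the behavior of un-casted mixed arithmetic has changed across releases, so only the explicit conversions are shown:

```python
import numpy as np
import torch

t = torch.ones(3)                        # torch.FloatTensor (float32)
a = np.ones(3)                           # numpy array (float64)

# Explicit casting in both directions:
print(t + torch.from_numpy(a).float())   # cast the numpy array to a float32 tensor first
print(t.numpy() + a)                     # or cast the tensor to a numpy array first
```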
According to Can't call numpy() on Variable that requires grad: "Moving to numpy will break the computation graph and so no gradient will be computed. If you don't actually need gradients, then you can explicitly `.detach()` the Tensor that requires grad to get a tensor with the same content that does not require grad. This other Tensor can then be converted to a numpy array."
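A minimal sketch of that conversion (assuming a CPU tensor; add `.cpu()` before `.numpy()` for a CUDA tensor):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).detach()   # same values, but detached from the computation graph
a = y.numpy()          # now the conversion to a numpy array is allowed
print(a)
```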
`size_t` is defined by the standard C library; it is `unsigned int`, and on 64-bit systems it is `long unsigned int`.
`size_t` is an unsigned integer type: it never takes negative values (array indices never need to be negative), and its positive range is twice that of the corresponding signed integer type.
Uses of typeid in C++:
In C++, `*` and `&` each play two roles: pointer declaration `*`, reference declaration `&`, dereference `*`, and address-of `&`.

`&` can be used to take an address: `&a` returns the address of `a`, where `a` is an object. `&` maps an object to its address. In this case, `&` can be called the address-of operator.

`*` can be used as the dereference operator: `*p` returns the object that `p` points to, where `p` is an address. `*` maps an address to an object. In this case, `*` can be called the indirection operator. The indirection operator returns the value of the variable located at the address specified by its operand.

The indirection operator `*` is the complement of the address-of operator `&`.
Experiments using Python 3.7.3
From https://stackoverflow.com/questions/29854398/seeding-random-number-generators-in-parallel-programs
If no seed is provided explicitly, `numpy.random` will seed itself using an OS-dependent source of randomness. Usually it will use `/dev/urandom` on Unix-based systems (or some Windows equivalent), but if this is not available for some reason then it will seed itself from the wall clock. Since self-seeding occurs at the time when a new subprocess forks, it is possible for multiple subprocesses to inherit the same seed if they forked at the same time, leading to identical random variates being produced by different subprocesses.
The following text is adapted from [Python, NumPy, Pytorch中的多进程中每个进程的随机化种子误区](https://blog.csdn.net/xiaojiajia007/article/details/90207113) with some modifications.
Python's built-in `random` module ends up with different seeds in different subprocesses, whereas with `numpy.random` each subprocess forks (inherits) the same seed from the main process. In PyTorch, the `DataLoader` class's `__getitem__()` runs with a different torch seed in each subprocess, and that seed depends on the worker id (see the `worker_init_fn` argument). These three RNGs do not affect one another and must be handled independently. Therefore, when writing your own data-preparation code, if you use NumPy's random utilities, be sure to explicitly re-seed them in each subprocess, or generate randomness with Python's `random` module instead.
Experiments were run on Linux-4.9.125-linuxkit-x86_64-with-Ubuntu-18.04-bionic (in fact, inside a Docker container) with Python 3.6.8; the system had 4 physical cores with 4 hyperthreads, thus 8 logical cores.
Using the `numpy.random` module, without seeding. Identical random sequences across subprocesses; the experiment is not reproducible:
Using the `numpy.random` module, seeding with no arguments. Different random sequences across subprocesses; the experiment is not reproducible:
Using the `numpy.random` module, seeding with `None`. Different random sequences across subprocesses; the experiment is not reproducible:
Using the `numpy.random.RandomState` class, seeding with no arguments. Different random sequences across subprocesses; the experiment is not reproducible:
Using the `numpy.random.RandomState` class, seeding with `None`. Different random sequences across subprocesses; the experiment is not reproducible:
Calling `np.random.seed()` within a subprocess forces the thread-local RNG (Random Number Generator) instance to seed itself again from `/dev/urandom` or the wall clock, which will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or `numpy.random.RandomState` instance) to each subprocess.
Using the `numpy.random.RandomState` class, seeding with different seeds explicitly passed to the subprocesses. Different random sequences across subprocesses; the experiment is reproducible:
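A minimal sketch of this pattern (the worker function and seed values are illustrative, not the original experiment code):

```python
import multiprocessing as mp
import numpy as np

def worker(seed):
    rng = np.random.RandomState(seed)   # one independent, explicitly seeded RNG per subprocess
    print(seed, rng.randint(0, 100, size=3))

if __name__ == "__main__":
    seeds = [10, 11, 12, 13]             # a different, fixed seed for each subprocess
    with mp.Pool(processes=4) as pool:
        pool.map(worker, seeds)
```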
Using Python's default `random` module, without seeding. Different random sequences across subprocesses; the experiment is not reproducible:
Using Python's default `random` module, seeding with no arguments. Different random sequences across subprocesses; the experiment is not reproducible:
Solutions for getting truly random samples from a multi-worker `DataLoader` in PyTorch:
https://discuss.pytorch.org/t/does-getitem-of-dataloader-reset-random-seed/8097/7. Besides the option of solving it with Python's `random` module, you can instead add this line to the top of your main script (and you need to use Python 3):
Solution for getting truly random samples from a multi-worker `DataLoader` in PyTorch (updated 2022-07-16):

This bug only shows up when both of the following conditions are met:

- PyTorch < 1.9. Generating random numbers with `torch` or with the `random` library is fine; only `numpy` is problematic. For PyTorch >= 1.9, after the official fix, nobody is affected anymore.
- The `__getitem__` method uses NumPy's random numbers.

The `DataLoader` constructor has an optional argument `worker_init_fn`. Before loading data, each worker subprocess calls this function first, so we can set NumPy's seed inside `worker_init_fn`. Another point to note: by default, each worker subprocess is killed at the end of an epoch and all of its resources are lost. When a new epoch starts, the random state of the main process has not changed and is used to initialize the workers again, so the workers' random seeds are exactly the same as in the previous epoch. We therefore need a seed that changes with the epoch number, which is hard to achieve in practice, because inside `worker_init_fn` there is no way to know which epoch we are in. Fortunately, `torch.initial_seed()` meets this need. This is in fact also the officially recommended practice: https://pytorch.org/docs/stable/notes/randomness.html#dataloader

Why does `torch.initial_seed()` work? In a worker, `torch.initial_seed()` returns torch's current random seed, namely `base_seed + worker_id`. Because the main process regenerates a `base_seed` at the beginning of every epoch, `base_seed` is a random number that changes from epoch to epoch. In addition, `torch.initial_seed()` returns a `long int`, whereas NumPy only accepts a `uint` in `[0, 2**32 - 1]`, so we need to take it modulo `2**32`. If you generate random numbers with `torch` or `random` instead of `numpy`, you do not need to worry about this problem at all, because PyTorch already sets the `torch` and `random` seeds of each worker to `base_seed + worker_id`.
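A minimal sketch of this recommendation (the dataset is a dummy placeholder; the key part is the `worker_init_fn`):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # NumPy randomness inside __getitem__: exactly the problematic case described above
        return np.random.randint(0, 1000)

def seed_worker(worker_id):
    # torch.initial_seed() == base_seed + worker_id, and base_seed changes every epoch
    worker_seed = torch.initial_seed() % 2**32   # NumPy only accepts seeds in [0, 2**32 - 1]
    np.random.seed(worker_seed)

if __name__ == "__main__":
    loader = DataLoader(RandomDataset(), batch_size=4, num_workers=2,
                        worker_init_fn=seed_worker)
    for epoch in range(2):
        for batch in loader:
            print(epoch, batch)
```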
Selected comments under the Zhihu (知乎) article 可能95%的人还在犯的PyTorch错误: one commenter notes that nowadays you can simply use `np.random.default_rng` to obtain random numbers.

Backward compatibility, sometimes also called downward compatibility (向下兼容), is a property of a system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system. Modifying a system in a way that does not allow backward compatibility is sometimes called "breaking" backward compatibility.
Forward compatibility or upward compatibility is a design characteristic that allows a system to accept input intended for a later version of itself.
This experiment was run on Linux-4.9.125-linuxkit-x86_64-with-Ubuntu-18.04-bionic (in fact, inside a Docker container) with Python 3.6.8; the system had 4 physical cores with 4 hyperthreads, thus 8 logical cores.
An incorrect way to do it:
The output was
By doing so, only 1 of the 8 cores was used at 100%, whereas the other 7 cores were almost at 0% (checked with the Linux command `top`). At any given time, only 100% (instead of 800%) of CPU was used, even though this 100% of CPU load could move from one core to another each time a new process started.
The correct way to do it:
The output of the correct way was:
By using the correct way, all 8 cores were used at 100% (checked with the Linux command `top`).
The difference is the following:
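One common way to end up with the serialized behavior described above is to join each process right after starting it. The following sketch (an illustrative guess using the standard multiprocessing module, not the original code) contrasts the two patterns:

```python
import multiprocessing as mp

def burn_cpu(n):
    # a pure-Python loop that keeps one core busy
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == "__main__":
    # Incorrect pattern: each process is joined immediately after being started,
    # so only one process runs at a time (one core at 100%, the others idle).
    for _ in range(8):
        p = mp.Process(target=burn_cpu, args=(10_000_000,))
        p.start()
        p.join()

    # Correct pattern: start all processes first, then join them,
    # so the 8 processes run in parallel (all cores at 100%).
    procs = [mp.Process(target=burn_cpu, args=(10_000_000,)) for _ in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```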
- `a[start:stop]`: items start through stop-1
- `a[start:]`: items start through the rest of the array
- `a[:stop]`: items from the beginning through stop-1
- `a[:]`: a copy of the whole array
- `a[-1]`: last item in the array
- `a[-2:]`: last two items in the array
- `a[:-2]`: everything except the last two items
- `a[::-1]`: all items in the array, reversed
- `a[1::-1]`: the first two items, reversed
- `a[-3::-1]`: everything except the last two items, reversed
- `a[:-3:-1]`: the last two items, reversed
- `a[::2]`: elements of the list at even positions
- `a[1::2]`: elements of the list at odd positions
- `a[::3]`: every third element, starting from position 0
- `a[1::3]`: every third element, starting from position 1

If we adopt the notation `[start:stop:step]`, `start` is always inclusive and `stop` is always exclusive.
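A quick sketch checking a few of these rules (the list `a` is an arbitrary example):

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(a[2:5])     # [2, 3, 4]                    items 2 through 4
print(a[-2:])     # [8, 9]                       last two items
print(a[::-1])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  whole list reversed
print(a[:-3:-1])  # [9, 8]                       last two items, reversed
print(a[1::2])    # [1, 3, 5, 7, 9]              elements at odd positions
```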
One can substitute `None` for any of the empty spaces. For example, `a[None:None]` makes a whole copy. This is useful when you need to specify the end of the range using a variable and need to include the last item.
Slicing built-in types returns a copy, but that's not universal. Notably, [slicing NumPy arrays](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) returns a view that shares memory with the original.
| English | French | Chinese | Scientific notation |
|---|---|---|---|
| million | million | 百万 | $10^6$ |
| billion | milliard | 十亿 | $10^9$ |
| trillion | billion | 万亿 (兆) | $10^{12}$ |
| quadrillion | - | 千万亿 (千兆) | $10^{15}$ |
| quintillion | trillion | 百亿亿 (百京) | $10^{18}$ |
For English and French:
The long scale (长级差制) and the short scale (短级差制) are two of several large-number naming systems for integer powers of ten that use the same words with different meanings. The long scale is based on powers of one million, whereas the short scale is based on powers of one thousand.
Most English-language countries and regions use the short scale with $10^9$ being billion.
The traditional long scale is used by most Continental European countries and by most other countries whose languages derive from Continental Europe (with the notable exceptions of Albania, Greece, Romania, and Brazil). These countries use a word similar to billion to mean $10^{12}$. Some use a word similar to milliard to mean $10^9$, while others use a word or phrase equivalent to thousand millions.
For Chinese:
The names for large numbers above 亿 come from the Sunzi Suanjing (《孙子算经》). The Sunzi Suanjing advances these large-number names by factors of 万万 (i.e. $10^8$), but later generations also used other progressions.
| Chinese character | Pinyin | Value | Notes |
|---|---|---|---|
| 千 | qiān | $10^3$ | |
| 万 | wàn | $10^4$ | |
| 亿 | yì | $10^8$ | In ancient times, 亿 could also stand for $10^5$ (see the various large-number systems). Also written 万万, as in the Treaty of Shimonoseki (马关条约): "平银贰万万两交与日本,作为赔偿军费" (two hundred million taels of silver paid to Japan as war indemnity). |
| 兆 | zhào | $10^{12}$ | In ancient times, 兆 could also stand for $10^6$ or $10^{16}$. Because 兆 can also mean "million" (mega), its usage is disputed; see the SI prefixes (国际单位制词头). |
| 京 | jīng | $10^{16}$ | In ancient times, 京 could also stand for $10^7$, $10^{24}$, or $10^{32}$. Also written 经. |
| 垓 | gāi | $10^{20}$ | In ancient times, 垓 could also stand for $10^8$, $10^{32}$, or $10^{64}$. |
| 秭 | zǐ | $10^{24}$ | In ancient times, 秭 could also stand for $10^9$, $10^{40}$, or $10^{128}$. Also written 杼. The character corresponding to the SI prefix (yotta) is actually 尧. |
| 穰 | ráng | $10^{28}$ | In ancient times, 穰 could also stand for $10^{10}$, $10^{48}$, or $10^{256}$. Also written 壤. |
| 沟 | gōu | $10^{32}$ | In ancient times, 沟 could also stand for $10^{11}$, $10^{56}$, or $10^{512}$. |
| 涧 | jiàn | $10^{36}$ | In ancient times, 涧 could also stand for $10^{12}$, $10^{64}$, or $10^{1024}$. |
| 正 | zhèng | $10^{40}$ | In ancient times, 正 could also stand for $10^{13}$, $10^{72}$, or $10^{2048}$. |
| 载 | zài | $10^{44}$ | In ancient times, 载 could also stand for $10^{14}$, $10^{80}$, or $10^{4096}$. |
兆 is a Chinese numeral. Depending on the system, it can represent one million ($10^6$), one trillion ($10^{12}$), or $10^{16}$. In Taiwan, Japan, and Korea, 兆 commonly stands for $10^{12}$. In mainland China, however, the meaning of 兆 often depends on context: as a unit in computing contexts such as network traffic or binary data sizes, 兆 is usually used for mega, i.e. one million, as in 兆字节 (megabyte, MB) or 兆字节每秒 (MB/s); when counting quantities, $10^{12}$ is usually called 万亿 instead, as in "中国电子讯息产业总收入达人民币5.6兆元" (i.e. 5.6 万亿 yuan, $5.6 \times 10^{12}$ yuan).
In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for rolling a k-sided die n times. For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.
The Bernoulli distribution models the outcome of a single Bernoulli trial. In other words, it models whether flipping a (possibly biased) coin one time will result in either a success (obtaining a head) or failure (obtaining a tail). The binomial distribution generalizes this to the number of heads from performing n independent flips (Bernoulli trials) of the same coin. The multinomial distribution models the outcome of n experiments, where the outcome of each trial has a categorical distribution, such as rolling a k-sided die n times.
In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution[1]) is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified.
The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.
The categorical distribution is the generalization of the Bernoulli distribution for a categorical random variable, i.e. for a discrete variable with more than two possible outcomes, such as the roll of a die. On the other hand, the categorical distribution is a special case of the multinomial distribution, in that it gives the probabilities of potential outcomes of a single drawing rather than multiple drawings.
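A small NumPy sketch of how these distributions relate (the parameters are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
p_heads = 0.3                      # biased coin
die_probs = [1/6] * 6              # fair 6-sided die

# Bernoulli: one coin flip (binomial with n=1)
print(rng.binomial(n=1, p=p_heads))

# Binomial: number of heads in n=10 flips of the same coin
print(rng.binomial(n=10, p=p_heads))

# Categorical ("multinoulli"): one roll of a k-sided die (multinomial with n=1)
print(rng.multinomial(n=1, pvals=die_probs))

# Multinomial: counts per face over n=10 rolls of the die
print(rng.multinomial(n=10, pvals=die_probs))
```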
One problem of terminology: logit
In statistics, the logit (/ˈloʊdʒɪt/ LOH-jit) function or the log-odds is the logarithm of the odds p/(1 − p) where p is the probability. It is a type of function that creates a map of probability values from $[0,1]$ to $(-\infty, +\infty)$. It is the inverse of the standard logistic function (sigmoid).
In deep learning, the term logits layer is popularly used for the last neuron layer of a neural network used for a classification task, which produces raw prediction values as real numbers ranging over $(-\infty, +\infty)$.
When one talks about deep learning for classification tasks, the output values of the next to last layer (the layer before the final softmax activation) are called logits.
History of this term:
There have been several efforts to adapt linear regression methods to domains where the output is a probability value in $[0,1]$ instead of an arbitrary real number in $(-\infty, +\infty)$. Many of these efforts focused on mapping the range $[0,1]$ to $(-\infty, +\infty)$ and then running linear regression on the transformed values. In 1934, Chester Ittner Bliss used the cumulative normal distribution function to perform this mapping and called his model probit, an abbreviation for "probability unit". However, this is computationally more expensive. In 1944, Joseph Berkson used the log of odds and called this function logit, an abbreviation for "logistic unit", following the analogy with probit. Log-odds had been used extensively by Charles Sanders Peirce (late 19th century). G. A. Barnard coined the commonly used term log-odds in 1949; the log-odds of an event is the logit of the probability of the event.
So “logit” actually means “logistic unit”.
Here is one piece of code from https://github.com/pytorch/pytorch/blob/master/torch/distributions/utils.py (2019-06-05)
Logits is an overloaded term which can mean many different things.
In mathematics, logit is log-odds, it is the inverse function of (standard) logistic function.
\[l = \log \frac{p}{1-p}\]
\[p = \frac{1}{1+ e^{-l}}\]
The odds of an event is the ratio of the probability that the event occurs to the probability that it does not occur.
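A quick numerical check that the logit and the standard logistic (sigmoid) function are inverses of each other (a minimal sketch):

```python
import numpy as np

def logit(p):
    # log-odds: maps probabilities in (0, 1) to real numbers
    return np.log(p / (1 - p))

def sigmoid(l):
    # standard logistic function: maps real numbers back to (0, 1)
    return 1 / (1 + np.exp(-l))

p = np.array([0.1, 0.5, 0.9])
print(logit(p))            # approximately [-2.197, 0., 2.197]
print(sigmoid(logit(p)))   # recovers [0.1, 0.5, 0.9]
```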
In neural networks, it is the vector of raw (non-normalized) predictions. In context of deep learning the logits layer means the layer that feeds in to softmax.
Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the log-odds mapping. In deep learning, people started calling the layer that feeds into the softmax (the inverse of the logit mapping) the "logits layer". Then people started calling the output values of this layer "logits", which creates the confusion around this term.
For details, see:
https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow
| English | French | Chinese | Property |
|---|---|---|---|
| queue | file | 队列 | First In First Out |
| stack | pile | 栈, 堆栈 | Last In First Out |
| heap | tas | 堆 | Tree-based data structure. In a heap, the highest (or lowest) priority element is always stored at the root. |
In Python 3 & PyTorch 1.0.0, `torch.LongTensor` and `torch.cuda.LongTensor` mean `int64`.
- `HalfTensor`: `float16`
- `FloatTensor`: `float32`
- `DoubleTensor`: `float64`
- `ByteTensor`: `uint8` (unsigned)
- `CharTensor`: `int8` (signed)
- `ShortTensor`: `int16` (signed)
- `IntTensor`: `int32` (signed)
- `LongTensor`: `int64` (signed)
One example of conversion from `LongTensor` to `FloatTensor`:
Attention:
`b.type()` equals `LongTensor`. The implicit type casting did not work because `type(a)` is `torch.Tensor` rather than a raw Python number or a NumPy array.
The solution is as follows:
Here, `b.type()` equals `FloatTensor`.
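A minimal sketch of such an explicit conversion (not the original snippet; the exact implicit-casting behavior depends on the PyTorch version):

```python
import torch

a = torch.tensor([1, 2, 3])    # torch.LongTensor (int64)
b = a * 2                      # still a LongTensor: no implicit cast to float here

c = a.float()                  # explicit conversion to FloatTensor (float32)
# equivalently: c = a.type(torch.FloatTensor)
print(b.type(), c.type())      # torch.LongTensor  torch.FloatTensor
```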
Don't forget to reset the indices of a pandas DataFrame after slicing operations. Otherwise, there might be key errors later.
Let's say `Xtr` is a pandas DataFrame with 6000 rows × 2 columns.
The correct way is the following:
The reason is that `.loc` is a label-based indexer: `0` is not interpreted as position `0` but as the label `0`.
However, after slicing, the index of each row remains the same.
When using `.loc` with slices, both the start point and the end point are included.
When using `.iloc` with slices, the start point is included while the end point is excluded (like the standard Python slicing convention).
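A minimal sketch of the pitfall and the fix (the DataFrame here is a small dummy, not the original `Xtr`):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})

sliced = df[df["a"] > 20]               # keeps rows with labels 2 and 3; the old index remains
# sliced.loc[0] would raise a KeyError: there is no label 0 anymore

fixed = sliced.reset_index(drop=True)   # indices become 0, 1 again
print(fixed.loc[0])                     # works: label 0 now exists

print(df.loc[1:2])    # .loc slice: labels 1 and 2, end point included
print(df.iloc[1:2])   # .iloc slice: position 1 only, end point excluded
```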
In Python, a `list` can contain objects of different types. For example:
Note, however, that when we convert a `list` to a `numpy array` via `np.array()`, `np.zeros_like()`, etc., if the `list` contains only `int` values, the resulting `numpy array` will have `dtype` `int64`, whereas if the `list` contains at least one `float`, the resulting `numpy array` will have `dtype` `float64`.
For example:
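A minimal sketch of this behavior (the dtypes shown are those of a 64-bit platform):

```python
import numpy as np

ints_only = [1, 2, 3]
with_float = [1, 2, 3.0]

print(np.array(ints_only).dtype)        # int64
print(np.array(with_float).dtype)       # float64
print(np.zeros_like(ints_only).dtype)   # int64
print(np.zeros_like(with_float).dtype)  # float64
```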
I discovered this issue through the following piece of code: