Conclusion first: this code shows that Go's runtime and standard library are engineered to a very high standard. Start with this small program:
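package main

import "fmt"

var c = make(chan bool)
var d = make(chan bool)

func get(b []byte) {
	close(c) // tell the other goroutine that my argument has been passed in, so it can modify the data
	<-d      // wait for the other goroutine to finish its modification
	fmt.Println(b[3])
}

func main() {
	b := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	go func(b []byte) {
		<-c // wait until the main goroutine has called get and passed its argument
		b[3] = 8
		close(d) // tell the main goroutine that the modification is done
		fmt.Println(b[3])
	}(b)
	get(b[:4])
	fmt.Println("Hello World")
}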
The point of this code is that an argument of the form b[s:e] is passed without copying the underlying data, because a Go []byte is really a struct like this:
type sliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}
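One pointer refers to the actual data, and two ints give the number of elements in the slice and its capacity. So passing a []byte really passes just one pointer and two ints, 24 bytes on a 64-bit platform. Why bring this up? Because Go's standard library uses this idiom heavily: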
// NewScanner returns a new Scanner to read from r.
// The split function defaults to ScanLines.
func NewScanner(r io.Reader) *Scanner {
	return &Scanner{
		r:            r,
		split:        ScanLines,
		maxTokenSize: MaxScanTokenSize,
	}
}
type Scanner struct {
	r            io.Reader // The reader provided by the client.
	split        SplitFunc // The function to split the tokens.
	maxTokenSize int       // Maximum size of a token; modified by tests.
	token        []byte    // Last token returned by split.
	buf          []byte    // Buffer used as argument to split.
	start        int       // First non-processed byte in buf.
	end          int       // End of data in buf.
	err          error     // Sticky error.
	empties      int       // Count of successive empty tokens.
	scanCalled   bool      // Scan has been called; buffer is in use.
	done         bool      // Scan has finished.
}
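As you can see, both buf and token in bufio.Scanner are []byte, and the standard library's internal implementation works on them without copying. The Go implementers of course would not write a Scan full of copies the way we beginners might, for example: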
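#include <string>

class Scanner {
private:
    std::string buff;
    std::string token;

public:
    bool GetLine() {
        // ... locate the line break, handling both CRLF and LF ...
        // Assuming s and e are the start and end offsets of the token found above:
        // substr(pos, count) copies count bytes into a new string.
        token = buff.substr(s, e - s);
        // ...
    }
};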
That is where a copy happens. Now for the map, and this line from the original post:
counts[input.Text()]++
input.Text() is defined as:
// Text returns the most recent token generated by a call to Scan
// as a newly allocated string holding its bytes.
func (s *Scanner) Text() string {
	return string(s.token)
}
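The difference between the copying Text and the non-copying token slice can be observed through the public API. This is just a sketch: firstToken is a hypothetical helper, and writing into the slice returned by Bytes is done here only to make the aliasing visible, not as a recommended practice.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// firstToken scans the first line of input and returns the string taken
// before the token bytes are mutated and the string taken afterwards.
// (Hypothetical helper, for illustration only.)
func firstToken(input string) (before, after string) {
	s := bufio.NewScanner(strings.NewReader(input))
	if !s.Scan() {
		return "", ""
	}
	before = s.Text() // string(s.token): an independent copy of the bytes
	tok := s.Bytes()  // the token slice itself, no allocation
	tok[0] = 'X'      // write into the scanner's buffer through the alias
	after = s.Text()  // a fresh copy of the now-modified token
	return before, after
}

func main() {
	before, after := firstToken("hello\nworld\n")
	fmt.Println(before) // hello
	fmt.Println(after)  // Xello
}
```

The string returned by the first Text call is unaffected by the later write, which is exactly the copy that the following paragraphs are concerned with.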
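A string in Go is simply a read-only sequence of bytes (a pointer plus a length), and the conversion from []byte to string is performed by the runtime: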
// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, b []byte) string {
	l := len(b)
	if l == 0 {
		// Turns out to be a relatively common case.
		// Consider that you want to parse out data between parens in "foo()bar",
		// you find the indices and convert the subslice to string.
		return ""
	}
	if raceenabled && l > 0 {
		racereadrangepc(unsafe.Pointer(&b[0]),
			uintptr(l),
			getcallerpc(unsafe.Pointer(&buf)),
			funcPC(slicebytetostring))
	}
	if msanenabled && l > 0 {
		msanread(unsafe.Pointer(&b[0]), uintptr(l))
	}
	s, c := rawstringtmp(buf, l)
	copy(c, b)
	return s
}
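This raises a question: converting to string has a copy cost, so does using a string every time we feed data into the map mean a lot of copying? In fact the compiler is not that naive. The runtime provides the following function for the compiler to use: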
func slicebytetostringtmp(b []byte) string {
	// Return a "string" referring to the actual []byte bytes.
	// This is only for use by internal compiler optimizations
	// that know that the string form will be discarded before
	// the calling goroutine could possibly modify the original
	// slice or synchronize with another goroutine.
	// First such case is a m[string(k)] lookup where
	// m is a string-keyed map and k is a []byte.
	// Second such case is "<"+string(b)+">" concatenation where b is []byte.
	// Third such case is string(b)=="foo" comparison where b is []byte.

	if raceenabled && len(b) > 0 {
		racereadrangepc(unsafe.Pointer(&b[0]),
			uintptr(len(b)),
			getcallerpc(unsafe.Pointer(&b)),
			funcPC(slicebytetostringtmp))
	}
	if msanenabled && len(b) > 0 {
		msanread(unsafe.Pointer(&b[0]), uintptr(len(b)))
	}
	return *(*string)(unsafe.Pointer(&b))
}
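slicebytetostringtmp avoids the awkwardness of copying a string that is only needed temporarily. (Per its comment, this applies when the conversion appears directly in a lookup such as m[string(k)]; a key that is actually stored in the map still has to be copied.) As for the final for loop that prints the output: in C++ you would have to use references, as @陈硕 points out, to avoid needless copies. In Go, a string is already something like a pointer to an array, so you can simply loop the way the original post does.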
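The first case named in the runtime comment, an m[string(k)] lookup, can be measured with testing.AllocsPerRun. This is a rough sketch rather than a benchmark: lookupAllocs and sink are hypothetical names, and the allocation count is a property of the gc compiler's optimization, not something the language specification guarantees.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the lookup result in use. (Hypothetical name.)
var sink int

// lookupAllocs returns the looked-up value and the average number of heap
// allocations per m[string(key)] lookup. (Hypothetical helper.)
func lookupAllocs() (int, float64) {
	// The key is longer than 32 bytes, so a real conversion could not be
	// served from a small stack buffer.
	key := []byte("a key that is longer than thirty-two bytes in total")
	m := map[string]int{string(key): 42} // the stored key is a real copy
	allocs := testing.AllocsPerRun(1000, func() {
		// The conversion appears directly in the index expression,
		// which is the pattern the runtime comment describes.
		sink = m[string(key)]
	})
	return sink, allocs
}

func main() {
	v, allocs := lookupAllocs()
	fmt.Println("value:", v, "allocations per lookup:", allocs)
}
```

With the standard gc toolchain I would expect the reported allocation count to be zero, because the lookup can use the bytes in place instead of building a new string.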
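In summary, Go makes it easy to write reasonably good code, thanks to its strong runtime and compiler. Note: all standard library and runtime code quoted above is from go1.7.1.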