Briefly talk about strings and byte arrays in Golang

Foreword

String is one of the most commonly used basic data types in Go. Although a string is often treated as a single value, it is actually a contiguous block of memory, which we can also think of as an array of characters. Another Go type closely related to strings is byte. Most readers are already familiar with it, so it is not introduced in detail here.

This section introduces the implementation of these two basic types and the conversions between them. The focus remains on strings, because string is the basic type we work with most often, whereas byte is simply an alias for uint8, so strings are given the most space. Note that this article does not spend much time on UTF-8 or encoding; the main focus is on the structure of strings and the implementation of common operations.

Although string is a basic type in Go, it is actually an array of characters. In C, a string can be represented by char[]. As an array, it occupies a contiguous block of memory that stores the bytes which together form the string. A string in Go is a read-only slice of bytes. In memory, the read-only string "hello" is five consecutive bytes: h, e, l, l, o.
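To make the byte-level view concrete, here is a minimal sketch that reads a string one byte at a time. The helper name byteValues is made up for this illustration and is not part of any library.

```go
package main

import "fmt"

// byteValues (a hypothetical helper) returns the individual bytes stored in s.
func byteValues(s string) []byte {
	out := make([]byte, 0, len(s)) // len(s) is the length in bytes
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // indexing a string yields a byte
	}
	return out
}

func main() {
	// The "% x" verb prints each byte in hex, separated by spaces.
	fmt.Printf("% x\n", byteValues("hello")) // prints "68 65 6c 6c 6f"
}
```

The hex values are the same ones that appear in the compiler output for the hello string shown later in this section.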

A string literal that appears in the code is marked as read-only data with the SRODATA symbol flag during compilation. Suppose we have the following code, which contains a string. When we compile it to assembly, we can see that the hello string carries the SRODATA tag:

 $ cat main.go
 package main

 func main() {
  str := "hello"
  println([]byte(str))
 }

 $ GOOS=linux GOARCH=amd64 go tool compile -S main.go
 ...
 go.string."hello" SRODATA dupok size=5
  0x0000 68 65 6c 6c 6f hello
 ...

However, this only shows that strings known at compile time are placed directly in read-only memory, and that this memory is never changed. At runtime we can still copy that memory to the heap or the stack, convert the variable to []byte, modify it, and then convert it back to string. Directly modifying the memory of a string variable, however, is not supported by Go.
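The round trip through []byte can be sketched as follows. The function name replaceFirst is an assumption made for this example; the point is only that the original string is left untouched.

```go
package main

import "fmt"

// replaceFirst (a hypothetical helper) returns a copy of s whose first byte
// is replaced by c. Writing s[0] = c directly does not compile, because
// strings are read-only, so the change goes through a []byte copy.
func replaceFirst(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the string's bytes into writable memory
	b[0] = c
	return string(b) // converts the modified bytes back to a string
}

func main() {
	orig := "hello"
	changed := replaceFirst(orig, 'j')
	fmt.Println(orig, changed) // prints "hello jello"
}
```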

Besides today's protagonist, string, the supporting character byte also deserves a brief mention. A byte is easy to understand: each byte is 8 bits. Anyone with a little programming experience will be familiar with the concept, and there is not much to say about byte arrays either, so they are skipped here.

The representation of strings in Go is actually very simple. At runtime, each string is represented by the following StringHeader structure. Internally there is also a private structure, stringHeader, with exactly the same layout; it is used only to store data, and its Data field uses the unsafe.Pointer type:

 type StringHeader struct {
  Data uintptr
  Len int
 }

Why do we say that a string is actually a read-only slice? We can take a look at the runtime representation of slices in the Go language:

 type SliceHeader struct {
  Data uintptr
  Len int
  Cap int
 }

The SliceHeader structure that represents slices is very similar to the StringHeader structure that represents strings. Compared with a slice, a string lacks only the Cap field that represents capacity. This is because strings are read-only: we never append elements directly to a string and change its memory in place. Every append-like operation is done by copying.
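One way to observe the missing Cap field is to compare the sizes of the two headers. This is a small sketch, assuming the standard gc toolchain; headerWords is a name invented for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (a hypothetical helper) reports how many machine words the
// string header and the []byte header occupy.
func headerWords() (strWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0)) // size of one machine word
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	s, b := headerWords()
	// A string carries a pointer and a length; a slice adds a capacity.
	fmt.Println(s, b) // prints "2 3"
}
```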

The parsing of string literals is done by the parser during lexical analysis. The lexical analysis stage splits and groups the characters of the source file, converting the raw, meaningless character stream into a sequence of tokens. Go has two literal forms for declaring a string: one uses double quotes, the other uses backticks:

 str1 := "this is a string"
 str2 := `this is another
 string`

Strings declared with double quotes are much like strings in other languages. They can only be used for single-line strings, and if a double quote appears inside the string it must be escaped with \ to avoid a compiler parsing error. Strings declared with backticks are free of the single-line restriction, and because double quotes no longer mark the beginning and end of the string, we can use " directly inside it. This is very convenient when writing JSON or other data formats by hand.
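As a short illustration of the difference, the same JSON text can be written in both forms. The function name jsonLiterals and the sample JSON are made up for this sketch.

```go
package main

import "fmt"

// jsonLiterals (a hypothetical helper) returns the same JSON text written
// once as an interpreted literal and once as a raw literal.
func jsonLiterals() (interpreted, raw string) {
	interpreted = "{\"name\": \"gopher\"}" // inner quotes must be escaped
	raw = `{"name": "gopher"}`             // backticks need no escaping
	return interpreted, raw
}

func main() {
	a, b := jsonLiterals()
	fmt.Println(a == b) // prints "true"
}
```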

The two different declaration methods mean that the Go compiler must be able to distinguish and correctly parse both string formats during the parsing stage. The scanner used to parse the string ...

 ...aOnStack(a[idx])) {
         return a[idx]
     }
     s, b := rawstringtmp(buf, l)
     for _, x := range a {
         copy(b, x)
         b = b[len(x):]
     }
     return s
 }

If there is exactly one non-empty string, and either that string is not on the stack or the result does not escape the calling frame, the string can be returned directly without any expensive work.

In the normal case, however, copy is called for each of the original strings to copy them into the memory of the target string. The new string lives in newly allocated memory and has no connection to the original strings.
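The copying described above can be modeled in ordinary Go. This is a simplified sketch, not the runtime's code: the counting loop is an assumption, since that part of the function is not shown here, and concatModel is a made-up name.

```go
package main

import "fmt"

// concatModel is a simplified, user-level model of string concatenation.
// It is not the runtime implementation.
func concatModel(parts ...string) string {
	total, count, idx := 0, 0, 0
	for i, p := range parts {
		if len(p) == 0 {
			continue // empty strings contribute nothing
		}
		total += len(p)
		count++
		idx = i
	}
	if count == 0 {
		return ""
	}
	if count == 1 {
		return parts[idx] // a single non-empty string needs no copy
	}
	buf := make([]byte, total) // new memory for the result
	b := buf
	for _, p := range parts {
		copy(b, p)     // copy this part into the remaining space
		b = b[len(p):] // advance past the bytes just written
	}
	return string(buf)
}

func main() {
	fmt.Println(concatModel("foo", "", "bar")) // prints "foobar"
}
```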

Type conversion

When we use Go to parse and serialize data formats such as JSON, we often convert variables back and forth between strings and byte arrays. The cost of these conversions is not as small as one might imagine: functions such as slicebytetostring often show up in flame graphs. This is the function that converts a byte array into a string; an expression like string(bytes) is turned into a call to slicebytetostring at compile time. The function body first handles two common cases, where the length in bytes is 0 or 1:

 func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
     l := len(b)
     if l == 0 {
         return ""
     }
     if l == 1 {
         stringStructOf(&str).str = unsafe.Pointer(&staticbytes[b[0]])
         stringStructOf(&str).len = 1
         return
     }

     var p unsafe.Pointer
     if buf != nil && len(b) <= len(buf) {
         p = unsafe.Pointer(buf)
     } else {
         p = mallocgc(uintptr(len(b)), nil, false)
     }
     stringStructOf(&str).str = p
     stringStructOf(&str).len = len(b)
     memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
     return
 }

After handling these cases, the function decides, based on the size of the buffer passed in, whether new memory must be allocated for the string. stringStructOf converts the string pointer into a pointer to a stringStruct; the function then sets the structure's str pointer and its len, and finally memmove copies all bytes of the original byte array into the new memory.
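The effect of this copy is visible from ordinary code: once the string has been created, later writes to the byte slice do not show through. A minimal sketch, with convertThenMutate as an invented name:

```go
package main

import "fmt"

// convertThenMutate (a hypothetical helper) converts b to a string and then
// overwrites the first byte of b. The returned string keeps the old contents.
func convertThenMutate(b []byte) string {
	s := string(b) // the bytes are copied into the string's own memory
	if len(b) > 0 {
		b[0] = 'j' // modifies only the slice
	}
	return s
}

func main() {
	b := []byte("hello")
	s := convertThenMutate(b)
	fmt.Println(s, string(b)) // prints "hello jello"
}
```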

The conversion from string to byte array uses the stringtoslicebyte function. The implementation of this function is very simple:

 func stringtoslicebyte(buf *tmpBuf, s string) []byte {
     var b []byte
     if buf != nil && len(s) <= len(buf) {
         *buf = tmpBuf{}
         b = buf[:len(s)]
     } else {
         b = rawbyteslice(len(s))
     }
     copy(b, s)
     return b
 }

Depending on the length of the string, it either uses the buffer passed in or calls rawbyteslice to create a new byte slice. The built-in copy function then copies the contents of the string into the new byte array.

Although a string and a byte array may hold the same contents, the contents of a string are read-only: we cannot change the data stored in memory through indexing or any other means, whereas the contents of a byte slice can be both read and written. So whichever direction the conversion goes, the contents must be copied. The cost of this memory copy grows with the length of the string or byte array, so pay attention to performance when doing such conversions.

A string is a relatively simple data structure in Go. As a read-only data type, its structure cannot be changed, but type conversions can become a performance bottleneck. In scenarios that demand extreme performance, minimize conversions between types to avoid the extra overhead.
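One common way to reduce conversions is to stay in a single representation and use the matching standard library package. The sketch below counts separators in a byte payload both ways; the function names are made up, and whether the converting version really copies depends on the compiler version, so treat it as an illustration rather than a measurement.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// countWithConversion converts the payload to a string first, which may
// copy the whole payload before counting.
func countWithConversion(payload []byte, sep string) int {
	return strings.Count(string(payload), sep)
}

// countWithoutConversion stays in []byte and uses the bytes package,
// so no string conversion is needed.
func countWithoutConversion(payload, sep []byte) int {
	return bytes.Count(payload, sep)
}

func main() {
	payload := []byte("a,b,c")
	fmt.Println(countWithConversion(payload, ","))            // prints "2"
	fmt.Println(countWithoutConversion(payload, []byte(","))) // prints "2"
}
```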

This work is licensed. When reprinting, please cite the original link. When using an image, keep all of its content; you may scale it appropriately, and please attach a link to the article that contains the image. The images were drawn with Sketch.

Okay, that’s the entire content of this article. I hope the content of this article has certain reference and learning value for everyone’s study or work. Thank you for your support.

This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/784269

author: admin
