Strings in WebAssembly

Feb 4 ·16min read

The importance of strings

Computer programs can execute successfully using only numbers. However, to facilitate human-computer interaction, human-readable characters and words are required. This is especially the case when we consider how humans interact with applications on the Web. One strong example is that humans choose to use domain names, rather than numerical IP addresses, when they visit sites on the Web.

As the title of this article proclaims, we are going to talk about strings in WebAssembly (Wasm). Wasm is one of the most exciting computer programming technologies that we have seen in recent times. Wasm is a machine-close, platform-independent, low-level, assembly-like language (Reiser and Bläser, 2017), and is the first mainstream programming language to implement formal semantics, right from the start (Rossberg et al., 2018).

Interestingly, there are no native strings in WebAssembly code. More specifically, Wasm does not have a string data type.

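Wasm’s MVP (which will support only wasm32) has an ILP32 data model, and currently offers the following four data types:

  • i32, a 32-bit integer (the same size as C++’s long int under ILP32; Wasm integer types carry no sign, and signedness is chosen by each instruction)
  • i64, a 64-bit integer (the same size as C++’s long long int; again, signedness is chosen by each instruction)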
  • f32, 32-bit float (equivalent to C++’s float)
  • f64, 64-bit float (equivalent to C++’s double)

Whilst we will start to talk about the use of Wasm in the browser soon, it is important to always remember that fundamentally Wasm execution is defined in terms of a stack machine. The basic idea is that every type of instruction pushes and/or pops a certain number of i32, i64, f32, f64 values to and/or from the stack (MDN Web Docs — Understanding WebAssembly text format, 2020).
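To make the stack-machine idea concrete, here is a minimal sketch in Rust. The Instr enum and the run function are hypothetical names made up for this illustration, and only two i32 instructions are modelled; a real Wasm engine validates and executes far more than this.

```rust
// A toy stack machine, loosely modelled on two Wasm instructions.
// `Instr` and `run` are illustrative names, not part of any Wasm runtime.
#[derive(Debug, Clone, Copy)]
enum Instr {
    I32Const(i32), // pushes one i32 onto the stack
    I32Add,        // pops two i32 values and pushes their sum
}

// Runs the program and returns the value left on top of the stack.
// Returns None if an instruction finds too few values to pop.
fn run(program: &[Instr]) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in program {
        match *instr {
            Instr::I32Const(value) => stack.push(value),
            Instr::I32Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                // Wasm integer addition wraps around on overflow.
                stack.push(a.wrapping_add(b));
            }
        }
    }
    stack.pop()
}

fn main() {
    let program = [Instr::I32Const(2), Instr::I32Const(40), Instr::I32Add];
    println!("{:?}", run(&program)); // prints Some(42)
}
```

Notice that nothing in this model can hold a string directly; every value on the stack is a number.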

As we can see, the four data types above all pertain to numbers. So how do we facilitate strings in WebAssembly (Wasm), given that this is the case?

Strings in WebAssembly — how to?

Now, it is possible to turn a high-level value (such as a string) into a set of numbers. If this is achieved, then we could pass these sets of numbers (which represent strings) back and forth between our functions. However, there are a couple of issues with this.

First, this constant explicit encoding/decoding overhead is cumbersome for general high-level coding, so it is not a great long-term solution. Second, it turns out that this approach cannot actually be achieved in Wasm at present: whilst Wasm functions can accept many values as arguments, they can only return one value. There is a lot of information coming up about Wasm. For now, let’s cover some fundamentals by looking at how strings work in Rust.

Strings in Rust

The String

A String in Rust can be thought of as a Vec<u8> that is guaranteed to hold well-formed UTF-8 (Blandy and Orendorff, 2017).

The &str

A &str in Rust is a reference to a run of UTF-8 text owned by someone else; &str is a fat pointer, containing both the address of the actual data and its length. You can think of &str as being nothing more than a &[u8] that is guaranteed to hold well-formed UTF-8 (Blandy and Orendorff, 2017).
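The "fat pointer" description can be checked directly. The short sketch below (the parts helper is a hypothetical name) pulls out the address and the length of a &str, and compares the size of a &str with two machine words.

```rust
use std::mem::size_of;

// Returns the two halves of the fat pointer: data address and byte length.
// `parts` is an illustrative helper, not a standard library function.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let owned = String::from("hello, wasm");
    let slice: &str = &owned[7..]; // borrows "wasm" without copying it

    let (ptr, len) = parts(slice);
    println!("address: {:p}, length: {}", ptr, len);

    // A &str occupies two machine words: a pointer plus a length.
    println!("size of &str: {}", size_of::<&str>());
    println!("two words:    {}", 2 * size_of::<usize>());
}
```

This pointer-plus-length pair is the same shape of information that is later passed across the JavaScript/Wasm boundary.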

Strings at compile time — stored in the executable

A string literal is a &str that refers to preallocated text, typically stored in read-only memory along with the program’s machine code; the bytes are created when the program begins execution, and last until the program ends. It is therefore impossible to modify a &str (Blandy and Orendorff, 2017).

&str can refer to any slice of any string, and therefore it is appropriate to use &str for function arguments; this allows the caller to pass in either a String or a &str (Klabnik and Nichols, 2019), like this.

fn my_function(the_string: &str) -> &str {
    // code ...
    the_string
}
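As a small usage sketch, the function below (first_word is a hypothetical name) accepts a &str and is called with a string literal, a borrowed String and a slice of a String.

```rust
// Accepting &str lets callers pass string literals, String values or slices.
// `first_word` is an illustrative helper.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let literal: &str = "hello world";
    let owned: String = String::from("strings in wasm");

    println!("{}", first_word(literal));     // a &str passed directly
    println!("{}", first_word(&owned));      // &String coerces to &str
    println!("{}", first_word(&owned[8..])); // a slice of the String
}
```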

Strings at runtime — allocated and freed at runtime

New strings can be created at runtime, using String. A string literal can be converted to a String using either of the following methods. to_string() and String::from do the same thing, so which you choose is a matter of style (Klabnik and Nichols, 2019).

let s = "the string literal".to_string();
let s = String::from("the string literal");

Converting strings to numbers

The following Rust code takes the string hello, converts it to bytes, and then prints both versions of the string to the terminal.

fn main() {
    let s: String = String::from("hello");
    println!("String: {:?}", &s);
    println!("Bytes: {:?}", &s.as_bytes());
}

Output

String: "hello"
Bytes: [104, 101, 108, 108, 111]
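Going the other way, from numbers back to a string, can be sketched as follows. The decode helper is a hypothetical name; it relies on String::from_utf8, which checks that the bytes are well-formed UTF-8 before handing back a String.

```rust
// Turns raw bytes back into a String, or None if they are not valid UTF-8.
// `decode` is an illustrative helper built on String::from_utf8.
fn decode(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // The same five numbers that were printed above.
    let hello = decode(vec![104, 101, 108, 108, 111]);
    println!("{:?}", hello);

    // The byte 0xFF never appears in well-formed UTF-8, so this is rejected.
    let invalid = decode(vec![0xFF]);
    println!("{:?}", invalid);
}
```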

Wasm “Hello World!” example

Given all of this information, how would we write a “Hello World!” application in Wasm, for the Web? For example, how would we pass strings back and forth between the user’s interface and the Wasm execution environment?

“Here’s the big crux … WebAssembly needs to play well with JavaScript …we need to work with and pass JavaScript objects into WebAssembly, but WebAssembly doesn’t support that at all. Currently, WebAssembly only supports integers and floats” (Williams, 2019).

Shoehorning JavaScript objects into u32 for Wasm use is going to take a bit of grappling.

[Image: a wrestling pictorial that looks surprisingly like a crustacean. Coincidence? I think not.]

Bindgen

Wasm-bindgen is a build-time dependency for Rust. It is able to generate Rust and JavaScript code at compile time. It can also be used as an executable, called wasm-bindgen on the command line. Essentially, the wasm-bindgen tool allows JavaScript and Wasm to communicate using high-level JavaScript objects like strings, as opposed to exclusively communicating number data types (Rustwasm.github.io, 2019).

How is this achieved?

Memory

“The main storage of a WebAssembly program is a large array of raw bytes, the linear memory or simply memory” (Rossberg et al., 2018).
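As a rough mental model, linear memory can be pictured as a byte vector with bounds-checked loads and stores. The sketch below is an assumption-laden illustration: LinearMemory and its methods are hypothetical names, and out-of-bounds access returns None here, whereas a real engine would trap.

```rust
// A stand-in for Wasm linear memory: one large array of raw bytes.
// `LinearMemory` is a hypothetical type for illustration only.
const PAGE_SIZE: usize = 65_536; // Wasm memory is sized in 64 KiB pages

struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(pages: usize) -> Self {
        LinearMemory { bytes: vec![0; pages * PAGE_SIZE] }
    }

    // Stores an i32 at `addr` in little-endian byte order, as Wasm does.
    fn store_i32(&mut self, addr: usize, value: i32) -> Option<()> {
        let end = addr.checked_add(4)?;
        self.bytes.get_mut(addr..end)?.copy_from_slice(&value.to_le_bytes());
        Some(())
    }

    // Loads an i32 from `addr`; returns None when the access is out of bounds.
    fn load_i32(&self, addr: usize) -> Option<i32> {
        let end = addr.checked_add(4)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(self.bytes.get(addr..end)?);
        Some(i32::from_le_bytes(buf))
    }
}

fn main() {
    let mut memory = LinearMemory::new(1);
    memory.store_i32(8, 42);
    println!("{:?}", memory.load_i32(8));         // prints Some(42)
    println!("{:?}", memory.load_i32(PAGE_SIZE)); // prints None
}
```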

The wasm-bindgen tool abstracts away linear memory, and allows the use of native data structures between Rust and JavaScript (Wasm By Example, 2019). The current strategy is for wasm-bindgen to maintain a “heap”. This “heap” is a module-local variable which is created by wasm-bindgen, inside a wasm-bindgen-generated JavaScript file.

This next bit might seem a little confusing; just hang in there. It turns out that the first slots in this “heap” are considered a stack. This stack, like typical program execution stacks, grows down.

Temporary JS objects on the “stack”

Short-term JavaScript objects are pushed on to the stack, and their indices (positions in the stack) are passed to Wasm. A stack pointer is maintained to figure out where the next item is pushed (GitHub — RustWasm, 2020).

Removal is simply storing undefined/null. Because of the “stack-y” nature of this scheme, it only works when Wasm doesn’t hold onto a JavaScript object (GitHub — RustWasm, 2020).

JsValue

The Rust codebase of the wasm-bindgen library itself uses a special JsValue. A hand-written exported function, like the one shown below, can take a reference to this special JsValue.

#[wasm_bindgen]
pub fn foo(a: &JsValue) {
    // ...
}

wasm-bindgen generated Rust

The Rust code that #[wasm_bindgen] generates, in relation to the hand-written Rust above, looks something like this.

#[export_name = "foo"] 
pub extern "C" fn __wasm_bindgen_generated_foo(arg0: u32) {
    let arg0 = unsafe {
        ManuallyDrop::new(JsValue::__from_idx(arg0))
    };
    let arg0 = &*arg0;
    foo(arg0);
}

The externally callable identifier is still known as foo. However, what is actually exported from the Wasm module is the wasm_bindgen-generated Rust function known as __wasm_bindgen_generated_foo. This generated function takes an integer argument and wraps it in a JsValue.
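The ManuallyDrop wrapper in the generated code is what stops Rust from running the value's destructor when the function returns. The sketch below demonstrates that effect in isolation. It does not use wasm-bindgen: Handle is a hypothetical stand-in for an index wrapper such as JsValue, and its Drop implementation merely counts how often it runs.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// `Handle` is a hypothetical stand-in for an index wrapper like JsValue.
// Dropping a real handle would release the slot it refers to; this one
// only counts how many times its destructor has run.
struct Handle<'a> {
    idx: u32,
    drops: &'a Cell<u32>,
}

impl Drop for Handle<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn slot_of(handle: &Handle) -> u32 {
    handle.idx
}

// Uses a handle either as a borrowed value (wrapped in ManuallyDrop) or as
// an owned value, and reports how many destructor calls happened.
fn drops_after_use(borrowed: bool) -> u32 {
    let drops = Cell::new(0);
    if borrowed {
        // The caller still owns the slot, so the destructor is suppressed.
        let handle = ManuallyDrop::new(Handle { idx: 31, drops: &drops });
        let _slot = slot_of(&handle);
    } else {
        // An owned handle is dropped at the end of this block.
        let handle = Handle { idx: 31, drops: &drops };
        let _slot = slot_of(&handle);
    }
    drops.get()
}

fn main() {
    println!("borrowed: {} drops", drops_after_use(true));
    println!("owned:    {} drops", drops_after_use(false));
}
```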

It is important to remember that, because of Rust’s ownership rules, the reference to JsValue cannot persist past the lifetime of the function call. Therefore the wasm-bindgen-generated JavaScript needs to free the stack slot which was created as part of this function’s execution. Let’s look at the generated JavaScript next.

wasm-bindgen generated JavaScript

// foo.js
import * as wasm from './foo_bg';

const heap = new Array(32).fill(undefined);
heap.push(undefined, null, true, false);
let stack_pointer = 32;

function addBorrowedObject(obj) {
  stack_pointer -= 1;
  heap[stack_pointer] = obj;
  return stack_pointer;
}

export function foo(arg0) {
  const idx0 = addBorrowedObject(arg0);
  try {
    wasm.foo(idx0);
  } finally {
    heap[stack_pointer++] = undefined;
  }
}

The heap

As we can see, the JavaScript file imports from the Wasm file.

Then we can see that the aforementioned “heap” module-local variable is created. It is important to remember that this JavaScript is being generated by Rust code. If you would like to see how this is done, see line 747 in this mod.rs file. Below is a snippet of the Rust code that generates the JavaScript code.

self.global(&format!("const heap = new Array({}).fill(undefined);", INITIAL_HEAP_OFFSET));

The INITIAL_HEAP_OFFSET is hard coded to 32 in the Rust file, so the array has 32 items by default.

Once created in JavaScript, this heap variable will store all of the JavaScript values that are reference-able from Wasm at execution time.

If we look again at the generated JavaScript, we can see that the exported function called foo takes an arbitrary argument, arg0. The foo function calls addBorrowedObject, passing arg0 into it. The addBorrowedObject function decrements the stack_pointer position by 1 (was 32, now 31), stores the object at that position, and returns that position to the calling foo function.

The stack position is stored as a const called idx0. Then idx0 is passed to the wasm_bindgen-generated Wasm so that Wasm can operate with it (GitHub — RustWasm, 2020).

As we mentioned, we are still talking about temporary JS objects on the “stack”. If we look at the finally block of the generated JavaScript code, we will see that the heap slot at the stack_pointer position is set to undefined, and then (thanks to the ++ syntax) the stack pointer variable is incremented back to its original value.
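The push, call and pop sequence described above can be modelled in a few lines of Rust. This is only a model for reasoning about the scheme: wasm-bindgen implements it in generated JavaScript, and Slots and its method names are hypothetical. A String stands in for an arbitrary JavaScript object, and None plays the role of undefined.

```rust
// A Rust model of the borrowed-object stack in the generated JavaScript.
// `Slots` and its methods are hypothetical names for illustration.
struct Slots {
    heap: Vec<Option<String>>, // None plays the role of `undefined`
    stack_pointer: usize,
}

impl Slots {
    fn new() -> Self {
        Slots { heap: vec![None; 32], stack_pointer: 32 }
    }

    // Mirrors addBorrowedObject: the stack grows down from index 32.
    fn add_borrowed(&mut self, obj: String) -> usize {
        assert!(self.stack_pointer > 0, "borrowed-object stack is full");
        self.stack_pointer -= 1;
        self.heap[self.stack_pointer] = Some(obj);
        self.stack_pointer
    }

    // Mirrors the finally block: clear the slot, then move the pointer up.
    fn pop_borrowed(&mut self) {
        self.heap[self.stack_pointer] = None;
        self.stack_pointer += 1;
    }

    // Mirrors foo: lend the object to `callee` for the duration of one call.
    fn with_borrowed<R>(
        &mut self,
        obj: String,
        callee: impl FnOnce(&Slots, usize) -> R,
    ) -> R {
        let idx = self.add_borrowed(obj);
        let result = callee(&*self, idx);
        self.pop_borrowed();
        result
    }
}

fn main() {
    let mut slots = Slots::new();
    let length = slots.with_borrowed(String::from("Tim"), |s, idx| {
        s.heap[idx].as_ref().map(|obj| obj.len())
    });
    println!("length seen by callee: {:?}", length);
    println!("stack pointer afterwards: {}", slots.stack_pointer);
}
```

Once with_borrowed returns, the slot is empty again, which is why this scheme cannot serve a callee that wants to keep the object.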

So far, we have covered objects that are only temporarily used i.e. only live during one function call. Let’s look at long-lived JS objects next.

Long-lived JS objects

Here we will talk about the second half of management of JavaScript objects, again referencing the official bindgen documentation (Rustwasm.github.io, 2019).

The strict push/pop of the stack won’t work for long-lived JavaScript objects, so we need a more permanent storage mechanism.

If we look back at our original hand-written foo function example, we can see that a slight change will alter the ownership, and therefore the lifetime, of the JsValue. Specifically, by removing the & (in our hand-written Rust) we are making the foo function take full ownership of the object, as opposed to just borrowing a reference.

// foo.rs
#[wasm_bindgen]
pub fn foo(a: JsValue) {
    // ...
}

Now, in the generated JavaScript, we are calling addHeapObject instead of addBorrowedObject.

import * as wasm from './foo_bg'; // imports from wasm file

const heap = new Array(32);
heap.push(undefined, null, true, false);
let heap_next = 36;

function addHeapObject(obj) {
  if (heap_next === heap.length)
    heap.push(heap.length + 1);
  const idx = heap_next;
  heap_next = heap[idx];
  heap[idx] = obj;
  return idx;
}

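The addHeapObject function uses the heap array and the heap_next variable to acquire a slot in which to store the object. Each free slot holds the index of the next free slot, so heap_next always points at the head of a free list.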
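The free-list idea behind addHeapObject can be sketched in Rust as follows. This is a simplified model under stated assumptions: Slot, Heap, add and remove are hypothetical names, the model starts at index 0 rather than reserving the first 36 entries, and remove is my own illustration of how a slot could be handed back, not code taken from wasm-bindgen.

```rust
// A Rust model of the free-list scheme behind addHeapObject.
// `Slot`, `Heap`, `add` and `remove` are hypothetical names.
#[derive(Debug, PartialEq)]
enum Slot {
    Free(usize),  // index of the next free slot
    Used(String), // a stored object (a String stands in for any JS value)
}

struct Heap {
    slots: Vec<Slot>,
    next: usize, // plays the role of heap_next
}

impl Heap {
    fn new() -> Self {
        Heap { slots: Vec::new(), next: 0 }
    }

    // Mirrors addHeapObject: reuse a free slot, or grow the array by one.
    fn add(&mut self, obj: String) -> usize {
        if self.next == self.slots.len() {
            let following = self.slots.len() + 1;
            self.slots.push(Slot::Free(following));
        }
        let idx = self.next;
        if let Slot::Free(following) = &self.slots[idx] {
            self.next = *following;
        }
        self.slots[idx] = Slot::Used(obj);
        idx
    }

    // Releases a slot by linking it to the front of the free list.
    fn remove(&mut self, idx: usize) -> Option<String> {
        let head = self.next;
        match std::mem::replace(&mut self.slots[idx], Slot::Free(head)) {
            Slot::Used(obj) => {
                self.next = idx;
                Some(obj)
            }
            already_free => {
                // The slot was not in use: put its old contents back.
                self.slots[idx] = already_free;
                None
            }
        }
    }
}

fn main() {
    let mut heap = Heap::new();
    let a = heap.add(String::from("a"));
    let b = heap.add(String::from("b"));
    heap.remove(a);
    let c = heap.add(String::from("c")); // reuses the slot freed by `a`
    println!("a = {}, b = {}, c = {}", a, b, c);
}
```

Unlike the stack scheme, a slot here stays occupied until it is explicitly released, which is what a long-lived object needs.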

Now that we have a general understanding of how a JsValue object is handled, let’s focus specifically on strings.

Strings are passed to wasm via two arguments, a pointer and a length (GitHub — RustWasm , 2020).

The string is encoded using the TextEncoder API and then copied onto the Wasm heap. Here is a quick example of encoding a string to an array of numbers using the TextEncoder API. You can try this yourself in your browser console.

const encoder = new TextEncoder();
const encoded = encoder.encode('Tim');
encoded
// Uint8Array(3) [84, 105, 109]
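The pointer-and-length convention can be sketched end to end in Rust. The model below makes two loud assumptions: Memory is a hypothetical stand-in for the Wasm heap, and space is handed out by a simple bump counter (next_free), which is not how a real allocator behaves.

```rust
use std::convert::TryFrom;

// Models handing a string to Wasm: encode it as UTF-8, copy the bytes into
// linear memory, then pass only (pointer, length).
// `Memory` and the bump counter `next_free` are simplifying assumptions.
struct Memory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl Memory {
    fn new(size: usize) -> Self {
        Memory { bytes: vec![0; size], next_free: 0 }
    }

    // Host side: copy the string in and return its (pointer, length) pair.
    fn write_str(&mut self, s: &str) -> Option<(u32, u32)> {
        let start = self.next_free;
        let end = start.checked_add(s.len())?;
        self.bytes.get_mut(start..end)?.copy_from_slice(s.as_bytes());
        self.next_free = end;
        Some((u32::try_from(start).ok()?, u32::try_from(s.len()).ok()?))
    }

    // Wasm side: rebuild a &str from the two integers that were received.
    fn read_str(&self, ptr: u32, len: u32) -> Option<&str> {
        let start = usize::try_from(ptr).ok()?;
        let end = start.checked_add(usize::try_from(len).ok()?)?;
        std::str::from_utf8(self.bytes.get(start..end)?).ok()
    }
}

fn main() {
    let mut memory = Memory::new(64);
    if let Some((ptr, len)) = memory.write_str("Tim") {
        println!("ptr = {}, len = {}", ptr, len);
        println!("bytes = {:?}", &memory.bytes[..3]);
        println!("read back = {:?}", memory.read_str(ptr, len));
    }
}
```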

Passing indices (pointer and length) instead of passing whole high level objects makes sense. As we mentioned at the start of this article, we are able to pass many values into a Wasm function, but are only allowed to return one value. So how do we return the pointer and length from a Wasm function?

There is currently an open issue on the WebAssembly GitHub, which is working on implementing and standardising multiple return values for Wasm functions.

In the meantime, exporting a function that returns a string requires a shim in each of the languages involved. In this case, JavaScript and Rust both need to agree on how each side will translate to and from Wasm (in their own respective language).
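To illustrate what such an agreement might look like, here is one possible convention: pack the 32-bit pointer and the 32-bit length into a single 64-bit return value. This is only a sketch of the idea. The pack and unpack helpers are hypothetical, and this is not a description of the mechanism that wasm-bindgen's own shims use.

```rust
// Packs a 32-bit pointer and a 32-bit length into one u64, so that both can
// travel through a single return value. Hypothetical helpers for illustration.
fn pack(ptr: u32, len: u32) -> u64 {
    (u64::from(ptr) << 32) | u64::from(len)
}

// Splits the packed value back into (pointer, length).
fn unpack(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

fn main() {
    let packed = pack(1024, 5);
    println!("packed   = {:#x}", packed);
    println!("unpacked = {:?}", unpack(packed)); // prints (1024, 5)
}
```

Both sides must apply exactly the same layout, which is the kind of detail a generated shim exists to hide.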

The wasm-bindgen tool manages hooking up all these shims, while the #[wasm_bindgen] macro takes care of the Rust shim as well (GitHub — RustWasm, 2020).

This innovation has solved the strings in WebAssembly problem, for the Web, in a very clever way. It immediately opens the door for countless Web applications to leverage Wasm’s standout features. As development continues (for example, the formalisation of the multi-value proposal), the functionality of Wasm in and out of the browser will improve dramatically.

Let’s take a look at a couple of concrete examples of using strings in WebAssembly. These are working examples that you can try out for yourself.

Concrete examples

As the bindgen documentation says, “With the addition of wasm-pack you can run the gamut from running Rust on the web locally, publishing it as part of a larger application, or even publishing Rust-compiled-to-WebAssembly on NPM!”

Wasm-pack

https://rustwasm.github.io/wasm-pack/

Wasm-pack is a brilliant and easy to use Wasm workflow tool.

Wasm-pack uses wasm-bindgen under the hood. In a nutshell, wasm-pack generates both Rust code and JavaScript code while compiling to WebAssembly. Wasm-pack allows you to talk to your WebAssembly (via JavaScript) as if it were JavaScript (Williams, 2019). Wasm-pack compiles your code using the wasm32-unknown-unknown target.

Wasm-pack (client side — web)

Here is an example of how wasm-pack facilitates string concatenation using Wasm on the web.

If we spin up an Ubuntu Linux system and perform the following, we can get on with building this demo in a few minutes.

#System housekeeping
sudo apt-get update
sudo apt-get -y upgrade
sudo apt install build-essential
#Install apache
sudo apt-get -y install apache2
sudo chown -R $USER:$USER /var/www/html
sudo systemctl start apache2
#Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
#Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

Once the system is set up, we can create a new project in Rust.

cd ~
cargo new --lib greet
cd greet

We then perform some Rust config as shown below (open the Cargo.toml file and add the following to the bottom of the file)

[lib]
name = "greet_lib"
path = "src/lib.rs"
crate-type = ["cdylib"]

[dependencies]
wasm-bindgen = "0.2.50"

We then write the following Rust code

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

#[wasm_bindgen]
pub fn greet(name: &str) {
    alert(&format!("Hello, {}!", name));
}
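The string handling in greet can also be tried outside the browser. The sketch below is plain Rust with no wasm-bindgen dependency, so it runs with a normal toolchain: greeting is a hypothetical variation that returns the text instead of calling alert, and the fallback for an empty name is my own addition.

```rust
// Builds the greeting and returns it, instead of calling alert.
// `greeting` is a hypothetical variation of `greet`; the "stranger"
// fallback for blank input is an assumption added for illustration.
fn greeting(name: &str) -> String {
    let name = name.trim();
    if name.is_empty() {
        String::from("Hello, stranger!")
    } else {
        format!("Hello, {}!", name)
    }
}

fn main() {
    println!("{}", greeting("Tim"));
    println!("{}", greeting("   "));
}
```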

Finally we build the program using wasm-pack

wasm-pack build --target web

Once the code is compiled, we just need to create an HTML file to interact with, and then copy the HTML, along with the contents of wasm-pack's pkg directory, over to the directory that Apache2 is serving.

Create the following index.html file in the ~/greet/pkg directory.

<html><head>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous" />
<script type="module">
import init, { greet } from './greet_lib.js';

async function run() {
  await init();
  var buttonOne = document.getElementById('buttonOne');
  buttonOne.addEventListener('click', function() {
    var input = $("#nameInput").val();
    // greet() shows the alert itself and returns nothing
    greet(input);
  }, false);
}
run();
</script>
</head>
<body>
<div>
<div></div>
<div><b>Wasm - Say hello</b></div>
<div></div>
</div>
<hr />
<div>
<div></div>
<div>What is your name?</div>
<div> Click the button</div>
<div></div>
</div>
<div>
<div></div>
<div>
<input type="text" id="nameInput" placeholder="1" value="1">
</div>
<div>
<button id="buttonOne">Say hello</button>
</div>
<div></div>
</div>
</body>
<script src="https://code.jquery.com/jquery-3.4.1.js" integrity="sha256-WpOohJOqMqqyKL9FccASB9O0KwACQJpFTUBLTYOVvVU=" crossorigin="anonymous">
</script>
</html>

Copy the contents of the pkg directory to where we are running Apache2

cp -rp pkg/* /var/www/html/

If we go to the address of the server, we are greeted with a page that contains a text input and a “Say hello” button.

When we add our name and click the button, the browser shows an alert containing the greeting, for example “Hello, Tim!”.

Wasm-pack (server side — Node.js)

Now that we have seen this in action using HTML/JS and Apache2, let’s go ahead and create another demonstration, this time in the context of Node.js, following wasm-pack’s npm-browser-packages documentation.

sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install build-essential
sudo apt-get -y install curl
#Install Node and NPM
curl -sL https://deb.nodesource.com/setup_13.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt-get install npm
#Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
#Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
sudo apt-get install pkg-config
sudo apt-get install libssl-dev
cargo install cargo-generate
cargo generate --git https://github.com/rustwasm/wasm-pack-template

Just out of interest, the Rust code for this demonstration (which is generated by the official demonstration software) is as follows.

mod utils;

use wasm_bindgen::prelude::*;

// When the `wee_alloc` feature is enabled, use `wee_alloc` as the global
// allocator.
#[cfg(feature = "wee_alloc")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

#[wasm_bindgen]
pub fn greet() {
    alert("Hello, tpmccallum-greet!");
}

You can build the project using the following command (the last argument is your npmjs.com username)

wasm-pack build --scope tpmccallum

To log into your npm account via wasm-pack, simply type the following command

wasm-pack login

To publish, just change into the pkg directory and run the following command

cd pkg
npm publish --access=public

Ok, so we have published a package. Let’s now go ahead and create a new application that we can use our package in. (Please note, we are using a template, so don’t make up your own app name for the following command, instead use the create-wasm-app text as shown below).

cd ~
npm init wasm-app create-wasm-app

At this stage we want to install the package, from npmjs.com. We use the following command to achieve this

npm i @tpmccallum/tpmccallum-greet

Almost there … now open the index.js and import our package, by name, like this

import * as wasm from "@tpmccallum/tpmccallum-greet";

wasm.greet();

Then finally, start the demo and visit localhost:8080

npm install
npm start

Wider applications for Wasm

It is anticipated that “WebAssembly will find a wide range of uses in other domains. In fact, multiple other embeddings are already being developed: for sandboxing in content delivery networks, for smart contracts or decentralised cloud computing on blockchains, as code formats for mobile devices, and even as mere stand-alone engines for providing portable language runtimes” (Rossberg et al., 2018).

There is a strong chance that the multi-value proposal will eventually allow a Wasm function to return many values and, in turn, facilitate the implementation of a new set of interface types.

There is, in fact, an interface types proposal (see WebAssembly/interface-types in the references) that adds a new set of interface types to WebAssembly that describe high-level values (like strings, sequences, records and variants). This new approach may achieve this without committing to a single memory representation or sharing scheme. With this approach, interface types would only be used in the interfaces of modules and would only be produced or consumed by declarative interface adapters.

The proposal indicates that it is semantically layered on top of the WebAssembly core spec (extended with the multi-value and reference types proposals). All adaptations are specified in a custom section and can be polyfilled using the JavaScript API.

Shoutout to WebAssembly Summit

On February 10, 2020, a WebAssembly Summit will be held in Mountain View, CA.

The summit is a community event, organised by individual people in the WebAssembly community and, at the time of writing, sponsored by Google and Mozilla.

The event is a one-day, single-track conference about all things WebAssembly. It will be live streamed.

You can view the list of speakers and the schedule for more information; hopefully you can find some time to virtually join in with the community and watch the event live.

If you are reading this article after the summit event, please go ahead and check out their YouTube channel. The event is preserved and is available for playback.

I hope that you have enjoyed this article. Please ask any questions in the comments section or reach out via Twitter.

References

Blandy, J. and Orendorff, J. (2017). Programming Rust. O’Reilly Media Inc.

GitHub — WebAssembly. (2020). WebAssembly/interface-types. [online] Available at: https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md

GitHub — RustWasm. (2020). rustwasm/wasm-bindgen. [online] Available at: https://github.com/rustwasm/wasm-bindgen/blob/master/guide/src/contributing/design/js-objects-in-rust.md

Haas, A., Rossberg, A., Schuff, D.L., Titzer, B.L., Holman, M., Gohman, D., Wagner, L., Zakai, A. and Bastien, J.F. (2017). Bringing the web up to speed with WebAssembly. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 185–200).

Klabnik, S. and Nichols, C. (2019). The Rust Programming Language (Covers Rust 2018). San Francisco: No Starch Press Inc.

MDN Web Docs — Understanding WebAssembly text format. (2020). Understanding WebAssembly text format. [online] Available at: https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format

MDN Web Docs — Web APIs. (2020). Web APIs. [online] Available at: https://developer.mozilla.org/en-US/docs/Web/API

Reiser, M. and Bläser, L. (2017). Accelerate JavaScript applications by cross-compiling to WebAssembly. In Proceedings of the 9th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages (pp. 10–17). ACM.

Rossberg, A., Titzer, B., Haas, A., Schuff, D., Gohman, D., Wagner, L., Zakai, A., Bastien, J. and Holman, M. (2018). Bringing the web up to speed with WebAssembly. Communications of the ACM, 61(12), pp. 107–115.

Rustwasm.github.io. (2019). Introduction — The `wasm-bindgen` Guide. [online] Available at: https://rustwasm.github.io/docs/wasm-bindgen/ [Accessed 27 Jan. 2020].

Wasm By Example. (2019). WebAssembly Linear Memory. [online] Available at: https://wasmbyexample.dev/examples/webassembly-linear-memory/webassembly-linear-memory.rust.en-us.html

Williams, A. (2019). Rust, WebAssembly, and Javascript Make Three: An FFI Story. [online] infoq. Available at: https://www.infoq.com/presentations/rust-webassembly-javascript/

Author

Timothy McCallum

